Language Detection
The language detection pipeline step detects the language of a Squirro item. The language information is needed for the correct indexing of the document, so this step should not be disabled.
Enrichment name | language-detection
Stage | processing
Overview
During the language-detection step, the body of the Squirro item and the bodies of all of its sub-items are analyzed to determine the language. Any other fields, including title and summary, are ignored.
If the language field is already defined on the item, it will not be overwritten.
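The behavior described above can be sketched roughly as follows. This is an illustration only, not Squirro's implementation: the function name `apply_language_detection`, the pluggable `detect` callable, and the item keys `body`, `sub_items`, and `language` are assumptions made for the example, as is the choice to combine the item body and sub-item bodies into a single text before detection.

```python
from typing import Callable, Optional


def apply_language_detection(
    item: dict, detect: Callable[[str], Optional[str]]
) -> dict:
    """Hypothetical sketch of the documented behavior (not Squirro's code)."""
    # An already defined language is never overwritten.
    if item.get("language"):
        return item

    # Only body text is considered; title and summary are ignored.
    # Assumption: sub-items live under "sub_items" and carry their own "body".
    texts = [item.get("body", "")]
    texts += [sub.get("body", "") for sub in item.get("sub_items", [])]
    body_text = " ".join(text for text in texts if text)

    language = detect(body_text) if body_text else None
    if language:
        item["language"] = language
    return item


def toy_detect(text: str) -> Optional[str]:
    """Stand-in detector for the example; a real detector would go here."""
    return "de" if "und" in text.split() else "en"


# The German title is ignored because only the body is analyzed.
detected = apply_language_detection(
    {"title": "Hallo und willkommen", "body": "Hello world"}, toy_detect
)

# A predefined language is left untouched.
preset = apply_language_detection(
    {"body": "Hello world", "language": "fr"}, toy_detect
)
```

Passing the detector in as a callable keeps the sketch independent of any particular detection library.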
Configuration
There are no configuration options for this enrichment other than the enabled property, which enables or disables it.
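As a rough illustration, a configuration that keeps the step enabled might look like the fragment below. Only the step name `language-detection` and the `enabled` property come from this page; the surrounding dictionary layout and the variable name `processing_config` are assumptions, so check the pipeline configuration reference for the exact format.

```python
# Hypothetical configuration fragment; the dict layout is an assumption.
processing_config = {
    "language-detection": {
        # Keep this enabled: the language is needed for correct indexing.
        "enabled": True,
    },
}
```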
Languages covered
af - Afrikaans
am - Amharic
an - Aragonese
ar - Arabic
as - Assamese
av - Avaric
az - Azerbaijani
ba - Bashkir
be - Belarusian
bg - Bulgarian
bh - Bihari
bn - Bengali
bo - Tibetan
br - Breton
bs - Bosnian
ca - Catalan
ce - Chechen
co - Corsican
cs - Czech
cv - Chuvash
cy - Welsh
da - Danish
de - German
dv - Divehi
el - Greek
en - English
eo - Esperanto
es - Spanish
et - Estonian
eu - Basque
fa - Persian
fi - Finnish
fr - French
fy - Western Frisian
ga - Irish
gd - Gaelic
gl - Galician
gn - Guarani
gu - Gujarati
gv - Manx
he - Hebrew
hi - Hindi
hr - Croatian
ht - Haitian
hu - Hungarian
hy - Armenian
id - Indonesian
ie - Interlingue
io - Ido
is - Icelandic
it - Italian
ja - Japanese
jv - Javanese
ka - Georgian
kk - Kazakh
km - Central Khmer
kn - Kannada
ko - Korean
ku - Kurdish
kv - Komi
kw - Cornish
ky - Kirghiz
la - Latin
lb - Luxembourgish
li - Limburgan
lo - Lao
lt - Lithuanian
lv - Latvian
mg - Malagasy
mk - Macedonian
ml - Malayalam
mn - Mongolian
mr - Marathi
ms - Malay
mt - Maltese
my - Burmese
ne - Nepali
nl - Dutch
nn - Norwegian Nynorsk
no - Norwegian
oc - Occitan
or - Oriya
os - Ossetian
pa - Punjabi
pl - Polish
ps - Pashto
pt - Portuguese
qu - Quechua
rm - Romansh
ru - Russian
sa - Sanskrit
sc - Sardinian
sd - Sindhi
sh - Serbo-Croatian
si - Sinhala
sk - Slovak
sl - Slovenian
so - Somali
sq - Albanian
sr - Serbian
su - Sundanese
sv - Swedish
sw - Swahili
ta - Tamil
te - Telugu
tg - Tajik
th - Thai
tk - Turkmen
tl - Tagalog
tr - Turkish
tt - Tatar
ug - Uighur
uk - Ukrainian
ur - Urdu
uz - Uzbek
vi - Vietnamese
vo - Volapük
wa - Walloon
yi - Yiddish
yo - Yoruba
zh - Chinese