Language Detection

The language detection pipeline step detects the language of a Squirro item. The language information is needed for the correct indexing of the document, so this step should not be disabled.

Enrichment name: language-detection

Stage: processing

Overview

During the language-detection step, the body of the Squirro item and the bodies of all of its sub-items are analyzed to determine the language. All other fields, including the title and summary, are ignored.

If the language field is already defined on the item, it will not be overwritten.
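
The behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Squirro implementation: the function name `apply_language_detection`, the `sub_items` key, and the pluggable `detect` callable are hypothetical, and joining the item body with the sub-item bodies before detection is one possible reading of "the item and all of its sub-items".

```python
from typing import Callable, Optional


def apply_language_detection(item: dict, detect: Callable[[str], Optional[str]]) -> dict:
    """Set item["language"] from body text unless it is already defined."""
    # An existing language value is never overwritten.
    if item.get("language"):
        return item

    # Only bodies are considered; title, summary, and other fields are ignored.
    # "sub_items" is a hypothetical key name used for this sketch.
    texts = [item.get("body") or ""]
    texts += [sub.get("body") or "" for sub in item.get("sub_items", [])]
    text = "\n".join(t for t in texts if t)

    language = detect(text) if text else None
    if language:
        item["language"] = language
    return item


def toy_detect(text: str) -> Optional[str]:
    # Stand-in for a real detector, for illustration only.
    return "de" if " und " in f" {text} " else "en"


# The German title is ignored; only the English body is analyzed.
item = apply_language_detection(
    {"title": "Hund und Katze", "body": "The quick brown fox"}, toy_detect
)

# A language that is already set is left untouched.
preset = apply_language_detection(
    {"language": "fr", "body": "The quick brown fox"}, toy_detect
)
```

Passing the detector in as a callable keeps the sketch independent of any particular language-identification library.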

Configuration

This enrichment has no configuration options other than the enabled property, which turns it on or off.
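
As a rough illustration, the single option could be expressed as a small Python dictionary keyed by the enrichment name. The exact shape of the pipeline configuration is an assumption here, as is the helper name `build_step_config`; check the pipeline configuration reference for the format your Squirro version expects.

```python
import json


def build_step_config(enabled: bool = True) -> dict:
    """Return a config fragment for the language-detection step.

    Assumption: steps are keyed by enrichment name and accept an
    "enabled" flag. The surrounding structure is hypothetical.
    """
    return {"language-detection": {"enabled": enabled}}


config = build_step_config()
print(json.dumps(config, indent=2))
```

Because the language is needed for correct indexing, the sketch defaults to `enabled=True`.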

Languages covered

af - Afrikaans

am - Amharic

an - Aragonese

ar - Arabic

as - Assamese

av - Avaric

az - Azerbaijani

ba - Bashkir

be - Belarusian

bg - Bulgarian

bh - Bihari languages

bn - Bengali

bo - Tibetan

br - Breton

bs - Bosnian

ca - Catalan

ce - Chechen

co - Corsican

cs - Czech

cv - Chuvash

cy - Welsh

da - Danish

de - German

dv - Divehi

el - Greek

en - English

eo - Esperanto

es - Spanish

et - Estonian

eu - Basque

fa - Persian

fi - Finnish

fr - French

fy - Western Frisian

ga - Irish

gd - Gaelic

gl - Galician

gn - Guarani

gu - Gujarati

gv - Manx

he - Hebrew

hi - Hindi

hr - Croatian

ht - Haitian

hu - Hungarian

hy - Armenian

id - Indonesian

ie - Interlingue

io - Ido

is - Icelandic

it - Italian

ja - Japanese

jv - Javanese

ka - Georgian

kk - Kazakh

km - Central Khmer

kn - Kannada

ko - Korean

ku - Kurdish

kv - Komi

kw - Cornish

ky - Kirghiz

la - Latin

lb - Luxembourgish

li - Limburgan

lo - Lao

lt - Lithuanian

lv - Latvian

mg - Malagasy

mk - Macedonian

ml - Malayalam

mn - Mongolian

mr - Marathi

ms - Malay

mt - Maltese

my - Burmese

ne - Nepali

nl - Dutch

nn - Norwegian Nynorsk

no - Norwegian

oc - Occitan

or - Oriya

os - Ossetian

pa - Punjabi

pl - Polish

ps - Pashto

pt - Portuguese

qu - Quechua

rm - Romansh

ro - Romanian, Moldavian

ru - Russian

sa - Sanskrit

sc - Sardinian

sd - Sindhi

sh - Serbo-Croatian

si - Sinhala

sk - Slovak

sl - Slovenian

so - Somali

sq - Albanian

sr - Serbian

su - Sundanese

sv - Swedish

sw - Swahili

ta - Tamil

te - Telugu

tg - Tajik

th - Thai

tk - Turkmen

tl - Tagalog

tr - Turkish

tt - Tatar

ug - Uighur

uk - Ukrainian

ur - Urdu

uz - Uzbek

vi - Vietnamese

vo - Volapük

wa - Walloon

yi - Yiddish

yo - Yoruba

zh - Chinese