Under construction

Five literary languages financed by the Kone Foundation: Hill Mari (mrj), Livonian (liv), Moksha (mdf), Nenets (yrk), and Olonets (olo).

Finnish translations and transducer development as of Dec. 16, 2013

Source language  Lemmas with translations  Lemma:stem pairs in transducer (approx.)  Lemma:stem pairs with complex regular inflection
Hill Mari        9,575                     53,309                                    12,411
Livonian         9,805                     11,781                                    10,627
Moksha           13,518                    59,276                                    15,662
Olonets (Livvi)  7,095                     12,124                                    6,638
Nenets           8,463                     9,309                                     8,982

A few more literary languages and their statistics: Erzya (myv), Komi Zyrian (kpv), Meadow Mari (mhr), Udmurt (udm)

Finnish translations and transducer development as of Dec. 19, 2013

Source language  Lemmas with translations  Lemma:stem pairs in transducer (approx.)  Lemma:stem pairs with complex regular inflection
Erzya 41,649
Komi Zyrian 34,419
Udmurt 10,021

Transducer coverage of lesser Uralic languages on Giellatekno.

HFST Transducer development as of Jan. 24, 2014

Language                 ISO code  Corpus word forms  Unrecognized  Coverage  Unique word forms  Unrecognized unique  Unique coverage  Unrecognized, six letters or less
Erzya                    myv       1,189,209          45,554        96.17%    126,232            6,455                94.89%           6,111
Hill Mari                mrj       310,174            96,552        68.87%    56,386             33,109               41.28%           8,858
Ingrian                  izh       NA                 NA            NA        32,005             14,965               53.24%           3,915
Komi (Zyrian)            kpv       634,545            31,661        95.01%    63,997             14,057               78.03%           4,160
Livonian                 liv       9,293              1,914         79.40%    3,073              1,050                65.83%           549
Meadow and Eastern Mari  mhr       212,393            59,459        72.01%    42,558             25,038               41.17%           6,771
Moksha                   mdf       861,114            298,440       65.34%    109,933            74,072               32.62%           13,251
Nenets                   yrk       48,041             18,865        60.73%    16,623             10,069               39.43%           2,478
Olonets                  olo       249,813            62,400        75.02%    38,443             23,693               38.37%           3,760
Udmurt                   udm       107,453            45,342        57.80%    23,862             15,361               35.63%           3,943
Võro                     vro       647,634            418,167       35.43%    80,228             72,169               10.05%           17,417
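
The coverage percentages in the table above appear to be the share of word forms that the transducer does recognize, that is, (total - unrecognized) / total. The page does not state this formula; it is inferred from the figures, which it reproduces to two decimals. A minimal Python sketch under that assumption follows. The function name and the sample dictionary are illustrative only and are not part of the Giellatekno or HFST tooling.

```python
from typing import Dict, Tuple


def coverage_percent(total: int, unrecognized: int) -> float:
    """Return the percentage of word forms that were recognized.

    Assumed formula (inferred from the table, not stated on the page):
    coverage = 100 * (total - unrecognized) / total
    """
    if total <= 0:
        raise ValueError("total must be a positive number of word forms")
    if not 0 <= unrecognized <= total:
        raise ValueError("unrecognized must lie between 0 and total")
    return 100.0 * (total - unrecognized) / total


# A few rows copied from the table:
# (word forms, unrecognized, unique word forms, unrecognized unique)
SAMPLE_ROWS: Dict[str, Tuple[int, int, int, int]] = {
    "myv": (1_189_209, 45_554, 126_232, 6_455),
    "liv": (9_293, 1_914, 3_073, 1_050),
    "mdf": (861_114, 298_440, 109_933, 74_072),
}

for iso_code, (tokens, tokens_missed, types, types_missed) in SAMPLE_ROWS.items():
    token_coverage = coverage_percent(tokens, tokens_missed)
    type_coverage = coverage_percent(types, types_missed)
    print(f"{iso_code}: {token_coverage:.2f}% of word forms, "
          f"{type_coverage:.2f}% of unique word forms")
```

The gap between the two figures for a language such as Moksha shows that most unrecognized items are low-frequency forms: each counts once among the unique word forms but contributes few running word forms.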

Finnish translation statistics for lemma sets in many of the Uralic languages
A 20,000-lemma translation funded by the Kone Foundation is under way in 2013-2014 for Hill Mari (mrj), Livonian (liv), Moksha (mdf), Nenets (yrk), and Olonets (Livvi Karelian) (olo).
A translation of around 31,000 lemmas, also funded by the Kone Foundation, is under way for Komi Zyrian (kpv).
STATS (2014-01-24) (2014-02-19)


Contact Jack Rueter: First name dot last name at helsinki dot fi.


Last modified: Thu Dec 19 9:26:17 EEST 2013