The Sources of the Words

Project srmorh uses two sources for its lexical build. The first is a Serbian word list derived from hunspell and several classics of Serbian literature. The second is the Dictionary of Croatian Languages. The two sources are currently not merged.

Serbian Words Database

The database currently contains 272461 unique word forms. These include both inflected words (e.g. nouns in different cases) and uninflected words (e.g. conjunctions and prepositions).

Script conversion (Cyrillic to Latin and vice versa) is done automatically, without detailed checks.
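The conversion script itself is not described here. As a rough illustration, Cyrillic-to-Latin conversion can be done letter by letter, because each Serbian Cyrillic letter has a fixed Latin equivalent (љ, њ and џ become the two-letter lj, nj and dž). This is a hypothetical sketch, not the project's code: the names `CYR_TO_LAT` and `cyr_to_lat` are illustrative, and it assumes lowercase input, as in the word lists.

```python
# Hypothetical sketch: letter-by-letter Serbian Cyrillic -> Latin transliteration.
# Assumes lowercase input; characters not in the table pass through unchanged.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def cyr_to_lat(word: str) -> str:
    """Transliterate a lowercase Serbian Cyrillic word into Latin script."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in word)


print(cyr_to_lat("љубав"))  # ljubav
print(cyr_to_lat("џеп"))    # džep
```

This direction is safe to automate: every Cyrillic letter has exactly one Latin spelling, so no context is needed.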

When the textual sources are extracted, all words are lowercased and contain no dashes. The database may therefore contain a lot of "noise".
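The extraction code is not shown, and the text does not say whether dashes are stripped from words or used as separators. The following minimal sketch assumes the latter: lowercase the text and keep only runs of letters. `WORD_RE` and `extract_words` are hypothetical names. With this approach a compound such as "crno-beli" splits into two separate entries, which is one kind of noise such extraction can introduce.

```python
import re

# Runs of letters only: word characters minus digits and underscore.
# Python str patterns are Unicode-aware, so č, ć, đ, š, ž and Cyrillic count as letters.
WORD_RE = re.compile(r"[^\W\d_]+")


def extract_words(text: str) -> list[str]:
    """Lowercase the text and return its letter-only tokens.

    Assumption: dashes, digits and punctuation act as separators,
    so no returned token contains a dash.
    """
    return WORD_RE.findall(text.lower())


sample = "Crno-beli mačak - rekao je Marko - spava."
print(extract_words(sample))
# ['crno', 'beli', 'mačak', 'rekao', 'je', 'marko', 'spava']
```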

Below is the link to download the database in Serbian.

Croatian Words Database

The Croatian words come from Igaly's Dictionary of Croatian Languages. Its plain-text word list was filtered and converted into a document-based source, which can be searched and shows the Cyrillic counterpart of each word.
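How the Cyrillic counterparts are produced is not described. Latin-to-Cyrillic conversion is harder than the reverse, because lj, nj and dž can spell either one letter or two. A hypothetical greedy converter (`lat_to_cyr` is an illustrative name, not part of srmorh) shows why fully automatic conversion can go wrong: "konjugacija" is written конјугација in Cyrillic, since n and j belong to different morphemes, but a greedy match on "nj" produces коњугација. The sketch assumes lowercase, NFC-normalized input (precomposed č, ž and so on).

```python
# Hypothetical sketch: greedy Latin -> Serbian Cyrillic transliteration.
# Digraphs are matched before single letters, so ambiguous
# sequences (n+j, l+j, d+ž across a morpheme boundary) are converted wrongly.
DIGRAPHS = {"dž": "џ", "lj": "љ", "nj": "њ"}
SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и",
    "j": "ј", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у",
    "v": "в", "z": "з", "ž": "ж",
}


def lat_to_cyr(word: str) -> str:
    """Transliterate a lowercase Latin word into Serbian Cyrillic, greedily."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(SINGLE.get(word[i], word[i]))
            i += 1
    return "".join(out)


print(lat_to_cyr("ljubav"))       # љубав (correct)
print(lat_to_cyr("konjugacija"))  # коњугација (greedy error; correct is конјугација)
```

Catching such cases needs a list of exceptions or morphological information, which is the kind of detailed check the conversion currently skips.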

Parsing log for the Serbian wordlist

(May not be up to date.) For each source file the log gives the number of unique word forms, the total number of words (with repeats), and the share of unique forms (unique / total).

Unique   Total     Share  File
3490     13620     26%    Bozji_ljudi.txt
2683     6107      44%    DECJA_ZBIRKA_PESAMA_Vojislava_Ilica.txt
8259     27395     30%    Antologija_novije_srpske_lirike.txt
1491     2736      54%    KORA.txt
222006   222032    100%   hunspell-sr.txt
7210     24068     30%    Druga_pevanija.txt
14929    82273     18%    Koreni.txt
10082    44986     22%    Dorcol.txt
16287    73885     22%    Beleske_jedne_Ane.txt
12593    73453     17%    Dositej_Obradovic _Basne.txt
4163     10268     41%    Antologija_srpske_poezije_za_decu.txt
13222    100663    13%    Antologija_narodnih_pripovedaka.txt
6515     21163     31%    Bespuce_Vuk_Milicevic.txt
4663     28499     16%    GOSPODJA_MINISTARKA.txt
13514    65400     21%    Autobiografija_nusic.txt
24170    151780    16%    Antologija_narodnih_junackih_pesama.txt
14502    65119     22%    Afrika.txt
9550     48557     20%    Veciti_mladozenja.txt
26011    112693    23%    PROLJECA_IVANA_GALEBA_Vladan_Desnica.txt
5510     16672     33%    GORSKI_VIJENAC.txt
6130     30816     20%    Gazda_Mladen.txt
15379    59142     26%    Najbolje_godine_i_druge_price.txt
11264    61041     18%    Pripovetke_Milovan_Glisic.txt
12441    47989     26%    BASTA_SLJEZOVE_BOJE_Branko_Copic.txt
9076     25865     35%    Kratka_istorija_srpske_knjizevnosti.txt
15703    83264     19%    DVA_SRPSKA_ROMANA_Novica_Petkovic.txt
11599    64580     18%    Gorski_car.txt
9831     49818     20%    Iz_starog_jevandjelja_i_Stari_dani.txt
9546     49380     19%    Glasam_za_ljubav.txt

Total for all files: 1663264 words (with repeats), of which 272489 are unique (16%).
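The parser that produced this log is not published. A sketch of how such statistics could be computed, assuming lowercased letter-run tokens; `word_stats` is a hypothetical name and the in-memory texts stand in for the real files. In the sketch the overall unique count is the size of the union of all files' word sets, which would explain why the log's total (272489) is far below the sum of the per-file unique counts.

```python
import re

# Assumed tokenization: lowercased runs of letters (Unicode-aware).
WORD_RE = re.compile(r"[^\W\d_]+")


def word_stats(texts: dict[str, str]):
    """Return per-file rows (unique, total, share %, name) and an overall row.

    Shares are rounded to the nearest whole percent with Python's round().
    """
    rows = []
    all_unique: set[str] = set()
    grand_total = 0
    for name, text in texts.items():
        words = WORD_RE.findall(text.lower())
        unique = set(words)
        all_unique |= unique          # word forms shared by files are counted once
        grand_total += len(words)     # repeats are counted every time
        share = round(100 * len(unique) / len(words)) if words else 0
        rows.append((len(unique), len(words), share, name))
    overall = round(100 * len(all_unique) / grand_total) if grand_total else 0
    return rows, (len(all_unique), grand_total, overall)


texts = {
    "a.txt": "Pas i mačka. Pas spava.",
    "b.txt": "Mačka spava, pas laje.",
}
rows, total = word_stats(texts)
for unique, count, share, name in rows:
    print(f"{unique:<9}{count:<10}{share}%  {name}")
print("total", total)
```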