The Manas-UdS Kyrgyz Corpus is an annotated corpus of the Kyrgyz language.
Part one comprises 1,205,888 words in 84 literary texts from five genres: novel, novelette, epic, minor epic, and fairy tale. The corpus is annotated with lemma and part-of-speech tags and rich per-text metadata. The texts were sourced from the Bizdin Muras foundation, which promotes the development of the Kyrgyz language (http://bizdin.kg).
Part two adds Kyrgyz proverbs (also from the Bizdin Muras foundation) and ca. 1 million words of newspaper text generously provided by Erkin-Too, the official state newspaper of the Kyrgyz Republic (https://erkin-too.kg/).
Persistent identifier http://hdl.handle.net/21.11119/0000-0004-B62D-D
Aida Kasieva, Jörg Knappen, Stefan Fischer, and Elke Teich (2020). A new Kyrgyz corpus: sampling, compilation, annotation. Poster at the 42. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Hamburg (Germany), March 2020. https://www.zfs.uni-hamburg.de/dgfs2020/programm/abstracts/dgfs2020-clp-kasieva.pdf
The newspaper texts in the corpus are copyright © by
The other texts are licensed from the Bizdin Muras Foundation for non-commercial use with author attribution.
The Manas-UdS Kyrgyz Corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.