Manas-UdS Kyrgyz Corpus



The Manas-UdS Kyrgyz Corpus is an annotated corpus of the Kyrgyz language.

Part one comprises 84 literary texts totalling 1,205,888 words, in five genres: novel, novelette, epic, minor epic, and fairy tale. The corpus is annotated with lemmas, part-of-speech tags, and rich per-text metadata. The texts were sourced from the Bizdin Muras Foundation, which promotes the development of the Kyrgyz language.

Part two adds Kyrgyz proverbs (also from the Bizdin Muras Foundation) and approximately 1 million words of newspaper text, generously provided by Erkin-Too, the official state newspaper of the Kyrgyz Republic.


Persistent identifier

Aida Kasieva, Jörg Knappen, Stefan Fischer, and Elke Teich (2020). A new Kyrgyz corpus: sampling, compilation, annotation. Poster presented at the 42nd Annual Conference of the German Linguistic Society (42. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft), Hamburg, Germany, March 2020.


The newspaper texts in the corpus are copyright © Erkin-Too, the official state newspaper of the Kyrgyz Republic. They are licensed for strictly non-commercial use, on condition that the copyright owner is acknowledged with a full citation.

The other texts are licensed from the Bizdin Muras Foundation for non-commercial use with author attribution.


The Manas-UdS Kyrgyz Corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

