Manas-UdS Kyrgyz Corpus


Description

The Manas-UdS Kyrgyz Corpus is an annotated corpus of the Kyrgyz language.

Part one comprises 1,205,888 words from 84 literary texts in five genres: novel, novelette, epic, minor epic, and fairy tale. The corpus is annotated with lemma and part-of-speech tags and rich per-text metadata. The texts were sourced from the Bizdin Muras foundation, which promotes the development of the Kyrgyz language (http://bizdin.kg).

Part two adds Kyrgyz proverbs (also from the Bizdin Muras foundation) and ca. 1 million words of newspaper text generously provided by Erkin-Too, the state official newspaper of the Kyrgyz Republic (https://erkin-too.kg/).

Citation

Persistent identifier: http://hdl.handle.net/21.11119/0000-0004-B62D-D

Aida Kasieva, Jörg Knappen, Stefan Fischer, and Elke Teich (2020). A new Kyrgyz corpus: sampling, compilation, annotation. Poster at the 42. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Hamburg (Germany), March 2020. https://www.zfs.uni-hamburg.de/dgfs2020/programm/abstracts/dgfs2020-clp-kasieva.pdf

Licence

The newspaper texts in the corpus are copyright © Erkin-Too, the state official newspaper of the Kyrgyz Republic. They are licensed for strictly non-commercial use on the condition that the copyright owner is acknowledged with a full citation.

The other texts are licensed from the Bizdin Muras Foundation for non-commercial use with author attribution.


The Manas-UdS Kyrgyz Corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Download
