In this tutorial we will learn how to tag a text with parts of speech using the TreeTagger GUI.

Tagging is the task of labeling each word in a sequence of words with its appropriate part of speech (POS). The labels assigned are specified in a so-called tagset, a set of part-of-speech tags. The size and choice of the tagset can vary greatly; typically a tagset contains between 50 and 200 tags.
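
To make the definition concrete, here is a minimal Python sketch that stores a tagged sentence as (word, tag) pairs and checks that every tag used is defined in the tagset. The three-tag tagset, the example sentence and the helper name `unknown_tags` are assumptions made up for illustration (the tag names follow Penn Treebank conventions); this is not TreeTagger output, and real tagsets are far larger.

```python
# Toy tagset (illustrative assumption): tag -> human-readable description.
TAGSET = {
    "DT": "determiner",
    "NN": "noun, singular",
    "VBZ": "verb, 3rd person singular present",
}

# A tagged sentence: exactly one (word, tag) pair per word.
tagged = [("The", "DT"), ("dog", "NN"), ("barks", "VBZ")]


def unknown_tags(pairs, tagset):
    """Return the sorted list of tags used in `pairs` that `tagset` does not define."""
    return sorted({tag for _, tag in pairs if tag not in tagset})


if __name__ == "__main__":
    for word, tag in tagged:
        print(f"{word}\t{tag}\t({TAGSET[tag]})")
    print("undefined tags:", unknown_tags(tagged, TAGSET))
```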

TreeTagger:

Tagsets used by the trained parameter files of TreeTagger:

How to get started

Tag your first text with TreeTagger

The tagged file

  • with the default settings, TreeTagger will have tokenized, lemmatized and part-of-speech tagged your text
  • the tagged file is now in a one-word-per-line format
  • each line has three TAB-separated columns
    • word<TAB>pos<TAB>lemma
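
The three-column format described above is easy to read back into a program. The following Python sketch shows one possible way to do it. The names `Token` and `parse_tagged` and the sample lines are hypothetical, and the sketch assumes every token line has exactly three TAB-separated fields; lines that do not fit that shape are skipped rather than guessed at.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One line of the tagged file: word<TAB>pos<TAB>lemma."""
    word: str
    pos: str
    lemma: str


def parse_tagged(lines: Iterable[str]) -> Iterator[Token]:
    """Yield a Token for every line that has exactly three TAB-separated columns."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # assumption: anything else is not a token line
        yield Token(*parts)


if __name__ == "__main__":
    # Hypothetical sample data in the one-word-per-line format.
    sample = ["The\tDT\tthe\n", "dog\tNN\tdog\n", "barks\tVBZ\tbark\n"]
    for token in parse_tagged(sample):
        print(token.word, token.pos, token.lemma)
```

To read a real tagged file, pass an open file object instead of the sample list, for example `parse_tagged(open(path, encoding="utf-8"))`, where `path` and the encoding depend on your own setup.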

Tagging text with XML/SGML tags
