In this tutorial we will learn how to use WebLicht to linguistically annotate texts.

About WebLicht

WebLicht is a web-based environment for the annotation of corpora. It includes tools for tokenization, lemmatization, part-of-speech tagging and parsing (among others), which can be freely combined into tool chains.
No installation of the tools is required.
The results of the annotation can be viewed online and/or downloaded for further processing.

Getting started

WebLicht is free for academic use. As Saarland University is a member of a CLARIN service provider federation, you already have an account.

To log in:

Starting WebLicht

Input Selection

Select an input by:

Depending on the input, you may have to provide additional information such as Language and Document Type.

Building a chain

You can choose between two different chain building modes:

Choose

Building customized chains using Advanced Mode

In Advanced Mode you build your chain step by step and bottom-up. The window at the top presents the Next Choices of tools. Which tools are offered depends on the current state of your tool chain, which is displayed in the Input and Chain Selection window below.

WebLicht uses a dedicated stand-alone XML format for data transfer between the different tools: the Text Corpus Format (TCF). If your data is in another input format, the first tool in your chain is always a TCF Converter.
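To get a feeling for what a TCF document contains, the following minimal sketch lists the annotation layers found in one. The sample document is hand-written for illustration; its element names and namespaces follow TCF 0.4 as we understand it, and real WebLicht output will usually contain more metadata and more layers. The helper names `local_name` and `list_layers` are our own.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption, not real WebLicht output):
# a TCF document after conversion and tokenization.
SAMPLE_TCF = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <MetaData xmlns="http://www.dspin.de/data/metadata"/>
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="en">
    <text>WebLicht annotates texts.</text>
    <tokens>
      <token ID="t1">WebLicht</token>
      <token ID="t2">annotates</token>
      <token ID="t3">texts</token>
      <token ID="t4">.</token>
    </tokens>
  </TextCorpus>
</D-Spin>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def list_layers(tcf_xml):
    """Return the names of the annotation layers inside the TextCorpus element."""
    root = ET.fromstring(tcf_xml)
    for element in root.iter():
        if local_name(element.tag) == "TextCorpus":
            return [local_name(child.tag) for child in element]
    return []


print(list_layers(SAMPLE_TCF))  # ['text', 'tokens']
```

Each tool in a chain adds one or more layers of this kind, so the list grows as the chain gets longer.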

You can either drag and drop the tool from the Next Choices window to the Input and Chain Selection window or simply double-click on the tool. This will add the selected tool to your tool chain and update the Next Choices window. The example below shows a tool chain consisting of a TCF Converter and a Tokenizer. The next choices for this chain include part-of-speech taggers, parsers and named entity recognizers.
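One way to picture why the Next Choices change with the chain is to think of every tool as requiring some annotation layers and producing new ones. The sketch below models this idea only; it is not WebLicht's actual implementation, and the tool catalogue and layer names in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    requires: frozenset  # layers that must already exist
    produces: frozenset  # layers this tool adds


# Hypothetical catalogue, for illustration only.
TOOLS = [
    Tool("TCF Converter", frozenset({"plain text"}), frozenset({"text"})),
    Tool("Tokenizer", frozenset({"text"}), frozenset({"tokens"})),
    Tool("POS Tagger", frozenset({"tokens"}), frozenset({"POStags"})),
    Tool("Parser", frozenset({"tokens"}), frozenset({"parsing"})),
    Tool("Named Entity Recognizer", frozenset({"tokens"}), frozenset({"namedEntities"})),
]


def available_layers(input_layers, chain):
    """Collect the layers present after running every tool in the chain."""
    layers = set(input_layers)
    for tool in chain:
        layers |= tool.produces
    return layers


def next_choices(input_layers, chain):
    """Offer tools whose requirements are met and whose output is not there yet."""
    layers = available_layers(input_layers, chain)
    return [
        tool.name
        for tool in TOOLS
        if tool.requires <= layers and not tool.produces <= layers
    ]


converter, tokenizer = TOOLS[0], TOOLS[1]
print(next_choices({"plain text"}, []))
print(next_choices({"plain text"}, [converter, tokenizer]))
```

With plain text as input and an empty chain, only the converter is offered. After adding the converter and the tokenizer, the model offers the tagger, the parser and the named entity recognizer, which mirrors the example chain described above.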

A chain for part-of-speech tagging of our input data could look like this:

The icons at the bottom of the input data and the tools provide additional options:

  • Choose new input for this chain: change the input data but keep the tool chain
  • View Metadata: information on the selected tools (description, creator, contact, PID)
  • Remove tools starting here: remove all tools from the selected tool up to the end of the chain

Once you click on Run Tools, a status bar appears below the Input and Chain Selection window. As soon as a tool has finished, additional options become available for the results of that processing step:

  • Save Results: download the processed data in the TCF-format
  • Visualize Results: view results in a new window of the browser
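A saved TCF file can be processed further with any XML library. The following minimal sketch pairs each token with its part-of-speech tag. It assumes that the file has a `tokens` layer and a `POStags` layer in the TCF 0.4 text corpus namespace, with each `tag` pointing to its token(s) through a `tokenIDs` attribute; the sample document and the function name `tagged_tokens` are our own.

```python
import xml.etree.ElementTree as ET

# Namespace of the TextCorpus element (an assumption based on TCF 0.4).
TC = "{http://www.dspin.de/data/textcorpus}"

# Hand-written sample, not real WebLicht output.
SAMPLE_RESULT = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="en">
    <tokens>
      <token ID="t1">WebLicht</token>
      <token ID="t2">annotates</token>
      <token ID="t3">texts</token>
    </tokens>
    <POStags tagset="penntb">
      <tag tokenIDs="t1">NNP</tag>
      <tag tokenIDs="t2">VBZ</tag>
      <tag tokenIDs="t3">NNS</tag>
    </POStags>
  </TextCorpus>
</D-Spin>"""


def tagged_tokens(tcf_xml):
    """Return (token, tag) pairs from the tokens and POStags layers."""
    root = ET.fromstring(tcf_xml)
    # Map token IDs to their surface forms.
    words = {tok.get("ID"): tok.text for tok in root.iter(TC + "token")}
    pairs = []
    for layer in root.iter(TC + "POStags"):
        for tag in layer.iter(TC + "tag"):
            # A tag may refer to several tokens, separated by spaces.
            for token_id in tag.get("tokenIDs", "").split():
                pairs.append((words.get(token_id), tag.text))
    return pairs


for word, pos in tagged_tokens(SAMPLE_RESULT):
    print(word, pos)
```

For a downloaded file, read its contents as bytes (for example with `pathlib.Path(...).read_bytes()`) and pass them to `tagged_tokens`.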

In the Visualization window you can inspect the different layers of the annotation. Further download options are also available there, e.g.:

  • Download as Excel sheet
  • Download as CSV
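A CSV download is convenient for quick counts. The sketch below tallies the values of one column. The exact layout of the downloaded CSV is not described here, so the header names `token` and `pos` in the sample are assumptions; check the header row of your own file and pass the matching column name and delimiter.

```python
import csv
import io
from collections import Counter

# Hand-written sample with assumed column names, not a real WebLicht download.
SAMPLE_CSV = """token,pos
WebLicht,NNP
annotates,VBZ
texts,NNS
tools,NNS
"""


def count_column(csv_text, column, delimiter=","):
    """Count how often each value occurs in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # Rows without a value in the column are skipped.
    return Counter(row[column] for row in reader if row.get(column))


counts = count_column(SAMPLE_CSV, "pos")
print(counts.most_common())
```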

Parsed data (constituency, dependency) can additionally be searched using TüNDRA (the Tübingen aNnotated Data Retrieval Application), which can be opened directly from within WebLicht via an icon.

TüNDRA allows for:

  • browsing the results (Treebank) by sentence
  • searching the results
  • downloading the visualized parse in several formats (SVG, PNG, JPG)
  • displaying the results in different graph types