In this tutorial we will learn how to use WebLicht to linguistically annotate texts.
WebLicht is a web-based environment for the annotation of corpora. It includes tools for tokenization, lemmatization, part-of-speech tagging and parsing (among others), which can be freely combined into tool chains.
No installation of the tools is required.
The results of the annotation can be viewed online and/or downloaded for further processing.
WebLicht is free for academic use. As Saarland University is a member of a CLARIN service provider federation, you already have an account.
To log in:
Select an input by:
Depending on the input, you may have to provide additional information such as Language and Document Type.
You can choose between two different chain-building modes: choose Easy Mode to use pre-defined chains, or Advanced Mode to build customized chains.
Advanced Mode
In Advanced Mode, you build your chain step by step, from the bottom up. The window at the top presents the Next Choices of tools. Which tools are offered depends on the current state of your tool chain, which is displayed in the Input and Chain Selection window below.
WebLicht uses a dedicated stand-alone XML format for data transfer between the different tools: the TCF format. If your data is in another input format, the first tool in your chain is always a TCF Converter.
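Because TCF is plain XML, downloaded results can be post-processed with any XML library. The following is a minimal sketch in Python, not an official reader. It assumes that tokens are stored as `token` elements with an `ID` attribute and that part-of-speech tags are stored as `tag` elements inside a `POStags` layer, each pointing to its token via `tokenIDs`. The embedded sample document, including its namespace URIs and the tagset name, is hand-written for illustration and is not real WebLicht output, so check the element names against a file you actually downloaded.

```python
import xml.etree.ElementTree as ET

# Hand-written miniature document in the style of TCF (an assumption,
# not real WebLicht output). Namespaces and tagset name are illustrative.
SAMPLE_TCF = """
<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="en">
    <text>Dogs bark.</text>
    <tokens>
      <token ID="t_0">Dogs</token>
      <token ID="t_1">bark</token>
      <token ID="t_2">.</token>
    </tokens>
    <POStags tagset="penntb">
      <tag tokenIDs="t_0">NNS</tag>
      <tag tokenIDs="t_1">VBP</tag>
      <tag tokenIDs="t_2">.</tag>
    </POStags>
  </TextCorpus>
</D-Spin>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_tokens_and_pos(tcf_xml):
    """Return a list of (token, pos_tag) pairs in document order."""
    root = ET.fromstring(tcf_xml.strip())
    tokens = {}  # token ID -> surface form (dicts keep insertion order)
    pos = {}     # token ID -> part-of-speech tag
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "token":
            tokens[elem.get("ID")] = elem.text
        elif name == "POStags":
            for tag in elem:
                # A tag may refer to several tokens, separated by spaces.
                for token_id in tag.get("tokenIDs", "").split():
                    pos[token_id] = tag.text
    return [(form, pos.get(token_id)) for token_id, form in tokens.items()]


if __name__ == "__main__":
    for form, tag in read_tokens_and_pos(SAMPLE_TCF):
        print(form, tag, sep="\t")
```

Matching on local element names rather than on fixed namespace URIs keeps the sketch independent of the exact namespace used by a given TCF version. Tokens without a tag are returned with `None`.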
You can either drag and drop a tool from the Next Choices window to the Input and Chain Selection window or simply double-click on the tool. This adds the selected tool to your tool chain and updates the Next Choices window. The example below shows a tool chain consisting of a TCF Converter and a Tokenizer. The next choices for this chain include part-of-speech taggers, parsers and named entity recognizers.
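One way to picture why the available tools change as the chain grows is a toy model in which each tool requires certain annotation layers and produces new ones. The Python sketch below is purely illustrative and does not describe how WebLicht is implemented. The `Tool` class, the `next_choices` function and the layer names ("plain text", "text", "tokens", and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool description: layers it needs and layers it adds."""
    name: str
    requires: frozenset
    produces: frozenset


# Illustrative tool inventory; layer names are made up for this sketch.
TOOLS = [
    Tool("TCF Converter", frozenset({"plain text"}), frozenset({"text"})),
    Tool("Tokenizer", frozenset({"text"}), frozenset({"tokens"})),
    Tool("POS Tagger", frozenset({"tokens"}), frozenset({"postags"})),
    Tool("Parser", frozenset({"tokens"}), frozenset({"parse"})),
    Tool("Named Entity Recognizer", frozenset({"tokens"}), frozenset({"entities"})),
]


def layers_after(chain, input_layers):
    """Collect all layers available once every tool in the chain has run."""
    layers = set(input_layers)
    for tool in chain:
        layers |= tool.produces
    return layers


def next_choices(chain, input_layers):
    """Names of tools not yet in the chain whose requirements are met."""
    layers = layers_after(chain, input_layers)
    return [t.name for t in TOOLS if t not in chain and t.requires <= layers]


if __name__ == "__main__":
    chain = []
    print(next_choices(chain, {"plain text"}))  # only the converter applies
    chain = TOOLS[:2]  # TCF Converter followed by Tokenizer
    print(next_choices(chain, {"plain text"}))
```

In this model, a chain of TCF Converter and Tokenizer makes the token layer available, so the tagger, the parser and the named entity recognizer become the next choices, mirroring the situation described above.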
A chain for part-of-speech tagging of our input data could look like this:
The icons at the bottom of the input data and the tools provide additional options:
Choose new input for this chain: change the input data but keep the tool chain
View Metadata: information on the selected tools (description, creator, contact, PID)
Remove tools starting here: remove all tools from the selected tool up to the end of the chain
Once you click on Run Tools, a status bar appears below the Input and Chain Selection window. Once a tool has finished, additional options become available for the results of each processing step:
Save Results: download the processed data in the TCF format
Visualize Results: view the results in a new browser window
In the Visualization window you can inspect the different layers of the annotation. In addition, further download options are available, e.g.:
Download as Excel sheet
Download as CSV
Parsed data (constituency, dependency) can additionally be searched using TüNDRA (the Tübingen aNnotated Data Retrieval Application), which is available directly from within WebLicht via an icon.
TüNDRA allows for:
Treebank
) by sentence