Practical lab sessions on Corpus Linguistics
Language Science and Technology
Corpus Tools
Online Corpus Tools
Cosmas II
(free registration required)
Words and Phrases
(trial (20 queries a day), then free registration required)
frequency lists
analyze texts
BYU corpora
(trial (20 queries a day), then free registration required)
OPUS
- open parallel corpus with three different interfaces:
OPUS multilingual search interface (1)
Europarl v7 search interface (2)
Europarl v3 search interface (2)
OpenSubtitles search interface (2)
EUconst search interface (2)
Word Alignment Database (3)
DWDS
: Digitales Wörterbuch der deutschen Sprache (Digital Dictionary of the German Language)
DiaCollo
: collocation analysis from a diachronic perspective
DTA
: Deutsches Textarchiv (German Text Archive)
Voyant Tools
can analyze any text
Wortschatz Universität Leipzig
ANNIS
: A web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.
Humboldt-Universität zu Berlin, Corpus Linguistics and Morphology
has a number of mostly smaller corpora available without a login.
The Georgetown University ANNIS
runs some freely available corpora
Corpus Analysis Tools
Download Tools
AntConc and other Ants
KWICFinder - Key Word in Context Concordances from the Web with
kfNgram - ngrams in text and HTML files
Simple Concordance Program
TextSTAT - Simple Text Analysis Tool
Concordance
(currently unavailable because of compatibility problems)
CasualConc
Commercial
WordSmith
50 pounds (GBP)
Web Tools
Voyant Tools
– word frequencies, concordance, word clouds, visualizations
TAPorWare
– various data cleaning, annotating, and summarizing tools in a web interface
Netlytic
– word frequencies, concordance, dictionary tagging, network analysis
Wmatrix
– frequency profiles, concordances, compare frequency lists, n-grams and c-grams, collocations
Natural Language Processor & Analyzer
- word frequencies, collocations, concordance, tokenizer, etc.
ManyEyes
– interactive text visualizations (network diagram, word tree, phrase net, tag cloud, word cloud)
Overview
– Automatic topic tagging and visualization
Monk Workbench
– Corpus selection from library holdings, frequencies and corpora comparisons, supervised classification
LIWC
- Web version will output a few linguistic dimensions; full version can be licensed for ~$100
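Many of the tools listed above offer the same core functions: word frequency lists, key-word-in-context (KWIC) concordances, and n-gram counts. As a rough illustration of what these functions compute, here is a minimal Python sketch. It is not the implementation used by any of the listed tools; the function names, the simple regex tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens, top=None):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common(top)


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def ngrams(tokens, n=2):
    """Count all contiguous n-token sequences."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, used only for demonstration.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

print(frequency_list(tokens, top=3))
for left, kw, right in kwic(tokens, "sat", window=2):
    print(f"{left:>15} [{kw}] {right}")
print(ngrams(tokens, 2).most_common(2))
```

Dedicated tools add much more on top of this (proper tokenization, annotation layers, statistical collocation measures), so the sketch is only meant to make the terminology in the tool descriptions concrete.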