In this tutorial we will learn how to download query results in a TAB-delimited format.
You can download any combination of attributes (positional and/or structural) for a query result in a TAB-separated format.
Things you have to know first
Each hit of a query is represented internally by anchors. The main anchors are the corpus positions of the first and the last word in the hit. The other anchors are the so-called target and keyword. Both are optional and have to be specifically defined for a query (we will ignore them in this tutorial).
To understand the concept better, let’s have a look at an example. The diagram below shows the beginning of a corpus. The first column shows the corpus position and the second column the word.
0 Alice
1 's
2 adventures
3 in
4 wonderland
5 Down
6 the
7 Rabbit-Hole
8 ALICE
9 was
10 beginning
11 to
12 get
13 very
14 tired
15 of
16 sitting
17 by
18 her
19 sister
20 on
Let’s assume we looked for noun phrases in this subset of the corpus and we got the following results.
Alice 's adventures
wonderland
the Rabbit-Hole
ALICE
her sister
Then the internal representation of this query result would be:
match matchend
0 2
4 4
6 7
8 8
18 19
The number in the first column refers to the corpus position of the first word in the hit (the anchor match), the number in the second column refers to the corpus position of the last word in the hit (the anchor matchend).
Attention: each hit is represented by two corpus positions (the anchors match and matchend), no matter how long the matched string is. For hits consisting of a single word, match and matchend are the same.
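The relation between anchors and words can be sketched in a few lines of Python. This is only an illustration of the idea, not how the corpus software stores its data: the corpus is modelled as a plain list, so that a corpus position is simply a list index, and the helper name hit_text is made up for this example.

```python
# Corpus positions are modelled as indices into a token list (illustration only).
corpus = ["Alice", "'s", "adventures", "in", "wonderland", "Down", "the",
          "Rabbit-Hole", "ALICE", "was", "beginning", "to", "get", "very",
          "tired", "of", "sitting", "by", "her", "sister", "on"]

# Each hit is stored only as a (match, matchend) pair; both ends are inclusive.
hits = [(0, 2), (4, 4), (6, 7), (8, 8), (18, 19)]


def hit_text(corpus, match, matchend):
    """Return the words covered by a hit, from match to matchend inclusive."""
    return " ".join(corpus[match:matchend + 1])


for match, matchend in hits:
    print(match, matchend, hit_text(corpus, match, matchend), sep="\t")
```

Note that the one-word hits (4, 4) and (8, 8) still carry two anchors; they simply point to the same position.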
Choose Download from the menu in the upper right corner of the concordance window and click on Go.
Then select Download query as plain-text tabulation.
Under Frequently-used tabulations you can find a number of preinstalled tabulation commands.
The following will show how to specify a custom tabulation.
Let us start with a simple example by assuming that we are interested in the distribution of the different verb forms of content verbs in the registers of the Brown corpus.
The query we need is: [pos="VV.*"]. As we are looking for one token only, match and matchend will be the same for this query.
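Why a one-token query always yields identical anchors can be sketched as follows. The tokens and tags below are invented for illustration and are not taken from the Brown corpus, and the function name single_token_hits is hypothetical. The sketch assumes that the pattern has to match the whole tag, which is why re.fullmatch is used.

```python
import re

# Hypothetical (word, pos) tokens; the tags are illustrative only.
tokens = [("she", "PP"), ("was", "VBD"), ("reading", "VVG"),
          ("books", "NNS"), ("and", "CC"), ("slept", "VVD")]

# Same regular expression as in the query [pos="VV.*"].
pattern = re.compile(r"VV.*")


def single_token_hits(tokens, pattern):
    """Return (match, matchend) pairs for every token whose tag fits the pattern."""
    return [(i, i) for i, (_, pos) in enumerate(tokens) if pattern.fullmatch(pos)]


print(single_token_hits(tokens, pattern))
```

Every pair has the same number twice, because a hit that covers one token starts and ends at the same corpus position.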
Thus, the information (attributes) we need to extract for each hit is the verb form and the register. The result we are aiming at has one line per hit in the query results, with the verb form in the first column and the register in the second column.
If we have another look at the tabulation window above, we can see that for each output column we have to specify:
- the attribute we want to extract, and
- the range for which we want to extract the attribute.
The “range” is defined by a beginning and an end position relative to one of the available anchors, usually match or matchend. With the offset we can refer to positions relative to one of the anchors, e.g.:
- an offset of -1 refers to the word preceding the result
- an offset of 1 refers to the word following the result
In our example, we want to extract:
- the verb form: the attribute pos for the range of each hit. As match and matchend are the same, we can leave both the beginning and the end as match.
- the register: the attribute text_reg once for each hit. It suffices to extract it at one position, so we can also leave the beginning and the end as match.
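The column specification described above (attribute, beginning, end, each with an anchor and an offset) can be mimicked with a small Python sketch. Everything here is an assumption made for illustration: the toy corpus, the function tabulate and its column tuples are hypothetical, and the structural attribute text_reg is simply stored on every token to keep the example short.

```python
# Hypothetical toy corpus: one dict per corpus position (illustration only).
corpus = [
    {"word": "we",   "pos": "PP",  "text_reg": "news"},
    {"word": "have", "pos": "VH",  "text_reg": "news"},
    {"word": "seen", "pos": "VVN", "text_reg": "news"},
    {"word": "it",   "pos": "PP",  "text_reg": "fiction"},
    {"word": "and",  "pos": "CC",  "text_reg": "fiction"},
    {"word": "left", "pos": "VVD", "text_reg": "fiction"},
]

# Hits of a one-token query: match and matchend are identical.
hits = [(2, 2), (5, 5)]


def tabulate(corpus, hits, columns):
    """Build one TAB-separated line per hit.

    Each column is (attribute, begin_anchor, begin_offset, end_anchor, end_offset).
    """
    rows = []
    for match, matchend in hits:
        anchors = {"match": match, "matchend": matchend}
        cells = []
        for attr, b_anchor, b_off, e_anchor, e_off in columns:
            start = anchors[b_anchor] + b_off
            end = anchors[e_anchor] + e_off
            # Positions outside the corpus are skipped.
            values = [corpus[i][attr] for i in range(start, end + 1)
                      if 0 <= i < len(corpus)]
            cells.append(" ".join(values))
        rows.append("\t".join(cells))
    return rows


# Verb form and register, both taken at the match position.
columns = [("pos", "match", 0, "match", 0),
           ("text_reg", "match", 0, "match", 0)]
rows = tabulate(corpus, hits, columns)
print("\n".join(rows))
```

Changing a column to ("word", "match", -1, "match", -1) would extract the word preceding each hit instead.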
The tabulation window for the extraction of our example will look as follows:
The tabulation download allows for a number of different output options. For the simple tabulation format we are aiming at, we need to choose simple tabulate output.
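A downloaded simple tabulation can be processed with any tool that reads TAB-separated text. The following is a minimal sketch in Python, assuming a file with two columns (verb form, register) and no header row. The sample data and the function name count_combinations are made up for this example; for a real download, an opened file would take the place of the io.StringIO object.

```python
import csv
import io
from collections import Counter

# Hypothetical content of a downloaded tabulation (verb form TAB register).
sample = "VVD\tnews\nVVG\tnews\nVVD\tfiction\nVVD\tnews\n"


def count_combinations(handle):
    """Count how often each combination of column values occurs."""
    # QUOTE_NONE keeps quotation marks in corpus text as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(tuple(row) for row in reader if row)


counts = count_combinations(io.StringIO(sample))
for (verb_form, register), freq in counts.most_common():
    print(verb_form, register, freq, sep="\t")
```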
Other output options are:
- sort and group output, which will calculate frequencies for each combination of attributes
- sort and group output, display as matrix, which calculates a matrix
[word="we"] [pos="V[HB].*|RB"]* [pos="VV.*"]
matchend
match