Download query results in a TAB-delimited format

In this tutorial we will learn how to download query results in a TAB-delimited format.

You can download any combination of attributes (positional and/or structural) for a query result in a TAB-separated format.

Things you have to know first

A CQP corpus is based on an index
All words in a corpus are numbered beginning from “0”. We refer to this number as corpus position.
In the internal representation each hit is represented by so-called anchors. The main anchors are the corpus position of the first and the last word in the hit.
Other anchors are the so-called target and keyword. Both are optional and have to be defined explicitly for a query (we will ignore them in this tutorial).

To understand the concept better, let’s look at an example. The listing below shows the beginning of a corpus. The first column contains the corpus position and the second column the word.

0   Alice
1   's
2   adventures
3   in
4   wonderland
5   Down
6   the
7   Rabbit-Hole
8   ALICE
9   was
10  beginning
11  to
12  get
13  very
14  tired
15  of
16  sitting
17  by
18  her
19  sister
20  on

Let’s assume we looked for noun phrases in this subset of the corpus and we got the following results.

Alice 's adventures
wonderland
the Rabbit-Hole
ALICE
her sister

Then the internal representation of this query result would be:

match  matchend
 0        2
 4        4
 6        7
 8        8
18       19

The number in the first column refers to the corpus position of the first word in the hit (the anchor match), the number in the second column refers to the last word in the hit (the anchor matchend).
Attention: each hit is represented by two corpus positions (the anchors match and matchend), no matter how long the string is. For hits consisting of a single word, match and matchend are the same.
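
The relationship between corpus positions and anchors can be sketched in a few lines of Python. This is only an illustration of the idea, not how CQP stores its index internally; the function name hit_text is made up for this example.

```python
# Tokens of the sample corpus; the list index is the corpus position.
tokens = ["Alice", "'s", "adventures", "in", "wonderland", "Down", "the",
          "Rabbit-Hole", "ALICE", "was", "beginning", "to", "get", "very",
          "tired", "of", "sitting", "by", "her", "sister", "on"]

# Each hit is a (match, matchend) pair of corpus positions.
hits = [(0, 2), (4, 4), (6, 7), (8, 8), (18, 19)]


def hit_text(tokens, match, matchend):
    """Return the words of a hit; matchend is inclusive (hypothetical helper)."""
    return " ".join(tokens[match:matchend + 1])


hit_strings = [hit_text(tokens, match, matchend) for match, matchend in hits]

# Print match, matchend and the recovered string, separated by TABs.
for (match, matchend), text in zip(hits, hit_strings):
    print(f"{match}\t{matchend}\t{text}")
```

Two numbers per hit are enough to recover the full string, which is why the length of the hit does not matter.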

How to download results

Choose Download from the menu in the upper right corner of the concordance window and click on Go.
You are now on the page for downloading concordance lines.
Scroll all the way down the page and click on Download query as plain-text tabulation.
You will see the following window.

Under Frequently-used tabulations you can find a number of preinstalled tabulation commands.

The following will show how to specify a custom tabulation.

Let us start with a simple example by assuming that we are interested in the distribution of the different verb forms of content verbs in the registers of the Brown corpus.

The query we need is: [pos="VV.*"]. As we are looking for one token only, match and matchend will be the same for this query.
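
As a rough illustration of why match and matchend coincide here, the following Python sketch mimics the query on a toy list of part-of-speech tags. The tags are invented for the example, and the sketch assumes that the regular expression has to match the whole attribute value, which is why it uses fullmatch.

```python
import re

# Invented part-of-speech tags; the list index is the corpus position.
pos_tags = ["PP", "VVD", "DT", "NN", "VBZ", "VVG"]

# Stand-in for the query [pos="VV.*"]: the pattern must cover the whole tag.
pattern = re.compile(r"VV.*")

# A one-token query yields hits whose match and matchend are identical.
hits = [(i, i) for i, tag in enumerate(pos_tags) if pattern.fullmatch(tag)]

print(hits)
```

The tag VBZ is not selected because it does not start with VV.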

Thus, the information (attributes) we need to extract for each hit is:

- the verb form and
- the register.

The result we are aiming at has the verb form in the first column and the register in the second column, with one line for each hit in the query results.

If we have another look at the tabulation window above, we can see that for each output column we have to specify:

the attribute we want to extract, and
the “range” for which we want to extract the attribute.

The “range” is defined by a beginning and end position relative to one of the available anchors, usually match or matchend. With the offset we can refer to positions relative to one of the anchors, e.g.:

match with offset -1 refers to the word preceding the result
matchend with offset 1 refers to the word following the result
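
The way an anchor and an offset select a range can be sketched as follows. The helper names resolve and extract_range are hypothetical, and clamping ranges at the corpus boundaries is an assumption of this sketch, not documented behaviour of the tool.

```python
tokens = ["Alice", "'s", "adventures", "in", "wonderland", "Down", "the",
          "Rabbit-Hole", "ALICE", "was", "beginning", "to", "get", "very",
          "tired", "of", "sitting", "by", "her", "sister", "on"]


def resolve(hit, anchor, offset=0):
    """Turn an anchor name plus offset into a corpus position."""
    match, matchend = hit
    anchors = {"match": match, "matchend": matchend}
    return anchors[anchor] + offset


def extract_range(tokens, hit, begin, end):
    """Return the tokens from begin to end (inclusive); both are (anchor, offset)."""
    # Assumption: positions outside the corpus are clamped to its boundaries.
    start = max(resolve(hit, *begin), 0)
    stop = min(resolve(hit, *end), len(tokens) - 1)
    return tokens[start:stop + 1]


hit = (6, 7)  # "the Rabbit-Hole"
preceding = extract_range(tokens, hit, ("match", -1), ("match", -1))
following = extract_range(tokens, hit, ("matchend", 1), ("matchend", 1))
whole_hit = extract_range(tokens, hit, ("match", 0), ("matchend", 0))

print(preceding, following, whole_hit)
```

For the hit "the Rabbit-Hole", match with offset -1 selects "Down" and matchend with offset 1 selects "ALICE".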

In our example, we want to extract:

- the verb form
  - the attribute pos
  - for the range of each hit; as match and matchend are the same, we can leave the beginning and end as match
- the register
  - the attribute text_reg
  - once for each hit, i.e. it suffices to extract it at one position, so we can also leave the beginning and end as match
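
The column specification described above can be sketched in Python to show what a simple tabulation produces. The toy corpus, its tags and its register values are invented, and the function tabulate is a hypothetical stand-in for what the download does, not the tool's actual implementation.

```python
# Toy corpus: every token carries the positional attributes word and pos plus
# the structural attribute text_reg of its text. All values are invented.
corpus = [
    {"word": "We", "pos": "PP", "text_reg": "news"},
    {"word": "saw", "pos": "VVD", "text_reg": "news"},
    {"word": "it", "pos": "PP", "text_reg": "news"},
    {"word": "She", "pos": "PP", "text_reg": "fiction"},
    {"word": "sings", "pos": "VVZ", "text_reg": "fiction"},
]

# Hits of a one-token query: match and matchend are the same.
hits = [(1, 1), (4, 4)]

# One entry per output column: (attribute, begin anchor, end anchor).
columns = [("pos", "match", "match"), ("text_reg", "match", "match")]


def tabulate(corpus, hits, columns):
    """Build one TAB-separated line per hit (hypothetical helper)."""
    lines = []
    for match, matchend in hits:
        anchors = {"match": match, "matchend": matchend}
        cells = []
        for attribute, begin, end in columns:
            span = corpus[anchors[begin]:anchors[end] + 1]
            cells.append(" ".join(token[attribute] for token in span))
        lines.append("\t".join(cells))
    return lines


table = tabulate(corpus, hits, columns)
print("\n".join(table))
```

Each hit yields one line with the verb form in the first column and the register in the second.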

The tabulation window for the extraction of our example will look as follows:

The tabulation download offers a number of different output options. For the simple tabulation format we are aiming at, we need to choose simple tabulate output.

Other output options are:

sort and group output which will calculate frequencies for each combination of attributes
sort and group output, display as matrix, which calculates a matrix
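
If you download the simple tabulate output, comparable counts can also be computed afterwards. The sketch below assumes a two-column TAB-separated file with the verb form and the register; the sample lines are invented, and the sort order and matrix layout produced by the tool itself may differ.

```python
from collections import Counter

# Invented sample of a simple tabulate download: one hit per line, TAB-separated.
simple_output = "VVD\tnews\nVVN\tnews\nVVD\tfiction\nVVD\tnews\n"

rows = [tuple(line.split("\t")) for line in simple_output.splitlines() if line]

# "Sort and group": frequency of each combination of attribute values.
grouped = Counter(rows)
for (form, register), count in sorted(grouped.items(),
                                      key=lambda item: (-item[1], item[0])):
    print(f"{count}\t{form}\t{register}")

# "Display as matrix": verb forms as rows, registers as columns.
forms = sorted({form for form, _ in rows})
registers = sorted({register for _, register in rows})
matrix = {form: [grouped[(form, register)] for register in registers]
          for form in forms}

print("\t" + "\t".join(registers))
for form in forms:
    print(form + "\t" + "\t".join(str(n) for n in matrix[form]))
```

Combinations that never occur show up as 0 in the matrix because a Counter returns 0 for missing keys.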

Examples

Phenomenon: distribution of content verbs after the personal pronoun “we” across academic disciplines
Corpus: DaSciTex
Query: [word="we"] [pos="V[HB].*|RB"]* [pos="VV.*"]
Columns for extraction:
- lemma of content verb; attribute: lemma; range: matchend
- academic discipline; attribute: text_ad; range: match
Output option: matrix