In this tutorial we will learn how to download query results in a TAB-delimited format.

You can download any combination of attributes (positional and/or structural) for a query result in a TAB-separated format.

Things you have to know first

To understand the concept better, let’s have a look at an example. The listing below shows the beginning of a corpus. The first column shows the corpus position and the second column the word at that position.

0   Alice
1   's
2   adventures
3   in
4   wonderland
5   Down
6   the
7   Rabbit-Hole
8   ALICE
9   was
10  beginning
11  to
12  get
13  very
14  tired
15  of
16  sitting
17  by
18  her
19  sister
20  on

Let’s assume we looked for noun phrases in this subset of the corpus and we got the following results.

Alice 's adventures
wonderland
the Rabbit-Hole
ALICE
her sister

Then the internal representation of this query result would be:

match  matchend
 0        2
 4        4
 6        7
 8        8
18       19

The number in the first column refers to the corpus position of the first word in the hit (the anchor match); the number in the second column refers to the corpus position of the last word in the hit (the anchor matchend).
Attention: each hit is represented by two corpus positions (the anchors match and matchend), regardless of the length of the string. For hits consisting of a single word, match and matchend are the same.
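
The relationship between the two anchors and the words of a hit can be sketched in a few lines of Python. This is only an illustration using the example data above; the function name hit_text is made up for this sketch and is not part of the tool.

```python
# Toy corpus from the example above; the list index is the corpus position.
tokens = ["Alice", "'s", "adventures", "in", "wonderland", "Down", "the",
          "Rabbit-Hole", "ALICE", "was", "beginning", "to", "get", "very",
          "tired", "of", "sitting", "by", "her", "sister", "on"]

# Each hit is stored only as a (match, matchend) pair of corpus positions.
hits = [(0, 2), (4, 4), (6, 7), (8, 8), (18, 19)]


def hit_text(tokens, match, matchend):
    """Return the words of a hit; both anchors are inclusive positions."""
    return " ".join(tokens[match:matchend + 1])


hit_strings = [hit_text(tokens, m, me) for m, me in hits]
for (m, me), text in zip(hits, hit_strings):
    print(f"{m}\t{me}\t{text}")
```

Note that single-word hits such as (4, 4) still carry both anchors; they simply point at the same position.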

How to download results

Under Frequently-used tabulations you can find a number of preinstalled tabulation commands.

The following will show how to specify a custom tabulation.

Let us start with a simple example by assuming that we are interested in the distribution of the different verb forms of content verbs in the registers of the Brown corpus.

The query we need is: [pos="VV.*"]. As we are looking for one token only, match and matchend will be the same for this query.

Thus, the information (attributes) we need to extract for each hit is:

  • the verb form
  • the register

The result we are aiming at has the verb form in the first column and the register in the second column, with one line for each hit in the query results.

If we have another look at the tabulation window above, we can see that for each output column we have to specify:

The “range” is defined by a beginning and end position relative to one of the available anchors, usually match or matchend. With the offset we can refer to positions relative to one of the anchors, e.g.:
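
How a range resolves to corpus positions can be illustrated with a small Python sketch. The function resolve_range and its (anchor, offset) argument format are hypothetical helpers invented for this illustration, applied to the hit "the Rabbit-Hole" (match 6, matchend 7) from the first example; positions that fall outside the corpus are not handled here.

```python
def resolve_range(hit, begin=("match", 0), end=("match", 0)):
    """Turn (anchor, offset) pairs into inclusive corpus positions.

    hit is a (match, matchend) pair; begin and end each name an anchor
    and an offset relative to that anchor (negative = to the left).
    """
    anchors = {"match": hit[0], "matchend": hit[1]}
    start = anchors[begin[0]] + begin[1]
    stop = anchors[end[0]] + end[1]
    return start, stop


hit = (6, 7)  # "the Rabbit-Hole"

# The hit itself: from match to matchend.
whole_hit = resolve_range(hit, ("match", 0), ("matchend", 0))
# Only the word directly before the hit: offset -1 from match.
word_before = resolve_range(hit, ("match", -1), ("match", -1))
# The hit plus two words of context on each side.
with_context = resolve_range(hit, ("match", -2), ("matchend", 2))

print(whole_hit, word_before, with_context)
```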

In our example, we want to extract:

  • the verb form
    • the attribute pos
    • for the range of each hit; as match and matchend are the same, we can leave the beginning and end as match
  • the register
    • the attribute text_reg
    • once for each hit, i.e. it suffices to extract it at one position, so we can also leave the beginning and end as match
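
To make the column specification concrete, here is a minimal Python sketch of what such a tabulation produces. It is not the tool's implementation: the mini-corpus, its tags and register labels, and the function tabulate are all invented for illustration, and the register is stored on every token for simplicity although text_reg is a structural attribute in a real corpus.

```python
import re

# Hypothetical mini-corpus: one dict per corpus position (invented data,
# not taken from the Brown corpus).
corpus = [
    {"word": "She", "pos": "PP", "text_reg": "fiction"},
    {"word": "walked", "pos": "VVD", "text_reg": "fiction"},
    {"word": "home", "pos": "RB", "text_reg": "fiction"},
    {"word": "Prices", "pos": "NNS", "text_reg": "news"},
    {"word": "are", "pos": "VBP", "text_reg": "news"},
    {"word": "rising", "pos": "VVG", "text_reg": "news"},
]

# Stand-in for the query [pos="VV.*"]: one-token hits, so match == matchend.
hits = [(i, i) for i, tok in enumerate(corpus)
        if re.fullmatch(r"VV.*", tok["pos"])]


def tabulate(corpus, hits, columns):
    """Build one TAB-separated line per hit.

    columns is a list of (attribute, begin_anchor, end_anchor) tuples,
    where the anchors are "match" or "matchend". Offsets are left out
    to keep the sketch short.
    """
    lines = []
    for match, matchend in hits:
        anchors = {"match": match, "matchend": matchend}
        cells = []
        for attribute, begin, end in columns:
            span = corpus[anchors[begin]:anchors[end] + 1]
            cells.append(" ".join(tok[attribute] for tok in span))
        lines.append("\t".join(cells))
    return lines


# Column 1: pos from match to match; column 2: text_reg from match to match.
rows = tabulate(corpus, hits, [("pos", "match", "match"),
                               ("text_reg", "match", "match")])
print("\n".join(rows))
```

Each output line holds the verb form and the register of one hit, separated by a TAB, which is the shape of the table described above.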

The tabulation window for the extraction of our example will look as follows:

The tabulation download allows for a number of different output options. For the simple tabulation format we are aiming at, we need to choose the simple tabulate output.

Other output options are:

Examples

  • Phenomenon: distribution of content verbs after the personal pronoun “we” across academic disciplines
  • Corpus: DaSciTex
  • Query: [word="we"] [pos="V[HB].*|RB"]* [pos="VV.*"]
  • Columns for extraction:
    • lemma of content verb; attribute: lemma; range: matchend
    • academic discipline; attribute: text_ad; range: match
  • Output option: matrix
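
As a rough sketch of what can be done with the two extracted columns, the Python snippet below counts how often each lemma occurs in each discipline and prints the counts as a TAB-separated table. The rows and discipline labels are invented, and treating the matrix option as this kind of count table is an assumption made for illustration; the actual output of the tool may differ.

```python
from collections import Counter

# Hypothetical extracted rows: (lemma at matchend, text_ad at match).
# The values are invented and do not come from DaSciTex.
rows = [
    ("show", "linguistics"),
    ("propose", "biology"),
    ("show", "biology"),
    ("show", "linguistics"),
]

counts = Counter(rows)
lemmas = sorted({lemma for lemma, _ in rows})
disciplines = sorted({discipline for _, discipline in rows})

# One header line, then one line per lemma with a count per discipline.
lines = ["\t".join(["lemma"] + disciplines)]
for lemma in lemmas:
    cells = [str(counts[(lemma, discipline)]) for discipline in disciplines]
    lines.append("\t".join([lemma] + cells))

print("\n".join(lines))
```

The lemma is read at matchend because the content verb is the last token of each hit, while the discipline can be read at any position of the hit, here match.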