Corpus Query with Regular Expressions: Performing Queries with Patterns

In this tutorial we will learn how to apply patterns in CQP Web, that is, how to search for more than one word at a time: a sequence of words that we assume appears frequently in a corpus.

Simple patterns

Up to now, we have looked for one token only. However, we can also look for a sequence of tokens, a pattern.
To recall: one pair of square brackets stands for one token. To search for a sequence of tokens, we simply write a sequence of square brackets. A quoted string on its own, such as "it", is shorthand for [word="it"].

A simple word sequence:

"it" "is" "possible" "that"

A sequence of part-of-speech tags, here an adjective followed by a noun, in order to find adjective-noun collocations:

[pos="JJ.*"][pos="NN.*"]

Using coordinate constructions to find semantically related words

[word="\w+"][word="and|or"][word="\w+"]

\w stands for any word character (a letter, a digit or the underscore), whereas . stands for any character at all.
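
The difference between \w and . can be tried out in a small sketch. It uses Python's re module as a stand-in for the regular expression engine behind CQP, which is an assumption; the helper name matches_whole_token is made up for this illustration. Like CQP, the helper requires the pattern to match the whole token.

```python
import re

def matches_whole_token(pattern, token):
    """Return True if the pattern matches the entire token (hypothetical helper)."""
    return re.fullmatch(pattern, token) is not None

# A plain word consists of word characters only.
plain_word = matches_whole_token(r"\w+", "salt")

# The hyphen is not a word character, so \w+ rejects the token ...
hyphen_with_w = matches_whole_token(r"\w+", "well-known")

# ... while .+ accepts it, because . stands for any character.
hyphen_with_dot = matches_whole_token(r".+", "well-known")
```

In the coordination query above, this means that hyphenated tokens and punctuation marks are not matched by [word="\w+"].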

Deverbal adjectives (adjectives derived from verbs) followed by a noun:

[(word=".+(ed|ing)") & (pos="JJ")][pos="NN"]
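
How a sequence of token constraints is matched can be illustrated with a toy matcher. This is only a sketch of the idea, not how CQP is implemented: the token list, the tags and the function find_sequences are invented for the example, and Python's re module stands in for CQP's regular expressions.

```python
import re

# Hypothetical toy data: (word, pos) pairs with Penn Treebank-style tags.
tokens = [("a", "DT"), ("boring", "JJ"), ("lecture", "NN"),
          ("about", "IN"), ("red", "JJ"), ("cars", "NNS")]

def find_sequences(tokens, constraints):
    """Return every run of adjacent tokens that satisfies the constraints in order.

    Each constraint is a dict mapping an attribute ("word" or "pos") to a
    regular expression that must match the whole attribute value.
    """
    hits = []
    for start in range(len(tokens) - len(constraints) + 1):
        window = tokens[start:start + len(constraints)]
        ok = all(
            re.fullmatch(rx, {"word": w, "pos": p}[attr])
            for (w, p), c in zip(window, constraints)
            for attr, rx in c.items()
        )
        if ok:
            hits.append([w for w, _ in window])
    return hits

# Counterpart of [pos="JJ.*"][pos="NN.*"]
adj_noun = find_sequences(tokens, [{"pos": "JJ.*"}, {"pos": "NN.*"}])

# Counterpart of [(word=".+(ed|ing)") & (pos="JJ")][pos="NN"]
deverbal = find_sequences(
    tokens, [{"word": ".+(ed|ing)", "pos": "JJ"}, {"pos": "NN"}]
)
```

In this toy data the first query finds "boring lecture" and "red cars", the second only "boring lecture". Note that "red" also satisfies .+(ed|ing); it is excluded here only because "cars" is tagged NNS and not NN, so such a word pattern can still return adjectives that are not derived from verbs.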

Simple patterns with optional tokens

Often, the pattern we are looking for may be modified at certain positions. In other words, we want to include discontinuous patterns with optional modifying elements. We can use the same quantifiers for tokens as we used for characters: ?, *, + and {n,m}.
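
The effect of a quantifier on an unspecified token can be sketched in Python. The sketch assumes that a sentence is represented as a string of space-separated tokens, so that (?:\S+ ){n,m} plays the role of n to m arbitrary tokens; the function names and example sentences are made up for the illustration.

```python
import re

def gap_regex(lo, hi):
    """Regex for: it is, then lo to hi arbitrary tokens, then possible that."""
    # (?<!\S) and (?!\S) make sure the match starts and ends at token boundaries.
    return re.compile(
        r"(?<!\S)it is (?:\S+ ){%d,%d}possible that(?!\S)" % (lo, hi)
    )

def hits(lo, hi, sentence):
    return gap_regex(lo, hi).search(sentence) is not None

no_gap = "it is possible that he left"
one_gap = "it is quite possible that he left"
three_gap = "it is not at all possible that he left"

# {0,1} corresponds to the quantifier ?: the extra token may be absent.
optional_results = [hits(0, 1, s) for s in (no_gap, one_gap, three_gap)]

# {1,3} requires at least one and at most three tokens in the gap.
range_results = [hits(1, 3, s) for s in (no_gap, one_gap, three_gap)]
```

With an optional token, the first two sentences match and the third does not. With {1,3}, the sentence without any intervening token is no longer found, so a lower bound of 0 is needed if the unmodified pattern should be included.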

A query with one unspecified optional token

"it" "is" []? "possible" "that"

A query with n to m unspecified tokens (here 1 to 3, so at least one token must intervene)

"it" "is" []{1,3} "possible" "that"

A query with 0 or more optional tokens

"it" "is" []* "possible" "that"

Why is the last query not such a good idea?
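
One possible answer can be made visible with a toy example: an unbounded gap may stretch across sentence boundaries and pair up words that do not belong together. The two sentences and the function find_gapped below are invented for the illustration; the sketch simply lists every span in which the first word sequence is followed, at any distance, by the second.

```python
# Hypothetical toy corpus: two sentences, each a list of tokens.
sentences = [
    ["it", "is", "raining", "."],
    ["nobody", "thinks", "it", "possible", "that", "he", "left", "."],
]

def find_gapped(tokens, before, after):
    """Return (start, end) spans where `before` is followed by `after`
    with any number of tokens in between."""
    spans = []
    for i in range(len(tokens)):
        if tokens[i:i + len(before)] != before:
            continue
        for j in range(i + len(before), len(tokens)):
            if tokens[j:j + len(after)] == after:
                spans.append((i, j + len(after)))
    return spans

before, after = ["it", "is"], ["possible", "that"]

# Without any restriction the corpus is one long stream of tokens.
flat = [tok for sent in sentences for tok in sent]
unrestricted = find_gapped(flat, before, after)

# Restricting the search to one sentence at a time.
per_sentence = [
    span for sent in sentences for span in find_gapped(sent, before, after)
]
```

The unrestricted search reports one hit that starts in the first sentence and ends in the second, although neither sentence contains the construction. Searching sentence by sentence returns nothing. An unbounded gap over a large corpus can also be expected to make the search more expensive.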

A query with 0 or more optional tokens within a sentence

"it" "is" []* "possible" "that" within s