In this tutorial we will learn how to apply patterns in CQPweb, that is, how to search for more than one word, or for a sequence of words that we assume occurs frequently in a corpus.
Up to now, we have looked for one token only. However, we can also look for a sequence of tokens, a pattern.
Just to recall: one set of square brackets stands for one token. To search for a sequence of tokens, we simply need a sequence of square brackets.
A simple word sequence (a bare quoted string such as "it" is shorthand for [word="it"]):
"it" "is" "possible" "that"
A sequence of part-of-speech tags, here adjective-noun co-occurrences, which can be used to find adjective-noun collocations:
[pos="JJ.*"][pos="NN.*"]
Using coordinate constructions to find semantically related words
[word="\w+"][word="and|or"][word="\w+"]
\w stands for any word character (compare with ., which stands for any character).
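The difference between \w and . can also be tried out outside CQP. The following is a minimal sketch in Python, not a CQP query. It assumes that Python's re module treats \w and . in the same way as CQP's regular expressions for these simple cases, and it uses re.fullmatch because CQP compares a regular expression with the whole token. The token list is invented for illustration.

```python
import re

# Hypothetical token list, invented for this illustration.
tokens = ["salt", "and", "pepper", ",", "well-known", "1990s"]

# CQP matches a regular expression against the entire token;
# re.fullmatch imitates that behaviour here.
word_chars_only = [t for t in tokens if re.fullmatch(r"\w+", t)]
any_chars = [t for t in tokens if re.fullmatch(r".+", t)]

print(word_chars_only)  # tokens consisting of letters, digits or underscores only
print(any_chars)        # every non-empty token, including punctuation
```

The comma and the hyphenated word are rejected by \w+ but accepted by .+, which is why \w+ is the safer choice in the coordination query.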
Verbally derived adjectives:
[(word=".+(ed|ing)") & (pos="JJ")][pos="NN"]
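To make the logic of the & operator concrete, here is a rough Python sketch of what the query above asks for. It is only an imitation under simple assumptions: the tagged tokens are invented, the helper name is_deverbal_candidate is hypothetical, and the part-of-speech tags are compared as exact strings.

```python
import re

# Hypothetical tagged tokens as (word, pos) pairs, invented for illustration.
tagged = [("the", "DT"), ("tired", "JJ"), ("student", "NN"),
          ("read", "VBD"), ("an", "DT"), ("interesting", "JJ"),
          ("book", "NN"), ("about", "IN"), ("red", "JJ"), ("wine", "NN")]

def is_deverbal_candidate(word, pos):
    # Both conditions must hold for the same token, like "&" in the query.
    return bool(re.fullmatch(r".+(ed|ing)", word)) and pos == "JJ"

# Look at every pair of neighbouring tokens: candidate adjective, then noun.
hits = [(w1, w2)
        for (w1, p1), (w2, p2) in zip(tagged, tagged[1:])
        if is_deverbal_candidate(w1, p1) and p2 == "NN"]

print(hits)
```

Note that "red wine" is returned as well, because the condition only checks the word form: "red" ends in "ed". The results of such a query therefore still need to be inspected by hand.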
Often, the pattern we are looking for may be modified at certain positions. In other words, we want to include discontinuous patterns with optional modifying elements. We can use the same quantifiers for tokens as we used for characters (?, *, +, {n,m}).
A query with one unspecified optional token
"it" "is" []? "possible" "that"
A query with n to m unspecified tokens (here 1 to 3)
"it" "is" []{1,3} "possible" "that"
A query with zero or more optional tokens
"it" "is" []* "possible" "that"
Why is the last query not such a good idea?
A query with zero or more optional tokens within a sentence
"it" "is" []* "possible" "that" within s
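The effect of an unlimited gap, and of restricting it to a sentence, can be imitated with a small Python sketch. This is not how CQP works internally: the function find_matches, its parameters and the sample text are all invented for illustration, and for each starting point the sketch simply takes the shortest possible completion, which may differ from the matching strategy configured in CQP.

```python
def find_matches(tokens, sentence_ids, max_gap=None, within_sentence=False):
    """Find 'it is <gap> possible that'; return (start, end) index pairs.

    tokens: list of word forms
    sentence_ids: sentence number of each token (same length as tokens)
    max_gap: largest number of gap tokens allowed, None for unlimited
    within_sentence: if True, a match must not cross a sentence boundary
    """
    head, tail = ["it", "is"], ["possible", "that"]
    matches = []
    for start in range(len(tokens) - len(head) + 1):
        if tokens[start:start + len(head)] != head:
            continue
        gap_start = start + len(head)
        gap = 0
        # Try ever longer gaps and stop at the first completion.
        while gap_start + gap + len(tail) <= len(tokens):
            if max_gap is not None and gap > max_gap:
                break
            tail_start = gap_start + gap
            end = tail_start + len(tail) - 1  # index of "that"
            if within_sentence and sentence_ids[end] != sentence_ids[start]:
                break  # the match would leave the sentence
            if tokens[tail_start:tail_start + len(tail)] == tail:
                matches.append((start, end))
                break
            gap += 1
    return matches

# Hypothetical three-sentence sample, invented for this illustration.
text = ("it is late . "
        "she thinks it possible that he left . "
        "it is quite possible that she knows .")
tokens = text.split()
sentence_ids = [0] * 4 + [1] * 8 + [2] * 8

unlimited = find_matches(tokens, sentence_ids)
in_sentence = find_matches(tokens, sentence_ids, within_sentence=True)
one_optional = find_matches(tokens, sentence_ids, max_gap=1)

for start, end in unlimited:
    print(" ".join(tokens[start:end + 1]))
```

With an unlimited gap, the "it is" of the first sentence is paired with the "possible that" of the second sentence, which is clearly not an instance of the construction. Restricting the match to one sentence, or limiting the gap to a single optional token, leaves only "it is quite possible that".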