| Operation | Syntax | Description | Example |
| --- | --- | --- | --- |
| token | `[]` | Square brackets enclose a single token. | `[]` Any word or punctuation mark. |
| token text | `""` | Quotation marks denote a token with the specified text. | `["Grantor"]` A token with the word “Grantor”. |
| variable | `:` | A colon assigns a name to the token sequence that follows it. Note: The variable is available only within that same rule, that is, until the right-hand part ends with a semicolon. | `[t: "Contract"]` Finds a token that contains the word “Contract” and assigns it to the variable t. |
| lemma | `L"word"` `Lemma"word"` | “L” or “Lemma” before a word means that the search should find all forms of that word. | `[L"rule"]` Tokens such as “rule”, “rules”, “ruled”, and “ruling” will all be found. |
| regular expression | `/regular expression/` | Single forward slashes enclose a regular expression. As with the Value from Regular Expression search elements, the PCRE2 regular expression syntax is used. | `/[1]?\d:\d{2}\s+(([ap]\.m\.)\|([AP]M))?/` The template will match the time in 12-hour format, for example, “2:00 p.m.”, “9:34 AM”. |
| option setting for regular expressions | `/regular expression/i` | The option setting can be put after the forward slash that closes the regular expression. The “i” option, for example, means case-insensitive matching of the regular expression. | `/[1]?\d:\d{2}\s+([ap]\.?m\.?)?/i` This template simplifies the example above with the help of case-insensitive matching. It will match the time in 12-hour format, for example, “2:00 p.m.”, “9:34 AM”. |
| token predicates | `<first_in_paragraph>` | The token is first in a paragraph. | `["Section" <first_in_paragraph>]` The word “Section” starts the paragraph. You can use this, for example, to extract the section number immediately following this token. |
| token predicates | `<punctuator>` | The token is a punctuation mark. | `["Tenant"] [<punctuator>]{0,2}` The word “Tenant” may be followed by up to two punctuators, for example a closing bracket and a comma. |
| token predicates | `<initial_letter_capitalized>` | The token begins with a capital letter. | `[L"agreement" <initial_letter_capitalized>]` This template will match “Agreement” and “Agreements”, but not “agreement”. |
| token predicates | `<mixed_capitalization>` | Some, but not all, letters of the token are capitalized. | `[L"letter" <mixed_capitalization>]` This template will match, for example, “Letters” and “letteR”. |
| token predicates | `<all_letters_capitalized>` | The token is in all capital letters. | `[t: @NEROrganization <all_letters_capitalized>]+` This template will match organization named entities written in all caps. |
| logical OR | `\|` | A vertical bar is used to specify alternative token text or alternative conditions for the token. | `["Lender" \| "Co-Lender"] ["shall"] ["have"]` This template will match either of these strings: “Lender shall have” or “Co-Lender shall have”. |
| logical AND | space | Space is used as a logical conjunction for token conditions. | `[t: "Section" <first_in_paragraph>]` Finds a token that contains the word “Section” AND starts a paragraph. |
| object condition | `@object_name` | An at sign (“@”) checks that the token is located within an object region. The following objects are supported: named entity objects, the same types that are also available as specialized search elements, prefixed by “NER” (NERPerson, NEROrganization, NERAddress, NERLocation, NERDate, NERDuration, NERMoney); search elements; Sentence (a separate object is created for each sentence in the text flow); Paragraph (a separate object is created for each paragraph in the text flow). | `[t: @NEROrganization]+ ["Lender"]` Assigns the name t to a token sequence that contains a NEROrganization entity and is followed by the keyword “Lender”. `[t: @NERPerson @Preamble_Segment]+` Finds a person’s name in the preamble segment (represented by an Input field search element). |
| separating similar objects | `@object_name( same )` | “same” means that on a repeating token, the same object is matched to the sequence of tokens instead of several objects of this type. If the object condition uses a logical OR, “same” should not be used. | If you have a list of people’s names following one another, they will all be detected as NERPerson. To extract one person’s name at a time, use the “same” condition. `[t: @NERPerson( same )]+` Assigns the name t to the first person found. |
| relative position | `@object_name( right_to( another_object ))` `@object_name( left_to( another_object ))` | “right_to” means that object_name is found after the another_object specified in the brackets. “left_to” means that object_name is found before the another_object specified in the brackets. | `[t: @NERAddress( same, right_to( id1 ), left_to( id2 ) )]` Finds a NERAddress named entity between id1 and id2. |
| logical OR (as used in object conditions) | `\|` | A vertical bar can also be used as a logical disjunction for object conditions. | `[t: @NERPerson( right_to( id1 ) \| right_to( id2 ) )]` Assigns the name t to a person’s name located either after id1 or after id2. |
| token sequence | space | Tokens in a sequence are separated by spaces. | `["Grantor"] [":"]` This template will match “Grantor:”. Tokens may be specified simply for context, even if these words don’t need to be extracted. |
| alternative token sequences | `[token1] \| ([token2] [token3])` | A vertical bar is used to specify alternative token sequences. Round brackets set the priority. | `["will"] (["start"] \| (["take"] ["place"])) ["on"]` This template will match either of these strings: “will start on” or “will take place on”. Note that if we had omitted the inner brackets, the template would also have matched “will start place on”. The brackets ensure that the “take place” phrase is either present completely or not at all. **Note:** For more complex real-life cases of alternative token sequences, you may find it more convenient to write a separate rule for each alternative: `["will"] ["start"] ["on"] => …;` `["will"] ["take"] ["place"] ["on"] => …;` |
| optional token | `[]?` | A question mark means that the token is optional. | `["Tenant"] ["."]?` The word “Tenant” may be followed by a dot. |
| optional repeating token | `[]*` | An asterisk means that the token is optional and may be repeated several times. | `["Grantor"] []* ["Tenant"]` The keywords “Grantor” and “Tenant” may be separated by any number of tokens, or none. |
| required repeating token | `[]+` | A plus sign means that the token should be found at least once and may be repeated. | `[@NERPerson]+` Specifies that a person’s name should be found, possibly over several tokens, because the name usually consists of several words. |
| token with specified number of repetitions | `[]{n,}` `[]{n,m}` | Numbers in curly brackets mean that the token should be repeated from n to m times. If the second number is not specified, the token should be repeated at least n times. Note: `{0,}` is equivalent to `*`, while `{1,}` is equivalent to `+`. | `["Grantor"] []{1,3} ["Tenant"]` The keywords “Grantor” and “Tenant” should be separated by 1 to 3 tokens. This may be more useful than `*`, because you can specify that the two keywords are not too distant from each other. |
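
The effect of the “i” option on the time template can be checked outside the rule editor. The following is a minimal sketch in Python, not part of the rule language itself: it assumes that Python's `re` module behaves like PCRE2 for this particular pattern (it uses only constructs common to both), and the names `TIME_PATTERN` and `find_times` are hypothetical helpers introduced for illustration.

```python
import re

# The pattern from the case-insensitive example, without the /.../i delimiters.
# Assumption: Python's `re` stands in for PCRE2 here; both treat these
# constructs identically.
TIME_PATTERN = r"[1]?\d:\d{2}\s+([ap]\.?m\.?)?"


def find_times(text, ignore_case=True):
    """Return every substring of `text` matched by the time template.

    `ignore_case=True` mimics the "i" option setting.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # strip() drops the trailing whitespace consumed by \s+ when the
    # optional a.m./p.m. group is absent.
    return [m.group(0).strip() for m in re.finditer(TIME_PATTERN, text, flags)]


if __name__ == "__main__":
    sample = "The meeting starts at 2:00 p.m. and ends at 9:34 AM sharp."
    print(find_times(sample))                     # ['2:00 p.m.', '9:34 AM']
    print(find_times(sample, ignore_case=False))  # ['2:00 p.m.', '9:34']
```

Without the option, the uppercase “AM” is no longer recognized and only the digits are matched. Note also that the template requires whitespace after the minutes, so a time at the very end of the text would not be found.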