For each search element, the code in an Extraction Rules activity contains a series of rules that are executed sequentially. Each rule consists of two parts and ends with a semicolon:
Left-hand part
=>
Right-hand part;
The left-hand part checks a condition or finds values that fulfill a condition. The right-hand part creates new search element instances and assigns the values to them.

Example

The simple rule below creates an instance of the kw_Contract search element for each occurrence of the keyword “Contract”:
[ t: "Contract" ] // looking for a word "Contract" and assigning the token to a new variable called t
=>
kw_Contract( t ); // creating a new instance of the kw_Contract search element for each occurrence of the word "Contract"

Left-hand part structure

The left-hand part consists of up to three parts:
Existence_condition
Token_template
Additional_check
  1. Existence condition (optional): Finds an object that has the specified attributes. We recommend assigning a name to the object if it exists, so that you can use it again within the same rule. The condition may also specify the absence of an object.

Examples

If you need to find only one instance of the SellerName search element, specify at the beginning of the rule that no instance has yet been found:
~SellerName // checks that the SellerName search element hasn't yet been found
Note that you cannot assign a variable name in a negative condition, because a negative condition means that no object exists.
The following condition checks that an address named entity exists:
a: NERAddress // checks the existence of an address named entity and assigns it to a new variable called a
See Existence condition for details on the syntax used in this part.
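As a sketch of how a negative existence condition fits into a complete rule, the example below combines ~SellerName with a token template and a right-hand part. The "Seller: <name>" layout is a hypothetical assumption for illustration, and the rule assumes that a SellerName search element already exists in the activity.

```
~SellerName // proceed only while no SellerName instance has been found
[ "Seller" ] [ ":" ] [ p: @NERPerson ]+ // hypothetical layout: the keyword, a colon, then a person's name
=>
SellerName( p ); // creating the SellerName instance from the matched name
```

Once the rule has created an instance, the negative condition is no longer satisfied, so later matches of the same template should not produce further instances.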
  2. Token template (required): Consists of a sequence of tokens. A token is a word from a natural language or a punctuation mark. Tokens usually contain no spaces, except in rare cases where a set phrase is treated as a single word, for example “such as” or “a lot”.

Example

For example, the following token template consists of one token matching the “Grantor” keyword, one token with a colon, and a repeating token with the Person named entity (specified as repeating because a person’s name may consist of several words, each a separate token):
["Grantor"] [":"] [p: @NERPerson]+
The template will match a text string like “Grantor: Anne Smith” and assign the value “Anne Smith” to the variable p. See Token template for details on the syntax used in this part.
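A token template on its own only matches text. To store the matched value, it needs a right-hand part. The sketch below completes the template above into a full rule. GrantorName is a hypothetical search element name and would have to exist in the activity already.

```
[ "Grantor" ] [ ":" ] [ p: @NERPerson ]+ // matching a string like "Grantor: Anne Smith"
=>
GrantorName( p ); // GrantorName is an assumed element; writes the value of p into it
```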
  3. Additional check (optional): Checks further conditions on objects that were already mentioned in the rule.

Example

For example, suppose you’re looking for the parties of a contract, and have grouped the fields into two separate group instances, one for each party. Having found an organization name and an address, you can check whether these search elements have the same parent. If they do, this means that this address belongs to this organization.
n: Party.OrgName // the organization name search element was found earlier
[ a: @Party.Address ] // finds the token on which the address search element was found
// Checks that organization name and address search elements
// are children of the same instance of the Party group element
parent( n ) == parent( obj( a ) )
See Additional check for details on the syntax used in this part.

Right-hand part structure

The right-hand part consists of one or several comma-separated parts that create new instances of groups and search elements and assign values to the search elements:
Create_group_A,
Create_element_B_and_assign_value
Usually, the left-hand part of the rule finds a token, a token sequence, or an object and assigns it to a variable. The right-hand part can then use this variable to write its value into one of the search elements. For example, the following code writes the value of t into the kw_AgreementType search element:

Example

[t: "Lease"] ["Agreement"]
=>
Root.kw_AgreementType( t );
See Right-hand part for details on the syntax used in this part.
Note: Code only creates instances of search elements and group search elements that already exist in the Extraction Rules activity. No new search elements can be created using code. To create additional search elements in the structure, use the Search Elements tab in the Activity Editor.

Referencing search elements

To access a search element in code, you need to use a name that will identify that element unambiguously. If an element has a unique name, you can refer to it simply by its name. If there are several elements with the same name, you need to specify a path that is long enough to identify the element.

Example

Consider the following nested structure of search elements:
  • Property
    • Type
    • Address
  • Buyer
    • FullName
    • Address
To reference the name of the buyer in code, you can simply use “FullName”, because the element with this name is unique. “Buyer.FullName” or “Root.Buyer.FullName” will work just as well. To reference the buyer’s address, you will have to add at least the name of the parent group, because there are two elements called “Address”. Here, you should use “Buyer.Address” or “Root.Buyer.Address”.
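The two rules below sketch how these names might appear in the right-hand part. The keyword layouts are hypothetical, and the @NERAddress token form is an assumption made by analogy with @NERPerson. Whether an instance of the Buyer group must be created first depends on the right-hand part syntax, which this sketch does not cover.

```
[ "Buyer" ] [ ":" ] [ p: @NERPerson ]+ // hypothetical layout: "Buyer: <name>"
=>
FullName( p ); // the name is unique, so the short form is enough

[ "Buyer" ] [ "address" ] [ ":" ] [ a: @NERAddress ]+ // hypothetical layout; @NERAddress is an assumed token form
=>
Buyer.Address( a ); // "Address" alone would be ambiguous, so the parent group is included
```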

Language-specific rules

You can specify the document language for which a rule should be executed. The rule is then skipped for documents in other languages. Precede the rule with a hash sign (#) followed by the two-letter language code (ISO 639-1 standard).

Codes for the supported languages

Language                 Code
English                  en
German                   de
French                   fr
Spanish                  es
Italian                  it
Portuguese (Standard)    pt
Japanese                 ja
Russian                  ru
For example, if you process documents in English and in Spanish, you may want to use language-specific keywords:

Example

#en
[ t: "Grantor" ] // the keyword for English documents
=>
kw_landlord( t );

#es
[ t: "Arrendador" ] // the keyword for Spanish documents
=>
kw_landlord( t );