A Search Element is a description of one or several document image objects, which allows you to set search conditions for an object in a specific area. An element contains information about the type of image object being searched for, the object’s properties and its search area. Search results obtained using element properties are used by the activity to form regions for objects detected on the image. A region is an area consisting of one or several rectangles encompassing the detected object. The location of fields and other elements is then determined relative to the location of detected elements.
To create an element, click Create Element and select the appropriate element type from the list that appears. Once an element has been created, you need to set up its properties in the Properties pane (see Element Properties for more information). The specified properties can also be viewed and edited in code format (see FlexiLayout Language for more information). Elements can be moved around in the tree depending on the structure of the document. Note that the location of the elements in the tree determines their search order: element search is carried out in descending order, that is, from the top of the tree down.
When you create a search element, choose its type depending on what object you need to find. Below you will find short descriptions of search element types available in the Extraction Rules activity.
Once you have created the search element, configure its properties (see Element Properties).
Static Text
This element describes predefined text. Most document images usually contain some static text. This may either be the name of the document (for example, “Invoice”) or additional labeling for data fields (e.g. “Date”, “to:”, “from:”). Such text is recognized as a Recognized Words object during pre-recognition and usually serves as an anchor for detecting values for the corresponding fields (for example, the date is typically written next to the label “Date”). This text may consist of a word or a phrase. Phrases are different from words in that they contain at least one space. A phrase may also be written over several lines. When searching for this element, Recognized Words and Recognized Lines objects detected during image pre-recognition and located in the search area of the element are considered.
Character String
This element describes a sequence of characters on a single line (left to right). Character sequences are compiled from recognized text objects (Recognized Words), for example, from whole words or from several fragments of text objects. This element is designed for searching for text that is not predefined. Recognized Words objects detected during image pre-recognition and located inside the element search area are considered.
Usually, the activity searches for character sequences in areas around static text that has already been detected. For example, when looking for an invoice number, the static text “Invoice No.” needs to be found first, after which the activity looks for a character sequence to the right of the static text on the same line (numbers only in this case).
Note: The activity does not search for Character String elements in Arabic and Hebrew text, because this element type only supports character sequences written left to right.
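The invoice number example above can be sketched in Python. This is only an illustration of the anchor-then-value idea, not the activity's actual implementation: the `Word` class, its pixel coordinates, and the `find_value_right_of` function are hypothetical names, and the label is assumed to have already been merged into a single recognized object.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """Hypothetical stand-in for a recognized text object with a bounding box."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def same_line(a: Word, b: Word) -> bool:
    """Treat two words as being on one line if their vertical extents overlap."""
    return min(a.bottom, b.bottom) > max(a.top, b.top)


def find_value_right_of(words: List[Word], label: str) -> Optional[str]:
    # Step 1: locate the static text that serves as the anchor.
    anchor = next((w for w in words if w.text == label), None)
    if anchor is None:
        return None
    # Step 2: keep digit-only words to the right of the anchor on the same line.
    candidates = [
        w for w in words
        if w.left >= anchor.right and same_line(w, anchor) and w.text.isdigit()
    ]
    if not candidates:
        return None
    # Step 3: prefer the candidate closest to the anchor.
    return min(candidates, key=lambda w: w.left).text


words = [
    Word("Invoice No.", 10, 10, 90, 25),
    Word("12345", 100, 11, 150, 24),
    Word("Date", 10, 50, 40, 64),
    Word("67890", 100, 50, 150, 64),
]
invoice_number = find_value_right_of(words, "Invoice No.")
```

In this sketch the second number is ignored because it sits on a different line than the anchor, which mirrors the "same line, to the right" search condition described above.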
Paragraph
This element describes a paragraph of text. A search using this element considers all text objects intersecting with the search area. This element is designed to look for paragraphs of text that is not predefined. Recognized Words and Recognized Lines objects detected during pre-recognition and located in the element search area are considered.
Key value
This is a group element designed to search for fields that have a label. To create this element, you need to specify the properties of the label, the main search field, and the space between them. You can also specify a type and appropriate properties for the primary element. Static text and white gap act as secondary search means for the primary field. When searching for the static text, Recognized Words and Recognized Lines objects detected during image pre-recognition and located inside the element search area are considered. Once the static text element has been detected, the activity searches for the corresponding field containing the element value.
Date
This element describes the date. Dates can be written in different formats, with the day and year values always written as numbers, while the month value can also sometimes be written using letters. The date format is specified by the user.
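As a rough illustration of user-specified date formats, the sketch below tries a list of formats in turn using Python's standard `datetime.strptime`. The `parse_date` helper and the format strings are assumptions chosen for this example and are unrelated to the format syntax used in the element's properties. Month names written in letters (`%B`) are matched according to the current locale, which is assumed here to be English.

```python
from datetime import date, datetime
from typing import Iterable, Optional


def parse_date(text: str, formats: Iterable[str]) -> Optional[date]:
    """Return the first successful parse of `text`, or None if no format fits."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # this format does not match, try the next one
    return None


# Day and year are always numeric; the month is numeric or written in letters.
user_formats = ["%d.%m.%Y", "%d %B %Y"]
numeric_month = parse_date("05.03.2024", user_formats)
lettered_month = parse_date("5 March 2024", user_formats)
```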
Amount of Money
This element describes number values that are either integers or have two decimal places. By default, a placeholder symbol for the decimal part is allowed. For example, 12. will be recognized as 12.00. The whole number part can be divided into groups using delimiters (spaces and the following symbols are permitted to act as delimiters by default: . , ’). The number being searched for can have a prefix and a suffix, for example, a text element that comes before or after the number value. The prefix has to be on the same line as the number value. This format is usually used for amounts of money, with the currency name acting as the prefix.
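The default number format described above can be approximated with a regular expression. This is a minimal sketch under stated assumptions, not the element's real matching logic: `parse_amount` is a hypothetical helper, prefixes and suffixes are not handled, and a period or comma followed by exactly three digits is always treated as a group delimiter.

```python
import re
from decimal import Decimal
from typing import Optional

# Group delimiters named in the text: space, period, comma, apostrophe (U+2019).
_AMOUNT = re.compile(
    r"^(?P<whole>\d{1,3}(?:[ .,\u2019]\d{3})*|\d+)"  # whole part, optionally grouped
    r"(?:[.,](?P<frac>\d{2})?)?$"                    # separator with 0 or 2 decimals
)


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse an integer or two-decimal amount; return None if it does not fit."""
    match = _AMOUNT.match(text.strip())
    if match is None:
        return None
    # Remove group delimiters from the whole part.
    whole = re.sub(r"[ .,\u2019]", "", match.group("whole"))
    # A bare separator ("12.") is a placeholder for an empty decimal part.
    frac = match.group("frac") or "00"
    return Decimal(f"{whole}.{frac}")
```

With these assumptions, `parse_amount("12.")` yields `Decimal("12.00")`, `parse_amount("1,234.56")` yields `Decimal("1234.56")`, and a value with one decimal place such as `"12.5"` is rejected.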
Phone
This element describes a telephone number, which is usually accompanied by a keyword (e.g. “Tel.”, “Home Tel.”, etc.) and a city/region code separated from the rest of the number using brackets. The telephone number and the corresponding keyword must be located on the same line.
Region
This element represents a region on a document image without any specifications regarding its contents. The element region can consist of several unconnected regions. This element is used in the activity to mark out regions regardless of the objects they contain. This element should be used when the same field is detected using different elements on different documents, e.g. Date and Character String to look for a date that is specified in both standard and non-standard formats. In this case, Date and Character String are secondary elements, while Region is used to record the extraction results.
Separator
This element describes a vertical or horizontal separator. This element is designed for searching for separators. Separator objects detected during image pre-recognition are considered. These objects can either be located entirely within the element search area or merely intersect with it.
White Gap
This element describes a rectangular area on the image, which almost never contains other objects. It can be used as a secondary element for other element searches. For example, if there is always a white gap between the address and the document header on the documents being processed, a White Gap element can be used to search for the element that contains the address.
Barcode
This element describes barcodes. This element is designed to detect barcode types supported by Advanced Designer. Barcode objects detected during image pre-recognition and located inside the element search area are considered.
Object Collection
This element describes a collection of objects of various types, all of which satisfy the search conditions. The Object Collection element is usually used to look for objects that cannot be detected using any other element type. For example, this element can be used to find standalone punctuation marks that are not part of any line of text or other text objects, as well as text that could not be recognized because of a large number of unrelated objects. This element can also be used to find non-text objects like images and markings.
Group
This is a collection of several other elements (termed subelements). Subelements can be both simple and group elements. We do not recommend having group elements that contain no subelements.
Group elements can be used to do the following:
- Grouping elements together. This makes debugging independent parts of your Extraction Rules activity easier. For example, your activity may contain 100 elements split up into 3 parts: header, main body, footer. Each of these 3 group elements contains more group elements designed to look for small fragments of the logical part of your activity. Besides minimizing the number of possibilities that have to be considered by the activity, using such a structure ensures that debugging and editing is easier in the future, since it is split up into independent parts.
- Ensuring a logical hierarchy of the elements in the tree, which makes navigating the activity easier.
- Reducing the possible number of element hypotheses, which speeds up the search for the resulting hypothesis for the activity as a whole. Grouping elements together lets that group of elements be considered as a single entity with its own hypothesis, which makes it possible to have a quality measurement for the group as a whole.
- Specifying search area restrictions shared across all subelements. The search area for a specific subelement of a group element will be calculated as an intersection of the subelement and group element search areas.
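The intersection of a subelement search area with its group search area can be illustrated with plain rectangles. This is a simplified sketch: the `Rect` type and `intersect` function are hypothetical, and real search areas may consist of several rectangles rather than one.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping rectangle, or None if the areas do not overlap."""
    left = max(a.left, b.left)
    top = max(a.top, b.top)
    right = min(a.right, b.right)
    bottom = min(a.bottom, b.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


group_area = Rect(0, 0, 1000, 300)         # e.g. a header group limited to the page top
subelement_area = Rect(500, 100, 1200, 500)
effective_area = intersect(subelement_area, group_area)
```

Here the effective search area of the subelement is `Rect(500, 100, 1000, 300)`, which is the part of its own area that also lies inside the group area.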
Elements (both group and simple) can be required, optional, or prohibited. If an optional group element contains a required subelement and that subelement is not matched, a null hypothesis is formulated for the group element. This will not interrupt the matching of the Extraction Rules activity.
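The interaction between required subelements and optional groups can be modelled with a small recursive function. This is one possible reading of the rule above and not the activity's internal logic: the `Element` and `Outcome` types are hypothetical, prohibited elements are not modelled, and it is an assumption of this sketch that a required group with an unmatched required subelement fails as a whole.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Outcome(Enum):
    MATCHED = "matched"
    NULL = "null hypothesis"
    FAILED = "failed"


@dataclass
class Element:
    name: str
    required: bool = True
    found: bool = True  # for simple elements: whether a match was found
    children: List["Element"] = field(default_factory=list)


def evaluate(element: Element) -> Outcome:
    if element.children:
        # A group is satisfied as long as none of its subelements fails outright.
        ok = all(evaluate(child) is not Outcome.FAILED for child in element.children)
    else:
        ok = element.found
    if ok:
        return Outcome.MATCHED
    # An unmatched optional element yields a null hypothesis instead of a failure.
    return Outcome.FAILED if element.required else Outcome.NULL


header = Element("Header", required=False,
                 children=[Element("Title", required=True, found=False)])
root = Element("Root", children=[header, Element("Total")])
```

In this model `evaluate(header)` gives `Outcome.NULL` and `evaluate(root)` still gives `Outcome.MATCHED`, which corresponds to matching continuing despite the missing subelement.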
Repeating Group
This element is designed to look for repeating element groups (with an unknown number of instances). A common example of this is a data table. A repeating element is different from a regular group element in terms of its repetition parameters. This group can appear several times on a single page as well as across the whole document. Since this group repeats within a document, the element allows you to describe all of its instances (including its repetition parameters) as a single element. As such, using Repeating Group lets you describe the document structure much quicker.
This element can be used to do the following:
- Search for tables.
- Search for a header on each page of a multi-page document.
- Search for an unknown number of repeating data entries.
Input Field
This element is designed to detect a field region extracted by another activity. This element can be used to find other elements.
For example, if a skill contains a field that always remains on the image after training, it can be used as an anchor field when searching for elements using an Extraction Rules activity. To do so, create an Input Field element and select the appropriate field in the skill structure. This will create a Region element that contains code linking the element and the selected field in the Search Conditions tab.
Deep Learning
This element provides access to a value found by a Deep Learning activity that feeds its output to the Extraction Rules activity within a Hypothesis Filtering container. This element is available only within a Hypothesis Filtering container.
You can control the output of a Deep Learning activity by specifying conditions for this search element. For example, if the Deep Learning activity is configured to find a repeating value, you can set the preferred location of the instance that you need to extract.