You can add a text field by doing one of the following:
  • Specify a field region on the document image by clicking on the value of a field (highlighted green when moused over) or by marking out a rectangular region around the field value. After this, your new field will appear on the data form. You can modify the field name by double-clicking on it in the data form or by clicking on it in the field properties. You can select the whole name by triple-clicking on it. To open the field properties, click the Field options button.
  • Add a new field to the data form by clicking Add Field on the toolbar and then marking out the field region on the image. This will specify the data detected inside the selected region as its field value in the data form.
You can also add new text field regions to existing fields in the data form by selecting the appropriate field in the data form and then clicking its location on the document image. If a field should contain more than one word, select multiple words by marking the entire field region.

Adding a text field with multiple regions

Some text fields require multiple regions on a single document due to the following:
  • Some field values may begin on one line of text and end on another.
  • Some field values may begin on one page and end on another.
To add a text field with multiple regions, do the following:
  • Add a field using a method described above.
  • Hold down the Shift key and select additional regions for the added field.
Regions of a text field can also be marked up:
  • On different pages
  • Within another region of a field (in this case, the inner region will be highlighted with a darker color, and if it is in focus, it will be highlighted in yellow).

General properties of the text field

  • Field name. The unique name of the field in a particular skill. The field name cannot contain special characters like full stops, commas, slashes, colons, asterisks, question marks, quotation marks, less-than signs, greater-than signs, or vertical bars. The maximum allowed length of a field name is 90 characters.
  • Data type. The type of data that a field contains. This is a crucial text field parameter as it affects recognition accuracy. Each type of data has its own set of restrictions for the field value, narrowing down the possible values for a character and making data extraction more accurate.
| Data type | Description |
| --- | --- |
| Text | May contain Latin and Cyrillic letters, digits, hieroglyphics and special characters. |
| Date | The date and time in any format. The following characters can be used as separators: the dot (.), the space ( ), the hyphen (-), the backslash (\), and the forward slash (/). |
| Number | May contain digits, decimal separators, and the percentage character (%). The following characters may be used as decimal separators: the dot (.), the comma (,), the hyphen (-), the equal sign (=), and the space ( ). The following characters may be used as thousands separators: the dot (.), the comma (,), the single quotation mark (’), and the space ( ). |
| Money | Contains both a number value and a currency symbol. The currency symbol may be placed either before or after the amount. |
Example: The lowercase letter “l” (L), the uppercase letter “I” (i), and the digit “1” may all have a similar appearance. If such a character is detected in a Number or Money field, it will be recognized as “1”, since fields of these types may not contain letters.
  • Allow multiple items. Specifies whether the field is repeating or not. Instances of a repeating field may refer to multiple objects of the same type, for example, names of children or account numbers.
  • Required field. Specifies that the value of the field cannot be left empty. Enabling this property adds a validation rule to the page. If the field is empty after extraction, the document will be sent to manual review with an error.
  • Key field. Specifies if the value of the field is used to search for documents.
  • Dimension field. Specifies if the value of the field is used to get detailed information about skill transactions in Skill Monitor.
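The Field name restrictions listed above can be checked before a name is typed into the data form. The following is a minimal sketch, not part of the product: the function name `field_name_problems` and both constants are hypothetical, the character list is taken from the Field name description (which introduces it with "like", so it may not be exhaustive), and "slashes" and "quotation marks" are assumed to mean both slash directions and the straight double quote.

```python
# Characters named in the Field name description. Treating this list as
# exhaustive is an assumption of this sketch.
FORBIDDEN_CHARS = set('.,/\\:*?"<>|')
MAX_NAME_LENGTH = 90  # maximum length stated in the Field name description


def field_name_problems(name: str) -> list[str]:
    """Return a list of problems found in a field name; empty means it passes."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("name contains forbidden characters: " + " ".join(bad))
    return problems


if __name__ == "__main__":
    for candidate in ["Invoice Number", "Total: net", "x" * 91]:
        print(repr(candidate[:20]), field_name_problems(candidate))
```

Uniqueness of the name within a skill is not covered here, because it depends on the other fields already defined in that skill.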

Text Appearance

This group of properties refers to the appearance of characters expected in the field.
  • Text origin. Specifies whether the field may contain only printed characters, only handwritten characters, or both. If you add a field by marking a rectangular region on the document, the value of this property is set depending on which characters are found in this region. If you add a field by clicking Add Field, the value of this property is set to Printed by default.
You can find a list of all languages for which handwritten text recognition is supported here.
Note: Handwritten text recognition is enabled for new Document skills by default. To disable or enable it again, click the skill settings icon to the right of the name of the skill, and then go to the Languages tab and select the Handwritten option in the Text Appearance section.
  • Eliminate field background. This option can be used to improve recognition accuracy if the field has a frame, boxes for individual characters, or placeholder text. If you enable this option, you must upload the blank form document that will be used as a template for the background recognition and label the corresponding field on the blank form. The blank form document appears in the Document Set and is marked with an icon.
  • Special fonts. If the field is expected to contain text typed in a specific font, you can use this option to select the font type, which will improve recognition accuracy. Multiple fonts can also be selected.

Supported fonts

| Font | Description |
| --- | --- |
| Fax | A font typically used by fax machines. |
| Gothic | Texts printed in Gothic type. |
| Index | A special set of characters that includes only digits written in ZIP-code style. |
| Matrix printer | Texts printed on a dot-matrix printer. |
| MICR CMC-7 | A special MICR barcode font (CMC-7). |
| MICR E-13B | A special set of numeric characters printed with magnetic ink. MICR (Magnetic Ink Character Recognition) characters are found on a variety of documents, including on personal checks. |
| OCR-A | A monospaced font designed for Optical Character Recognition. Largely used by banks, credit card companies, and similar businesses. |
| OCR-B | A font designed for Optical Character Recognition. |
| Receipt | The recognizer will expect text of low quality, mostly in a monospaced or normal font typically used on receipts. |
| Typewriter | Typewritten texts. |

Additional properties of the text field

Additional properties depend on the data type specified for the field.

Text

The Value settings group:
  • Maximum length. The maximum allowed number of characters in the field. If the number of characters in the extracted value exceeds this length, an error message will be displayed. If there is a manual review stage in the process, the document will be sent to manual review.
  • Regular expression. The option lets you add a regular expression (i.e. a formal description of the field value structure). A field set up using a regular expression can contain letters, digits, and other characters as set out in the Data Form.
Using a regular expression can improve extraction accuracy because it restricts the set of valid characters for the field. For example, if a text field contains only numbers, you can specify a regular expression that describes the field as containing only digits. In this case, when recognizing the field, the program will try to recognize each character as a digit. You can also specify a regular expression for a specific phone number format (example 1), or check that the field contains two words, one of which is a number, for example 50 lbs (example 2):
Example 1 (for phone numbers like 1-(234)-567-8900 or 2 (987) 654 3211)
/^(1|2)(\-|\s)\([\d]{3}\)(\-|\s)[\d]{3}(\-|\s)[\d]{4}$/
Example 2 (for weight values of 50lb/50lbs/50Lb/50Lbs/50 lb/50 lbs, etc.)
/^[\d]*(\s)?(L|l)b(s)?$/
Note: Regular expressions do not affect the text recognition of a PDF document.
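Before entering a regular expression in the field properties, it can help to try it against a few sample values. The sketch below does this for the two example expressions using Python's `re` module. This is an assumption made for illustration: the product's own regular expression dialect may differ in details, and the helper name `matches` and the sample values are hypothetical.

```python
import re

# The two example expressions, without the surrounding slashes.
PHONE = re.compile(r"^(1|2)(\-|\s)\([\d]{3}\)(\-|\s)[\d]{3}(\-|\s)[\d]{4}$")
WEIGHT = re.compile(r"^[\d]*(\s)?(L|l)b(s)?$")


def matches(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole value satisfies the anchored pattern."""
    return pattern.search(value) is not None


if __name__ == "__main__":
    phone_samples = ["1-(234)-567-8900", "2 (987) 654 3211", "3-(234)-567-8900"]
    weight_samples = ["50lb", "50 lbs", "50Lbs", "50 kg", "lb"]
    for sample in phone_samples:
        print("phone ", repr(sample), matches(PHONE, sample))
    for sample in weight_samples:
        print("weight", repr(sample), matches(WEIGHT, sample))
```

Note that example 2 also accepts a value with no digits at all, such as "lb", because `[\d]*` allows zero digits. Writing `[\d]+` instead would require at least one digit.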

Date

The Value may include settings group:
  • Time. This option should be selected for Date fields that may contain the time as well. If a time value is not permitted, it will not be extracted during recognition.
  • Day of week. This option lets the day of the week be specified in the Date field. If a day of the week value is not permitted, it will not be extracted during recognition.
  • Month by name. This option lets the month be specified as a word.
The Acceptable order of components settings group lets you select a suitable date format from the following options: Day-Month-Year, Month-Day-Year, and Year-Month-Day. You may also specify several different formats at once. If the detected date format does not correspond to any of the specified date formats, the document will be sent to manual review with an error.
The Acceptable date settings group lets you specify a range of valid dates. You can specify a valid range by selecting a number of months before and after the day on which the document was processed. The number of months should be specified as an integer. A rule is used to check whether the extracted date is within the specified range. If it is not, the rule will display an error, and the document will be sent to manual review.
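The Acceptable date check can be pictured as a window of whole months around the processing day. The following is a rough sketch of that idea only. The function names are hypothetical, and two details are assumptions that the description does not state: the range boundaries are treated as inclusive, and a day that does not exist in the target month (for example the 31st) is clamped to the last day of that month.

```python
import calendar
from datetime import date


def add_months(day: date, months: int) -> date:
    """Shift a date by a whole number of months, clamping the day if needed."""
    index = day.year * 12 + (day.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def is_date_acceptable(value: date, processed_on: date,
                       months_before: int, months_after: int) -> bool:
    """Check that value lies within the window around the processing day."""
    earliest = add_months(processed_on, -months_before)
    latest = add_months(processed_on, months_after)
    return earliest <= value <= latest


if __name__ == "__main__":
    processed = date(2024, 3, 15)
    for candidate in [date(2024, 1, 20), date(2024, 1, 10), date(2024, 4, 15)]:
        print(candidate, is_date_acceptable(candidate, processed, 2, 1))
```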

Number

The Value settings group lets you specify what kind of number the detected value is (integer or decimal), as well as what number formats may be detected in this field. If the value of this field does not satisfy the specified requirements, the document will be sent to manual review.
  • Integers only specifies that the value can only be an integer. If a number with separators is detected in the field while this option is enabled, the separators will be treated as thousands separators.
  • Fractional part may contain more than two digits. Enable this option if the decimal part of the extracted value is expected to have more than two digits. The following characters may be used as decimal separators: dot (.), comma (,), hyphen (-), equals sign (=), and space ( ).
  • May have negative values. This option allows the extracted value to be negative. Negative values may be denoted either by a minus sign or by brackets.
  • May include ’%’ symbol. This option allows the extracted value to have a percentage character before or after the value.
The Number must be within interval settings group lets you specify an interval that the value must fall within to be valid. The interval range is set by specifying a maximum and minimum value. These values can be both integers and decimals. Negative values can also be specified. A rule is then used to check whether the value is within the specified range. If it is not, the rule will display an error, and the document will be sent to manual review.
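To make the interplay of these options concrete, the sketch below mimics a field with Integers only and May have negative values enabled, followed by an interval check. It illustrates the described behavior and is not the product's implementation: the names `parse_integer` and `within_interval` are hypothetical, the interval boundaries are assumed to be inclusive, and the straight apostrophe is accepted alongside the typographic one as an extra assumption.

```python
# Thousands separators listed for the Number data type: dot, comma,
# single quotation mark, and space. The straight apostrophe is an added assumption.
THOUSANDS_SEPARATORS = set(".,’' ")


def parse_integer(text: str) -> int:
    """Parse a value as an integer, treating every separator as a thousands separator."""
    value = text.strip()
    negative = False
    # Negative values may be denoted either by a minus sign or by brackets.
    if value.startswith("(") and value.endswith(")"):
        negative = True
        value = value[1:-1]
    elif value.startswith("-"):
        negative = True
        value = value[1:]
    digits = "".join(ch for ch in value if ch not in THOUSANDS_SEPARATORS)
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not an integer value: {text!r}")
    number = int(digits)
    return -number if negative else number


def within_interval(number: int, minimum: int, maximum: int) -> bool:
    """Check the value against the configured minimum and maximum."""
    return minimum <= number <= maximum


if __name__ == "__main__":
    for sample in ["1,234", "(1.500)", "-2 000"]:
        parsed = parse_integer(sample)
        print(repr(sample), parsed, within_interval(parsed, -1500, 5000))
```

A value that fails either step corresponds to the case in which the document is sent to manual review.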

Money

Money properties are identical to the Number properties, except that a Money field is not allowed to contain a percentage character.
