- Specify a field region on the document image by clicking on the value of a field (highlighted green when moused over) or by marking out a rectangular region around the field value. After this, your new field will appear on the data form. You can modify the field name by double-clicking on it in the data form or by clicking on it in the field properties. You can select the whole name by triple-clicking on it. To open the field properties, click the Field options button.
- Add a new field to the data form by clicking Add Field on the toolbar and then marking out the field region on the image. The data detected inside the selected region becomes the value of the field in the data form.
Adding a text field with multiple regions
Some text fields require multiple regions in a single document for the following reasons:
- Some field values may begin on one line of text and end on another.
- Some field values may begin on one page and end on another.
To add a text field with multiple regions:
- Add a field using a method described above.
- Hold down the Shift key and select additional regions for the added field.
Additional regions can be located:
- On different pages
- Within another region of a field (in this case, the inner region will be highlighted with a darker color, and if it is in focus, it will be highlighted in yellow).
General properties of the text field
- Field name. The unique name of the field in a particular skill. The field name cannot contain special characters like full stops, commas, slashes, colons, asterisks, question marks, quotation marks, less-than signs, greater-than signs, or vertical bars. The maximum allowed length of a field name is 90 characters.
- Data type. The type of data that a field contains. This is a crucial text field parameter as it affects recognition accuracy. Each type of data has its own set of restrictions for the field value, narrowing down the possible values for a character and making data extraction more accurate.
| Data type | Description |
|---|---|
| Text | May contain Latin and Cyrillic letters, digits, hieroglyphics and special characters. |
| Date | The date and time in any format. The following characters can be used as separators: the dot (.), the space ( ), the hyphen (-), the backslash (\\), and the forward slash (/). |
| Number | May contain digits, decimal separators, and the percentage character (%). The following characters may be used as decimal separators: the dot (.), the comma (,), the hyphen (-), the equal sign (=), and the space ( ). The following characters may be used as thousands separators: the dot (.), the comma (,), the single quotation mark (’), and the space ( ). |
| Money | Contains both a number value and a currency symbol. The currency symbol may be placed either before or after the amount. |
- Allow multiple items. Specifies whether the field is repeating or not. Instances of a repeating field may refer to multiple objects of the same type, for example, names of children or account numbers.
- Required field. Specifies that the value of the field cannot be left empty. Enabling this property adds a validation rule to the page. If the field is empty after extraction, the document will be sent to manual review with an error.
- Key field. Specifies if the value of the field is used to search for documents.
- Dimension field. Specifies if the value of the field is used to get detailed information about skill transactions in Skill Monitor.
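The field name rules above can be checked before a name is entered. The following is a minimal sketch, not part of the product: the function name `field_name_problems` is hypothetical, and the forbidden character set is an assumption built only from the characters listed above (the product may reject other characters as well).

```python
# Characters the field name description lists as not allowed.
# Assumption: "slashes" covers both / and \, and "quotation marks" means ".
FORBIDDEN_CHARS = set('.,/\\:*?"<>|')
MAX_NAME_LENGTH = 90  # maximum length stated in the field name description


def field_name_problems(name: str) -> list:
    """Return a list of human-readable problems; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name is {len(name)} characters long; the limit is {MAX_NAME_LENGTH}"
        )
    bad = sorted({ch for ch in name if ch in FORBIDDEN_CHARS})
    if bad:
        problems.append("name contains forbidden characters: " + " ".join(bad))
    return problems


if __name__ == "__main__":
    for candidate in ("Invoice Number", "Total: net", "x" * 91):
        print(repr(candidate[:20]), field_name_problems(candidate))
```

This sketch does not check uniqueness within a skill, which would require the list of existing field names.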
Text Appearance
This group of properties refers to the appearance of characters expected in the field.
- Text origin. Specifies whether the field may contain only printed characters, only handwritten characters, or both. If you add a field by marking a rectangular region on the document, the value of this property is set depending on which characters are found in this region. If you add a field by clicking Add Field, the value of this property is set to Printed by default.
Note: Handwritten text recognition is enabled for new Document skills by default. To disable or enable it again, click the skill settings icon to the right of the name of the skill, and then go to the Languages tab and select the Handwritten option in the Text Appearance section.
- Eliminate field background. This option can be used to improve recognition accuracy if the field has a frame, boxes for individual characters, or placeholder text. If you enable this option, you must upload the blank form document that will be used as a template for the background recognition and label the corresponding field on the blank form. The blank form document appears in the Document Set and is marked with an icon.
- Special fonts. If the field is expected to contain text typed in a specific font, you can use this option to select the font type, which will improve recognition accuracy. Multiple fonts can also be selected.
Supported fonts
| Font | Description |
|---|---|
| Fax | A font typically used by fax machines. |
| Gothic | Texts printed in Gothic type. |
| Index | A special set of characters that includes only digits written in ZIP-code style. |
| Matrix printer | Texts printed on a dot-matrix printer. |
| MICR CMC-7 | A special MICR barcode font (CMC-7). |
| MICR E-13B | A special set of numeric characters printed with magnetic ink. MICR (Magnetic Ink Character Recognition) characters are found on a variety of documents, including personal checks. |
| OCR-A | A monospaced font designed for Optical Character Recognition. Largely used by banks, credit card companies, and similar businesses. |
| OCR-B | A font designed for Optical Character Recognition. |
| Receipt | The recognizer will expect text of low quality, mostly in a monospaced or normal font typically used on receipts. |
| Typewriter | Typewritten texts. |
Additional properties of the text field
Additional properties depend on the data type specified for the field.
Text
The Value settings group:
- Maximum length. The maximum allowed number of characters in the field. If the number of characters in the extracted value exceeds this length, an error message will be displayed. If there is a manual review stage in the process, the document will be sent to manual review.
- Regular expression. The option lets you add a regular expression (i.e. a formal description of the field value structure). A field set up using a regular expression can contain letters, digits, and other characters as set out in the Data Form.
Note: Regular expressions do not affect the text recognition of a PDF document.
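To illustrate how the Maximum length and Regular expression settings constrain a value, here is a minimal sketch that applies both checks to an already extracted string. It is not the product's implementation: the function name `text_value_errors` and the sample pattern are hypothetical, and it assumes Python `re` syntax, which may differ from the regular expression syntax accepted in the field properties.

```python
import re
from typing import List, Optional


def text_value_errors(
    value: str,
    max_length: Optional[int] = None,
    pattern: Optional[str] = None,
) -> List[str]:
    """Return the rule violations for an extracted Text value (empty list if none)."""
    errors = []
    if max_length is not None and len(value) > max_length:
        errors.append(f"value has {len(value)} characters; maximum is {max_length}")
    # fullmatch requires the whole value to fit the pattern, not just a part of it.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        errors.append(f"value does not match the pattern {pattern}")
    return errors


if __name__ == "__main__":
    # Hypothetical pattern: two capital letters, a hyphen, and four digits.
    order_id = r"[A-Z]{2}-\d{4}"
    print(text_value_errors("AB-1234", max_length=10, pattern=order_id))
    print(text_value_errors("AB-12345678", max_length=10, pattern=order_id))
```

In a process with a manual review stage, a non-empty result corresponds to the situation in which the document would be sent to manual review.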
Date
The Value may include settings group:
- Time. This option should be selected for Date fields that may contain the time as well. If a time value is not permitted, it will not be extracted during recognition.
- Day of week. This option lets the day of the week be specified in the Date field. If a day of the week value is not permitted, it will not be extracted during recognition.
- Month by name. This option lets the month be specified as a word.
Number
The Value settings group lets you specify what kind of number the detected value is (integer or decimal), as well as what number formats may be detected in this field. If the value of this field does not satisfy the specified requirements, the document will be sent to manual review.
- Integers only. Specifies that the value can only be an integer. If a number with separators is detected in the field while this option is enabled, the separators will be treated as thousands separators.
- Fractional part may contain more than two digits. Enable this option if the decimal part of the extracted value is expected to have more than two digits. The following characters may be used as decimal separators: dot (.), comma (,), hyphen (-), equals sign (=), and space ( ).
- May have negative values. This option allows the extracted value to be negative. Negative values may be denoted either by a minus sign or by brackets.
- May include ’%’ symbol. This option allows the extracted value to have a percentage character before or after the value.
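The Number options above interact, so a small worked example may help. The sketch below normalizes a raw string under one possible reading of those options; it is an assumption-laden illustration, not the recognizer's actual logic. The function `parse_number` and its parameters are hypothetical, "brackets" is assumed to mean parentheses, and the rule "the last separator is the decimal separator unless more than two digits follow it" is a heuristic chosen here to resolve ambiguous input such as `1,234`.

```python
from decimal import Decimal

# Separator sets as listed for the Number data type.
DECIMAL_SEPARATORS = set(".,-= ")
THOUSANDS_SEPARATORS = set(".,'\u2019 ")


def parse_number(raw: str, integers_only: bool = False,
                 long_fraction: bool = False) -> Decimal:
    """Normalize a raw field value to a Decimal (the percent sign is dropped, not applied)."""
    text = raw.strip().strip("%").strip()  # '%' may appear before or after the value
    negative = False
    if text.startswith("(") and text.endswith(")"):  # negative value shown in brackets
        negative, text = True, text[1:-1].strip()
    elif text.startswith("-"):  # negative value shown with a minus sign
        negative, text = True, text[1:].strip()

    whole, fraction = text, ""
    # Position of the last character that could be a decimal separator, or -1.
    split_at = max((i for i, ch in enumerate(text) if ch in DECIMAL_SEPARATORS),
                   default=-1)
    if not integers_only and split_at != -1:
        tail = text[split_at + 1:]
        # Heuristic: a tail longer than two digits is a thousands group
        # unless long fractional parts are explicitly allowed.
        if long_fraction or len(tail) <= 2:
            whole, fraction = text[:split_at], tail

    digits = "".join(ch for ch in whole if ch not in THOUSANDS_SEPARATORS)
    if not digits.isdecimal() or (fraction and not fraction.isdecimal()):
        raise ValueError(f"cannot interpret {raw!r} as a number")
    value = Decimal(f"{digits}.{fraction}" if fraction else digits)
    return -value if negative else value


if __name__ == "__main__":
    for sample in ("1,234.56", "1.234,56", "(12.50)", "15%", "12=5"):
        print(sample, "->", parse_number(sample))
```

With `integers_only=True`, every separator is treated as a thousands separator, which mirrors the Integers only description; with `long_fraction=True`, `1,234` is read as a decimal with three fractional digits.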










