Scope of field extraction training

Field extraction training can be used for structured and semi-structured documents. Training is available for simple fields, field groups, and recurring groups.

Simple fields

The program searches for a field whose environment corresponds to the specified parameters. The program can be trained to extract fields from a single document; however, it is recommended that you use at least 3 documents for more precise results.

Field groups

The program is trained to extract fields combined into groups as separate fields, without taking into account the relations between them. Therefore, the training results are determined by the environment of the fields, not by their group affiliation.

Recurring groups

The program is trained to detect recurring groups when a document contains several recurring, horizontally separable groups that are similar in terms of their environment. In this case, the program processes each group as a recurring line. It also assumes that a line can be First, Last, or Any. The program is trained to extract fields for each line type as fields of a simple field group.
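The line types can be illustrated with a minimal Python sketch. It is not part of the product or its API. The function name `label_line_types` is hypothetical, and the rule it applies (the first group is First, the last group is Last, every group in between is Any, and a single group is treated as First) is an assumption made only for illustration.

```python
from typing import List


def label_line_types(group_count: int) -> List[str]:
    """Assign an illustrative line type to each recurring group, in document order.

    Assumption: the first group is "First", the last is "Last",
    and all groups in between are "Any". A single group is labeled "First".
    """
    if group_count < 0:
        raise ValueError("group_count must be non-negative")
    labels = ["Any"] * group_count
    if group_count >= 1:
        labels[0] = "First"
    if group_count >= 2:
        labels[-1] = "Last"
    return labels


print(label_line_types(4))  # ['First', 'Any', 'Any', 'Last']
```

Under this reading, each label would select a separately trained set of fields, which matches the statement that fields for each line type are trained as a simple field group.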

To achieve better results, it is recommended to upload from 3 to 50 document samples of each type during training.

To fine-tune the field extraction, use ABBYY FlexiLayout Studio. The trained Document Definition may be exported to FlexiLayout Studio and taken as a basis for a new FlexiLayout.

Variable field locations on documents that belong to the same type

The program can detect fields in documents that belong to the same type but look very different, for example, invoices from different vendors, bank statements, driving licenses issued by different states, and various forms. ABBYY FlexiCapture processes such documents by means of a feature called document variants. It lets you create a set of variants for documents of the same type, where each variant corresponds to a particular arrangement of fields. Training fields with variable locations includes creating and training a classifier that distinguishes between types of documents. For details, see Creating a classifier. Once the variant of a document is determined, the program uses the general field extraction training mechanism.
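The two-step flow described above (determine the variant first, then apply field extraction for that variant) can be sketched in Python. This is a conceptual illustration only and does not use any ABBYY FlexiCapture API. The names `extract_with_variants`, `classify_by_keyword`, `vendor_a`, and `vendor_b`, as well as the keyword and regular-expression rules, are hypothetical stand-ins for a trained classifier and trained extractors.

```python
import re
from typing import Callable, Dict, Tuple

Extractor = Callable[[str], Dict[str, str]]


def classify_by_keyword(text: str) -> str:
    """Toy stand-in for a trained classifier: choose a variant by a keyword."""
    return "vendor_a" if "Invoice No:" in text else "vendor_b"


def make_regex_extractor(pattern: str) -> Extractor:
    """Build a toy extractor that captures the invoice number with a regex."""
    compiled = re.compile(pattern)

    def extract(text: str) -> Dict[str, str]:
        match = compiled.search(text)
        return {"invoice_number": match.group(1)} if match else {}

    return extract


# One extractor per variant; each expects a different field layout.
EXTRACTORS: Dict[str, Extractor] = {
    "vendor_a": make_regex_extractor(r"Invoice No:\s*(\S+)"),
    "vendor_b": make_regex_extractor(r"INV#\s*(\S+)"),
}


def extract_with_variants(
    text: str,
    classify: Callable[[str], str] = classify_by_keyword,
    extractors: Dict[str, Extractor] = EXTRACTORS,
) -> Tuple[str, Dict[str, str]]:
    """Determine the variant first, then run that variant's extractor."""
    variant = classify(text)
    if variant not in extractors:
        raise KeyError(f"no extractor registered for variant {variant!r}")
    return variant, extractors[variant](text)


print(extract_with_variants("Invoice No: 123\nTotal: 10"))
print(extract_with_variants("INV# 456\nAmount due: 20"))
```

Keeping the classifier and the per-variant extractors separate mirrors the description above: adding support for a new layout means adding a variant and its extractor, without changing how the other variants are handled.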

