Standard scenario
In the standard scenario, complete the following steps:
- Create a Document Definition.
- In the section properties of the Document Definition, select Allow field location training.
- Create the necessary fields in the section. Select Can have region in the properties of each field.
- Save and publish the Document Definition.
- Switch to Field Extraction Training Batches mode and create a new batch.
- Select the Document Definition that you have created.
- Select Default variant from the list of variants.
- Load document images and recognize them. We recommend loading 3 to 50 images for each kind of document.
- Adjust the positions of the fields.
- Set the following states for your documents:
  - Select some of the documents, right-click the selection, and click Set Document State → For training on the shortcut menu.
  - Select the remaining documents, right-click the selection, and click Set Document State → For testing on the shortcut menu.
  Note: We recommend using 60% of the documents in the training batch for training and the remaining 40% for testing.
- Train the program to extract the fields:
  - (Standalone) Click Fields Training → Train.
  - (Distributed) Click Fields Training → Train to start the training on the same machine where the Project Setup Station component is installed.
  - (Distributed) Right-click the batch and select Send for Training on the shortcut menu if you want the training to be performed on a Processing Station.
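If the training images are kept in a folder, the recommended 60%/40% split can be decided before the document states are set in the program. The following is a minimal sketch only: the function name split_for_training, its parameters, and the sample file names are hypothetical and are not part of ABBYY FlexiCapture. The script only suggests which documents to mark For training and which to mark For testing. The states themselves are still set manually on the shortcut menu.

```python
import random
from typing import List, Sequence, Tuple


def split_for_training(
    documents: Sequence[str],
    train_fraction: float = 0.6,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Split document names into (training, testing) lists.

    Hypothetical helper, not an ABBYY FlexiCapture API: it only decides
    which documents to mark "For training" and which "For testing".
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    if len(documents) < 2:
        raise ValueError("need at least two documents to form both sets")

    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed

    # Round to the nearest whole document, keeping at least one in each set.
    train_count = round(len(shuffled) * train_fraction)
    train_count = max(1, min(len(shuffled) - 1, train_count))
    return shuffled[:train_count], shuffled[train_count:]


if __name__ == "__main__":
    # Sample file names are placeholders for illustration only.
    names = [f"invoice_{i:02d}.tif" for i in range(1, 11)]
    training, testing = split_for_training(names)
    print(len(training), "for training;", len(testing), "for testing")
```

With ten documents, this yields six documents for training and four for testing. Shuffling before the split helps avoid putting only the first-loaded documents into the training set.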
Projects with multiple document variants
If field locations vary among documents of the same type, you need to create variants and train a classifier to distinguish these variants. For more information about such documents and variants, see Variable field locations on documents that belong to the same type.
Complete the following steps:
- Create a Document Definition.
- In the section properties of the Document Definition, select Allow field location training.
- Create the necessary fields in the section. Select Can have region in the properties of each field.
- Add section variants using one of the following three methods:
  - Create variants manually. To do this, click the Data Sets tab in the section properties and then click the View… button. Then click the Add… button to add variants.
  - Load variants from a database. To do this, click the Data Sets tab in the section properties and then click the Set Up… button. From the drop-down list, select Database as the data source.
  - Create variants using a script. To do this, click the Data Sets tab in the section properties and then click the Set Up… button. From the drop-down list, select Script as the data source.
- Save and publish the Document Definition.
- Train a classifier on the newly created variants:
  - Switch to Open Classifier Training Batches mode and load document images into a new batch.
  - Assign a reference class to each document, using variants as separate classes:
    - Click Set Class… → Add… → Add…
    - Select Specify variant.
    - Select a variant from the list.
  - Train a classifier by clicking Classification Training → Train.
- Train ABBYY FlexiCapture to detect the field regions on each variant:
  - Switch to Field Extraction Training Batches mode.
  - Create a new batch. Select your Document Definition and then select a variant to train.
  - Load document images and recognize them. We recommend loading 3 to 50 images for each kind of document.
  - Adjust the positions of the fields.
  - Set the following states for your documents:
    - Select some of the documents, right-click the selection, and click Set Document State → For training on the shortcut menu.
    - Select the remaining documents, right-click the selection, and click Set Document State → For testing on the shortcut menu.
  - Train the program to extract the fields:
    - (Standalone) Click Fields Training → Train.
    - (Distributed) Click Fields Training → Train to start the training on the same machine where the Project Setup Station component is installed.
    - (Distributed) Right-click the batch and select Send for Training on the shortcut menu if you want the training to be performed on a Processing Station.
We recommend configuring auto-learning for field extraction. With auto-learning configured, the program automatically learns to extract fields as the operators work on the configured project.
(Distributed) Sending training batches to a Processing Station for training
(Distributed) Because training can take a long time and consume significant computational resources, the administrator can choose to train batches on a Processing Station.
(Distributed) Before sending a training batch to a Processing Station, make sure that:
- (Distributed) At least one Processing Station has been added on the Processing Server.
- (Distributed) The project has been uploaded to the server.
