Additional training is not available for NLP models loaded into Document Definitions.
- Add a training stage after the verification stage. Training will start when the conditions specified for the training batch are met. For more information about setting up workflow stages, see Workflow setup.
- Manually send documents to the training stage. To do this, right-click the document in the working batch and select Train on the shortcut menu.
- When training is initiated, ABBYY FlexiCapture automatically creates a generic training batch in the list of training batches (if the list does not already contain one). All documents related to a specific Document Definition will be copied into this batch, regardless of their variant.
- Each document is assigned either the For training or the For testing status.
- Documents marked For training undergo training. As a result, a new NLP model is created.
- The new model is then tested using documents marked For testing.
- If the overall performance of the new model is not worse than that of the existing model, the existing model will be replaced with the new one. Otherwise, the new model will be rejected.
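The replacement rule in the last step can be sketched as follows. This is only an illustration: the function name and the idea of a single numeric score are assumptions, since the text does not say how ABBYY FlexiCapture measures overall performance. Accepting the new model when there is no existing one is also an assumption.

```python
from typing import Optional


def select_model(existing_score: Optional[float], new_score: float) -> str:
    """Decide which model to keep after testing (hypothetical helper).

    Scores are assumed to be single numbers where higher is better;
    the real performance metric is not described in this document.
    """
    # Assumption: with no existing model, the new model is accepted.
    if existing_score is None:
        return "new"
    # "Not worse" means equal or better, so a tie keeps the new model.
    if new_score >= existing_score:
        return "new"
    # Otherwise the new model is rejected and the existing one stays.
    return "existing"
```

Note that a tie favors the new model, because the rule is "not worse than" rather than "better than".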
- On the Project Setup Station, open the project with the NLP model. For more information about setting up an NLP model, see Creating NLP models.
- Navigate to Field Extraction Training Batches by selecting Fields Training > Open Field Extraction Training Batches. Alternatively, you can either use the Ctrl + Alt + B key combination or select Field Extraction Training Batches on the shortcut menu.
- Create a new batch by selecting File > New Batch. Alternatively, you can use the Ctrl + N key combination. Choose the appropriate Document Definition and variant and then select the NLP Batch option on the shortcut menu.
- Add your documents, recognize them, edit the order of sections, and start the training by selecting Train on the shortcut menu. Alternatively, you can either use the Ctrl + F7 key combination or click the Train Batch button on the toolbar.
- All the fields described by the Document Definition should be marked up in the training documents.
- It is recommended to have between 100 and 500 documents in each training batch. This number of documents will enable the program to select the best parameters for your NLP model without slowing down the training process.
- For a variant with an existing training batch, the NLP model created for that particular batch will be used.
- For all other variants, the NLP model created for the generic training batch will be used.
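The fallback between variant-specific and generic models can be pictured as a simple lookup. The sketch below is illustrative only: the dictionary layout, the `GENERIC` key, and the function name are hypothetical and do not reflect how the product stores its models.

```python
GENERIC = "generic"  # hypothetical key for the generic training batch


def pick_model(models_by_batch: dict, variant: str) -> str:
    """Return the model for a variant, falling back to the generic one.

    models_by_batch maps a variant name (or GENERIC) to the model
    trained on that batch. The layout is an assumption.
    """
    # A variant with its own training batch uses its own model;
    # every other variant uses the generic batch's model.
    return models_by_batch.get(variant, models_by_batch[GENERIC])


models = {GENERIC: "generic-model", "Invoice_DE": "invoice-de-model"}
chosen_for_de = pick_model(models, "Invoice_DE")
chosen_for_fr = pick_model(models, "Invoice_FR")
```

Here `chosen_for_de` is the variant-specific model, while `chosen_for_fr` falls back to the generic one because no batch exists for that variant.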
- Maximum documents in each training batch
If the maximum number of documents is reached, any new documents added into a training batch will replace the old documents.
- Maximum percentage of replaced documents
Indicates the percentage of old documents that can be replaced with new ones during one training session. Documents that have been sent to the training stage but were not included in the batch will not be used to train the new NLP model.
- Start training if batch contains more than __ new documents or more than __ % of new documents
Training will start when at least one of the following is true: the number of new documents added into a training batch is greater than the specified value; the percentage of new documents relative to the total number of documents in a batch is equal to or greater than the specified value. Otherwise, training will not start, and an entry will be added into the background task log saying that there are not enough new documents to start training.
- Percentage of documents to be used for training
Specifies how the documents in a batch are divided between the For training and For testing statuses. For example, if you set the percentage of For training documents to 70%, the remaining 30% will be marked For testing.
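The start condition and the training/testing split described above can be sketched as follows. This is a minimal illustration: the function and parameter names are hypothetical, and the rounding rule for the split is an assumption, since the text does not say how fractional documents are handled.

```python
def should_start_training(new_docs: int, total_docs: int,
                          min_new_docs: int, min_new_percent: float) -> bool:
    """Return True if at least one of the two start conditions holds."""
    if total_docs <= 0:
        return False  # an empty batch cannot trigger training
    new_percent = 100.0 * new_docs / total_docs
    # Count condition is strict ("greater than"); percentage condition
    # is inclusive ("equal to or greater than"), as described above.
    return new_docs > min_new_docs or new_percent >= min_new_percent


def split_counts(total_docs: int, training_percent: float) -> tuple:
    """Split a batch into (for_training, for_testing) document counts.

    Rounding to the nearest whole document is an assumption.
    """
    for_training = round(total_docs * training_percent / 100)
    return for_training, total_docs - for_training


start = should_start_training(new_docs=5, total_docs=100,
                              min_new_docs=10, min_new_percent=5)
train_count, test_count = split_counts(200, 70)
```

In this example, training starts because 5 new documents out of 100 meets the 5% threshold even though the count threshold of 10 is not exceeded, and a 200-document batch at 70% yields 140 documents for training and 60 for testing.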
- Information about the training batch settings.
- Information about both the new and the old NLP models.
- Training time.
- The version of the NLP component used to train the NLP model.
- Document and field training statistics.
- Information about how recent the exported data is.
If the isActual parameter is false, the batch was modified after the training and creation of a new NLP model: documents may have been added or removed, document markup might have changed, etc. For up-to-date statistics, training should be launched again.
