Deep Learning activity for NLP - ABBYY Documentation

The Deep Learning for NLP activity is designed to train the skill to extract fields from unstructured documents using Natural Language Processing (NLP) technology. A Deep Learning activity can only be set up for a field that has already been extracted by another activity. For example, you can extract a text paragraph using a Segmentation activity and then set up a Deep Learning activity to extract fields from that paragraph.

This activity only supports fields of type Text.

Training requirements

Because Deep Learning activities are trained on reference labeling, the locations of the fields must be specified correctly on all document images. In general, the more labeled documents the training set contains, the higher the quality of field extraction. The recommended number of sample documents is as follows:

For high-variability documents, at least 150 sample documents are required (2-3 sample documents per variant).
For low-variability documents, you can start training with a single sample document, but at least 2-3 sample documents per variant are required.

Use separate document sets

You can use a separate document set to train your Deep Learning activity. To do so, select the Deep Learning activity from the drop-down list next to the skill name. Then, in the drop-down list to the left of the Upload button, select the necessary document set or click Create Set… to create a new one. You can upload, delete, and rotate documents on this tab as described in the Documents section.

Supported languages

Supported languages: English, French, German, Japanese, Russian, Spanish, Italian, Portuguese (Standard), and Dutch.

Set up a Deep Learning activity

Add the activity

On the Activities tab, add a Deep Learning activity to the document processing flow. The Deep Learning activity should be placed after the activity that extracts the field it uses as its source.
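The placement rule can be expressed as a simple check over the ordered list of activities in the flow. The sketch below is a hypothetical model, not an ABBYY API: the `Activity` class, the `source_is_extracted_earlier` function, and the field names "Paragraph" and "Amount" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Activity:
    """Hypothetical model of one activity in a document processing flow."""
    name: str
    outputs: Set[str] = field(default_factory=set)  # fields this activity extracts
    source_field: Optional[str] = None  # set only for Deep Learning activities


def source_is_extracted_earlier(flow: List[Activity], dl_name: str) -> bool:
    """Return True if an activity placed before `dl_name` extracts its source field."""
    names = [a.name for a in flow]
    dl_index = names.index(dl_name)  # raises ValueError if the activity is not in the flow
    source = flow[dl_index].source_field
    if source is None:
        return False
    return any(source in earlier.outputs for earlier in flow[:dl_index])


segmentation = Activity("Segmentation", outputs={"Paragraph"})
deep_learning = Activity("Deep Learning", outputs={"Amount"}, source_field="Paragraph")

print(source_is_extracted_earlier([segmentation, deep_learning], "Deep Learning"))  # True
print(source_is_extracted_earlier([deep_learning, segmentation], "Deep Learning"))  # False
```

In the second call the Deep Learning activity comes first, so no earlier activity can supply its source field.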

Select the source field

Use the Field drop-down list in the Activity Properties pane to select the source field corresponding to the unstructured text fragment from which fields should be extracted.

Select output fields

Select the fields that should be extracted from the source field. You can select fields that are on the same nesting level as the source field or one level below it.
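Combined with the requirement that the activity supports only fields of type Text, the selection rule can be sketched as a validation function. This is a hypothetical illustration: `FieldDef`, `output_field_problems`, the integer `level` (nesting depth), and the non-Text type "Number" are assumptions, and the check compares only nesting depth, not whether a field actually sits under the source field.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical model of a skill field."""
    name: str
    field_type: str  # e.g. "Text"
    level: int       # nesting depth; 0 means a top-level field


def output_field_problems(source: FieldDef, outputs: List[FieldDef]) -> List[str]:
    """List the reasons why selected output fields cannot be used by the activity."""
    problems = []
    for f in outputs:
        if f.field_type != "Text":
            problems.append(f"{f.name}: only fields of type Text are supported")
        elif f.level not in (source.level, source.level + 1):
            problems.append(f"{f.name}: must be on the source field's level or one level below")
    return problems


paragraph = FieldDef("Paragraph", "Text", level=0)
selected = [
    FieldDef("Amount", "Text", level=1),        # one level below: allowed
    FieldDef("Counterparty", "Text", level=0),  # same level: allowed
    FieldDef("Total", "Number", level=1),       # not Text: rejected
    FieldDef("Street", "Text", level=2),        # two levels below: rejected
]
for problem in output_field_problems(paragraph, selected):
    print(problem)
```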

Label documents

Click Activity Editor and go to the Fields tab to label your documents by specifying the regions of the fields to be extracted from the source field. Labeling in the Activity Editor works the same way as regular document labeling, with one exception: the fields to be extracted by the Deep Learning activity must be located within the region of the source field.

Use the following guidelines to determine the size of the document set:

You can start training a Deep Learning activity for NLP with a single sample document, but at least 2-3 sample documents per variant are required.
If the training set contains fewer than 150 documents, you can start training your activity, but Advanced Designer will display a warning saying “We recommend adding at least 150 documents”.
If the training set contains between 150 and 10,000 documents, you can start training your activity right away. This is the recommended number of documents to have in your training set.
If the training set contains more than 10,000 documents, Advanced Designer will display a warning saying that the skill may become unstable.
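The size thresholds above can be summarized as a small classification function. This is a sketch, not Advanced Designer's implementation: `training_set_status` is a hypothetical name, the returned strings paraphrase the warnings, and it assumes that exactly 150 and exactly 10,000 documents both fall in the recommended range and that an empty set cannot be trained.

```python
def training_set_status(num_docs: int) -> str:
    """Classify a training-set size using the documented thresholds.

    Assumption: 150 and 10,000 both fall in the recommended range.
    """
    if num_docs < 1:
        return "cannot start"  # assumption: at least one labeled document is needed
    if num_docs < 150:
        return "warning: We recommend adding at least 150 documents"
    if num_docs <= 10_000:
        return "recommended"
    return "warning: the skill may become unstable"


for n in (1, 150, 10_000, 10_001):
    print(n, training_set_status(n))
```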

Train the activity

Click Train Activity to train the activity.

Review results

Once the activity has been trained, activity testing starts automatically. After testing completes, go to the Results tab and analyze the field extraction results for your activity. The statistics displayed there are the same as the general skill statistics displayed on the skill's Results tab. If necessary, adjust your labeling and train the activity again.

The activity can only be trained and tested on documents with confirmed labeling. A document's labeling is unconfirmed if its reference labeling was generated automatically from the predicted labeling, unless you copied the predicted labeling to the reference labeling using the corresponding option in the document's context menu. You can check the labeling status of each document on the Documents tab. To confirm the labeling of a document, review it on the Fields tab.
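The confirmation rules can be modeled as a filter that decides which documents take part in training and testing. This is a simplified, hypothetical model: the `Document` fields and both functions are illustrative names, and it assumes that manually labeled documents count as confirmed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    """Hypothetical, simplified view of a document's labeling state."""
    name: str
    reference_auto_generated: bool         # reference labeling generated automatically from predicted labeling
    copied_via_context_menu: bool = False  # predicted labeling copied to reference via the context menu
    reviewed_on_fields_tab: bool = False   # user reviewed the document on the Fields tab


def has_confirmed_labeling(doc: Document) -> bool:
    if doc.reviewed_on_fields_tab:
        return True  # reviewing on the Fields tab confirms the labeling
    return not doc.reference_auto_generated or doc.copied_via_context_menu


def usable_for_training(docs: List[Document]) -> List[str]:
    """Return names of documents that can be used to train and test the activity."""
    return [d.name for d in docs if has_confirmed_labeling(d)]


docs = [
    Document("invoice_01", reference_auto_generated=False),  # labeled manually (assumed confirmed)
    Document("invoice_02", reference_auto_generated=True),   # unconfirmed
    Document("invoice_03", reference_auto_generated=True, copied_via_context_menu=True),
    Document("invoice_04", reference_auto_generated=True, reviewed_on_fields_tab=True),
]
print(usable_for_training(docs))  # ['invoice_01', 'invoice_03', 'invoice_04']
```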