The Deep Learning for NLP activity trains the skill to extract fields from unstructured documents using Natural Language Processing (NLP) technology. A Deep Learning activity can only be set up for a field that has already been extracted by another activity. For example, you can extract a text paragraph using a Segmentation activity and then set up a Deep Learning activity to extract fields from that paragraph.
Note: This activity only supports fields of type Text.

Training Requirements

As Deep Learning activities are trained using reference labeling, it is important that the location of the fields be specified correctly on all document images. The greater the number of labeled documents in the training set, the higher the quality of field extraction. The recommended number of sample documents is as follows:
  • For high-variability documents: at least 300 sample documents (2-3 sample documents per variant) are required.
  • For low-variability documents: at least 50 sample documents are required.
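
The recommended minimums above can be checked before uploading a document set. The following is a minimal sketch only, not part of the product: the function name, the idea of tagging each document with a variant label, and the use of 2 as the per-variant floor (the lower end of "2-3 sample documents per variant") are all assumptions made for illustration.

```python
from collections import Counter

# Minimum set sizes taken from the recommendations above.
MIN_SAMPLES = {"high": 300, "low": 50}
# Assumption: use the lower bound of "2-3 sample documents per variant".
MIN_PER_VARIANT = 2


def check_training_set(variant_labels, variability):
    """Return a list of problems; an empty list means the minimums are met.

    variant_labels: one (hypothetical) variant label per sample document.
    variability: "high" or "low".
    """
    problems = []
    minimum = MIN_SAMPLES[variability]
    if len(variant_labels) < minimum:
        problems.append(
            f"need at least {minimum} documents, have {len(variant_labels)}"
        )
    if variability == "high":
        # The per-variant recommendation is only stated for high-variability sets.
        for variant, count in sorted(Counter(variant_labels).items()):
            if count < MIN_PER_VARIANT:
                problems.append(f"variant {variant!r} has only {count} document(s)")
    return problems
```

For example, a high-variability set of 300 documents in which one variant appears only once would be reported as having an under-represented variant, even though the overall count is sufficient.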

Using Separate Document Sets

You can use a separate document set to train your Deep Learning activity. To do so, select the Deep Learning activity from the drop-down list next to the skill name. Then, in the drop-down list to the left of the Upload button, select the necessary document set or click Create Set… to create a new one. You can upload, delete, and rotate documents on this tab as described in the Documents section.

Supported Languages

Supported languages: English, French, German, Japanese, Russian, Spanish, Italian, Portuguese (Standard), and Dutch.

Setting Up a Deep Learning Activity

To set up a Deep Learning activity, do the following:
  1. On the Activities tab, add a Deep Learning activity to the document processing flow. Place it after the activity that extracts the field your Deep Learning activity will use as its source.
  2. Use the Field drop-down list in the Activity Properties pane to select the source field corresponding to the unstructured text fragment from which fields should be extracted.
  3. Select the fields that should be extracted from the source field. You can select fields that are on the same nesting level as the source field or one level below it.
  4. Click Activity Editor and go to the Fields tab to label your documents by specifying the regions for the fields that should be extracted from the source field. The labeling process in the Activity Editor is identical to the regular document labeling process with one exception — the fields to be extracted by the Deep Learning activity should be located within the region of the source field.
The following guidelines will help you decide on the size of the document set:
  • If the training set contains fewer than 50 documents, you will need to upload and label more documents before you can start the training process.
  • If the training set contains between 50 and 150 documents, you will be able to start training your activity, but Advanced Designer will display a warning saying that you should label at least 150 documents to achieve good extraction quality.
  • If the training set contains between 150 and 10,000 documents, you will be able to start training your activity right away. This is the recommended number of documents to have in your training set.
  • If the training set contains more than 10,000 documents, Advanced Designer will display a warning saying that the skill may become unstable.
  5. Click Train Activity to train the activity.
  6. Once the activity has been trained, activity testing starts automatically. After testing has completed, navigate to the Results tab and analyze the field extraction results for your activity. The statistics shown for the activity are the same kind as the general statistics shown for the skill on its Results tab. If required, change your labeling and train the activity again.
The activity can only be trained and tested using documents with confirmed labeling. A document's labeling is unconfirmed if its reference labeling was generated automatically from the predicted labeling, unless you copied the predicted labeling to the reference labeling using the corresponding option in the document context menu. You can check the labeling status of each document on the Documents tab. To confirm the labeling for a document, review it on the Fields tab.
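
The document-set size guidelines and the confirmed-labeling rule can be combined into a quick readiness check. This is an illustrative sketch under stated assumptions, not how Advanced Designer works internally: the `Document` class and `training_readiness` function are hypothetical, it assumes that only documents with confirmed labeling count toward the thresholds, and it treats a set of exactly 150 documents as falling in the recommended range (the guidelines do not say which side that boundary belongs to).

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a document in the training set."""
    name: str
    labeling_confirmed: bool


def training_readiness(documents):
    """Classify a training set against the size guidelines.

    Returns a (status, message) tuple where status is
    "blocked", "warning", or "ok".
    """
    # Assumption: only documents with confirmed labeling count.
    confirmed = sum(1 for doc in documents if doc.labeling_confirmed)
    if confirmed < 50:
        return ("blocked", "upload and label more documents before training")
    if confirmed < 150:
        return ("warning", "label at least 150 documents for good extraction quality")
    if confirmed <= 10_000:
        return ("ok", "recommended training set size")
    return ("warning", "the skill may become unstable")


if __name__ == "__main__":
    docs = [Document(f"doc{i}", labeling_confirmed=(i % 2 == 0)) for i in range(200)]
    print(training_readiness(docs))
```

In the usage example, 200 documents are uploaded but only half have confirmed labeling, so the set is evaluated as 100 documents and lands in the warning range rather than the recommended one.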