- On the Activities tab, add a Segmentation activity to the document processing flow. The Segmentation activity must precede the activity that extracts the fields from the text segments.
- On the Activity Properties pane, select all the fields that correspond to the segments to be extracted.
Note: Only fields of type Text that have data type set to Text are supported.
- Click Activity Editor. Make further modifications to the document labeling on the Fields tab if required.
- Click Train Activity. Training can be performed in Fast or Thorough mode.
- Fast mode is selected by default. This mode will work even on small document sets, and the activity will be trained quickly.
- If you are not satisfied with the results obtained in Fast mode, consider switching to Thorough mode, which trains a Deep Learning model. This mode requires more documents in the training set and takes longer to train, but it can perform better on a wide variety of documents. The document set must contain at least 50 labeled documents, but we recommend having at least 150. To switch to Thorough mode, use the drop-down menu next to the Train Activity button.
- You may want to test both modes and choose the one that works best for your documents.
Note: Thorough mode will only work with English-language documents.
- Once the activity has been trained, activity testing starts automatically. After testing has completed, navigate to the Results tab and analyze the field extraction results for your activity. The statistics displayed on the activity's Results tab are identical to the general skill statistics displayed on the skill's Results tab. If required, change your labeling and train the activity again.
Note: The activity can only be trained and tested using documents with confirmed labeling. Documents have unconfirmed labeling if the reference labeling was generated automatically based on the predicted labeling, unless you copy predicted labeling to reference using the corresponding option in the document context menu. You can check the labeling status for each document on the Documents tab. To confirm labeling for a document, you should review it on the Fields tab.
Supported languages: English, Russian, German, French, Spanish, Italian, Portuguese (Standard), Japanese, and Dutch.
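The Thorough mode requirements described above (at least 50 labeled documents, 150 recommended, English-language documents only) can be summarized as a small pre-check. The following Python snippet is only an illustrative sketch and is not part of the product: the function name `thorough_mode_status` and its return values are hypothetical, and it assumes that only documents with confirmed labeling count toward the thresholds.

```python
# Illustrative sketch only; not a product API.
# Thresholds are taken from the guidance above.
MIN_THOROUGH_DOCS = 50           # hard minimum for Thorough mode
RECOMMENDED_THOROUGH_DOCS = 150  # recommended minimum for Thorough mode


def thorough_mode_status(confirmed_docs: int, language: str) -> str:
    """Return a hypothetical status string for Thorough mode.

    confirmed_docs: number of documents with confirmed labeling
                    (assumption: only these count toward the thresholds).
    language:       language of the documents, e.g. "English".
    """
    # Thorough mode only works with English-language documents.
    if language.strip().lower() != "english":
        return "unavailable"
    # Fewer than 50 labeled documents: stay in Fast mode.
    if confirmed_docs < MIN_THOROUGH_DOCS:
        return "unavailable"
    # Between 50 and 149: allowed, but below the recommended size.
    if confirmed_docs < RECOMMENDED_THOROUGH_DOCS:
        return "below_recommended"
    return "ok"


if __name__ == "__main__":
    for count, lang in [(30, "English"), (80, "English"), (200, "German")]:
        print(count, lang, thorough_mode_status(count, lang))
```

Even when the check returns "ok", Fast mode remains the default, and it is still worth testing both modes and keeping the one that works best for your documents.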
