To train and test a skill, you need a set of labeled documents, that is, documents in which the locations of the fields and their data types are explicitly indicated. The most straightforward way to get such a set is to label documents manually, but this can be time-consuming, especially if you intend to use Deep Learning, which requires large sets of labeled documents. To save time and effort, ABBYY provides several ways to reuse labeled documents from other skills or from manually reviewed processing results.

Labeling Documents Manually

Label each uploaded document by marking the locations of all the fields and specifying the types of data they are expected to contain. To ensure reliable training results, follow these guidelines.

Importing Labeled Documents From a Folder

Labeled documents can also be obtained from the following sources:
  • Skill training sets
  • Manually reviewed processing results
  • FlexiCapture
In each case, you first export the labeled documents to a folder. You can then import the documents with their labeling from that folder and use them to train your skill.

Skill Training Sets

When publishing a skill for use in production, you will usually remove the training set, leaving only a few sample documents in the published version. You also have the option of exporting your training set to a folder if you intend to use the same training set for training a new version of your skill. To export documents and their labeling to a folder, click the more icon next to the name of the document set and select Export Set with Labeling. The destination folder will contain the following files and subfolders:
  • documentdefinition.json.
  • skillsettings.json.
  • A <Document name> subfolder containing document images, documentinfo.json, and labeling.json files for each document.
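
Before importing an exported set elsewhere, it can be useful to confirm that the folder still has the layout listed above. The following is a minimal sketch, not an ABBYY tool: the function name check_exported_set is hypothetical, and it assumes that documentinfo.json and labeling.json sit directly inside each document subfolder. It does not inspect the document images or the contents of the JSON files.

```python
from pathlib import Path

# File names taken from the export layout described above.
ROOT_FILES = ("documentdefinition.json", "skillsettings.json")
# Assumption: these two files sit directly inside each document subfolder.
DOCUMENT_FILES = ("documentinfo.json", "labeling.json")


def check_exported_set(root):
    """Return a list of problems found in an exported training set folder.

    An empty list means every expected file was found.
    """
    root = Path(root)
    if not root.is_dir():
        return [f"{root} is not a folder"]

    problems = []
    for name in ROOT_FILES:
        if not (root / name).is_file():
            problems.append(f"missing {name}")

    # Every subfolder of the export folder is treated as one document.
    document_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    if not document_dirs:
        problems.append("no document subfolders found")

    for doc_dir in document_dirs:
        for name in DOCUMENT_FILES:
            if not (doc_dir / name).is_file():
                problems.append(f"{doc_dir.name}: missing {name}")
    return problems
```

Calling check_exported_set with the path of the destination folder returns an empty list when nothing is missing, so the result can be printed or used to stop a script before the import step.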

Manually Reviewed Processing Results

When processing results are corrected by manual reviewers, a set of labeled documents is created. To reuse these labeled documents, set up two exports: export of the field data to JSON with the Values, metadata, and field structure for each document option enabled, and export of the document images to any image format. The destination folder will contain a separate subfolder for each transaction. Each <Transaction ID> subfolder will contain the following:
  • The <Applied skill name>.json file with the field data.
  • Exported images, depending on the chosen format: <Applied skill name>.pdf, <Applied skill name>.tiff, or <Applied skill name>/Pages subfolder with page_*.jpg files for each page.
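
The per-transaction layout above can be checked with a short script before the folder is imported. This is a sketch under stated assumptions, not part of the product: the function names find_images and check_transaction are hypothetical, the skill name is passed in by the caller, and the page images are assumed to live at <Applied skill name>/Pages/page_*.jpg inside the transaction subfolder.

```python
from pathlib import Path


def find_images(transaction_dir, skill_name):
    """Return exported image files for one transaction, in any supported form."""
    t = Path(transaction_dir)
    # Single-file exports: <Applied skill name>.pdf or <Applied skill name>.tiff
    found = [
        p
        for p in (t / f"{skill_name}.pdf", t / f"{skill_name}.tiff")
        if p.is_file()
    ]
    # Per-page export: <Applied skill name>/Pages/page_*.jpg (assumed location)
    pages_dir = t / skill_name / "Pages"
    if pages_dir.is_dir():
        found.extend(sorted(pages_dir.glob("page_*.jpg")))
    return found


def check_transaction(transaction_dir, skill_name):
    """Return a list of problems for one <Transaction ID> subfolder."""
    t = Path(transaction_dir)
    problems = []
    if not (t / f"{skill_name}.json").is_file():
        problems.append(f"missing {skill_name}.json")
    if not find_images(t, skill_name):
        problems.append("no exported images found")
    return problems
```

Running check_transaction over each subfolder of the destination folder gives a quick list of transactions that lack either the field data or the images.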

FlexiCapture

You can reuse documents that were labeled earlier in FlexiCapture. For details, see Importing labeled documents from FlexiCapture.

How to Import

To import labeled documents from the folder created during export, select the All Documents set, click the drop-down menu next to the Upload button, and then select the Import Labeled Documents From Folder… option. Next, select the folder you created earlier.
Note: Don’t make any changes to the folder that was created during export. If you change the subfolder structure or rename any of the files, the import procedure may encounter an error.

How Duplicates Are Treated

If any of the imported documents has the same name as an existing document, Advanced Designer will ask you whether you want to update the labeling of the existing document or import the duplicate as a new document.
If you select Update Labeling:
  • In the case of identically named fields, their location and settings in the existing document will be overwritten with those in the imported document.
  • Any fields present in the imported document but absent in the existing document will be added to the existing document.
If you select Import As New Documents, duplicates will be renamed and imported with their labeling intact.
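
The effect of Update Labeling on a single document can be modeled as a merge keyed by field name. The sketch below only illustrates the two rules stated above and is not Advanced Designer's implementation: the function update_labeling and the dictionary layout (field name mapped to its location and settings) are hypothetical. It also assumes that fields present only in the existing document are left unchanged, which the description above does not state explicitly.

```python
import copy


def update_labeling(existing, imported):
    """Merge imported field labeling into an existing document's labeling.

    Both arguments map a field name to a dict describing that field
    (hypothetical layout, for illustration only).
    """
    # Start from the existing labeling. Assumption: fields that appear only
    # in the existing document are kept as they are.
    merged = copy.deepcopy(existing)
    for name, field in imported.items():
        # Identically named field: the imported location and settings
        # overwrite the existing ones.
        # Field absent from the existing document: it is added.
        merged[name] = copy.deepcopy(field)
    return merged


existing = {
    "Total": {"region": [10, 20, 110, 40], "type": "Money"},
    "Date": {"region": [10, 60, 110, 80], "type": "Date"},
}
imported = {
    "Total": {"region": [12, 22, 115, 42], "type": "Money"},
    "Vendor": {"region": [10, 100, 200, 120], "type": "Text"},
}
merged = update_labeling(existing, imported)
```

In this example, merged contains the imported version of Total, the newly added Vendor field, and the untouched Date field.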