Label a representative set of documents to train and test a Document skill. The guidelines below cover structured and semi-structured documents.
For unstructured documents, see labeling unstructured documents.

Structured documents

Structured documents always contain the same types of information in the same locations. Pre-formatted forms are a typical example. Because the layout doesn’t vary, you only need to label a few sample documents for training. Use the following guidelines when labeling structured documents:
  • Accurately specify the region of each field — field values alone are not enough for training.
  • To mark out the region of a field, don’t click on its value; mark out the entire placeholder instead.
  • If a field contains no value, mark out the empty placeholder.
  • If a field consists of multiple parts, hold down the Shift key to add the parts. All parts must be on the same page.
  • If a fixed form contains a table, mark out all the rows, including any empty ones.
  • If you add a field after labeling has already started, label the new field on every document in the training set where it occurs.
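
The same-page rule for multi-part fields can be checked mechanically if labels are available in some intermediate form. The following is a minimal sketch under that assumption: the `Region` record and both helper functions are hypothetical names used for illustration only, not part of Vantage or its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical labeled region: a page index plus a bounding box."""
    page: int
    left: float
    top: float
    right: float
    bottom: float


def parts_on_same_page(parts: list[Region]) -> bool:
    """Return True if every part of a multi-part field is on one page."""
    return len({part.page for part in parts}) <= 1


def fields_spanning_pages(labels: dict[str, list[Region]]) -> list[str]:
    """List the field names whose parts are spread over several pages."""
    return sorted(
        name for name, parts in labels.items()
        if not parts_on_same_page(parts)
    )
```

Running `fields_spanning_pages` over one document's labels returns the fields that would need relabeling so that all their parts sit on a single page.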

Semi-structured documents

Semi-structured documents generally contain the same or similar types of information, but the location, size, and number of fields may vary from document to document. Examples include bills, payment orders, and invoices. Use the following guidelines when labeling semi-structured documents:
  • Accurately specify the region of each field — field values alone are not enough for training.
  • To mark out the region of a field, click on its value — the word or words it contains — and the region is created automatically.
  • If a field contains no value, don’t create a region for it.
  • Don’t mark out parts of words. Vantage can only learn from whole words.
  • If a field consists of multiple parts, hold down the Shift key to add the parts. All parts must be on the same page.
  • For repeating data, analyze your documents first and choose the right structure:
    If your repeating data looks like… | Use
    A table with a common header and values that don’t have adjacent keywords | A Table field
    Less-structured data where values have keywords next to them | A Group with Allow multiple items enabled
    Different layouts across documents | Pick the option that fits the majority of your documents
  • To label a table, mark out the cells in the first row one by one — Vantage auto-creates the columns. Then click Continue table from this row, and verify that the full table is labeled correctly.
  • Don’t place a field region inside another field’s region — whether the parent is an individual field (such as an address) or a table cell. To extract data from a large text fragment, use Advanced Designer.
  • If you add a field after labeling has already started, review all documents and label the new field on every document where it occurs.
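
The repeating-data guidance above amounts to a small decision rule followed by a majority vote. The sketch below illustrates that reasoning only. `RepeatingLayout`, `structure_for`, and `structure_for_set` are hypothetical names, and treating "less-structured" as "no common header" is an assumption made for this example. Layouts the guidelines don't cover return `None` instead of a guess.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

TABLE = "Table field"
GROUP = "Group with Allow multiple items enabled"


@dataclass(frozen=True)
class RepeatingLayout:
    """Hypothetical summary of how repeating data looks in one document."""
    has_common_header: bool
    values_have_adjacent_keywords: bool


def structure_for(layout: RepeatingLayout) -> Optional[str]:
    """Map one document's layout to the structure the guidelines suggest."""
    if layout.has_common_header and not layout.values_have_adjacent_keywords:
        return TABLE
    if layout.values_have_adjacent_keywords and not layout.has_common_header:
        return GROUP
    return None  # not covered by the guidelines; decide manually


def structure_for_set(layouts: list[RepeatingLayout]) -> Optional[str]:
    """Pick the structure that fits the majority of the documents."""
    votes = Counter(
        choice for choice in map(structure_for, layouts) if choice is not None
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For a set where three documents show a headed table and one shows keyword-value pairs, `structure_for_set` returns the Table field option, matching the "majority of your documents" rule.
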
If tables are large and document pages look similar, you can delete the similar pages and label only the first page, the last page, and a few in between.
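
As a rough illustration of "the first page, the last page, and a few in between", the sketch below picks evenly spaced page numbers to keep. The function name `pages_to_keep` and the default of two in-between pages are assumptions made for this example. The guideline itself doesn't prescribe a number, and the code only selects page numbers without deleting anything.

```python
def pages_to_keep(page_count: int, in_between: int = 2) -> list[int]:
    """Pick the first page, the last page, and a few evenly spaced pages
    in between. Page numbers are 1-based."""
    if page_count <= 0:
        return []
    # Short documents: nothing to trim, keep every page.
    if page_count <= in_between + 2:
        return list(range(1, page_count + 1))
    # Spread the in-between pages evenly across the document.
    step = (page_count - 1) / (in_between + 1)
    picked = {1, page_count}
    picked.update(round(1 + step * i) for i in range(1, in_between + 1))
    return sorted(picked)
```

For a 10-page document with the default setting, this keeps pages 1, 4, 7, and 10.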

Adding fields

Mark fields in the Editor tab and configure field properties by type.

Labeling unstructured documents

Advanced Designer guide for labeling unstructured documents.

Set up a Document skill

Create, train, and publish a Document skill, including structured forms and Online learning.

Training and testing a Document skill

Advanced Designer guide for training, testing, and measuring Document skill quality.