Label a representative set of documents to train and test a Document skill. The guidelines below cover structured and semi-structured documents.
Structured documents
Structured documents always include the exact same type of information in the exact same locations. Pre-formatted forms are one example. You only need to label a few sample documents for training, because the layout doesn’t vary. Use the following guidelines when labeling structured documents:
- Accurately specify the region of each field; field values alone are not enough for training.
- To mark out the region of a field, don’t click on its value; mark out the entire placeholder instead.
- If a field contains no value, mark out the empty placeholder.
- If a field consists of multiple parts, hold down the Shift key to add the parts. All parts must be on the same page.
- If a fixed form contains a table, mark out all the rows, including any empty ones.
- If you add a field after labeling has already started, label the new field on every document in the training set where it occurs.
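
Two of the rules above (every field needs a region, even when empty, and all parts of a field must be on the same page) can be expressed as a small pre-training check. The sketch below is purely illustrative: `Region`, `FieldLabel`, and `check_structured_label` are hypothetical names invented for this example, not part of any Vantage API, and it assumes the labels are available as plain Python objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Region:
    """Hypothetical rectangle marking one part of a field on a page."""
    page: int
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class FieldLabel:
    """Hypothetical label record: a field name, its value, and its regions."""
    name: str
    value: Optional[str]  # None or "" when the placeholder is empty
    regions: List[Region] = field(default_factory=list)


def check_structured_label(label: FieldLabel) -> List[str]:
    """Return a list of rule violations for one field on a structured document."""
    problems = []
    # Structured documents: the placeholder is marked out even if it has no value.
    if not label.regions:
        problems.append(f"{label.name}: no region; mark out the placeholder even if it is empty")
    # Multi-part fields: every part must sit on the same page.
    pages = {r.page for r in label.regions}
    if len(pages) > 1:
        problems.append(f"{label.name}: parts span pages {sorted(pages)}; all parts must be on the same page")
    return problems
```

An empty list means the field passes both checks; an empty value alone never counts as a problem here, because on structured documents the empty placeholder is still labeled.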
Semi-structured documents
Semi-structured documents generally contain the same or similar types of information, but the location, size, and number of fields may vary from document to document. Examples include bills, payment orders, and invoices. Use the following guidelines when labeling semi-structured documents:
- Accurately specify the region of each field; field values alone are not enough for training.
- To mark out the region of a field, click on its value — the word or words it contains — and the region is created automatically.
- If a field contains no value, don’t create a region for it.
- Don’t mark out parts of words — Vantage can only learn on whole words.
- If a field consists of multiple parts, hold down the Shift key to add the parts. All parts must be on the same page.
- For repeating data, analyze your documents first and choose the right structure:

  | If your repeating data looks like… | Use |
  | --- | --- |
  | A table with a common header and values that don’t have adjacent keywords | A Table field |
  | Less-structured data where values have keywords next to them | A Group with Allow multiple items enabled |
  | Different layouts across documents | Pick the option that fits the majority of your documents |

- To label a table, mark out the cells in the first row one by one; Vantage auto-creates the columns. Then click Continue table from this row, and verify that the full table is labeled correctly.
- Don’t place a field region inside another field’s region — whether the parent is an individual field (such as an address) or a table cell. To extract data from a large text fragment, use Advanced Designer.
- If you add a field after labeling has already started, review all documents and label the new field on every document where it occurs.
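
The rule against nesting one field region inside another, and the advice to pick the repeating-data structure that fits the majority of documents, can both be sketched in a few lines. This is a minimal illustration under stated assumptions: `Box`, `contains`, `find_nested_regions`, and `pick_structure` are hypothetical helpers written for this example, not Vantage functions, and regions are assumed to be axis-aligned rectangles in page coordinates.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical labeled region: a named rectangle on a page."""
    name: str
    page: int
    left: float
    top: float
    right: float
    bottom: float


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies entirely within outer on the same page."""
    return (
        outer.page == inner.page
        and outer.left <= inner.left
        and outer.top <= inner.top
        and outer.right >= inner.right
        and outer.bottom >= inner.bottom
    )


def find_nested_regions(boxes: List[Box]) -> List[Tuple[str, str]]:
    """List (parent, child) name pairs where one region sits inside another."""
    return [
        (outer.name, inner.name)
        for outer in boxes
        for inner in boxes
        if outer is not inner and contains(outer, inner)
    ]


def pick_structure(per_document_choice: List[str]) -> str:
    """Return the structure that suits the largest number of documents."""
    return Counter(per_document_choice).most_common(1)[0][0]
```

For example, an `Address` region that fully encloses a `City` region would be reported as the pair `("Address", "City")`, signaling that one of the two labels should be removed or redrawn.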
Related topics
Adding fields
Mark fields in the Editor tab and configure field properties by type.
Labeling unstructured documents
Advanced Designer guide for labeling unstructured documents.
Set up a Document skill
Create, train, and publish a Document skill, including structured forms and Online learning.
Training and testing a Document skill
Advanced Designer guide for training, testing, and measuring Document skill quality.
