A Document skill extracts field values from structured and semi-structured documents of a single type. Documents of the same type share the same fields, validation rules, and structure — for example, invoices, agreements, and shipping lists are each a single document type.
A Document skill processes only one file per transaction. To process multiple files in a single transaction, use the Extract activity of a Process skill.

Structured vs. semi-structured documents

Type | Field location | Examples | Where to build
Structured | Fixed on every instance | Questionnaires, application forms, tax forms | Vantage. Use Advanced Designer when you need to combine structured-document processing with other Vantage technologies.
Semi-structured | Varies in labeling, number, and placement per instance | Invoices, agreements, shipping lists | Vantage or Advanced Designer.

Training a Document skill

To start training a Document skill, label the fields on one document. As you train, Vantage automatically suggests field locations to speed up the labeling process.

Document type variants

Documents of a single type almost always have identical fields, validation rules, and structure, but variants of the same type can differ slightly — for example, based on the year the document was issued. A single Document skill can handle any number of variants; the right training approach depends on how many variants you need to cover.

Choosing an activity by scale

For structured forms (up to 10 variants), use the Vantage Document skill. Treat any variants beyond that as separate document types.

For semi-structured documents, the recommended approach depends on the number of variants:

Variants in your document set | Recommended approach | Expected accuracy
Hundreds | Online learning in Vantage | Near-flawless extraction
Thousands | Deep Learning activity | ~80–90%, depending on document complexity
A subset of essential variants | Fast Learning and/or Extraction Rules activities | High accuracy on complex documents

The Deep Learning, Fast Learning, and Extraction Rules activities are available only in Advanced Designer. To use them, open your Document skill in Advanced Designer. Once published, the skill can still be referenced from Skill Designer and Process skills.
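
The guidance in this section can be read as a small decision rule. The Python sketch below is only an illustration of that reading and is not part of Vantage. The function name `recommend_approach`, the `essential_subset_only` flag, and the numeric cutoff of 1000 between "hundreds" and "thousands" are all assumptions made for the example. It also assumes that semi-structured sets with fewer than a hundred variants fall under Online learning.

```python
# Illustrative only: these cutoffs are assumptions, not product limits.
STRUCTURED_VARIANT_LIMIT = 10   # "up to 10 variants" for structured forms
THOUSANDS_THRESHOLD = 1000      # assumed boundary between "hundreds" and "thousands"


def recommend_approach(structured: bool, variant_count: int,
                       essential_subset_only: bool = False) -> str:
    """Map a document set to the approach suggested in this section.

    structured            -- True for structured forms, False for semi-structured
    variant_count         -- number of variants in the document set
    essential_subset_only -- hypothetical flag: only a subset of essential
                             variants must be covered with high accuracy
    """
    if variant_count < 1:
        raise ValueError("variant_count must be at least 1")

    if structured:
        if variant_count <= STRUCTURED_VARIANT_LIMIT:
            return "Vantage Document skill"
        return "Treat additional variants as separate document types"

    # Semi-structured documents
    if essential_subset_only:
        return "Fast Learning and/or Extraction Rules activities (Advanced Designer)"
    if variant_count >= THOUSANDS_THRESHOLD:
        return "Deep Learning activity (Advanced Designer)"
    return "Online learning in Vantage"


if __name__ == "__main__":
    print(recommend_approach(structured=False, variant_count=300))
    print(recommend_approach(structured=False, variant_count=5000))
```

In practice the boundaries are soft. A set with many variants may still be better served by Fast Learning or Extraction Rules when only a few of those variants matter.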

Training and testing recommendations

  • Use a representative training set. Include at least 2–3 documents per variant. Even a single sample per variant is better than none. When the set doesn’t cover every variant, use the Deep Learning activity — it generalizes from image patterns, spatial structure, field contents, and surrounding labels, and can process variants it wasn’t trained on.
  • Test with production-like distributions. Use a random sample drawn from your real document flow so that each variant appears in the test set at roughly the same frequency it appears in production. This keeps your accuracy estimate valid.
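
Both recommendations above can be checked before training starts. The Python sketch below assumes you already know which variant each document belongs to. The `(doc_id, variant)` tuples and all three helper names are hypothetical and do not come from Vantage. The first helper flags variants with fewer than two training documents. The other two draw a simple random test sample from the production flow and report variant shares, so the sample can be compared against production.

```python
import random
from collections import Counter


def undercovered_variants(training_variants, known_variants, minimum=2):
    """Return {variant: count} for known variants with fewer than
    `minimum` documents in the training set (including zero)."""
    counts = Counter(training_variants)
    return {v: counts[v] for v in known_variants if counts[v] < minimum}


def draw_test_sample(production_docs, size, seed=None):
    """Draw a simple random sample without replacement, so each variant
    tends to appear at about its production frequency."""
    rng = random.Random(seed)
    return rng.sample(production_docs, min(size, len(production_docs)))


def variant_shares(docs):
    """Return the fraction of documents belonging to each variant.
    `docs` is a list of (doc_id, variant) tuples."""
    counts = Counter(variant for _, variant in docs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {v: n / total for v, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical production flow: variant "b" is a quarter of all documents.
    production = [(i, "b" if i % 4 == 0 else "a") for i in range(1000)]
    sample = draw_test_sample(production, 200, seed=0)
    print("production shares:", variant_shares(production))
    print("sample shares:    ", variant_shares(sample))
    print("training gaps:    ",
          undercovered_variants(["a", "a", "b"], ["a", "b", "c"]))
```

A random sample will not match the production shares exactly. Large gaps suggest that the sample is too small or was not drawn at random.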

Next steps

Set up a Document skill

Create, train, and publish a Document skill, including structured forms and Online learning.

Adding fields

Mark fields in the Editor tab and configure field properties by type.

Labeling documents

Guidelines for labeling structured and semi-structured documents during training.

Analyze extracted data

Review field extraction statistics and correct reference labeling on the Result Review tab.