A Document skill extracts field values from structured and semi-structured documents of a single type. Documents of the same type share the same fields, validation rules, and structure. For example, invoices, agreements, and shipping lists are each a single document type.
A Document skill processes only one file per transaction. To process multiple files in a single transaction, use the Extract activity of a Process skill.
Structured vs. semi-structured documents
| Type | Field location | Examples | Where to build |
|---|---|---|---|
| Structured | Fixed on every instance | Questionnaires, application forms, tax forms | Vantage. Use Advanced Designer when you need to combine structured-document processing with other Vantage technologies. |
| Semi-structured | Varies in labeling, number, and placement per instance | Invoices, agreements, shipping lists | Vantage or Advanced Designer. |
Training a Document skill
To start training a Document skill, label the fields on one document. As you train, Vantage automatically suggests field locations to speed up the labeling process.
Document type variants
Documents of a single type almost always have identical fields, validation rules, and structure, but variants of the same type can differ slightly, for example depending on the year the document was issued. A single Document skill can handle any number of variants; the right training approach depends on how many variants you need to cover.
Choosing an activity by scale
For structured forms with up to 10 variants, use the Vantage Document skill. If you have more variants than that, treat the additional ones as separate document types. For semi-structured documents, the recommended approach depends on the number of variants:
| Variants in your document set | Recommended approach | Expected accuracy |
|---|---|---|
| Hundreds | Online learning in Vantage | Near-flawless extraction |
| Thousands | Deep Learning activity | ~80–90%, depending on document complexity |
| A subset of essential variants | Fast Learning and/or Extraction Rules activities | High accuracy on complex documents |
The Deep Learning, Fast Learning, and Extraction Rules activities are available only in Advanced Designer. To use them, open your Document skill in Advanced Designer — the skill can still be referenced from Skill Designer and Process skills once published.
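The guidance above can be read as a simple decision rule. The Python sketch below encodes it for illustration only: `recommend_approach`, `Layout`, and the constants are hypothetical names, not part of Vantage or Advanced Designer. The documentation speaks of "hundreds" versus "thousands" of variants without giving an exact cut-off, so the 1,000-variant boundary (and routing every smaller semi-structured set to Online learning) is an assumption.

```python
from enum import Enum


class Layout(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"


# From the documentation: structured forms with up to 10 variants fit one Document skill.
MAX_STRUCTURED_VARIANTS = 10
# Assumption: the boundary between "hundreds" and "thousands" of variants.
# This is NOT an ABBYY-defined value.
THOUSANDS_THRESHOLD = 1000


def recommend_approach(layout: Layout, variant_count: int,
                       essential_subset_only: bool = False) -> str:
    """Hypothetical helper mapping a document set to the suggested approach."""
    if variant_count < 1:
        raise ValueError("variant_count must be at least 1")

    if layout is Layout.STRUCTURED:
        if variant_count <= MAX_STRUCTURED_VARIANTS:
            return "Vantage Document skill"
        # Variants beyond the limit are handled as separate document types.
        return ("Vantage Document skill; treat additional variants "
                "as separate document types")

    # Semi-structured documents: only a subset of essential variants matters.
    if essential_subset_only:
        return "Fast Learning and/or Extraction Rules activities (Advanced Designer)"
    # Semi-structured documents with thousands of variants.
    if variant_count >= THOUSANDS_THRESHOLD:
        return "Deep Learning activity (Advanced Designer)"
    # Assumption: anything below the threshold is treated as "hundreds".
    return "Online learning in Vantage"


if __name__ == "__main__":
    print(recommend_approach(Layout.SEMI_STRUCTURED, 5000))
```

Keeping the thresholds as named constants makes the assumed boundary easy to adjust once you know where your own document set sits.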
Training and testing recommendations
- Use a representative training set. Include at least 2–3 documents per variant. Even a single sample per variant is better than none. When the set doesn’t cover every variant, use the Deep Learning activity — it generalizes from image patterns, spatial structure, field contents, and surrounding labels, and can process variants it wasn’t trained on.
- Test with production-like distributions. Use a random sample drawn from your real document flow so that each variant appears in the test set at roughly the same frequency it appears in production. This keeps your accuracy estimate valid.
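Before training, it can help to check which variants your training set actually covers. The minimal Python sketch below assumes you have tagged each training document with a variant label (for example, its year of issue); `coverage_report` and the labels are hypothetical and not a Vantage feature. The threshold of 2 samples reflects the lower end of the 2–3 documents per variant recommended above.

```python
from collections import Counter

# Lower end of the recommended 2-3 training documents per variant.
MIN_SAMPLES_PER_VARIANT = 2


def coverage_report(training_variants, known_variants):
    """Summarize how well a training set covers the known variants.

    training_variants: one (hypothetical) variant label per training document.
    known_variants: every variant you expect to see in production.
    """
    counts = Counter(training_variants)
    # Variants with no training sample at all.
    missing = sorted(v for v in known_variants if counts[v] == 0)
    # Variants present but below the recommended minimum.
    under_sampled = sorted(
        v for v in known_variants if 0 < counts[v] < MIN_SAMPLES_PER_VARIANT
    )
    return {
        "missing": missing,
        "under_sampled": under_sampled,
        # If some variants are absent, the guidance points to the Deep Learning
        # activity, which can process variants it was not trained on.
        "consider_deep_learning": bool(missing),
    }


if __name__ == "__main__":
    print(coverage_report(["2021", "2021", "2022"], {"2021", "2022", "2023"}))
```

For the usage example, the 2023 variant is reported as missing and 2022 as under-sampled, which suggests adding samples or relying on the Deep Learning activity.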
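One way to build a production-like test set is to draw a simple random sample from a log of recent production documents. This Python sketch assumes such a log is available as (document ID, variant) pairs; `draw_test_sample` and `variant_shares` are hypothetical helpers, not Vantage APIs.

```python
import random
from collections import Counter


def draw_test_sample(production_docs, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from the production flow.

    production_docs: list of (document_id, variant) pairs.
    Every document has the same chance of selection, so each variant's
    expected share of the sample equals its share of production.
    """
    rng = random.Random(seed)
    return rng.sample(production_docs, sample_size)


def variant_shares(docs):
    """Return each variant's fraction of the given documents."""
    counts = Counter(variant for _, variant in docs)
    total = len(docs)
    return {variant: n / total for variant, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical flow: variant "A" is 90% of production, "B" is 10%.
    production = [(i, "A") for i in range(900)] + [(i, "B") for i in range(900, 1000)]
    sample = draw_test_sample(production, 200, seed=1)
    print(variant_shares(production))
    print(variant_shares(sample))
```

Because every document has an equal chance of being picked, frequent variants dominate the sample just as they dominate production. An overall accuracy computed on such a sample therefore weights each variant's accuracy by roughly its production frequency, which is what keeps the estimate representative.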
Next steps
Set up a Document skill
Create, train, and publish a Document skill, including structured forms and Online learning.
Adding fields
Mark fields in the Editor tab and configure field properties by type.
Labeling documents
Guidelines for labeling structured and semi-structured documents during training.
Analyze extracted data
Review field extraction statistics and correct reference labeling on the Result Review tab.
