A Document skill extracts field values from one type of document. You can build Document skills in either ABBYY Vantage (cloud) or Advanced Designer (Windows desktop). Use Advanced Designer when you need to combine multiple Vantage technologies, add NLP, or branch the processing flow on document type — see Use cases for typical scenarios. For background on document categories, see Extract data from documents in Advanced Designer.

Document type variants

Documents of the same type usually share the same fields, validation rules, and structure, but variants differ in small ways — for example, by the year a tax form was issued. One Document skill can be trained on multiple variants. The technology you choose depends on how many variants you need to handle:
Variants | Best fit
Up to ~10 (fixed forms) | Forms activity; see Process structured documents in Advanced Designer.
Most essential variants | Fast Learning and/or Extraction Rules activities.
Hundreds | Online Learning in Vantage refines the skill from manual-review feedback.
Thousands | Deep Learning activity extracts with ~80–90% accuracy depending on document complexity.
If a fixed form has many more than ~10 variants, treat each as a separate document type.

Training and testing a Document skill

For best results, train and test the skill with three different document sets:
  • Training set — used to train the skill.
  • Test set — used to measure accuracy during development.
  • Blind set — an additional test set the skill has never seen, used to evaluate true generalization.
Use different documents in each set. Reusing training documents in the test set inflates accuracy estimates.
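The three-set split can be prepared outside the product before you upload anything. The following is a minimal sketch, not part of Advanced Designer: the function name `split_documents`, the 60/20/20 proportions, and the file names are assumptions chosen for illustration. It only shows how to keep the three sets disjoint.

```python
import random

def split_documents(paths, train_frac=0.6, test_frac=0.2, seed=0):
    """Split document paths into disjoint training, test, and blind sets.

    The fractions are illustrative assumptions, not recommended values.
    """
    unique = sorted(set(paths))        # drop duplicates so no file lands in two sets
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(unique)
    n_train = int(len(unique) * train_frac)
    n_test = int(len(unique) * test_frac)
    train = unique[:n_train]
    test = unique[n_train:n_train + n_test]
    blind = unique[n_train + n_test:]  # everything left over is held out
    return train, test, blind

# Hypothetical file names, used only to exercise the function.
docs = [f"invoice_{i:03d}.pdf" for i in range(50)]
train, test, blind = split_documents(docs)

# No document may appear in more than one set.
assert not set(train) & set(test)
assert not set(train) & set(blind)
assert not set(test) & set(blind)
```

Set the blind set aside and do not open it until the skill is otherwise finished, so that it stays unseen.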

Training set

Aim for a representative set with 2–3 documents per variant. If you can’t cover every variant, the Deep Learning activity generalizes from image patterns and surrounding labels, so it can process variants it wasn’t explicitly trained on. Recommended document counts depend on the activities you use:
Activity | High-variability documents | Low-variability documents
Deep Learning for semi-structured documents | At least 200–300 (2–3 per variant) | At least 10 (2–3 per variant)
Segmentation | At least 100 | At least 20
Deep Learning for NLP | At least 150 (2–3 per variant) | Can start with 1; aim for 2–3 per variant
Even if you can’t hit the recommended counts, one document per variant is better than none.
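Before training, it can help to check which variants fall short of the 2–3 documents suggested above. This is a minimal sketch under stated assumptions: you keep your own list of variant labels (one per training document) and a list of known variants; neither is produced by Advanced Designer, and `under_covered_variants` is a hypothetical helper.

```python
from collections import Counter

def under_covered_variants(variant_labels, known_variants, min_per_variant=2):
    """Return {variant: count} for variants with fewer than min_per_variant documents."""
    counts = Counter(variant_labels)   # a Counter yields 0 for variants with no documents
    return {v: counts[v] for v in known_variants if counts[v] < min_per_variant}

# Hypothetical labels: the issue year of each tax form in the training set.
labels = ["2021", "2021", "2022", "2023", "2023", "2023"]
known = ["2020", "2021", "2022", "2023"]

gaps = under_covered_variants(labels, known)
print(gaps)  # {'2020': 0, '2022': 1}
```

Variants reported with a count of 1 are still usable; variants with 0 are the ones to prioritize when collecting more samples.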

Test set

Match the test-set distribution to your production document flow so the accuracy estimate is meaningful. For example, if invoices from one vendor make up 30% of production traffic, about 30% of the test set should be invoices from that vendor. The simplest way to hit this ratio is to build the test set from a random sample of production documents.
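A plain random sample of production documents approximates the production mix on its own. If you want the vendor shares matched more closely, a proportional (stratified) sample is one option. The sketch below assumes a hypothetical list of `(doc_id, vendor)` pairs that you maintain yourself; `sample_test_set` is not a Vantage function.

```python
import random
from collections import defaultdict

def sample_test_set(production_docs, size, seed=0):
    """Draw a test set whose per-vendor shares approximate production shares.

    production_docs: list of (doc_id, vendor) pairs (assumed structure).
    Because each quota is rounded, the result may differ slightly from `size`.
    """
    rng = random.Random(seed)
    by_vendor = defaultdict(list)
    for doc_id, vendor in production_docs:
        by_vendor[vendor].append(doc_id)
    total = len(production_docs)
    sample = []
    for vendor, ids in by_vendor.items():
        quota = round(size * len(ids) / total)  # this vendor's proportional share
        sample.extend(rng.sample(ids, min(quota, len(ids))))
    return sample

# Hypothetical pool: one vendor accounts for 30% of production traffic.
pool = [(f"acme_{i}", "Acme") for i in range(30)]
pool += [(f"other_{i}", "Other") for i in range(70)]

test_set = sample_test_set(pool, size=20)
acme_share = sum(d.startswith("acme_") for d in test_set) / len(test_set)
print(acme_share)  # 0.3
```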

Blind set

Use documents the skill has never seen during training or testing. The blind-set results are your best estimate of real-world quality.

Configuring a Document skill

After you create a Document skill on the start page, configure it in this order:
  1. Skill settings. Click the settings button next to the skill name to view and adjust skill settings.
  2. Upload documents. On the Documents tab, upload the documents the skill will work with.
  3. Define fields. On the Fields tab, create the fields you want to extract and label their locations on sample documents.
  4. Configure activities. On the Activities tab, build the document processing flow.
  5. Test the skill. On the Results tab, test the skill on sample documents and review extraction quality.
  6. Publish. On the Publish tab, publish the skill to make it available in the Skill Catalog in ABBYY Vantage.
After publishing, your skill appears alongside built-in skills, read-only skills, and any derived skills in the Skill Catalog.

Next steps

  • Skill settings: Configure recognition, training, and processing options.
  • Activities: Choose and combine activities for the processing flow.
  • Derived skills: Build a new skill on top of a built-in or read-only Vantage skill.
  • Use cases: See worked scenarios for common document types.