Document type variants
Documents of the same type usually share the same fields, validation rules, and structure, but variants differ in small ways, for example by the year a tax form was issued. One Document skill can be trained on multiple variants. The technology you choose depends on how many variants you need to handle:
| Variants | Best fit |
|---|---|
| Up to ~10 (fixed forms) | Forms activity — see Process structured documents in Advanced Designer. |
| Most essential variants | Fast Learning and/or Extraction Rules activities. |
| Hundreds | Online Learning in Vantage refines the skill from manual-review feedback. |
| Thousands | Deep Learning activity extracts with ~80–90% accuracy depending on document complexity. |
If a fixed form has many more than ~10 variants, treat each as a separate document type.
Training and testing a Document skill
For best results, train and test the skill with three different document sets:
- Training set: used to train the skill.
- Test set: used to measure accuracy during development.
- Blind set: an additional test set the skill has never seen, used to evaluate true generalization.
Use different documents in each set. Reusing training documents in the test set inflates accuracy estimates.
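A quick way to confirm that the three sets really are disjoint is to compare document identifiers before uploading anything. The sketch below is illustrative only: it assumes you track each document by some unique ID (a file name or content hash) outside Vantage, and `find_overlaps` and the sample IDs are hypothetical names, not part of any Vantage API.

```python
from itertools import combinations


def find_overlaps(sets_by_name):
    """Return {(set_a, set_b): shared_ids} for every pair of sets that share documents."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(sets_by_name.items(), 2):
        shared = set(ids_a) & set(ids_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Hypothetical document IDs; "inv-003" was accidentally placed in two sets.
document_sets = {
    "training": ["inv-001", "inv-002", "inv-003"],
    "test": ["inv-003", "inv-004"],
    "blind": ["inv-005"],
}

for (set_a, set_b), shared in find_overlaps(document_sets).items():
    print(f"{set_a} and {set_b} share: {sorted(shared)}")
```

An empty result means no document appears in more than one set, so the test and blind accuracy figures are not inflated by reuse.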
Training set
Aim for a representative set with 2–3 documents per variant. If you can’t cover every variant, the Deep Learning activity generalizes from image patterns and surrounding labels, so it can process variants it wasn’t explicitly trained on. Recommended document counts depend on the activities you use:
| Activity | High-variability documents | Low-variability documents |
|---|---|---|
| Deep Learning for semi-structured documents | At least 200–300 (2–3 per variant) | At least 10 (2–3 per variant) |
| Segmentation | At least 100 | At least 20 |
| Deep Learning for NLP | At least 150 (2–3 per variant) | Can start with 1; aim for 2–3 per variant |
Even if you can’t hit the recommended counts, one document per variant is better than none.
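To see where a training set falls short of the 2–3 documents per variant guideline, you can count documents per variant before training. This is a minimal sketch under stated assumptions: it presumes you already have a variant label for each training document (for example, from a spreadsheet you maintain), and `coverage_gaps`, `MIN_PER_VARIANT`, and the sample labels are hypothetical names chosen for illustration.

```python
from collections import Counter

# Assumed threshold: the lower end of the "2-3 documents per variant" guideline.
MIN_PER_VARIANT = 2


def coverage_gaps(variant_labels, known_variants, minimum=MIN_PER_VARIANT):
    """Return {variant: count} for every known variant with fewer than `minimum` documents."""
    counts = Counter(variant_labels)
    return {
        variant: counts[variant]
        for variant in known_variants
        if counts[variant] < minimum
    }


# Hypothetical example: one label per training document, keyed by tax-form year.
training_labels = ["2021", "2021", "2022", "2023", "2023", "2023"]
known_variants = ["2021", "2022", "2023", "2024"]

for variant, count in coverage_gaps(training_labels, known_variants).items():
    print(f"Variant {variant}: only {count} training document(s)")
```

Variants reported with a count of 1 still contribute to training; variants with a count of 0 are the ones to prioritize when collecting more samples.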
Test set
Match the test-set distribution to your production document flow so the accuracy estimate is meaningful. For example, if invoices from one vendor make up 30% of production traffic, about 30% of the documents in the test set should be invoices from that vendor. The simplest way to hit this ratio is to test against random samples of production documents.
Blind set
Use documents the skill has never seen during training or testing. The blind-set results are your best estimate of real-world quality.
Configuring a Document skill
After you create a Document skill on the start page, configure it in this order:
Skill settings
Click the settings button next to the skill name to view and adjust skill settings.
Upload documents
On the Documents tab, upload the documents the skill will work with.
Define fields
On the Fields tab, create the fields you want to extract and label their locations on sample documents.
Configure activities
On the Activities tab, build the document processing flow.
Test the skill
On the Results tab, test the skill on sample documents and review extraction quality.
Publish
On the Publish tab, publish the skill to make it available in the Skill Catalog in ABBYY Vantage.
Next steps
Skill settings
Configure recognition, training, and processing options.
Activities
Choose and combine activities for the processing flow.
Derived skills
Build a new skill on top of a built-in or read-only Vantage skill.
Use cases
See worked scenarios for common document types.
