After training a Classification skill, open the Result tab in the Classification Skill Designer to see how accurately the classifier labels each class and to diagnose errors in the training set. Statistics are updated automatically every time the classifier is trained. If accuracy is low, jump to Classification errors for the common causes and how to fix them.
Figure: Result tab in the Classification Skill Designer, showing per-class accuracy and document counts.

Prerequisites

  • A Classification skill that has been trained at least once.

What the Result tab shows

  • General classification accuracy — percentage of correctly classified documents across the full set.
  • Per-class accuracy — percentage of documents classified correctly for each class.
  • Per-class document counts — number of correctly and incorrectly classified documents per class.
  • Last trained — time and date of the most recent training run.
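As a rough illustration of how the first three statistics relate to each other, the sketch below computes them from a list of (reference class, predicted class) pairs. It is not the product's implementation: the function name `accuracy_report`, the input shape, and the choice to group per-class figures by reference class are all assumptions made for the example.

```python
from collections import Counter


def accuracy_report(pairs):
    """Compute overall and per-class statistics.

    pairs: iterable of (reference_class, predicted_class) tuples.
    Per-class figures are grouped by reference class (an assumption).
    """
    total = Counter()
    correct = Counter()
    for reference, predicted in pairs:
        total[reference] += 1
        if reference == predicted:
            correct[reference] += 1

    per_class = {
        name: {
            "correct": correct[name],
            "incorrect": total[name] - correct[name],
            "accuracy": 100.0 * correct[name] / total[name],
        }
        for name in total
    }
    n = sum(total.values())
    overall = 100.0 * sum(correct.values()) / n if n else 0.0
    return overall, per_class


# Hypothetical sample: three invoices (one mislabeled as Receipt), one receipt.
pairs = [
    ("Invoice", "Invoice"),
    ("Invoice", "Invoice"),
    ("Invoice", "Receipt"),
    ("Receipt", "Receipt"),
]
overall, per_class = accuracy_report(pairs)
print(overall)                # 75.0
print(per_class["Invoice"])   # 2 correct, 1 incorrect
```

The sample shows why per-class figures matter: overall accuracy is 75%, but all of the errors sit in a single class.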

Results table

The results table contains all non-empty user classes (excluding No class). Classes are sorted first by accuracy (worst to best), then by document count, and finally alphabetically by name. A scrollbar appears if the rows do not all fit on screen. Clicking a row opens the corresponding class in the Documents tab. Renaming a class in the Documents tab updates the name in the Result tab automatically. If you delete a class after training, its name appears grayed out in the Result tab; the row is removed only the next time the classifier is trained.
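The three-level sort order can be pictured as a composite sort key. The sketch below is only an illustration with made-up rows. The text does not say whether document count is sorted ascending or descending, so ascending order is assumed here, as is case-insensitive name comparison.

```python
# Hypothetical rows; field names are chosen for this example only.
rows = [
    {"name": "Receipt", "accuracy": 92.0, "documents": 120},
    {"name": "Invoice", "accuracy": 81.5, "documents": 300},
    {"name": "Contract", "accuracy": 92.0, "documents": 120},
    {"name": "Waybill", "accuracy": 92.0, "documents": 80},
]

# Key order: accuracy (worst first), then document count (ascending is an
# assumption), then name (case-insensitive comparison is an assumption).
ordered = sorted(
    rows,
    key=lambda r: (r["accuracy"], r["documents"], r["name"].casefold()),
)
names = [r["name"] for r in ordered]
print(names)  # ['Invoice', 'Waybill', 'Contract', 'Receipt']
```

Sorting worst-first puts the classes that most need attention at the top of the table.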

When to stop iterating

There is no fixed accuracy threshold for a Classification skill: the right target depends on how many misrouted documents your downstream process can tolerate and how much manual review is acceptable. As a practical guide, aim for high per-class accuracy (not just overall), keep working through the causes listed under Classification errors while accuracy is still improving, and stop once a class either meets your business requirement or has clearly plateaued despite rebalanced, clean training data. If a class plateaus well below the others, treat it as indistinguishable and merge it with its nearest neighbor. Once the skill is in production, continue tracking Document Classifier Accuracy over time in the Analytics Dashboard and consider Online learning for continuous improvement.
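One way to make this stopping rule explicit is to record a class's accuracy after each training run and compare the latest value with the target and with earlier runs. The sketch below is a minimal illustration. The function `should_stop` and the `min_gain` and `window` thresholds are hypothetical values to tune for your own process, not product settings.

```python
def should_stop(history, target, min_gain=1.0, window=2):
    """Decide whether to stop iterating on one class.

    history:  accuracy (%) after each training run, oldest first.
    target:   accuracy (%) required by the business.
    min_gain: smallest improvement (percentage points) over the last
              `window` runs that still counts as progress (hypothetical).
    """
    if not history:
        return False, "no training runs yet"
    if history[-1] >= target:
        return True, "target met"
    if len(history) > window:
        gain = history[-1] - history[-1 - window]
        if gain < min_gain:
            return True, "plateaued"
    return False, "still improving"


print(should_stop([70.0, 80.0, 91.0], target=90.0))  # (True, 'target met')
print(should_stop([70.0, 70.3, 70.5], target=90.0))  # (True, 'plateaued')
print(should_stop([60.0, 70.0, 80.0], target=90.0))  # (False, 'still improving')
```

The check only looks at the numbers. A "plateaued" result is meaningful only if the training data for that class has already been cleaned and rebalanced.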

Classification errors

Most cases of incorrect classification are caused by errors in the training set — for example, incorrectly assigned reference classes or an insufficient number of documents for a given class.

Incorrectly assigned reference classes

If documents in the training set were assigned the wrong reference class, reassign them and retrain:
  1. Open the affected class in the Documents tab. Click Review Prediction in Document Set in the Actions pane, or click the row in the results table.
  2. Select a misclassified document. Choose a document that was incorrectly assigned a reference class.
  3. Assign the correct class. Click the name of the correct class in the Actions pane.
  4. Repeat for every affected document. Repeat steps 2 and 3 for every document that was incorrectly assigned a reference class.
  5. Retrain the classifier. Click the Train button in the Actions pane.

Insufficient or imbalanced training data

Low classifier quality may have the following causes:
  • An insufficient number of uploaded documents
  • A substantially uneven distribution of documents among classes
  • An insufficient number of samples of the most common document variants for the given class
Improve classifier quality by adding the missing documents to the training set. Aim for between 100 and 1,000 documents per class, and include sample documents for the most common variants of each class in roughly a one-to-one ratio. After you have added your new documents to the training set, assign a class to each and retrain the classifier.
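Before uploading more documents, it can help to count what the training set already contains. The sketch below assumes you can export one class label per training document into a plain list. The function `check_training_set` is hypothetical, and the `MAX_RATIO` cutoff for "substantially uneven" is an arbitrary value chosen for the example. Only the 100 and 1,000 bounds come from the guidance above.

```python
from collections import Counter

MIN_DOCS = 100    # lower bound of the recommended range
MAX_DOCS = 1000   # upper bound of the recommended range
MAX_RATIO = 3.0   # hypothetical cutoff for "substantially uneven"


def check_training_set(labels):
    """Return a list of warnings for a list of per-document class labels."""
    counts = Counter(labels)
    warnings = []
    for name, n in sorted(counts.items()):
        if n < MIN_DOCS:
            warnings.append(
                f"{name}: only {n} documents, add at least {MIN_DOCS - n} more"
            )
        elif n > MAX_DOCS:
            warnings.append(f"{name}: {n} documents exceeds {MAX_DOCS}")
    if counts:
        largest, smallest = max(counts.values()), min(counts.values())
        if largest / smallest > MAX_RATIO:
            warnings.append(
                f"imbalance: largest class is {largest / smallest:.1f}x the smallest"
            )
    return warnings


# Hypothetical training set: 400 invoices and 40 receipts.
labels = ["Invoice"] * 400 + ["Receipt"] * 40
for message in check_training_set(labels):
    print(message)
```

This check covers only the first two causes. Whether the most common variants of each class are represented still has to be judged by looking at the documents themselves.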

Confused classes

If two classes are consistently confused because they don’t differ meaningfully in shape, layout, or text, merge them into a single class. Separate the documents later in the pipeline using extracted field values if the distinction still matters.
For example, a class for invoices under $10,000 and a class for invoices over $10,000 will likely be confused, since the only difference between them is the total amount due. Merge them into one Invoice class, and branch on the amount downstream — after the total has been extracted.
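The downstream branch in this example could look like the sketch below. It assumes the extracted total arrives as a plain decimal string or number. The function `route_invoice` and the route names are hypothetical, and sending a total of exactly $10,000 to the standard route is an assumption, since the example only covers amounts under and over that value.

```python
from decimal import Decimal

THRESHOLD = Decimal("10000")  # amount from the example above


def route_invoice(total_due):
    """Pick a processing route from the extracted total amount due.

    total_due: extracted value as a string or number, e.g. "12500.00".
    A total of exactly 10000 goes to "standard" (an assumption).
    """
    amount = Decimal(str(total_due))
    return "high_value" if amount > THRESHOLD else "standard"


print(route_invoice("12500.00"))  # high_value
print(route_invoice("9999.99"))   # standard
```

Because the decision uses the extracted value rather than the document's appearance, it does not depend on the classifier being able to tell the two kinds of invoice apart.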

Related topics

  • Train a classifier: the prior step, in which you create a training set, assign classes, and run training.
  • Enable Online learning: continue improving the skill after it is in production.
  • Analytics Dashboard: track Document Classifier Accuracy over time.