While a classifier is being trained, statistics about the classification results are collected. Analyzing these statistics helps you understand how to improve the quality of the classifier. Classification statistics can be found in the Result tab of the Classification Skill Designer and are updated automatically every time the classifier is trained.

The Result tab contains the following information:
  • General classification accuracy. The percentage of correctly classified documents in relation to the total number of documents in the set.
  • Classification accuracy for each class. The percentage of documents that were classified correctly for a given class.
  • The number of correctly classified documents and incorrectly classified documents of each class.
  • The time and date when the classifier was last trained.
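The accuracy figures listed above can be reproduced from a list of reference and predicted classes. The following is a minimal sketch, not the product's implementation: the function name `accuracy_report` and the input format (pairs of reference class and predicted class) are assumptions made for illustration.

```python
from collections import Counter


def accuracy_report(pairs):
    """Compute overall and per-class accuracy.

    `pairs` is an iterable of (reference_class, predicted_class) tuples.
    This input format is hypothetical and chosen only for the example.
    """
    total = Counter()    # documents per reference class
    correct = Counter()  # correctly classified documents per reference class
    for reference, predicted in pairs:
        total[reference] += 1
        if reference == predicted:
            correct[reference] += 1

    document_count = sum(total.values())
    if document_count == 0:
        return 0.0, {}

    # General accuracy: correctly classified documents / all documents.
    overall = sum(correct.values()) / document_count * 100

    # Per-class accuracy and the correct/incorrect counts for each class.
    per_class = {
        name: {
            "correct": correct[name],
            "incorrect": total[name] - correct[name],
            "accuracy": correct[name] / total[name] * 100,
        }
        for name in total
    }
    return overall, per_class


pairs = [
    ("Invoice", "Invoice"),
    ("Invoice", "Receipt"),
    ("Receipt", "Receipt"),
    ("Receipt", "Receipt"),
]
overall, per_class = accuracy_report(pairs)
```

In this sample, three of the four documents are classified correctly, so the general accuracy is 75%, while the Invoice class alone reaches 50%.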
The results table contains all non-empty user classes (excluding No class). Classes in the table are sorted first by classifier accuracy (from worst to best), then by the number of documents in the class, and lastly alphabetically by name. If all rows in the table cannot be displayed on the screen at once, a scrollbar will be displayed.

Clicking a row in the results table directs the user to the corresponding class in the Documents tab. Modifying the name of a class in the Documents tab also updates it in the Result tab. If you delete a class after training the corresponding classifier, the name of this class will be grayed out in the Result tab. The row containing this class is removed from the results table only when the classifier is trained again.
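The filtering and sort order of the results table can be expressed as a compound sort key. This is a sketch under stated assumptions: the dictionary fields (`name`, `accuracy`, `documents`) are hypothetical, and because the text does not say whether classes with more documents come first or last, the example assumes larger classes come first.

```python
def results_table_rows(classes):
    """Return the rows in the order described for the results table.

    `classes` is a list of dicts with hypothetical keys:
    "name", "accuracy" (percent) and "documents" (count).
    """
    # Keep only non-empty user classes; "No class" is excluded.
    rows = [c for c in classes if c["documents"] > 0 and c["name"] != "No class"]

    # Sort by accuracy (worst to best), then by document count
    # (assumed: largest first), then alphabetically by name.
    return sorted(
        rows,
        key=lambda c: (c["accuracy"], -c["documents"], c["name"].lower()),
    )


classes = [
    {"name": "Invoice", "accuracy": 90.0, "documents": 120},
    {"name": "Receipt", "accuracy": 75.0, "documents": 80},
    {"name": "Contract", "accuracy": 75.0, "documents": 200},
    {"name": "Archive", "accuracy": 75.0, "documents": 200},
    {"name": "Empty class", "accuracy": 0.0, "documents": 0},
    {"name": "No class", "accuracy": 0.0, "documents": 15},
]
ordered_names = [c["name"] for c in results_table_rows(classes)]
```

Sorting worst to best puts the classes that most need attention at the top of the table.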

Classification errors

Most cases of incorrect classification are caused by errors made when creating the training set (for example, incorrectly assigned reference classes or an insufficient number of specific pages in the document set).

Incorrectly assigned reference classes

To fix this type of error, assign the correct class to each affected training set document and retrain the classifier as follows:
  1. Navigate to the Documents tab by clicking Review Prediction in Document Set in the Actions pane. Alternatively, click the row with the appropriate class in the results table.
  2. Select a document that was incorrectly assigned a reference class.
  3. Click the name of the correct class in the Actions pane.
  4. Repeat steps 2 and 3 for every document that was incorrectly assigned a reference class.
  5. Click the Train button in the Actions pane.

Insufficient number of pages in the document set

Insufficient classifier quality may be caused by the following:
  • An insufficient number of uploaded documents
  • A substantially uneven distribution of documents among classes
  • An insufficient number of samples of the most common document variants for the given class
In these cases, classifier quality can be improved by adding the missing documents to the training set. We recommend that you upload between 100 and 1000 documents for each class. We also suggest that your document set include sample documents for the most common document variants of each class in approximately a one-to-one ratio. After you have added the new documents to the training set, assign a class to each of them and retrain the classifier.
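Before uploading, a training set can be checked against the recommendation above with a simple count per class. The sketch below is illustrative only: the function name `check_training_set` and the `max_ratio` threshold for an "uneven" distribution are assumptions, since the text does not define what counts as substantially uneven.

```python
from collections import Counter

RECOMMENDED_MIN = 100   # lower bound recommended per class
RECOMMENDED_MAX = 1000  # upper bound recommended per class


def check_training_set(labels, max_ratio=3.0):
    """Summarize how well a list of class labels matches the recommendation.

    `labels` holds one class name per training document.
    `max_ratio` is a hypothetical cutoff: the set is flagged as uneven
    when the largest class has more than `max_ratio` times the documents
    of the smallest class.
    """
    counts = Counter(labels)
    if not counts:
        return {"counts": {}, "too_small": [], "too_large": [], "ratio": None, "uneven": False}

    ratio = max(counts.values()) / min(counts.values())
    return {
        "counts": dict(counts),
        "too_small": sorted(n for n, c in counts.items() if c < RECOMMENDED_MIN),
        "too_large": sorted(n for n, c in counts.items() if c > RECOMMENDED_MAX),
        "ratio": ratio,
        "uneven": ratio > max_ratio,
    }


labels = ["Invoice"] * 150 + ["Receipt"] * 40
report = check_training_set(labels)
```

Here the Receipt class falls below the recommended minimum, and the Invoice class is 3.75 times larger, so the set is flagged as uneven under the assumed cutoff.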

Confused classes

Classification errors can also be caused by classes that do not differ significantly from each other with regard to their parameters. In this case, you should review the number of classes and, if necessary, merge the confused classes into a single class. For example, a class for invoices for less than 10,000 USD and a class for invoices for over 10,000 USD may be confused if their only significant difference is the total amount due. These classes must be merged into a single class for the Classification skill, and the invoices should only be separated from each other at a later stage if needed (for example, when the total amount due has already been extracted from the invoice).
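The two ideas above, spotting classes that are confused with each other and separating merged invoices after extraction, can be sketched as follows. All names here are hypothetical, the input format is assumed, and the handling of a total of exactly 10,000 USD (assigned to the lower group) is an assumption, because the text does not cover that case.

```python
from collections import Counter

THRESHOLD_USD = 10_000


def confused_pairs(pairs):
    """Count misclassifications for each unordered pair of classes.

    `pairs` is an iterable of (reference_class, predicted_class) tuples.
    Returns a list of ((class_a, class_b), count), most confused first.
    """
    confusions = Counter()
    for reference, predicted in pairs:
        if reference != predicted:
            # Sort the names so that A->B and B->A count as the same pair.
            confusions[tuple(sorted((reference, predicted)))] += 1
    return confusions.most_common()


def split_invoice_by_total(total_due_usd):
    """Separate invoices after the total amount due has been extracted.

    Assumption: a total of exactly 10,000 USD goes to the lower group.
    """
    if total_due_usd > THRESHOLD_USD:
        return "Invoice over 10,000 USD"
    return "Invoice under 10,000 USD"


pairs = (
    [("Invoice low", "Invoice high")] * 2
    + [("Invoice high", "Invoice low")]
    + [("Receipt", "Invoice low")]
    + [("Receipt", "Receipt")] * 5
)
most_confused = confused_pairs(pairs)
```

A pair that dominates this list is a candidate for merging into one class, with the split then applied to the extracted amount rather than performed by the classifier.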
