To train a classifier, you will need a training set that contains documents that have already been assigned a reference class.
Creating a training set
- In the Classification Skill Designer, open the Documents tab.
- Create the appropriate classes by using the Create Class command in the Actions pane or by clicking Create class above the class list in the left part of the screen.
Tip: To rename an existing class, click the icon next to the name of the class and select Rename class.
- Select a class from the class list in the left part of the screen and upload documents to it by clicking either Upload documents in the center of the Classification Skill screen, the Upload button in the toolbar, or Upload Documents in the Actions pane.
When documents are being uploaded, a progress indicator is displayed at the top of the Skill Designer, to the right of the bookmarks. The indicator tooltip contains information about the number of documents that still need to be uploaded and processed.
Note: Documents that have been uploaded to the No class group are not used for classifier training and testing.
The number of uploaded documents is displayed for each class. If your document set contains only a few classes, or if the classes differ significantly from each other, a small number of documents per class is sufficient. If there are many classes, or if the differences between them are subtle, we recommend uploading between 10 and 100 documents for each class, because a smaller number may result in classification errors. In any case, we do not recommend uploading more than 1,000 documents for a class. To maximize classification accuracy, include sample documents for the most common document variants of each class (approximately one document per variant).
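The sizing guidance above can be checked before uploading, for example against a local count of documents per class. The following is a minimal sketch, not part of the product: the function name `review_class_sizes` and its input format are hypothetical, and only the thresholds (10 and 1,000) come from the recommendations in this section.

```python
from typing import Dict, List

# Thresholds taken from the recommendations in this section.
RECOMMENDED_MIN = 10   # lower bound advised when classes are numerous or similar
HARD_MAX = 1000        # uploading more than this per class is not recommended


def review_class_sizes(counts: Dict[str, int]) -> List[str]:
    """Return human-readable notes for classes that fall outside the guidance.

    `counts` maps a class name to the number of documents prepared for it.
    This helper is hypothetical and runs locally; it does not call the product.
    """
    notes = []
    for name, n in sorted(counts.items()):
        if n == 0:
            notes.append(f"{name}: empty, add documents before training")
        elif n > HARD_MAX:
            notes.append(f"{name}: {n} documents, above the recommended limit of {HARD_MAX}")
        elif n < RECOMMENDED_MIN:
            notes.append(f"{name}: only {n} documents, may be too few if classes are numerous or similar")
    return notes


if __name__ == "__main__":
    for note in review_class_sizes({"Invoice": 40, "Receipt": 3, "Contract": 1200}):
        print(note)
```

A class with fewer than 10 documents is reported only as a possible issue, since a small set can be adequate when the classes are few or clearly distinct.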
By default, all uploaded documents are displayed as a list. This makes the files easy to navigate if they have informative names. You can also switch to thumbnail view, which may be preferable if your documents differ visually. To switch between these two viewing modes, use the view buttons in the toolbar.
If more than 50 documents are uploaded, they will be displayed over several pages.
In either view, you can open a document preview window by clicking the button located to the left of a document’s name.
The default width of this window is 35% of the browser window width. If needed, you can increase the width of the preview window up to 80% of the browser window’s width by dragging the left border of the preview window. User-specified widths will be preserved until the browser cache is cleared.
If required, you can change the orientation of document pages in each class manually by clicking Rotate in the toolbar, which will rotate the pages 90° counter-clockwise. Alternatively, you can select one of the following options from the drop-down list: Rotate Left, Rotate Right, or Rotate 180°. You can also use the document preview window to change the page orientation of a specific document by clicking the button and choosing an appropriate rotation option.
If a file fails to upload for some reason (e.g. the file is in an unsupported format), its name will be displayed in red.
Changing a document’s assigned class
You can change the reference class assigned to an uploaded document by doing the following:
- Mark one or several documents that need to be assigned a new class by selecting the checkbox to the left of their names.
- From the list that will appear in the Actions pane, select the appropriate class for the document or documents and click the Assign button. If the correct class does not appear on the list, specify a new name in the Search for class field in the Actions pane and click Create.
Removing documents from a set
You can remove documents from a set in one of the following ways:
- Mark one or several documents to be removed by selecting the checkbox to the left of their names. You can mark all documents of a specific class by selecting the checkbox to the left of the name of that class above the document list (if the documents of the class are displayed over several pages, this will mark only the documents displayed on the current page). Click the icon next to one of the marked documents and then click Delete. Confirm your choice in the dialog box that will appear. This will delete the marked documents.
Tip: Even if a document is not marked for deletion, you can still delete it by clicking the icon next to its name.
- Click the icon next to a class name in the class list and then click Delete All Documents. Confirm your choice in the dialog box that will appear. This will delete all the documents of the selected class. Alternatively, you can click Delete Class with All Documents, which will delete the class itself as well as all documents in it.
Training a classifier
To train a classifier using a specially prepared training set, click the Train button in the Actions pane. The Train button will only be active if there are at least two different non-empty classes in the training set.
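The activation rule for the Train button can be expressed as a simple check. This is an illustrative sketch only: `can_train` is a hypothetical helper, and excluding the No class group from the count is an assumption based on the note that its documents are not used for training.

```python
from typing import Dict

# Assumption: the "No class" group does not count toward the two-class minimum,
# because its documents are not used for training or testing.
NO_CLASS = "No class"


def can_train(counts: Dict[str, int]) -> bool:
    """Return True if at least two different non-empty classes exist."""
    non_empty = [
        name for name, n in counts.items()
        if name != NO_CLASS and n > 0
    ]
    return len(non_empty) >= 2
```

For example, a set with one populated class plus documents in No class would not satisfy this check under the stated assumption.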
Once training has been completed, the Completed icon will be displayed next to the Train button. The class list in the Documents tab will also change: in addition to the number of uploaded documents of each class, it will show the number of documents whose predicted class differs from the reference class.
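The two per-class figures described above (total documents and documents whose predicted class differs from the reference class) can be reproduced from a list of results. The sketch below assumes the results are available as hypothetical (reference class, predicted class) pairs; how such pairs would be obtained is outside the scope of this page.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_mismatches(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map each reference class to (total documents, mismatched documents).

    `pairs` holds one (reference_class, predicted_class) tuple per document.
    The input format is an assumption made for illustration.
    """
    totals: Counter = Counter()
    mismatched: Counter = Counter()
    for reference, predicted in pairs:
        totals[reference] += 1
        if predicted != reference:
            mismatched[reference] += 1
    # Counter returns 0 for classes that have no mismatches.
    return {name: (totals[name], mismatched[name]) for name in totals}
```

A class with a high mismatch count relative to its total is a natural candidate for review of its reference assignments or for additional sample documents.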
You can stop classifier training by clicking Cancel under the Train button in the Actions pane.
See also
Analyzing the classification results