It can be difficult to specify field extraction properties when a single Document skill needs to process documents of the same type that vary significantly in field placement. For example, the same skill can be used to process invoices from different vendors, where the same fields appear in locations that differ from vendor to vendor. To improve the extraction quality for such a skill, you can sort its documents into classes, which are subgroups of a single document type that share common properties, and set up separate extraction activities for each class.
Classifying documents may also be required when you need to improve extraction quality for just one of the classes. For example, a single skill may be used to process bank statements compiled by different banks, and one statement type may have a lower extraction quality than the rest. To improve the extraction quality for that skill, you can sort the statements into classes and set up an Extraction Rules activity for the class with unsatisfactory extraction quality.
The Classify By Text and Image activity is designed to sort a skill’s documents into classes that require their own extraction activities to be created and set up.
Setup overview
Upload images and assign classes
Upload images, create classes, and assign expected classes to documents.
Create and set up the activity in the Activities tab
Create a Classify By Text and Image activity in the workflow. When the activity is created, a field that records the classification result is added to the skill structure, and the value of this field is used to classify documents. The field is displayed in the skill field structure, but it is marked as hidden and cannot be edited.
A Classify By Text and Image activity does not return a confidence value for a class. It returns only the class name.
Set up the activity in the Activity Editor
Step 1: Upload documents
Upload the documents that will be used to set up the activity by clicking Upload in the toolbar and selecting an upload method:
- Upload Documents… Use the dialog box that opens to select the appropriate documents. The selected documents will be displayed in the No Class list.
- Upload Folder Like Classes… Use the dialog box that opens to select a folder that contains subfolders with images. Each subfolder should contain images of a single class. Uploading documents this way automatically creates a class for each subfolder and assigns that class to the documents in the subfolder, so you do not need to create classes manually in the Activity Editor.
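Before using Upload Folder Like Classes…, it may help to check the folder layout locally. The following is a minimal sketch, not part of the product: the function names and the list of image extensions are assumptions made for illustration, and the formats actually accepted may differ. It maps each subfolder name (the class that would be created) to the image files inside it and reports files that sit outside any subfolder.

```python
from pathlib import Path

# Assumed set of image extensions (hypothetical); the accepted formats may differ.
IMAGE_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def classes_from_folder(root):
    """Map each subfolder name (the would-be class) to the image file names it contains."""
    classes = {}
    for subfolder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        classes[subfolder.name] = sorted(
            f.name
            for f in subfolder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return classes


def stray_files(root):
    """List files placed directly under root, which have no subfolder to derive a class from."""
    return sorted(f.name for f in Path(root).iterdir() if f.is_file())
```

A subfolder that maps to an empty list, or a non-empty result from `stray_files`, suggests that the layout does not yet follow the one-subfolder-per-class convention described above.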
Step 2: Create classes
Create classes that correspond to the different types of documents being processed by clicking either Create Class in the toolbar or Create in the Assign class pane. If your documents were uploaded using Upload Folder Like Classes…, make sure that all required classes have been created.
Step 3: Classify documents
Classify your documents using one of the following methods:
- Select all documents of a single class in the list and click an appropriate class name in the Assign class pane.
- If an appropriate class has not been created yet, select all appropriate documents in the list and create a class by clicking either Create Class in the toolbar or Create in the Assign class pane.
- Select all documents of a single class and drag them to the list that corresponds to that class.
Additional options
If required, you can change the orientation of document pages using the Rotate drop-down menu on the toolbar. You can select one of the following options: Rotate All Pages Left, Rotate All Pages Right, or Rotate All Pages 180°.
To switch view modes, use the following buttons in the toolbar:
- List view. Displays documents as a list.
- Thumbnail view. Displays documents as thumbnails.
Train a classifier and view classification results
Once documents have been classified, train your activity using the Train Activity button. After training has finished, statistics regarding the classification results will be displayed on the Results tab. Analyzing these statistics helps identify problem classes and evaluate the general quality of the classifier.
General statistics
The top pane displays general statistics for all documents and classes of the activity. These statistics help evaluate the general quality of your classifier:
- Accuracy. The percentage of documents whose expected class matched the class assigned by the program.
- F-Measure. A single score that combines classification precision and recall (completeness).
- Recall. The ratio of documents correctly classified as a specific class to all documents of that class.
- Precision. The ratio of documents correctly classified as a specific class to all documents classified as that class (both correctly and incorrectly).
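To make these definitions concrete, the sketch below computes them from a list of expected classes and a list of assigned classes. It is an illustration only: the function name is hypothetical, the F-measure is assumed to be the balanced F1 score (the harmonic mean of precision and recall), and the way the program aggregates per-class values into the general statistics is not specified here and may differ.

```python
from collections import Counter


def classification_stats(expected, assigned):
    """Return overall accuracy and per-class precision, recall, and F-measure."""
    if len(expected) != len(assigned):
        raise ValueError("expected and assigned must have the same length")

    expected_count = Counter(expected)  # documents that belong to each class
    assigned_count = Counter(assigned)  # documents the classifier put in each class
    correct = Counter(e for e, a in zip(expected, assigned) if e == a)

    total = len(expected)
    accuracy = sum(correct.values()) / total if total else 0.0

    per_class = {}
    for cls in sorted(set(expected) | set(assigned)):
        hits = correct[cls]
        precision = hits / assigned_count[cls] if assigned_count[cls] else 0.0
        recall = hits / expected_count[cls] if expected_count[cls] else 0.0
        # Assumption: F-measure is taken to be F1, the harmonic mean of the two.
        denominator = precision + recall
        f_measure = 2 * precision * recall / denominator if denominator else 0.0
        per_class[cls] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return accuracy, per_class
```

For example, calling `classification_stats(["A", "A", "A", "B", "B"], ["A", "A", "B", "B", "B"])` treats the third document as a class A document that was assigned class B, which lowers recall for A and precision for B while leaving the other two values at 1.0.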
