Analyzing Extracted Data

The Results tab in the Document Skill Designer contains field extraction statistics for Document skills. Analyzing these statistics helps you understand how to improve the quality of extracted data. All the fields extracted by the skill are displayed in the Fields column. Fields that belong to field groups are collected into collapsed drop-down lists named after their field groups.

The following field extraction statistics are available:

The Accuracy column shows the percentage of fields with correctly extracted values (the ALL FIELDS row), as well as the percentage of correctly extracted values for individual fields.
- Accuracy values for individual fields are calculated as Accuracy = Correct / (Correct + Recognition Issue + Located Incorrectly + Not Detected).
- The ALL FIELDS accuracy value is calculated using the same formula, with each count in the formula summed across all fields.
The Correct column shows the number of field instances whose extracted value matched the reference value.
The Recognition Issue column shows the number of field instances detected in the document but not recognized correctly.
The Located Incorrectly column shows the number of field instances whose values differ from the predicted values because their regions were detected in locations different from those specified in the labeling.
The Not Detected column shows the number of undetected field instances.
The Frequency in Documents column shows the percentage of documents containing the given field.
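The accuracy formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the product's implementation: the `FieldCounts` class, the `all_fields_accuracy` function, and the sample counts are hypothetical, and returning 0.0 for a field with no instances is an assumption made here to avoid division by zero.

```python
from dataclasses import dataclass


@dataclass
class FieldCounts:
    """Hypothetical container for one row of the Results tab."""
    correct: int = 0
    recognition_issue: int = 0
    located_incorrectly: int = 0
    not_detected: int = 0

    def total(self) -> int:
        # Denominator of the formula: all four outcome counts.
        return (self.correct + self.recognition_issue
                + self.located_incorrectly + self.not_detected)

    def accuracy(self) -> float:
        # Accuracy = Correct / (Correct + Recognition Issue
        #                       + Located Incorrectly + Not Detected)
        total = self.total()
        # Assumption: report 0.0 when the field has no instances at all.
        return self.correct / total if total else 0.0


def all_fields_accuracy(fields: dict[str, FieldCounts]) -> float:
    """Apply the same formula to counts summed across all fields."""
    aggregate = FieldCounts(
        correct=sum(c.correct for c in fields.values()),
        recognition_issue=sum(c.recognition_issue for c in fields.values()),
        located_incorrectly=sum(c.located_incorrectly for c in fields.values()),
        not_detected=sum(c.not_detected for c in fields.values()),
    )
    return aggregate.accuracy()


# Made-up sample counts for two fields.
stats = {
    "Order Date": FieldCounts(correct=18, recognition_issue=1, not_detected=1),
    "Total": FieldCounts(correct=6, located_incorrectly=3, not_detected=1),
}

for name, counts in stats.items():
    print(f"{name}: {counts.accuracy():.1%}")
print(f"ALL FIELDS: {all_fields_accuracy(stats):.1%}")
```

With these sample counts, the ALL FIELDS value (24 of 30 instances) differs from the plain average of the two per-field percentages, because fields with more instances carry more weight in the aggregate.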

Tip: By default, these statistics are displayed for all fields. You can hide individual fields in the list and view the statistics for desired fields only. To do so, click the filter icon at the top of the Fields column and select the desired fields.

Reviewing fields extracted with errors

You can view documents that contain fields extracted with errors. To do so, click a statistic in the Recognition Issue, Located Incorrectly, or Not Detected column.

Example: Clicking the statistic in the Recognition Issue column for the Order Date field opens a tab where you can view the documents in which the Order Date field was extracted with a recognition issue.

On the Result Review tab that opens, you can review the extraction results, document labeling errors, and recognition issues. You can also compare the labeling created when setting up the Document skill with the labeling created during training. Documents can be viewed on this tab in one of the following modes:

The Reference mode displays the reference labeling created when setting up the skill (i.e., before it was trained), as well as the field values extracted using that labeling. Field values and regions can be edited in this mode.
The Predicted mode displays the field values and regions obtained when processing documents. Field values and regions cannot be edited in this mode.
The Difference mode displays the differences between the reference and predicted labeling. Identical field values and regions are displayed in green, while differing field values and regions are displayed in red. Field values and regions cannot be edited in this mode.
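The comparison behind the Difference mode can be pictured with a short Python sketch. It is a conceptual illustration only: the `FieldLabel` class, the `diff_labeling` function, the region format, and the sample values are all hypothetical, and treating a field that is missing on one side as a difference is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldLabel:
    """Hypothetical field label: a value plus the region it was read from."""
    value: str
    region: tuple[int, int, int, int]  # assumed (left, top, right, bottom)


def diff_labeling(reference: dict[str, FieldLabel],
                  predicted: dict[str, FieldLabel]) -> dict[str, str]:
    """Mark each field green if value and region match, red otherwise."""
    result = {}
    for name in sorted(set(reference) | set(predicted)):
        # Assumption: a field present on only one side counts as differing.
        same = reference.get(name) == predicted.get(name)
        result[name] = "green" if same else "red"
    return result


# Made-up reference and predicted labeling for one document.
reference = {
    "Order Date": FieldLabel("2023-01-15", (40, 60, 180, 80)),
    "Total": FieldLabel("1250.00", (300, 500, 380, 520)),
}
predicted = {
    "Order Date": FieldLabel("2023-01-15", (40, 60, 180, 80)),
    "Total": FieldLabel("l250.00", (300, 500, 380, 520)),
}

print(diff_labeling(reference, predicted))
```

In this sample, the Total field is marked red because its predicted value starts with a lowercase "l" instead of "1", which corresponds to a recognition issue.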

You can switch between these three modes by clicking their tabs on the toolbar.

If a field was labeled incorrectly when setting up the skill and the correct result was obtained when processing a document, you can correct the reference labeling. To do so, switch to the Difference mode and click the icon located above the value of the field that contains the labeling error.

The Field in Reference box will contain the value extracted using the reference labeling. Click Copy from Predicted to replace the incorrect value with the value extracted when processing the document.

Tip: A recognition issue means that one or more characters in the field value were not recognized correctly. To fix an error of this type, modify the properties of the field so that such characters are interpreted correctly.

Example: If a field may contain only numbers, set its data type to “Number.” This will prevent, say, the number “1” from being recognized as “l” (lowercase L) or “I” (uppercase i), either of which might look very much like “1” on a document.

If the Field in Reference box contains the correct value but the processing result is not correct, we recommend increasing the number of documents in the set and retraining the skill.

To go to the next document that contains the same type of error in the same field, click Go to Next Document in the Actions pane.
