Document Definition Wizard - ABBYY Documentation

How to create a Document Definition for forms

Select an image that will be used as a reference copy.

A Document Definition for forms is based on an image obtained by scanning a blank form. A blank form is required because you will indicate the position of each field on that image.

The image must be of high quality and free of distortions such as skew or shift. For color forms, do not apply any color filters when scanning the blank form, because the background must be kept intact in order to create a Document Definition. Background removal filters should be applied later, when scanning filled-out forms.
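The scanning rules above can be expressed as a small pre-flight check. This is only a sketch in Python: `ScanSettings` and `reference_scan_problems` are hypothetical names invented for illustration and are not part of any ABBYY API.

```python
from dataclasses import dataclass


@dataclass
class ScanSettings:
    """Hypothetical description of how a page was scanned (not an ABBYY type)."""
    is_blank_form: bool         # True for the reference copy
    is_color: bool              # True if the form is printed in color
    background_filter_on: bool  # True if a color/background removal filter was applied


def reference_scan_problems(settings: ScanSettings) -> list[str]:
    """Return the rules from the text that a reference scan violates."""
    problems = []
    if not settings.is_blank_form:
        problems.append("reference copy must be a scan of a blank form")
    if settings.is_color and settings.background_filter_on:
        problems.append("do not apply color filters when scanning a blank color form")
    return problems
```

An empty list means the scan settings are consistent with the guidance above. Image quality and distortion still have to be judged on the image itself.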

If your document contains several pages, load the first page and follow the recommendations provided in the Creating Document Definitions for multipage documents section to add the remaining pages.

Click Next to proceed to the next step.

Specify the main properties of the Document Definition, including its name and description and the language of your documents.

By default, only the languages for which dictionaries are provided are shown in the drop-down language list. To see all the available languages, select the Show all languages option. Be sure to specify the correct language. Specifying the wrong language will result in recognition errors. The languages that have the abbreviation “ICR” next to them can be used for recognizing handwritten, hand-printed and machine-printed text. If you don’t know the language of your documents in advance, you can specify several candidate languages from which the program will then select the appropriate language.

Specifying too many languages may slow down document processing and cause recognition errors.
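To illustrate the trade-off between listing several candidate languages and listing too many, here is a minimal Python sketch. The function name and the `max_recommended` threshold are assumptions made for this example. The documentation does not state a numeric limit, and nothing here calls an ABBYY API.

```python
def check_candidate_languages(languages, max_recommended=3):
    """Deduplicate candidate languages and warn about risky selections.

    max_recommended is an arbitrary illustrative value, not a documented limit.
    """
    # dict.fromkeys keeps the first occurrence of each name, in order.
    unique = list(dict.fromkeys(name.strip() for name in languages if name.strip()))
    warnings = []
    if not unique:
        warnings.append("no language specified")
    if len(unique) > max_recommended:
        warnings.append(
            f"{len(unique)} candidate languages selected; "
            "processing may be slower and recognition errors more likely"
        )
    return unique, warnings
```

For example, `check_candidate_languages(["English", "German", "English"])` keeps two candidates and raises no warning.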

The text type specified in this step will be used by default, but you can always change the text type for any field later (this may be necessary if different fields contain text in different languages).

Select the Use recognition settings from batch type option if you want to use the recognition settings specified for the batch type.

Disabling synchronization may slow down Document Definition matching.

Click Next to proceed to the next step.

Specify the types of fields that you wish to be detected automatically. The program will have no problem finding specially marked text entry fields or checkmarks inside boxes. However, if the text entry fields on your form have no special marking or if there are no special boxes for checkmarks next to explanatory text, you may want to indicate their possible positions manually.

The program will always attempt to detect anchors on forms.

Click Finish. The Document Definition Editor will open, where you need to mark up the fields and static elements on the page image and define their properties.

How to create a Document Definition for semi-structured or unstructured documents

Select an image that will be used as a reference copy (optional if you are creating a FlexiLayout).

If you want to use a FlexiLayout created in ABBYY FlexiLayout Studio, select the Load FlexiLayout option and specify the path to the file containing the FlexiLayout. For details, see the Creating a Document Definition based on a flexible description section.

A FlexiLayout can be created automatically if the Allow field location training option is selected.

Click Next to proceed to the next step.

Specify the main properties of the Document Definition, including its name and description and the language of your documents.

By default, only the languages for which dictionaries are provided are shown in the drop-down language list. To see all the available languages, select the Show all languages option. Be sure to specify the correct language. Specifying the wrong language will result in recognition errors. The languages that have the abbreviation “ICR” next to them can be used for recognizing handwritten, hand-printed and machine-printed text. If you don’t know the language of your documents in advance, you can specify several candidate languages from which the program will then select the appropriate language.

Specifying too many languages may slow down document processing and cause recognition errors.

The text type specified in this step will be used by default, but you can always change the text type for any field later (this may be necessary if different fields contain text in different languages).

Select the Use recognition settings from batch type option if you want to use the recognition settings specified for the batch type.

Disabling synchronization may slow down Document Definition matching.

Click Finish. The Document Definition Editor will open.

How to create a Document Definition for documents that do not require automatic data extraction

Select the source of the image used for the document sample (optional).

Click Next to proceed to the next step.

Specify the main properties of the Document Definition, including its name and description and the language of your documents.

By default, only the languages for which dictionaries are provided are shown in the drop-down language list. To see all the available languages, select the Show all languages option. Be sure to specify the correct language. Specifying the wrong language will result in recognition errors. The languages that have the abbreviation “ICR” next to them can be used for recognizing handwritten, hand-printed and machine-printed text. If you don’t know the language of your documents in advance, you can specify several candidate languages from which the program will then select the appropriate language.

Specifying too many languages may slow down document processing and cause recognition errors.

The text type specified in this step will be used by default, but you can always change the text type for any field later (this may be necessary if different fields contain text in different languages).

Select the Use recognition settings from batch type option if you want to use the recognition settings specified for the batch type.

Disabling synchronization may slow down Document Definition matching.

Click Finish.

How to create a Document Definition for a document set

From the list of all documents available in the project, select the documents that belong to the document set. If required, add a summary section to group the main fields of the set in one section, so that they can all be verified on the same data form.

Click Next to proceed to the next step.

Specify the main properties of the Document Definition, including its name and description and the language of your documents.

By default, only the languages for which dictionaries are provided are shown in the drop-down language list. To see all the available languages, select the Show all languages option. Be sure to specify the correct language. Specifying the wrong language will result in recognition errors. The languages that have the abbreviation “ICR” next to them can be used for recognizing handwritten, hand-printed and machine-printed text. If you don’t know the language of your documents in advance, you can specify several candidate languages from which the program will then select the appropriate language.

Specifying too many languages may slow down document processing and cause recognition errors.

The text type specified in this step will be used by default, but you can always change the text type for any field later (this may be necessary if different fields contain text in different languages).

Select the Use recognition settings from batch type option if you want to use the recognition settings specified for the batch type.

Disabling synchronization may slow down Document Definition matching.

Click Finish.
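The four procedures on this page share the same properties step and differ mainly in their first step. As a reading aid only, they can be summarized in a small Python mapping. The keys and step labels are paraphrases of this page and hypothetical names, not identifiers from any ABBYY product or API.

```python
# Illustrative summary of the wizard procedures described on this page.
# All names are hypothetical labels chosen for this sketch.
PROPERTIES_STEP = "Specify the name, description, and document language"

WIZARD_STEPS = {
    "forms": [
        "Select a reference image of a blank form",
        PROPERTIES_STEP,
        "Specify the field types to detect automatically",
    ],
    "semi-structured or unstructured": [
        "Select a reference image, or load a FlexiLayout",
        PROPERTIES_STEP,
    ],
    "no automatic data extraction": [
        "Select the source of the sample image (optional)",
        PROPERTIES_STEP,
    ],
    "document set": [
        "Select the documents that belong to the set",
        PROPERTIES_STEP,
    ],
}


def checklist(kind: str) -> list[str]:
    """Return numbered steps for one document kind, ending with Finish."""
    steps = WIZARD_STEPS[kind] + ["Click Finish"]
    return [f"{number}. {step}" for number, step in enumerate(steps, start=1)]


if __name__ == "__main__":
    for line in checklist("forms"):
        print(line)
```

Calling `checklist("document set")` yields three numbered lines, the last of which is the Finish step.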