To create a new Document Definition, or to create a document set from already enabled Document Definitions, select Project → Document Definitions… on the main menu, click New…, and then select the kind of documents you need to process.
Forms
Forms are documents with a fixed layout, i.e. the fields are positioned identically on all copies, each of which is an exact replica of the master form created by a designer.
  1. Select an image that will be used as a reference copy.
A Document Definition for forms is created from an image obtained by scanning a blank form. A blank form is required because it is on that image that you will indicate the position of each field.
Note: The image must be of high quality and free of distortions such as skew or shift. In the case of color forms, do not apply any color filters when scanning the blank form, because the background must be kept intact in order to create a Document Definition. Background removal filters should be applied later, when scanning filled-out forms.
If your document contains several pages, load the first page and follow the recommendations provided in the Creating Document Definitions for multipage documents section to add the remaining pages.
Click Next to proceed to the next step.
  2. Specify the main properties of the Document Definition, including its name and description and the language of your documents.
By default, only the languages for which dictionaries are provided are shown in the drop-down language list. To see all the available languages, select the Show all languages option. Be sure to specify the correct language: specifying the wrong language will result in recognition errors. The languages that have the abbreviation “ICR” next to them can be used for recognizing handwritten, hand-printed, and machine-printed text. If you don’t know the language of your documents in advance, you can specify several candidate languages, from which the program will then select the appropriate one.
Important! Specifying too many languages may slow down document processing and cause recognition errors.
The text type specified in this step will be used by default, but you can always change the text type for any field later (this may be necessary if different fields contain text in different languages).
Select the Use recognition settings from batch type option if you want to use the recognition settings specified for the batch type.
Note: Disabling synchronization may slow down Document Definition matching.
Click Next to proceed to the next step.
  3. Specify the types of fields that you wish to be detected automatically. The program will have no problem finding specially marked text entry fields or checkmarks inside boxes. However, if the text entry fields on your form have no special marking, or if there are no special boxes for checkmarks next to explanatory text, you may want to indicate their possible positions manually.
Note: The program will always attempt to detect anchors on forms.
Click Finish. The Document Definition Editor will open, where you need to mark up the fields and static elements on the page image and define their properties.
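The three steps above can be pictured as one record that is filled in before the Document Definition Editor opens. The following is a minimal sketch only: the class, its field names, and the two checks are hypothetical illustrations of the requirements stated above (a blank-form image and at least one language) and are not part of any ABBYY API.

```python
from dataclasses import dataclass, field


@dataclass
class FormDefinitionSettings:
    """Hypothetical record of the wizard choices for a form; not a product API."""

    reference_image: str  # step 1: path to the scan of the blank form
    name: str  # step 2: Document Definition name
    languages: list  # step 2: one or more recognition languages
    description: str = ""  # step 2: optional free text
    use_batch_type_settings: bool = False  # step 2: take settings from the batch type
    auto_detect: set = field(default_factory=set)  # step 3: field types to detect

    def problems(self):
        """Return a list of issues; an empty list means nothing was flagged."""
        issues = []
        if not self.reference_image:
            issues.append("forms need a blank-form reference image")
        if not self.languages:
            issues.append("specify at least one language")
        return issues


settings = FormDefinitionSettings(
    reference_image="blank_form.tif",  # placeholder file name
    name="Sample form",
    languages=["English"],
    auto_detect={"text entry fields", "checkmarks"},
)
print(settings.problems())
```

Keeping the checks in one place mirrors the wizard, which does not let you reach the editor until each step has been completed.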
Semi-structured or unstructured documents
In the case of semi-structured and unstructured documents, the field layout may vary from document to document. To extract fields from documents of this kind, a FlexiLayout is used. Additionally, Natural Language Processing (NLP) technologies may be used to extract fields from unstructured documents.
  1. Select an image that will be used as a reference copy (optional if you are creating a FlexiLayout).
If you want to use a FlexiLayout created in ABBYY FlexiLayout Studio, select the Load FlexiLayout option and specify the path to the file containing the FlexiLayout. For details, see the Creating a Document Definition based on a flexible description section.
A FlexiLayout can be created automatically if the Allow field location training option is selected.
Click Next to proceed to the next step.
  2. Specify the main properties of the Document Definition, including its name and description and the language of your documents.
By default, only the languages for which dictionaries are provided are shown in the drop-down language list. To see all the available languages, select the Show all languages option. Be sure to specify the correct language: specifying the wrong language will result in recognition errors. The languages that have the abbreviation “ICR” next to them can be used for recognizing handwritten, hand-printed, and machine-printed text. If you don’t know the language of your documents in advance, you can specify several candidate languages, from which the program will then select the appropriate one.
Important! Specifying too many languages may slow down document processing and cause recognition errors.
The text type specified in this step will be used by default, but you can always change the text type for any field later (this may be necessary if different fields contain text in different languages).
Select the Use recognition settings from batch type option if you want to use the recognition settings specified for the batch type.
Note: Disabling synchronization may slow down Document Definition matching.
Click Finish. The Document Definition Editor will open.
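The behavior of the language drop-down described in this step can be thought of as a simple filter over a language catalog. The sketch below is illustrative only: the `Language` record, the helper functions, and the sample catalog entries are invented for the example and do not reflect which languages the product actually ships with dictionaries or ICR support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Hypothetical catalog entry; the attributes mirror the text above."""

    name: str
    has_dictionary: bool  # shown in the list by default
    icr: bool = False  # marked "ICR": handwritten, hand-printed, machine-printed


def visible_languages(catalog, show_all=False):
    """Names shown in the drop-down; show_all mimics 'Show all languages'."""
    return [lang.name for lang in catalog if show_all or lang.has_dictionary]


def icr_languages(catalog):
    """Names of languages that carry the ICR marker."""
    return [lang.name for lang in catalog if lang.icr]


# Made-up catalog used only to exercise the filter.
CATALOG = [
    Language("Language A", has_dictionary=True, icr=True),
    Language("Language B", has_dictionary=True),
    Language("Language C", has_dictionary=False),
]

print(visible_languages(CATALOG))
print(visible_languages(CATALOG, show_all=True))
print(icr_languages(CATALOG))
```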
Documents that do not require automatic data extraction
These are documents that do not require automatic field detection. OCR technology may be employed to make full-text searches possible, or the documents may be left unrecognized. The aim of processing such documents is to digitize them and make them searchable by users, who will carry out searches based on the values of the key fields. For details, see the Document Definitions without field extraction section.
  1. Select the source of the image used for the document sample (optional).
Click Next to proceed to the next step.
  2. Specify the main properties of the Document Definition, including its name and description and the language of your documents.
By default, only the languages for which dictionaries are provided are shown in the drop-down language list. To see all the available languages, select the Show all languages option. Be sure to specify the correct language: specifying the wrong language will result in recognition errors. The languages that have the abbreviation “ICR” next to them can be used for recognizing handwritten, hand-printed, and machine-printed text. If you don’t know the language of your documents in advance, you can specify several candidate languages, from which the program will then select the appropriate one.
Important! Specifying too many languages may slow down document processing and cause recognition errors.
The text type specified in this step will be used by default, but you can always change the text type for any field later (this may be necessary if different fields contain text in different languages).
Select the Use recognition settings from batch type option if you want to use the recognition settings specified for the batch type.
Note: Disabling synchronization may slow down Document Definition matching.
Click Finish.
Document set
A document set is a collection of logically related documents. For a document set, a Document Definition is created that includes other Document Definitions and, optionally, a summary section with information gathered from the documents in the set. For details, see the Creating and setting up document sets section.
  1. From the list of all documents available in the project, select the documents that belong to the document set. If required, add a summary section to group the main fields of the set in one section, so that they can all be verified on the same data form.
Click Next to proceed to the next step.
  2. Specify the main properties of the Document Definition, including its name and description and the language of your documents.
By default, only the languages for which dictionaries are provided are shown in the drop-down language list. To see all the available languages, select the Show all languages option. Be sure to specify the correct language: specifying the wrong language will result in recognition errors. The languages that have the abbreviation “ICR” next to them can be used for recognizing handwritten, hand-printed, and machine-printed text. If you don’t know the language of your documents in advance, you can specify several candidate languages, from which the program will then select the appropriate one.
Important! Specifying too many languages may slow down document processing and cause recognition errors.
The text type specified in this step will be used by default, but you can always change the text type for any field later (this may be necessary if different fields contain text in different languages).
Select the Use recognition settings from batch type option if you want to use the recognition settings specified for the batch type.
Note: Disabling synchronization may slow down Document Definition matching.
Click Finish.
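As a rough mental model, a document set is a Document Definition that refers to other Document Definitions and may carry a summary section. The sketch below expresses that structure in code; the class name, attribute names, and sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentSetDefinition:
    """Hypothetical model of a document set; not a product API."""

    name: str
    members: list  # names of the Document Definitions included in the set
    summary_fields: list = field(default_factory=list)  # empty: no summary section

    @property
    def has_summary_section(self):
        """True when main fields are grouped for verification on one data form."""
        return bool(self.summary_fields)


# Sample values are invented for the example.
loan_set = DocumentSetDefinition(
    name="Sample set",
    members=["Application form", "Supporting letter"],
    summary_fields=["Applicant name", "Total amount"],
)
plain_set = DocumentSetDefinition(name="Set without summary", members=["Application form"])

print(loan_set.has_summary_section, plain_set.has_summary_section)
```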