Document Definition properties

The basic Document Definition properties, such as its name, language and writing style, are configured with the help of the Document Definition Wizard. Other properties are configured by default. You can view and change the properties of a Document Definition in the properties dialog box that opens when you select the menu item Document Definition → Document Definition Properties… in the Document Definition editor window. The dialog box has the following tabs:

The General tab

On this tab you can rename the Document Definition and enter or edit its description. The Enabled option includes the Document Definition in document processing or excludes it from processing.

The Recognition tab

For classification, FlexiLayout matching, and highlighting of text on images, the program uses a fast recognition pass called full-text recognition. This tab is used to specify the settings of full-text recognition. Please note that field recognition settings are specified in the field properties.

Prefer settings from batch type. Select this option if you want to synchronize full-text recognition settings with those of the batch type.
Note: Disabling the synchronization may lead to slower Document Definition matching.
Languages. A correct recognition language must be set for recognition to proceed without errors. This property defines both the language itself and related settings such as date format and currency.
Select a Recognition mode from the list:
- Fast mode. Colored and half-tone images are binarized (i.e. converted to black and white) prior to recognition. Fast recognition takes less time and provides mostly satisfactory results.
- Balanced mode. The program additionally takes into account image colors. Recognition in this mode is slower but delivers better quality.
  Note: No further modifications are planned for algorithms powering Balanced recognition mode, as the best possible speed and quality figures have been achieved.
- Normal mode is the default. It is also recommended when pre-recognition in Fast or Balanced mode results in multiple errors.
- Accurate mode is intended for extracting data from barely readable digital images or from poor-quality documents.
  Note: Accurate mode takes the most time and is therefore recommended for use only on problem images/documents.
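To illustrate what binarization in Fast mode means, here is a minimal Python sketch that converts a grayscale image, given as rows of 0-255 values, to black and white. The function name, the fixed threshold of 128, and the global-threshold approach itself are assumptions made for illustration; the documentation does not describe the algorithm the program actually uses.

```python
def binarize(gray_rows, threshold=128):
    """Convert a grayscale image (rows of 0-255 ints) to black (0) and white (1).

    Hypothetical illustration only: a fixed global threshold,
    not the algorithm used by the product.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray_rows]


# A tiny 2x3 "image": pixels below the threshold become 0, the rest become 1.
sample = [[12, 200, 127], [128, 90, 255]]
binary = binarize(sample)
```

Discarding color and gray levels in this way is what makes recognition faster, and it is also why modes that take image colors into account can deliver better quality.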
Advanced recognition settings…
- In the Correct page orientation, if page group, you may select one or several rotation options (180°, 90° clockwise, or 90° counter-clockwise) to be applied to a page when its orientation is determined automatically.
- If necessary, specify how the blank form is printed (Typographic, Matrix printer, Typewriter) in the Text type section, and select the Use pattern option to add a sample pattern.
- Barcodes. Parameters of barcode processing:
  - Disable barcode extraction. Select this option if barcodes should not be found on images. This will speed up document recognition considerably.
  - Extract 2D barcodes: Data Matrix, Aztec, QR Code. Select this option if your images contain barcodes of the specified types. If the option is not selected, Data Matrix, Aztec, and QR Code barcodes will not be found on images.
  - Extract post barcodes. Select this option if your images contain postal barcodes. If this option is not selected, postal barcodes will not be found on images.
  Important! Extracting barcodes slows down recognition.
CJK pre-recognition
- Separated furigana mode. Select this option to improve recognition of phonetic tips (furigana) in the Japanese language.
Named entity recognition: Extract named entities – extraction of information using NLP methods.
Note: Requires an NLP module and a specific license type.
Vertical text extraction – Vertical text extraction parameters:
- Extract for all languages – Detects vertically-oriented text written in any of the supported languages.
- Do not extract – Prevents the detection of vertically-oriented text.
- Extract for CJK languages – Detects vertical text written in Chinese, Japanese or Korean.
Click the Advanced… button to configure correction of linear and nonlinear image distortions, specify the direction of the scanner’s automatic feeder, etc.
Note: Select the Correct linear distortion option to specify the parameters of image stretching/compression by height and width. Images are scaled using existing anchors (black squares, crosses, or corners) as well as horizontal and vertical separators.
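The idea of scaling an image by its anchors can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the use of exactly two anchors, and the pixel coordinates are hypothetical, and the program's actual correction procedure is not described here.

```python
def scale_factors(expected_a, expected_b, found_a, found_b):
    """Return (sx, sy) that map the found anchor spacing to the expected spacing.

    Points are (x, y) tuples in pixels. Hypothetical helper: it assumes two
    anchors that are separated both horizontally and vertically.
    """
    dx_found = found_b[0] - found_a[0]
    dy_found = found_b[1] - found_a[1]
    if dx_found == 0 or dy_found == 0:
        raise ValueError("anchors must be separated horizontally and vertically")
    sx = (expected_b[0] - expected_a[0]) / dx_found
    sy = (expected_b[1] - expected_a[1]) / dy_found
    return sx, sy


# Anchors are expected 2000 px apart in width and 3000 px apart in height,
# but were found 1900 px and 3200 px apart on the scanned image.
sx, sy = scale_factors((100, 100), (2100, 3100), (100, 100), (2000, 3300))
```

A factor above 1 means the image has to be stretched along that axis, and a factor below 1 means it has to be compressed.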
Amount of Money – A combination of a numerical amount and a currency code or symbol. In order to avoid any recognition errors for visually similar characters like 1, I, and i, or s and $, a regular expression is used which allows letters only in certain combinations that represent currency codes, either preceding or immediately following the numerical amount. The major currency codes are listed in Currencies.

You can modify the list of possible currency codes and symbols if required. For example, if you know what currency codes and symbols may occur in your documents, removing any redundant currencies from the list will improve the quality of recognition. You can also add custom currency codes and symbols to the list. To modify the list, click the […] button on the right. In the Currency Symbols dialog box, you can add or remove currency codes or symbols. Alternatively, open the field properties dialog box, click the Data tab, and make the necessary changes. For more information, see Data types of the text entry field.
Note: A Document Definition can only have one list of possible currency codes and symbols. This list is applied to all Amount of Money fields.
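The approach described for Amount of Money fields, where letters are allowed only as known currency codes placed before or after the number, can be sketched in Python. The currency list, the number format, and the function names below are assumptions for illustration; the regular expression the program actually uses is not given in this document.

```python
import re

# Hypothetical currency list; in the product, the list is edited in the
# Currency Symbols dialog box.
CURRENCIES = ["USD", "EUR", "GBP", "$", "€", "£"]

# Assumed number format: digits, optional thousands commas, optional two decimals.
NUMBER = r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?"


def build_amount_pattern(currencies):
    """Build a pattern that accepts a currency before or after the amount."""
    # Longest entries first so that a longer code is tried before a shorter one.
    alternatives = "|".join(
        re.escape(c) for c in sorted(currencies, key=len, reverse=True)
    )
    currency = f"(?:{alternatives})"
    return re.compile(rf"{currency}\s?{NUMBER}|{NUMBER}\s?{currency}")


def is_amount(text, pattern):
    """Return True if the whole text is an amount with a listed currency."""
    return pattern.fullmatch(text.strip()) is not None


pattern = build_amount_pattern(CURRENCIES)
eur_only = build_amount_pattern(["EUR"])
```

With this sketch, `is_amount("USD 1,250.00", pattern)` and `is_amount("1250 EUR", pattern)` are accepted, while a misread value such as `"I5 USD"` is rejected because the letter I is not part of any listed currency. Building the pattern from a shorter list, as in `eur_only`, rejects `"$15.99"`, which mirrors how removing redundant currencies narrows what can be recognized.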

The Assembly tab

This tab is intended for configuring assembly rules for multipage documents. In the simplest scenario, the Document Definition comprises a single section that occurs once. If a Document Definition consists of several sections, this tab will show the list of their names. You can specify the number of occurrences of each section by modifying the numbers in the Min number and Max number columns.
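The effect of the Min number and Max number columns can be sketched in Python as a simple count check. The section names, the data layout, and the function name are hypothetical and serve only to illustrate the rule; this is not the product's assembly code.

```python
from collections import Counter


def check_section_counts(found_sections, limits):
    """Check how often each section occurs in one document.

    found_sections: section names, one entry per section instance found.
    limits: {section_name: (min_number, max_number)}.
    Returns a list of error messages; an empty list means the counts are valid.
    """
    counts = Counter(found_sections)
    errors = []
    for name, (min_number, max_number) in limits.items():
        n = counts.get(name, 0)
        if n < min_number or n > max_number:
            errors.append(f"{name}: found {n}, expected {min_number}..{max_number}")
    for name in counts:
        if name not in limits:
            errors.append(f"{name}: unknown section")
    return errors


# Hypothetical Document Definition: one header, 1 to 5 item sections,
# and an optional summary.
LIMITS = {"Header": (1, 1), "Items": (1, 5), "Summary": (0, 1)}
ok = check_section_counts(["Header", "Items", "Items"], LIMITS)
missing_header = check_section_counts(["Items"], LIMITS)
```

Setting Min number to 0 makes a section optional, as with the Summary section in this example.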

Use key fields equality assembling rule - enable this option if you want to check document assembly based on key fields. Then select a key field for each section in the Key Field column. When you input documents, only documents whose key field values match in every section will be considered correctly assembled. If the values do not match, an assembly error message will be displayed.
Use standard assembly rules - enable this option if you want to perform a check of document assembly using the following standard rules:
- Disable sections order check - enable this option if you want to disable the checks for the order of sections in the document (e.g. if the order of sections does not affect document assembly). The program will still check that all the sections are present in the document, but their order will be ignored.
- Enable annex pages - enable this option if you want to process documents with annexes. If processing of documents with annexes is enabled, you can also select the option Detect annexes using preset document structure, without analyzing (fast) to enable faster detection of annexes on the basis of the preset document structure.

Note: The Detect annexes using preset document structure, without analyzing (fast) option is effective only for documents created by means of separation during the import stage or by applying a special flag in the API. Such documents are excluded from the assembly.

Use custom assembly rules - enable this option if you want to check document assembly using a document assembly script. A custom assembly script can be executed either separately or together with the standard assembly rules. To start editing the script, click the Edit Assembly Script… button. The Script Editor window will open.
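The logic of the key fields equality rule described above can be sketched in plain Python. The function name, the data layout, and the error text are hypothetical; this is not the product's scripting API, only an illustration of the comparison that such a check performs.

```python
def check_key_fields(sections):
    """Compare key field values across the sections of one document.

    sections: list of (section_name, key_field_value) pairs.
    Returns None when all values match, otherwise an error message.
    """
    values = {value for _, value in sections}
    if len(values) <= 1:
        return None
    details = ", ".join(f"{name}={value!r}" for name, value in sections)
    return f"Assembly error: key field values do not match ({details})"


# Both sections carry the same invoice number, so the document is valid.
matching = check_key_fields([("Header", "INV-001"), ("Items", "INV-001")])
# A page from another invoice slipped in, so an error is reported.
mismatch = check_key_fields([("Header", "INV-001"), ("Items", "INV-002")])
```

A key field is typically a value printed on every section of the same document, such as the hypothetical invoice number used here.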

For details see Creating Document Definitions for multipage documents, Assembling pages into documents and Creating Document Definitions for documents with annexes.

The Rules tab

This tab is intended for managing Document Definition rules. You can create, edit, or delete rules. For details, see Rule validation.

The Export Destinations tab

This tab shows the current export settings of the given Document Definition. To change the export settings, click the Edit… button.

The Data Form tab

On this tab you can modify the font outline and size for displaying recognized data.

The Data Text Settings group contains font settings for displaying recognized values.
The Label Text Settings group contains settings for displaying the explanatory text (field names).

For details see Configuring data presentation in the document window.

The Data Sets tab

On this tab you can create and edit custom data sets. For details, see Using vendor and business unit databases.

The Event Handlers tab

On this tab you can specify event handlers for documents of the current type. For details, see Event Handlers.

The .NET References tab

On this tab you can add external assemblies to be used in scripts and global modules. Both standard and compiled user assemblies can be used. To add an assembly, click Add… and, in the dialog box that opens, select the type: Standard assembly name or Attached file. Depending on the selected type, either enter the standard assembly name or browse to an assembly file. For details, see External assemblies.