Document processing in ABBYY FineReader Engine consists of several steps: page preprocessing, analysis, recognition, page synthesis, document synthesis, and export. This section covers page preprocessing, analysis, recognition, and page and document synthesis. For details about export parameters, see Tuning Export Parameters. The processing stages, in order, are:
- Page preprocessing
  During this stage, FineReader Engine automatically improves image quality and corrects defects that can interfere with OCR, such as incorrect page orientation, inverted images, and geometric distortions.
- Layout analysis
  During analysis, FineReader Engine finds areas that contain different types of data. These areas are called “blocks.”
- Recognition
  The parts of the image inside the blocks are recognized in a way that depends on the block type.
- Page synthesis
  The text and background colors, hyperlinks, and other formatting are detected.
- Document synthesis
  Finally, the font styles and the logical structure of the document are recreated: FineReader Engine detects headings in the recognized document, reconstructs the table of contents, detects captions of pictures and tables, and restores other elements of the document structure.
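The ordering of these stages can be illustrated with a short Python model. It does not call FineReader Engine: the `Stage` enum, `next_stage`, and `run_all` are hypothetical names used only for illustration, and export is included as the final stage, as described above.

```python
from enum import IntEnum
from typing import List, Optional, Set


class Stage(IntEnum):
    """Hypothetical labels for the processing stages; the values give the order."""

    PAGE_PREPROCESSING = 1
    LAYOUT_ANALYSIS = 2
    RECOGNITION = 3
    PAGE_SYNTHESIS = 4
    DOCUMENT_SYNTHESIS = 5
    EXPORT = 6


def next_stage(completed: Set[Stage]) -> Optional[Stage]:
    """Return the earliest stage that has not run yet, or None if all have run."""
    for stage in Stage:  # enum iteration follows definition order, matching the values
        if stage not in completed:
            return stage
    return None


def run_all() -> List[Stage]:
    """Run the stages one at a time and return the order in which they ran."""
    completed: Set[Stage] = set()
    order: List[Stage] = []
    stage = next_stage(completed)
    while stage is not None:
        # A real stage would transform the image or the recognized data here;
        # this model only records that the stage ran.
        completed.add(stage)
        order.append(stage)
        stage = next_stage(completed)
    return order


if __name__ == "__main__":
    print([stage.name for stage in run_all()])
```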
Page processing
To set the processing parameters for each page, use the properties of the PageProcessingParams subobject of the DocumentProcessingParams object. PageProcessingParams is the parent of a group of objects that configure page processing:
- PagePreprocessingParams
- ColorObjectsProhibitingParams
- PageAnalysisParams
- ObjectsExtractionParams
- RecognizerParams
- SynthesisParamsForPage
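The parent/child relationship between these objects can be pictured with a small Python model. This is only an illustration, not the Engine's object model: the class names mirror the objects listed above, but the `ParamGroup` class, its `overrides` dictionary, the snake_case attribute names, and the `"language"` key in the usage example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ParamGroup:
    """Hypothetical stand-in for one parameter subobject.

    Only explicitly changed properties are stored; all others keep their defaults.
    """

    overrides: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PageProcessingParams:
    """Parent of the page-level parameter groups listed above."""

    page_preprocessing_params: ParamGroup = field(default_factory=ParamGroup)
    color_objects_prohibiting_params: ParamGroup = field(default_factory=ParamGroup)
    page_analysis_params: ParamGroup = field(default_factory=ParamGroup)
    objects_extraction_params: ParamGroup = field(default_factory=ParamGroup)
    recognizer_params: ParamGroup = field(default_factory=ParamGroup)
    synthesis_params_for_page: ParamGroup = field(default_factory=ParamGroup)


@dataclass
class DocumentProcessingParams:
    """Top-level container; PageProcessingParams is one of its subobjects."""

    page_processing_params: PageProcessingParams = field(
        default_factory=PageProcessingParams
    )


if __name__ == "__main__":
    params = DocumentProcessingParams()
    # Record only the (hypothetical) property that should differ from its default.
    params.page_processing_params.recognizer_params.overrides["language"] = "English"
    print(params.page_processing_params.recognizer_params)
```

Using `default_factory` gives every new `DocumentProcessingParams` its own freshly initialized subobjects, so changing one parameter set never leaks into another.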
Document processing
To set document processing parameters, in addition to the page processing parameters you also need to set the document synthesis parameters via the SynthesisParamsForDocument object. During document synthesis, font styles and formatting are detected. FineReader Engine objects that deal with document fonts and styles become meaningful only after document synthesis. You may skip the document synthesis stage in the following cases:
- If you are going to export the recognized text to TXT format. Synthesis information is not used when exporting to this format.
- If you are going to export a document to PDF ImageOnly format. The recognized text and layout information are not used in this mode.
Methods with the word “Process” in their names (for example, IFRDocument::Process) include the document synthesis stage. The processing methods of the FRPage object do not, so after calling them and before export, you must explicitly call a method that performs document synthesis.
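The two rules above (when document synthesis can be skipped, and which methods include it) can be expressed as a toy Python model. `requires_document_synthesis`, `ToyDocument`, and the format strings `"TXT"`, `"PDF_IMAGE_ONLY"`, and `"DOCX"` are hypothetical names for illustration, not FineReader Engine identifiers; `process` stands in for a method that includes synthesis, and `process_pages` for FRPage-level processing that does not.

```python
from dataclasses import dataclass, field
from typing import List

# Export formats that, per the text above, do not use document synthesis results.
# These strings are hypothetical labels, not Engine enumeration values.
FORMATS_WITHOUT_SYNTHESIS = {"TXT", "PDF_IMAGE_ONLY"}


def requires_document_synthesis(export_format: str) -> bool:
    """True if the document must be synthesized before export to this format."""
    return export_format.upper() not in FORMATS_WITHOUT_SYNTHESIS


@dataclass
class ToyDocument:
    """Toy model of a document; this is not the FRDocument API."""

    stages_run: List[str] = field(default_factory=list)
    synthesized: bool = False

    def process_pages(self) -> None:
        # Models FRPage-level processing: page stages only, no document synthesis.
        self.stages_run += ["preprocessing", "analysis", "recognition", "page synthesis"]

    def synthesize(self) -> None:
        # Models an explicit call to a method that performs document synthesis.
        self.stages_run.append("document synthesis")
        self.synthesized = True

    def process(self) -> None:
        # Models a "Process" method: page stages followed by document synthesis.
        self.process_pages()
        self.synthesize()

    def export(self, export_format: str) -> str:
        # Refuse formats that need synthesis results if synthesis has not run.
        if requires_document_synthesis(export_format) and not self.synthesized:
            raise RuntimeError(
                f"document synthesis must run before export to {export_format}"
            )
        return f"exported as {export_format}"
```

In this sketch, calling only `process_pages` and then exporting to a format other than TXT or PDF ImageOnly fails until `synthesize` is called, which reflects the requirement to perform document synthesis explicitly before export.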
Tuning document processing
To tune document processing with the parameter objects described above, follow these steps:
- Create a DocumentProcessingParams object using the CreateDocumentProcessingParams method of the Engine object.
- Set the required properties of the PageProcessingParams subobject. You do not need to set every property of every subobject, because on creation they are initialized with reasonable defaults. Change only the properties whose values should differ from the defaults.
- If necessary, set the required properties of the SynthesisParamsForDocument subobject. Here, too, only the properties that should differ from their defaults need to be changed. Make sure that the PerformSynthesis property of the DocumentProcessingParams object is set to true.
- Pass the DocumentProcessingParams object, or a set of its subobjects, to one of the processing methods of the FRDocument, FRPage, or Engine object.
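The procedure can be sketched in Python as follows. Everything here is a stand-in: `create_document_processing_params` and `process` are hypothetical functions that only mimic the roles of CreateDocumentProcessingParams and the processing methods, the `"language"` key is an illustrative property name, and the default `perform_synthesis = True` is an assumption of the model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DocumentProcessingParams:
    """Toy stand-in: each dict holds only the properties changed from their defaults."""

    page_processing_params: Dict[str, Any] = field(default_factory=dict)
    synthesis_params_for_document: Dict[str, Any] = field(default_factory=dict)
    # Assumed default for this model; the text only says to check that it is true.
    perform_synthesis: bool = True


def create_document_processing_params() -> DocumentProcessingParams:
    """Step 1 (hypothetical factory): a fresh object with all defaults."""
    return DocumentProcessingParams()


def process(params: DocumentProcessingParams) -> List[str]:
    """Step 4 (hypothetical processing call): report which stages would run."""
    stages = ["page processing"]
    if params.perform_synthesis:
        stages.append("document synthesis")
    return stages


if __name__ == "__main__":
    params = create_document_processing_params()
    # Step 2: override only what must differ from the defaults
    # ("language" is an illustrative key, not a real property name).
    params.page_processing_params["language"] = "English"
    # Step 3: synthesis settings are optional here; keep synthesis enabled.
    assert params.perform_synthesis
    print(process(params))
```

Storing only the overridden values keeps the sketch close to the advice above: anything not explicitly set simply keeps its default.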
Samples:
- Linux: CustomLanguage
- Windows: CustomLanguage and VisualComponents; demo tools: MultiProcessingRecognition and PDFExportProfiles
