- Preprocessing of scanned images
- Simultaneous recognition of a large volume of documents
- Export to an archive format
Scenario implementation
The code samples provided in this topic are Windows-specific.
Step 1. Loading ABBYY FineReader Engine
To start working with ABBYY FineReader Engine, create the Engine object. The Engine object is the top object in the hierarchy of ABBYY FineReader Engine objects; it provides various global settings, some processing methods, and methods for creating the other objects.
To create the Engine object, you can use the InitializeEngine function. See also other ways to load the Engine object (Win).
Step 2. Loading settings for the scenario
ABBYY FineReader Engine lets you load the processing settings best suited to this scenario with the LoadPredefinedProfile method of the Engine object. This method takes the profile name as an input parameter. Please see Working with Profiles for more information.
ABBYY FineReader Engine supports two variants of settings for this scenario:

| Profile name | Description |
|---|---|
| DocumentArchiving_Accuracy | The settings have been optimized for accuracy. |
| DocumentArchiving_Speed | The settings have been optimized for processing speed. |

If you wish to change processing settings, use the appropriate parameter objects. Please see Additional optimization for specific tasks for further information.
Step 3. Loading and preprocessing the images
ABBYY FineReader Engine provides the FRDocument object, which allows processing of multi-page documents. Using this object preserves the logical organization of the document.
To load the images of a single document and preprocess them, create the FRDocument object and add images to it. You can do one of the following:
- Create the FRDocument object using the CreateFRDocumentFromImage method of the Engine object. This method creates the FRDocument object and loads images from the specified file.
- Create the FRDocument object with the help of the CreateFRDocument method of the Engine object, then add images to the created FRDocument object from file (use the AddImageFile, AddImageFileWithPassword, or AddImageFileWithPasswordCallback method of the FRDocument object).
Step 4. Document recognition
To recognize a document, we suggest using the analysis and recognition methods of the FRDocument object. This object provides a whole array of methods for document analysis, recognition, and synthesis. The most convenient of them is the Process method, which performs analysis, recognition, and synthesis in a single call. It also makes the most efficient use of the simultaneous processing capabilities of multiprocessor and multicore systems. However, you can also perform preprocessing, analysis, recognition, and synthesis consecutively using the Preprocess, Analyze, Recognize, and Synthesize methods.
Step 5. Document export
To save a recognized document, use the Export method of the FRDocument object, passing a FileExportFormatEnum constant as one of the parameters. In this scenario, you can save the document, for example, to PDF with MRC in the PEM_ImageOnText export mode (the TextExportMode property of the PDFExportParams object). You can change the default export parameters using the corresponding export object. Please see Additional optimization for specific tasks below for further information.
After you have finished working with the FRDocument object, release all the resources it used by calling the IFRDocument::Close method.
Step 6. Unloading ABBYY FineReader Engine
After finishing your work with ABBYY FineReader Engine, you need to unload the Engine object. To do this, use the DeinitializeEngine exported function.
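The order of calls in Steps 2 to 5 can be pictured with a small, portable sketch. The types below (MockEngine, MockDocument, DocumentCloser) and the archiveDocument function are hypothetical stand-ins written for this illustration; they are not the ABBYY FineReader Engine interfaces. Only the method names are borrowed from the text above, their parameters are simplified, and the real methods may take arguments that are not modeled here. Loading and unloading the Engine (Steps 1 and 6) are left out because they depend on the platform-specific loader.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a document object. It only records which calls were made;
// it is NOT the real FRDocument interface.
struct MockDocument {
    std::vector<std::string>& log;
    void AddImageFile(const std::string& path) { log.push_back("AddImageFile " + path); }
    void Process() { log.push_back("Process"); }
    void Export(const std::string& path) { log.push_back("Export " + path); }
    void Close() { log.push_back("Close"); }
};

// Stand-in for the Engine object (hypothetical, for illustration only).
struct MockEngine {
    std::vector<std::string> log;
    void LoadPredefinedProfile(const std::string& name) {
        log.push_back("LoadPredefinedProfile " + name);
    }
    MockDocument CreateFRDocument() {
        log.push_back("CreateFRDocument");
        return MockDocument{log};
    }
};

// Calls Close() when the scope ends, so the document's resources are
// released even if a processing step throws.
class DocumentCloser {
public:
    explicit DocumentCloser(MockDocument& document) : document_(document) {}
    ~DocumentCloser() { document_.Close(); }
    DocumentCloser(const DocumentCloser&) = delete;
    DocumentCloser& operator=(const DocumentCloser&) = delete;
private:
    MockDocument& document_;
};

// Steps 2 to 5 in order: load the profile, create the document, add the
// images, process, export. Close() runs last via DocumentCloser.
void archiveDocument(MockEngine& engine,
                     const std::string& profileName,
                     const std::vector<std::string>& images,
                     const std::string& outputPath) {
    assert(!images.empty());  // a document needs at least one image
    engine.LoadPredefinedProfile(profileName);
    MockDocument document = engine.CreateFRDocument();
    DocumentCloser closer(document);
    for (const std::string& image : images) {
        document.AddImageFile(image);
    }
    document.Process();
    document.Export(outputPath);
}
```

Tying Close to scope exit is a design choice of this sketch rather than something the topic prescribes; it keeps the release step from being skipped when an earlier call fails.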
Required resources
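The file selection that this section describes, filtering FREngineDistribution.csv by its fifth column (RequiredByModule), can be scripted. The following is a minimal sketch under several assumptions that the topic does not confirm: that fields are separated by a comma (the delimiter is a parameter in case the file uses another one), that values containing commas, such as "Opening, Processing", are enclosed in double quotes, and that the first column holds the file name. The function names splitCsvLine and selectRequiredFiles are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Split one CSV record into fields. Double-quoted fields are supported so
// that a value such as "Opening, Processing" stays in a single field; a
// doubled quote inside a quoted field stands for one literal quote.
std::vector<std::string> splitCsvLine(const std::string& line, char delimiter = ',') {
    assert(delimiter != '"');  // the quote character cannot be the delimiter
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';  // escaped quote
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                current += c;
            }
        } else if (c == '"') {
            inQuotes = true;
        } else if (c == delimiter) {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);
    return fields;
}

// Return the first column (assumed to hold the file name) of every row whose
// fifth column (RequiredByModule) exactly matches one of the wanted values.
std::vector<std::string> selectRequiredFiles(const std::vector<std::string>& rows,
                                             const std::set<std::string>& wantedModules,
                                             char delimiter = ',') {
    std::vector<std::string> files;
    for (const std::string& row : rows) {
        const std::vector<std::string> fields = splitCsvLine(row, delimiter);
        if (fields.size() < 5) {
            continue;  // skip short or malformed rows
        }
        if (wantedModules.count(fields[4]) > 0) {
            files.push_back(fields[0]);
        }
    }
    return files;
}
```

To use it, read the file into a vector of lines, skip the header row if there is one, and pass the set of RequiredByModule values that apply to your scenario.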
You can use the FREngineDistribution.csv file to automatically create a list of files required for your application to function. For processing with this scenario, select the following values in column 5 (RequiredByModule):
- Core
- Core.Resources
- Opening
- Opening, Processing
- Processing
- Processing.OCR
- Processing.OCR, Processing.ICR
- Processing.OCR.NaturalLanguages
- Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages
- Export
- Export, Processing
- Export.Pdf
- Export.Pdf, Opening.Pdf

If you modify the standard scenario, change the required modules accordingly. You also need to specify the interface languages, recognition languages, and any additional features that your application uses (for example, Opening.PDF if you need to open PDF files, or Processing.OCR.CJK if you need to recognize texts in CJK languages). See Working with the FREngineDistribution.csv File for further details.

Additional optimization for specific tasks
Below is an overview of the Help topics that contain additional information on customizing settings at different stages of document processing:
- Scanning (Windows only)
  - Scanning
    Description of the ABBYY FineReader Engine scenario for document scanning.
- Recognition
  - Tuning Parameters of Preprocessing, Analysis, Recognition, and Synthesis
    Customization of document processing using objects of analysis, recognition and synthesis parameters.
  - Recognize handwriting
    The DocumentArchiving_*** profiles do not include handwritten or handprinted text recognition. If you need to recognize handwriting, set the DetectHandwritten property of the PageAnalysisParams object to TRUE.
  - PageProcessingParams Object
    This object enables customization of analysis and recognition parameters. Using this object, you can indicate which image and text characteristics must be detected (inverted image, orientation, bar codes, recognition language, recognition error margin).
  - SynthesisParamsForPage Object
    This object includes parameters responsible for restoring page formatting during synthesis.
  - SynthesisParamsForDocument Object
    This object enables customization of the document synthesis: restoration of its structure and formatting.
  - MultiProcessingParams Object (implemented for Linux and Windows)
    Simultaneous processing may be useful when processing a large number of images. In this case, the processing load is spread over the processor cores during image opening and preprocessing, layout analysis, recognition, and export, which makes it possible to speed up processing.
    Reading modes (simultaneous or consecutive) are set using the MultiProcessingMode property, and the RecognitionProcessesCount property controls the number of processes which may be started.
- Export
  - Tuning Export Parameters
    Customization of document export using objects of export parameters.
  - PDFExportParams Object
    This object allows you to tune PDF (PDF/A) export with only a few parameters.
    - To customize the PDF (PDF/A) format export mode, use the TextExportMode property of the PDFExportParams object, and to customize MRC settings, use the MRCMode property.
    - In addition, you can customize image export settings to ensure faster processing, additional reduction of file size, etc. For example, you can save a color image as a grayscale or black-and-white image, if this fits your scenario (use the Colority property of the PDFExportParams object).
    - You can change the image resolution so that the resulting electronic copy can later be printed or viewed on a computer screen, or you can select a low resolution that keeps the text readable but gives very poor quality of graphics (use the Resolution and ResolutionType properties of the PDFExportParams object).
- Separation into documents
  - Under this scenario, the batch of images may have to be separated into documents. ABBYY FineReader Engine 12 does not support automatic document separation. However, you can use ABBYY FlexiCapture Engine to implement automatic separation. The documents may be separated, for instance, based on the number of pages in a document or based on pages that carry separating barcodes. When implementing barcode separation, you can use the scenario that extracts only barcode values from the document.
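The two separation rules mentioned above, a fixed number of pages per document and separator pages marked by a barcode, can be sketched independently of any engine. The code below is an illustration only: the function names are hypothetical, the pages are represented by their zero-based indices, and the flags saying which pages carry a separating barcode are assumed to come from an earlier barcode extraction pass that is not shown. Dropping the separator pages from the output is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Zero-based page indices that belong to one document.
using PageGroup = std::vector<std::size_t>;

// Fixed-size separation: every document receives pagesPerDocument pages;
// the last document may be shorter.
std::vector<PageGroup> splitByPageCount(std::size_t pageCount,
                                        std::size_t pagesPerDocument) {
    assert(pagesPerDocument > 0);
    std::vector<PageGroup> documents;
    for (std::size_t page = 0; page < pageCount; ++page) {
        if (page % pagesPerDocument == 0) {
            documents.emplace_back();  // start a new document
        }
        documents.back().push_back(page);
    }
    return documents;
}

// Separator-based separation: isSeparator[i] is true when page i carries a
// separating barcode. A separator page starts a new document and is itself
// left out of the result; consecutive separators produce no empty documents.
std::vector<PageGroup> splitBySeparatorPages(const std::vector<bool>& isSeparator) {
    std::vector<PageGroup> documents;
    bool startNew = true;
    for (std::size_t page = 0; page < isSeparator.size(); ++page) {
        if (isSeparator[page]) {
            startNew = true;
            continue;
        }
        if (startNew) {
            documents.emplace_back();
            startNew = false;
        }
        documents.back().push_back(page);
    }
    return documents;
}
```

Each resulting group of page indices could then be loaded into its own document object and processed as described in Steps 3 to 5.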
