- Preprocessing of scanned images or photos
- Recognition of small text fragments
- Working with the recognized data
Scenario implementation
The code samples provided in this topic are Windows-specific.
Step 1. Loading ABBYY FineReader Engine
To start working with ABBYY FineReader Engine, you need to create the Engine object. The Engine object is the top object in the hierarchy of ABBYY FineReader Engine objects; it provides various global settings, some processing methods, and methods for creating the other objects.
To create the Engine object, you can use the InitializeEngine function. See also other ways to load the Engine object (Win).
C#
C++ (COM)
Step 2. Loading settings for the scenario
To select the most suitable settings, call the LoadPredefinedProfile method of the Engine object. This method receives the profile name as its input parameter; for this scenario, use the predefined profile called FieldLevelRecognition. For more about profiles, see Working with Profiles.
If you wish to change the settings used for processing, use the corresponding parameter objects. See the Additional optimization section below for more information.
C#
C++ (COM)
Step 3. Loading and preprocessing the images
ABBYY FineReader Engine provides a FRDocument object for processing multi-page documents. To load the images of a document and preprocess them, you should create the FRDocument object and add images to it. You can do one of the following:
- Create an FRDocument object using the CreateFRDocumentFromImage method of the Engine object. This method creates an FRDocument object and loads images from a specified file.
- Create an FRDocument object with the help of the CreateFRDocument method of the Engine object, then add the images into the created FRDocument object from a file (use the AddImageFile, AddImageFileWithPassword, or AddImageFileWithPasswordCallback method of the FRDocument object).
C#
C++ (COM)
Step 4. Setting up the fields to be recognized
Now you need to create blocks that contain your fields and, for each block, specify its type and the known characteristics of the data inside.
Perform layout analysis of the document using the Analyze method, or manually add blocks that contain the fields you need to recognize. See Working with Layout and Blocks for instructions.
For each field, you can now specify its own recognition parameters. For example, if a field contains text, use the ITextBlock::RecognizerParams property:
- set the text type with the help of the TextTypes property of the RecognizerParams object. For example, if the field contains digits written in ZIP-code style, use the TT_Index text type.
- set the language using the SetPredefinedTextLanguage method. Using special predefined languages (Windows only) can be helpful if you know the type of information contained in the field. E.g., if the field contains an address in the US, select the English_US_Address predefined language. This will ensure that the text is recognized more reliably.
- set the SaveCharacterRecognitionVariants and SaveWordRecognitionVariants properties of the RecognizerParams object if you need to use the recognition variants for further verification of the result, as described below in step 6. Note that this setting is not available for handwritten or handprinted texts.
C#
C++ (COM)
Step 5. Recognition
As the document layout has already been analyzed and additionally modified by you, do not call the analysis methods again. Use the Recognize method, which performs recognition and page synthesis for all pages in the document. In this scenario, you need to extract the data from fields rather than export the recognized document, so you do not need document synthesis.
C#
C++ (COM)
Step 6. Working with the recognized data
Use the Text object to access the recognized text fragment (you can get this object for a text block via the ITextBlock::Text property). Use the Paragraphs property to get the collection of paragraphs in the fragment and the IParagraphs::Item method to access the individual paragraphs. The IParagraph::Text property provides access to the recognized text of a paragraph.
You can use the IParagraph::Words property to get the collection of words in a paragraph. Use the IWords::Item method to access individual words in the collection. The IWord::Text property returns the string that contains the recognized word. Use the GetRecognitionVariants method of the Word object or the GetWordRecognitionVariants method of the Paragraph object to get the recognition variants for a word.
The attributes of individual characters can be accessed via the GetCharParams method of the Paragraph object. This method provides access to the CharParams object, which contains the parameters of the recognized character. The recognition variants for a character are accessible via the ICharParams::CharacterRecognitionVariants property.
For detailed information on working with text, see Working with Text. For information on using the Engine in voting algorithms, see Using Voting API.
After you have finished your work with the FRDocument object, release all the resources that were used by this object by calling the IFRDocument::Close method.
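The use of word recognition variants for verification, mentioned in this step, can be illustrated with a small standard C++ sketch. It does not call the Engine API: the WordVariant structure, its confidence field, and the pickValidVariant function are hypothetical names introduced for this example, and the sketch assumes that your application has already copied the variant texts and some confidence value out of the Engine objects. The idea shown is to keep only the variants that match the known format of the field and take the most confident of them.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for one recognition variant of a word.
// A real application would fill these values from the Engine's
// recognition variant objects; the field names are illustrative only.
struct WordVariant {
    std::string text;  // candidate text for the word
    int confidence;    // assumed: non-negative, higher means more reliable
};

// Returns a pointer to the most confident variant whose whole text matches
// the expected field format, or nullptr if no variant matches.
// The pointer refers to an element of `variants` and is valid only while
// that vector is alive and unmodified.
const WordVariant* pickValidVariant(const std::vector<WordVariant>& variants,
                                    const std::regex& expectedFormat) {
    const WordVariant* best = nullptr;
    for (const WordVariant& variant : variants) {
        assert(variant.confidence >= 0);  // assumption of this sketch
        if (!std::regex_match(variant.text, expectedFormat)) {
            continue;  // the variant violates the known field format
        }
        if (best == nullptr || variant.confidence > best->confidence) {
            best = &variant;
        }
    }
    return best;
}
```

For a five-digit ZIP code field, the expected format could be std::regex("[0-9]{5}"). A variant such as "9O210" (with the letter O) is then rejected even if it has the highest confidence, and a lower-ranked variant "90210" is accepted. If the function returns nullptr, the field is a candidate for manual verification.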
Step 7. Unloading ABBYY FineReader Engine
After finishing your work with ABBYY FineReader Engine, you need to unload the Engine object. To do this, use the DeinitializeEngine exported function.
C#
C++ (COM)
Required resources
You can use the FREngineDistribution.csv file to automatically create a list of files required for your application to function. For processing with this scenario, select the following values in column 5 (RequiredByModule):
- Core
- Core.Resources
- Opening
- Opening, Processing
- Processing
- Processing.OCR
- Processing.OCR, Processing.ICR
- Processing.OCR.NaturalLanguages
- Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages
If you modify the standard scenario, change the required modules accordingly. You also need to specify the interface languages, recognition languages, and any additional features that your application uses (for example, Opening.PDF if you need to open PDF files, or Processing.OCR.CJK if you need to recognize texts in CJK languages). See Working with the FREngineDistribution.csv File for further details.
Additional optimization
These are the sections of the help file where you can find additional information about setting up the parameters for the various processing stages:
- Recognition
  - Working with Languages
    Using built-in and custom recognition languages.
  - Working with Dictionaries
    Using dictionaries to improve recognition quality.
  - Recognizing Words with Spaces
    Using dictionaries to recognize words with spaces (such as New York, etc.).
  - Recognizing Handwritten Texts
    Using ICR (Intelligent Character Recognition).
  - Recognizing Checkmarks
    Setting up recognition of checkmarks and groups of checkmarks.
  - Special Predefined Languages in ABBYY FineReader Engine - Windows
    The list of recognition languages that contain special language units: addresses, date and time, human names, etc. These languages can be used for field recognition.
- Working with the recognized data
  - Working with Text
    Working with the recognized text, paragraphs, words, and characters.
  - Using Voting API
    Working with words and character recognition alternatives.
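The module selection described in the Required resources section can be automated. The following standard C++ sketch keeps the rows of a delimiter-separated file whose column 5 exactly matches one of the values listed for this scenario. It is not ABBYY tooling, and it rests on assumptions: the function names are hypothetical, the delimiter is passed in by the caller because the actual delimiter of FREngineDistribution.csv is not stated in this topic, and quoted fields are not handled. Check the file itself and the Working with the FREngineDistribution.csv File topic before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Column 5 (RequiredByModule) values listed for this scenario.
const std::set<std::string> kFieldLevelModules = {
    "Core",
    "Core.Resources",
    "Opening",
    "Opening, Processing",
    "Processing",
    "Processing.OCR",
    "Processing.OCR, Processing.ICR",
    "Processing.OCR.NaturalLanguages",
    "Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages",
};

// Splits one line into fields on the given delimiter.
// Assumption: fields are not quoted and do not contain the delimiter.
std::vector<std::string> splitLine(const std::string& line, char delimiter) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream stream(line);
    while (std::getline(stream, field, delimiter)) {
        fields.push_back(field);
    }
    return fields;
}

// Returns every input line whose `column`-th field (1-based) is one of
// the wanted module values. Lines with fewer fields are skipped.
std::vector<std::string> selectRows(std::istream& input,
                                    const std::set<std::string>& wantedModules,
                                    char delimiter,
                                    std::size_t column = 5) {
    assert(column >= 1);  // column numbers are 1-based, as in the text
    std::vector<std::string> selected;
    std::string line;
    while (std::getline(input, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // tolerate Windows line endings
        }
        const std::vector<std::string> fields = splitLine(line, delimiter);
        if (fields.size() < column) {
            continue;
        }
        if (wantedModules.count(fields[column - 1]) > 0) {
            selected.push_back(line);
        }
    }
    return selected;
}
```

The comparison is an exact match on the whole column value, because the section lists combined values such as "Opening, Processing" as values in their own right. If you modify the standard scenario, extend the set with the additional module values your application needs.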
