One of the main recognition parameters is the language used during recognition, so it is important to set the right language before analysis and recognition. You can specify the recognition language with the IRecognizerParams::SetPredefinedTextLanguage method, which sets the IRecognizerParams::TextLanguage property. By default, this property is initialized with the English recognition language. You can also use language autodetection (see the IRecognizerParams::LanguageDetectionMode property for details). Below you can find information about the languages supported in ABBYY FineReader Engine by default and the objects that provide advanced functionality for working with recognition languages.
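The relationship between the method and the property can be sketched as follows. This is a minimal Python model, not ABBYY's API binding: the `RecognizerParams` and `TextLanguage` classes are hypothetical stand-ins that borrow only the member names mentioned above, so the snippet runs without the Engine installed.

```python
class TextLanguage:
    """Hypothetical stand-in for the TextLanguage object."""

    def __init__(self, internal_name):
        self.InternalName = internal_name


class RecognizerParams:
    """Hypothetical stand-in for the RecognizerParams object."""

    def __init__(self):
        # The text states that English is the default recognition language.
        self.TextLanguage = TextLanguage("English")

    def SetPredefinedTextLanguage(self, name):
        # Calling the method replaces the language held by the property.
        self.TextLanguage = TextLanguage(name)


params = RecognizerParams()
default_name = params.TextLanguage.InternalName

params.SetPredefinedTextLanguage("German")
selected_name = params.TextLanguage.InternalName

print(default_name, "->", selected_name)
```

In a real application, the same two steps would be performed on the RecognizerParams object obtained from the Engine, using one of the internal names of the predefined languages.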
Predefined languages
ABBYY FineReader Engine provides a set of languages supported by default. These languages are called “predefined languages.” The collection of available predefined languages represented by the PredefinedLanguages object is accessible via the PredefinedLanguages property of the Engine object. It is a collection of PredefinedLanguage objects. The predefined languages are identified by their internal names. You may directly specify a recognition language by the name of the corresponding predefined language via the IRecognizerParams::SetPredefinedTextLanguage method. For the list of the internal names of the predefined languages, see Predefined Languages in ABBYY FineReader Engine.
Recognition language for a text
The language used during recognition is represented by the TextLanguage object. The RecognizerParams object, which specifies the recognition parameters, stores a reference to the TextLanguage object. The recognition functions take this object either as a subobject of the PageProcessingParams object passed to them as an input parameter or from a block in a Layout object. The TextLanguage object exposes the following main properties:
- Internal name. We recommend choosing a unique internal name for each new language. The internal names of the languages supplied in the ABBYY FineReader Engine distribution pack are already unique.
- Letter sets. The TextLanguage object contains the following letter sets: punctuation marks that may be encountered between words, prohibited characters, and additional punctuation marks that go immediately before and after words.
- Prohibiting dictionaries. You can create a collection of prohibiting dictionaries using the ProhibitingDictionaries property of the TextLanguage object. Words from these dictionaries are not used as variants of a recognized word. However, if no other variants are left and a prohibited word is the only option, words from these dictionaries may still appear in the recognized text. See Working with Dictionaries.
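The fallback rule for prohibiting dictionaries can be illustrated with a short sketch. It shows only the rule as stated above, not the Engine's internal algorithm: the `choose_variant` function, the ordering of variants from most to least likely, and the sample words are all assumptions made for this example.

```python
def choose_variant(variants, prohibited):
    """Pick a word variant, avoiding prohibited words when possible.

    `variants` is assumed to be ordered from most to least likely;
    `prohibited` is a set of words from the prohibiting dictionaries.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    allowed = [word for word in variants if word not in prohibited]
    # Prefer the best allowed variant; fall back to a prohibited word
    # only when every variant is prohibited.
    return allowed[0] if allowed else variants[0]


prohibited_words = {"teh", "adn"}

# A non-prohibited variant exists, so the prohibited one is skipped.
normal_case = choose_variant(["teh", "the", "ten"], prohibited_words)

# Only a prohibited variant is available, so it is still returned.
fallback_case = choose_variant(["teh"], prohibited_words)

print(normal_case, fallback_case)
```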
Recognition language for characters
During recognition, the text is separated into words, with one or several recognition languages corresponding to each word. One recognition language is assigned to each character in a word. This recognition language is represented by the BaseLanguage object and is accessible via the ITextLanguage::BaseLanguages property. The BaseLanguage object has the following properties:
- Internal name. We recommend choosing a unique internal name for each new language. The internal names of the languages supplied with the ABBYY FineReader Engine distribution pack are already unique.
- Letter sets. A letter set comprises letters that form the alphabet of the language, letters that form its extended alphabet (used in loan words), punctuation marks that go immediately before and after words, characters that are allowed inside words but are ignored by the internal spelling check system, and symbols allowed in subscript and superscript.
- Dictionary. A recognition language for a word may have a dictionary attached to it. See Working with Dictionaries.
Creating a compound recognition language
ABBYY FineReader Engine provides an easy way to create compound recognition languages made up of several predefined recognition languages. This is done via the LanguageDatabase object. For example, you may create a recognition language that includes both English and German words:
- Create a LanguageDatabase object by calling the IEngine::CreateLanguageDatabase method.
- Call the ILanguageDatabase::CreateCompoundTextLanguage method with the parameter “English,German”.
- Use the received TextLanguage object for text recognition.
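The English and German procedure can be sketched as follows. This is a runnable Python model rather than the Engine's API binding: every class below is a hypothetical stand-in that reuses only the object and member names from the text. Representing the compound language as one BaseLanguage per listed name is also an assumption made for illustration.

```python
class BaseLanguage:
    """Hypothetical stand-in for the BaseLanguage object."""

    def __init__(self, internal_name):
        self.InternalName = internal_name


class TextLanguage:
    """Hypothetical stand-in for the TextLanguage object."""

    def __init__(self, base_languages):
        self.BaseLanguages = base_languages


class LanguageDatabase:
    """Hypothetical stand-in for the LanguageDatabase object."""

    def CreateCompoundTextLanguage(self, names):
        # `names` is a comma-separated list of predefined language names.
        parts = [part.strip() for part in names.split(",") if part.strip()]
        if not parts:
            raise ValueError("at least one language name is required")
        return TextLanguage([BaseLanguage(part) for part in parts])


class Engine:
    """Hypothetical stand-in for the Engine object."""

    def CreateLanguageDatabase(self):
        return LanguageDatabase()


class RecognizerParams:
    """Hypothetical stand-in holding the TextLanguage reference."""

    def __init__(self):
        self.TextLanguage = None


engine = Engine()

# Step 1: create the language database.
database = engine.CreateLanguageDatabase()

# Step 2: build the compound language from two predefined languages.
compound = database.CreateCompoundTextLanguage("English,German")

# Step 3: use the received TextLanguage object for recognition.
params = RecognizerParams()
params.TextLanguage = compound

base_names = [lang.InternalName for lang in params.TextLanguage.BaseLanguages]
print(base_names)
```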
Alternatively, to use a language loaded into a LanguageDatabase object:
- Create a LanguageDatabase object by calling the IEngine::CreateLanguageDatabase method.
- Load the languages into the LanguageDatabase object using the ILanguageDatabase::LoadFrom method.
- Get the required language by its name as a TextLanguage object from the LanguageDatabase object.
- Use the received TextLanguage object for text recognition.
