Chinese, Japanese, and Korean languages are often grouped together under the abbreviation “CJK”. They have several features in common, such as the use of Chinese characters and of vertical as well as horizontal writing direction. This section deals with certain peculiarities of recognizing and exporting texts in CJK languages with ABBYY FineReader Engine 12. First, in order to recognize CJK languages, you must have an ABBYY FineReader Engine license that supports the Chinese, Japanese, and Korean language modules. For more information about licenses and modules, see the Licensing section.
Recognition languages
ABBYY FineReader Engine supports the following predefined recognition languages for CJK texts:
- “ChineseSimplified”
- “ChineseTraditional”
- “Japanese”
- “JapaneseModern”
- “Korean”
- “KoreanHangul”
ChinesePRC and ChineseTaiwan are still accepted for backward compatibility, but new implementations should use ChineseSimplified and ChineseTraditional.
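
If an application still receives the legacy names from older configuration files, it may help to map them to the current names before passing them to the Engine. The following is a minimal sketch, assuming C++17; `NormalizeCjkLanguageName` and `IsCjkLanguage` are hypothetical helpers written for illustration and are not part of the FineReader Engine API.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>

// Predefined CJK recognition language names listed in this section.
constexpr std::array<std::string_view, 6> kCjkLanguages = {
    "ChineseSimplified", "ChineseTraditional", "Japanese",
    "JapaneseModern",    "Korean",             "KoreanHangul"};

// Hypothetical helper (not an Engine API): maps the legacy names to the
// current ones and returns every other name unchanged.
std::string NormalizeCjkLanguageName(std::string_view name) {
    if (name == "ChinesePRC") return "ChineseSimplified";
    if (name == "ChineseTaiwan") return "ChineseTraditional";
    return std::string(name);
}

// Hypothetical helper: true if the name, after normalization, is one of
// the predefined CJK languages above.
bool IsCjkLanguage(std::string_view name) {
    const std::string normalized = NormalizeCjkLanguageName(name);
    const std::string_view view = normalized;
    return std::find(kCjkLanguages.begin(), kCjkLanguages.end(), view) !=
           kCjkLanguages.end();
}
```

With this sketch, `NormalizeCjkLanguageName("ChinesePRC")` returns `"ChineseSimplified"`, and the result can then be passed to SetPredefinedTextLanguage.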
Fonts
To prevent garbling of Asian characters, you must specify for document synthesis a font that includes the necessary set of characters, e.g., Arial Unicode MS or SimSun. You can set the font using the ISynthesisParamsForDocument::FontSet property. By default, the SystemFontSet property of the FontSet object is set to select those system fonts that correspond to the recognition languages of the document.
Export
You can export texts in CJK languages to PDF/A in “text under the image” mode (IPDFExportParams::TextExportMode = PEM_ImageOnText) to ensure that the exported document looks the same as the original.
The procedure of recognition and export
To process documents written in CJK languages, do the following:
- Create a DocumentProcessingParams object using the CreateDocumentProcessingParams method of the Engine object.
- Specify the recognition language. Use the SetPredefinedTextLanguage method of the RecognizerParams subobject of the PageProcessingParams subobject.
- Select the font set suitable for CJK languages. Use the ISynthesisParamsForDocument::FontSet property of the SynthesisParamsForDocument subobject.
- Pass the configured DocumentProcessingParams object to the Process method of the FRDocument object. If you use methods of the Engine object, you should call one of the synthesis methods of the Engine object with the configured SynthesisParamsForDocument object as a parameter before export.
- Export the recognized text using the Export method of the FRDocument object. If you export to PDF or PDF/A format, specify the required export mode.
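
The order of the steps above can be sketched as follows. This is an illustration only: the structs below are hypothetical stand-ins that merely record what was configured, and they are not the SDK's interfaces. The object and method names (DocumentProcessingParams, SetPredefinedTextLanguage, Process, Export) follow the text of this section, while the member layout, the `fontName` field, the string-based export mode, and the `ProcessCjkDocument` function are assumptions made for the example.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the Engine objects named in the steps above.
// They only record what was configured; the real SDK interfaces differ.
struct RecognizerParams {
    std::string language;
    void SetPredefinedTextLanguage(const std::string& name) { language = name; }
};
struct PageProcessingParams { RecognizerParams recognizerParams; };
struct FontSet { std::string fontName; };  // hypothetical field
struct SynthesisParamsForDocument { FontSet fontSet; };
struct DocumentProcessingParams {
    PageProcessingParams pageProcessingParams;
    SynthesisParamsForDocument synthesisParamsForDocument;
};
struct Engine {
    DocumentProcessingParams CreateDocumentProcessingParams() const { return {}; }
};
struct FRDocument {
    std::vector<std::string> calls;  // records the order of operations
    void Process(const DocumentProcessingParams& params) {
        calls.push_back(
            "Process(" + params.pageProcessingParams.recognizerParams.language +
            ", " + params.synthesisParamsForDocument.fontSet.fontName + ")");
    }
    void Export(const std::string& path, const std::string& textExportMode) {
        calls.push_back("Export(" + path + ", " + textExportMode + ")");
    }
};

// Walks through the steps in order and returns the recorded calls.
std::vector<std::string> ProcessCjkDocument(const Engine& engine,
                                            FRDocument& document,
                                            const std::string& language,
                                            const std::string& fontName,
                                            const std::string& outputPath) {
    // Step 1: create the processing parameters.
    DocumentProcessingParams params = engine.CreateDocumentProcessingParams();
    // Step 2: choose the recognition language.
    params.pageProcessingParams.recognizerParams.SetPredefinedTextLanguage(language);
    // Step 3: choose a font that covers the CJK characters.
    params.synthesisParamsForDocument.fontSet.fontName = fontName;
    // Step 4: recognize and synthesize the document.
    document.Process(params);
    // Step 5: export, here in "text under the image" mode.
    document.Export(outputPath, "PEM_ImageOnText");
    return document.calls;
}
```

The point of the sketch is the ordering: the language and the font set are both configured on the same DocumentProcessingParams object before Process is called, and the export mode is chosen only at export time.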
Do not use the Word object and its properties, or the IsWordFirst and IsWordLeftmost properties of the CharParams object, for texts written in CJK languages. The processing technology divides the text lines into “words” only for internal purposes, and those groups of symbols do not coincide with the actual words.
