Comparing Documents

New “Compare Documents” Module

For quick verification of a document’s integrity, the new “Compare Documents” Module in ABBYY FineReader Engine detects content differences between two versions of the same document.

Comparison of bilingual documents

A new option of the “Compare Documents” Module automatically detects that a document is bilingual and has a complex layout, and compares each column (and thus each language version) separately.
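The idea of per-column comparison can be illustrated with a minimal sketch. It does not use the FineReader Engine API: it assumes the text of each column has already been extracted into lists of lines, and it uses Python's standard `difflib` module as a stand-in for the comparison step. The function names and the column keys are hypothetical.

```python
import difflib

def content_differences(old_lines, new_lines):
    """Return (tag, old_span, new_span) for every span that is not identical."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete" or "insert"
            changes.append((tag, old_lines[i1:i2], new_lines[j1:j2]))
    return changes

def compare_columns(old_columns, new_columns):
    """Compare each column (one per language version) separately.

    Both arguments map a hypothetical column name to its list of text lines.
    """
    return {
        name: content_differences(lines, new_columns.get(name, []))
        for name, lines in old_columns.items()
    }

old = {"en": ["Clause 1", "Payment due in 30 days"], "de": ["Klausel 1"]}
new = {"en": ["Clause 1", "Payment due in 60 days"], "de": ["Klausel 1"]}
report = compare_columns(old, new)
print(report["en"])  # the changed payment clause
print(report["de"])  # no differences in the second column
```

Keeping the columns apart means a change in one language version is reported against its own column and is not confused with text from the other language.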

Input of Office formats in Linux and Windows

Processing of Office documents

In addition to a broad set of image formats, FineReader Engine can now process input documents created in one of the following Office document formats:

  • Text documents: .doc, .docx, .rtf, .htm / .html, .txt, .odt
  • Spreadsheets: .xls, .xlsx, .ods
  • Presentations: .ppt, .pptx, .odp

Opening Office documents from memory

The new method for opening Microsoft Office and Apache OpenOffice files directly from memory speeds up the document import step and, in turn, overall document processing.

MRZ Capture

Data capture from a Machine-Readable Zone (MRZ)

The new feature automatically extracts data from the machine-readable zone (MRZ) of ID documents, which speeds up entering and verifying personal data during customer onboarding or verification processes.
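Verification of captured MRZ data typically relies on the check digits defined in ICAO Doc 9303. The sketch below shows that calculation in plain Python; it is independent of FineReader Engine, and the function names are hypothetical. Characters are weighted cyclically by 7, 3, 1, letters A to Z count as 10 to 35, and the filler `<` counts as 0.

```python
def mrz_char_value(ch):
    """Numeric value of one MRZ character as defined in ICAO Doc 9303."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"character not allowed in an MRZ: {ch!r}")

def mrz_check_digit(field):
    """Weighted sum of the field's characters (weights 7, 3, 1), modulo 10."""
    weights = (7, 3, 1)
    total = sum(mrz_char_value(ch) * weights[i % 3] for i, ch in enumerate(field))
    return total % 10

def field_is_valid(field, check_char):
    """True when the captured check character matches the computed digit."""
    return "0" <= check_char <= "9" and mrz_check_digit(field) == int(check_char)

# Document number from the ICAO 9303 specimen passport
print(mrz_check_digit("L898902C3"))
print(field_is_valid("740812", "2"))
```

A mismatch between the computed and the captured check digit is a strong hint that at least one character in the field was misread and should be reviewed.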

Improved Japanese OCR

Leading recognition accuracy

In the new version of ABBYY FineReader Engine, Japanese OCR has been substantially improved, bringing recognition accuracy to a level previously unattainable for most solutions.

Improved Arabic OCR

End-to-end recognition for Arabic on poor images

End-to-end recognition improves Arabic OCR on low-quality images, where general-purpose technology produces low-confidence results with many errors.

Improved Korean OCR

Deep learning language model for Korean

A trained model for the Korean language selects the best word recognition variant from the recognition hypotheses, or even generates a new one, based on the recognition context (the preceding and following words).
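The selection step can be pictured with a toy sketch. It is not the Engine's model: a small lookup table of word-pair scores stands in for the trained deep learning model, English placeholder words are used for readability, and all names and scores are hypothetical. Generating a new variant is not covered here.

```python
def pick_best_variant(prev_word, variants, next_word, pair_score):
    """Pick the hypothesis that fits best between its neighbouring words.

    variants   -- list of (word, ocr_confidence) recognition hypotheses
    pair_score -- dict mapping (left_word, right_word) to a context score;
                  a toy stand-in for a trained language model
    """
    def total(variant):
        word, ocr_confidence = variant
        left = pair_score.get((prev_word, word), 0.0)
        right = pair_score.get((word, next_word), 0.0)
        return ocr_confidence + left + right

    return max(variants, key=total)[0]

# The raw OCR confidence slightly prefers the misread "c1ear",
# but the surrounding words favour "clear".
hypotheses = [("c1ear", 0.55), ("clear", 0.50)]
context = {("the", "clear"): 0.3, ("clear", "sky"): 0.4}
print(pick_best_variant("the", hypotheses, "sky", context))
```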

New neural network-based OCR technologies

Improvements in OCR technologies

With the help of neural network approaches in OCR technologies, ABBYY FineReader Engine has been enhanced with processing of handwritten and handprinted Latin characters:

  • Language model for consistent and accurate choice of word variants
  • End-to-end recognition for Latin scripts to process multilingual documents

Machine learning barcode recognition technology

The neural network architecture introduces a new barcode recognition model that detects the approximate region of a barcode, classifies it, and outputs the region together with its most likely barcode type.

New recognition mode

The new Accurate mode delivers the maximum quality of the output document at the cost of a reasonable slowdown in recognition speed. This mode is best suited for low-quality or photographed invoices, contracts, receipts, and ID cards.

OCR quality improvements for text near stamps and signatures

Detecting text near stamps and signatures

Whenever an agreement contains stamps or signatures, the text nearby is recognized separately from them, thus improving the quality of the processed documents.

New licensing options

Online License usage as Network and Standalone

The Developer’s Help for FineReader Engine 12 has been extended with additional information about the different ways to license the SDK, describing the individual licensing options in an easy-to-understand comparison table.

Using grace periods

With the new option, customers can use the ABBYY FineReader Engine license for some time after the expiration date, thereby extending the license validity period.

ICR and OMR technologies in Linux and macOS versions

Handwritten text and checkmark recognition

With ABBYY FineReader Engine 12, you can recognize handwritten and handprinted characters as well as checkmarks of various types. ICR and OMR technologies are implemented to extract data from handwritten documents and to develop new data extraction solutions.

Ability to run Engine in cloud environments

New deployment options

A new licensing type allows deployment in virtual and cloud environments, allowing you to offer a broader spectrum of solutions. The licensing mechanism requires an internet connection and supports proxy servers.

<Note> Applicable to FineReader Engine for Linux and Windows. </Note>

.NET Core wrapper in FRE for Windows

New development framework

To make development teams that rely on containers and other native environments for software development and deployment more efficient, ABBYY FineReader Engine now offers a pre-built .NET Core 6 wrapper.

New libraries in ABBYY FineReader Engine

NeoML library usage

NeoML is an open-source end-to-end machine learning framework that allows you to build, train, and deploy machine learning models. This framework is used by engineers for computer vision and natural language processing tasks, including image preprocessing, classification, document layout analysis, OCR, and data extraction from structured and unstructured documents.

Embedded PDFium for processing PDFs

PDFium is a cross-platform native library that conforms to PDF standards and controls all PDF-related operations, including processing, parsing, rendering, and producing the output.

Enhanced Document Classification

Document Classification using NLP and Machine Learning

With ABBYY FineReader Engine 12, incoming documents can be automatically sorted into different categories. Machine learning, OCR, and natural language processing technologies are employed to train the image-based and text-based classifiers on representative documents. The information learned is then used during the classification step.

Text-based classifier: advanced security of training data

To train and optimize the text-based classifier, documents representing each document category must be imported. To protect the data contained in these documents, the implemented hashing algorithms prevent information from being recovered from the sample documents.

Enhanced Classification Demo Sample

ABBYY FineReader Engine can process PDFs, scanned or photographed document images, and documents in Office formats. To reflect this capability in the classification process, the provided pre-compiled Demo Sample for classification has been enhanced and now allows importing Office documents in addition to PDFs and image formats.
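The protection of training data described for the text-based classifier can be illustrated with the general "hashing trick". The source does not say which algorithm FineReader Engine uses, so the sketch below is only an assumption-laden illustration: each word is replaced by a salted hash bucket, and only bucket counts are kept as training features. The function name, the salt, and the bucket count are hypothetical.

```python
import hashlib

def hashed_features(text, salt=b"hypothetical-salt", buckets=2**20):
    """Turn a document's words into {bucket_index: count} features.

    Only integer bucket indices are stored, so the words of the sample
    document do not appear in the stored training data.
    """
    counts = {}
    for token in text.lower().split():
        digest = hashlib.sha256(salt + token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % buckets
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts

features = hashed_features("Invoice total invoice")
print(sum(features.values()))  # number of tokens that were hashed
```

The same word always lands in the same bucket, so a classifier can still learn from word frequencies while the stored features contain no readable text.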

Code sample for command-line interface (CLI)

Ready-to-use code sample

With this code sample, developers can efficiently use ABBYY FineReader Engine libraries and integrate document processing capabilities into command-line-based applications.

Implementation of PDF metadata extractor

Digitally-born PDF file processing

AuxInfo is a supplementary PDFium object that provides metadata from a PDF file. The ABBYY R&D PDFTools team implemented its own AuxInfo object that works with PDFium.

Improved PDF processing

Improvements for PDFs with “mixed” content

ABBYY FineReader Engine provides new capabilities for processing PDF documents that contain both image-only and digitally born pages:

  • Adaptive recognition to improve and speed up PDF processing
  • Text layer quality classifier that preserves a good text layer in the output format
  • Indication of digital signature presence in a PDF
  • New content reuse mode for processing documents with mixed content

Using additional content in PDF

For more flexible composition of PDF content, ABBYY FineReader Engine offers new options:

  • Opening PDF Portfolios and processing their contents
  • Adding custom images to the output PDF and managing their positions

Additional language support

Farsi OCR

ABBYY FineReader Engine features updated and improved Farsi recognition options, opening the door to more effective work with documents from Iran, Afghanistan, and many other countries of the Middle East.

Georgian OCR

Georgian has been added as a new OCR language.

OCR for simple mathematical formulas

Extracting the characters of simple single-line mathematical formulas inside the text improves recognition of scientific documents.

Technical preview for Burmese OCR

Burmese OCR has been added as a technical preview to highlight future capabilities.

Special languages for Arabic and Japanese dates capture

FineReader Engine for Windows supports special languages for field recognition. The new version improves date recognition in Arabic and Japanese.

Technical preview for Bangla OCR

Bangla OCR has been added as a technical preview to demonstrate potential functionality.

Improved document layout recreation

Improved table reconstruction

With ABBYY FineReader Engine 12, tables extracted from documents keep their formatting better than before.

Detection and recreation of balanced columns

Whenever a document contains balanced columns of text (e.g., contracts, scientific papers, or articles), the original structure now stays intact, thus simplifying document processing.

New “single-column” document model

The main improvements of the new algorithm are in the detection and analysis of tables and charts.

Enhanced table structure analysis

With the improved mechanism of document conversion, ABBYY FineReader Engine can detect tables with columns of numbers in the “Accounting” format.
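Once such a column has been extracted, its cells usually need to be turned into numbers. The sketch below is a post-processing illustration in plain Python, not part of the FineReader Engine API. It assumes the common spreadsheet convention for the “Accounting” format: thousands separated by commas, negative values in parentheses, an optional `$` sign, and a lone dash for zero. The function name is hypothetical.

```python
from decimal import Decimal

def parse_accounting(cell):
    """Convert one accounting-formatted cell into a Decimal."""
    text = cell.strip()
    # Parentheses mark a negative amount in the assumed convention.
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    text = text.replace("$", "").replace(",", "").strip()
    if text in ("", "-"):  # an empty cell or a lone dash stands for zero
        return Decimal("0")
    value = Decimal(text)
    return -value if negative else value

column = ["$ 2,000.00", "(1,234.50)", "-"]
print(sum(parse_accounting(cell) for cell in column))
```

Using `Decimal` instead of `float` keeps monetary amounts exact when a column is summed.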

Internal process optimization for faster processing

New scheme for ILayout object iteration

A new scheme speeds up iteration over the ILayout object obtained after processing a document outside the main process.

<Note> Applicable to FineReader Engine for Linux and Windows. </Note>

New scanning options in FRE for Windows

More scanning capabilities

ABBYY FineReader Engine 12 offers a range of device-based scanning features:

  • automatic deletion of blank pages from the document
  • automatic page crop
  • automatic skew correction
  • automatic color mode detection

Online documentation

Documentation available online

In addition to the built-in documentation, you can now use the online version, which provides “just in time” information about the features and capabilities of ABBYY FineReader Engine.

Latest .NET Framework versions in FRE for Windows

.NET COM Interop wrappers support

The distribution package now includes .NET COM Interop wrappers for the following .NET Framework versions:

  • 3.5 SP1
  • 4.6
  • 4.7
  • 4.8

New export formats

JSON

JSON (JavaScript Object Notation) is an open-standard, language-independent file format for transmitting data objects consisting of attribute–value pairs and array data types. FineReader Engine now supports exporting OCR results in JSON format.

New ALTO versions

ALTO (Analyzed Layout and Text Object) is an XML Schema that details technical metadata to describe the layout and content of physical text resources, such as the pages of a book or newspaper. The latest versions of this schema (4.0, 4.1, 4.2) are supported in FineReader Engine 12.

PDF/A-2b and PDF/A-3b

PDF/A is an ISO-standardized version of the Portable Document Format (PDF), specialized for archiving and long-term preservation of electronic documents. FineReader Engine now supports all PDF/A conformance levels.

Full functionality