When working with paper documents, you often need to find mistakes or intentional changes that were introduced into a copy. The Document Comparison API lets you detect these changes quickly and efficiently. This scenario is typically used to compare documents of special importance, such as contracts and bank documentation, with their copies. The comparison result describes each difference: the type of content (text only), the kind of modification (deleted, inserted, or modified), and its location in the original and in the copy. You can get the list of detected differences or the region of any change, and save the comparison result to an external file for further processing or long-term storage.
To compare documents or pages, files obtained by scanning or saved in an electronic format typically go through several processing stages, each of which has its own peculiarities:
  1. Preprocessing of scanned files or images
Files and their copies may require preprocessing before recognition if they contain defects or deliberately added marks, such as signatures or stamps.
  2. Recognition with full restoration of document structure and formatting
When a document is recognized, its layout elements (text, tables, images, separators, etc.) are identified. During document synthesis, the logical structure of the document is restored, while page synthesis fully restores the document formatting (fonts, styles, etc.).
  3. Document or page comparison
To compare documents or pages with their copies, use files that were recognized using ABBYY FineReader Engine. The two versions of a document may be in different formats. The comparison produces a result with the list of changes; use it to retrieve the location of each change. If you rely on manual verification, use this information to highlight the changes in the text, which makes the operator’s job easier.
  4. Export to an external format
You may also save the comparison result in XML or DOCX format.
The procedure described below is also illustrated by the Document Comparison sample for Linux and macOS, and the Windows Document Comparison demo tool.

Scenario implementation

The code samples provided in this topic are Windows-specific.
Below follows a detailed description of the recommended method of using ABBYY FineReader Engine in this scenario.
To start your work with ABBYY FineReader Engine, you need to create the Engine object. The Engine object is the top object in the hierarchy of the ABBYY FineReader Engine objects and provides various global settings, some processing methods, and methods for creating the other objects.
To create the Engine object, you can use the InitializeEngine function. See also other ways to load Engine object (Win).

C#

public class EngineLoader : IDisposable
{
    public EngineLoader()
    {
        // Initialize these variables with the full path to FREngine.dll, your Customer Project ID,
        // and, if applicable, the path to your Online License token file and the Online License password
        string enginePath = "";
        string customerProjectId = "";
        string licensePath = "";
        string licensePassword = "";
        // Load the FREngine.dll library
        dllHandle = LoadLibraryEx(enginePath, IntPtr.Zero, LOAD_WITH_ALTERED_SEARCH_PATH);
           
        try
        {
            if (dllHandle == IntPtr.Zero)
            {
                throw new Exception("Can't load " + enginePath);
            }
            IntPtr initializeEnginePtr = GetProcAddress(dllHandle, "InitializeEngine");
            if (initializeEnginePtr == IntPtr.Zero)
            {
                throw new Exception("Can't find InitializeEngine function");
            }
            IntPtr deinitializeEnginePtr = GetProcAddress(dllHandle, "DeinitializeEngine");
            if (deinitializeEnginePtr == IntPtr.Zero)
            {
                throw new Exception("Can't find DeinitializeEngine function");
            }
            IntPtr dllCanUnloadNowPtr = GetProcAddress(dllHandle, "DllCanUnloadNow");
            if (dllCanUnloadNowPtr == IntPtr.Zero)
            {
                throw new Exception("Can't find DllCanUnloadNow function");
            }
            // Convert pointers to delegates
            initializeEngine = (InitializeEngine)Marshal.GetDelegateForFunctionPointer(
                initializeEnginePtr, typeof(InitializeEngine));
            deinitializeEngine = (DeinitializeEngine)Marshal.GetDelegateForFunctionPointer(
                deinitializeEnginePtr, typeof(DeinitializeEngine));
            dllCanUnloadNow = (DllCanUnloadNow)Marshal.GetDelegateForFunctionPointer(
                dllCanUnloadNowPtr, typeof(DllCanUnloadNow));
            // Call the InitializeEngine function 
            // passing the path to the Online License file and the Online License password
            int hresult = initializeEngine(customerProjectId, licensePath, licensePassword, 
                "", "", false, ref engine);
            Marshal.ThrowExceptionForHR(hresult);
        }
        catch (Exception)
        {
            // Free the FREngine.dll library
            engine = null;
            // Deleting all objects before FreeLibrary call
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();
            FreeLibrary(dllHandle);
            dllHandle = IntPtr.Zero;
            initializeEngine = null;
            deinitializeEngine = null;
            dllCanUnloadNow = null;
            throw;
        }
    }
    // Kernel32.dll functions
    [DllImport("kernel32.dll")]
    private static extern IntPtr LoadLibraryEx(string dllToLoad, IntPtr reserved, uint flags);
    private const uint LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008;
    [DllImport("kernel32.dll")]
    private static extern IntPtr GetProcAddress(IntPtr hModule, string procedureName);
    [DllImport("kernel32.dll")]
    private static extern bool FreeLibrary(IntPtr hModule);
    // FREngine.dll functions
    [UnmanagedFunctionPointer(CallingConvention.StdCall, CharSet = CharSet.Unicode)]
    private delegate int InitializeEngine(string customerProjectId, string licensePath, 
        string licensePassword, string tempFolder, string dataFolder, bool isSharedCPUCoresMode, 
        ref FREngine.IEngine engine);
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DeinitializeEngine();
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DllCanUnloadNow();
    // private variables
    private FREngine.IEngine engine = null;
    // Handle to FREngine.dll
    private IntPtr dllHandle = IntPtr.Zero;
    private InitializeEngine initializeEngine = null;
    private DeinitializeEngine deinitializeEngine = null;
    private DllCanUnloadNow dllCanUnloadNow = null;
}
ABBYY FineReader Engine provides the FRDocument object, which allows processing multi-page documents. Using this object allows you to preserve the logical organization of the document, retaining the original text and columns, fonts, styles, etc. Use the FRPage object if you want to compare pages.
To load images of a single document and preprocess them, you should create the FRDocument object and add images to it. One way to do this is to create the FRDocument object directly from an image file:

C#

// Create the FRDocument object from an image file
FREngine.IFRDocument frDocument = engine.CreateFRDocumentFromImage( "C:\\MyImage.tif", null );
To recognize a document, we suggest that you use the analysis and recognition methods of the FRDocument object. This object provides a whole array of methods for document analysis, recognition, and synthesis. The most convenient method which provides document analysis, recognition, and synthesis all-in-one is the Process method. It also uses simultaneous processing features of multiprocessor and multicore systems in the most efficient manner. However, you can also perform consecutive preprocessing, analysis, recognition, and synthesis using Preprocess, Analyze, Recognize, and Synthesize methods.
You may set the recognition parameters for your documents by loading a suitable predefined profile (see Working with Profiles for more information).

C#

// Process the document with default parameters
// You may change them if necessary, for example, by loading a profile beforehand
frDocument.Process( null );
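In the comparison scenario, both the original and its copy go through the same loading and recognition steps. The fragment below is a minimal sketch that only combines the calls already shown in this topic; it assumes that `engine` is the initialized Engine object, and the file paths are placeholders. The variable names match those used in the comparison sample.

```csharp
// Assumption: `engine` is the FREngine.IEngine obtained when loading the Engine.
// The file paths below are placeholders, not real files.
FREngine.IFRDocument referenceFRDocument =
    engine.CreateFRDocumentFromImage( "C:\\Original.tif", null );
FREngine.IFRDocument userFRDocument =
    engine.CreateFRDocumentFromImage( "C:\\Copy.tif", null );

// Recognize both documents with the same (default) parameters
referenceFRDocument.Process( null );
userFRDocument.Process( null );
```

Processing both documents with the same parameters is a deliberate choice in this sketch: it avoids attributing differences in recognition settings to the documents themselves.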
To compare the documents or pages with their copies:
  1. Make sure your ABBYY FineReader Engine license supports the Compare Documents module.
  2. Create a Comparator object with the help of the CreateComparator method of the Engine object.
  3. [optional] Use the ComparisonParams object to set up the properties to the values you need.
  4. Call the CompareDocuments method of the Comparator object to compare the original document with the copy. You will receive a ComparisonResult object containing the information about the detected changes.

C#

// Perform the documents comparison 
FREngine.IComparator comparator = engine.CreateComparator();
FREngine.IComparisonResult comparatorResult = 
    comparator.CompareDocuments( referenceFRDocument, userFRDocument, null, null );
The ComparisonResult object contains the full list of the differences and provides the methods to get the differences for individual pages. You may access the changes from the original document and its copy with the GetChangesForReferencePage and GetChangesForUserPage methods. Use the ChangeLocation object to get the information about the location of the change and its RegionForPage property to get the region of the change on the specified page.

C#

// Get the information about detected modification and its location in the original document
FREngine.IChanges changes = comparatorResult.Changes;
foreach( FREngine.IChange change in changes ) {
      FREngine.ModificationTypeEnum modificationType = change.ModificationType;
      FREngine.IChangeLocation referenceLocation = change.ReferenceLocation;
      // Now you can highlight these changes on the page for your human operator to check
      ... 
}
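When the changes are handed to a human operator, a short summary of how many changes of each kind were found can be a useful starting point. The following self-contained sketch counts changes by kind. `ChangeKind` and `ChangeSummary` are hypothetical names introduced for illustration; `ChangeKind` is only a stand-in for the three kinds of modification described in this topic, so in real code you would map each `change.ModificationType` value to it inside the loop above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the kinds of modification described in this topic
// (deleted, inserted, modified). It is not part of the FREngine API.
public enum ChangeKind { Deleted, Inserted, Modified }

public static class ChangeSummary
{
    // Returns the number of changes of each kind.
    // Kinds that never occur are present in the result with a count of 0.
    public static Dictionary<ChangeKind, int> CountByKind(IEnumerable<ChangeKind> kinds)
    {
        var counts = new Dictionary<ChangeKind, int>();
        foreach (ChangeKind kind in Enum.GetValues(typeof(ChangeKind)))
        {
            counts[kind] = 0;
        }
        foreach (ChangeKind kind in kinds)
        {
            counts[kind]++;
        }
        return counts;
    }
}
```

Pre-filling every kind with zero keeps the report code simple, because it can read any kind from the dictionary without checking whether the key exists.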
To export the comparison result, call the Export method of the ComparisonResult object and pass the path to the file as an input parameter. The data may be saved to an XML file or to a DOCX file with Track Changes.

C#

// Save to XML format
comparatorResult.Export( "C:\\ComparisonResult.xml", FREngine.ComparatorExportFormatEnum.CEF_Xml, null );
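If the output path comes from user input or configuration, it may be convenient to derive the export format from the file extension before calling Export. The sketch below is self-contained and does not call FineReader Engine. `ExportKind` and `ExportFormatPicker` are hypothetical names introduced for illustration; mapping the returned value to the corresponding ComparatorExportFormatEnum constant is left to the calling code.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the two export formats named in this topic.
// It is not part of the FREngine API.
public enum ExportKind { Xml, Docx }

public static class ExportFormatPicker
{
    // Chooses the export format from the file extension (case-insensitive).
    // Throws ArgumentException for any extension other than .xml or .docx.
    public static ExportKind FromPath(string path)
    {
        string extension = Path.GetExtension(path);
        if (string.Equals(extension, ".xml", StringComparison.OrdinalIgnoreCase))
        {
            return ExportKind.Xml;
        }
        if (string.Equals(extension, ".docx", StringComparison.OrdinalIgnoreCase))
        {
            return ExportKind.Docx;
        }
        throw new ArgumentException(
            "Unsupported comparison export extension: " + extension, nameof(path));
    }
}
```

Rejecting unknown extensions early means a mistyped file name fails before any comparison result is written.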
After finishing your work with ABBYY FineReader Engine, you need to unload the Engine object. To do this, use the DeinitializeEngine exported function.

C#

public class EngineLoader : IDisposable
{
    // Unload FineReader Engine
    public void Dispose()
    {
        if (engine == null)
        {
            // Engine was not loaded
            return;
        }
        engine = null;
        // Deleting all objects before FreeLibrary call
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
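        // Keep the result of DeinitializeEngine so that a failure is reported below
        int hresult = deinitializeEngine();

        // Free the library only if it reports that it can be unloaded
        int canUnloadResult = dllCanUnloadNow();
        if (canUnloadResult == 0)
        {
            FreeLibrary(dllHandle);
        }
        dllHandle = IntPtr.Zero;
        initializeEngine = null;
        deinitializeEngine = null;
        dllCanUnloadNow = null;
        // throwing exception after cleaning up
        Marshal.ThrowExceptionForHR(hresult);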
    }
    // Kernel32.dll functions
    [DllImport("kernel32.dll")]
    private static extern IntPtr LoadLibraryEx(string dllToLoad, IntPtr reserved, uint flags);
    private const uint LOAD_WITH_ALTERED_SEARCH_PATH = 0x00000008;
    [DllImport("kernel32.dll")]
    private static extern IntPtr GetProcAddress(IntPtr hModule, string procedureName);
    [DllImport("kernel32.dll")]
    private static extern bool FreeLibrary(IntPtr hModule);
    // FREngine.dll functions
    [UnmanagedFunctionPointer(CallingConvention.StdCall, CharSet = CharSet.Unicode)]
    private delegate int InitializeEngine(string customerProjectId, string licensePath,
        string licensePassword, string tempFolder, string dataFolder, bool isSharedCPUCoresMode,
        ref FREngine.IEngine engine);
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DeinitializeEngine();
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate int DllCanUnloadNow();
    // private variables
    private FREngine.IEngine engine = null;
    // Handle to FREngine.dll
    private IntPtr dllHandle = IntPtr.Zero;
    private InitializeEngine initializeEngine = null;
    private DeinitializeEngine deinitializeEngine = null;
    private DllCanUnloadNow dllCanUnloadNow = null;
}

Required resources

You can use the FREngineDistribution.csv file to automatically create a list of files required for your application to function. For processing with this scenario, select the following values in column 5 (RequiredByModule):
  • Core
  • Core.Resources
  • Opening
  • Opening, Processing
  • Processing
  • Processing.OCR
  • Processing.OCR, Processing.ICR
  • Processing.OCR.NaturalLanguages
  • Processing.OCR.NaturalLanguages, Processing.ICR.NaturalLanguages
  • Export
  • Export, Processing
If you modify the standard scenario, change the required modules accordingly. You also need to specify the interface languages, recognition languages, and any additional features which your application uses (for example, Opening.PDF if you need to open PDF files, or Processing.OCR.CJK if you need to recognize texts in CJK languages). See Working with the FREngineDistribution.csv File for further details.

Additional optimization for specific tasks

Below is the overview of the Help topics containing additional information regarding customization of settings at different processing stages:
  • Scanning - Windows Only
    • Scanning
      Description of the ABBYY FineReader Engine scenario for document scanning.
  • Recognition
    • Tuning Parameters of Preprocessing, Analysis, Recognition, and Synthesis
      Customization of document processing using objects of analysis, recognition, and synthesis parameters.
    • PageProcessingParams Object
      This object enables customization of analysis and recognition parameters. Using this object, you can indicate which image and text characteristics must be detected (inverted image, orientation, barcodes, recognition language, recognition error margin).
    • SynthesisParamsForPage Object
      This object includes parameters responsible for restoration of a page formatting during synthesis.
    • SynthesisParamsForDocument Object
      This object enables customization of the document synthesis: restoration of its structure and formatting.
    • MultiProcessingParams Object - Implemented for Linux and Windows
      Simultaneous processing may be useful when processing a large number of images. In this case, the processing load will be spread over the processor cores during image opening and preprocessing, layout analysis, recognition, and export, which makes it possible to speed up processing.
      Reading modes (simultaneous or consecutive) are set using the MultiProcessingMode property. The RecognitionProcessesCount property controls the number of processes that may be started.
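
As an illustration of the MultiProcessingParams item, a configuration fragment might look like the following. It is a sketch under several assumptions: `engine` is the initialized Engine object, the object is reached through a MultiProcessingParams property of the Engine, and the MPM_Parallel constant and the process count of 4 are illustrative values. Check the exact names and allowed values against the MultiProcessingParams reference before relying on them.

```csharp
// Assumption: `engine` is the initialized FREngine.IEngine, and the
// MultiProcessingParams object is available as a property of the Engine.
FREngine.IMultiProcessingParams multiProcessingParams = engine.MultiProcessingParams;

// Assumption: MPM_Parallel is the constant that selects simultaneous processing
multiProcessingParams.MultiProcessingMode = FREngine.MultiProcessingModeEnum.MPM_Parallel;

// Illustrative value: limit the number of recognition processes to 4
multiProcessingParams.RecognitionProcessesCount = 4;
```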

See also

Basic Usage Scenarios Implementation