Iterating Document Pages

When you work with analysis, recognition, or synthesis results, you may need to iterate through document pages and the blocks on them. Keep in mind that when you request a document element (for example, a page or a block) for viewing or editing, its data is loaded into memory and stays there until it is explicitly unloaded. As you access more pages, memory use keeps growing, which may lead to an “out of memory” error. To prevent this, follow the recommendations below.

When you use the processing methods of the FRDocument object, layouts and image documents are unloaded to disk automatically in Linux and macOS. In Windows, they can also be saved to disk automatically, but this behavior depends on the value of the PageFlushingPolicy property. However, if you request a page of a document and its blocks for viewing or editing, the data is neither unloaded nor saved automatically. If you view or edit a number of pages, unload the data after you finish working with each page by calling the Flush method of the FRPage object.
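To make this memory behavior concrete, here is a toy model in C++ (the language of the COM sample below). It does not use FineReader Engine at all: ToyPage, residentPages, peakResident, and compareStrategies are hypothetical names, and "loaded" only stands for a page's layout and image data being resident. Under these assumptions, it shows that without a flush every visited page stays resident, while flushing after each page keeps only the current one in memory.

```cpp
// Toy model (not the FineReader Engine API): requesting a page loads its data,
// and the data stays resident until the page is explicitly flushed.
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a page; "loaded" means its layout and image data are in memory.
struct ToyPage {
    bool loaded = false;
    void access() { loaded = true; }  // viewing or editing loads the data
    void flush() { loaded = false; }  // explicit unload, modeled on FRPage::Flush
};

// Counts the pages whose data is currently resident.
std::size_t residentPages(const std::vector<ToyPage>& pages) {
    std::size_t count = 0;
    for (const ToyPage& page : pages) {
        if (page.loaded) {
            ++count;
        }
    }
    return count;
}

// Visits every page once and returns the peak number of resident pages.
std::size_t peakResident(std::vector<ToyPage>& pages, bool flushEachPage) {
    std::size_t peak = 0;
    for (ToyPage& page : pages) {
        page.access();
        std::size_t now = residentPages(pages);
        if (now > peak) {
            peak = now;
        }
        if (flushEachPage) {
            page.flush();
        }
    }
    return peak;
}

// Usage: compares both strategies on a 100-page toy document.
void compareStrategies() {
    std::vector<ToyPage> kept(100);
    std::vector<ToyPage> flushed(100);
    const std::size_t withoutFlush = peakResident(kept, false);
    const std::size_t withFlush = peakResident(flushed, true);
    assert(withoutFlush == 100);  // every visited page stays resident
    assert(withFlush == 1);       // only the current page is resident
}
```

The model deliberately ignores real memory sizes; it only tracks which pages are resident, which is the property the Flush recommendation is about.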

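Windows users working with .NET should also call GC.Collect and GC.WaitForPendingFinalizers, as in the C# sample below, so that unreferenced objects obtained from the page are released before Flush is called.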

Calling Flush is also required if you make changes while iterating through pages: call Flush (passing true, as in the samples below) to keep these changes. Otherwise, if you call, for example, SaveToFolder or one of the export methods of the corresponding FRDocument, your changes will be saved to the folder (or the exported document) but cleared from the object you are working with.

In Windows, the behavior of the Flush method depends on the value of the IFRDocument::PageFlushingPolicy property. If the value is PFP_FlushToDisk, a call to the method unloads layouts and image documents to disk. If it is PFP_Auto, a call to the method unloads the data to disk only if the document has more than 30 pages; otherwise, the data is kept in memory. If it is PFP_KeepInMemory, the data is not unloaded. The PFP_Auto value is convenient when you process both small and large documents in one solution, because you can use the same processing code for both.
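The policy rules above can be summarized as a small decision function. This is a sketch of the documented behavior only, not how FineReader Engine implements it: the enum reuses the value names from the text but is declared locally here, and flushUnloadsToDisk is a hypothetical helper.

```cpp
// Illustrative model of what a Flush call does under each PageFlushingPolicy
// value, as described in the text. Not part of FineReader Engine.
#include <cassert>

// Locally declared enum that mirrors the documented value names (assumption).
enum PageFlushingPolicy { PFP_FlushToDisk, PFP_Auto, PFP_KeepInMemory };

// Returns true if a Flush call would unload page data to disk
// for a document with the given number of pages.
bool flushUnloadsToDisk(PageFlushingPolicy policy, int documentPageCount) {
    assert(documentPageCount >= 0);
    switch (policy) {
    case PFP_FlushToDisk:
        return true;                    // always unload to disk
    case PFP_Auto:
        return documentPageCount > 30;  // only for documents with more than 30 pages
    case PFP_KeepInMemory:
        return false;                   // data stays in memory
    }
    return false;  // not reached for valid enum values
}
```

For example, under PFP_Auto a 30-page document keeps its data in memory after Flush, while a 31-page document has it unloaded to disk.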

Windows Samples

C# code

FREngine.IFRPages pages = document.Pages;
int pagesCount = pages.Count;
for (int i = 0; i < pagesCount; i++)
{
 FREngine.IFRPage page = pages.Item(i);
 // Work with the page: iterate its blocks, paragraphs, and characters
 iteratePage(page);
 // Force the garbage collector to release unreferenced objects obtained from the page
 GC.Collect();
 GC.WaitForPendingFinalizers();
 // Unload unused page data
 // If the page was changed in iteratePage(), call Flush(true) to keep changes
 page.Flush(true);
 // If there is no need to keep changes, call Flush(false)
 // page.Flush(false);
}

C++ (COM) code

FREngine::IFRPagesPtr pages = document->Pages;
int pagesCount = pages->Count;
for( int i = 0; i < pagesCount; i++ ) {
 FREngine::IFRPagePtr page = pages->Item( i );
 // Work with the page: iterate its blocks, paragraphs, and characters
 iteratePage( page );
 // Unload unused page data
 // If the page was changed in iteratePage(), call Flush( true ) to keep changes
 page->Flush( true );
 // If there is no need to keep changes, call Flush( false )
 // page->Flush( false );
}
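If iteratePage throws, the loop above skips the Flush call for that page. A common C++ way to guarantee the call is a scope guard (RAII). The sketch below is a hypothetical helper, not part of FineReader Engine; it assumes only that the page pointer type accepts page->Flush( bool ), as IFRPagePtr does in the sample.

```cpp
// Scope guard sketch: calls Flush on a page when the guard goes out of scope,
// so the page data is unloaded even if the per-page work throws.
// PageFlushGuard is a hypothetical helper, not part of FineReader Engine.
#include <cassert>

template <typename PagePtr>
class PageFlushGuard {
public:
    // keepChanges is passed to Flush: true keeps edits, false discards them.
    PageFlushGuard(PagePtr page, bool keepChanges)
        : page_(page), keepChanges_(keepChanges) {
        assert(page_);  // guarding a null page would be a caller bug
    }

    // One guard per page: copying would flush the same page twice.
    PageFlushGuard(const PageFlushGuard&) = delete;
    PageFlushGuard& operator=(const PageFlushGuard&) = delete;

    ~PageFlushGuard() {
        try {
            page_->Flush(keepChanges_);
        } catch (...) {
            // Destructors must not throw; a real program might log the error here.
        }
    }

private:
    PagePtr page_;
    bool keepChanges_;
};
```

In the loop, you would create the guard right after pages->Item( i ), for example PageFlushGuard<FREngine::IFRPagePtr> guard( page, true );, and drop the explicit Flush call. Swallowing exceptions in the destructor is deliberate, since a destructor that throws during stack unwinding terminates the program. Note that with keepChanges set to true, any partial changes made before an exception are also kept.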
