When ABBYY Vantage processes a PDF document, it needs to decide how to extract the text. A PDF can contain an embedded text layer (searchable text written directly into the file), or it can be an image-only file that requires optical character recognition (OCR) to extract text. The PDF Processing Mode setting gives you explicit control over which method Vantage uses. This is especially useful when:
  • You are working in regulated industries where reproducibility and auditability of extraction results are required.
  • Your document set contains PDFs with low-quality or unreliable embedded text layers, where OCR would produce better results.
  • You are migrating from ABBYY FlexiCapture and need to replicate the processing behavior of your existing workflows.
  • You need consistent, predictable processing behavior across all documents regardless of their content.

Available Modes

| Mode | Description | When to use |
| --- | --- | --- |
| Default (Recommended) | Uses the embedded PDF text layer when available and supplements it with OCR as needed. This is the standard Vantage processing behavior. | General use. Recommended for most document sets with a mix of text-layer and image-only PDFs. |
| Use Text Layer Only | Extracts text exclusively from the embedded PDF text layer. If no text layer exists, Vantage falls back to OCR automatically. | Use when you have high-quality, trusted text layers and want faster extraction without full OCR. Useful for regulated environments where the existing text layer is the authoritative source. |
| Use OCR Only | Ignores any embedded PDF text layer and performs full OCR on every page of the document. | Use when PDF text layers are known to be unreliable or corrupt, or when you need consistent OCR-based extraction across all documents regardless of their structure. |
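
The mode descriptions above can be read as a small decision rule. The following Python sketch only illustrates that rule. It is not Vantage code and does not call any Vantage API; the names `PdfProcessingMode` and `extraction_path` and the returned labels are hypothetical, chosen for this example.

```python
from enum import Enum


class PdfProcessingMode(Enum):
    # Hypothetical enum; the values mirror the mode names shown in the settings.
    DEFAULT = "Default"
    TEXT_LAYER_ONLY = "Use Text Layer Only"
    OCR_ONLY = "Use OCR Only"


def extraction_path(mode, has_text_layer):
    """Return a label for the text source the mode descriptions imply.

    The labels are illustrative only; they are not Vantage identifiers.
    """
    if mode is PdfProcessingMode.OCR_ONLY:
        # Any embedded text layer is ignored; every page is recognized.
        return "ocr"
    if mode is PdfProcessingMode.TEXT_LAYER_ONLY:
        # The text layer is used when present; otherwise OCR is the fallback.
        return "text_layer" if has_text_layer else "ocr"
    # Default: text layer when available, supplemented with OCR as needed.
    return "text_layer_plus_ocr_as_needed" if has_text_layer else "ocr"


if __name__ == "__main__":
    for mode in PdfProcessingMode:
        for has_layer in (True, False):
            print(mode.value, has_layer, extraction_path(mode, has_layer))
```

Under this reading, every mode ends up using OCR for an image-only PDF. The modes differ only in how they treat a PDF that already has a text layer.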

Example Scenarios

The following examples show typical situations where each mode is the best choice.
Your organization processes born-digital PDF invoices exported from a vendor’s ERP system. The embedded text layer is accurate and machine-generated. The Use Text Layer Only mode delivers fast, reliable extraction without running unnecessary OCR.
Your document set consists of PDFs produced by a legacy scanning system that embeds a low-quality text layer during scanning. That embedded layer contains recognition errors that degrade field extraction. The Use OCR Only mode bypasses the embedded layer entirely and extracts clean text directly from the page image.
You work in a regulated industry (such as financial services or healthcare) where extraction results must be fully reproducible and auditable. Locking the mode to either Use Text Layer Only or Use OCR Only ensures the same processing path is always used, regardless of how documents arrive.
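
The three scenarios above can be summarized as a rough selection rule. The Python sketch below is one possible way to encode it, for example in an internal checklist script. The function `candidate_modes` and its inputs are hypothetical and are not part of Vantage; the quality labels (`"trusted"`, `"unreliable"`, `"mixed"`) are assumptions made for illustration.

```python
def candidate_modes(text_layer_quality, needs_fixed_path):
    """Suggest PDF Processing Mode candidates for a document set.

    text_layer_quality: "trusted", "unreliable", or "mixed" (assumed labels).
    needs_fixed_path: True when results must be reproducible and auditable.
    """
    if text_layer_quality == "unreliable":
        # Legacy scans with a poor embedded layer: bypass it.
        return ["Use OCR Only"]
    if text_layer_quality == "trusted":
        # Accurate, machine-generated text layer: skip unnecessary OCR.
        return ["Use Text Layer Only"]
    if needs_fixed_path:
        # Regulated setting: lock to one of the two explicit modes.
        return ["Use Text Layer Only", "Use OCR Only"]
    # Mixed document set with no special constraints.
    return ["Default"]


if __name__ == "__main__":
    print(candidate_modes("trusted", False))
    print(candidate_modes("unreliable", True))
    print(candidate_modes("mixed", True))
    print(candidate_modes("mixed", False))
```

When the rule returns two candidates, the final choice still depends on which text source your organization treats as authoritative.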

Where to Configure

The PDF Processing Mode setting is available in the following locations:
  • OCR Skill settings — General tab, under Image Processing
  • OCR Activity settings within a Process Skill — General tab, under Image Processing

Supported Technology Core Versions

PDF Processing Mode is supported for skills using Technology Core 3. It is not available for earlier Technology Core versions.