- You are working in regulated industries where reproducibility and auditability of extraction results are required.
- Your document set contains PDFs whose embedded text layers are low-quality or unreliable, so OCR would produce better results.
- You are migrating from ABBYY FlexiCapture and need to replicate the processing behavior of your existing workflows.
- You need consistent, predictable processing behavior across all documents regardless of their content.
Available Modes
| Mode | Description | When to use |
|---|---|---|
| Default (Recommended) | Uses the embedded PDF text layer when available and supplements it with OCR as needed. This is the standard Vantage processing behavior. | General use. Recommended for most document sets with a mix of text-layer and image-only PDFs. |
| Use Text Layer Only | Extracts text only from the embedded PDF text layer, without supplementing it with OCR. If no text layer exists, Vantage falls back to OCR automatically. | Use when you have high-quality, trusted text layers and want faster extraction without full OCR. Useful for regulated environments where the existing text layer is the authoritative source. |
| Use OCR Only | Ignores any embedded PDF text layer and performs full OCR on every page of the document. | Use when PDF text layers are known to be unreliable or corrupt, or when you need consistent OCR-based extraction across all documents regardless of their structure. |
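To make the differences between the three modes concrete, the following Python sketch models which text source each mode draws on for a given document. It is an illustrative model of the behavior described in the table, not Vantage's implementation or API; the names `PdfProcessingMode` and `text_sources`, and the simplification to a single per-document `has_text_layer` flag, are assumptions.

```python
from enum import Enum


class PdfProcessingMode(Enum):
    # Hypothetical enum mirroring the three modes in the table above.
    DEFAULT = "Default (Recommended)"
    TEXT_LAYER_ONLY = "Use Text Layer Only"
    OCR_ONLY = "Use OCR Only"


def text_sources(mode: PdfProcessingMode, has_text_layer: bool) -> list[str]:
    """Return the text sources a document would draw on (illustrative model only)."""
    if mode is PdfProcessingMode.OCR_ONLY:
        # The embedded text layer is ignored; every page is OCRed.
        return ["ocr"]
    if not has_text_layer:
        # Default and Use Text Layer Only both fall back to OCR
        # when the PDF has no text layer.
        return ["ocr"]
    if mode is PdfProcessingMode.TEXT_LAYER_ONLY:
        # Text comes from the embedded layer only, with no OCR supplement.
        return ["text_layer"]
    # Default: use the text layer and supplement it with OCR as needed.
    return ["text_layer", "ocr_as_needed"]


for mode in PdfProcessingMode:
    print(mode.value, text_sources(mode, True), text_sources(mode, False))
```

The model shows why Use Text Layer Only and Default behave identically for image-only PDFs: the mode choice only matters once a text layer is present.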
Example Scenarios
The following examples show typical situations where each mode is the best choice.
Use Text Layer Only
Your organization processes digitally born PDF invoices exported from a vendor’s ERP system. The embedded text layer is accurate and machine-generated, so selecting Use Text Layer Only delivers fast, reliable extraction without running unnecessary OCR.
Default (Recommended)
You process a high-volume mix of scanned paper documents and digitally born PDFs in the same workflow. Some files have clean text layers; others don’t. Default (Recommended) handles both automatically without any per-document configuration.
Use OCR Only
Your document set consists of PDFs produced by a legacy scanning system that embeds a low-quality text layer during scanning. That embedded layer contains recognition errors that degrade field extraction. Use OCR Only bypasses it entirely and extracts clean text directly from the page image.
Regulated Environments
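You work in a regulated industry (such as financial services or healthcare) where extraction results must be fully reproducible and auditable. Locking the mode to either Use Text Layer Only or Use OCR Only removes the adaptive behavior of Default (Recommended), so processing no longer varies with how documents arrive. Use OCR Only applies the same OCR path to every document; Use Text Layer Only still falls back to OCR for documents that have no text layer.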
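The scenarios above can be condensed into a rough decision helper. This is a hypothetical Python sketch, not a Vantage feature: the function `recommend_mode` and its `text_layer_quality` categories are invented for illustration, and mapping a regulated workflow with mixed sources to Use OCR Only is an assumption, based on that mode being the only one that gives every document the same processing path.

```python
from enum import Enum


class PdfProcessingMode(Enum):
    # Hypothetical enum mirroring the modes described on this page.
    DEFAULT = "Default (Recommended)"
    TEXT_LAYER_ONLY = "Use Text Layer Only"
    OCR_ONLY = "Use OCR Only"


def recommend_mode(text_layer_quality: str, requires_reproducibility: bool = False) -> PdfProcessingMode:
    """Map the example scenarios to a mode (hypothetical helper).

    text_layer_quality:
      "trusted"    - digitally born PDFs with accurate text layers (e.g. ERP exports)
      "unreliable" - text layers with recognition errors (e.g. legacy scanners)
      "mixed"      - scanned and digitally born PDFs in one workflow
    """
    if text_layer_quality == "unreliable":
        # Bypass the faulty text layer entirely.
        return PdfProcessingMode.OCR_ONLY
    if text_layer_quality == "trusted":
        # Fast extraction; the text layer is the authoritative source.
        return PdfProcessingMode.TEXT_LAYER_ONLY
    if text_layer_quality == "mixed":
        # Assumption: regulated workflows lock to OCR so every document
        # follows one path; otherwise let Default adapt per document.
        if requires_reproducibility:
            return PdfProcessingMode.OCR_ONLY
        return PdfProcessingMode.DEFAULT
    raise ValueError(f"unknown text_layer_quality: {text_layer_quality!r}")


print(recommend_mode("mixed").value)
print(recommend_mode("mixed", requires_reproducibility=True).value)
```

Treat the output as a starting point for testing on a sample of your own documents rather than a definitive rule.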
Where to Configure
The PDF Processing Mode setting is available in the following locations:
- OCR Skill settings: General tab, under Image Processing
- OCR Activity settings within a Process Skill: General tab, under Image Processing
