To split a stream of pages from a multi-document file into separate documents ready for processing, build a Document Splitter skill. Document Splitter skills can only be created in Advanced Designer. The scenarios below combine Classify, Extraction Rules, and Splitter Script activities to find document boundaries.

Pick a scenario

Scenario | When to use | Key activities
--- | --- | ---
Same-type documents in one stream | Multiple invoices in one file | Extraction Rules + Splitter Script (+ Classify)
Separate documents and store annexes | Documents have explanatory annex pages to retain | Classify + Splitter Script (+ Extraction Rules)
Separate documents and determine their type | Stream contains documents of different types (e.g., loan application packet) | Classify + Extraction Rules + Splitter Script
Reorder and clean up pages | Pages arrive out of order, with blank or garbage pages | Extraction Rules + Splitter Script

Same-type documents in one stream

Use this scenario when a file contains multiple documents of the same type — for example, a stack of invoices from one vendor for a billing period. Each invoice has its own number and may carry page numbers; use that data to find boundaries.
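
The boundary logic for this scenario can be sketched in plain Python. This is an illustration of the idea only, not the Splitter Script API: the page dictionaries and the field names `invoice_number` and `page_number` are hypothetical stand-ins for whatever your Extraction Rules activity produces.

```python
from typing import Optional


def split_by_invoice_number(pages: list[dict]) -> list[list[int]]:
    """Group page indices into documents.

    A new document starts when the printed page number is 1 or when the
    invoice number differs from the one seen in the current document.
    """
    documents: list[list[int]] = []
    current_number: Optional[str] = None
    for index, page in enumerate(pages):
        number = page.get("invoice_number")  # hypothetical field, may be None
        page_no = page.get("page_number")    # hypothetical field, may be None
        number_changed = (
            number is not None
            and current_number is not None
            and number != current_number
        )
        if not documents or page_no == 1 or number_changed:
            documents.append([])
            current_number = number
        elif current_number is None:
            # First page of this document that actually shows a number.
            current_number = number
        documents[-1].append(index)
    return documents


pages = [
    {"invoice_number": "INV-001", "page_number": 1},
    {"invoice_number": None, "page_number": 2},
    {"invoice_number": "INV-002", "page_number": 1},
    {"invoice_number": "INV-003", "page_number": None},
]
print(split_by_invoice_number(pages))
```

Pages where no invoice number was found stay with the current document, so a continuation page without a header does not trigger a false split.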

Separate documents and store annexes

Use this scenario when documents are accompanied by explanatory pages (annexes) that must be retained but excluded from data extraction.
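
One way to picture the annex handling is the plain-Python sketch below. It is not the Splitter Script API. It assumes a classifier has already labeled every page as either `document` or `annex` (hypothetical label names), and that each document consists of a run of main pages followed by its annex pages.

```python
def split_with_annexes(labels: list[str]) -> list[dict]:
    """Group page indices into documents, keeping annex pages separate.

    Assumption: a main page that follows an annex page starts a new document.
    """
    documents: list[dict] = []
    previous = None
    for index, label in enumerate(labels):
        if not documents or (label == "document" and previous == "annex"):
            documents.append({"pages": [], "annex_pages": []})
        key = "annex_pages" if label == "annex" else "pages"
        documents[-1][key].append(index)
        previous = label
    return documents


labels = ["document", "document", "annex", "document", "annex", "annex"]
for doc in split_with_annexes(labels):
    print(doc)
```

Keeping the annex indices in their own list makes it straightforward to store those pages with the document while running extraction only on the main pages. Two consecutive documents with no annex between them would need an extra signal, such as an extracted document number.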

Separate documents and determine their type

Use this scenario when the stream contains documents of different types — for example, a loan application packet with identity documents, income statements, bank statements, and utility bills.
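
The sketch below shows, in plain Python rather than the Splitter Script API, how a per-page type label and a first-page signal could be combined. The keys `doc_type` and `is_first_page` and the type names are hypothetical; in practice they would come from your Classify and Extraction Rules activities.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_type: str
    pages: list[int] = field(default_factory=list)


def split_by_type(pages: list[dict]) -> list[Document]:
    """Start a new document when the type changes or a first page is flagged."""
    documents: list[Document] = []
    for index, page in enumerate(pages):
        doc_type = page["doc_type"]
        is_first = page.get("is_first_page", False)
        if not documents or is_first or doc_type != documents[-1].doc_type:
            documents.append(Document(doc_type))
        documents[-1].pages.append(index)
    return documents


packet = [
    {"doc_type": "identity_document", "is_first_page": True},
    {"doc_type": "bank_statement", "is_first_page": True},
    {"doc_type": "bank_statement"},
    {"doc_type": "bank_statement", "is_first_page": True},
    {"doc_type": "utility_bill"},
]
for doc in split_by_type(packet):
    print(doc.doc_type, doc.pages)
```

The first-page flag is what separates the two adjacent bank statements here; the type label alone would merge them into one document.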

Reorder and clean up pages

Use this scenario when pages arrive out of order or include blank or garbage pages from a haphazard scan. Reordering is only possible if the pages carry an ordering signal — for example, printed page numbers.
  • Add a field to extract page numbers (or any ordering signal).
  • Add a field that detects whether the page contains any text — pages with none can be treated as blank or garbage.
  • Use the Splitter Script activity to reorder the pages and route blank/garbage pages into a separate output document.
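
The three steps above can be sketched in plain Python as follows. This is an illustration, not the Splitter Script API: `page_number` and `has_text` are hypothetical field names for the ordering signal and the text-detection field.

```python
def reorder_and_clean(pages: list[dict]) -> tuple[list[int], list[int]]:
    """Return (ordered page indices, blank or garbage page indices)."""
    ordered: list[int] = []
    discarded: list[int] = []
    for index, page in enumerate(pages):
        if page.get("has_text"):
            ordered.append(index)
        else:
            discarded.append(index)
    # Sort by printed page number. Pages with text but no page number go to
    # the end, keeping their original relative order (sort is stable).
    ordered.sort(
        key=lambda i: (
            pages[i].get("page_number") is None,
            pages[i].get("page_number") or 0,
        )
    )
    return ordered, discarded


pages = [
    {"page_number": 3, "has_text": True},
    {"page_number": None, "has_text": False},
    {"page_number": 1, "has_text": True},
    {"page_number": 2, "has_text": True},
]
ordered, discarded = reorder_and_clean(pages)
print(ordered, discarded)
```

Placing unnumbered pages that contain text at the end, rather than discarding them, is a design choice in this sketch; it avoids losing real content when the page number could not be read.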

Build the Document Splitter skill

[Figure: Document Splitter skill processing flow with Classify, Extraction Rules, and Splitter Script activities]

1. Create a Document Splitter skill

Open Advanced Designer and click Create Splitter Skill on the start page.
2. Upload documents

On the Documents tab, upload your files. Each document set should contain the files for a single business transaction. The source files are converted into separate pages — every activity except the Splitter Script activity processes each page individually.
3. Add classification and extraction activities

Configure the processing flow to extract the data needed to identify document boundaries and types. Add a Classify activity when the stream contains multiple document types, or when first pages differ visually from the rest. Add fields and other activities as needed to capture data that helps separate documents of the same type or label document classes.
4. Configure the Splitter Script activity

Add document types on the Splitter Script Properties pane, then write the script that turns the flow of pages into a set of documents. The script has access to every page in the transaction and can read data produced by other activities to decide which pages start a new document.
5. Test and publish

Click Test Skill Using Selected Documents to evaluate the results. When the pages are split into documents as expected, publish the skill.
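
If you keep a record of the expected split for your test sets, one possible way to quantify how close the results are is to compare document start positions. The plain-Python sketch below is an assumption about how such a check could look and is independent of Advanced Designer; the function name and the list-of-page-indices representation are hypothetical.

```python
def boundary_scores(
    expected: list[list[int]], predicted: list[list[int]]
) -> tuple[float, float]:
    """Return (precision, recall) of predicted document start pages.

    The first document's start is ignored, because the first page of the
    stream always begins a document.
    """
    expected_starts = {doc[0] for doc in expected[1:] if doc}
    predicted_starts = {doc[0] for doc in predicted[1:] if doc}
    hits = len(expected_starts & predicted_starts)
    precision = hits / len(predicted_starts) if predicted_starts else 1.0
    recall = hits / len(expected_starts) if expected_starts else 1.0
    return precision, recall


expected = [[0, 1], [2], [3]]
predicted = [[0, 1, 2], [3]]
print(boundary_scores(expected, predicted))
```

Low recall means documents are being merged; low precision means documents are being split too often.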

Splitter Script activity

Define document boundaries by analyzing the data extracted from each page.

Classify activities

Label each page with its document type or first-page status.

Extraction Rules activity

Extract identifiers like page numbers, invoice numbers, or document keywords.

Document Splitter skills

Reference for Document Splitter skill structure, settings, and publishing.