Splitting a Flow of Pages into Separate Documents

To split a flow of pages from files containing multiple documents into separate documents that are ready for further processing, create a Document Splitter skill.

Separating Documents of the Same Type

Suppose you have a file that contains multiple documents of the same type (for example, a collection of invoices from one vendor for a certain period). Each invoice has its own number and may have page numbers printed on it. These and other values can be used to separate documents from one another. You can use an Extraction Rules activity to set up extraction of invoice numbers and page numbers. You can also use a Classify activity if the first page of a document differs significantly from the other pages. Then you can use the Splitter Script activity to analyze the extracted values and determine whether the current page is the first page of a new document.
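The first-page decision described above can be sketched in a few lines. The Python below is not the Splitter Script API; it is a minimal illustration that assumes each page has already been reduced to a hypothetical Page record with an optional invoice_number and page_number taken from the extraction step. A new document starts when the printed page number is 1 or when the invoice number differs from the last one seen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical per-page record; field names are assumptions."""
    index: int                     # position of the page in the source flow
    invoice_number: Optional[str]  # None if nothing was extracted
    page_number: Optional[int]     # None if nothing was extracted


def split_by_invoice(pages: list[Page]) -> list[list[Page]]:
    """Group pages into documents using invoice and page numbers."""
    documents: list[list[Page]] = []
    current_number: Optional[str] = None  # last invoice number seen in the open document

    for page in pages:
        number_changed = (
            page.invoice_number is not None
            and current_number is not None
            and page.invoice_number != current_number
        )
        is_first_page = not documents or page.page_number == 1 or number_changed

        if is_first_page:
            documents.append([])
            current_number = None

        documents[-1].append(page)
        if page.invoice_number is not None:
            current_number = page.invoice_number

    return documents
```

Pages on which neither value was found stay with the document that is currently open, which is one possible policy; a stricter script could route them to a review document instead.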

Separating Documents and Removing Annexes

Suppose the documents to be processed are accompanied by explanatory documents that should be stored but from which no data should be extracted. In this case, you can use a Classify activity to classify each page as belonging either to a document of the required type or to an annex. You can also use an Extraction Rules activity to see if any valuable data can be found on a page. A page without any valuable data is probably an annex page. Then you can use the Splitter Script activity to append the annex pages to each document or place them into separate documents.
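Both annex policies mentioned above (append to the preceding document, or place into separate documents) can be sketched as follows. This is an illustrative Python sketch, not Vantage script code. The Page and Document records, the class names "Main" and "Annex", the field_count value, and the is_first_page flag are all assumptions standing in for the output of the Classify and Extraction Rules activities.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical per-page record; field names are assumptions."""
    index: int
    page_class: str            # "Main" or "Annex", as assigned by classification
    field_count: int           # number of valuable values extracted from the page
    is_first_page: bool = False


@dataclass
class Document:
    doc_type: str
    pages: list[int] = field(default_factory=list)


def is_annex(page: Page) -> bool:
    # A page classified as an annex, or one with no valuable data, is treated as an annex.
    return page.page_class == "Annex" or page.field_count == 0


def split_with_annexes(pages: list[Page], separate_annexes: bool) -> list[Document]:
    documents: list[Document] = []
    for page in pages:
        if is_annex(page):
            # A separate annex document is needed when requested, or when
            # there is no earlier document to attach the annex page to.
            needs_annex_doc = separate_annexes or not documents
            if needs_annex_doc and (not documents or documents[-1].doc_type != "Annex"):
                documents.append(Document("Annex"))
        elif page.is_first_page or not documents or documents[-1].doc_type != "Main":
            documents.append(Document("Main"))
        documents[-1].pages.append(page.index)
    return documents
```

With separate_annexes=False, each annex page is appended to the document that precedes it; with separate_annexes=True, consecutive annex pages are collected into their own document.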

Separating Documents and Determining Their Type

Suppose you have a file that contains multiple documents of different types (for example, a loan application accompanied by identity documents, income statements, bank statements, utility bills, and other documents). In this case, you can use a Classify activity to classify each page, and an Extraction Rules activity to extract data necessary for determining whether the current page is the first page of a new document. Then you can use the Splitter Script activity to set up rules for separating documents and determining their type.
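A rule set of this kind can be sketched as below. The Python is only an illustration of the logic and does not use the Splitter Script API. It assumes a hypothetical Page record that carries the class assigned to the page and a looks_like_first_page flag derived from extracted data. A new document begins when the class changes or when the flag is set, and the document takes the class of its first page as its type.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical per-page record; field names are assumptions."""
    index: int
    page_class: str                      # class assigned to the page by classification
    looks_like_first_page: bool = False  # derived from extracted data


def split_by_type(pages: list[Page]) -> list[tuple[str, list[int]]]:
    """Return (document type, page indices) pairs in source order."""
    documents: list[tuple[str, list[int]]] = []
    for page in pages:
        class_changed = bool(documents) and page.page_class != documents[-1][0]
        if not documents or class_changed or page.looks_like_first_page:
            # The document type is taken from the class of its first page.
            documents.append((page.page_class, []))
        documents[-1][1].append(page.index)
    return documents
```

The first-page flag matters when two documents of the same type follow each other (two bank statements, for example), because the class alone does not change at that boundary.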

Re-ordering Pages and Removing Empty Pages

Suppose you have to re-order pages or remove blank or garbage pages resulting from haphazard scanning. Re-ordering is only possible if the pages contain data that indicates the correct order (page numbers, for example). In this case, you can create a field that extracts page numbers. You can also create a field that looks for any text on a page, so that pages without text can be discarded as blank or garbage. Using the Splitter Script activity, you can then re-order pages according to their numbers and create a separate document that contains all blank or garbage pages.
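The re-ordering and clean-up described above can be sketched as follows. This Python is illustrative only and is not the Splitter Script API. The Page record, with a page_number field and a text field holding whatever text was found on the page, is an assumption. Pages with no text go to a garbage list, and the remaining pages are sorted by page number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical per-page record; field names are assumptions."""
    index: int                  # position of the page in the scanned flow
    page_number: Optional[int]  # None if no page number was extracted
    text: str                   # any text found on the page, possibly empty


def reorder_and_discard(pages: list[Page]) -> tuple[list[Page], list[Page]]:
    """Return (ordered content pages, blank or garbage pages)."""
    garbage = [p for p in pages if not p.text.strip()]
    content = [p for p in pages if p.text.strip()]

    # Numbered pages come first in ascending order. Pages without a number
    # follow in their original scan order, because sorted() is stable.
    ordered = sorted(
        content,
        key=lambda p: (p.page_number is None, p.page_number or 0),
    )
    return ordered, garbage
```

Placing unnumbered pages after the numbered ones is a design choice made for this sketch; depending on the documents, keeping them next to their original neighbours may be more appropriate.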

Steps to Create a Document Splitter Skill

1. Open ABBYY Vantage Advanced Designer and create a new Document Splitter skill by clicking Create Splitter Skill on the start page.
2. On the Documents tab, upload your files. Each document set should contain files of a single business transaction. The set of source files will be converted into separate pages. Note that all activities except the Splitter Script activity will process each page separately.
3. Configure the document processing flow to extract data that will help determine the document type of each page in the transaction and find where one document ends and another document starts.
   a. Set up a Classify activity to classify pages if the flow of source pages contains several types of documents or if the first page of each document significantly differs from the other pages.
   b. If necessary, label fields or add other activities to extract data that can be used to separate documents of the same type or determine a document’s class.
4. Set up the Splitter Script activity by adding document types on the Splitter Script Properties pane and configuring the script that will convert the flow of pages into a set of documents. The script has access to all the pages of a transaction and can analyze data from the other activities to determine which pages are the first pages of new documents.
5. Test your skill by clicking Test Skill Using Selected Documents and analyze the results you obtain.
6. When you are satisfied with the results, publish your skill.
