Separating Documents of the Same Type
Suppose you have a file that contains multiple documents of the same type (for example, a collection of invoices from one vendor for a certain period). Each invoice will have its own number and may have page numbers printed on it. This and other data can be used to separate documents from one another. You can use an Extraction Rules activity to set up extraction of invoice numbers and page numbers. You can also use a Classify activity if the first page of a document differs significantly from the other pages. Then you can use the Splitter Script activity to analyze the extracted values and determine if the current page is the first page of a new document.
Separating Documents and Removing Annexes
Suppose the documents to be processed are accompanied by explanatory documents which should be stored but from which no data should be extracted. In this case, you can use a Classify activity to classify the pages into documents of the required type and their annexes. You can also use an Extraction Rules activity to see if any valuable data can be found on a page. A page without any valuable data is probably an annex page. Then you can use the Splitter Script activity to append the annex pages to each document or place them into separate documents.
Separating Documents and Determining Their Type
Suppose you have a file that contains multiple documents of different types (for example, a loan application accompanied by identity documents, income statements, bank statements, utility bills, and other documents). In this case, you can use a Classify activity to classify each page, and an Extraction Rules activity to extract data necessary for determining whether the current page is the first page of a new document. Then you can use the Splitter Script activity to set up rules for separating documents and determining their type.
Re-ordering Pages and Removing Empty Pages
Suppose you have to re-order pages or remove blank or garbage pages resulting from haphazard scanning. Re-ordering is only possible if the pages contain some data that indicates the correct order (page numbers, for example). In this case, you can create a field that extracts page numbers. You can also create a field that looks for any text on a page, so that pages without text can be treated as blank and discarded as garbage. Using the Splitter Script activity, you can re-order pages according to their numbers and create a separate document that will contain all blank or garbage pages.
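The re-ordering logic described above can be sketched outside of Vantage as follows. This is a minimal, standalone Python illustration and not the Splitter Script API: the `Page` record, its `page_number` and `text` attributes (standing in for the two extracted fields), and the `reorder_and_discard` function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for one scanned page and its extracted fields."""
    source_index: int            # position of the page in the scanned file
    page_number: Optional[int]   # value of the page-number field, if one was found
    text: str                    # value of the "any text" field ("" for a blank page)


def reorder_and_discard(pages: List[Page]) -> Tuple[List[Page], List[Page]]:
    """Return (ordered content pages, blank pages set aside as garbage)."""
    garbage = [p for p in pages if not p.text.strip()]
    content = [p for p in pages if p.text.strip()]
    # Numbered pages come first, sorted by their printed number.
    # Pages without a number cannot be placed reliably, so they keep
    # their scan order and go to the end.
    ordered = sorted(
        content,
        key=lambda p: (p.page_number is None, p.page_number or 0, p.source_index),
    )
    return ordered, garbage


if __name__ == "__main__":
    scanned = [
        Page(0, 3, "Page 3 of 3"),
        Page(1, None, "   "),        # blank page
        Page(2, 1, "Page 1 of 3"),
        Page(3, 2, "Page 2 of 3"),
    ]
    ordered, garbage = reorder_and_discard(scanned)
    print([p.source_index for p in ordered])   # scan positions in corrected order
    print([p.source_index for p in garbage])   # scan positions of blank pages
```

Placing unnumbered pages at the end is a design choice made for this sketch; a real script might instead keep them next to their neighbours in the scan.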
Steps to Create a Document Splitter Skill
- Open ABBYY Vantage Advanced Designer and create a new Document Splitter skill by clicking Create Splitter Skill on the start page.
- On the Documents tab, upload your files. Each document set should contain files of a single business transaction. The set of source files will be converted into separate pages. Note that all activities except the Splitter Script activity will process each page separately.
- Configure the document processing flow to extract data that will help determine the document type of each page in the transaction and find where one document ends and another document starts.
  a. Set up a Classify activity to classify pages if the flow of source pages contains several types of documents or if the first page of each document significantly differs from the other pages.
  b. If necessary, label fields or add other activities to extract data that can be used to separate documents of the same type or determine a document’s class.
- Set up the Splitter Script activity by adding document types on the Splitter Script Properties pane and configuring the script that will convert the flow of pages into a set of documents. The script has access to all the pages of a transaction and can analyze data from the other activities to determine which pages are the first pages of new documents.
- Test your skill by clicking Test Skill Using Selected Documents, then analyze the results.
- When you are satisfied with the results, publish your skill.
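The kind of decision the Splitter Script makes (which pages start a new document, and what type that document is) can be sketched as plain logic. The following standalone Python example is only an illustration under stated assumptions and does not use the Vantage scripting API: the `Page` and `Document` records, the `"Annex"` class name, and the `starts_new_document` and `split` functions are hypothetical. It assumes a Classify activity has assigned a class to every page and that an extracted document number (such as an invoice number) is available on some pages.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ANNEX = "Annex"  # hypothetical class name assigned to annex pages


@dataclass
class Page:
    index: int                        # position of the page in the transaction
    page_class: str                   # class assigned by the classification step
    doc_number: Optional[str] = None  # extracted number, if found on this page


@dataclass
class Document:
    doc_type: str
    number: Optional[str]
    pages: List[int] = field(default_factory=list)


def starts_new_document(page: Page, current: Optional[Document]) -> bool:
    """Decide whether `page` is the first page of a new document."""
    if current is None:
        return True                   # the very first page always opens a document
    if page.page_class == ANNEX:
        return False                  # annex pages stay with the preceding document
    if page.page_class != current.doc_type:
        return True                   # the class changed, so a new document begins
    # Same class: only a different extracted number signals a new document.
    return (
        page.doc_number is not None
        and current.number is not None
        and page.doc_number != current.number
    )


def split(pages: List[Page]) -> List[Document]:
    """Convert a flow of pages into a list of typed documents."""
    documents: List[Document] = []
    current: Optional[Document] = None
    for page in pages:
        if starts_new_document(page, current):
            current = Document(page.page_class, page.doc_number)
            documents.append(current)
        elif (current.number is None and page.doc_number is not None
              and page.page_class != ANNEX):
            # The number was missing on earlier pages; adopt it now.
            current.number = page.doc_number
        current.pages.append(page.index)
    return documents


if __name__ == "__main__":
    flow = [
        Page(0, "Invoice", "INV-1"),
        Page(1, "Invoice"),
        Page(2, ANNEX),
        Page(3, "Invoice", "INV-2"),
        Page(4, "BankStatement"),
    ]
    for doc in split(flow):
        print(doc.doc_type, doc.number, doc.pages)
```

Here annex pages are appended to the preceding document. Placing them into separate documents instead, which the Splitter Script activity also allows, would only require changing the annex branch of `starts_new_document`.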
