If you create a Document Splitter skill, the document processing flow will end with a Splitter Script activity. This final step splits the stream of pages in a transaction into a set of documents. Unlike other activities, it processes all pages in a transaction at once.

Setting Up the Activity

To set up the activity, do the following:
  1. Add possible document types by clicking the plus icon on the Splitter Script Properties pane and creating classes one by one.
  2. Click Script Editor on the Splitter Script Properties pane.
  3. Set up a script that will determine how pages are assembled into documents. The script has access to all pages of a transaction. The common scenario is to iterate through the pages and check whether each page starts a new document. If it does not, the page is appended to the previous document. For a detailed description of objects that can be used in your script, see Object model.
  4. Click Save.
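
The common scenario from step 3 can be sketched roughly as follows. This is a minimal illustration, not the product's reference implementation: `MockDocument` is a hypothetical stand-in for the `Document` object provided by the script runtime (its `ClassId` property name is an assumption), the pages are plain mock objects, and `startsNewDocument` is a hypothetical predicate that you would replace with your own rule.

```javascript
// Hypothetical stand-in for the runtime's Document object, so the sketch runs on its own.
class MockDocument {
    constructor(classId) {
        this.ClassId = classId; // property name is an assumption
        this.Pages = [];
    }
}

// Walks the pages in order and opens a new document whenever the rule fires.
function splitPages(pages, startsNewDocument, classId) {
    const documents = [];
    let currentDocument = null;

    for (const page of pages) {
        // The very first page always opens a document.
        if (currentDocument === null || startsNewDocument(page)) {
            currentDocument = new MockDocument(classId);
            documents.push(currentDocument);
        }
        // Every page ends up in the most recently opened document.
        currentDocument.Pages.push(page);
    }
    return documents;
}

// Mock pages with a hypothetical isFirst flag instead of real fields.
const pages = [{ isFirst: true }, { isFirst: false }, { isFirst: true }];
const result = splitPages(pages, (page) => page.isFirst, 'invoice');
console.log(result.map((doc) => doc.Pages.length)); // [ 2, 1 ]
```

In an actual Splitter Script you would iterate over `Context.Pages`, create documents with `new Document(...)`, and return the array directly, as the sample scripts in the next section do.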

Sample Scripts

In this section you will find sample scripts that correspond to a variety of Document Splitter skill use cases. All the following script examples assume that a field called “ResultClassId” exists in the document definition. This field should contain the document class.

Separating Documents of the Same Type

The input file contains invoices from one vendor for a certain period. The first page of a document may contain specific data, e.g. a title or an invoice number. To determine that the current page is the first page of a new document, you need to check if the corresponding field was found on the page. You can also analyze the field values on consecutive pages (e.g. page numbers or invoice numbers). In the following example, we compare the invoice numbers found on two consecutive pages and check if the “FirstPageMarker” field was found on the current page. If the invoice number differs from the previous one, or if the page has no invoice number but the “FirstPageMarker” field was found, the current page is considered to be the first page of a new document.
var documents = [];
var currentDocument = null;
var currentInvoiceNumber = "";

for (let i = 0; i < Context.Pages.length; i++)
{
    const page = Context.Pages[i];
    let invoiceNumberField = page.GetField("InvoiceNumber");
    // Check that the field was found before reading its text, then strip dots, spaces, and hyphens.
    let invoiceNumberNormalized = invoiceNumberField !== null && invoiceNumberField.Text
        ? invoiceNumberField.Text.replace(/[. -]/g, '')
        : "";
    let firstPageMarker = page.GetField("FirstPageMarker");
    var hasInvoiceNumber = invoiceNumberNormalized !== "";
    var hasNewInvoiceNumber = hasInvoiceNumber && invoiceNumberNormalized !== currentInvoiceNumber;
    var hasFirstPageMarker = firstPageMarker !== null && firstPageMarker.Text;

    if (!currentDocument || hasNewInvoiceNumber || (!hasInvoiceNumber && hasFirstPageMarker))
    { 
        currentDocument = new Document('invoice');
        documents.push(currentDocument);
    }

    if (hasInvoiceNumber)
    {
        currentInvoiceNumber = invoiceNumberNormalized;
    }
 
    currentDocument.Pages.push(page);
}

return documents;
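
The field lookup and the normalization used in the script above can be factored into two small helpers. The sketch below is only illustrative: `getFieldText`, `normalizeInvoiceNumber`, and `mockPage` are hypothetical names, and it assumes that `GetField` returns `null` when a field was not found, which is what the `!== null` checks in the sample suggest.

```javascript
// Returns the field text, or "" when the field is missing or has no text.
// Assumes GetField returns null for a field that was not found.
function getFieldText(page, fieldName) {
    const field = page.GetField(fieldName);
    return field && field.Text ? field.Text : "";
}

// Strips dots, spaces, and hyphens so that differently formatted numbers compare equal.
function normalizeInvoiceNumber(text) {
    return text.replace(/[. -]/g, "");
}

// Hypothetical mock page so the helpers can be tried outside the product.
function mockPage(fields) {
    return {
        GetField: (name) => (name in fields ? { Text: fields[name] } : null),
    };
}

const withNumber = mockPage({ InvoiceNumber: "INV 12-34" });
const withoutNumber = mockPage({});

const first = normalizeInvoiceNumber(getFieldText(withNumber, "InvoiceNumber"));
const second = normalizeInvoiceNumber(getFieldText(withoutNumber, "InvoiceNumber"));
console.log(first);         // INV1234
console.log(second === ""); // true
```

Reading the text through one guarded helper keeps a page without the field from stopping the script with an error.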

Separating Documents and Removing Annexes

The document contains an annex or empty pages which should be stored without extracting any data from them. To determine whether the document has an annex or empty pages, you need to check if there are any pages on which no valuable data can be found. For instance, add a field that looks for any word and consider all pages where no words could be found to be blank. In the following example, we separate empty pages from non-empty ones based on the text of a field.
var empty = new Document('empty'); 
var invoice = new Document('invoice'); 

for (let i = 0; i < Context.Pages.length; i++) { 
    const page = Context.Pages[i]; 

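    // Get the text of the field named "Field" on the current page (empty if the field was not found)
    var field = page.GetField("Field");
    var currentResult = field !== null && field.Text ? field.Text : "";

    // A page with any text goes to the invoice; a page without text is treated as empty
    if (currentResult.length > 0) {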
        invoice.Pages.push(page); 
    } else { 
        empty.Pages.push(page); 
    } 
}

return [invoice, empty];
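
A slightly more defensive variant of the same idea is sketched below. It treats whitespace-only text as blank and returns only the documents that actually received pages. Both choices are assumptions made for illustration: this page does not state how the runtime handles a document without pages. `MockDocument` and `mockPage` are hypothetical stand-ins for the runtime objects.

```javascript
// Hypothetical stand-ins for the runtime's Document and page objects.
class MockDocument {
    constructor(classId) {
        this.ClassId = classId; // property name is an assumption
        this.Pages = [];
    }
}

function mockPage(text) {
    return { GetField: () => (text === null ? null : { Text: text }) };
}

// Sorts pages into an "invoice" document and an "empty" document.
function separateBlankPages(pages, fieldName) {
    const invoice = new MockDocument("invoice");
    const empty = new MockDocument("empty");

    for (const page of pages) {
        const field = page.GetField(fieldName);
        // A missing field and whitespace-only text both count as blank.
        const text = field && field.Text ? field.Text.trim() : "";
        (text.length > 0 ? invoice : empty).Pages.push(page);
    }

    // Keep only the documents that received at least one page.
    return [invoice, empty].filter((doc) => doc.Pages.length > 0);
}

const pages = [mockPage("Total"), mockPage("   "), mockPage(null)];
const result = separateBlankPages(pages, "Field");
console.log(result.map((doc) => `${doc.ClassId}:${doc.Pages.length}`));
// [ 'invoice:1', 'empty:2' ]
```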

Separating Documents and Determining Their Type

Case 1. Each Document Type is Represented by a Single Document

A loan application contains documents of different types. Each type is represented by a single document. All documents are delivered in a single file. To determine that the current page is the first page of a new document, you can simply compare its class with the class of the previous page.
var documents = [];
var currentResultClassId = null;
var currentDocument = null;

for (let i = 0; i < Context.Pages.length; i++)
{
    const page = Context.Pages[i];
    const pageResultClassId = page.GetField('ResultClassId').Text;

    // Check if the current page has the same class as the previous page. If not, begin a new document.
    if (pageResultClassId != currentResultClassId || !currentDocument)
    {
        currentResultClassId = pageResultClassId;
        currentDocument = new Document(pageResultClassId);
        documents.push(currentDocument);
    }

    currentDocument.Pages.push(page);
}

return documents;

Case 2. Each Document Type is Represented by One or Several Documents

A loan application submitted by two co-applicants contains a large number of documents of different types. Each type may be represented by several documents in sequence. For example, an application can contain images of the IDs of both co-applicants, several bank statements, etc. To determine that the current page is the first page of a new document, you need to combine the two strategies described above. In the following example, we first compare the classes of two consecutive pages. If the class has changed, the current page is considered to be the first page of a new document. If the pages belong to the same class, we check if the “Title” field was found on the current page. The page on which this field was found is considered to be the first page of a new document.
var documents = [];
var currentResultClassId = null;
var currentDocument = null;

for (let i = 0; i < Context.Pages.length; i++)
{
    const page = Context.Pages[i];
    const pageResultClassId = page.GetField('ResultClassId').Text;

    // Check if the current page has the same class as the previous page. If not, begin a new document.
    if (pageResultClassId != currentResultClassId)
    {
        currentResultClassId = pageResultClassId;
        currentDocument = new Document(pageResultClassId);
        documents.push(currentDocument);
    }
    // Begin a new document if the "Title" field was found on the current page.
    else if (!currentDocument || (page.GetField("Title") !== null && page.GetField("Title").Text))
    {
        currentDocument = new Document(pageResultClassId);
        documents.push(currentDocument);
    }

    currentDocument.Pages.push(page);
}

return documents;
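
To see how the combined rule behaves, the following sketch runs it against four mock pages. It is an illustration under stated assumptions rather than product code: `MockDocument` and `mockPage` are hypothetical stand-ins for the runtime objects, the class names and titles are invented, and `GetField` is assumed to return `null` when a field was not found.

```javascript
// Hypothetical stand-ins for the runtime's Document and page objects.
class MockDocument {
    constructor(classId) {
        this.ClassId = classId; // property name is an assumption
        this.Pages = [];
    }
}

function mockPage(classId, title) {
    const fields = {
        ResultClassId: { Text: classId },
        Title: title ? { Text: title } : null,
    };
    return { GetField: (name) => fields[name] || null };
}

// Starts a new document when the class changes or when a title is found.
function splitByClassAndTitle(pages) {
    const documents = [];
    let currentClassId = null;
    let currentDocument = null;

    for (const page of pages) {
        const classId = page.GetField("ResultClassId").Text;
        const titleField = page.GetField("Title");
        const hasTitle = titleField !== null && Boolean(titleField.Text);

        if (currentDocument === null || classId !== currentClassId || hasTitle) {
            currentClassId = classId;
            currentDocument = new MockDocument(classId);
            documents.push(currentDocument);
        }
        currentDocument.Pages.push(page);
    }
    return documents;
}

const pages = [
    mockPage("id", "ID card"),               // first co-applicant's ID
    mockPage("id", "ID card"),               // second ID: same class, but a title was found
    mockPage("statement", "Bank statement"), // class changes
    mockPage("statement", null),             // continuation page without a title
];
const documents = splitByClassAndTitle(pages);
console.log(documents.map((doc) => `${doc.ClassId}:${doc.Pages.length}`));
// [ 'id:1', 'id:1', 'statement:2' ]
```

The two ID pages share a class, so only the title check separates them, while the second statement page has no title and stays with the first.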

Reordering Pages and Removing Empty Pages

The document contains pages arranged in an incorrect order. Each page contains a field indicating its number (for example, “Page 1 of 10”). To organize the pages, you need to create a field which will extract page numbers. You can also create a field that indicates whether a page is blank by looking for any text on that page. This can be used to discard blank or garbage pages (e.g., those resulting from duplex scanning of page folds with empty back sides) as described in the Separating Documents and Removing Annexes section. In the following example, we reorder pages according to their numbers.
var result = new Document('document');
let arr = [];

for (let i = 0; i < Context.Pages.length; i++) {
    arr[i] = Context.Pages[i];
}

// Sort the "arr" array by the numeric value of the page number field ("Field") on each page
arr.sort((page1, page2) => parseInt(page1.GetField("Field").Text, 10) - parseInt(page2.GetField("Field").Text, 10));

// Append each page from the sorted array to the "Pages" collection of the resulting document
for (let i = 0; i < arr.length; i++) {
    const page = arr[i];
    result.Pages.push(page);
}

return [result];
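
Note that `parseInt` only yields a number when the field text starts with digits. If the field captures the whole phrase, such as “Page 3 of 10”, the number has to be extracted first. The sketch below shows one way to do that. `parsePageNumber`, `sortByPageNumber`, and `mockPage` are hypothetical names, `GetField` is assumed to return `null` for a field that was not found, and placing pages without a readable number at the end is a design choice made for this illustration.

```javascript
// Extracts the first number from text such as "Page 3 of 10"; returns NaN if there is none.
function parsePageNumber(text) {
    const match = /(\d+)/.exec(text || "");
    return match ? parseInt(match[1], 10) : NaN;
}

// Orders pages by number; pages without a readable number go to the end.
function sortByPageNumber(pages, fieldName) {
    const keyed = pages.map((page) => {
        const field = page.GetField(fieldName);
        return { page, number: parsePageNumber(field ? field.Text : "") };
    });

    keyed.sort((a, b) => {
        const aMissing = Number.isNaN(a.number);
        const bMissing = Number.isNaN(b.number);
        if (aMissing && bMissing) return 0;
        if (aMissing) return 1;
        if (bMissing) return -1;
        return a.number - b.number;
    });

    return keyed.map((entry) => entry.page);
}

// Hypothetical mock page; "label" exists only to make the output readable.
function mockPage(text) {
    return {
        label: text === null ? "(no number)" : text,
        GetField: () => (text === null ? null : { Text: text }),
    };
}

const pages = [
    mockPage("Page 3 of 3"),
    mockPage(null),
    mockPage("Page 1 of 3"),
    mockPage("Page 2 of 3"),
];
const sorted = sortByPageNumber(pages, "Field");
console.log(sorted.map((page) => page.label));
// [ 'Page 1 of 3', 'Page 2 of 3', 'Page 3 of 3', '(no number)' ]
```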