Data Export Formats

Vantage provides the following export options for document fields, text, and images.

Options for Exporting Extracted Field Values (Fields Tab)

Format	Export Option	Description	File Name
JSON	Values, metadata, and field structure for each document	Full data extraction results. The structure of the output file is described in the Developer’s Guide.	`<Applied_skill_name>.json`
JSON	Values only	Field values and rule errors. The structure of the output file is described in the Developer’s Guide.	`<Applied_skill_name>_fields.json`
CSV	Values only	Field values. Note: When exporting image fields, their values in the CSV file will be empty. If a repeating structure is nested within a group, its name will appear as “New Group/New Table”, but in the name of a child file the slash will be replaced with an underscore (for example, `New Group_New Table_055fe8c.csv`).	`<Applied_skill_name>.csv`*

*If a document contains repeating objects (repeating fields, repeating groups of fields, tables), they are exported as separate files. The following naming scheme is used:

The name of the parent CSV file: <Applied_skill_name>.csv
The name of child CSV files for repeating objects: <Field_path>_<random 7-character identifier>.csv
If the name of the child CSV file turns out to be longer than 250 characters, an alternate naming scheme will be used: <Field_ID>_<random 7-character identifier>.csv
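The child file naming scheme above can be sketched as follows. This is only an illustration, not Vantage's actual implementation: the function names, the alphabet used for the random identifier, and the choice to count the `.csv` extension toward the 250-character limit are all assumptions.

```python
import secrets
import string
from typing import Optional

MAX_NAME_LENGTH = 250  # limit stated in the naming scheme


def random_suffix(length: int = 7) -> str:
    """Return a random identifier (the alphabet is an assumption)."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def child_csv_name(field_path: str, field_id: str,
                   suffix: Optional[str] = None) -> str:
    """Build a child CSV file name for a repeating object."""
    suffix = suffix or random_suffix()
    # Slashes in the field path are replaced with underscores.
    name = f"{field_path.replace('/', '_')}_{suffix}.csv"
    # Assumption: the limit applies to the full name, extension included.
    if len(name) > MAX_NAME_LENGTH:
        name = f"{field_id}_{suffix}.csv"
    return name
```

With the identifier fixed to `055fe8c`, `child_csv_name("New Group/New Table", "some_id", "055fe8c")` yields the example name from the table, `New Group_New Table_055fe8c.csv`.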

When exporting to a shared folder, the parent CSV file is saved in the transaction folder. If there are multiple documents of the same type in the transaction and a separate CSV file is generated for each document, the following rules and naming scheme are used:

A numeric postfix starting from 2 is added to the name of the CSV file.
A subfolder is created in the transaction folder for child CSV files.
The subfolder will be named as follows: <Applied_skill_name>_<N> or <Applied_skill_name>_fields_<N> (if JSON export - Values only is enabled), where N is the sequential number of the document in the transaction (starting from 2 if there is more than one document in the transaction).
For repeating objects, the name of the child CSV file is specified in the field value of the parent CSV file.
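Because the parent CSV file stores the child file name as the field value, a consumer can collect those references from a parsed row. The sketch below assumes that any cell whose value ends in `.csv` is such a reference. That heuristic and the function name are illustrative and not part of Vantage.

```python
from typing import Dict, Mapping


def child_csv_references(parent_row: Mapping[str, str]) -> Dict[str, str]:
    """Map field names to the child CSV file names they reference.

    Assumption: a cell ending in ".csv" is a child file reference.
    """
    return {
        field: value.strip()
        for field, value in parent_row.items()
        if value and value.strip().lower().endswith(".csv")
    }
```

Where the child file is located relative to the parent file depends on the export configuration described above, so resolving the name to a full path is left to the caller.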

The CSV file is formatted as follows:

The field names are written into the first row of the CSV file.
A comma is used to separate the columns.
The encoding type used is UTF-8 with BOM.
Empty instances of repeating fields or groups, and empty table rows, are not exported, so the resulting CSV file will not contain any empty rows.
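Given these formatting rules, an exported CSV file can be read with Python's standard `csv` module. The following is a minimal sketch; the function name is hypothetical. The `utf-8-sig` codec strips the byte order mark so that it does not end up in the first field name.

```python
import csv
import io
from typing import Dict, List


def parse_export_csv(raw: bytes) -> List[Dict[str, str]]:
    """Parse the bytes of an exported CSV file into one dict per row."""
    # "utf-8-sig" removes the BOM that precedes the header row.
    text = raw.decode("utf-8-sig")
    # The first row holds the field names; columns are comma-separated.
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=",")
    return list(reader)
```

The bytes can be obtained with, for example, `pathlib.Path(path_to_csv).read_bytes()`.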

Options for Exporting Document Text (Text Tab)

Format	Export Option	Description	File Name
JSON	Text only	A JSON file that contains only recognized text; the document layout is not preserved. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible.	`<Applied_skill_name>_text.json`
JSON	Preserve document structure	A JSON file that contains recognized text with the document layout preserved.	`<Applied_skill_name>_text.json`
XML	Text only	An XML file that contains only recognized text; the document layout is not preserved. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible.	`<Applied_skill_name>.xml`
XML	Preserve document structure	An XML file that contains recognized text with the document layout preserved.	`<Applied_skill_name>.xml`
ALTOXML	Text only	An ALTO XML file that contains only recognized text; the document layout is not preserved. The file corresponds to ALTO standard, schema version 4.2. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible.	`<Applied_skill_name>.xml`
ALTOXML	Preserve document structure	An ALTO XML file that contains recognized text with the document layout preserved. The file corresponds to ALTO standard, schema version 4.2.	`<Applied_skill_name>.xml`
TXT		A plain text document. The original document structure is preserved using whitespace.	`<Applied_skill_name>.txt`
DOCX	Editable	An editable Word document which may not look exactly like the original.	`<Applied_skill_name>.docx`
DOCX	Exact	A non-editable Word document. The original document structure is fully preserved.	`<Applied_skill_name>.docx`
XLSX		An editable Excel document. The original document structure is preserved.	`<Applied_skill_name>.xlsx`
PPTX		An editable PowerPoint document. The original document structure is preserved.	`<Applied_skill_name>.pptx`
HTML		An HTML document that preserves the original document structure.	`<Applied_skill_name>.html`

When exporting to a shared folder, all files are saved in the transaction folder. Numeric postfixes starting from 2 will be added to the file names if there is more than one document of the same type in the transaction. The exported text reflects field value changes made by the Manual Review Operator during manual review.

Note: The export option (Text only or Preserve document structure) must be the same for JSON, XML, and ALTOXML. If you change the option for one of these formats, the same option is applied to the other two.

Options for Exporting Document Image (Image Tab)

Format	Export Option	Description	File Name
PDF	PDF/A-3a	A PDF file with a text layer over the document image. The text layer reflects field value changes made by the Manual Review Operator during manual review.	`<Applied_skill_name>.pdf`
PDF	PDF/A-3b	A PDF file with a text layer over the document image. The text layer reflects field value changes made by the Manual Review Operator during manual review.	`<Applied_skill_name>.pdf`
PDF	Image-only	A non-editable PDF in PDF/A-3b standard without a text layer.	`<Applied_skill_name>.pdf`
TIFF		A file that contains an enhanced image in TIFF format.	`<Applied_skill_name>.tiff`
JPEG	Maximum quality	A file that contains an enhanced image in JPEG format. If you choose this compression option, the image quality level will be set to 95%.	`pages/page_<N>.jpg`
JPEG	Smaller file size	A file that contains an enhanced image in JPEG format. If you choose this compression option, the image quality level will be set to 75%. This will allow you to save the image in a readable form while still reducing its size.	`pages/page_<N>.jpg`

Note: For each PDF export option, you can choose between “smaller file size” (default option) and “maximum quality”. Smaller file size is achieved by using Mixed Raster Content (MRC) compression, which determines optimal compression rates separately for the text, the pictures, and the background.

When exporting to a shared folder, a subfolder is created for each document in the transaction. The following rules and naming scheme will be used:

The subfolder will be named as follows: <Applied_skill_name>_<N>, or <Applied_skill_name>_fields_<N> (if JSON export - Values only is enabled). N is the sequential number of the document in the transaction (starting from 2 if there is more than one document in the transaction).
Within this subfolder, a Pages subfolder is created to store the JPG files. The file names are formatted as page_<N>.jpg, where N is the sequential number of the page.
PDF and TIFF files are saved in the transaction folder.
Numeric postfixes starting from 2 will be added to the file names if there is more than one document of the same type in the transaction.

General Naming Scheme

Most of the exported files will contain <Applied_skill_name> in their names, which stands for one of the following:

The name of the last Document skill applied to the document.
The name of the last Classification skill applied to the document if no Document skills were applied.
“Unknown” if no Document or Classification skills were applied, while at least one of them exists in the Process skill flow.

If there are multiple output files and export to a shared folder is configured, incrementing numbers will be appended to the file names in order to make each name unique.
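The fallback order for <Applied_skill_name> described in this section can be expressed as a small function. This is a sketch under stated assumptions: the `AppliedSkill` type and its `kind` values are hypothetical and do not correspond to any Vantage API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AppliedSkill:
    name: str
    kind: str  # hypothetical values: "document" or "classification"


def applied_skill_name(applied: Sequence[AppliedSkill]) -> str:
    """Resolve the name used in export file names.

    `applied` lists the skills applied to the document, in order.
    """
    # Prefer the last Document skill, then the last Classification skill.
    for kind in ("document", "classification"):
        names = [skill.name for skill in applied if skill.kind == kind]
        if names:
            return names[-1]
    # Neither kind was applied although the Process skill flow contains one.
    # The case where the flow contains neither is not covered by the text.
    return "Unknown"
```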

Transactions with Errors

If a transaction is not completed successfully, Vantage generates an Error.json file. The file contains a JSON string with the following information about the transaction:

The transaction identifier
The transaction status (Failed)
The error message
The array containing all source file identifiers and names in the transaction
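A consumer of exported data might check for this file before processing a transaction. The sketch below assumes that Error.json is written to the transaction folder and makes no assumption about its key names, which are not listed here. The function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_transaction_error(folder: Path) -> Optional[Any]:
    """Return the parsed contents of Error.json, or None if it is absent.

    Assumption: Error.json sits directly in the transaction folder.
    """
    error_file = Path(folder) / "Error.json"
    if not error_file.is_file():
        return None
    # "utf-8-sig" reads UTF-8 with or without a byte order mark.
    data = json.loads(error_file.read_text(encoding="utf-8-sig"))
    # Defensive step: if the file holds a JSON-encoded string, decode it again.
    if isinstance(data, str):
        data = json.loads(data)
    return data
```

A return value of `None` means only that no error file was found in the given folder. It does not by itself confirm that the transaction succeeded.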

By default, exported data is stored for 2 weeks, in accordance with the retention policy.
