Vantage provides the following export options for document fields, text, and images.

Options for Exporting Extracted Field Values (Fields Tab)

Format | Export Option | Description | File Name
JSON | Values, metadata, and field structure for each document | Full data extraction results. The structure of the output file is described in the Developer’s Guide. | <Applied_skill_name>.json
JSON | Values only | Field values and rule errors. The structure of the output file is described in the Developer’s Guide. | <Applied_skill_name>_fields.json
CSV | Values only | Field values. Note: When exporting image fields, their values in the CSV file will be empty. If a repeating structure is nested within a group, its name will appear as “New Group/New Table”, but in the name of a child file the slash will be replaced with an underscore (for example, New Group_New Table_055fe8c.csv). | <Applied_skill_name>.csv*
*If a document contains repeating objects (repeating fields, repeating groups of fields, tables), they are exported as separate files. The following naming scheme is used:
  • The name of the parent CSV file: <Applied_skill_name>.csv
  • The name of child CSV files for repeating objects: <Field_path>_<random 7-character identifier>.csv
  • If the name of the child CSV file turns out to be longer than 250 characters, an alternate naming scheme will be used: <Field_ID>_<random 7-character identifier>.csv
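The child file naming scheme above can be sketched in a few lines of Python. This is only an illustration, not Vantage's implementation: the function name, the `field_id` value, and the use of hexadecimal characters for the identifier are assumptions, as is applying the 250-character limit to the full file name including the extension.

```python
import secrets
from typing import Optional

MAX_NAME_LENGTH = 250  # limit stated in the naming scheme


def child_csv_name(field_path: str, field_id: str, identifier: Optional[str] = None) -> str:
    """Build a child CSV file name for a repeating object (illustrative only)."""
    # Assumption: the identifier is 7 hexadecimal characters, as in "055fe8c".
    if identifier is None:
        identifier = secrets.token_hex(4)[:7]
    # Slashes in the field path are replaced with underscores.
    name = f"{field_path.replace('/', '_')}_{identifier}.csv"
    # Assumption: the limit is checked against the full name, extension included.
    if len(name) > MAX_NAME_LENGTH:
        name = f"{field_id}_{identifier}.csv"
    return name


print(child_csv_name("New Group/New Table", "field-id", "055fe8c"))
# prints "New Group_New Table_055fe8c.csv"
```

Passing the identifier explicitly keeps the example reproducible; when it is omitted, a random one is generated.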
When exporting to a shared folder, the parent CSV file is saved in the transaction folder. If there are multiple documents of the same type in the transaction and a separate CSV file is generated for each document, the following rules and naming scheme are used:
  • A numeric postfix starting from 2 is added to the name of the CSV file.
  • A subfolder is created in the transaction folder for child CSV files.
  • The subfolder will be named as follows: <Applied_skill_name>_<N> or <Applied_skill_name>_fields_<N> (if JSON export - Values only is enabled), where N is the sequential number of the document in the transaction (starting from 2 if there is more than one document in the transaction).
  • For repeating objects, the name of the child CSV file is specified in the field value of the parent CSV file.
The CSV file is formatted as follows:
  • The field names are written into the first row of the CSV file.
  • A comma is used to separate the columns.
  • The encoding type used is UTF-8 with BOM.
  • Empty instances of repeating fields or groups, and empty table rows, are not exported, so the resulting CSV file will not contain any empty rows.
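Given the format described above, an exported CSV file can be read with Python's standard `csv` module. The sketch below is a minimal example; the function name and the sample field names (`Vendor`, `Total`) are hypothetical. The `utf-8-sig` codec removes the BOM, which would otherwise become part of the first field name.

```python
import csv
import tempfile
from pathlib import Path


def read_export_csv(path):
    """Return the rows of an exported CSV file as dictionaries keyed by field name."""
    # "utf-8-sig" strips the BOM; the first row supplies the field names.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle, delimiter=","))


# Hypothetical sample file written with a UTF-8 BOM, for demonstration only.
sample = Path(tempfile.mkdtemp()) / "Invoice.csv"
sample.write_bytes(b"\xef\xbb\xbfVendor,Total\r\nACME,100.00\r\n")
print(read_export_csv(sample))
# prints [{'Vendor': 'ACME', 'Total': '100.00'}]
```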

Options for Exporting Document Text (Text Tab)

Format | Export Option | Description | File Name
JSON | Text only | A JSON file that contains only recognized text; the document layout is not preserved. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible. | <Applied_skill_name>_text.json
JSON | Preserve document structure | A JSON file that contains recognized text with the document layout preserved. | <Applied_skill_name>_text.json
XML | Text only | An XML file that contains only recognized text; the document layout is not preserved. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible. | <Applied_skill_name>.xml
XML | Preserve document structure | An XML file that contains recognized text with the document layout preserved. | <Applied_skill_name>.xml
ALTOXML | Text only | An ALTO XML file that contains only recognized text; the document layout is not preserved. The file corresponds to ALTO standard, schema version 4.2. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible. | <Applied_skill_name>.xml
ALTOXML | Preserve document structure | An ALTO XML file that contains recognized text with the document layout preserved. The file corresponds to ALTO standard, schema version 4.2. | <Applied_skill_name>.xml
TXT | | A plain text document. The original document structure is preserved using whitespaces. | <Applied_skill_name>.txt
DOCX | Editable | An editable Word document which may not look exactly like the original. | <Applied_skill_name>.docx
DOCX | Exact | A non-editable Word document. The original document structure is fully preserved. | <Applied_skill_name>.docx
XLSX | | An editable Excel document. The original document structure is preserved. | <Applied_skill_name>.xlsx
PPTX | | An editable PowerPoint document. The original document structure is preserved. | <Applied_skill_name>.pptx
HTML | | An HTML document that preserves the original document structure. | <Applied_skill_name>.html
When exporting to a shared folder, all files are saved in the transaction folder. Numeric postfixes starting from 2 will be added to the file names if there is more than one document of the same type in the transaction. The exported text reflects field value changes made by the Manual Review Operator during manual review.
Note: The export option (Text only or Preserve document structure) must be the same for JSON, XML, and ALTOXML. If you change the option for one of these formats, the same option is applied to the other two.

Options for Exporting Document Image (Image Tab)

Format | Export Option | Description | File Name
PDF | PDF/A-3a | A PDF file with a text layer over the document image. The text layer reflects field value changes made by the Manual Review Operator during manual review. | <Applied_skill_name>.pdf
PDF | PDF/A-3b | A PDF file with a text layer over the document image. The text layer reflects field value changes made by the Manual Review Operator during manual review. | <Applied_skill_name>.pdf
PDF | Image-only | A non-editable PDF in PDF/A-3b standard without a text layer. | <Applied_skill_name>.pdf
TIFF | | A file that contains an enhanced image in TIFF format. | <Applied_skill_name>.tiff
JPEG | Maximum quality | A file that contains an enhanced image in JPEG format. If you choose this compression option, the image quality level will be set to 95%. | pages/page_<N>.jpg
JPEG | Smaller file size | A file that contains an enhanced image in JPEG format. If you choose this compression option, the image quality level will be set to 75%. This will allow you to save the image in a readable form while still reducing its size. | pages/page_<N>.jpg
Note: For each PDF export option, you can choose between “smaller file size” (default option) and “maximum quality”. Smaller file size is achieved by using Mixed Raster Content (MRC) compression, which determines optimal compression rates separately for the text, the pictures, and the background.
When exporting to a shared folder, a subfolder is created for each document in the transaction. The following rules and naming scheme will be used:
  • The subfolder will be named as follows: <Applied_skill_name>_<N>, or <Applied_skill_name>_fields_<N> (if JSON export - Values only is enabled). N is the sequential number of the document in the transaction (starting from 2 if there is more than one document in the transaction).
  • Within this subfolder, a Pages subfolder is created to store the JPG files. The file names are formatted as page_<N>.jpg, where N is the sequential number of the page.
  • PDF and TIFF files are saved in the transaction folder.
  • Numeric postfixes starting from 2 will be added to the file names if there is more than one document of the same type in the transaction.
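When reading the exported JPEG pages back from a document subfolder, sorting by file name alone would place page_10.jpg before page_2.jpg. The following sketch collects the page images and orders them by page number. The function name and the `Invoice_2` folder are hypothetical, and the pages folder is matched case-insensitively because the text above refers to it both as Pages and as pages.

```python
import re
import tempfile
from pathlib import Path

PAGE_PATTERN = re.compile(r"page_(\d+)\.jpg", re.IGNORECASE)


def list_page_images(document_folder):
    """Return page JPEG paths from a document subfolder, ordered by page number."""
    pages = []
    for path in Path(document_folder).glob("*/*"):
        # Only look inside the pages subfolder, whatever its letter case.
        if path.parent.name.lower() != "pages":
            continue
        match = PAGE_PATTERN.fullmatch(path.name)
        if match:
            pages.append((int(match.group(1)), path))
    # Sort numerically on the page number, not alphabetically on the name.
    return [path for _, path in sorted(pages, key=lambda item: item[0])]


# Hypothetical document subfolder with three empty page files, for demonstration only.
demo = Path(tempfile.mkdtemp()) / "Invoice_2"
(demo / "Pages").mkdir(parents=True)
for n in (1, 2, 10):
    (demo / "Pages" / f"page_{n}.jpg").touch()
print([p.name for p in list_page_images(demo)])
# prints ['page_1.jpg', 'page_2.jpg', 'page_10.jpg']
```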

General Naming Scheme

Most of the exported files will contain <Applied_skill_name> in their names, which stands for one of the following:
  • The name of the last Document skill applied to the document.
  • The name of the last Classification skill applied to the document if no Document skills were applied.
  • “Unknown” if no Document or Classification skills were applied, while at least one of them exists in the Process skill flow.
If there are multiple output files and export to a shared folder is configured, incrementing numbers will be appended to the file names in order to make each name unique.
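The rules for resolving <Applied_skill_name> amount to a simple fallback chain, sketched below. The function and its two list parameters are hypothetical; each list is assumed to hold skill names in the order they were applied to the document.

```python
def applied_skill_name(document_skills, classification_skills):
    """Pick the name used in export file names (illustrative fallback chain)."""
    # 1. The last Document skill applied to the document.
    if document_skills:
        return document_skills[-1]
    # 2. Otherwise, the last Classification skill applied to the document.
    if classification_skills:
        return classification_skills[-1]
    # 3. Otherwise "Unknown". The source only covers flows that contain at
    #    least one such skill; other flows are outside this sketch.
    return "Unknown"


print(applied_skill_name(["Invoice US", "Invoice EU"], ["Classifier"]))
# prints "Invoice EU"
print(applied_skill_name([], []))
# prints "Unknown"
```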

Transactions with Errors

If a transaction is not completed successfully, Vantage generates an Error.json file. The file holds a JSON string with the following information about the transaction:
  • The transaction identifier
  • The transaction status (Failed)
  • The error message
  • The array containing all source file identifiers and names in the transaction
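A consumer of the export can check for Error.json before processing any other output. The sketch below assumes that the file is written to the transaction folder and is UTF-8 encoded, neither of which is stated above. It deliberately returns the parsed content as-is, because the exact key names are not listed here.

```python
import json
from pathlib import Path


def load_transaction_error(transaction_folder):
    """Return the parsed Error.json content, or None if the file is absent."""
    # Assumption: Error.json sits directly in the transaction folder.
    error_file = Path(transaction_folder) / "Error.json"
    if not error_file.is_file():
        return None
    # Assumption: UTF-8 encoding; "utf-8-sig" also tolerates a leading BOM.
    with open(error_file, encoding="utf-8-sig") as handle:
        return json.load(handle)
```

A return value of None means that no error file was found in the folder, so the transaction's regular output can be processed.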
By default, exported data is stored for 2 weeks, in accordance with the retention policy.