Options for Exporting Extracted Field Values (Fields Tab)
| Format | Export Option | Description | File Name |
|---|---|---|---|
| JSON | Values, metadata, and field structure for each document | Full data extraction results. The structure of the output file is described in the Developer’s Guide. | <Applied_skill_name>.json |
| JSON | Values only | Field values and rule errors. The structure of the output file is described in the Developer’s Guide. | <Applied_skill_name>_fields.json |
| CSV | Values only | Field values. Note: When exporting image fields, their values in the CSV file will be empty. If a repeating structure is nested within a group, its name will appear as “New Group/New Table”, but in the name of a child file the slash will be replaced with an underscore (for example, New Group_New Table_055fe8c.csv) | <Applied_skill_name>.csv* |
- The name of the parent CSV file: <Applied_skill_name>.csv
- The name of child CSV files for repeating objects: <Field_path>_<random 7-character identifier>.csv
- If the name of the child CSV file turns out to be longer than 250 characters, an alternate naming scheme will be used: <Field_ID>_<random 7-character identifier>.csv
- A numeric postfix starting from 2 is added to the name of the CSV file.
- A subfolder is created in the transaction folder for child CSV files.
- The subfolder will be named as follows: <Applied_skill_name>_<N> or <Applied_skill_name>_fields_<N> (if JSON export - Values only is enabled), where N is the sequential number of the document in the transaction (starting from 2 if there is more than one document in the transaction).
- For repeating objects, the name of the child CSV file is specified in the field value of the parent CSV file.
- The field names are written into the first row of the CSV file.
- A comma is used to separate the columns.
- The encoding type used is UTF-8 with BOM.
- Empty instances of repeating fields or groups and empty table rows are not exported, so the resulting CSV file will not contain any empty rows.
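The CSV conventions listed above (field names in the first row, comma-separated columns, UTF-8 with BOM) can be handled as in the following minimal Python sketch. The function name read_exported_csv is illustrative and not part of Vantage. The utf-8-sig codec strips the BOM so that it does not end up in the first field name.

```python
import csv
from pathlib import Path


def read_exported_csv(path):
    """Read an exported CSV file into a list of dicts keyed by field name.

    Illustrative helper, not a Vantage API.
    """
    # utf-8-sig removes the byte order mark (BOM) before parsing.
    # newline="" lets the csv module handle line breaks inside quoted values.
    with Path(path).open(encoding="utf-8-sig", newline="") as handle:
        # The first row holds the field names; columns are comma-separated.
        return list(csv.DictReader(handle, delimiter=","))
```

Each returned row maps a field name to its string value. For repeating objects, the value is the name of a child CSV file, which can be read with the same function.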
Options for Exporting Document Text (Text Tab)
| Format | Export Option | Description | File Name |
|---|---|---|---|
| JSON | Text only | A JSON file that contains only recognized text; the document layout is not preserved. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible. | <Applied_skill_name>_text.json |
| JSON | Preserve document structure | A JSON file that contains recognized text with the document layout preserved. | <Applied_skill_name>_text.json |
| XML | Text only | An XML file that contains only recognized text; the document layout is not preserved. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible. | <Applied_skill_name>.xml |
| XML | Preserve document structure | An XML file that contains recognized text with the document layout preserved. | <Applied_skill_name>.xml |
| ALTOXML | Text only | An ALTO XML file that contains only recognized text; the document layout is not preserved. The file corresponds to ALTO standard, schema version 4.2. Note: Selecting this option makes export to DOCX, XLSX, and PPTX impossible. | <Applied_skill_name>.xml |
| ALTOXML | Preserve document structure | An ALTO XML file that contains recognized text with the document layout preserved. The file corresponds to ALTO standard, schema version 4.2. | <Applied_skill_name>.xml |
| TXT | | A plain text document. The original document structure is preserved using whitespace. | <Applied_skill_name>.txt |
| DOCX | Editable | An editable Word document which may not look exactly like the original. | <Applied_skill_name>.docx |
| DOCX | Exact | A non-editable Word document. The original document structure is fully preserved. | <Applied_skill_name>.docx |
| XLSX | | An editable Excel document. The original document structure is preserved. | <Applied_skill_name>.xlsx |
| PPTX | | An editable PowerPoint document. The original document structure is preserved. | <Applied_skill_name>.pptx |
| HTML | | An HTML document that preserves the original document structure. | <Applied_skill_name>.html |
Note: The export option (Text only or Preserve document structure) must be the same for JSON, XML, and ALTOXML. If you change the option for one of these formats, the change is applied to the other formats as well.
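The two constraints described in this section (a shared export option for JSON, XML, and ALTOXML, and Text only ruling out DOCX, XLSX, and PPTX) can be modeled as in the sketch below. This is a hypothetical in-memory model for illustration only. The dictionary layout and function names are assumptions and do not reflect how Vantage stores its settings.

```python
# Hypothetical model: a dict maps each enabled format to its export option.
LINKED_FORMATS = ("JSON", "XML", "ALTOXML")
TEXT_ONLY = "Text only"
PRESERVE = "Preserve document structure"


def set_text_export_option(options, fmt, option):
    """Return a copy of options with option applied to fmt and linked formats."""
    if fmt not in LINKED_FORMATS:
        raise ValueError(f"{fmt} has no Text only / Preserve option")
    if option not in (TEXT_ONLY, PRESERVE):
        raise ValueError(f"unknown export option: {option}")
    updated = dict(options)
    updated[fmt] = option
    # The option cannot differ between the linked formats, so it is
    # propagated to every other linked format that is enabled.
    for linked in LINKED_FORMATS:
        if linked in updated:
            updated[linked] = option
    return updated


def office_export_allowed(options):
    """DOCX, XLSX, and PPTX export is impossible once Text only is selected."""
    return all(options.get(fmt) != TEXT_ONLY for fmt in LINKED_FORMATS)
```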
Options for Exporting Document Image (Image Tab)
| Format | Export Option | Description | File Name |
|---|---|---|---|
| PDF | PDF/A-3a | A PDF file with a text layer over the document image. The text layer reflects field value changes made by the Manual Review Operator during manual review. | <Applied_skill_name>.pdf |
| PDF | PDF/A-3b | A PDF file with a text layer over the document image. The text layer reflects field value changes made by the Manual Review Operator during manual review. | <Applied_skill_name>.pdf |
| PDF | Image-only | A non-editable PDF in PDF/A-3b standard without a text layer. | <Applied_skill_name>.pdf |
| TIFF | | A file that contains an enhanced image in TIFF format. | <Applied_skill_name>.tiff |
| JPEG | Maximum quality | A file that contains an enhanced image in JPEG format. If you choose this compression option, the image quality level will be set to 95%. | pages/page_<N>.jpg |
| JPEG | Smaller file size | A file that contains an enhanced image in JPEG format. If you choose this compression option, the image quality level will be set to 75%. This will allow you to save the image in a readable form while still reducing its size. | pages/page_<N>.jpg |
Note: For each PDF export option, you can choose between “smaller file size” (default option) and “maximum quality”. Smaller file size is achieved by using Mixed Raster Content (MRC) compression, which determines optimal compression rates separately for the text, the pictures, and the background.
When exporting to a shared folder, a subfolder is created for each document in the transaction. The following rules and naming scheme will be used:
- The subfolder will be named as follows: <Applied_skill_name>_<N>, or <Applied_skill_name>_fields_<N> (if JSON export - Values only is enabled). N is the sequential number of the document in the transaction (starting from 2 if there is more than one document in the transaction).
- Within this subfolder, a Pages subfolder is created to store the JPG files. The file names are formatted as page_<N>.jpg, where N is the sequential number of the page.
- PDF and TIFF files are saved in the transaction folder.
- Numeric postfixes starting from 2 will be added to the file names if there is more than one document of the same type in the transaction.
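Because the page number in page_<N>.jpg is not zero-padded, a plain alphabetical listing would place page_10.jpg before page_2.jpg. The sketch below shows one way to collect the exported page images in page order. The function name and the directory argument are illustrative assumptions, not part of Vantage.

```python
import re
from pathlib import Path

# Matches the documented page file name pattern, page_<N>.jpg.
PAGE_NAME = re.compile(r"page_(\d+)\.jpg", re.IGNORECASE)


def list_page_images(pages_dir):
    """Return the page_<N>.jpg files in pages_dir ordered by page number N."""
    numbered = []
    for entry in Path(pages_dir).iterdir():
        match = PAGE_NAME.fullmatch(entry.name)
        if match and entry.is_file():
            numbered.append((int(match.group(1)), entry))
    # Sort numerically on N so that page_2.jpg precedes page_10.jpg.
    numbered.sort(key=lambda item: item[0])
    return [path for _, path in numbered]
```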
General Naming Scheme
Most of the exported files will contain <Applied_skill_name> in their names, which stands for one of the following:
- The name of the last Document skill applied to the document.
- The name of the last Classification skill applied to the document if no Document skills were applied.
- “Unknown” if no Document or Classification skills were applied, while at least one of them exists in the Process skill flow.
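The three rules above amount to a simple order of precedence, sketched below. The function and its two list arguments (skill names in the order they were applied) are hypothetical and serve only to illustrate the rule. They are not a Vantage API.

```python
def resolve_applied_skill_name(document_skills, classification_skills):
    """Pick the <Applied_skill_name> value from the skills applied to a document.

    Both arguments are lists of skill names in the order they were applied.
    """
    if document_skills:
        # The last Document skill applied takes precedence.
        return document_skills[-1]
    if classification_skills:
        # Otherwise fall back to the last Classification skill applied.
        return classification_skills[-1]
    # No skill was applied although the Process skill flow contains one.
    # The case of a flow without any such skills is not described in the
    # rules above and is not modeled here.
    return "Unknown"
```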
Transactions with Errors
If a transaction is not completed successfully, Vantage generates an Error.json file. This file contains a JSON string with the following information about the transaction:
- The transaction identifier
- The transaction status (Failed)
- The error message
- The array containing all source file identifiers and names in the transaction
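A consumer of the export results can check for Error.json before processing a transaction, as in the minimal sketch below. The function name is illustrative, and both the location of Error.json directly in the transaction folder and its encoding are assumptions. The key names inside the file are not listed here, so the sketch returns the parsed content without accessing specific keys.

```python
import json
from pathlib import Path


def load_transaction_error(transaction_dir):
    """Return the parsed Error.json content, or None if the file is absent."""
    # Assumption: Error.json sits directly in the transaction folder.
    error_file = Path(transaction_dir) / "Error.json"
    if not error_file.is_file():
        return None
    # Assumption: the file is UTF-8; utf-8-sig also tolerates a leading BOM.
    with error_file.open(encoding="utf-8-sig") as handle:
        return json.load(handle)
```

A return value of None means that no error file was found in the given folder.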
