To extract data from documents, create a Document skill. The scenario and the technologies involved (available as activities) largely depend on the structure of the documents you’re going to process. In general, documents fall into the following types:

Document Types

Structured Documents

Structured documents (also known as fixed forms) always include the same information and have either the same layout or a very limited number of layouts. Examples of structured documents include forms, questionnaires, and surveys.

Semi-structured Documents

Semi-structured documents generally contain identical information, but the location, size, and number of fields may vary from document to document, which makes extracting data more difficult. Vantage relies on the spatial and logical relations between certain elements and fields to locate and extract the required data. Examples of semi-structured documents include invoices, payment orders, and bills of lading.

If your document set consists of structured or semi-structured documents, check out the scenarios in the Processing structured documents and Processing semi-structured documents sections.

Unstructured Documents

Unstructured documents consist of freeform text divided into paragraphs and sentences containing data that needs to be extracted. In some unstructured documents, a field may spill over onto the next page. Examples of unstructured documents include contracts, emails, and research articles.

If your document set consists of unstructured documents, check out the scenarios in the Processing unstructured documents section.

Mixed Document Sets

If your document set contains both semi-structured and unstructured documents, or if your documents may have both semi-structured and unstructured content (for example, paragraphs of plain text alternating with tables), check out the scenarios in the Processing mixed document sets and documents of mixed structure section.
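
As a quick reference, the guidance on this page can be summarized as a small lookup. The Python sketch below is purely illustrative and is not part of Vantage or its API: the names `DocType`, `SECTION_BY_TYPE`, and `recommended_sections` are hypothetical. It also assumes that any types not covered by the mixed-set rule are simply pointed at their own sections, which this page does not state explicitly.

```python
from enum import Enum
from typing import Iterable, List


class DocType(Enum):
    """Document types described on this page (hypothetical helper, not a Vantage type)."""
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


# Documentation section recommended for each document type.
SECTION_BY_TYPE = {
    DocType.STRUCTURED: "Processing structured documents",
    DocType.SEMI_STRUCTURED: "Processing semi-structured documents",
    DocType.UNSTRUCTURED: "Processing unstructured documents",
}

MIXED_SECTION = "Processing mixed document sets and documents of mixed structure"


def recommended_sections(types_in_set: Iterable[DocType]) -> List[str]:
    """Return the documentation sections to read for a given document set."""
    remaining = set(types_in_set)
    sections = []
    mixed_pair = {DocType.SEMI_STRUCTURED, DocType.UNSTRUCTURED}
    # Semi-structured and unstructured content together point to the mixed-set section.
    if mixed_pair <= remaining:
        sections.append(MIXED_SECTION)
        remaining -= mixed_pair
    # Assumption: any other type is handled by its own section.
    sections.extend(SECTION_BY_TYPE[t] for t in DocType if t in remaining)
    return sections
```

For example, calling `recommended_sections({DocType.SEMI_STRUCTURED, DocType.UNSTRUCTURED})` returns only the mixed-set section, while a set that contains just `DocType.STRUCTURED` returns the structured-documents section.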