Documents from some companies may have a uniquely complex structure. To extract data from such documents correctly, you need to set up company-specific extraction activities. You can do this within a single Document skill by using a Classify By Company activity. This activity classifies documents by company, using data catalogs to identify the company.

Using Data Catalogs

A data catalog is a set of records that store specific information. For example, this can be company-related information such as company names, addresses, and bank account numbers. Data catalogs can be used to look up any company mentioned in a document, such as a vendor, consignee, bank, or shipping company. A data catalog can be populated with data from a CSV file whose structure is identical to that of the data catalog. For more information about using data catalogs, see the Using data catalogs section in the Skill Designer Guide.

When documents are processed by a Classify By Company activity, the activity searches the connected data catalog for the data of specific companies. The result is a unique company identifier, which is recorded in the corresponding field of the skill. This company identifier is then used to set up a company-specific document processing algorithm.

The company identifier is the only field required for a Classify By Company activity to work. However, you can fill in more fields with information about the company stored in the data catalog, which saves you from setting up additional extraction activities for those fields. To do so, map the data catalog columns to the appropriate fields in the skill.

A Classify By Company activity uses data catalogs of type Document Issuer Companies that are available on the Vantage server to which you are connected. In some cases, you may want to search the document for a pair of related companies (for example, a supplier and a purchaser of goods). To do this, add a second, optional data catalog of type Document Receiver Companies. If you need to find two unrelated companies, you can either use this option or add two Classify By Company activities.
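The lookup-and-mapping flow described above can be sketched in Python. This is a minimal, hypothetical illustration, not how Vantage implements the activity: the function names, the exact-string matching on identifier columns, the sample catalog rows, and the skill field names (VendorId, VendorName, VendorCity) are all assumptions. The real activity also scores company name and address matches, which this sketch leaves out.

```python
import csv
import io

# Hypothetical catalog contents. Column names follow the Document Issuer
# Companies table; the data itself is made up for illustration.
CATALOG_CSV = """Issuer Company ID,Company Correlation ID,Tax ID,IBAN,Name,City,Country
V-001,,DE123456789,DE89370400440532013000,Acme GmbH,Berlin,Germany
V-002,,FR987654321,,Globex SA,Paris,France
"""

# Columns used in the unique company identifier search.
ID_COLUMNS = ("Tax ID", "National Tax ID", "IBAN")


def load_catalog(text):
    """Read catalog rows from CSV text that has the same structure as the catalog."""
    return list(csv.DictReader(io.StringIO(text)))


def find_company(catalog, document_values):
    """Return the first row whose identifier column matches a value found on the document.

    Simplification (assumption): exact string matching only.
    """
    found = set(document_values)
    for row in catalog:
        for column in ID_COLUMNS:
            value = row.get(column)  # None if the column is absent from the CSV
            if value and value in found:
                return row
    return None


def fill_fields(row, column_to_field):
    """Copy mapped catalog columns into skill fields (hypothetical field names)."""
    return {field: row.get(column, "") for column, field in column_to_field.items()}


catalog = load_catalog(CATALOG_CSV)
match = find_company(catalog, ["Invoice 42", "FR987654321"])
fields = fill_fields(
    match,
    {"Issuer Company ID": "VendorId", "Name": "VendorName", "City": "VendorCity"},
)
print(fields)  # {'VendorId': 'V-002', 'VendorName': 'Globex SA', 'VendorCity': 'Paris'}
```

Only the mapping for the company identifier is strictly needed; the extra entries show how mapped columns can fill fields without separate extraction activities.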

Looking for a Pair of Companies

Each of your company’s business units may have its own database of suppliers. If a supplier works with several business units, it will have multiple entries with different IDs. In this case, you need to find the exact entry for the supplier that corresponds to the business unit. To achieve this, fill in the Company Correlation ID column in the Document Issuer Companies data catalog. The search will then look for pairs of companies where the Company Correlation ID of the document issuer matches the Receiver Company ID of the document receiver. If some correlation IDs are missing, pairs with matching correlation IDs take priority.

When searching for pairs of companies, hypotheses are generated using the correlation between a supplier and a business unit. First, a Classify By Company activity selects the five document receiver company records that most reliably match the field values on the document image. Then, for each of these records, the activity selects five document issuer company records whose Company Correlation ID is identical to the Receiver Company ID. If the data catalog does not contain any records whose Company Correlation ID is identical to the Receiver Company ID, records with an empty Company Correlation ID are selected instead.
Important! To find valid pairs, you must fill in the right Company Correlation IDs for all records.
As a result, the best-matching issuer–receiver pair is selected. For more information about company detection, see How company detection works.

If your document issuer companies can work with any of the document receivers, you don’t need to fill in the Company Correlation ID column in your Document Issuer Companies data catalog, and the search will consider all possible company pairs.
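The pair search described above can be sketched as follows. This is a hypothetical model, not Vantage's code: the `Candidate` class, the per-record `score` values, and combining scores by simple addition are assumptions, since the section does not say how the best pair is ranked. What the sketch does follow from the text is the shape of the search: the top five receivers, then up to five issuers per receiver linked by Company Correlation ID, falling back to issuers with an empty correlation ID.

```python
from dataclasses import dataclass

TOP_N = 5  # the section describes five candidates at each step


@dataclass
class Candidate:
    """One catalog record with a hypothetical match score against the document."""
    company_id: str  # Receiver Company ID or Issuer Company ID
    score: float  # assumed confidence that the record matches the document
    correlation_id: str = ""  # Company Correlation ID (issuer records only)


def best_pair(receivers, issuers, top_n=TOP_N):
    """Return (issuer, receiver, pair_score) for the best pair, or None."""
    top_receivers = sorted(receivers, key=lambda r: r.score, reverse=True)[:top_n]
    best = None
    for receiver in top_receivers:
        # Issuer records linked to this receiver through the correlation ID.
        linked = [i for i in issuers if i.correlation_id == receiver.company_id]
        if not linked:
            # Fall back to issuer records with an empty correlation ID.
            linked = [i for i in issuers if not i.correlation_id]
        top_issuers = sorted(linked, key=lambda i: i.score, reverse=True)[:top_n]
        for issuer in top_issuers:
            pair_score = receiver.score + issuer.score  # assumption: simple sum
            if best is None or pair_score > best[2]:
                best = (issuer, receiver, pair_score)
    return best


receivers = [Candidate("BU-1", 0.9), Candidate("BU-2", 0.6)]
issuers = [
    Candidate("V-001", 0.7, "BU-1"),  # same supplier, entry for business unit BU-1
    Candidate("V-001", 0.7, "BU-2"),  # same supplier, entry for business unit BU-2
    Candidate("V-002", 0.95, "BU-2"),
]
issuer, receiver, _ = best_pair(receivers, issuers)
print(issuer.company_id, issuer.correlation_id, receiver.company_id)  # V-001 BU-1 BU-1
```

If no issuer record has a correlation ID, the fallback branch runs for every receiver, so every issuer is paired with every top receiver; this mirrors the case where the Company Correlation ID column is left empty.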

Data Catalog Types

The Document Issuer Companies Data Catalog

| Name | Description |
| --- | --- |
| Issuer Company ID | Obtained by detecting the document issuer. Identifies the document issuer in an external information system. Note: This is the unique identifier of the entry in the data catalog if all of the company’s business units use the same customer database. Otherwise, the entry in the data catalog is uniquely identified by a combination of Issuer Company ID and Company Correlation ID. |
| Company Correlation ID | The identifier of the company’s business unit. Note: If the company’s business units use different customer databases, this data catalog column must be filled in, because the unique key of the entry in the data catalog is a combination of Issuer Company ID and Company Correlation ID. For more information, see Looking for a Pair of Companies. |
| Tax ID | Can be used in the unique company identifier search. |
| National Tax ID | Can be used in the unique company identifier search. |
| IBAN | Can be used in the unique company identifier search. |
| Name | Can be used in the company name and address search. |
| Postal Code | Can be used in the company name and address search. |
| Street | Can be used in the company name and address search. |
| City | Can be used in the company name and address search. |
| State or Province | Not used in the company detection process. Can only be used to fill in the document fields. |
| Country | Not used in the company detection process. Can only be used to fill in the document fields. |
| Bank Account | Not used in the company detection process. Can only be used to fill in the document fields. |
| Bank Code | Not used in the company detection process. Can only be used to fill in the document fields. |
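Because the unique key of an entry depends on whether business units share a customer database, it can help to check a CSV file for duplicate keys before using it to populate a catalog. The following is a hypothetical pre-upload check written for illustration; it is not a Vantage feature, and the function name and sample rows are assumptions. Pass both key columns when business units use different databases, or only Issuer Company ID when they share one.

```python
import csv
import io
from collections import Counter

ISSUER_KEY = ("Issuer Company ID", "Company Correlation ID")
RECEIVER_KEY = ("Receiver Company ID",)


def duplicate_keys(csv_text, key_columns=ISSUER_KEY):
    """Return key tuples that appear in more than one catalog row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        tuple((row.get(column) or "").strip() for column in key_columns)
        for row in rows
    )
    return [key for key, count in counts.items() if count > 1]


SAMPLE = """Issuer Company ID,Company Correlation ID,Name
V-001,BU-1,Acme GmbH
V-001,BU-2,Acme GmbH
V-001,BU-1,Acme GmbH Berlin
"""

print(duplicate_keys(SAMPLE))  # [('V-001', 'BU-1')]
print(duplicate_keys(SAMPLE, ("Issuer Company ID",)))  # [('V-001',)]
```

The second call shows why the composite key matters: the same three rows are invalid for a shared database, where Issuer Company ID alone must be unique. For a Document Receiver Companies file, pass `RECEIVER_KEY`.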

The Document Receiver Companies Data Catalog

| Name | Description |
| --- | --- |
| Receiver Company ID | Obtained by detecting the document receiver. Identifies the document receiver in an external information system. The unique key of the entry in the data catalog. |
| Tax ID | Can be used in the unique company identifier search. |
| Name | Can be used in the company name and address search. |
| Postal Code | Can be used in the company name and address search. |
| Street | Can be used in the company name and address search. |
| City | Can be used in the company name and address search. |
| State or Province | Not used in the company detection process. Can only be used to fill in the document fields. |
| Country | Not used in the company detection process. Can only be used to fill in the document fields. |