Best Practices for Accurate Detection
To ensure that the detection results are as accurate as possible, make sure that:
- Unique company identifiers are filled in. Filling in the unique value columns (Tax ID, National Tax ID, IBAN) significantly improves the probability of correct detection, since each of these values identifies exactly one company.
- There are no duplicate company records. The absence of duplicate records will increase the probability of correctly detecting the company.
- There are no unrelated records. Outdated or invalid records in the data catalog may cause the company to be detected incorrectly because of coincidental similarities between various field values.
- All fields are filled in for each company record. Specify as much accurate information about companies as possible. The more accurate the information, the higher the probability of correctly detecting the companies.
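The checks above can be automated before a catalog is uploaded. The following is a minimal sketch, assuming the catalog has been exported as a list of dictionaries. The field names (`tax_id`, `national_tax_id`, `iban`, and so on) and the `audit_catalog` helper are hypothetical and are not part of the product.

```python
from collections import Counter

# Hypothetical field names; adjust them to the columns of your data catalog.
ID_FIELDS = ("tax_id", "national_tax_id", "iban")
ALL_FIELDS = ("name", "street", "postal_code", "city") + ID_FIELDS


def audit_catalog(records):
    """Report records with empty fields and identifier values used more than once."""
    incomplete = []
    seen = Counter()
    for index, record in enumerate(records):
        # Collect the fields that are absent or blank in this record.
        missing = [f for f in ALL_FIELDS if not (record.get(f) or "").strip()]
        if missing:
            incomplete.append((index, missing))
        # Count each non-empty identifier value per field.
        for f in ID_FIELDS:
            value = (record.get(f) or "").strip()
            if value:
                seen[(f, value)] += 1
    duplicates = sorted(key for key, count in seen.items() if count > 1)
    return incomplete, duplicates
```

An identifier value reported in `duplicates` points to a likely duplicate company record, and every entry in `incomplete` names the fields that still need to be filled in.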
Company Detection Process
Company detection includes the following steps:
Step 1: Unique Identifier Search
The values of the following fields are considered to be unique company identifiers:
- Tax ID
- National Tax ID
- IBAN
Identifier values are normalized before they are compared:
- Letters are changed to upper case.
- Spaces and the following characters are removed: ".", ",", "—", "/", "****"
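The normalization of identifier values can be illustrated with a short sketch. This is not the product's implementation. The `normalize_identifier` function and the `REMOVED_CHARS` set are hypothetical, and the set contains only the characters that are legible in the list above.

```python
# Characters dropped during normalization. Only the characters legible in the
# list above are included (assumption); extend the set as needed.
REMOVED_CHARS = {" ", ".", ",", "\u2014", "/"}


def normalize_identifier(value: str) -> str:
    """Upper-case the value, then drop spaces and separator characters."""
    return "".join(ch for ch in value.upper() if ch not in REMOVED_CHARS)
```

With this sketch, `normalize_identifier("de 12.345/678")` and `normalize_identifier("DE12345678")` produce the same string, so differently formatted spellings of one identifier compare as equal.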
Step 2: Company Name and Address Search
The entire text detected on the document image is used to query the data catalog. Next, the Name, Street, Postal code, and City values received from the data catalog are matched against the values detected on the image (exact matching is used).
Step 3: Generating Hypotheses
Based on the companies found in steps 1 and 2, a set of hypotheses is generated. A Classify By Company activity evaluates these hypotheses and selects five document issuer and five document receiver company records that most reliably match the field values detected on the document image. These records are then used to form 25 pairs, with each pair treated as a separate hypothesis. A trained model then rates the hypotheses by reliability, selecting the best matching issuer–receiver pair.
Even if the number of document receiver companies is very small (for example, if there is only one document receiver company), using a Document Receiver Companies data catalog is still recommended, as it will prevent a document receiver company from being incorrectly detected as a document issuer company.
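The pairing in Step 3 can be sketched as follows: five issuer candidates combined with five receiver candidates give 5 × 5 = 25 hypotheses. The function names are hypothetical, and the `score` callable is only a stand-in for the trained model, whose internals are not described here.

```python
from itertools import product


def build_hypotheses(issuers, receivers, top_n=5):
    """Pair the top issuer candidates with the top receiver candidates.

    Both lists are assumed to be sorted from most to least reliable match.
    """
    return list(product(issuers[:top_n], receivers[:top_n]))


def best_pair(hypotheses, score):
    """Return the issuer-receiver pair that `score` rates highest.

    `score` is a placeholder for the trained model (assumption).
    """
    return max(hypotheses, key=score)
```

If the receiver catalog contains a single company, the same sketch yields five hypotheses, one per issuer candidate.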
Results of Detecting Document Issuer and Receiver Companies
Detecting the issuer and receiver companies on a document yields the following identifiers:
- The issuer company identifier in the Document Issuer Companies data catalog
- The receiver company identifier in the Document Receiver Companies data catalog
