- Recognition mode (Fast / Balanced / Normal / Accurate) determines the speed of recognition and the quality of the resulting text layer. To specify a recognition mode, in the Document Definition Editor, click Document Definition → Document Definition Properties… → Recognition.
- Recognition languages are the languages to be used for recognition. To specify recognition languages, in the Document Definition Editor, click Document Definition → Document Definition Properties… → Document Definition Settings, and then click Edit in the Countries and Languages group to select the required languages.
Note: Recognition languages in FlexiCapture for Invoices are tied to the country settings. When adding an invoice country to the Countries and Languages group, the corresponding languages will automatically appear in the Document Definition settings.
Invoice fields are extracted upon recognition.
To detect and capture fields on an invoice, the program can use:
- a FlexiLayout;
- neural networks.
Both methods are described below, together with the algorithm that either combines the results obtained by both methods or selects the best result.
Using a FlexiLayout
Business unit and vendor
The following may be used to determine the Vendor and Business Unit:
- Document Definition settings: IBAN, VATID, and NationalVATID formats, as well as the corresponding keywords;
- Data set record fields: IBAN, VATID, NationalVATID, Name, Street, City, ZIP.
Automatic company detection algorithm
The detail and quality of the information provided in the Data set columns has a significant impact on detection quality. To ensure that the search results are as accurate as possible, make sure that:
- The unique company identifiers are filled in.
  Filling in unique value columns (VATID, NationalVATID, IBAN) will significantly improve the probability of correct detection, since these values are unique to each company.
- There are no repeating company records.
  The absence of any repeating records will improve the probability of correctly detecting the company. For more information about eliminating duplicate records, see Eliminating duplicate records in the external database.
- There are no unrelated records.
  The presence of outdated or invalid records in the Data set may cause the company to be detected incorrectly because of coincidental similarities between various field values.
- All fields are filled in for every company record.
  Specify as much information about companies as possible. The more fields are filled in in the Data set, the higher the probability of correctly detecting the company.
- Multiple-value columns are used to store the same information that is denoted in different ways, and not different information altogether.
  For example, if a single company has several addresses, there must be a separate record for each of them, even if all other fields contain the same information. For more information, see Preparing vendor and business unit databases.
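The checklist above can be approximated with a small audit script run against an export of the data set. This is only a sketch: it assumes each record is a dictionary keyed by the column names listed earlier plus a hypothetical `Id` key, and the function name `audit_records` is illustrative rather than part of the product.

```python
from collections import Counter

# Column names taken from the data set record fields listed above.
UNIQUE_COLUMNS = ("VATID", "NationalVATID", "IBAN")
ALL_COLUMNS = UNIQUE_COLUMNS + ("Name", "Street", "City", "ZIP")


def audit_records(records):
    """Flag records that lack unique identifiers, have empty columns, or repeat."""
    report = {"no_unique_id": [], "incomplete": [], "duplicates": []}
    seen = Counter()
    for rec in records:
        rec_id = rec["Id"]  # hypothetical key holding the record identifier
        # No VATID, NationalVATID, or IBAN at all.
        if not any(rec.get(col) for col in UNIQUE_COLUMNS):
            report["no_unique_id"].append(rec_id)
        # At least one column is empty.
        if any(not rec.get(col) for col in ALL_COLUMNS):
            report["incomplete"].append(rec_id)
        # Records that are identical apart from case and outer spaces.
        key = tuple((rec.get(col) or "").strip().upper() for col in ALL_COLUMNS)
        seen[key] += 1
        if seen[key] > 1:
            report["duplicates"].append(rec_id)
    return report
```

The duplicate check here compares whole records only; a real clean-up would likely also need to catch near-duplicates, which this sketch does not attempt.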
- Unique identifier search
- VATID,
- NationalVATID,
- IBAN.
- letters are changed to upper case,
- spaces and the following characters are removed: “.”, “,”, “—”, “/”, “**”.
For example, the identifier “DE12345” may be recognized as “OE12345”. The detected prefix OE will then be replaced with the correct prefix DE.
The VATID, NationalVATID, and IBAN fields detected on a document image will be used to query the Data set. The VATID, NationalVATID, and IBAN column values received from the Data set are normalized in the same manner as the values detected on the image, after which they are matched (exact matching is used) to the normalized values of the fields detected on the image.
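The normalization and exact matching described above can be illustrated with a short sketch. The set of removed characters is an assumption based on the list above (space, period, comma, dash, slash), and the record layout with an `Id` key and the function names are hypothetical; this is not the program's actual implementation.

```python
# Assumed set of characters to strip: space, period, comma, hyphen, long dash, slash.
REMOVED_CHARS = " .,-/\u2014"
IDENTIFIER_COLUMNS = ("VATID", "NationalVATID", "IBAN")


def normalize_identifier(value):
    """Upper-case the value and drop spaces and separator characters."""
    return "".join(ch for ch in value.upper() if ch not in REMOVED_CHARS)


def find_by_identifier(detected, records):
    """Return (record Id, column) pairs whose normalized values match exactly."""
    matches = []
    for rec in records:
        for col in IDENTIFIER_COLUMNS:
            on_image = detected.get(col)
            in_data_set = rec.get(col)
            # Both sides are normalized the same way before comparison.
            if on_image and in_data_set and (
                normalize_identifier(on_image) == normalize_identifier(in_data_set)
            ):
                matches.append((rec["Id"], col))
    return matches
```

For instance, a VATID read from the image as "DE-123 45" and a data set value stored as "de12345" both normalize to "DE12345" and therefore match.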
- Company name and address search
Note: To get the best possible name and address search results, make sure that the corresponding Data set columns are filled in. Company name and address information is especially important in cases where the company cannot be identified using VATID, NationalVATID, or IBAN.
- Hypothesis formation
- Hypothesis filtering
Hypotheses are split into two groups based on how reliably the Data set record matches the field values on the document image:
- reliably matching the document image;
- unreliably matching the document image.
- true: filtering is enabled, and the final hypothesis will be selected exclusively from the reliable hypotheses (default value);
- false: filtering is disabled, and the final hypothesis will be selected from all hypotheses regardless of their reliability.
- When detecting vendors, no unreliable hypotheses for vendors will be considered. If there are no reliable hypotheses, a vendor will not be detected.
- When detecting business units:
  - if at least one reliable hypothesis has been found, no unreliable hypotheses will be considered;
  - if the set of hypotheses obtained during steps 1 through 3 does not contain at least one reliable hypothesis, the flag value will be ignored. The final hypothesis will be selected from the unreliable hypotheses.
- There are usually far fewer business unit records than vendor records. They also change far less frequently, which makes them easier to keep up to date. Therefore, detecting a reliable hypothesis increases the probability of the final hypothesis being correct. However, detecting a business unit is important even if no reliable hypotheses have been found, since the most important factor in the reliability of the detection result is the reliability evaluation of the vendor-BU pairs.
- There are usually many more vendor records, and the Data set contains more columns because vendors specify more information about their own company on their invoices than about the business unit. Records can also contain outdated information, meaning that unreliable hypothesis filtering will depend on both the quality of the Data set and the verification scenario type.
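The filtering rules for vendors and business units can be summarized in a few lines. This is a sketch only: hypotheses are modelled as hypothetical `(record_id, is_reliable)` pairs, and the `filtering_enabled` argument stands in for the true/false flag described above.

```python
def filter_hypotheses(hypotheses, company_type, filtering_enabled=True):
    """Return the hypotheses from which the final one may be selected.

    hypotheses: list of (record_id, is_reliable) pairs (illustrative model).
    company_type: "vendor" or "business_unit".
    """
    if not filtering_enabled:
        # Filtering disabled: every hypothesis stays in play.
        return list(hypotheses)
    reliable = [h for h in hypotheses if h[1]]
    if company_type == "vendor":
        # Vendors: unreliable hypotheses are never considered.
        return reliable
    if company_type == "business_unit":
        # Business units: fall back to unreliable hypotheses when
        # no reliable hypothesis exists (the flag is then ignored).
        return reliable if reliable else list(hypotheses)
    raise ValueError(f"unknown company type: {company_type}")
```

With filtering enabled, an invoice that yields only unreliable hypotheses therefore produces no vendor but can still produce a business unit.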
Results of detecting the vendor and business unit
The main results of detecting the vendor and business unit on the invoice are:
- the identifier of the vendor record in the Vendors data set
- the identifier of the business unit record in the BusinessUnits data set
- Name
- VatID
- NationalVatID
- IBAN
- Street
- Zip
- City
- Name
- VatID
- Street
- Zip
- City
How to change the way the program detects the vendor or business unit
The better a vendor or business unit record in the data set matches the text extracted from an invoice image, the more accurately the program detects the vendor or business unit.
First, you need to identify the data in the external database that corresponds to the data set columns used for finding the company on an invoice. The external database and the data set have to be properly connected (see Using vendor and business unit databases).
If one and the same company occurs both in the list of vendors and in the list of business units, you must specify the same VATID for the respective records in both data sets (even if there is no VATID on invoices). This will prevent the program from detecting the vendor and business unit incorrectly.
To compensate for possible variations in field values on images, use:
- normalization of data set columns (see Normalization of Values in data sets),
- multiple-value data set columns (see Multiple-value columns in a data set).
Using pre-determined vendor and business unit values in conjunction with extracted values
The vendor or the business unit of the invoice’s company can be determined in advance based on the invoice’s source (the name of the Scanning Operator or the e-mail address of the message’s sender). You can specify the vendor and/or the business unit explicitly prior to automatic detection. To do so, set the value of the document’s registration parameter fc_Predefined:InvoicePredefinedVendorId (for the vendor) or fc_Predefined:InvoicePredefinedBusinessUnitId (for the business unit) to the identifier (Id) of an entry in the Vendors or BusinessUnits data set, respectively. Doing this does not prevent automatic detection of the vendor and/or the business unit from taking place. As a result, in addition to the pre-determined vendor and/or business unit, you will get a confidence value (which indicates how well the pre-determined values match the values extracted from the image), as well as the regions of fields from the Vendor and/or Business Unit field groups.
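As an illustration of deriving the pre-determined values from the invoice’s source, the sketch below builds a dictionary of registration parameters from a sender’s e-mail address. Only the two parameter names come from the text above; the lookup table, the function name, and the idea of passing the result on as a plain dictionary are assumptions, and how the parameters are actually attached to a document depends on your import setup.

```python
VENDOR_PARAM = "fc_Predefined:InvoicePredefinedVendorId"
BUSINESS_UNIT_PARAM = "fc_Predefined:InvoicePredefinedBusinessUnitId"


def predefined_parameters(sender_email, vendor_by_sender, business_unit_id=None):
    """Build registration parameters for a document from its e-mail source.

    vendor_by_sender: hypothetical mapping of lower-case sender addresses
    to Ids of entries in the Vendors data set.
    """
    params = {}
    vendor_id = vendor_by_sender.get(sender_email.strip().lower())
    if vendor_id is not None:
        params[VENDOR_PARAM] = vendor_id
    if business_unit_id is not None:
        params[BUSINESS_UNIT_PARAM] = business_unit_id
    return params
```

An unknown sender simply yields no vendor parameter, so automatic detection proceeds as usual for that document.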
Invoice Header field group
InvoiceNumber, InvoiceDate
An invoice’s header includes, among others, the InvoiceNumber and InvoiceDate fields.
These fields are detected using keywords that are specified in the language properties of the Document Definition. The vendor and the business unit are detected first, providing information about the countries of the vendor and business unit. The countries determine languages (languages that correspond to a country are specified in the Document Definition). The set of keywords for finding fields is taken from the countries of the vendor and the business unit.
You can change the way the program looks for regions of fields by editing keywords (see Keywords) and by using training (see Training).
How does the program determine that a document is an invoice?
FlexiCapture determines whether a document is an invoice when applying the FlexiLayout.
The conditions listed below indicate that a document is an invoice. Not all of these conditions have to be met, but each one carries a certain weight.
- InvoiceNumber and InvoiceDate fields were detected.
- Keywords from the InvoiceIdentifiers located element were detected (see Keywords).
- A vendor or a business unit was detected on the document.
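The idea of weighted conditions can be sketched as a simple score. The weights and the threshold below are invented purely for illustration; the text above does not state the actual values or how the program combines them.

```python
# Hypothetical weights and threshold, chosen only to illustrate the idea.
WEIGHTS = {
    "number_and_date": 40,   # InvoiceNumber and InvoiceDate were detected
    "invoice_keywords": 30,  # keywords from InvoiceIdentifiers were detected
    "company_detected": 30,  # a vendor or a business unit was detected
}
THRESHOLD = 50


def invoice_score(conditions):
    """Sum the weights of the conditions that are met."""
    return sum(weight for name, weight in WEIGHTS.items() if conditions.get(name))


def looks_like_invoice(conditions):
    """Not every condition has to hold; the combined weight decides."""
    return invoice_score(conditions) >= THRESHOLD
```

Under these assumed numbers, a document with a detected number and date but nothing else would not be classified as an invoice, while adding detected keywords would tip it over the threshold.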
Amounts field group
FlexiCapture for Invoices captures the following fields from an invoice:
| Field | Invoice Processing (Au-NZ), Invoice Processing (US), Invoice Processing (CA), Invoice Processing (EU), Invoice Processing (JP) | Invoice Processing (ES) |
|---|---|---|
| The total sum of the invoice (Total) and the currency of the invoice (Currency) | Yes | Yes |
| Taxes: | Yes | Yes |
| | No | Yes |
| Additional tax (AdditionalCosts) | Yes | Yes |
Information from the Document Definition is used to find sums and tax rates:
- Rates of taxes payable in the vendor’s country (you can specify these on the Tax Rates tab of the country’s properties, see Country and language settings).
- Keywords for tax rates (you can specify these on the Keywords tab of the language’s properties, see also Keywords).
- AmountTotalHighConfidenceLabels: keywords that only occur near the Total field, such as “Pay this amount.”
- AmountTotalLowConfidenceLabels: keywords that can occur near the Total field but can also occur near other fields. For example, the keyword “Total” can appear near the Total field but may also occur near a field that contains the total weight of all items on an invoice.
- Numbers that occur two or three times in the same line or in the same column on the image. Such numbers may be the Total on invoices where no taxes are specified.
- Numbers that are sums of the numbers located above them in the same column.
- The largest (by absolute value) numbers located at the end of the document.
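The three numeric heuristics in the list above can be sketched as follows. The sketch assumes the numbers of one column have already been read from the image, top to bottom, as `Decimal` values; the function names are illustrative and the real search is certainly more involved.

```python
from collections import Counter
from decimal import Decimal


def repeated_numbers(values):
    """Numbers that occur two or three times in one line or column."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n in (2, 3)]


def sum_candidates(column):
    """Indexes of numbers equal to the sum of two or more numbers directly above."""
    found = []
    for i, value in enumerate(column):
        running = Decimal("0")
        # Walk upwards from the number just above position i.
        for j in range(i - 1, -1, -1):
            running += column[j]
            if i - j >= 2 and running == value:
                found.append(i)
                break
    return found


def largest_by_absolute_value(numbers):
    """The number with the greatest absolute value, e.g. among those at the end of the document."""
    return max(numbers, key=abs)
```

For a column containing 100.00, 19.00, and 119.00, only the last number is reported by `sum_candidates`, since it equals the sum of the two numbers above it.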
Purchase Order field group
FlexiCapture for Invoices can extract all purchase order numbers and their corresponding sums from the invoice.
This feature is disabled by default (see Purchase order matching).
To extract Purchase Order numbers, you will need a data set with a list of possible Purchase Order numbers and their sums (see PurchaseOrders data set).
The Purchase Order field can be extracted using:
- a regular expression;
- a data set containing possible purchase order numbers (see PurchaseOrders data set).
For more on XML configuration files, see Editing invoice processing settings in XML files.
- Use the VendorId column of the data set. In this case the program will only use Purchase Order numbers from the invoice’s vendor.
- Filter out purchase orders for which an invoice has already been received and only add the numbers of purchase orders for which no invoice has been received yet to the data set.
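The combination of a regular expression with a vendor-filtered data set might look like the sketch below. The number format `PO-` followed by six digits, the `OrderNumber` and `Invoiced` keys, and the function name are all assumptions made for the example; only the VendorId column is named in the text above.

```python
import re

# Hypothetical purchase order format: "PO-" (hyphen optional) plus six digits.
PO_PATTERN = re.compile(r"\bPO-?\d{6}\b")


def extract_purchase_orders(text, purchase_orders, vendor_id):
    """Return order numbers found in the text that are valid for this vendor.

    purchase_orders: rows of a PurchaseOrders-like data set, each a dict with
    "OrderNumber", "VendorId", and an optional "Invoiced" flag (assumed keys).
    """
    allowed = {
        row["OrderNumber"]
        for row in purchase_orders
        # Keep only this vendor's orders that have no invoice yet.
        if row["VendorId"] == vendor_id and not row.get("Invoiced", False)
    }
    found = []
    for match in PO_PATTERN.finditer(text):
        number = match.group(0)
        if number in allowed and number not in found:
            found.append(number)
    return found
```

Restricting the candidates to the detected vendor’s open orders keeps coincidental matches, such as another vendor’s order number quoted in a footer, out of the result.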
The Line Items field group
FlexiCapture for Invoices can extract invoice line items from images.
Extraction of invoice line items is disabled by default (see Additional fields).
For a list of fields which the program extracts automatically, see Captured fields.
FlexiCapture for Invoices first searches the image for a table. During this search, it uses the keywords for column titles which are specified for every language in the Document Definition’s properties. Keywords for columns of invoice line items are also used for classifying items, i.e. for determining the type of each invoice line item column.
After this, the program uses information about detected columns and mathematical expressions to find invoice line items in the invoice’s table.
Finally, the program searches invoice line items for fields from columns.
Training can be used to improve the quality of automatic line item extraction.
Using neural networks
One of the main advantages offered by neural networks is their ability to self-learn: neural networks can detect complex dependencies existing among input data and make some useful generalizations. The program includes two neural networks that can be used to capture the following fields:
- InvoiceNumber
- InvoiceDate
- Total
- Vendor \ Name
- Vendor \ Address
- Business Unit \ Name
- Business Unit \ Address
- Purchase Orders \ Order Number
- LineItems:
  - OrderNumber
  - OrderDate
  - Position
  - ArticleNumber
  - Description
  - Quantity
  - Unit of measurement
  - Unit Price
  - Total Price Netto
  - VATPercentage
Disabling the neural networks
By default, the neural networks will be used as the second method of capturing document fields. If you need to process documents other than invoices within your invoice project, you may want to disable the neural networks, as they were specifically trained to capture invoice fields and may not perform well on other types of documents.
To disable the neural network for the Line Items group:
- Open the Document Definition Editor.
- Click Document Definition Properties… → Document Definition Settings → Additional Fields and Features.
- Disable the Thorough extraction of invoice line items option.
To disable the neural network for the invoice header fields:
- Open the Document Definition Editor.
- Click Document Definition Properties… → Document Definition Settings → Additional Fields and Features.
- Disable the Thorough extraction of invoice header fields option.
Combining the field detection results
How the program combines the field detection results or selects the best result depends on the field. As a general rule, precedence will be given to the results obtained by the respective neural network. Exceptions to this rule are searches based on data sets and searches using regular expressions created for specific customer documents.
Invoice Header field group
The results obtained by the neural network will always have precedence for the following fields:
- Invoice Number
- Invoice Date
- Total
- Name
- VATID (ABN)
- Address
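The general precedence rule stated at the start of this section can be sketched as a small selection function. It models each result as a hypothetical dictionary with `value` and `source` keys and interprets the stated exceptions as cases where the FlexiLayout result wins; both the data model and that interpretation are assumptions, and per-field rules are not covered.

```python
# Sources for which the FlexiLayout result is assumed to keep precedence.
PRIORITY_SOURCES = {"data_set", "custom_regex"}


def select_result(flexilayout, neural):
    """Choose between a FlexiLayout result and a neural network result.

    Each argument is None or a dict such as
    {"value": "INV-42", "source": "keywords"} (illustrative structure).
    """
    if flexilayout is None:
        return neural
    if neural is None:
        return flexilayout
    if flexilayout["source"] in PRIORITY_SOURCES:
        # Exception: data set searches and customer-specific regular expressions.
        return flexilayout
    # General rule: the neural network result takes precedence.
    return neural
```

When only one method returns a result, that result is used as-is, which is also an assumption of this sketch.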
