Vantage can normalize extracted data to ensure uniform representation. The following data types can be normalized: dates, numbers, and money amounts.
To normalize data extracted from a field, specify its data type:
1. Open Field options. In the data form, click the field settings button next to the field.
2. Select a data type. In the Field options dialog, select the data type from the dropdown.
For normalization to work, set the data-type-specific properties for each field so that Vantage extracts everything that needs to be normalized. Click Advanced in the Field options dialog to access these properties. See Properties by data type.
The normalized value is shown when you hover over a field.

Normalize dates

When normalizing dates, Vantage converts extracted dates into ISO 8601 format:
  • YYYY-MM-DD for dates
  • HH:MM:SS for time
For accepted separators, see Data types.

Examples

| Extracted data | Normalized data |
| --- | --- |
| 15.06.2023 | 2023-06-15 |
| 2023/06/15 22:17 | 2023-06-15 22:17:00 |
| 06-15-2023 | 2023-06-15 |
| 02/11/2022 | 2022-02-11 or 2022-11-02 |
| Saturday, December 3rd, 2022 | 2022-12-03 |
| The second of May 2022 | 2022-05-02 |
If both Day-Month-Year and Month-Day-Year formats are enabled, Vantage may not be able to normalize the date unambiguously. In that case, you can choose between the two candidate dates.
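The ambiguity described above can be illustrated with a minimal Python sketch. It is not Vantage's implementation: the function name `candidate_dates`, its `day_first` and `month_first` flags, and the format lists are all assumptions made for illustration, and only purely numeric dates are handled.

```python
from datetime import datetime

# Hypothetical format lists; the formats Vantage actually accepts are
# described in Data types and are not reproduced here.
YEAR_FIRST = ["%Y/%m/%d", "%Y-%m-%d", "%Y.%m.%d"]
DAY_FIRST = ["%d.%m.%Y", "%d/%m/%Y", "%d-%m-%Y"]
MONTH_FIRST = ["%m.%d.%Y", "%m/%d/%Y", "%m-%d-%Y"]


def candidate_dates(text, day_first=True, month_first=True):
    """Return the sorted ISO 8601 dates that `text` could represent."""
    formats = list(YEAR_FIRST)
    if day_first:
        formats += DAY_FIRST
    if month_first:
        formats += MONTH_FIRST
    found = set()
    for fmt in formats:
        try:
            found.add(datetime.strptime(text.strip(), fmt).date().isoformat())
        except ValueError:
            continue  # this format does not match; try the next one
    return sorted(found)


print(candidate_dates("15.06.2023"))  # one candidate: 15 cannot be a month
print(candidate_dates("02/11/2022"))  # two candidates: both readings are valid
print(candidate_dates("02/11/2022", month_first=False))  # one candidate
```

When only one of the two orderings is enabled, or when one reading is not a valid calendar date, a single candidate remains and no choice is needed.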
Dates written out in words are normalized only when they’re in English and English is selected in the skill settings.
Vantage may not be able to normalize a date in the following cases:
  • The date is incomplete — for example, 4:39 am (time values are only normalized when extracted together with a date).
  • Adverbs of time are used instead of exact dates — for example, last month, a few days ago.
  • Extra words or characters appear next to the date or time — for example, 2016/06/15 22.
  • Uncommon date representations are used — for example, 14 Jumada Al-Awwal 1445.

Normalize numbers

Vantage can normalize numbers using Western or Indian digit grouping:
  • Western — Groups digits by threes from right to left, using commas to separate thousands, millions, and so on.
  • Indian — Groups the first three digits from the right, then by twos for tens of thousands, lakhs, tens of lakhs, crores, and so on.
Vantage parses the extracted string and converts it into a standardized format using a dot (.) to separate integer and fractional parts. For accepted separators, see Data types.

Examples

| Extracted data | Normalized data |
| --- | --- |
| 12,345,678 | 12345678 |
| -12,345.678 | -12345.678 |
| 12.0000 | 12 |
| 1.000 | 1000 or 1 |
| 12,345.678 % | 12345.678 |
| 1,23,45,67,890 (Indian numbering system) | 1234567890 |
| twenty-first | 21 |
If the part after the dot has three digits (as in 1.000), you need to choose between the two candidate values — whether the dot separates thousands or the integer from the fractional part.
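As a rough illustration of the grouping rules and the three-digit ambiguity, the following Python sketch returns every value a numeric string could stand for. It is a simplified model under stated assumptions, not Vantage's parser: the name `candidate_numbers`, the `grouping` parameter, and the regular expressions are hypothetical, commas are treated only as group separators, and numbers written in words are not handled.

```python
import re
from decimal import Decimal

# Assumed patterns for the integer part (illustrative only).
GROUPING = {
    "western": re.compile(r"\d{1,3}(?:,\d{3})*"),
    "indian": re.compile(r"\d{1,3}|\d{1,2}(?:,\d{2})*,\d{3}"),
}


def _plain(value):
    """Render a Decimal without an exponent or trailing fractional zeros."""
    return format(value.normalize(), "f")


def candidate_numbers(text, grouping="western"):
    """Return the plain-decimal values that `text` could represent."""
    text = text.strip().rstrip("%").strip()  # a trailing percent sign is dropped
    sign = "-" if text.startswith("-") else ""
    body = text[len(sign):]
    integer, dot, fraction = body.partition(".")
    if not GROUPING[grouping].fullmatch(integer):
        return []  # irregular grouping or extra characters
    if dot and not fraction.isdecimal():
        return []
    digits = integer.replace(",", "")
    candidates = [_plain(Decimal(f"{sign}{digits}{dot}{fraction}"))]
    # A lone dot followed by exactly three digits may also separate thousands.
    if dot and "," not in integer and len(fraction) == 3 and len(integer) <= 3:
        candidates.insert(0, _plain(Decimal(f"{sign}{digits}{fraction}")))
    return candidates


print(candidate_numbers("1.000"))        # two candidates
print(candidate_numbers("-12,345.678"))  # one candidate: commas already group thousands
print(candidate_numbers("1,23,45,67,890", grouping="indian"))
```

In this sketch, a dot is treated as ambiguous only when no comma is present, because a comma already used for grouping leaves the dot as the only possible decimal separator.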
Numbers written out in words are normalized only when they’re in English and English is selected in the skill settings.
Vantage may not be able to normalize a number in the following cases:
  • Extra words or characters appear next to the number — for example, EURO12,345.678 or 5 kilos.
  • The number of digits between separators is irregular, either between the integer and fractional parts or between thousands groups (for example, 123,456,7890). The fractional part must contain three or fewer digits. For example, 123,456,789 is normalized to 123456789, and 123,456,78 is normalized to 123456.78.
  • Irregular number representations are used.

Normalize money amounts

A money amount contains a number value and a currency symbol, with the symbol before or after the amount. When normalizing, Vantage outputs the currency symbol first, followed by the amount normalized as a number.
Currency is identified by symbol or name: for example, €, EURO, and euros all map to the euro. The normalized value uses the exact symbol or name found in the extracted text.

Examples

| Extracted data | Normalized data |
| --- | --- |
| 12,345.678 EURO | EURO 12345.678 |
| 12,345.678 ¥ | ¥ 12345.678 |
| 13,87E | E 13.87 |
| 13 euro 87 | euro 13.87 |
| fifty dollars | dollars 50 |
| ₹1,23,455 | ₹ 123455 |
Amounts written out in words are normalized only when they’re in English and English is selected in the skill settings.
Vantage may not be able to normalize a money amount when invalid words are used to denote a currency — for example, 12 ttt.
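The "currency first, then the normalized number" rule can be sketched in Python as follows. This is an illustration only: `normalize_money` and the `CURRENCIES` set are hypothetical (the real list of accepted currency symbols and names is not given here), and the sketch covers only the simple case of one currency token before or after a number with comma grouping and a dot as the decimal separator. Inputs such as 13,87E, 13 euro 87, or fifty dollars are outside its scope.

```python
import re

# Hypothetical set of accepted currency tokens (an assumption for this sketch).
CURRENCIES = {"€", "EURO", "euro", "euros", "¥", "₹", "$", "dollars"}

# Optional token, a number with comma grouping and optional fraction, optional token.
MONEY = re.compile(r"(\D*?)\s*(\d[\d,]*(?:\.\d+)?)\s*(\D*)")


def normalize_money(text):
    """Return 'CURRENCY AMOUNT', or None if the text cannot be normalized."""
    match = MONEY.fullmatch(text.strip())
    if not match:
        return None
    before, amount, after = match.groups()
    if bool(before) == bool(after):
        return None  # exactly one currency token is expected
    currency = (before or after).strip()
    if currency not in CURRENCIES:
        return None  # invalid currency word, such as "ttt"
    # Keep the token exactly as found; drop the grouping commas from the amount.
    return f"{currency} {amount.replace(',', '')}"


print(normalize_money("12,345.678 EURO"))  # currency moves to the front
print(normalize_money("₹1,23,455"))
print(normalize_money("12 ttt"))           # None: unknown currency word
```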
