Vantage can normalize extracted data so that it is represented uniformly. The following data types can be normalized: dates, numbers, and amounts of money. To normalize data extracted from a field, you must specify its data type. To do this:
  1. In the data form, click the button next to the field.
  2. In the Field options dialog box, select the desired data type from the drop-down list.
For normalization to work, make sure you select the correct additional properties for each field so that all the data to be normalized is extracted. You can modify the additional properties of a field by clicking the Advanced button in the Field options dialog box. For more information, see Additional properties of the text field. To see a normalized value, hover the mouse pointer over its field.

Normalizing dates

Extracted dates can contain a date and time in a variety of formats. The following characters can be used as separators: the dot (.), the space ( ), the hyphen (-), the backslash (\), and the forward slash (/). When normalizing dates, Vantage converts them to the standardized ISO 8601 format:
  • YYYY-MM-DD for dates: year followed by month followed by day.
  • HH:MM:SS for time: hours followed by minutes followed by seconds.
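To make the two target forms concrete, here is a minimal Python sketch (not Vantage's own code) that renders already-parsed date and time components with the standard `datetime` module. The helper names `iso_date` and `iso_date_time` are hypothetical, and defaulting missing seconds to 00 is an assumption chosen to match how 2023/06/15 22:17 is normalized to 2023-06-15 22:17:00.

```python
from datetime import date, datetime


def iso_date(year: int, month: int, day: int) -> str:
    """Return YYYY-MM-DD; date() raises ValueError for impossible dates."""
    return date(year, month, day).isoformat()


def iso_date_time(year: int, month: int, day: int,
                  hour: int, minute: int, second: int = 0) -> str:
    """Return 'YYYY-MM-DD HH:MM:SS'; missing seconds default to 00 (assumption)."""
    return datetime(year, month, day, hour, minute, second).strftime("%Y-%m-%d %H:%M:%S")


print(iso_date(2023, 6, 15))               # 2023-06-15
print(iso_date_time(2023, 6, 15, 22, 17))  # 2023-06-15 22:17:00
```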

Examples of normalized dates

Extracted data | Normalized data
---------------|----------------
15.06.2023 | 2023-06-15
2023/06/15 22:17 | 2023-06-15 22:17:00
06-15-2023 | 2023-06-15
02/11/2022 | 2022-02-11 or 2022-11-02
Saturday, December 3rd, 2022 | 2022-12-03
The second of May 2022 | 2022-05-02

Note: If both the Day-Month-Year and Month-Day-Year date formats are enabled for the date, Vantage may not be able to normalize a date such as 02/11/2022 unambiguously. If this is the case, you will be presented with a choice between the two possible dates.
Note: Dates written out in words will be normalized only if they are written in English and English is selected in the skill settings.
Vantage may not be able to normalize dates for any of the following reasons:
  • A date is incomplete, for example: “4:39 am” (time values will only be normalized if extracted together with their dates).
  • Adverbs of time are used instead of exact dates, for example: “last month”, “a few days ago”.
  • Extra words or characters appear next to the date or time, for example: “2016/06/15 22”.
  • Uncommon date representations are used, for example: “14 Jumada Al-Awwal 1445”.
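The separators, the Day-Month-Year/Month-Day-Year ambiguity, and the rejection cases above can be illustrated with a rough Python sketch. It is an assumption-laden approximation, not Vantage's parser: it handles numeric dates only (not dates written in words), reads year-first input as year-month-day, and uses a hypothetical `orders` parameter to stand in for the date formats enabled for a field. It returns every valid reading, so an ambiguous date yields two candidates and a rejected one yields an empty list.

```python
import re
from datetime import datetime

# Separators listed above: dot, hyphen, forward slash, backslash, space.
SEP = r"[.\-/\\ ]"
# Either year-first (YYYY, month, day) or year-last (two 1-2 digit parts, then YYYY).
DATE = (
    r"(?:(?P<y1>[0-9]{4})" + SEP + r"(?P<m1>[0-9]{1,2})" + SEP + r"(?P<d1>[0-9]{1,2})"
    + r"|(?P<a>[0-9]{1,2})" + SEP + r"(?P<b>[0-9]{1,2})" + SEP + r"(?P<y2>[0-9]{4}))"
)
# Optional time after the date: HH:MM or HH:MM:SS.
TIME = r"(?:\s+(?P<H>[0-9]{1,2}):(?P<M>[0-9]{2})(?::(?P<S>[0-9]{2}))?)?"
PATTERN = re.compile(DATE + TIME)


def normalize_date(text: str, orders: tuple = ("DMY", "MDY")) -> list[str]:
    """Return every valid ISO 8601 reading of a numeric date (empty if none)."""
    m = PATTERN.fullmatch(text.strip())
    if m is None:
        return []  # time only, extra characters, words such as "last month"
    if m["y1"]:
        candidates = [(int(m["y1"]), int(m["m1"]), int(m["d1"]))]
    else:
        a, b, year = int(m["a"]), int(m["b"]), int(m["y2"])
        candidates = []
        if "DMY" in orders:
            candidates.append((year, b, a))  # day first
        if "MDY" in orders:
            candidates.append((year, a, b))  # month first
    has_time = m["H"] is not None
    fmt = "%Y-%m-%d %H:%M:%S" if has_time else "%Y-%m-%d"
    results = set()
    for year, month, day in candidates:
        try:
            value = datetime(year, month, day,
                             int(m["H"] or 0), int(m["M"] or 0), int(m["S"] or 0))
        except ValueError:
            continue  # impossible reading, e.g. month 15
        results.add(value.strftime(fmt))
    return sorted(results)


print(normalize_date("15.06.2023"))     # ['2023-06-15']
print(normalize_date("02/11/2022"))     # ['2022-02-11', '2022-11-02']
print(normalize_date("2016/06/15 22"))  # []
```

Requiring the whole string to match (`fullmatch`) is what rejects inputs with extra characters, such as 2016/06/15 22, and time values that arrive without a date.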

Normalizing numbers

Extracted numbers can contain digits, separators, and the percent sign (%). The following characters can be used as decimal separators: the dot (.), the comma (,), the hyphen (-), the equal sign (=), and the space ( ). The following characters can be used as digit-group separators (for thousands, millions, and so on): the dot (.), the comma (,), the single quotation mark (’), and the space ( ).

Numbers written in either the Western or the Indian digit grouping system can be normalized. The Western system groups digits in threes from right to left, using commas to separate thousands, millions, and so on (for example, 1,234,567,890). The Indian system also uses commas and also groups the first three digits from the right, but then groups the remaining digits in pairs: thousands, lakhs, crores, and so on (for example, 1,23,45,67,890).

When normalizing numbers in either system, Vantage parses the extracted string of numerical data and converts it to a standardized format, with the dot (.) separating the integer and fractional parts, as shown in the table below.
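The Western and Indian grouping rules can be checked with regular expressions before the separators are stripped. The Python sketch below is a simplified illustration, assuming the comma is the only grouping separator and the dot is always the decimal separator (so it does not model the 1.000 ambiguity); `normalize_number` and the pattern names are hypothetical. Trailing zeros in the fraction are dropped to match the 12.0000 → 12 example.

```python
import re
from typing import Optional

# Western grouping: 1-3 leading digits, then groups of exactly three (1,234,567).
WESTERN = re.compile(r"[0-9]{1,3}(?:,[0-9]{3})*")
# Indian grouping: a final group of three, preceded by groups of two (1,23,45,678).
INDIAN = re.compile(r"[0-9]{1,2}(?:,[0-9]{2})*,[0-9]{3}|[0-9]{1,3}")
# No grouping separators at all.
PLAIN = re.compile(r"[0-9]+")


def normalize_number(text: str) -> Optional[str]:
    """Normalize a comma-grouped number with an optional dot fraction."""
    s = text.strip()
    if s.endswith("%"):
        s = s[:-1].rstrip()  # "12,345.678 %" -> "12,345.678"
    sign = ""
    if s.startswith("-"):
        sign, s = "-", s[1:]
    integer, dot, fraction = s.partition(".")
    if not any(p.fullmatch(integer) for p in (WESTERN, INDIAN, PLAIN)):
        return None  # irregular grouping or extra characters
    if dot and not PLAIN.fullmatch(fraction):
        return None  # fraction must be digits only
    fraction = fraction.rstrip("0")  # "12.0000" -> "12"
    return sign + integer.replace(",", "") + ("." + fraction if fraction else "")


print(normalize_number("1,23,45,67,890"))  # 1234567890
print(normalize_number("-12,345.678"))     # -12345.678
```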

Examples of normalized numbers

Extracted data | Normalized data
---------------|----------------
12,345,678 | 12345678
-12,345.678 | -12345.678
12.0000 | 12
1.000 | 1000 or 1
12,345.678 % | 12345.678
1,23,45,67,890 (Indian numbering system) | 1234567890
twenty-first | 21

Tip: If exactly three digits follow the dot, as in 1.000, you will need to choose one of the two possible values, depending on whether the dot separates thousands or separates the integer part from the fractional part.
Note: Numbers written out in words will be normalized only if they are written in English and English is selected in the skill settings.
Vantage may not be able to normalize numbers for any of the following reasons:
  • Extra words or characters appear next to the number, for example: “EURO12,345.678” or “5 kilos”.
  • The number of digits between separators is irregular, for example: “123,456,7890”. In such numbers, the fractional part must contain three or fewer digits. For example, “123,456,789” is normalized to “123456789”, and “123,456,78” is normalized to “123456.78”.
  • Irregular number representations are used.
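The tip and the rules above amount to deciding what role a separator plays from the number of digits after it. A minimal Python sketch, assuming the number uses a single separator character (comma or dot) and Western grouping only; `candidate_values` and `_join` are hypothetical names. It returns two values in the ambiguous case, mirroring the choice Vantage offers, and an empty list when the grouping is irregular.

```python
import re


def _join(integer: str, fraction: str = "") -> str:
    """Build the normalized form: dot before the fraction, trailing zeros dropped."""
    fraction = fraction.rstrip("0")
    return integer + ("." + fraction if fraction else "")


def candidate_values(text: str) -> list[str]:
    """Return every plausible normalized value of a single-separator number."""
    s = text.strip()
    if len(set(re.findall(r"[.,]", s))) > 1:
        return []  # mixed ',' and '.' are outside this sketch
    groups = re.split(r"[.,]", s)
    if not all(re.fullmatch(r"[0-9]+", g) for g in groups):
        return []  # extra words or characters, e.g. "5 kilos"
    if len(groups) == 1:
        return [_join(groups[0])]
    head, tail = groups[:-1], groups[-1]
    if len(groups) == 2:
        as_decimal = _join(head[0], tail)
        if len(tail) == 3 and len(head[0]) <= 3:
            # "1.000": thousands separator (1000) or decimal point (1)
            return [_join(head[0] + tail), as_decimal]
        return [as_decimal]
    # Three or more groups: every inner group must have exactly three digits.
    if len(head[0]) > 3 or any(len(g) != 3 for g in head[1:]):
        return []
    if len(tail) == 3:
        return [_join("".join(groups))]      # "123,456,789" -> 123456789
    if len(tail) <= 2:
        return [_join("".join(head), tail)]  # "123,456,78" -> 123456.78
    return []                                # "123,456,7890": irregular


print(candidate_values("1.000"))       # ['1000', '1']
print(candidate_values("123,456,78"))  # ['123456.78']
```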

Normalizing money amounts

An amount of money can contain both a number and a currency symbol, which may be placed either before or after the amount. When normalizing amounts of money, Vantage parses the extracted string, including any currency symbols, decimal separators, and digit-group separators, and converts it into a standardized format: the currency symbol comes first, followed by the amount normalized as a number. This keeps currency symbols and decimal separators uniform. Vantage can identify the currencies of different countries even when they are denoted in different ways (for example, euros can be represented by E, €, or euros). The currency in the normalized value exactly matches the currency symbol or name in the extracted text.
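The currency-first format can be sketched in Python as follows. Everything here is an assumption for illustration: `KNOWN_CURRENCIES` is a hypothetical, deliberately small set of spellings, the amount rule is simplified (with a dot present, commas are treated as grouping; otherwise a final comma followed by one or two digits is treated as the decimal separator), and amounts written in words or split around the currency, such as 13 euro 87, are not handled. The currency token is kept exactly as extracted, as described above.

```python
import re
from typing import Optional

# Hypothetical, deliberately small set of accepted currency spellings (upper case).
KNOWN_CURRENCIES = {"€", "E", "EUR", "EURO", "EUROS", "$", "USD", "DOLLARS", "¥", "₹"}

# Optional currency token before the number, the number, optional token after it.
MONEY = re.compile(
    r"\s*(?P<pre>[^\d\s,.\-]+)?\s*(?P<num>[0-9][0-9.,]*)\s*(?P<post>[^\d\s,.\-]+)?\s*"
)


def normalize_amount(num: str) -> str:
    """Simplified rule: with a dot present, commas group digits; otherwise a
    final comma followed by one or two digits is the decimal separator."""
    if "." not in num:
        head, sep, tail = num.rpartition(",")
        if sep and 1 <= len(tail) <= 2:
            return head.replace(",", "") + "." + tail
    return num.replace(",", "")


def normalize_money(text: str) -> Optional[str]:
    m = MONEY.fullmatch(text)
    if m is None:
        return None
    pre, post = m["pre"], m["post"]
    if (pre is None) == (post is None):
        return None  # need exactly one currency token, before or after
    currency = pre or post
    if currency.upper() not in KNOWN_CURRENCIES:
        return None  # unrecognized word, e.g. "ttt"
    # Currency first, kept exactly as extracted, then the normalized amount.
    return f"{currency} {normalize_amount(m['num'])}"


print(normalize_money("13,87E"))     # E 13.87
print(normalize_money("₹1,23,455"))  # ₹ 123455
```

Keeping an explicit set of accepted spellings is what lets an unrecognized word be rejected rather than treated as a currency.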

Examples of normalized amounts of money

Extracted data | Normalized data
---------------|----------------
12,345.678 EURO | EURO 12345.678
12,345.678 ¥ | ¥ 12345.678
13,87E | E 13.87
13 euro 87 | euro 13.87
fifty dollars | dollars 50
₹1,23,455 | ₹ 123455
Note: Amounts written out in words will be normalized only if written in English and if English is selected in the skill settings.
Vantage may not be able to normalize amounts of money if invalid words are used to denote a currency, for example: “12 ttt”.

See also

Text field