How does normalization work during data extraction in the program?
FCFORINVOICES offers two types of normalization for values from the data set.
1. Text
This type of normalization is useful when comparing strings such as company names and addresses.
- White space (this includes newline and tab characters) and separator symbols are replaced with regular spaces.
- Periods used as separators (periods that are placed between words) are replaced with spaces and periods in abbreviations are removed.
- Conjunction symbols (&, +, -, /, ~) are normalized as follows:
  - Sequences of words that begin with a single-letter word and are separated by the same conjunction symbol are joined into a single word, e.g. R & D becomes R&D.
  - In all other cases, conjunction symbols are replaced with spaces, e.g. Procter&Gamble becomes Procter Gamble.
- Repeated spaces are collapsed into a single space.
- Words are split according to a predefined list. For example, CoKG is split into Co KG.
- The recognized text is split into separate words at spaces.
- Suffixes in each word are replaced according to a predefined list. For example, you can replace the suffix strasse with the suffix str.
- Sequences of words are replaced automatically according to a predefined list. For example, you can replace the word Limited with the abbreviation Ltd.
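The text normalization steps above can be approximated with the following Python sketch. It is a non-authoritative illustration, not the program's actual algorithm: the separator symbols, the word-split, suffix and word-sequence lists, and the heuristic used to tell separator periods from abbreviation periods are all assumptions chosen for the example.

```python
import re

# Hypothetical lists for illustration only; the program's real lists and
# separator set are defined in its own settings and may differ.
SEPARATOR_RE = re.compile(r"[\s,;:|]")      # whitespace + assumed separator symbols
SPLIT_WORDS = {"CoKG": ["Co", "KG"]}        # predefined word splits
SUFFIXES = {"strasse": "str"}               # predefined suffix replacements
PHRASES = {("Limited",): ("Ltd",)}          # predefined word-sequence replacements

# Alternative 1: a single-letter word followed by words joined by the SAME
# conjunction symbol (captured in group 1, repeated via \1), e.g. "R & D".
# Alternative 2: any other conjunction symbol.
CONJ_RE = re.compile(r"\b\w\s*([&+/~-])\s*\w+(?:\s*\1\s*\w+)*|[&+/~-]")


def _normalize_conjunction(m: re.Match) -> str:
    if m.group(1):                                # joined group: drop the spaces
        return re.sub(r"\s+", "", m.group(0))     # "R & D" -> "R&D"
    return " "                                    # "Procter&Gamble" -> "Procter Gamble"


def normalize_text(value: str) -> list[str]:
    # Whitespace (incl. newlines and tabs) and separator symbols -> spaces.
    s = SEPARATOR_RE.sub(" ", value)
    # Assumed heuristic: a period between two words of 2+ characters is a
    # separator; every other period (e.g. in U.S.A. or Co.) is dropped.
    s = re.sub(r"(?<=\w\w)\.(?=\w\w)", " ", s)
    s = s.replace(".", "")
    # Conjunction symbols.
    s = CONJ_RE.sub(_normalize_conjunction, s)
    # Split on spaces (repeated spaces collapse), then apply predefined splits.
    words: list[str] = []
    for w in s.split():
        words.extend(SPLIT_WORDS.get(w, [w]))
    # Replace suffixes in each word (case-insensitive match on the suffix).
    for i, w in enumerate(words):
        for suffix, repl in SUFFIXES.items():
            if w.lower().endswith(suffix):
                words[i] = w[: -len(suffix)] + repl
                break
    # Replace predefined word sequences, longest match first.
    max_len = max((len(k) for k in PHRASES), default=0)
    out: list[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i : i + n])
            if key in PHRASES:
                out.extend(PHRASES[key])
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return out
```

For example, `" ".join(normalize_text("R & D Procter&Gamble Limited"))` returns `"R&D Procter Gamble Ltd"`. The function returns a list of words because the comparison described above works on separate words rather than on one string.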
Normalization settings are stored in the Normalization.xml file, which can be modified separately for each Dataset after the Dataset has been created. To modify the standard normalization settings, do the following:
- Download the settings file using the DownloadNormalizationSettings FCAdminTools command.
- Make the appropriate changes.
- Upload the settings file using the UpdateNormalizationSettings FCAdminTools command.
Significant changes may be made to the normalization algorithm in future versions of the program.
2. Alphanumeric code
This normalization type is useful when comparing alphanumeric codes such as tax ID numbers, bank account numbers and postal codes. All characters except digits and letters are removed from values, so values can be compared while ignoring spaces, dashes, slashes and any other characters they may contain. When this normalization is applied, the Store normalized value option becomes available when mapping a data set column to a column in an external database:
- When this option is enabled, normalized values will be stored in the data set.
- When this option is disabled, original values from the external database will be copied to the data set.
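A minimal Python sketch of alphanumeric-code normalization and of the effect of the Store normalized value option is shown below. The function names and the `store_normalized` flag are hypothetical. Letter case is left unchanged because the description above does not say whether the program folds case, and `str.isalnum()` also keeps non-Latin letters and digits, which may not match the program's exact definition of "letters".

```python
def normalize_code(value: str) -> str:
    # Keep only letters and digits; spaces, dashes, slashes, etc. are dropped.
    return "".join(ch for ch in value if ch.isalnum())


def codes_match(a: str, b: str) -> bool:
    # Compare two codes (tax IDs, bank accounts, postal codes) after normalization.
    return normalize_code(a) == normalize_code(b)


def value_for_data_set(external_value: str, store_normalized: bool) -> str:
    # Hypothetical model of the "Store normalized value" mapping option:
    # enabled -> store the normalized value; disabled -> copy the original.
    return normalize_code(external_value) if store_normalized else external_value
```

With this sketch, `codes_match("DE 123/456-789", "DE123456789")` returns `True`, while `value_for_data_set("AB-12 34", False)` keeps the original `"AB-12 34"`.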
