Describing the search text
Click the Character String tab in the Properties dialog box to describe the corresponding object. To open the Properties dialog box, right-click the element in the FlexiLayout tree and select Properties… from the shortcut menu.

Describing search text by means of a regular expression
A regular expression defines possible combinations of characters. If you use a regular expression, the hypothesis must meet its conditions. This method is usually used on good-quality documents that are recognized without errors. To enter a regular expression, select the Regular expression option and enter the expression in the field next to it. You can also select symbols from a list; the table below gives the name of each symbol in the list and how it appears in the field.
Regular expression alphabet
Name in the list | Symbol in the field | Example |
Any character | * | "k"*"t" – allows 'kit', 'kat', etc. |
Letter | C | C"at" – allows 'cat', 'bat', 'Rat', 'mat', etc. |
Upper case letter | A | A"at" – allows 'Cat', 'Bat', 'Rat', 'Mat', etc. |
Lower case letter | a | a"at" – allows 'cat', 'bat', 'rat', 'mat', etc. |
Letter or digit | X | X – allows any single letter or digit. |
Digit | N | N"th" – allows '5th', '4th', '6th', etc. |
String | "" | "cat" – allows only 'cat'. |
Or | | | "dr"("i"|"u")"nk" – allows 'drink' or 'drunk'. |
Character from the set | [] | [hm]"at" – allows 'hat' or 'mat'. |
Character not from the set | [^] | [^b]"at" – allows 'cat', 'mat', 'rat', but does not allow 'bat'. |
Any number of repeats (applies to the expression or sub-expression to the left) | {-} | [AB74]{-} – allows any combination of A, B, 7, 4 of any length. |
Number of repeats is n | {n} | N{2}"th" – allows '25th', '84th', '11th', etc. |
n to m repeats | {n-m} | N{1-3}"th" – allows '5th', '84th', '111th', etc. |
0 to n repeats | {-n} | N{-2}"th" – allows 'th', '84th', '4th', etc. |
n or more repeats | {n-} | N{2-}"th" – allows '25th', '834th', '311th', '34576th', etc. |
Subexpression | () |
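To make the notation in the table easier to experiment with, here is a minimal Python sketch that translates an expression written in this alphabet into a pattern for Python's standard `re` module. It is not the program's own matcher. The function names `to_python_regex` and `matches` are hypothetical, the character classes are assumed to be ASCII-only (the program's real sets may depend on the recognition language), and a quoted string is assumed to act as a single unit when a repeat follows it.

```python
import re

# Assumed ASCII-only equivalents of the single-symbol classes in the table.
CLASSES = {
    "*": ".",              # any character
    "C": "[A-Za-z]",       # letter
    "A": "[A-Z]",          # upper case letter
    "a": "[a-z]",          # lower case letter
    "X": "[A-Za-z0-9]",    # letter or digit
    "N": "[0-9]",          # digit
}

def to_python_regex(expr: str) -> str:
    """Translate an expression in the table's notation into a Python regex."""
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == '"':                      # "string": literal text
            end = expr.index('"', i + 1)
            out.append("(?:" + re.escape(expr[i + 1:end]) + ")")
            i = end + 1
        elif ch == "[":                    # [set] or [^set]
            end = expr.index("]", i + 1)
            body = expr[i + 1:end]
            prefix = ""
            if body.startswith("^"):
                prefix, body = "^", body[1:]
            out.append("[" + prefix + re.escape(body) + "]")
            i = end + 1
        elif ch == "{":                    # {n}, {n-m}, {-n}, {n-}, {-}
            end = expr.index("}", i + 1)
            low, dash, high = expr[i + 1:end].partition("-")
            if dash:
                out.append("{" + (low or "0") + "," + high + "}")
            else:
                out.append("{" + low + "}")
            i = end + 1
        elif ch in CLASSES:
            out.append(CLASSES[ch])
            i += 1
        elif ch in "()|":                  # sub-expression and Or map directly
            out.append(ch)
            i += 1
        elif ch.isspace():
            i += 1
        else:
            raise ValueError(f"unexpected symbol {ch!r} at position {i}")
    return "".join(out)

def matches(expr: str, text: str) -> bool:
    """Return True if the whole text satisfies the expression."""
    return re.fullmatch(to_python_regex(expr), text) is not None

print(matches('"dr"("i"|"u")"nk"', "drunk"))   # True
print(matches('[^b]"at"', "bat"))              # False
print(matches('N{1-3}"th"', "111th"))          # True
```

The sketch checks a whole string at once, which mirrors the requirement that a hypothesis must meet the conditions of the expression; it does not model how hypotheses are generated from the image.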
Describing search text by means of an alphabet
An alphabet lists the characters that may occur in the search text. This method is used whenever the character string cannot be described by means of a regular expression, or when the recognized text contains too many errors because of poor image quality. You can specify several alphabets for a Character String element. If the format of the text is unknown, no alphabets are specified. In this case the program will consider all possible characters when looking for the object corresponding to the element.
To describe search text by means of an alphabet:
- Select a hypothesis generation mode. To use the characters in the search area to generate all possible hypotheses, including intersecting and embedded hypotheses, select Allow embedded hypotheses. To generate only hypotheses of maximum length, clear Allow embedded hypotheses.
- Create one or more alphabets:
  1. Click Add…
  2. In the Add New Alphabet dialog box, select the required code page from the Code page list.
  3. In the Character map, select the characters which occur in the search text. The selected characters and their number are displayed in the Selected on screen/selected in all field.
  4. In the Percentage of alphabet characters field, specify the required percentage of alphabet characters in the search text.
- In the Percentage of non-alphabet characters field, specify the allowed percentage of characters which do not belong to any of the alphabets.
- Select Whole words only if you wish to find whole words only.
- Use the Detect words by interword space option to specify how lines should be divided into words. Disable this option to detect words automatically. Enabling this option will divide a line into words whenever the space between neighboring characters is greater than or equal to the value entered in Min interword space.
Note. In the case of automatic word detection, word ends are detected based on spaces or other symbols that separate words (e.g. ",", ";", "/", "?"; the exact set of symbols depends on the selected pre-recognition language), or based on other attributes. To make sure that the program correctly divides lines into words, review the text objects on the test images (View → Images → Objects → Recognized Words).
- In the Word count fields, specify the number of words in the character string. The number of words is specified by means of a fuzzy interval. The default interval is {-1,-1,INF,INF} (i.e. the program looks for hypotheses containing any number of words).
- In the Max space length field, specify the maximum length of a space inside the object, measured in the user-defined units of measurement. You can estimate the length of the space by looking at the coordinates of the neighboring objects: rest the mouse cursor on a neighboring object to display its coordinates in the status bar. When looking for text, the program adds characters to the character string until the distance between neighboring elements exceeds Max space length.
- In the Character count field, specify the length of the character string (i.e. the number of characters in the string). The number of characters is specified by means of a fuzzy interval, which is used to assess the quality of a hypothesis based on its length.
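The Detect words by interword space rule described in the steps above can be illustrated with a short Python sketch. It only demonstrates the stated rule (start a new word when the gap between neighboring characters is greater than or equal to Min interword space); the function name `split_into_words` and the `(character, left, right)` tuple layout are hypothetical and are not part of the program.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a recognized character:
# (symbol, x-coordinate of its left edge, x-coordinate of its right edge),
# both in the user-defined units of measurement.
Char = Tuple[str, float, float]

def split_into_words(chars: List[Char], min_interword_space: float) -> List[str]:
    """Split one line of characters into words by the gap between neighbors."""
    words: List[str] = []
    current = ""
    prev_right: Optional[float] = None
    for symbol, left, right in chars:
        gap_is_wide = (
            prev_right is not None
            and left - prev_right >= min_interword_space
        )
        if gap_is_wide:
            # The gap reaches the threshold, so the previous word ends here.
            words.append(current)
            current = ""
        current += symbol
        prev_right = right
    if current:
        words.append(current)
    return words

line = [("I", 0, 4), ("n", 5, 10), ("v", 20, 25), ("o", 26, 31)]
print(split_into_words(line, min_interword_space=8))    # ['In', 'vo']
print(split_into_words(line, min_interword_space=12))   # ['Invo']
```

The same gap comparison, with a strict "exceeds" test against Max space length, would describe where a character string stops growing.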
Use the button to specify fuzzy intervals in a separate window that visualizes them for your convenience.
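As an illustration of how a four-number fuzzy interval such as {-1,-1,INF,INF} could rate a word or character count, here is a small Python sketch. It rests on an assumption that is not stated in this section: that the four numbers {a,b,c,d} describe a trapezoid, with quality 1 between b and c, quality 0 outside a and d, and a linear change in between. The function name `fuzzy_quality` is hypothetical.

```python
import math

INF = math.inf

def fuzzy_quality(value: float, a: float, b: float, c: float, d: float) -> float:
    """Rate a value against the interval {a,b,c,d}, assuming a trapezoid shape."""
    if value < a or value > d:
        return 0.0                      # outside the interval entirely
    if b <= value <= c:
        return 1.0                      # inside the plateau
    if value < b:
        # Rising edge between a and b (an infinite edge never penalizes).
        return 1.0 if math.isinf(a) else (value - a) / (b - a)
    # Falling edge between c and d.
    return 1.0 if math.isinf(d) else (d - value) / (d - c)

# Under this assumption the default interval accepts any non-negative count.
print(fuzzy_quality(0, -1, -1, INF, INF))    # 1.0
print(fuzzy_quality(250, -1, -1, INF, INF))  # 1.0

# An illustrative interval {2,4,6,10}: counts of 4 to 6 are ideal.
print(fuzzy_quality(3, 2, 4, 6, 10))         # 0.5
print(fuzzy_quality(11, 2, 4, 6, 10))        # 0.0
```

If the program interprets the four values differently, only the body of `fuzzy_quality` would need to change; the point of the sketch is that a hypothesis is graded by its length rather than simply accepted or rejected.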
