Parameter | Description
String Value() | The value of the text on the image within the region of the hypothesis.
The program considers all the text objects which horizontally intersect the search area (vertically, the objects must fit within the search area in their entirety). The text objects are then grouped into lines. Lines are built left to right. The program stops building a line when the maximum length of space (set in the Max. space length property) is exceeded.

In the resulting lines, the program identifies character strings, each of which contains characters only from one of the user-defined character sets. In a similar fashion, the program divides lines into fragments. Next, the program formulates a hypothesis for each of the fragments. Depending on whether the Allow embedded hypotheses option is selected, hypotheses are formulated according to one of two principles.

Suppose the program detected three fragments at the previous stage. If the Allow embedded hypotheses option is selected, hypotheses are formulated as follows:

hypothesis 1: fragment 1
hypothesis 2: fragment 1 + fragment 2
hypothesis 3: fragment 1 + fragment 2 + fragment 3
hypothesis 4: fragment 2
hypothesis 5: fragment 2 + fragment 3
hypothesis 6: fragment 3

For each hypothesis, the program checks that the portion of characters of each character set does not exceed the value set in the Portion in text, % field. Similarly, the program checks that the percentage of non-alphabet characters does not exceed the value set in the Allowed errors field. If at least one of the checks fails, no hypothesis is formulated.

If the Allow embedded hypotheses option is not selected, the embedded hypotheses in the list above are discarded. Embedded hypotheses are those which are contained within another hypothesis in the list above. If the checks were successful for all of the hypotheses, only the following hypothesis remains: fragment 1 + fragment 2 + fragment 3. Thus, if the Allow embedded hypotheses option is not selected, the program formulates hypotheses of maximum length which meet all of the conditions.
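The enumeration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's actual implementation: the names enumerate_hypotheses and passes_checks are hypothetical, fragments are identified by 0-based indices, and passes_checks stands in for the Portion in text, % and Allowed errors checks.

```python
from typing import Callable, List, Tuple

# A hypothesis is modelled as an inclusive (first, last) pair of fragment indices.
Span = Tuple[int, int]


def enumerate_hypotheses(
    n_fragments: int,
    passes_checks: Callable[[Span], bool],
    allow_embedded: bool,
) -> List[Span]:
    """Return hypotheses built from contiguous runs of fragments.

    passes_checks is a hypothetical stand-in for the character-set
    portion check and the allowed-errors check.
    """
    # Every contiguous run of fragments that passes the checks is a candidate.
    candidates = [
        (first, last)
        for first in range(n_fragments)
        for last in range(first, n_fragments)
        if passes_checks((first, last))
    ]
    if allow_embedded:
        return candidates

    def is_embedded(span: Span) -> bool:
        # A span is embedded if another candidate fully contains it.
        return any(
            other != span and other[0] <= span[0] and span[1] <= other[1]
            for other in candidates
        )

    # Keep only the maximal candidates.
    return [span for span in candidates if not is_embedded(span)]


# Three fragments, all checks pass, embedded hypotheses allowed:
all_six = enumerate_hypotheses(3, lambda span: True, allow_embedded=True)

# Same input, embedded hypotheses not allowed: only the longest one remains.
longest_only = enumerate_hypotheses(3, lambda span: True, allow_embedded=False)

# If the full-length run fails its checks, two intersecting hypotheses remain.
intersecting = enumerate_hypotheses(
    3, lambda span: span != (0, 2), allow_embedded=False
)
```

With three fragments, all_six lists the six runs in the same order as the numbered hypotheses above, and longest_only contains just the run covering all three fragments.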
Even though embedded hypotheses are excluded, hypotheses may intersect. The intersection may be a stand-alone character or word, or a string of characters which is part of other hypotheses but for which no separate hypothesis has been formulated. For example, the program may formulate two hypotheses (i.e. two strings), one ending in a certain word or phrase and another starting with that word or phrase:

hypothesis 1: fragment 1 + fragment 2
hypothesis 2: fragment 2 + fragment 3

Once all the possible hypotheses have been generated, the program calculates the Search condition quality for each (this is an estimate of how well a hypothesis meets the search constraints set in the Search Conditions). At this stage, the quality is calculated based on three criteria: whether the length of the hypothesis in characters falls within the fuzzy interval specified in the Character count property, whether the total length of gaps in the line falls within the fuzzy interval specified in TotalGapLength, and whether the number of words in the line falls within the fuzzy interval specified in the Word count property. The overall quality of a hypothesis is calculated by multiplying these qualities together.
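As a rough illustration of how the three partial qualities combine, the sketch below multiplies them together. The text does not define the shape of a fuzzy interval, so the trapezoidal form used here (quality 1 inside a core range, falling linearly to 0 at the outer bounds) is an assumption, as are the function name trapezoid_quality and all of the numeric bounds.

```python
from math import prod


def trapezoid_quality(
    value: float, zero_lo: float, full_lo: float, full_hi: float, zero_hi: float
) -> float:
    """Quality of value against an assumed trapezoidal fuzzy interval.

    Quality is 1.0 on [full_lo, full_hi], 0.0 at or beyond zero_lo and
    zero_hi, and changes linearly in between. This shape is an assumption.
    """
    if full_lo <= value <= full_hi:
        return 1.0
    if value <= zero_lo or value >= zero_hi:
        return 0.0
    if value < full_lo:
        return (value - zero_lo) / (full_lo - zero_lo)
    return (zero_hi - value) / (zero_hi - full_hi)


# Illustrative measurements for one hypothesis (made-up numbers).
char_count_quality = trapezoid_quality(12, 5, 8, 10, 14)  # slightly too long
total_gap_quality = trapezoid_quality(3, 0, 0, 5, 10)     # within the core range
word_count_quality = trapezoid_quality(2, 0, 1, 3, 5)     # within the core range

# The overall quality is the product of the individual qualities.
overall_quality = prod(
    [char_count_quality, total_gap_quality, word_count_quality]
)
```

Because the qualities are multiplied, a single criterion with quality 0 rules the hypothesis out entirely, while a criterion with quality 1 leaves the result unchanged.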