The program looks for tables and divides them into columns and rows by relying on the Separators and White Gaps on the image. Table headers and footers are also used to facilitate table detection: they serve as the top and bottom boundaries of the table body, and no information is extracted from them. The header contains the names of the columns, which may be used to divide the table into columns. Once a table has been detected, the program formulates hypotheses for the entire table element and for its columns, rows, and cells.

A Table hypothesis has the following properties:
| Property | Description |
| --- | --- |
| Element name | The full name of the element. |
| Page | The number of the page on which the element was detected. |
| Surrounding rect | The coordinates of the rectangle which surrounds the region of the hypothesis. |
| Width | The width of the region of the hypothesis. |
| Height | The height of the region of the hypothesis. |
| Header found | Shows whether the table header has been found. |
| Footer found | Shows whether the table footer has been found. |
| Body found | Shows whether the table body has been found. |
| Order name | The name of the detected order of columns in the table. |
| Detected | Shows whether the object described by the element has been found (true) or whether a null hypothesis has been formulated (false). |
| From the best path | Shows whether the found hypothesis belongs to the best path in the tree of hypotheses (true) or not (false). |
| Pre-search quality | How well the hypothesis matches the properties of the element specified by the settings in the Properties dialog box and by the code in the Advanced pre-search relations field. |
| Post-search quality | The quality of the hypothesis after the conditions in the Advanced post-search relations field have been applied. |
| Chain quality | The quality of the chain of hypotheses, from the first subelement of the group to the current subelement. Chain quality is calculated by multiplying the qualities of all the subelements in the chain and is used to compare rival chains of hypotheses. |
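The Chain quality calculation described above can be illustrated with a minimal sketch. The function name `chain_quality`, the sample values, and the assumption that each quality is a number between 0 and 1 are hypothetical and serve only as an illustration; this is not the program's own code.

```python
from math import prod


def chain_quality(qualities):
    """Multiply the qualities of all subelements in a chain.

    Assumption: each quality is a number between 0 and 1.
    """
    return prod(qualities)


# Hypothetical qualities of the subelements in two rival chains.
chain_a = [0.95, 0.90, 1.00]
chain_b = [1.00, 0.60, 1.00]

# The chain with the higher product wins the comparison.
best = max((chain_a, chain_b), key=chain_quality)
```

Because the qualities are multiplied, a single weak subelement lowers the quality of the whole chain, which is why `chain_b` loses here even though two of its three subelements match perfectly.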
A Table header hypothesis has the following properties:
| Property | Description |
| --- | --- |
| Element name | The full name of the element. |
| Page | The number of the page on which the element was detected. |
| Surrounding rect | The coordinates of the rectangle which surrounds the region of the hypothesis. |
| Width | The width of the region of the hypothesis. |
| Height | The height of the region of the hypothesis. |
| Column name list | Shows the found table columns. |
| Detected | Shows whether the object described by the element has been found (true) or whether a null hypothesis has been formulated (false). |
| From the best path | Shows whether the found hypothesis belongs to the best path in the tree of hypotheses (true) or not (false). |
| Pre-search quality | How well the hypothesis matches the properties of the element specified by the settings in the Properties dialog box and by the code in the Advanced pre-search relations field. |
| Post-search quality | The quality of the hypothesis after the conditions in the Advanced post-search relations field have been applied. |
| Chain quality | The quality of the chain of hypotheses, from the first subelement of the group to the current subelement. Chain quality is calculated by multiplying the qualities of all the subelements in the chain and is used to compare rival chains of hypotheses. |
A Table footer hypothesis has the following properties:
| Property | Description |
| --- | --- |
| Element name | The full name of the element. |
| Page | The number of the page on which the element was detected. |
| Surrounding rect | The coordinates of the rectangle which surrounds the region of the hypothesis. |
| Width | The width of the region of the hypothesis. |
| Height | The height of the region of the hypothesis. |
| Detected | Shows whether the object described by the element has been found (true) or whether a null hypothesis has been formulated (false). |
| From the best path | Shows whether the found hypothesis belongs to the best path in the tree of hypotheses (true) or not (false). |
| Pre-search quality | How well the hypothesis matches the properties of the element specified by the settings in the Properties dialog box and by the code in the Advanced pre-search relations field. |
| Post-search quality | The quality of the hypothesis after the conditions in the Advanced post-search relations field have been applied. |
| Chain quality | The quality of the chain of hypotheses, from the first subelement of the group to the current subelement. Chain quality is calculated by multiplying the qualities of all the subelements in the chain and is used to compare rival chains of hypotheses. |
A Table body hypothesis has the following properties:
| Property | Description |
| --- | --- |
| Element name | The full name of the element. |
| Page | The number of the page on which the element was detected. |
| Surrounding rect | The coordinates of the rectangle which surrounds the region of the hypothesis. |
| Width | The width of the region of the hypothesis. |
| Height | The height of the region of the hypothesis. |
| Order name | Shows the name of the found column order. |
| Found columns | Shows the names of the found columns. |
| Rows number | Shows the number of rows found in the table. |
| Detected | Shows whether the object described by the element has been found (true) or whether a null hypothesis has been formulated (false). |
| From the best path | Shows whether the found hypothesis belongs to the best path in the tree of hypotheses (true) or not (false). |
| Pre-search quality | How well the hypothesis matches the properties of the element specified by the settings in the Properties dialog box and by the code in the Advanced pre-search relations field. |
| Post-search quality | The quality of the hypothesis after the conditions in the Advanced post-search relations field have been applied. |
| Chain quality | The quality of the chain of hypotheses, from the first subelement of the group to the current subelement. Chain quality is calculated by multiplying the qualities of all the subelements in the chain and is used to compare rival chains of hypotheses. |
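The introduction says that tables are divided into columns by relying on white gaps on the image. The program's actual algorithm is not documented here, so the following is only a rough sketch of the general idea, under stated assumptions: the input is a hypothetical list holding the number of dark pixels in each pixel column of the table region, and the names `find_white_gaps`, `split_columns`, and `min_gap_width` are invented for illustration.

```python
def find_white_gaps(column_ink, min_gap_width=3):
    """Return (start, end) ranges of consecutive pixel columns with no ink.

    column_ink[i] is the number of dark pixels in pixel column i.
    Runs narrower than min_gap_width are ignored (assumed to be
    ordinary spacing between characters).
    """
    gaps = []
    start = None
    for x, ink in enumerate(column_ink):
        if ink == 0:
            if start is None:
                start = x
        else:
            if start is not None and x - start >= min_gap_width:
                gaps.append((start, x))
            start = None
    # Close a gap that runs to the right edge of the region.
    if start is not None and len(column_ink) - start >= min_gap_width:
        gaps.append((start, len(column_ink)))
    return gaps


def split_columns(column_ink, min_gap_width=3):
    """Return (start, end) ranges of table columns lying between white gaps."""
    columns = []
    cursor = 0
    for gap_start, gap_end in find_white_gaps(column_ink, min_gap_width):
        if gap_start > cursor:
            columns.append((cursor, gap_start))
        cursor = gap_end
    if cursor < len(column_ink):
        columns.append((cursor, len(column_ink)))
    return columns


# Hypothetical ink profile: two text columns separated by a wide gap.
profile = [2, 3, 1, 0, 0, 0, 0, 4, 5, 0, 2, 0, 0, 0]
columns = split_columns(profile)
```

In this sketch the single empty pixel column inside the second text column is too narrow to count as a white gap, so it does not split the column.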

More:

Working with tables
Search area
Additional search constraints