Structured documents always include the exact same type of information in the exact same locations. Pre-formatted forms are one example of structured documents. You need to label only a few sample documents for training, because their layout does not vary.
Use the following guidelines when labeling structured documents:
Be sure to accurately specify the region of each field, as field values alone are not enough for training.
To mark out the region of a field, don’t click on its value, but mark out the entire placeholder instead.
If a field contains no value, mark out the empty placeholder.
If a field consists of multiple parts, hold down the Shift key to add the parts. Please note that all parts should be on the same page.
If a fixed form contains a table, mark out all the rows, including those that are empty.
If a field is added after some labeling has already been done, the new field must be labeled throughout the training set. Review all of your documents and label the new field on every document where it occurs.
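The review described in the last guideline can be sketched as a simple consistency check. This is only an illustration, not part of the product: the `labels` dictionary, the document names, and the `documents_missing_field` helper are all hypothetical, and the sketch assumes you keep your own list of which fields have been labeled on each training document. Because empty placeholders are also marked out on structured documents, every document in the set is expected to carry the new field.

```python
def documents_missing_field(labels, field):
    """Return the names of documents whose label set lacks `field`, sorted.

    `labels` is a hypothetical mapping: document name -> set of field
    names already labeled on that document.
    """
    return sorted(name for name, fields in labels.items() if field not in fields)


# Hypothetical training set: "Total" was added after labeling had started.
labels = {
    "form_001": {"Name", "Date", "Total"},
    "form_002": {"Name", "Date"},
    "form_003": {"Name", "Date", "Total"},
}

missing = documents_missing_field(labels, "Total")
print(missing)  # ['form_002']
```

Any document that the check reports still needs the new field marked out before training.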
Semi-structured documents generally contain the same or similar types of information, but the location, size, and number of fields may vary from document to document. Examples of semi-structured documents include bills, payment orders, and invoices.
Use the following guidelines when labeling semi-structured documents:
Be sure to accurately specify the region of each field, as field values alone are not enough for training.
To mark out the region of a field, click on its value (i.e. the word or words it contains), and the region will be created automatically.
If a field contains no value, do not create a region for such a field.
Do not mark out parts of words, as the program can only learn from whole words.
If a field consists of multiple parts, hold down the Shift key to add the parts. Please note that all parts should be on the same page.
If you have a repeating structure, analyze your documents first and create either a table or a repeating group. If your documents contain tables with a common header and values that do not have any keywords next to them, create a table. If your data is less structured and has keywords located next to the values, create a group with the Allow multiple items option. If data is organized differently on different documents, select the option that best fits the majority of the documents.
When labeling a table, mark out the first row by clicking on its cells one by one; the corresponding columns will be created automatically. Then click Continue table from this row and make sure that the entire table has been labeled correctly. Proceed until the whole table has been marked out.
Tip: If tables are large and document pages are similar in appearance, you can delete the similar pages and label only the first page, the last page, and some pages in between.
Do not instruct the program to find fields inside the region of another field, regardless of whether it is an individual field (such as an address) or a table cell (such as “Description”). If you need to extract data from a large text fragment, use the Advanced Designer.
If a field is added after some labeling has already been done, the new field must be labeled throughout the training set. Review all of your documents and label the new field on every document where it occurs.
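The guideline on repeating structures amounts to a small decision rule: create a table when there is a common header and no keywords next to the values, create a group with the Allow multiple items option when keywords sit next to the values, and follow the majority when documents differ. A minimal sketch of that rule is shown below. The function names and the two boolean observations per document are assumptions made for illustration; the program itself does not expose such an interface.

```python
from collections import Counter


def structure_for_document(has_common_header, has_keywords_next_to_values):
    """Apply the table-versus-group rule to one document.

    Returns "table", "repeating group", or None when neither rule applies.
    """
    if has_keywords_next_to_values:
        return "repeating group"  # a group with Allow multiple items
    if has_common_header:
        return "table"
    return None


def choose_structure(observations):
    """Pick the structure that fits the majority of the documents.

    `observations` is a list of (has_common_header, has_keywords_next_to_values)
    pairs, one per analyzed document.
    """
    votes = Counter()
    for has_header, has_keywords in observations:
        choice = structure_for_document(has_header, has_keywords)
        if choice is not None:
            votes[choice] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical analysis of four documents: three have a common header
# without keywords, one has keywords next to its values.
observations = [(True, False), (True, False), (False, True), (True, False)]
print(choose_structure(observations))  # table
```

The sketch only records the outcome of your own analysis of the documents; deciding whether a document has a common header or keywords next to its values still has to be done by looking at the documents.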