Extracting the issue date
Dates in these documents can easily be extracted using the Date element, so this time we will use the search element that was created automatically for this field.
- Open the Manage Fields dialog on the Fields tab and select the “Date” field to be used in this activity. Click Save.
- Go to the Search Elements tab. You will see a search element of type Date created for the “Date” field. It is mapped to the field automatically.
- Create a Group search element called “IssueDateGroup”. Make the element optional.
- Add a Static Text element called “kwDate” to find the label that will help us locate the actual date.
- This document class contains documents in Dutch or French, so the label text can vary. You can enter each option on a new line in the Text to find dialog. Enter “Date” on the first line and “Datum” on the second line.
- Disable the Search for parts of words option.
- Drag and drop the “Date” search element into the group and place it under the “kwDate” element.
- Specify the search area for the “Date” element:
  a. Delete the Nearest to relation that was added automatically when the element was created.
  b. Select “kwDate” as the element that the date should be nearest to.
  c. The date can be located to the right of the keyword or below it, so specify the search area below the “kwDate” element.
  d. The search area should also include the line on which the keyword is located. Click the bottom boundary icon to the right of the element name and select Top Boundary of Region. Because the lines may be uneven, set the Below value to -10 to extend the search area slightly above the keyword line.
- Click Match to make sure the date is located correctly.
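The geometry behind the steps above can be sketched in Python. This is not the product's API: the `Word` boxes, the `DATE_RE` pattern, and the function names are hypothetical, coordinates are assumed to grow downward as in image space, and the Below value is assumed to shift the top edge of the search area by that many units.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Word:
    """A recognized word and its bounding box (hypothetical; y grows downward)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int

    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

# One option per line, as entered in the Text to find dialog.
KEYWORDS = ["Date", "Datum"]
# Hypothetical date pattern such as 12-03-2021 or 12.03.21.
DATE_RE = re.compile(r"\d{1,2}[-./]\d{1,2}[-./]\d{2,4}")

def find_keyword(words: List[Word], options: List[str]) -> Optional[Word]:
    # Whole-word matching only, like "Search for parts of words" disabled.
    for w in words:
        if w.text in options:
            return w
    return None

def distance(a: Word, b: Word) -> float:
    (ax, ay), (bx, by) = a.center(), b.center()
    return math.hypot(ax - bx, ay - by)

def find_issue_date(words: List[Word], below_offset: int = -10) -> Optional[Word]:
    kw = find_keyword(words, KEYWORDS)
    if kw is None:
        return None  # IssueDateGroup is optional, so a missing label is not an error
    # Search area: below the keyword's TOP boundary, shifted by the Below value.
    # A negative offset moves the edge up and tolerates slightly uneven lines.
    area_top = kw.top + below_offset
    candidates = [
        w for w in words
        if w is not kw and w.top >= area_top and DATE_RE.fullmatch(w.text)
    ]
    # "Nearest to kwDate": keep the candidate closest to the keyword.
    return min(candidates, key=lambda w: distance(kw, w), default=None)

words = [
    Word("Datum", 100, 200, 160, 215),
    Word("12-03-2021", 180, 198, 280, 213),  # same line, slightly higher
    Word("01-01-2020", 100, 400, 200, 415),  # further down the page
]
print(find_issue_date(words).text)  # 12-03-2021
```

Calling `find_issue_date(words, below_offset=0)` skips the date on the slightly raised line and returns the one further down, which is the situation the -10 value in step d guards against.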
Extracting the sickness dates
We could extract these dates using Key Value elements. The Key Value element searches for both a static text label and its value. However, it doesn’t tolerate much variation in the location and properties of the value. In these documents, each component of a sickness date (day, month, and year) is in a separate table cell. The cells can be located in non-standard places in each document, but their relative position is always the same. We can’t count on the table cell boundaries being very clear, but we will still use the Table Cell element because it allows for fuzzy borders and will be convenient if we decide to train the activity on more documents. We’ll use the Group element to organize the hierarchy of search elements.
Note: You can use the Table Cell element not only for fields located inside document tables. It can also be useful if you need to extract data from a form where the content is located in similar boxes or table-like structures. If these boxes have clear dividing lines, the Table Cell element will prove very effective.
- Open the Manage Fields dialog and add the following fields to the current activity:
  - Start Date
  - End Date
- Go to the Search Elements tab and create a Group element for extracting the start date. Set the following parameters for the group and the elements inside it. In Text to find, enter each comma-separated option on a separate line:
| Parameter | Value |
|---|---|
| Group search element: | |
| Name | StartDateGroup |
| Static Text search element: | |
| Name | kwStartDate |
| Text to find | Vanaf / From, A partir du, Van |
| Search for parts of words | Disabled |
| Table Cell search element: | |
| Name | StartDateDay |
| Search pattern | Number |
| Character count | {1, 1, 3, 3} |
| Search for parts of words | Disabled |
| Search area | Below the “kwStartDate” element, nearest to “kwStartDate” |
| Table Cell search element: | |
| Name | StartDateMonth |
| Search pattern | Number |
| Character count | {1, 1, 3, 3} |
| Search for parts of words | Disabled |
| Search area | Below the “kwStartDate” element, right of “StartDateDay”, nearest to “StartDateDay” |
| Table Cell search element: | |
| Name | StartDateYear |
| Search pattern | Number |
| Character count | {2, 2, 4, 4} |
| Search for parts of words | Disabled |
| Search area | Below the “kwStartDate” element, right of “StartDateMonth”, nearest to “StartDateMonth” |
- Create a copy of the “StartDateGroup” element and rename it to “EndDateGroup”.
- Rename the group’s sub-elements: “kwStartDate” to “kwEndDate”, “StartDateDay” to “EndDateDay”, “StartDateMonth” to “EndDateMonth”, “StartDateYear” to “EndDateYear”.
- Change the text to find of the “kwEndDate” element to “Tot en met / Till and incl., Jusqu’ au, Tot en met”.
- Specify the search area for the “EndDateDay” element. It should be located below the “kwEndDate” element and nearest to it. Delete the other relations.
- Open the Manage Fields dialog and add a Data Composition Field called “Start Date Composed”. Map the following elements to the fields:
  - “StartDateDay” to Day
  - “StartDateMonth” to Month
  - “StartDateYear” to Year
- Create a Data Composition Field called “End Date Composed”. Map the following elements to the fields:
  - “EndDateDay” to Day
  - “EndDateMonth” to Month
  - “EndDateYear” to Year
- Map the “Start Date Composed” and “End Date Composed” data composition fields to the “Start Date” and “End Date” fields, respectively.
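The character counts in the table and the Data Composition Fields can be modeled together in Python. This is a sketch under stated assumptions, not the product's behavior: it assumes the four numbers in Character count form a fuzzy interval {a, b, c, d}, so {1, 1, 3, 3} accepts exactly 1 to 3 characters. The function names, the strict acceptance rule, and the two-digit-year handling are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

Interval = Tuple[int, int, int, int]

DAY_OR_MONTH: Interval = (1, 1, 3, 3)  # Character count of the Day/Month cells
YEAR: Interval = (2, 2, 4, 4)          # Character count of the Year cells

def fuzzy_quality(value: int, interval: Interval) -> float:
    """Assumed semantics of {a, b, c, d}: [b, c] scores 1.0, outside [a, d]
    scores 0.0, and the score changes linearly on the slopes a..b and c..d."""
    a, b, c, d = interval
    if value < a or value > d:
        return 0.0
    if b <= value <= c:
        return 1.0
    if value < b:                      # only reachable when a < b
        return (value - a) / (b - a)
    return (d - value) / (d - c)       # only reachable when c < d

def is_number(text: str) -> bool:
    # Stand-in for the "Number" search pattern: ASCII digits only.
    return text.isascii() and text.isdigit()

def compose_date(day: str, month: str, year: str, century: int = 2000) -> Optional[date]:
    """Check the three cells, then combine them like a Data Composition Field.

    Hypothetical rules: every cell must be numeric with a fully acceptable
    character count, a two-digit year is placed in `century`, a three-digit
    year is rejected, and impossible dates such as 31 February give None.
    """
    cells = ((day, DAY_OR_MONTH), (month, DAY_OR_MONTH), (year, YEAR))
    if not all(is_number(t) and fuzzy_quality(len(t), i) == 1.0 for t, i in cells):
        return None
    d, m, y = int(day), int(month), int(year)
    if len(year) == 2:
        y += century                   # "21" -> 2021
    elif len(year) != 4:
        return None
    try:
        return date(y, m, d)
    except ValueError:
        return None                    # day or month out of range

print(compose_date("5", "03", "21"))      # 2021-03-05
print(compose_date("31", "02", "2021"))   # None
print(fuzzy_quality(5, (1, 2, 4, 6)))     # 0.5
```

Keeping the day, month, and year as separate values until the composition step mirrors the activity: each cell is located and validated on its own, and only then combined into one date.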
Extracting the type of sick note
We’ll extract the type of sick note using checkmarks, in the same way as we did for the German documents.
- Open the Manage Fields dialog on the Fields tab and enable the “Type of Sick Note” checkmark group. Enable the “Primary” and “Secondary” checkmarks in the group to be used in the current activity. Click Save.
- Build a structure similar to the one built for the German documents, but keep in mind that in Dutch and Belgian documents the label (the text near the checkmark) comes first. The order of child elements in such groups matters.
  a. Create a Group element called “TypeOfSickNoteGroup”.
  b. Create a copy of this group, rename it to “PrimaryGroup”, and place it inside “TypeOfSickNoteGroup”.
  c. Add a Static Text element called “kwCheckmark” to the “PrimaryGroup” group.
  d. Set the text to find to “eerste / Primary, première, primair”.
  e. Add the following search elements to the “PrimaryGroup” group:
| Parameter | Value |
|---|---|
| Static Text search element: | |
| Name | Checkmark |
| Text to find | X |
| Character count | {1, 1, 3, 3} |
| Search for parts of words | Disabled |
| Search area | Right of “kwCheckmark”, nearest to “kwCheckmark” |
| Static Text search element: | |
| Name | XMark |
| Text to find | X |
| Character count | {1, 1, 3, 3} |
| Search for parts of words | Disabled |
| Search area | Below the “kwCheckmark” top boundary, Below value = -15, Left of “kwCheckmark”, Above the “kwCheckmark” bottom boundary, Above value = -15, Nearest to “kwCheckmark” |
| Under what conditions | Do not find element if “Checkmark” is found |
| Region search element: | |
| Name | CheckmarkRegion |
| Search Conditions section of the Code Editor | `if Checkmark.IsFound then RSA: Checkmark.Rect; else if XMark.IsFound then RSA: XMark.Rect; else DontFind;` |
- Open the Manage Fields dialog and add a “Relapse” checkmark to the “Type of Sick Note” checkmark group. Enable all checkmarks in the group to be used in the current activity and click Save.
- Map the checkmarks to the corresponding Region elements and delete the elements that were automatically created when enabling the fields.
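The branching of the CheckmarkRegion condition, and one way the mapped checkmarks might then be read, can be modeled in plain Python. The sketch is not the product's scripting API: `Element`, `search_xmark`, `checkmark_region`, and `sick_note_type` are hypothetical names, returning a rectangle stands in for the `RSA:` statement, and the rule that a found region means a selected checkmark is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)

@dataclass
class Element:
    """A search result; rect is None when the element was not found."""
    rect: Optional[Rect] = None

    @property
    def is_found(self) -> bool:
        return self.rect is not None

def search_xmark(checkmark: Element, candidate: Element) -> Element:
    # XMark condition: "Do not find element if Checkmark is found".
    return Element() if checkmark.is_found else candidate

def checkmark_region(checkmark: Element, xmark: Element) -> Element:
    # Same branching as the CheckmarkRegion condition in the Code Editor.
    if checkmark.is_found:
        return Element(checkmark.rect)   # mark to the right of the label
    if xmark.is_found:
        return Element(xmark.rect)       # mark to the left of the label
    return Element()                     # DontFind

def sick_note_type(regions: Dict[str, Element]) -> Optional[str]:
    # Hypothetical rule: a checkmark counts as selected when its region is
    # found; zero or several selections return None for manual review.
    selected = [name for name, region in regions.items() if region.is_found]
    return selected[0] if len(selected) == 1 else None

left_x = Element((40, 100, 52, 112))
regions = {
    "Primary": checkmark_region(Element(), search_xmark(Element(), Element())),
    "Secondary": checkmark_region(Element(), search_xmark(Element(), left_x)),
    "Relapse": checkmark_region(Element(), search_xmark(Element(), Element())),
}
print(regions["Secondary"].rect)   # (40, 100, 52, 112)
print(sick_note_type(regions))     # Secondary
```

Checking Checkmark before XMark means a mark to the right of the label wins whenever both could match, which is consistent with XMark being told not to search once Checkmark is found.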
