With the German activity configured, set up the second Extraction Rules activity for Dutch and Belgian sick notes. Dutch and Belgian sick notes have a different structure from the German documents and vary widely across the class, so Fast Learning isn’t viable here. Some fields are unique to these documents — you’ll add them to the data form as you configure the activity. Start with fields that appear on every document, then extend the form for the new ones.
Switch activities without closing the Activity Editor by clicking the current activity name next to the skill name and selecting “Sick Note BE-NL” from the drop-down. Then select the first document in the set.

Extracting the issue date

Dates in these documents can be easily extracted using the Date element, so this time we will use the search element that was created automatically for this field.
1

Add the Date field

  1. Open the Manage Fields dialog on the Fields tab and select the “Date” field for use in this activity. Click Save.
  2. Go to the Search Elements tab. A search element of type Date has been created for the “Date” field and mapped automatically.
2

Create the IssueDateGroup with the kwDate label

  1. Create a Group search element called “IssueDateGroup” and make it optional.
  2. Add a Static Text element called “kwDate” inside the group — this finds the label that helps locate the actual date.
  3. Since this document class contains documents in Dutch or French, enter the label text options on separate lines in the Text to find dialog: “Date” on the first line and “Datum” on the second line.
  4. Disable the Search for parts of words option.
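As a rough illustration of what these “kwDate” settings amount to (this is not Vantage's actual matching engine), the sketch below treats each line of the Text to find dialog as an alternative and, with Search for parts of words disabled, accepts only whole-word hits. The function name `find_label`, its parameters, and the sample strings are hypothetical, and case-insensitive matching by default is an assumption.

```python
import re

def find_label(text, options, whole_words=True, match_case=False):
    """Return (option, start_index) for the earliest match of any option, or None."""
    flags = 0 if match_case else re.IGNORECASE  # assumption: case is ignored by default
    best = None
    for option in options:
        pattern = re.escape(option)
        if whole_words:
            # Reject hits embedded in a longer word, e.g. "date" inside "Update".
            pattern = r"(?<!\w)" + pattern + r"(?!\w)"
        m = re.search(pattern, text, flags)
        if m and (best is None or m.start() < best[1]):
            best = (option, m.start())
    return best

options = ["Date", "Datum"]  # one alternative per line in the Text to find dialog

print(find_label("Datum: 12/03/2021", options))                   # label found at the start
print(find_label("Update pending", options))                      # no whole-word hit
print(find_label("Update pending", options, whole_words=False))   # partial-word hit
```

With whole-word matching switched off, the second sample string would produce a false label inside “Update”, which is the kind of hit that disabling Search for parts of words is meant to prevent.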
3

Add the Date element to the group

Drag and drop the “Date” search element into the group and place it under the “kwDate” element.
4

Configure the Date element's search area

  1. Delete the Nearest to relation that was automatically added when the element was created.
  2. Select the “kwDate” element as the one nearest to the element being searched for.
  3. The date can be located to the right of the keyword or below it. Specify the search area below the “kwDate” element.
  4. The search area should also include the line on which the keyword is located. Click the bottom boundary icon to the right of the element name and select Top Boundary of Region. The lines may be uneven, so set the Below value to -10 to extend the search area slightly above the line.
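The effect of the negative Below value can be sketched with simple rectangle arithmetic. This is an illustration only: the `Rect` type, the helper names, and the sample coordinates are hypothetical, the y axis is assumed to grow downward, and the tutorial does not state which units the offset uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    left: int
    top: int
    right: int
    bottom: int

def area_below_top_boundary(keyword: Rect, page: Rect, offset: int = 0) -> Rect:
    """Search area starting at the keyword's top edge shifted by `offset`.

    Assumes y grows downward, so a negative offset moves the edge up.
    """
    top = max(page.top, keyword.top + offset)
    return Rect(page.left, top, page.right, page.bottom)

def contains(area: Rect, candidate: Rect) -> bool:
    """True if the candidate rectangle lies entirely inside the area."""
    return (area.left <= candidate.left and candidate.right <= area.right
            and area.top <= candidate.top and candidate.bottom <= area.bottom)

page = Rect(0, 0, 2000, 3000)
kw_date = Rect(100, 500, 220, 540)
# A date on the same line as the keyword, but sitting slightly higher (uneven line).
date_same_line = Rect(260, 494, 460, 536)

strict = area_below_top_boundary(kw_date, page)        # starts exactly at the keyword top
relaxed = area_below_top_boundary(kw_date, page, -10)  # starts 10 units above it

print(contains(strict, date_same_line))
print(contains(relaxed, date_same_line))
```

In this made-up layout the date is missed with a zero offset and captured once the boundary is moved up by 10, which is the reason for extending the area slightly above the keyword line.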
5

Verify the date is found

Click Match to make sure the date is located correctly. The search element structure should look like this:
Search element hierarchy for the Belgian-Dutch issue date: IssueDateGroup containing kwDate keyword and the Date element

Extracting the sickness dates

A Key Value element can search for both a static text label and its value, but it tolerates little variation in the value's location and properties, so it is not a good fit here. In these documents, each component of a sickness date sits in a separate table cell. The cells can appear in different places from document to document, but their positions relative to one another are always the same. The cell boundaries are not always clear, but the Table Cell element allows for fuzzy borders and will be convenient if we later train the activity on more documents. We'll therefore use Table Cell elements for the date components and Group elements to organize the search element hierarchy.
The Table Cell element is not limited to fields inside document tables. It is also useful for forms where the content sits in similar boxes or table-like structures, and it is very effective when those boxes have clear dividing lines.
1

Add the Start Date and End Date fields

Open the Manage Fields dialog and add the following fields to the current activity:
  • Start Date
  • End Date
Click Save.
2

Create the StartDateGroup with Table Cell elements

Go to the Search Elements tab and create the Group element for the start date extraction. Set the following parameters for the elements included in the group:
Group search element:
  • Name: StartDateGroup
Static Text search element:
  • Name: kwStartDate
  • Text to find: Vanaf / From, A partir du, Van
  • Search for parts of words: Disabled
Table Cell search element:
  • Name: StartDateDay
  • Search pattern: Number
  • Character count: {1, 1, 3, 3}
  • Search for parts of words: Disabled
  • Search area: Below the “kwStartDate” element, nearest to “kwStartDate”
Table Cell search element:
  • Name: StartDateMonth
  • Search pattern: Number
  • Character count: {1, 1, 3, 3}
  • Search for parts of words: Disabled
  • Search area: Below the “kwStartDate” element, right of “StartDateDay”, nearest to “StartDateDay”
Table Cell search element:
  • Name: StartDateYear
  • Search pattern: Number
  • Character count: {2, 2, 4, 4}
  • Search for parts of words: Disabled
  • Search area: Below the “kwStartDate” element, right of “StartDateMonth”, nearest to “StartDateMonth”
The Table Cell element returns the text from the cell as it is. The search pattern uses Number (digits only), so the text returned by the element is a number.
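The way the three search areas chain together (day below the keyword, month to the right of the day, year to the right of the month) can be sketched as follows. This is a simplified model under stated assumptions, not how Vantage evaluates hypotheses: cells are reduced to centre points, “nearest” is plain Euclidean distance, the y axis grows downward, and the `Cell` type, helper names, and coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    text: str
    x: float  # centre x
    y: float  # centre y (assumed to grow downward)

def below(ref):
    return lambda c: c.y > ref.y

def right_of(ref):
    return lambda c: c.x > ref.x

def nearest(cells, anchor, *conditions):
    """Pick the digits-only cell closest to `anchor` that satisfies every condition."""
    candidates = [c for c in cells
                  if c.text.isdigit() and all(cond(c) for cond in conditions)]
    if not candidates:
        return None
    return min(candidates, key=lambda c: math.hypot(c.x - anchor.x, c.y - anchor.y))

kw_start = Cell("Vanaf", 100, 200)
cells = [
    Cell("01", 110, 240), Cell("03", 170, 240), Cell("2021", 240, 240),  # start date row
    Cell("15", 110, 340), Cell("03", 170, 340), Cell("2021", 240, 340),  # end date row
]

day = nearest(cells, kw_start, below(kw_start))
month = nearest(cells, day, below(kw_start), right_of(day))
year = nearest(cells, month, below(kw_start), right_of(month))

print(day.text, month.text, year.text)
```

Anchoring each cell to the previous one, rather than to the keyword alone, is what keeps the month and year on the same row as the day even when a second date row sits directly underneath.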
3

Create the EndDateGroup as a copy

  1. Create a copy of the “StartDateGroup” element and rename it to “EndDateGroup”.
  2. Rename the group’s sub-elements: “kwStartDate” to “kwEndDate”, “StartDateDay” to “EndDateDay”, “StartDateMonth” to “EndDateMonth”, “StartDateYear” to “EndDateYear”.
  3. Change the text to find of the “kwEndDate” element to “Tot en met / Till and incl., Jusqu’ au, Tot en met”.
  4. Specify the search area for the “EndDateDay” element: below the “kwEndDate” element and nearest to it. Delete the other relations.
4

Add Data Composition fields for the dates

  1. Open the Manage Fields dialog and add a Data Composition Field called “Start Date Composed”. Map the following elements to the fields:
    • “StartDateDay” to Day
    • “StartDateMonth” to Month
    • “StartDateYear” to Year
    Click Save.
  2. Create a Data Composition Field called “End Date Composed”. Map the following elements:
    • “EndDateDay” to Day
    • “EndDateMonth” to Month
    • “EndDateYear” to Year
    Click Save.
  3. Map the “Start Date Composed” and “End Date Composed” data composition fields to the “Start Date” and “End Date” fields.
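Conceptually, a Data Composition Field assembles one date from the three digit strings returned by the Table Cell elements. A minimal sketch of that idea follows. The function name `compose_date` is hypothetical, and expanding two-digit years into the 2000s is an assumption made for the example rather than documented Vantage behavior.

```python
from datetime import date

def compose_date(day: str, month: str, year: str, century: int = 2000) -> date:
    """Build a date from the raw text of the day, month, and year cells."""
    year = year.strip()
    y = int(year)
    if len(year) <= 2:
        y += century  # assumption: two-digit years belong to the 2000s
    # date() raises ValueError for impossible combinations such as 31 February.
    return date(y, int(month), int(day))

print(compose_date("7", "3", "21"))
print(compose_date("07", "03", "2021"))
```

Letting the constructor reject impossible dates gives a cheap sanity check on the three extracted cells before the value reaches the “Start Date” or “End Date” field.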
The search element structure should look like this:
Search element hierarchy for the Belgian-Dutch sickness dates: StartDateGroup and EndDateGroup, each containing a keyword Static Text and three Table Cell elements for day, month, and year

Extracting the type of sick note

We’ll extract the type of sick note using a checkmark, in the same way as for the German documents.
1

Enable the Primary and Secondary checkmarks

Open the Manage Fields dialog on the Fields tab and enable the “Type of Sick Note” checkmark group. Enable the “Primary” and “Secondary” checkmarks in the group to be used in the current activity. Click Save.
2

Create the TypeOfSickNoteGroup and PrimaryGroup label

Build a structure similar to the one for the German documents, but keep in mind that in Dutch and Belgian documents the label goes first — the order of child elements for such groups matters.
  1. Create a Group element called “TypeOfSickNoteGroup”.
  2. Create a copy of this group, rename it to “PrimaryGroup”, and place it inside “TypeOfSickNoteGroup”.
  3. Add a Static Text element called “kwCheckmark” to the “PrimaryGroup” group.
  4. Set the text to find to “eerste / Primary, première, primair”.
In these documents, the text near the checkmark is located to the left of the checkmark, so the search area goes to the left, not to the right.
3

Configure the Checkmark, XMark, and CheckmarkRegion elements

Configure the remaining elements inside “PrimaryGroup” according to this table:
Static Text search element:
  • Name: Checkmark
  • Text to find: X
  • Character count: {1, 1, 3, 3}
  • Search for parts of words: Disabled
  • Search area: Right of “kwCheckmark”, nearest to “kwCheckmark”
Static Text search element:
  • Name: XMark
  • Text to find: X
  • Character count: {1, 1, 3, 3}
  • Search for parts of words: Disabled
  • Search area: Below the “kwCheckmark” top boundary (Below value = -15); Left of “kwCheckmark”; Above the “kwCheckmark” bottom boundary (Above value = -15); Nearest to “kwCheckmark”
  • Under what conditions: Do not find element if “Checkmark” is found
Region search element:
  • Name: CheckmarkRegion
  • Search Conditions section of the Code Editor:
      if Checkmark.IsFound then RSA: Checkmark.Rect;
      else if XMark.IsFound then RSA: XMark.Rect;
      else DontFind;
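The Code Editor condition for “CheckmarkRegion” is a simple fallback chain: use the “Checkmark” rectangle if that element was found, otherwise the “XMark” rectangle, otherwise find nothing. The Python sketch below mirrors that logic for readers unfamiliar with the Code Editor syntax. It assumes that RSA restricts the region's search area to the given rectangle, and the `Found` type and the function name are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

RectTuple = Tuple[int, int, int, int]  # (left, top, right, bottom)

class Found(NamedTuple):
    is_found: bool
    rect: Optional[RectTuple] = None

def checkmark_region(checkmark: Found, xmark: Found) -> Optional[RectTuple]:
    """Return the rectangle the region should cover, or None for 'do not find'."""
    if checkmark.is_found:
        return checkmark.rect   # first choice: the Checkmark element
    if xmark.is_found:
        return xmark.rect       # fallback: the XMark element
    return None                 # corresponds to DontFind

print(checkmark_region(Found(True, (300, 100, 320, 120)), Found(False)))
print(checkmark_region(Found(False), Found(True, (60, 100, 80, 120))))
print(checkmark_region(Found(False), Found(False)))
```

Because “XMark” is only searched for when “Checkmark” is not found, at most one of the two branches supplies a rectangle for any given document.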
4

Create the SecondaryGroup and RelapseGroup

  1. Create a copy of “PrimaryGroup” and rename it to “SecondaryGroup”. Change the text to find of its “kwCheckmark” element to “prolongation”, “verlenging”.
  2. Dutch and Belgian sick notes are divided into three types — ‘relapse’ is an additional type compared to German sick notes. Create another copy of “PrimaryGroup” and rename it to “RelapseGroup”.
  3. Change the text to find of the RelapseGroup’s “kwCheckmark” element to “Herval” and enable the Match case option to exclude words occurring in the middle of a sentence.
The search element structure should look like this:
Search element hierarchy for the Belgian-Dutch type-of-sick-note: TypeOfSickNoteGroup containing PrimaryGroup, SecondaryGroup, and RelapseGroup, each with kwCheckmark, Checkmark, XMark, and CheckmarkRegion elements
5

Add the Relapse checkmark and map fields

  1. Open the Manage Fields dialog and add a “Relapse” checkmark to the “Type of Sick Note” checkmark group. Enable all checkmarks in the group for use in the current activity and click Save.
  2. Map the checkmarks to the corresponding Region elements and delete the elements that were automatically created when enabling the fields.

Testing the activity

We have configured all the necessary search elements and fields. Select all documents, click Match, and switch to the Fields tab to review the field regions on the document images. Keep in mind that a region will be passed to a field only if it belongs to the hypothesis from the best path. Once you’re satisfied with the results, click the copy icon above the document image to copy predicted labeling to reference labeling.

What’s next

Step 9. Configure business rules

Add business rules to validate and normalize extracted field values.

Tutorial overview

Back to the tutorial introduction.