Now we are ready to start extracting data. The layouts of German documents don’t vary significantly, so some fields can be extracted using the Fast Learning activity. We will use this method to extract the following data:
  • Date when the sick note was issued.
  • The first day of sickness and the last day of the sick leave.
  • The name of the health insurance company.
  • Doctor’s ID.
  • The following data for the patient:
    • Insurance ID
    • German insurance ID
    • Date of birth
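As a rough illustration of the data listed above, the target fields can be pictured as a small nested record. The Python sketch below is only a mental model: the class and attribute names (`SickNote`, `Patient`, `Doctor`, `issue_date`, and so on) are hypothetical and are not part of Vantage or Advanced Designer, which define these fields through the Fields tab rather than in code.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Patient:
    """Hypothetical model of the "Patient" group."""
    insurance_id: Optional[str] = None
    german_insurance_id: Optional[str] = None
    date_of_birth: Optional[date] = None


@dataclass
class Doctor:
    """Hypothetical model of the "Doctor" group."""
    doctor_id: Optional[str] = None


@dataclass
class SickNote:
    """Hypothetical model of the fields extracted with Fast Learning."""
    issue_date: Optional[date] = None       # "Date"
    start_date: Optional[date] = None       # "Start Date"
    end_date: Optional[date] = None         # "End Date"
    health_insurer: Optional[str] = None    # "Health Insurer"
    patient: Patient = field(default_factory=Patient)
    doctor: Doctor = field(default_factory=Doctor)


# Example record with made-up values; any field may stay empty (None)
# when it cannot be found on a document.
note = SickNote(
    issue_date=date(2023, 4, 3),
    start_date=date(2023, 4, 3),
    end_date=date(2023, 4, 7),
    health_insurer="Example Krankenkasse",
)
note.patient.date_of_birth = date(1985, 12, 31)
```

Every attribute is optional because a field can be missing on a given document; the two groups are modeled as nested records to mirror how they appear on the data form.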
  1. Double-click the Fast Learning activity in the pipeline.
  2. Go to the Fields tab. You will see a window similar to the Document Skill Editor in Vantage.
  3. Select the first German document.
  4. Label the first field.
    a. Click the icon to add a text field on the data form.
    b. Select the region containing the issue date on the document image. The field will be filled with the text from the region.
    c. Double-click the field name and change it to “Date”.
    d. Click the icon to the right of the field name and change the field type to Date.
    e. Open Advanced field settings and select the Day-Month-Year option in the Acceptable orders of components section.
    f. Click Save.
  5. Repeat step 4 and label the “Start Date” and “End Date” fields.
  6. Repeat steps 4.a-4.c and label the “Health Insurer” field.
  7. Click the icon to create a group. Rename it to “Patient”.
  8. Expand the “Patient” group and click the First group item placeholder. Select the region for the field and rename the field to “Insurance ID”.
  9. Create and label the fields “German Insurance ID” and “Date of Birth” in the “Patient” group. Set the type and component order of the “Date of Birth” field as described in steps 4.d-4.e.
  10. Repeat steps 7 and 8 to create the “Doctor” group and label the “Doctor ID” field.
  11. Select the next German document in the document set on the left side of the page. Label the fields you created.
  12. Repeat step 11 for all German documents in the document set.
  13. Click Train Activity. After training finishes, the achieved accuracy will be displayed in the header of the Results tab.
  14. If the accuracy is too low, go to the Results tab and fix the extraction issues. This process is similar to fixing the extraction issues for a Document skill in Vantage. Remember to retrain the activity in order to update the extraction results.
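The Day-Month-Year setting in step 4.e matters because a value such as 03.04.2023 is ambiguous unless the component order is fixed. The Python sketch below shows the idea only; it is not how Vantage parses dates internally, and the function name `parse_day_month_year` and the list of accepted separators are assumptions made for this example.

```python
from datetime import date, datetime
from typing import Optional

# Assumed set of Day-Month-Year layouts; extend as needed for real documents.
DAY_MONTH_YEAR_FORMATS = ("%d.%m.%Y", "%d.%m.%y", "%d/%m/%Y", "%d-%m-%Y")


def parse_day_month_year(text: str) -> Optional[date]:
    """Parse text as a date whose components are in day, month, year order.

    Returns None when the text matches none of the assumed layouts.
    """
    cleaned = text.strip()
    for fmt in DAY_MONTH_YEAR_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next layout
    return None


# With a fixed component order, 03.04.2023 is read as 3 April, not 4 March.
issued = parse_day_month_year("03.04.2023")
```

Values that do not parse come back as `None`, which is a convenient signal for the kind of extraction issue that step 14 asks you to review on the Results tab.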
We do not use Fast Learning to extract the remaining data, for reasons such as the following:
  • The patient’s name and address are located in the same field. The name can occupy one or two lines, and the address may be missing.
  • The location of the checkmarks that specify the sick note type varies.
Factors like these lead to poor extraction quality, or make it impossible for the Fast Learning activity to locate the field at all.