A “mixed” document set can mean two things in Advanced Designer:
  • A single set that contains both semi-structured and unstructured documents (different document types).
  • A single document that contains mixed structure — for example, an unstructured contract with embedded tables, titles, headers, or footers.

Pick a scenario

| Scenario | When to use | Key activities |
| --- | --- | --- |
| Semi-structured + unstructured in one set | Both belong to one logical type with shared output fields | Classify + IF + Fast Learning + Segmentation + Deep Learning for NLP |
| Table cells with embedded fields | Extract values from inside table cells (e.g., names in a Closing Disclosure) | Fast Learning + NER (+ Address Parsing) |
| Unstructured with embedded tables/titles/headers/footers | Mostly unstructured documents with semi-structured fragments | Segmentation + Extraction Rules |
All three scenarios follow the same workflow; they differ only in the activities you add to the processing flow.

Common workflow

1. Create a Document skill
   Open Advanced Designer and click Create Document Skill on the start page.

2. Upload documents
   On the Documents tab, upload the documents you’ll use to set up the skill.

3. Define fields and label
   On the Fields tab, create and configure the fields the skill will extract. Label documents in the Reference section.

4. Add and configure activities
   On the Activities tab, add the activities for your scenario (described below). Open each activity in the Activity Editor to configure and train it.

5. Test and publish
   Click Test Skill Using Selected Documents to evaluate results. When the results are good enough, publish the skill.
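"Good enough" in the last step is easier to judge with a number per field. The following is a minimal sketch, assuming you have the labeled reference values and the skill's extracted values available as plain Python dictionaries. The data layout, the field names, and the 0.9 threshold are hypothetical and are not an Advanced Designer export format.

```python
from collections import defaultdict

def field_accuracy(reference, extracted):
    """Return per-field accuracy: the share of documents whose extracted
    value equals the labeled reference value (ignoring outer whitespace)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, ref_fields in reference.items():
        got_fields = extracted.get(doc_id, {})
        for field, ref_value in ref_fields.items():
            total[field] += 1
            got_value = got_fields.get(field) or ""
            if got_value.strip() == ref_value.strip():
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}

# Hypothetical labeled values and test results for two documents.
reference = {
    "doc1": {"BorrowerName": "John Smith", "LoanAmount": "250000"},
    "doc2": {"BorrowerName": "Jane Doe", "LoanAmount": "180000"},
}
extracted = {
    "doc1": {"BorrowerName": "John Smith", "LoanAmount": "250000"},
    "doc2": {"BorrowerName": "Jane  Doe", "LoanAmount": "180000"},
}

scores = field_accuracy(reference, extracted)
# Fields below the (hypothetical) threshold need more labeling or tuning.
weak_fields = [field for field, score in scores.items() if score < 0.9]
```

Fields that land in `weak_fields` are the ones to revisit in the Activity Editor before publishing.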

Semi-structured and unstructured documents in one set

Use this scenario when one Document skill must process both semi-structured and unstructured documents that belong to the same logical type and share the same set of output fields. Classify each document with a Classify By Text and Image activity, which combines text and geometry to handle low-quality images and documents that differ only by graphic features (signatures, seals). For best results, upload a roughly equal number of documents for each variant so the classifier has balanced training data. Then branch the flow with an IF activity: send semi-structured documents to a Fast Learning activity and unstructured documents to a Segmentation activity followed by a Deep Learning for NLP activity.
[Figure: Document processing flow with Classify and IF branching into Fast Learning and Segmentation + Deep Learning for NLP]
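The classify-then-branch logic can be pictured as ordinary control flow. This is a conceptual sketch only: in Advanced Designer the flow is configured visually, and the real classifier is trained on text and image features. The `classify` heuristic, the `Document` class, and the class labels below are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Document:
    name: str
    text: str

def classify(doc):
    """Stand-in for a trained classifier: a document made mostly of short
    "Label: value" lines is treated as semi-structured."""
    lines = [ln for ln in doc.text.splitlines() if ln.strip()]
    if not lines:
        return "unstructured"
    labeled = sum(1 for ln in lines if ":" in ln and len(ln) < 60)
    return "semi-structured" if labeled / len(lines) >= 0.5 else "unstructured"

def route(doc):
    """Mimic the IF activity: choose the branch that matches the class."""
    if classify(doc) == "semi-structured":
        return ["Fast Learning"]
    return ["Segmentation", "Deep Learning for NLP"]

form = Document("form", "Invoice No: 123\nDate: 2024-01-05\nTotal: 99.00")
contract = Document(
    "contract",
    "This Agreement is entered into by the parties named below and "
    "remains in effect until terminated in writing.",
)

form_branch = route(form)
contract_branch = route(contract)
```

Both branches write to the same output fields, which is why the two variants can live in one skill.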

Table cells with fields embedded in cell text

Use this scenario when you need to extract specific values from inside table cells in semi-structured documents — for example, a borrower’s name and partial address embedded in a Closing Disclosure cell. Extract the cell as one block of text with a Fast Learning activity, then run NLP activities on that block to pull out the embedded fields:
[Figure: Document processing flow with Fast Learning feeding Named Entities (NER) and Address Parsing activities]
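To illustrate the two-stage idea (first capture the whole cell, then search inside it), here is a minimal sketch that uses regular expressions as a stand-in for the pre-trained NER and Address Parsing activities. The sample cell text, the patterns, and the function names are assumptions for illustration; the real activities use trained models rather than regexes.

```python
import re

# Hypothetical text of one table cell, as a first-stage activity might return it.
CELL_TEXT = "John A. Smith and Mary Smith\n123 Anywhere Street, Anytown, ST 12345"

# Very rough "person name" pattern: First [M.] Last.
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+\b")

# Rough US-style address pattern: street, city, two-letter state, 5-digit ZIP.
ADDRESS_RE = re.compile(
    r"(?P<street>\d+ [^,\n]+), (?P<city>[^,\n]+), (?P<state>[A-Z]{2}) (?P<zip>\d{5})"
)

def extract_names(cell_text):
    """Find names on the first line only, so that street names such as
    "Anywhere Street" are not mistaken for people."""
    first_line = cell_text.splitlines()[0]
    return NAME_RE.findall(first_line)

def parse_address(cell_text):
    """Split the first address found in the cell into its components."""
    match = ADDRESS_RE.search(cell_text)
    return match.groupdict() if match else None

names = extract_names(CELL_TEXT)
address = parse_address(CELL_TEXT)
```

The point of the design is that the second stage never sees the rest of the page: it only has to disambiguate entities within one small block of text.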

Unstructured documents with tables, titles, headers, or footers

Use this scenario for documents that are mostly unstructured (for example, contracts) but contain embedded semi-structured fragments such as tables, titles, headers, or footers. Detect plain-text paragraphs with a Segmentation activity and detect the semi-structured fragments with an Extraction Rules activity. Once each fragment is isolated, use the appropriate activity to extract its fields.
[Figure: Sample document with paragraphs of unstructured text alongside a semi-structured table]
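The split between plain-text paragraphs and semi-structured fragments can be sketched with a simple heuristic. This is only an illustration of the idea, not how the Segmentation or Extraction Rules activities work internally. It assumes that table rows show up as lines whose cells are separated by tabs or by two or more spaces; the sample text is hypothetical.

```python
import re

# Assumption: cells in a table row are separated by a tab or by 2+ whitespace characters.
CELL_SPLIT = re.compile(r"\s{2,}|\t")

def is_table_row(line):
    """A line counts as table-like if it splits into at least two cells."""
    cells = [cell for cell in CELL_SPLIT.split(line.strip()) if cell]
    return len(cells) >= 2

def split_fragments(text):
    """Group consecutive non-empty lines into ("paragraph", lines) or
    ("table", lines) fragments, preserving document order."""
    fragments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        kind = "table" if is_table_row(line) else "paragraph"
        if fragments and fragments[-1][0] == kind:
            fragments[-1][1].append(line.strip())
        else:
            fragments.append((kind, [line.strip()]))
    return fragments

SAMPLE = """This Agreement is made between the Lessor and the Lessee.
The Lessee shall pay the amounts listed below.
Item        Qty    Price
Laptop      2      1,200.00
Monitor     4      300.00
Payment is due within 30 days of the invoice date."""

fragments = split_fragments(SAMPLE)
kinds = [kind for kind, _ in fragments]
```

Each fragment can then go to the extractor that suits it. The heuristic would misfire on prose typed with double spaces after periods, which is one reason to rely on trained or rule-based activities instead.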

Related activities

  • Classify By Text and Image: Classify documents by combining text and visual features.
  • Fast Learning activity: Extract fields from semi-structured documents and table cells.
  • Segmentation activity: Isolate the paragraphs that contain unstructured fields.
  • Deep Learning for NLP activity: Extract custom or hard-to-disambiguate entities from unstructured text.
  • Named Entities (NER) activity: Extract pre-trained entities like names, organizations, and dates.
  • Extraction Rules activity: Define rule-based extraction for semi-structured fragments.