Skills for processing unstructured documents can only be built in Advanced Designer; the cloud-based Skill Designer does not support these scenarios. Such skills use four core NLP activities to identify entities, segment text, and extract fields from freeform content such as contracts, letters, and emails: Named Entities (NER), Address Parsing, Segmentation, and Deep Learning for NLP.
Each of these activities supports a limited set of languages. See the activity’s reference page for the language list.

Pick a scenario

Scenario | When to use | Key activities
Pre-trained named entities (whole document) | Entities can appear anywhere in the document; minimal configuration needed | NER (+ Address Parsing)
Pre-trained named entities (specific paragraphs) | The entity always sits in a known paragraph | Segmentation + NER (or Address Parsing)
Custom named entities (Deep Learning for NLP) | Pre-trained activities can't disambiguate the entity, or your entity type isn't covered | Segmentation + Deep Learning for NLP
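The choice in the table can be summarized as a small decision function, which may help when planning a skill before opening Advanced Designer. This is a hypothetical Python sketch, not part of Advanced Designer or any ABBYY API: `Requirements` and `pick_scenario` are illustrative names, Address Parsing is left out for brevity, and the 50-document threshold is taken from the Deep Learning for NLP training requirement described in the custom scenario.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical description of what a skill must extract (not an ABBYY API)."""
    covered_by_pretrained_ner: bool    # Is the entity type one that NER already recognizes?
    pretrained_can_disambiguate: bool  # Can NER tell the target entity apart from similar ones?
    in_known_paragraph: bool           # Does the entity always sit in the same paragraph?
    labeled_documents: int             # Number of documents available for training


def pick_scenario(req: Requirements) -> tuple[str, list[str]]:
    """Map requirements to a scenario name and its key activities, following the table."""
    if not req.covered_by_pretrained_ner or not req.pretrained_can_disambiguate:
        if req.labeled_documents < 50:
            # Deep Learning for NLP needs at least 50 training documents (150 recommended).
            raise ValueError("Deep Learning for NLP needs at least 50 documents")
        return ("Custom named entities (Deep Learning for NLP)",
                ["Segmentation", "Deep Learning for NLP"])
    if req.in_known_paragraph:
        return ("Pre-trained named entities (specific paragraphs)",
                ["Segmentation", "Named Entities (NER)"])
    return ("Pre-trained named entities (whole document)", ["Named Entities (NER)"])


# Example: an entity type that NER doesn't cover, with 150 documents available.
print(pick_scenario(Requirements(False, True, True, 150)))
# → ('Custom named entities (Deep Learning for NLP)', ['Segmentation', 'Deep Learning for NLP'])
```

The custom branch is checked first because, per the table, it applies whenever the pre-trained activities fall short, regardless of where the entity sits.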
All three scenarios follow the same common workflow; they differ only in which activities you add to the processing flow.

Common workflow

1. Create a Document skill. Open Advanced Designer and click Create Document Skill on the start page.
2. Upload documents. On the Documents tab, upload the documents you'll use to set up the skill.
3. Define fields and label. On the Fields tab, create and configure the fields the skill will extract, then label documents in the Reference section.
4. Add and configure NLP activities. On the Activities tab, add the activities for your scenario (described below), then open each activity in the Activity Editor to configure and train it.
5. Test and publish. Click Test Skill Using Selected Documents to evaluate the results. When the results are good enough, publish the skill.

Pre-trained named entities (whole document)

Use this scenario when the entities you need can appear anywhere in the document — for example, company names and addresses in a letter. Add a Named Entities (NER) activity and map each named entity to a field. If you also need to break an address into components (street, city, state, country, postal code), add an Address Parsing activity and map the components to fields.
Figure: Document processing flow with a Named Entities (NER) activity.

Pre-trained named entities (specific paragraphs)

Use this scenario when the entity always sits in the same paragraph — for example, a purchase amount in the price clause of a sales agreement. First isolate the paragraph with a Segmentation activity, then run a Named Entities (NER) or Address Parsing activity on the segmented field. You can also isolate the paragraph with a Fast Learning or NLP Extraction Rules activity instead of Segmentation, then run NER or Address Parsing on the result.
Pre-trained activities are easy to configure and need no training, but a neural network trained on your documents may extract more accurately. If you have a large document set, also try the custom named entities scenario below and pick whichever performs better.
Figure: Document processing flow with Segmentation feeding Named Entities (NER) and Address Parsing.
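To see why isolating the paragraph first helps, consider a toy model of this flow in plain Python. It does not call Advanced Designer: `segment_by_keyword` is a hypothetical stand-in for a Segmentation activity (here a simple keyword match over blank-line-separated paragraphs), and `find_amounts` is a hypothetical stand-in for a pre-trained entity recognizer (here a regular expression for dollar amounts). The real activities are configured and trained in the Activity Editor, not in code.

```python
import re

# Toy stand-in for an entity recognizer: matches dollar amounts such as "$12,500.00".
AMOUNT = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")


def segment_by_keyword(text: str, keyword: str) -> str:
    """Hypothetical stand-in for Segmentation: return the first paragraph mentioning keyword."""
    for paragraph in text.split("\n\n"):
        if keyword.lower() in paragraph.lower():
            return paragraph
    return ""  # No matching paragraph found


def find_amounts(text: str) -> list[str]:
    """Hypothetical stand-in for NER: return every dollar amount in the text."""
    return AMOUNT.findall(text)


agreement = (
    "1. Parties. Seller agrees to sell the goods to Buyer.\n\n"
    "2. Price. Buyer shall pay $12,500.00 for the goods.\n\n"
    "3. Penalties. Late payment incurs a fee of $250.00 per month."
)

# Whole document: both amounts match, so the purchase amount is ambiguous.
print(find_amounts(agreement))  # → ['$12,500.00', '$250.00']

# Segment first, then extract: only the amount in the price clause remains.
print(find_amounts(segment_by_keyword(agreement, "price")))  # → ['$12,500.00']
```

The recognizer itself is unchanged between the two calls; narrowing its input to the price clause is what removes the competing amount.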

Custom named entities (Deep Learning for NLP)

Use this scenario when pre-trained activities can’t disambiguate the entities you need — for example, extracting only one organization’s name from a paragraph that lists both parties to an agreement, or extracting an entity type that NER doesn’t cover (such as an email address). Pair a Segmentation activity with a Deep Learning for NLP activity: Segmentation isolates the paragraph and Deep Learning extracts the targeted fields.
Training a Deep Learning for NLP activity requires at least 50 documents (150 recommended). For best results, also try the pre-trained Named Entities (NER) activity and pick whichever extracts more accurately on your documents.
Figure: Document processing flow with Segmentation feeding a Deep Learning for NLP activity.
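Both this scenario and the specific-paragraphs scenario suggest trying the pre-trained and custom approaches and keeping whichever extracts more accurately. One simple way to compare them is exact-match accuracy per field against your labeled reference values. The sketch below is a hypothetical Python illustration: it assumes the reference values and each candidate's extracted values are already available as dictionaries keyed by document ID and field name (how you collect them is outside its scope), and the sample values are invented for the example.

```python
def field_accuracy(reference: dict[str, dict[str, str]],
                   extracted: dict[str, dict[str, str]]) -> float:
    """Share of reference field values the candidate reproduced exactly (ignoring edge spaces)."""
    total = correct = 0
    for doc_id, fields in reference.items():
        found = extracted.get(doc_id, {})  # A missing document counts as all fields wrong
        for name, expected in fields.items():
            total += 1
            if found.get(name, "").strip() == expected.strip():
                correct += 1
    return correct / total if total else 0.0


def pick_better(reference: dict[str, dict[str, str]],
                candidates: dict[str, dict[str, dict[str, str]]]) -> str:
    """Return the name of the candidate with the highest exact-match accuracy."""
    return max(candidates, key=lambda name: field_accuracy(reference, candidates[name]))


# Invented sample data: the NER run picks the wrong party in doc2.
reference = {"doc1": {"Buyer": "Acme Corp"}, "doc2": {"Buyer": "Globex Inc"}}
ner = {"doc1": {"Buyer": "Acme Corp"}, "doc2": {"Buyer": "Initech LLC"}}
deep_learning = {"doc1": {"Buyer": "Acme Corp"}, "doc2": {"Buyer": "Globex Inc"}}

print(field_accuracy(reference, ner))  # → 0.5
print(pick_better(reference, {"Named Entities (NER)": ner,
                              "Deep Learning for NLP": deep_learning}))  # → Deep Learning for NLP
```

Exact matching is deliberately strict; if small formatting differences (case, punctuation) should not count as errors, normalize both values before comparing.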

Activity reference pages:

Named Entities (NER) activity: extract pre-trained entities like names, organizations, and dates from freeform text.
Address Parsing activity: split addresses into street, city, state, country, and postal code.
Segmentation activity: isolate the paragraph that contains the data you want to extract.
Deep Learning for NLP activity: train a neural network to extract custom or hard-to-disambiguate entities.