The Extraction Rules activity allows you to set rules for detecting fields on semi-structured documents and to verify how such rules work on real-life documents. It is usually applied when a field’s location may differ from document to document, complicating the extraction of data, and when you can provide additional information for detecting such fields: e.g. the location of fields relative to other objects on the document or regular expressions specifying search conditions for an object. For example, you can specify that the Invoice Number field may be located either on the right of the image or directly under the words “Order number”, “Order #”, or other similar keywords. We also recommend adding a Fast Learning activity to the processing flow with Online Learning enabled to collect runtime documents, which will later be used to rebuild the skill automatically via machine learning.
Use cases
Add the Extraction Rules activity to your document processing flow in the following cases:
- When your document set isn’t streamlined enough to use a Fast Learning activity to extract data, you don’t have enough documents to train a Deep Learning activity, and the documents have a known structure which you can formalize.
- When you want greater control over the AI, analyzing the prediction results of the Deep Learning and Fast Learning activities before transferring those values into document fields. For example, if you expect to extract a number located close to some keyword, you can filter out hypotheses that don’t look like numbers and hypotheses that are not located near the keyword. If post-processing with rules is required, this usually indicates that the training set for the Deep Learning and Fast Learning activities should be expanded, because machine learning technologies can “feel out” and learn a field’s data type, typical location, and surroundings.
- When you have a FlexiLayout file from ABBYY FlexiLayout Studio which you want to reuse. For more information, see Importing FlexiLayouts from ABBYY FlexiLayout Studio.
- When your documents contain complex structures (e.g. nested tables, which are repeating structures inside other tables) which can’t be extracted by other activities targeted at semi-structured documents.
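
The hypothesis filtering described in the second case above (keep only candidates that look like a number and sit near a keyword) can be sketched in plain Python. This is only a conceptual illustration and not the rule syntax used by the Extraction Rules activity: the `Region` type, the `filter_hypotheses` function, both regular expressions, and the distance threshold are hypothetical names and values chosen for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Region:
    """A piece of recognized text with the top-left corner of its box (hypothetical type)."""
    text: str
    x: float
    y: float


# Assumed patterns: a run of 4 to 10 digits, and a few "Order number" style keywords.
NUMBER_RE = re.compile(r"^\d{4,10}$")
KEYWORD_RE = re.compile(r"order\s*(number|#|no\.?)", re.IGNORECASE)


def filter_hypotheses(hypotheses, words, max_distance=150.0):
    """Keep hypotheses that look like a number and lie near a keyword.

    Distance is the Manhattan distance between top-left corners, which is
    a deliberate simplification of real layout relations.
    """
    keywords = [w for w in words if KEYWORD_RE.search(w.text)]
    kept = []
    for h in hypotheses:
        # Condition 1: the hypothesis text must match the number pattern.
        if not NUMBER_RE.match(h.text):
            continue
        # Condition 2: at least one keyword must be within max_distance.
        if any(abs(h.x - k.x) + abs(h.y - k.y) <= max_distance for k in keywords):
            kept.append(h)
    return kept
```

Given a keyword region such as `Region("Order #", 100, 100)`, a candidate `Region("48213", 100, 130)` would be kept, while a non-numeric candidate or a number far from any keyword would be dropped. In practice the patterns and the notion of "near" would be tuned to the documents being processed.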
