The Deep Learning activity for semi-structured documents is designed to build production-quality cognitive skills that use neural networks to extract fields from semi-structured documents.
Note: This activity can't extract complex structures (for example, nested tables, which are repeating structures inside other tables) or fields of any type other than Text. To extract such structures, use the Extraction Rules activity.

Use Cases

Add this activity to your document processing flow when:
  • Your skill will be used to process multiple variants of a certain document type.
  • You are planning to process document variants on which your skill has not yet been trained. For example, you may have a Document skill with a Fast Learning activity that has been trained to extract fields from loan agreements (with different field structures) from several different banks. If you use this existing skill to process loan agreements from a new bank that the skill has not seen before, extraction quality may be poor. To improve it, you can use a Deep Learning activity instead of the Fast Learning activity.

How It Works

Deep Learning combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Natural Language Processing (NLP) tokens. This combination lets it understand image patterns, document structure, field contents, and surrounding labels. It requires a large number of documents for training, but in return it generalizes to document layouts it has not encountered before. This makes it a truly templateless approach to extraction and the only way to handle documents for which no exhaustive set of layouts is available at the training stage.

Training Requirements

For best results, it is essential to correctly label as many documents as possible. The number of sample documents used for training significantly affects the quality of field extraction. The recommended number of sample documents is as follows:
  • For high-variability documents: At least 200-300 sample documents (2-3 sample documents per variant) are required.
  • For low-variability documents: At least 100 sample documents are required.
The minimum requirement is 100 documents, but 1,000 to 10,000 labeled documents are recommended. Make sure that your training set contains approximately equal numbers of each document variant you intend to process (ideally, at least a few samples of each variant). You do not need to provide every possible variant, but the technology must see enough varied documents to derive patterns and generalize to variants it has not encountered. For example, for invoices, the technology is expected to generalize well to new suppliers when the training set covers 500 to 1,000 different suppliers, with two to three sample documents from each. Although Deep Learning tends to generalize, it is still beneficial to include the most common variants of the document in the training set, such as the suppliers that send the largest number of invoices.
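The guidelines above can be turned into a quick pre-training check. The following Python sketch is hypothetical: the `check_training_set` helper, the idea of tagging each labeled document with a variant label (for example, a supplier name), and the `max_imbalance` tolerance used to approximate "approximately equal numbers" are all assumptions for illustration, not part of the product. It applies the 100-document minimum and the lower end of the 2-3 samples-per-variant guideline.

```python
from collections import Counter

# Thresholds taken from the guidelines above; the helper itself is hypothetical.
MIN_DOCUMENTS = 100
RECOMMENDED_PER_VARIANT = 2  # lower end of "2-3 sample documents per variant"


def check_training_set(variant_labels: list[str], max_imbalance: float = 2.0) -> list[str]:
    """Return human-readable warnings for a list of per-document variant labels.

    variant_labels holds one entry per labeled document (e.g. the supplier of an
    invoice). max_imbalance is an assumed tolerance: the most common variant may
    have at most this many times as many samples as the least common one.
    """
    warnings = []
    counts = Counter(variant_labels)
    total = len(variant_labels)

    # Absolute minimum size of the training set.
    if total < MIN_DOCUMENTS:
        warnings.append(f"only {total} documents; at least {MIN_DOCUMENTS} are required")

    # Variants with too few samples for the network to learn from.
    sparse = sorted(v for v, n in counts.items() if n < RECOMMENDED_PER_VARIANT)
    if sparse:
        warnings.append(
            f"variants with fewer than {RECOMMENDED_PER_VARIANT} samples: {', '.join(sparse)}"
        )

    # Rough balance check between the largest and smallest variant.
    if counts:
        most, least = max(counts.values()), min(counts.values())
        if most > max_imbalance * least:
            warnings.append(f"unbalanced set: {most} vs {least} samples per variant")

    return warnings


# 60 suppliers with 2 invoices each, plus one dominant supplier with 30 invoices.
labels = [f"supplier-{i}" for i in range(60) for _ in range(2)] + ["big-supplier"] * 30
for warning in check_training_set(labels):
    print(warning)
```

In this example the set meets the 100-document minimum and every supplier has two samples, but the single dominant supplier makes the set unbalanced, so exactly one warning is printed. Including a popular variant is still worthwhile; the check only flags it when it crowds out the others.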

Training Characteristics

Unlike the Fast Learning activity, which is trained on fewer documents and intended for more uniform document sets, the Deep Learning activity takes much longer to train and requires more system resources (currently 16 CPU cores and 64 GB of RAM). Training the neural network is an iterative process, and each iteration is called an epoch. At the beginning of each epoch, the document set is divided into a training subset and a validation subset. During the epoch, all documents from the training subset are passed through the training algorithm. The neural network's performance is then evaluated on the validation subset, and the metrics for each field and for the entire document set are updated. For more information, see Setting up a Deep Learning activity.
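The epoch cycle described above can be illustrated with a generic training loop. This Python sketch is not the product's actual training algorithm: `run_epochs`, the 20% validation fraction, the document representation, and the toy `toy_train_step` and `toy_evaluate` functions (which merely memorize field values instead of training a neural network) are assumptions standing in for the real model and metrics.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical document representation: {"fields": {field name: labeled value}}.
Document = Dict[str, Dict[str, str]]


def run_epochs(
    documents: Sequence[Document],
    train_step: Callable[[List[Document]], None],
    evaluate: Callable[[List[Document]], Dict[str, float]],
    epochs: int = 3,
    validation_fraction: float = 0.2,
    seed: int = 0,
) -> List[Dict[str, float]]:
    """Run an epoch loop and return the validation metrics recorded after each epoch."""
    rng = random.Random(seed)
    history = []
    for _ in range(epochs):
        # Start of the epoch: divide the document set into training and validation subsets.
        shuffled = list(documents)
        rng.shuffle(shuffled)
        n_val = max(1, int(len(shuffled) * validation_fraction))
        validation, training = shuffled[:n_val], shuffled[n_val:]

        # Pass every document in the training subset through the training algorithm.
        train_step(training)

        # Evaluate on the validation subset and record the updated metrics.
        history.append(evaluate(validation))
    return history


# Toy stand-ins (assumptions): the "model" memorizes field values it has seen,
# and a field counts as correct if its validation value was seen during training.
FIELDS = ("InvoiceNumber", "Total")
seen: Dict[str, set] = {field: set() for field in FIELDS}
training_sizes: List[int] = []


def toy_train_step(batch: List[Document]) -> None:
    training_sizes.append(len(batch))
    for doc in batch:
        for field in FIELDS:
            seen[field].add(doc["fields"][field])


def toy_evaluate(batch: List[Document]) -> Dict[str, float]:
    # Per-field metric: share of validation documents whose value was recognized.
    metrics = {
        field: sum(doc["fields"][field] in seen[field] for doc in batch) / len(batch)
        for field in FIELDS
    }
    # Whole-set metric: mean of the per-field scores.
    metrics["all_fields"] = sum(metrics[f] for f in FIELDS) / len(FIELDS)
    return metrics


docs = [{"fields": {"InvoiceNumber": f"INV-{i}", "Total": str(i % 3)}} for i in range(10)]
for epoch, metrics in enumerate(run_epochs(docs, toy_train_step, toy_evaluate), start=1):
    print(epoch, metrics)
```

Because the split is redone at the start of every epoch, the validation subset changes from one epoch to the next; the fixed `seed` only makes the sketch reproducible.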