Online learning happens at runtime. As more documents are processed, some will be collected by the program and put in the training set and the test set of the skill. The skill will then improve on these sets in real time.
Online learning is available for Document skills and Classification skills.
Note: Online learning is disabled by default. For information on how to enable this feature, see Enabling Online Learning.
How Online Learning works
Note: This section assumes that your Process skill includes a manual review stage and that the Online learning feature has been enabled.
The Online learning process can be outlined as follows:
- New documents are collected and put either into the training set or into the test set.
- A learning session is started using the training set.
- The skill is tested.
Step 1. How documents are collected
Documents will be collected as follows:
- Online learning will start collecting documents as soon as it receives the first corrected document from a Manual Review Operator.
  - For a Document skill, this will be the first document where the region of at least one field has been corrected.
  - For a Classification skill, this will be the first document whose type has been changed.
- After the first document is obtained, the following documents will be collected:
  - All documents that have passed through manual review.
  - Some documents that haven’t passed through manual review (their share won’t exceed 33% of all documents in the training set and the test set combined).
- As new documents are collected, they will be put either in the training set or in the test set.
- The maximum number of documents in the training set is 10,000. The maximum number of documents in the test set is 1,000.
Note: These limits may be exceeded if a set already contains more documents than its limit (for example, more than 10,000 documents in the training set) at the time when Online learning starts collecting new documents. If this is the case, each new document added to an overfilled set will replace the oldest existing document in that set.
- Documents will be put only into the training set until the number of documents in the training set reaches 30. Once this number is reached, documents will be put either into the training set or into the test set.
- Until both sets are full, each new document has an 80% chance of ending up in the training set and a 20% chance of ending up in the test set.
- When one of the sets is full, new documents will be put into the other set until it also becomes full.
- When both sets are full, new documents can still be put into either set, replacing the oldest existing documents.
  - When both sets are full, each new document has a 20% chance of ending up in one of the sets and an 80% chance of being discarded.
  - When both sets are full, each new document that hasn’t been discarded has an 80% chance of ending up in the training set and a 20% chance of ending up in the test set, replacing the oldest existing document in either set.
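The placement rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this section, not the product's actual implementation: the function name `route_document`, the constant names, and the use of `deque` objects to hold the sets are all assumptions made for the example. The 33% cap on documents that haven't passed manual review is not modeled here.

```python
import random
from collections import deque

TRAIN_CAP = 10_000       # maximum training set size (from the text)
TEST_CAP = 1_000         # maximum test set size (from the text)
WARMUP = 30              # training set only until it holds this many documents
P_TRAIN = 0.8            # chance of the training set when a choice is made
P_KEEP_WHEN_FULL = 0.2   # chance a document is kept once both sets are full


def route_document(doc, train, test, rng=random):
    """Place doc into train or test (hypothetical helper).

    train and test are deques ordered oldest-first.
    Returns "train", "test", or "discarded".
    """
    train_full = len(train) >= TRAIN_CAP
    test_full = len(test) >= TEST_CAP

    if len(train) < WARMUP:
        target = train                      # first 30 documents go to training
    elif train_full and test_full:
        if rng.random() >= P_KEEP_WHEN_FULL:
            return "discarded"              # 80% of documents are dropped
        target = train if rng.random() < P_TRAIN else test
    elif train_full:
        target = test                       # fill the other set
    elif test_full:
        target = train
    else:
        target = train if rng.random() < P_TRAIN else test

    cap = TRAIN_CAP if target is train else TEST_CAP
    if len(target) >= cap:
        target.popleft()                    # replace the oldest document
    target.append(doc)
    return "train" if target is train else "test"
```

Because a full or overfilled set drops exactly one old document for each new one, its size stays constant, which matches the behavior described in the note about exceeded limits.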
Step 2. When a learning session is started
- If this is the first learning session after the skill version was published, it will start once the number of newly added documents reaches 10% of the document set. For example, if there are a total of 95 documents in the document set, a new learning session will start after 10 new documents are added (10% of 95 is 9.5, which is rounded up).
- If the last learning session was successful and the skill was updated, a new session will start under the same conditions as for the first session.
- If the last learning session was unsuccessful and the skill wasn’t updated, a new learning session will start once the number of newly added documents reaches 5% of the document set. For example, if there are a total of 95 documents in the document set, a new learning session will start after 5 new documents are added (5% of 95 is 4.75, which is rounded up).
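The thresholds above reduce to a small calculation. The sketch below is a hypothetical helper, not part of the product; the name `new_docs_needed` and its parameters are assumptions. Rounding up is inferred from the two examples in this section (95 documents give 10 and 5), and integer arithmetic is used to avoid floating-point rounding surprises.

```python
def new_docs_needed(total_docs, last_session_updated_skill=True):
    """Number of new documents that triggers the next learning session.

    Pass last_session_updated_skill=True for the first session after
    publishing or after a successful session (10% threshold), and False
    after an unsuccessful session (5% threshold).
    """
    percent = 10 if last_session_updated_skill else 5
    # Integer ceiling of total_docs * percent / 100 (rounding up is an
    # assumption based on the examples in the text).
    return (total_docs * percent + 99) // 100


print(new_docs_needed(95))         # 10
print(new_docs_needed(95, False))  # 5
```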
Step 3. How the skill is tested
The skill will be updated when Online learning leads to at least a 1% increase in accuracy.
The accuracy of the skill will be tested as follows:
- If there are at least 20 documents in the test set, the skill will be tested on the test set.
- If there are fewer than 20 documents in the test set:
  - A Document skill will be tested on both the training set and the test set.
  - A Classification skill will be tested on both the training set and the test set if there aren’t enough documents in the document set (that is, if each class has fewer than five documents). If there are enough documents, cross-validation will be used to evaluate the accuracy.
After that, more documents are collected and then a new learning session is started.
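The testing rules in Step 3 can be summarized as a decision function. This is a minimal sketch under stated assumptions: the names `choose_evaluation` and `should_update`, the string labels, and the `class_counts` dictionary are hypothetical. It reads "each class has fewer than five documents" literally (every class is below five), and it treats the 1% threshold as one percentage point of accuracy, which this section does not state explicitly.

```python
def choose_evaluation(skill_kind, n_test, class_counts=None):
    """Pick how accuracy is measured (hypothetical helper).

    skill_kind: "document" or "classification".
    n_test: number of documents in the test set.
    class_counts: documents per class, used for Classification skills only.
    """
    if n_test >= 20:
        return "test_set"
    if skill_kind == "document":
        return "training_and_test_sets"
    if skill_kind == "classification":
        counts = class_counts or {}
        # Assumption: "not enough documents" means every class has fewer than five.
        if all(n < 5 for n in counts.values()):
            return "training_and_test_sets"
        return "cross_validation"
    raise ValueError(f"unknown skill kind: {skill_kind!r}")


def should_update(old_accuracy_pct, new_accuracy_pct, min_gain_pct=1.0):
    """Assumption: the skill is updated on a gain of at least one percentage point."""
    return new_accuracy_pct - old_accuracy_pct >= min_gain_pct
```

For example, `choose_evaluation("classification", 8, {"invoice": 12, "receipt": 3})` selects cross-validation under this reading, because at least one class has five or more documents.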
Note: Online learning doesn’t create a new version of the skill. A change of version only occurs when a skill is published. See Publishing a skill.