- Use the validation algorithm provided by ABBYY FineReader Engine. It uses the k-fold cross-validation strategy:
  On each iteration, the categorized data provided in the TrainingData object is randomly split into FoldsCount equal parts. Each part, in turn, is used for validation: a model is trained on all the other parts and then checked on the remaining one.
  The process is repeated RepeatCount times. Of the resulting FoldsCount * RepeatCount models, the one that shows the best F-measure score is returned by the ITrainingResult::Model property, and its scores can be obtained via the ITrainingResult::ValidationResult property.
  The number of objects in the training set at each training step equals <total number of objects> * (FoldsCount - 1) / FoldsCount. Note that this number should be at least 4 for the text classifier and at least 8 for the combined classifier, so make sure that your training sample contains enough objects.
- Turn off the validation by setting ShouldPerformValidation to FALSE, train the model on the whole training data set, then test the model's performance on your side, using the IModel::Classify method on another known data sample.
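The split-and-select procedure above can be sketched as follows. This is an illustrative Python outline only, not the engine's actual implementation; the function and parameter names mirror FoldsCount and RepeatCount but are otherwise hypothetical.

```python
import random

def training_set_size(total, folds_count):
    # Training-set size at each step: total * (folds_count - 1) / folds_count
    return total * (folds_count - 1) // folds_count

def repeated_kfold_split(items, folds_count, repeat_count, seed=0):
    """Yield (train, validation) splits for repeated k-fold cross-validation.

    Produces folds_count * repeat_count splits, one per candidate model,
    mirroring the FoldsCount * RepeatCount models described above.
    """
    rng = random.Random(seed)
    for _ in range(repeat_count):
        shuffled = items[:]
        rng.shuffle(shuffled)                    # random split on each iteration
        folds = [shuffled[i::folds_count] for i in range(folds_count)]
        for i in range(folds_count):
            validation = folds[i]                # each part is used for validation once
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            yield train, validation

# With 12 objects, FoldsCount = 3, RepeatCount = 2:
splits = list(repeated_kfold_split(list(range(12)), folds_count=3, repeat_count=2))
print(len(splits))        # 3 * 2 = 6 candidate splits
print(len(splits[0][0]))  # 12 * (3 - 1) / 3 = 8 training objects per step
```

In the engine itself, the model with the best F-measure over these splits is then exposed via ITrainingResult::Model.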
Properties
| Name | Type | Description |
|---|---|---|
| AveragingMethod | AveragingMethodEnum | The method of calculating the average accuracy, precision, recall, and F-measure scores for classifiers with more than 2 categories. This property is AM\_Macro by default. |
| FoldsCount | int | The number of folds used in the k-fold cross-validation algorithm. The default value of this property is 3. |
| RepeatCount | int | The number of iterations used in the algorithm. The default value of this property is 1. |
| ShouldPerformValidation | VARIANT\_BOOL | Specifies whether the trained model should be validated. This property is FALSE by default. |
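As an illustration of the macro averaging used by the AM\_Macro default, here is a minimal Python sketch. It assumes the standard textbook definitions of F-measure and macro averaging (per-category scores averaged with equal weight); the engine's exact formulas are not spelled out in this section.

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_average(per_category_scores):
    # Macro averaging: compute the metric for each category separately,
    # then take the unweighted mean over categories.
    return sum(per_category_scores) / len(per_category_scores)

# Hypothetical per-category scores for a 3-category classifier:
precisions = [0.9, 0.6, 0.75]
recalls = [0.8, 0.5, 0.75]
f_scores = [f_measure(p, r) for p, r in zip(precisions, recalls)]
print(round(macro_average(f_scores), 3))
```

Macro averaging weights every category equally regardless of how many objects it contains, which is why it is the meaningful choice for classifiers with more than 2 categories.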
Related objects
Object Diagram
