Training a Learn Model

Training a Learn model teaches the model to predict the labels of novel record pairs based on the labeled example pairs provided in the Training dataset.

If the predicted label and the assigned label do not match, the trained model generates a classification error. Classification errors occur for a number of reasons: the pair might be mislabeled, similar pairs might be labeled inconsistently due to human error, or the subset or situation that the pair belongs to might be underrepresented, or not represented at all, in the Training dataset. Examine the pairs with errors and change labels where needed to make all labels consistent. On the Pairs tab, you can use the filter to display pairs that contradict any misclassified pair; see the section Filtering and sorting record pairs. You can also add more similar pairs so that the model learns this situation better.
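Contradictory labels can be thought of as pairs with the same (or very similar) feature scores that carry opposite labels. The following sketch is not part of the product; it only illustrates the idea, assuming a simplified representation of each pair as a tuple of feature scores plus a label:

```python
from collections import defaultdict

def find_contradictions(pairs):
    """Group labeled pairs by their feature scores and flag groups whose
    labels disagree. Each pair is (feature_scores_tuple, label)."""
    groups = defaultdict(set)
    for features, label in pairs:
        groups[features].add(label)
    # Any group carrying both "True" and "False" labels is contradictory.
    return [features for features, labels in groups.items() if len(labels) > 1]

# Example: two pairs with identical feature scores but opposite labels.
pairs = [
    ((0.9, 0.8), "True"),
    ((0.9, 0.8), "False"),
    ((0.2, 0.1), "False"),
]
print(find_contradictions(pairs))  # [(0.9, 0.8)]
```

Reviewing and relabeling the flagged pairs corresponds to the manual cleanup the filter helps you perform in the Pairs tab.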

The model is typically trained in two passes: the first pass is indicated by thin graph lines and the second pass by broad graph lines. The goal of training is to minimize the validation error rate, so the second pass stops at the iteration where the validation error rate is lowest. If several iterations share the lowest validation error rate, the second pass stops at the one among them with the lowest training error rate. The second pass is skipped if the last iteration is already the best iteration.
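The best-iteration rule above (lowest validation error rate, ties broken by lowest training error rate) can be sketched as follows. This is an illustration of the selection logic only, assuming a per-iteration history of the two error rates:

```python
def best_iteration(history):
    """Return the index of the best iteration.

    history: list of (validation_error, training_error) tuples, one per
    iteration. The minimum is taken on the tuple, so the validation error
    rate is compared first and the training error rate breaks ties.
    """
    return min(range(len(history)), key=lambda i: history[i])

# Iterations 1 and 2 tie on validation error (0.04);
# iteration 2 wins on the lower training error (0.02 < 0.03).
history = [(0.10, 0.05), (0.04, 0.03), (0.04, 0.02), (0.06, 0.01)]
print(best_iteration(history))  # 2
```

Because Python compares tuples element by element, the single `min` call expresses both the primary criterion and the tie-break.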

Run Training

Click the Run Training button to start the model training process, which usually takes many iterations. Model statistics are displayed after each training iteration.

 

Figure 33: Run Training

End Training

Click the End Training button to terminate model training. If training iterations have started, the training stops after completing the current training iteration.

Note: The correct prediction rate of such a model is most likely not optimal, so stopping the training prematurely is not recommended. However, you can still save the model if you are satisfied with its training results.

 

Save Model and Generate Suggestions

Click the Save Model button to save the current model and the model scores for each record pair. Suggestions to improve the model are generated and displayed. Click a suggestion link to perform the suggested action. When you reopen the project, you can click the Generate Suggestions button to display the list of suggestions without retraining the model.

Model Training Results

After the model training ends, one of the following training results is displayed in the Note field:

Best iteration was found

The iteration with the lowest validation error rate was found. This is the best result.

All training pairs were predicted correctly

The model was able to correctly predict the labels of all pairs in the Training dataset, and the validation error rate at the last iteration is the lowest. The second training pass is not needed. Review any incorrectly classified pairs in the Validation dataset, and then add more pairs similar to them.

User ended the training

The model training was terminated prematurely. Avoid using models that are not optimal.

Note: The results of the training might be different from the results of training the model in the same Learn project using a previous version of TIBCO Patterns.

 

Results for Validation Dataset

The following Validation dataset results are displayed during and after the training. Most results display a percentage followed by the number of pairs in brackets:

Error Rate: gives the proportion of incorrectly classified record pairs.

False Positive Rate: gives the proportion of record pairs that are labeled 'False', but are classified as 'True'.

False Negative Rate: gives the proportion of record pairs that are labeled 'True', but are classified as 'False'.

No Confidence Predictions: gives the proportion of record pairs that cannot be reliably predicted by this trained model because they have zero confidence and belong to an untrained or completely contradictory area of the model.

Average Confidence: displays the mean confidence of all model predictions for record pairs. The model reports each predicted score with a certain confidence value between 0.0 (for an untrained or completely contradictory prediction) and 1.0 (for a very confident prediction). The prediction confidence for a pair from a specific subset is higher when more pairs with similar feature scores and non-contradicting labels for that subset are used in training.
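The Validation dataset statistics above can be illustrated with a small sketch. The exact definitions used by the product are not documented here, so this is an assumption-laden illustration: each result is modeled as (assigned_label, predicted_label, confidence), and a no-confidence prediction is modeled as a confidence of 0.0:

```python
def validation_metrics(results):
    """Compute illustrative validation statistics.

    results: list of (assigned_label, predicted_label, confidence) tuples,
    where labels are "True"/"False" and confidence is in [0.0, 1.0].
    A confidence of 0.0 marks a no-confidence prediction.
    """
    n = len(results)
    errors = sum(1 for a, p, _ in results if p != a)
    fp = sum(1 for a, p, _ in results if a == "False" and p == "True")
    fn = sum(1 for a, p, _ in results if a == "True" and p == "False")
    no_conf = sum(1 for _, _, c in results if c == 0.0)
    return {
        "error_rate": errors / n,
        "false_positive_rate": fp / n,
        "false_negative_rate": fn / n,
        "no_confidence": no_conf / n,
        "average_confidence": sum(c for _, _, c in results) / n,
    }

results = [
    ("True", "True", 0.9),    # correct, confident
    ("False", "True", 0.7),   # false positive
    ("True", "False", 0.6),   # false negative
    ("False", "False", 0.0),  # correct, but no confidence
]
m = validation_metrics(results)
```

Here the error rate is 0.5 (two of four pairs misclassified), the false positive and false negative rates are 0.25 each, one pair in four is a no-confidence prediction, and the average confidence is the mean of the four confidence values.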

Click any link in the Validation dataset section to display the appropriately filtered list of pairs. For more information, see Reviewing Record Pairs.

Figure 34: Errors in Validation Dataset After Clicking Show Errors Link

Results for Training Dataset

The following result for the Training dataset is displayed during and after the training:

Error Rate: gives the proportion of incorrectly classified record pairs.

Clicking the Review Labels link in this section displays the incorrectly predicted pairs in the Training dataset. For more information, see Reviewing Record Pairs. If any errors are made in the Training dataset, always review the labels of these pairs to make sure they are correct and consistent.