Reviewing Record Pairs

All saved record pairs are listed in the Pairs tab.

Figure 29: List of Existing Pairs

In this tab, you can perform the following functions:

Assign or change the Current Label

The label of a selected pair can be changed from the drop-down list provided in the Current Label column.

Mark a pair for future review

Select the checkbox in the Review column. Marked pairs stay marked after you save and reload the project, so you can review them later. To display only the pairs that need review, use the filter function.

Delete and Delete All

The Delete button deletes the selected pair. The Delete All button deletes all the pairs in the project.

Show/Hide Fields

You can choose which fields are displayed in the grid. Use the Show/Hide Fields button or the Hide Field item in the column context menu. Selecting the checkbox for a field makes it visible in the grid; clearing the checkbox hides it.

Feature Scores

Feature scores appear only after you train a model. The feature scores for a selected record pair are the complete input that is used to train or validate the model with that pair. One feature score is calculated for each model feature. A feature score indicates the degree of match between the field values that are included in the model feature. It is a real number from 0.0 to 1.0, or the special value -1.0. The feature score values have the following meanings:

-1.0

Indicates that all the field values included in the feature are empty or invalid in one or both records of the pair. However, if the Match Empty Values checkbox is selected for this feature and the field values are empty in both records, the score is 1.0.

Exception: for a Predicate feature, this value indicates that at least one field used in the predicate expression is empty or invalid.

0.0

Indicates that there was no common data.

1.0

Indicates an exact match for this model feature.
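The score meanings above can be summarized in a short sketch. The function name and the returned wording are hypothetical and are not part of the product. The "partial match" description for values strictly between 0.0 and 1.0 is an assumption based on the statement that a score indicates the degree of match.

```python
def describe_feature_score(score: float) -> str:
    """Return a short description of a feature score (hypothetical helper)."""
    if score == -1.0:
        # Field values are empty or invalid in one or both records
        # (or, for a Predicate feature, at least one predicate field is).
        return "empty or invalid"
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"unexpected feature score: {score}")
    if score == 0.0:
        return "no common data"
    if score == 1.0:
        return "exact match"
    # Assumption: intermediate values describe a partial degree of match.
    return "partial match"
```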

Filtering and sorting record pairs

Clicking the Filter button opens the Filter Pairs dialog, which you can use to filter the list of pairs displayed in the grid. Active filters are indicated by filter icons on the corresponding column titles in the Record Pairs and Feature Scores grids.

Figure 30: Filter Pairs dialog

Current Label

Shows pairs where the current label matches the selected item.

Training Label

Shows pairs where the label at the time the model was last trained matches the selected item.

Latest Prediction Result

Shows pairs where the prediction from the last training run either matches (Classified correctly) or does not match (Classified incorrectly) the Training Label.

Needs Review

Shows pairs based on the “Review” checkbox in the grid.

Dataset

Shows pairs in the indicated dataset. You can see which pairs are used to train the model and which pairs are used to validate the model. You can also filter the pairs that represent Always False subsets.

Prediction Score

Shows pairs based on the model score from the latest training run. The model score is the score output by the model for this pair. By default, a score of 0.5 or above is considered a match, and a score below 0.5 is considered a non-match.
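As an illustration of the default threshold and of how it relates to the Latest Prediction Result filter, consider the following minimal sketch. The function names are hypothetical, and representing the Training Label as a boolean (True for a match) is an assumption made for this example.

```python
MATCH_THRESHOLD = 0.5  # default threshold stated in the text


def is_match(prediction_score: float, threshold: float = MATCH_THRESHOLD) -> bool:
    """A score at or above the threshold is a match; below it is a non-match."""
    return prediction_score >= threshold


def classified_correctly(prediction_score: float, training_label: bool) -> bool:
    """True when the prediction agrees with the Training Label (assumed boolean)."""
    return is_match(prediction_score) == training_label
```

With these definitions, a pair with a score of exactly 0.5 counts as a match, so it is classified correctly only if its Training Label is True.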

Filter based on feature scores

This filter restricts the list of pairs to a specific score value or a range of values for each feature. The “Empty” and “Non-empty” filter types can be used for each feature to filter the list of pairs for a specific subset (see Subsets for more information). First, select the filter type. If you select the Non-empty filter type, you can edit the Min Score and Max Score values to narrow the range of displayed feature scores.

In addition to using the Filter dialog, you can see and change filters for the Review, Dataset, Current Label, Training Label, and Prediction columns by right-clicking the column title and selecting a filter for that column. You can also sort the list of record pairs by any fixed column using the same column context menu.

Additional filters are available by right-clicking any record pair in the list:

Show pairs for same subset

This shows a filtered list of pairs that belong to the same subset of present feature scores. This filter is useful for analyzing all pairs in a single subset to find similar pairs and any inconsistently labeled pairs. A check mark is displayed next to this item in the context menu when the filter is active. Select the same menu item again to clear the filter.
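One way to picture this filter is sketched below. It assumes that a subset is identified by the set of features whose score is present, that is, not equal to -1.0. This interpretation, the dictionary representation of a pair's feature scores, and the function names are assumptions made for illustration only.

```python
EMPTY = -1.0  # score used for empty or invalid field values


def subset_key(scores: dict[str, float]) -> frozenset[str]:
    """Names of the features that have a present (non-empty) score."""
    return frozenset(name for name, score in scores.items() if score != EMPTY)


def pairs_in_same_subset(
    selected: dict[str, float], pairs: list[dict[str, float]]
) -> list[dict[str, float]]:
    """Keep only the pairs whose present features match those of the selected pair."""
    key = subset_key(selected)
    return [pair for pair in pairs if subset_key(pair) == key]
```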

Show confirming pairs

This shows a filtered list of pairs with feature scores that confirm the label of the selected pair. Such pairs provide evidence to the model that helps correctly classify the selected pair. The selected record pair is always included in the list of confirming pairs. If a pair from the validation dataset has no confirming pairs other than itself, this might be the reason why the model prediction is incorrect. Adding similar training pairs that confirm the selected pair can teach the model to classify the selected pair correctly.

Show contradicting pairs

This shows a filtered list of pairs with feature scores that contradict the label of the selected pair. For example, if the selected pair is labeled False, then another pair in which all feature scores are lower than those of the selected pair should not be labeled True, because the second pair is “even more false”. If a pair has contradicting pairs, this is the likely reason why the model prediction is incorrect. Review all contradicting pairs and change either their labels or the label of the selected pair so that all labels are consistent.
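The example above can be expressed as a simple comparison of feature scores. The following sketch is an illustration under stated assumptions and is not the product's actual algorithm: the Pair class and the contradicts function are hypothetical, the mirrored case for a pair labeled True is inferred by symmetry, equal scores are treated as contradicting, and special handling of -1.0 scores is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    scores: tuple[float, ...]  # one score per model feature, in a fixed order
    label: bool                # True for a match, False for a non-match


def contradicts(selected: Pair, other: Pair) -> bool:
    """Return True if the label of `other` is inconsistent with `selected`."""
    if selected.label == other.label:
        return False  # pairs with the same label cannot contradict each other
    compared = list(zip(other.scores, selected.scores))
    if not selected.label:
        # Selected pair is False, other is True: a contradiction when the
        # other pair scores no higher on every feature ("even more false").
        return all(o <= s for o, s in compared)
    # Assumed mirror case: selected pair is True, other is False, and the
    # other pair scores no lower on every feature.
    return all(o >= s for o, s in compared)
```

A pair that scores higher on some features and lower on others is not treated as contradicting in this sketch, because neither pair is uniformly stronger than the other.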

Note: The applied filter for the same subset, confirming or contradicting pairs is shown in the Filter dialog as a collection of filters that are based on feature scores and the label. You can review and further modify these filters in the Filter dialog.

Back

Use the Back button to return to previously used filters, such as the list of errors, confirming pairs, and contradicting pairs filters.

Note: The filter history is not saved in the Learn project, so the history of previous filters is not available when you open the same project again.
Note: The state with no filters, such as the state after clicking the Reset button, is also treated as one of the filters in the filter history.

Reset

The Reset button removes all current filters and sort orders to display the list of all saved pairs. It also removes any automatic filters that might have been applied while processing model training suggestions or by clicking on links in the Training or Trained Model tabs.