Adding Pairs to Underrepresented Subsets
Underrepresented subsets are identified by counting the record pairs that each subset has in the training dataset. Subsets with very few record pairs also tend to have low prediction confidence values, although confidence depends on other factors as well.
A subset is underrepresented when it has significantly fewer pairs than the other subsets in the training dataset. During training, subsets with more pairs can dominate, so predictions for an underrepresented subset end up shaped largely by patterns learned from the larger subsets. Adding more training record pairs to the underrepresented subsets can therefore improve model predictions for them.
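As a rough illustration of the counting step, the sketch below tallies training pairs per subset and flags the subsets whose count falls well below the typical count. The function name `find_underrepresented`, the `ratio` cutoff of 0.25 times the median, and the sample data are all assumptions made for this example. The source does not state how "significantly fewer" is measured.

```python
from collections import Counter
from statistics import median


def find_underrepresented(pair_subsets, ratio=0.25):
    """Return {subset: pair_count} for subsets with few training pairs.

    pair_subsets: iterable with one subset label per training record pair.
    ratio: assumed cutoff; a subset is flagged when its pair count is
           below ratio * (median pair count across all subsets).
    """
    counts = Counter(pair_subsets)
    if not counts:
        return {}
    cutoff = ratio * median(counts.values())
    return {subset: n for subset, n in counts.items() if n < cutoff}


# Hypothetical training data: one subset label per labeled record pair.
training_pair_subsets = ["A"] * 400 + ["B"] * 350 + ["C"] * 12 + ["D"] * 300

flagged = find_underrepresented(training_pair_subsets)
print(flagged)  # subsets that are candidates for additional training pairs
```

With this sample data, only subset `C` is flagged, so it would be the first candidate for additional labeled pairs. A fixed minimum count could be used in place of the median-based cutoff; which rule is appropriate depends on how the subsets are defined.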