Adding Pairs to Subsets that Have Too Few True/False Labels

This suggestion lists subsets in the training dataset that have a large imbalance between pairs labeled "True" and pairs labeled "False". To train a model properly, each subset should contain at least a rough balance of "True" and "False" labeled pairs. You can either add more pairs with the underrepresented label or, if necessary, delete some pairs with the overrepresented label.
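The imbalance check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's own logic: it assumes labeled pairs are available as (subset_id, label) tuples, and the function name find_imbalanced_subsets and the 10% threshold (min_minority_fraction) are hypothetical choices for the example.

```python
from collections import Counter, defaultdict

def find_imbalanced_subsets(labeled_pairs, min_minority_fraction=0.1):
    """Hypothetical helper: flag subsets whose less common label is too rare.

    labeled_pairs: iterable of (subset_id, label) tuples, label is True or False.
    min_minority_fraction: assumed threshold, not a value taken from the tool.
    """
    # Count True and False labels separately for each subset.
    counts = defaultdict(Counter)
    for subset_id, label in labeled_pairs:
        counts[subset_id][bool(label)] += 1

    flagged = {}
    for subset_id, c in counts.items():
        n_true, n_false = c[True], c[False]
        minority_fraction = min(n_true, n_false) / (n_true + n_false)
        if minority_fraction < min_minority_fraction:
            # Report which label should get more pairs.
            add_more = "True" if n_true < n_false else "False"
            flagged[subset_id] = {"true": n_true, "false": n_false, "add_more": add_more}
    return flagged

pairs = [("A", True), ("A", False), ("B", False), ("B", False), ("B", False)]
print(find_imbalanced_subsets(pairs))
# {'B': {'true': 0, 'false': 3, 'add_more': 'True'}}
```

Subset A is balanced and is not reported; subset B has no "True" pairs at all. A subset like B is also a candidate for the "Always False" option if no pair from it can ever be a match.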

If you are sure that no pair from a certain subset can ever be labeled "True", we recommend marking a pair that belongs to that subset as "Always False" on the Pair Selection tab. For more information, see the section Always False Subsets.

If you use the Low Confidence Pair Finder to find most of your pairs automatically, note that most of the pairs it finds tend to end up labeled "False", so an imbalance toward "False" is expected. In that case, you can largely ignore this suggestion.