Class DatasetStats


  • public final class DatasetStats
    extends java.lang.Object
    Stores statistics about existing pairs in a dataset, grouped by subset: the number of pairs with each label and the total number of labeled pairs. Used to determine whether to add another pair to the dataset.
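The class description above amounts to a per-subset table of label counts plus a running total of labeled pairs. The following is a minimal sketch of that bookkeeping. It rests on several assumptions. Subsets are represented as plain string keys and labels as booleans. `SubsetLabelCounts`, `add`, `count`, and `shouldAddPair` are hypothetical names, not part of the `DatasetStats` API. The `maxPerLabel` cap is an invented decision rule used only to show how such statistics could drive the "add another pair?" choice.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for per-subset statistics; not the real DatasetStats.
final class SubsetLabelCounts {
    // subset key -> {number of pairs labeled false, number of pairs labeled true}
    private final Map<String, int[]> countsBySubset = new HashMap<>();
    // total number of labeled pairs across all subsets
    private int totalLabeled = 0;

    void add(String subset, boolean label) {
        countsBySubset.computeIfAbsent(subset, k -> new int[2])[label ? 1 : 0]++;
        totalLabeled++;
    }

    int count(String subset, boolean label) {
        int[] counts = countsBySubset.get(subset);
        return counts == null ? 0 : counts[label ? 1 : 0];
    }

    int totalLabeled() {
        return totalLabeled;
    }

    // Assumed decision rule, for illustration only: accept a new pair while its
    // subset holds fewer than maxPerLabel pairs with the same label.
    boolean shouldAddPair(String subset, boolean label, int maxPerLabel) {
        return count(subset, label) < maxPerLabel;
    }
}
```

For example, after two `true` pairs are added to a hypothetical subset `"name+dob"`, `shouldAddPair("name+dob", true, 2)` returns `false`, while `shouldAddPair("name+dob", false, 2)` still returns `true`.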
    • Constructor Detail

      • DatasetStats

        public DatasetStats(FeatureQuery featureQuery,
                            RecPairMap recPairMap,
                            DataPartition partition)
        Calculates and stores all dataset statistics for the given dataset.
        Parameters:
        featureQuery - stores all features.
        recPairMap - stores all pairs in the dataset.
        partition - the data partition to calculate statistics for.
    • Method Detail

      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • getPartition

        public DataPartition getPartition()
        Returns:
        the data partition that this object was created for.
      • getFamilyAllSubsets

        public SubsetFamily getFamilyAllSubsets()
        Returns:
        the internal subset family that stores all subsets in the dataset.
      • calcDatasetSubsets

        public static SubsetFamily calcDatasetSubsets(FeatureQuery fq,
                                                      RecPairMap recPairMap,
                                                      DataPartition partition,
                                                      boolean boolLabelsOnly)
        Finds all subsets present in the specified dataset. Only pairs that have feature values are counted as belonging to a subset.
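        Parameters:
        fq - stores all features.
        recPairMap - stores all pairs in the dataset.
        partition - the data partition to search for subsets.
        boolLabelsOnly - if true, only pairs with boolean labels are counted.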
        Returns:
        a new subset family that contains these subsets.
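A rough sketch of subset discovery as `calcDatasetSubsets` describes it follows. It assumes that a pair's "subset" is the set of feature names for which it has values, and it skips pairs with no feature values. When `boolLabelsOnly` is true, it also skips unlabeled pairs. This interpretation of "subset" is an assumption, not confirmed by the documentation. `ScoredPair`, `SubsetFinder`, and `findSubsets` are hypothetical. A `Set<Set<String>>` stands in for `SubsetFamily`. Partition filtering and `FeatureQuery` are omitted for brevity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical pair: feature name -> score (features without a value are absent),
// plus an optional boolean label (null when the pair is unlabeled).
record ScoredPair(Map<String, Double> featureScores, Boolean label) {}

// Hypothetical sketch of subset discovery; not the real calcDatasetSubsets.
final class SubsetFinder {
    // Assumption: a pair's subset is the set of features it has values for.
    static Set<Set<String>> findSubsets(List<ScoredPair> pairs, boolean boolLabelsOnly) {
        Set<Set<String>> subsets = new HashSet<>();
        for (ScoredPair pair : pairs) {
            Map<String, Double> scores = pair.featureScores();
            if (scores == null || scores.isEmpty()) {
                continue; // no feature values: the pair belongs to no subset
            }
            if (boolLabelsOnly && pair.label() == null) {
                continue; // skip unlabeled pairs when only labeled ones count
            }
            // Set.copyOf gives an immutable set, safe to use as a hash key.
            subsets.add(Set.copyOf(scores.keySet()));
        }
        return subsets;
    }
}
```

Because set equality ignores order, two pairs scored on `name` and `dob` in different orders map to the same subset.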
      • update

        public void update(RecPair newRecPair)
        Updates the statistics with the pair that has just been added to this dataset. Must be called after each pair is added to keep the statistics current.
        Parameters:
        newRecPair - the pair that has just been added to this dataset.
        Throws:
        java.lang.IllegalArgumentException - if newRecPair does not contain a boolean label or feature scores.
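The contract of `update` can be illustrated with an incremental counter. It rejects a pair lacking a boolean label or feature scores with `IllegalArgumentException`; otherwise it increments the count for the pair's subset and label. The following is a minimal sketch under stated assumptions. `LabeledPair`, `IncrementalStats`, and `count` are hypothetical, and a subset is assumed to be the set of scored feature names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical pair: feature name -> score, plus an optional boolean label (null if unlabeled).
record LabeledPair(Map<String, Double> featureScores, Boolean label) {}

// Hypothetical sketch of incremental updates; not the real DatasetStats.update.
final class IncrementalStats {
    // subset (set of scored feature names) -> {count labeled false, count labeled true}
    private final Map<Set<String>, int[]> counts = new HashMap<>();
    private int totalLabeled = 0;

    void update(LabeledPair newPair) {
        // Validate first, so a rejected pair leaves the statistics unchanged.
        if (newPair.label() == null) {
            throw new IllegalArgumentException("pair has no boolean label");
        }
        Map<String, Double> scores = newPair.featureScores();
        if (scores == null || scores.isEmpty()) {
            throw new IllegalArgumentException("pair has no feature scores");
        }
        Set<String> subset = Set.copyOf(scores.keySet());
        counts.computeIfAbsent(subset, k -> new int[2])[newPair.label() ? 1 : 0]++;
        totalLabeled++;
    }

    int count(Set<String> subset, boolean label) {
        int[] c = counts.get(subset);
        return c == null ? 0 : c[label ? 1 : 0];
    }

    int totalLabeled() {
        return totalLabeled;
    }
}
```

Checking both preconditions before touching any counter keeps the statistics consistent even when a caller catches the exception and continues.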