Class SubsetLabelPairMap


  • public final class SubsetLabelPairMap
    extends java.lang.Object
    Stores a family (a set) of RLink subsets. Each subset is associated with a map that holds a list of RecPair objects for each boolean label.
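The nesting described above is subset, then label, then a list of pairs, and it can be pictured with plain JDK collections. The following is a minimal sketch under stated assumptions, not the library's implementation. The class SubsetLabelPairMapSketch and its methods are hypothetical. The type parameter P stands in for RecPair. A true element in a subset is assumed to mean that the corresponding feature is non-empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the nesting described for SubsetLabelPairMap:
 * subset -> (label -> list of pairs). P stands in for RecPair.
 */
final class SubsetLabelPairMapSketch<P> {
    // Key: one Boolean per feature (true = non-empty, assumed); value: pairs grouped by label.
    private final Map<List<Boolean>, Map<Boolean, List<P>>> family = new HashMap<>();

    /** Adds a pair under the given subset and boolean label, creating buckets on demand. */
    void add(List<Boolean> subset, boolean label, P pair) {
        family.computeIfAbsent(List.copyOf(subset), s -> new HashMap<>())
              .computeIfAbsent(label, l -> new ArrayList<>())
              .add(pair);
    }

    /** Number of pairs stored for a subset and label (0 if none). */
    int count(List<Boolean> subset, boolean label) {
        Map<Boolean, List<P>> byLabel = family.get(subset);
        if (byLabel == null) {
            return 0;
        }
        return byLabel.getOrDefault(label, Collections.emptyList()).size();
    }

    /** Unmodifiable view of all subsets, mirroring what getSubsets() is documented to return. */
    Set<List<Boolean>> subsets() {
        return Collections.unmodifiableSet(family.keySet());
    }
}
```

A List<Boolean> works as a map key because List defines equals and hashCode element by element. Copying the key with List.copyOf stops a caller's later changes from corrupting it, and the copy also rejects nulls, which subsets should not contain.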
    • Method Summary

      Modifier and Type Method Description
      void addFeatureValues(double[] featureValues)
      Converts feature values to an RLink subset and adds it to the family.
      void addUsefulRecPairs(DatasetStats trainStats, ModelSettings msDest, DataPartition sourcePartition, int maxTrainSize, int maxReserveSize, java.lang.String fileName, java.util.Date fileLastModified, AddPairsResult ret)
      Adds pairs for each subset in this object, using statistics for each label in the same subset of the Training dataset.
      java.util.Set<java.util.List<java.lang.Boolean>> getSubsets()  
      boolean isSubsetOfElement(double[] featureValues)
      Checks whether the subset derived from the given feature values is a subset of one of the subsets in this family.
      static java.lang.String subsetToBinary(java.util.List<java.lang.Boolean> subset)  
      boolean suitableRecord(java.util.List<java.lang.Boolean> subset, java.util.List<java.lang.String> fieldValues)
      Filters a record in a data table.
      int[] toInts()
      Converts all subsets in the family to integers.
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • SubsetLabelPairMap

        public SubsetLabelPairMap(FeatureQuery fq)
        Parameters:
        fq - the feature query that generates the full set of feature values
      • SubsetLabelPairMap

        public SubsetLabelPairMap(FeatureQuery fq,
                                  java.util.List<RecPair> recPairs)
        Creates a new SubsetLabelPairMap and adds all given pairs to it.
        Parameters:
        fq - the feature query that generates the full set of feature values
        recPairs - the record pairs to add to the new map
    • Method Detail

      • toString

        public java.lang.String toString()
        Returns:
        a string with all RLink subsets in this subset family and a count of record pairs with each label in each subset.
      • addUsefulRecPairs

        public void addUsefulRecPairs(DatasetStats trainStats,
                                      ModelSettings msDest,
                                      DataPartition sourcePartition,
                                      int maxTrainSize,
                                      int maxReserveSize,
                                      java.lang.String fileName,
                                      java.util.Date fileLastModified,
                                      AddPairsResult ret)
        Adds pairs for each subset in this object, using statistics for each label in the same subset of the Training dataset. First, the label balance is restored. Second, the pairs with the lowest confidence are added while limiting the growth of the datasets. If confidence has not been calculated, only the label balance is used, and the dataset size may grow without limit.
        Parameters:
        trainStats - statistics for the Training dataset. This method updates them with the pairs added to the Training dataset.
        msDest - the model settings to which new pairs are added.
        sourcePartition - if not null, each pair is removed from this (Reserve) partition before it is added to the Training or Validation dataset.
        maxTrainSize - no more pairs are added to the Training/Validation datasets once the Training dataset reaches this size.
        maxReserveSize - the maximum number of pairs that can be stored in the Reserve dataset. If sourcePartition is null, pairs that are deemed not useful are added to the Reserve dataset up to this limit. If 0, no pairs are added. If -1, an unlimited number of pairs may be added.
        fileName - the source file that the pair is taken from. Not null.
        fileLastModified - the last-modified date of the pair source file. Not null.
        ret - the return object in which added pairs (for each dataset) and overflow pairs are stored. Not null. If sourcePartition is given, no pairs are added to the Reserve dataset or to the list of overflow pairs.
        Throws:
        java.lang.IllegalArgumentException - if sourcePartition is neither null nor RESERVE, or if any pair being added does not exist in sourcePartition (when sourcePartition is not null).
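The maxReserveSize parameter packs three cases into one int. The following is a minimal sketch of that rule as a hypothetical helper, ReserveLimit.canAddToReserve. It assumes the helper receives the current size of the Reserve dataset. The real method's bookkeeping is not documented here.

```java
/** Hypothetical helper illustrating the documented maxReserveSize semantics. */
final class ReserveLimit {
    static final int UNLIMITED = -1; // documented: -1 means no limit

    /**
     * Returns true if one more pair may be stored in the Reserve dataset.
     * 0 means no pairs are added; -1 means unlimited; otherwise the size is capped.
     */
    static boolean canAddToReserve(int maxReserveSize, int currentReserveSize) {
        if (maxReserveSize == UNLIMITED) {
            return true;
        }
        if (maxReserveSize < UNLIMITED) {
            // Assumption: values below -1 are rejected; the documentation does not say.
            throw new IllegalArgumentException("maxReserveSize must be -1 or non-negative: " + maxReserveSize);
        }
        // For maxReserveSize == 0 this is never true, so nothing is added.
        return currentReserveSize < maxReserveSize;
    }
}
```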
      • subsetToBinary

        public static java.lang.String subsetToBinary(java.util.List<java.lang.Boolean> subset)
        Parameters:
        subset - the given subset. Not null.
        Returns:
        the binary representation of the given subset, using the symbols 0 (empty), 1 (non-empty) and ? (null; nulls should not be present).
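The following is a minimal sketch of the documented encoding. It assumes that element i of the list becomes character i of the string, because the ordering is not stated above. The class SubsetBinary is hypothetical.

```java
import java.util.List;

/** Hypothetical re-implementation of the documented subsetToBinary encoding. */
final class SubsetBinary {
    static String subsetToBinary(List<Boolean> subset) {
        StringBuilder sb = new StringBuilder(subset.size());
        // Assumption: list order is preserved, first element -> first character.
        for (Boolean present : subset) {
            if (present == null) {
                sb.append('?'); // null element (should not occur)
            } else if (present) {
                sb.append('1'); // non-empty feature
            } else {
                sb.append('0'); // empty feature
            }
        }
        return sb.toString();
    }
}
```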
      • getSubsets

        public java.util.Set<java.util.List<java.lang.Boolean>> getSubsets()
        Returns:
        unmodifiable set that contains all subsets.
      • toInts

        public int[] toInts()
        Converts all subsets in the family to integers. In each integer, a bit is 1 if the corresponding feature is present and 0 if it is empty. The lowest bit corresponds to the last feature.
        Returns:
        an array of integers representing each subset.
        Throws:
        java.lang.IllegalStateException - if more than 31 features are used.
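The following sketches the documented bit layout for a single subset. The class SubsetBits and its toInt method are hypothetical. As in the other sketches, a true element is assumed to mean that the feature is present.

```java
import java.util.List;

/** Hypothetical sketch of the bit layout documented for toInts(). */
final class SubsetBits {
    static int toInt(List<Boolean> subset) {
        int n = subset.size();
        if (n > 31) {
            throw new IllegalStateException("At most 31 features fit in an int: " + n);
        }
        int bits = 0;
        for (int i = 0; i < n; i++) {
            if (Boolean.TRUE.equals(subset.get(i))) {
                // Feature i maps to bit (n - 1 - i), so the last feature is bit 0.
                bits |= 1 << (n - 1 - i);
            }
        }
        return bits;
    }
}
```

With this layout, [true, false, true] gives 5 (binary 101). If subsetToBinary writes the first feature first, the integer in binary reads the same as that string. Capping the count at 31 leaves bit 31 unused. That bit is the sign bit of a Java int, which is one plausible reason for the documented limit.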
      • addFeatureValues

        public void addFeatureValues(double[] featureValues)
        Converts feature values to an RLink subset and adds the subset to the family if it is not empty. If the subset has already been added, increments its count.
        Parameters:
        featureValues - values for all model features in the feature query.
        Throws:
        java.lang.IllegalArgumentException - if the number of feature values does not match the number of querylets in the feature query.
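The following is a minimal sketch of the addFeatureValues contract, with several stated assumptions:

- The documentation does not say when a feature value counts as empty, so the sketch takes that rule as a hypothetical DoublePredicate.
- SubsetCounter and its methods are hypothetical.
- featureCount stands in for the number of querylets in the feature query.
- An "empty" subset is assumed to be one in which no feature is present.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

/** Hypothetical sketch of the addFeatureValues contract. */
final class SubsetCounter {
    private final int featureCount;        // stands in for the number of querylets
    private final DoublePredicate isEmpty; // assumption: how a value is judged "empty"
    private final Map<List<Boolean>, Integer> counts = new HashMap<>();

    SubsetCounter(int featureCount, DoublePredicate isEmpty) {
        this.featureCount = featureCount;
        this.isEmpty = isEmpty;
    }

    void addFeatureValues(double[] featureValues) {
        if (featureValues.length != featureCount) {
            throw new IllegalArgumentException(
                "Expected " + featureCount + " feature values, got " + featureValues.length);
        }
        List<Boolean> subset = new ArrayList<>(featureCount);
        boolean anyPresent = false;
        for (double v : featureValues) {
            boolean present = !isEmpty.test(v);
            subset.add(present);
            anyPresent |= present;
        }
        if (anyPresent) { // assumption: a subset with no present feature is "empty" and skipped
            counts.merge(subset, 1, Integer::sum); // new subset -> 1, existing -> count + 1
        }
    }

    int count(List<Boolean> subset) {
        return counts.getOrDefault(subset, 0);
    }

    int size() {
        return counts.size();
    }
}
```

Map.merge covers both documented cases in one call. It stores 1 for a new subset and adds 1 to the count of an existing one.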
      • isSubsetOfElement

        public boolean isSubsetOfElement(double[] featureValues)
        Checks whether the subset derived from the given feature values is a subset of one of the subsets in this family.
        Parameters:
        featureValues - values for all model features in the feature query.
        Returns:
        true if the given subset is a subset of one of the subsets stored in this family.
        Throws:
        java.lang.IllegalArgumentException - if the number of feature values does not match the number of querylets in the feature query.
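Suppose subsets are encoded as integers in the manner documented for toInts(). The subset-of-element test then reduces to a bitmask check: the candidate is contained in an element when it has no bit that the element lacks. This is a hypothetical sketch, and SubsetFamilyCheck is not a library class. It is not necessarily how isSubsetOfElement is implemented.

```java
/** Hypothetical sketch: subset-of-element test on bitmask-encoded subsets. */
final class SubsetFamilyCheck {
    /**
     * Returns true if every feature present in candidate is also present
     * in at least one single element of the family.
     */
    static boolean isSubsetOfElement(int candidate, int[] family) {
        for (int element : family) {
            if ((candidate & ~element) == 0) { // no bit of candidate lies outside element
                return true;
            }
        }
        return false;
    }
}
```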
      • suitableRecord

        public boolean suitableRecord(java.util.List<java.lang.Boolean> subset,
                                      java.util.List<java.lang.String> fieldValues)
        Filters a record in a data table. Provides functionality to filter the data table in order to select pairs for the specific RLink subset.
        Parameters:
        subset - the RLink subset that this filter applies to.
        fieldValues - all non-key field values of the table record. The fields must match the allFields parameter used to create the feature query and features. Nulls and empty strings are considered "empty" values.
        Returns:
        true if the given record is potentially suitable for creating record pairs for the specified subset from this subset family.
        Throws:
        java.lang.IllegalArgumentException - if subset does not correspond to one of the subsets stored in this subset family.
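The emptiness rule stated for fieldValues can be expressed directly. How suitableRecord then relates fields to the features of a subset is not documented here, so this hypothetical sketch (FieldPresence and its methods) covers only that rule. Whitespace-only strings are treated as non-empty because the documentation mentions only nulls and empty strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for the "empty value" rule documented for suitableRecord. */
final class FieldPresence {
    /** A field value is empty if it is null or the empty string. */
    static boolean isEmptyValue(String value) {
        return value == null || value.isEmpty();
    }

    /** Maps each field value to true (non-empty) or false (empty), preserving order. */
    static List<Boolean> presence(List<String> fieldValues) {
        List<Boolean> result = new ArrayList<>(fieldValues.size());
        for (String v : fieldValues) {
            result.add(!isEmptyValue(v));
        }
        return result;
    }
}
```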