Class AddPairsResult


  • public final class AddPairsResult
    extends java.lang.Object
    Contains the sets of pairs that were added to each dataset, and the collections of pairs that were not added to the project for various reasons. Objects of this class are returned by methods that add useful pairs from a given list. Pairs identified as useful are added to one of the datasets used for training. Pairs that are not currently useful are added to the Reserve dataset up to its size limit; the rest are added to the overflow set in this class. Pairs that are already in the project are stored here in separate lists for three cases: the labels match, the labels are opposite, or the pair in the project has no boolean label. Duplicate pairs in the given list that are not in the project are stored in the duplicates list.
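
    The bucketing described above can be illustrated with a minimal sketch. It is not the library's implementation: pair IDs are plain strings standing in for RecPair, a null Boolean stands in for a project pair with no boolean label, and the class name PairBucketsSketch, the Bucket enum, and the classify method are hypothetical names introduced only for this example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PairBucketsSketch {

    /** Hypothetical buckets mirroring the lists described for AddPairsResult. */
    enum Bucket { SAME_LABEL, OPPOSITE_LABEL, NO_LABEL, DUPLICATE, NEW }

    /**
     * Classifies one incoming pair.
     * project: pair ID -> label already stored in the project (null = no boolean label).
     * seen: IDs of incoming pairs that were not in the project and were already processed.
     */
    static Bucket classify(String id, boolean label,
                           Map<String, Boolean> project, Set<String> seen) {
        if (project.containsKey(id)) {
            Boolean existing = project.get(id);
            if (existing == null) {
                return Bucket.NO_LABEL;
            }
            return existing.booleanValue() == label
                    ? Bucket.SAME_LABEL
                    : Bucket.OPPOSITE_LABEL;
        }
        // Only pairs that are not in the project can count as duplicates.
        if (!seen.add(id)) {
            return Bucket.DUPLICATE;
        }
        return Bucket.NEW;
    }

    public static void main(String[] args) {
        Map<String, Boolean> project = new HashMap<>();
        project.put("a", Boolean.TRUE);
        project.put("b", null); // in the project, but without a boolean label

        String[] ids = {"a", "a", "b", "c", "c"};
        boolean[] labels = {true, false, true, true, false};

        Set<String> seen = new HashSet<>();
        for (int i = 0; i < ids.length; i++) {
            System.out.println(ids[i] + " -> " + classify(ids[i], labels[i], project, seen));
        }
    }
}
```

    In this sketch a pair classified as NEW would then go to a training dataset, the Reserve dataset, or the overflow set, depending on whether it is useful and on the Reserve size limit.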
    • Constructor Detail

      • AddPairsResult

        public AddPairsResult()
        Creates an object with empty collections of pairs.
    • Method Detail

      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • addToUsedDataset

        public DataPartition addToUsedDataset​(RecPair recPair,
                                              ModelSettings msDest,
                                              DataPartition sourcePartition,
                                              java.lang.String fileName,
                                              java.util.Date fileLastModified)
        Adds or moves the record pair to a dataset used for training in msDest. Remembers the added pair in this object.
        Parameters:
        msDest - the model settings to which new pairs will be added.
        sourcePartition - if not null, the pair is removed from this (Reserve) partition before it is added to the Training or Validation dataset.
        fileName - the source file that the pair is taken from. Not null.
        fileLastModified - the last modified date of the pair source file. Not null.
        Returns:
        the dataset to which the record pair was added.
        Throws:
        java.lang.IllegalArgumentException - if sourcePartition is neither null nor Reserve, or if the given record pair is not found in sourcePartition, or if a record pair with the same keys has already been added to a used dataset, or if the record pair contains an incorrect number of fields or feature values.
        java.lang.IllegalStateException - if the key field or field definitions are not assigned.
      • addToReserveOrOverflow

        public void addToReserveOrOverflow​(java.util.List<RecPair> recPairs,
                                           ModelSettings msDest,
                                           int maxReserveSize,
                                           java.lang.String fileName,
                                           java.util.Date fileLastModified)
        Adds the given pairs to the Reserve dataset, or to the overflow list once the Reserve dataset has reached its maximum size.
        Parameters:
        recPairs - the pairs to be added to Reserve or overflow.
        msDest - the model settings to which new pairs will be added.
        maxReserveSize - the maximum number of pairs that can be stored in the Reserve dataset. Pairs that are deemed not useful are added to the Reserve dataset up to this limit. If 0, no pairs are added. If -1, an unlimited number of pairs can be added.
        fileName - the source file that the pairs are taken from. Not null.
        fileLastModified - the last modified date of the pairs' source file. Not null.
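
        The maxReserveSize rules (0 adds nothing, -1 is unlimited) can be sketched as follows. This is only an illustration under stated assumptions, not the actual method body: pair IDs are strings standing in for RecPair, the limit is assumed to count pairs already present in the Reserve dataset, and ReserveOverflowSketch, Split, and split are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReserveOverflowSketch {

    /** Hypothetical holder for the two destinations of not-useful pairs. */
    static final class Split {
        final List<String> reserve = new ArrayList<>();
        final Set<String> overflow = new LinkedHashSet<>();
    }

    /**
     * Sends each pair to Reserve while there is room, otherwise to overflow.
     * Assumption: currentReserveSize counts pairs already stored in Reserve.
     */
    static Split split(List<String> pairIds, int currentReserveSize, int maxReserveSize) {
        Split result = new Split();
        boolean unlimited = (maxReserveSize == -1);
        int reserveSize = currentReserveSize;
        for (String id : pairIds) {
            if (unlimited || reserveSize < maxReserveSize) {
                result.reserve.add(id);
                reserveSize++;
            } else {
                // With maxReserveSize == 0 every pair ends up here.
                result.overflow.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Reserve already holds 1 pair and may hold at most 2.
        Split s = split(List.of("p1", "p2", "p3"), 1, 2);
        System.out.println("reserve=" + s.reserve + " overflow=" + s.overflow);
        // prints: reserve=[p1] overflow=[p2, p3]
    }
}
```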
      • addRecPair

        public void addRecPair​(RecPair recPair,
                               DataPartition partition)
        Remembers the pair that was added to the given partition.
      • addToOverflow

        public void addToOverflow​(RecPair recPair)
        Remembers the pair that did not fit within the max size of the Reserve dataset.
      • getNAdded

        public int getNAdded​(DataPartition partition)
        Returns:
        the number of pairs that were added to the specified dataset.
      • getNUseful

        public int getNUseful()
        Returns:
        the number of pairs that were added to the datasets used in training.
      • getNInProject

        public int getNInProject()
        Returns:
        the number of pairs in the given list that are already in the project.
      • getNNotUseful

        public int getNNotUseful()
        Gets the number of pairs added to the Reserve dataset and to the overflow list.
        Returns:
        the number of unique pairs that were not in the project (in any dataset) and were not added to any dataset used in training. If a list of pairs to be added contains duplicates of the same pair (same ID) and one of them was added to the project, then the remaining duplicates are not included in this number of pairs.
      • getAdded

        public java.util.Set<RecPair> getAdded​(DataPartition partition)
        Returns:
        the pairs added to the given dataset
      • getInProjectNoLabel

        public java.util.List<RecPair> getInProjectNoLabel()
        Returns:
        the pairs that are already in the project, but with an Unsure or null label
      • getInProjectSameLabel

        public java.util.List<RecPair> getInProjectSameLabel()
        Returns:
        the pairs that are already in the project with the same label
      • getInProjectOppositeLabel

        public java.util.List<RecPair> getInProjectOppositeLabel()
        Returns:
        the pairs that are already in the project, but with an opposite label
      • getDuplicates

        public java.util.List<RecPair> getDuplicates()
        Returns:
        the duplicate pairs in the given list. These pairs are not in the project.
      • getOverflow

        public java.util.Set<RecPair> getOverflow()
        Returns:
        the pairs that were not added to the project because they did not fit into the Reserve dataset (due to its size limit).
      • getUseful

        public java.util.Set<RecPair> getUseful()
        Returns:
        the unique pairs that were added to the datasets used in training.
      • getNotUseful

        public java.util.Set<RecPair> getNotUseful()
        Returns:
        the unique pairs that were not in the project (in any dataset) and were not added to any dataset used in training. If a list of pairs to be added contains duplicates of the same pair (same ID) and one of them was added to the project, then the remaining duplicates are not included in the returned set.