Enum DataPartition

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Comparable<DataPartition>

    public enum DataPartition
    extends java.lang.Enum<DataPartition>
    Defines the type of a dataset: Training, Validation, Test, Reserve, Low Confidence, or False Subsets. Includes the partitions used in training (Training and Validation, equivalent to the Partition enum) as well as additional partitions not used in training. Wraps the XML DataPartitionType enum and provides indexing for lists (of datasets and of dataset results) in model.xml.
    • Enum Constant Summary

      Enum Constants 
      Enum Constant Description
      FSUBSETS
      Dataset of always false subsets.
      LOW_CONF
      Low confidence dataset - stores the best pairs found by LowConfPairFinder that have not yet been shown to the user.
      RESERVE
      Reserve dataset - not used during model training.
      TEST
      Test dataset - currently not used.
      TRAIN
      Training dataset - used to train the model.
      VLD
      Validation dataset - used to evaluate model performance during training.
    • Enum Constant Detail

      • TRAIN

        public static final DataPartition TRAIN
        Training dataset - used to train the model.
      • VLD

        public static final DataPartition VLD
        Validation dataset - used to evaluate model performance during training.
      • TEST

        public static final DataPartition TEST
        Test dataset - currently not used.
      • RESERVE

        public static final DataPartition RESERVE
        Reserve dataset - not used during model training. Useful examples may be moved to another dataset in the future.
      • LOW_CONF

        public static final DataPartition LOW_CONF
        Low confidence dataset - stores the best pairs found by LowConfPairFinder that have not yet been shown to the user. Not used in model training.
      • FSUBSETS

        public static final DataPartition FSUBSETS
        Dataset of always false subsets. Stores pairs that represent feature score subsets that are always false. Not used in training, but used in model creation.
    • Method Detail

      • values

        public static DataPartition[] values()
        Returns an array containing the constants of this enum type, in the order they are declared. This method may be used to iterate over the constants as follows:
        for (DataPartition c : DataPartition.values())
            System.out.println(c);
        
        Returns:
        an array containing the constants of this enum type, in the order they are declared
      • valueOf

        public static DataPartition valueOf(java.lang.String name)
        Returns the enum constant of this type with the specified name. The string must match exactly an identifier used to declare an enum constant in this type. (Extraneous whitespace characters are not permitted.)
        Parameters:
        name - the name of the enum constant to be returned.
        Returns:
        the enum constant with the specified name
        Throws:
        java.lang.IllegalArgumentException - if this enum type has no constant with the specified name
        java.lang.NullPointerException - if the argument is null
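Because valueOf throws on both unknown and null names, callers that parse untrusted input (for example, a string read from a file) may want a lenient wrapper. The following is a minimal sketch under stated assumptions: DataPartitionSketch is a hypothetical stand-in that copies only the constant names documented here, and PartitionParser.parse is an illustrative helper, not part of this API.

```java
import java.util.Optional;

// Hypothetical stand-in that mirrors only the constant names documented above;
// the real DataPartition has additional members.
enum DataPartitionSketch { TRAIN, VLD, TEST, RESERVE, LOW_CONF, FSUBSETS }

final class PartitionParser {
    private PartitionParser() {}

    // Returns the matching constant, or Optional.empty() for null or unknown names.
    static Optional<DataPartitionSketch> parse(String name) {
        if (name == null) {
            return Optional.empty();   // valueOf(null) would throw NullPointerException
        }
        try {
            return Optional.of(DataPartitionSketch.valueOf(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty();   // no constant is declared with exactly this name
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("VLD"));     // Optional[VLD]
        System.out.println(parse("vld"));     // Optional.empty (the match is case-sensitive)
        System.out.println(parse(" TRAIN"));  // Optional.empty (whitespace is not trimmed)
    }
}
```

The wrapper does not trim or change the case of its input, so it keeps the exact-match behaviour described above and only replaces the exceptions with an empty result.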
      • usedValues

        public static java.util.Set<DataPartition> usedValues()
        Lists partitions that are used in model training. See isUsedInTraining().
        Returns:
        an unmodifiable set containing the partitions that are used in model training: TRAIN and VLD.
      • modelValues

        public static java.util.Set<DataPartition> modelValues()
        Lists partitions that are used for model creation or for model training. See isUsedForModel().
        Returns:
        an unmodifiable set containing the partitions used in model training (TRAIN and VLD) and the partition used for model creation (FSUBSETS).
      • isUsedInTraining

        public boolean isUsedInTraining()
        Checks whether the partition is used in training (TRAIN and VLD). These partitions have evaluation results saved in ModelSettings.
        Returns:
        true if the partition is used in the model training process (TRAIN or VLD).
      • isUsedForModel

        public boolean isUsedForModel()
        Checks whether the partition is used for model creation or training (TRAIN, VLD, FSUBSETS). Feature scores for these partitions are calculated together. Only pairs from these partitions are visible in Learn UI.
        Returns:
        true if the partition is used for model creation or training (TRAIN, VLD, FSUBSETS).
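The relationship between usedValues(), modelValues(), isUsedInTraining() and isUsedForModel() can be illustrated with a small stand-in enum. This is only a sketch: PartitionSketch is hypothetical, and backing the sets with EnumSet wrapped in Collections.unmodifiableSet is an assumption about one reasonable implementation, not a description of the actual source.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in, not the real DataPartition source.
enum PartitionSketch {
    TRAIN, VLD, TEST, RESERVE, LOW_CONF, FSUBSETS;

    // Partitions used in model training, as documented: TRAIN and VLD.
    private static final Set<PartitionSketch> USED =
            Collections.unmodifiableSet(EnumSet.of(TRAIN, VLD));

    // Training partitions plus FSUBSETS, which is used for model creation.
    private static final Set<PartitionSketch> MODEL =
            Collections.unmodifiableSet(EnumSet.of(TRAIN, VLD, FSUBSETS));

    static Set<PartitionSketch> usedValues() { return USED; }

    static Set<PartitionSketch> modelValues() { return MODEL; }

    // Each predicate is defined as membership in the corresponding set.
    boolean isUsedInTraining() { return USED.contains(this); }

    boolean isUsedForModel() { return MODEL.contains(this); }
}

final class PartitionSketchDemo {
    public static void main(String[] args) {
        // Print both flags for every constant, in declaration order.
        for (PartitionSketch p : PartitionSketch.values()) {
            System.out.println(p + " training=" + p.isUsedInTraining()
                    + " model=" + p.isUsedForModel());
        }
    }
}
```

Defining each predicate as a membership test on its set keeps the two views consistent: every partition used in training is also used for the model, while FSUBSETS is the only partition in modelValues() that is not in usedValues().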
      • getName

        public java.lang.String getName()
        Returns:
        the name of the partition
      • getAbbr

        public java.lang.String getAbbr()
        Returns:
        the abbreviation of the partition
      • getDataset

        public abstract RLinkDataSet<? extends VectorExample> getDataset(RecPairExperiment exper)
        Parameters:
        exper - experiment with both datasets.
        Returns:
        the dataset from the experiment for this partition, or null if the dataset is not used in the experiment.
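Since getDataset is declared abstract on the enum, each constant has to supply its own implementation in a constant-specific class body. The sketch below shows that Java pattern with hypothetical stand-ins: ExperimentSketch replaces RecPairExperiment, a plain List<String> replaces RLinkDataSet, and the choice of which constant returns which field (or null) is an assumption made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for RecPairExperiment: holds two datasets as plain lists.
final class ExperimentSketch {
    final List<String> training;
    final List<String> validation;

    ExperimentSketch(List<String> training, List<String> validation) {
        this.training = training;
        this.validation = validation;
    }
}

// Hypothetical stand-in enum with an abstract method implemented per constant.
enum DatasetPartitionSketch {
    TRAIN {
        @Override
        List<String> getDataset(ExperimentSketch exper) { return exper.training; }
    },
    VLD {
        @Override
        List<String> getDataset(ExperimentSketch exper) { return exper.validation; }
    },
    TEST {
        @Override
        List<String> getDataset(ExperimentSketch exper) {
            return null;   // assumed here: this partition is not part of the experiment
        }
    };

    abstract List<String> getDataset(ExperimentSketch exper);
}

final class GetDatasetDemo {
    public static void main(String[] args) {
        ExperimentSketch exper = new ExperimentSketch(
                Arrays.asList("pair-1", "pair-2"), Arrays.asList("pair-3"));
        for (DatasetPartitionSketch p : DatasetPartitionSketch.values()) {
            List<String> data = p.getDataset(exper);
            // Callers must handle null, which signals an unused partition.
            System.out.println(p + ": " + (data == null ? "not used" : data.size() + " example(s)"));
        }
    }
}
```

Because the documented method may return null, callers should check the result before use, as the loop in main does.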