Class ModelConfig


  • public final class ModelConfig
    extends java.lang.Object
    Stores a configuration of an RLink model and builds model(s) with this configuration. If a parameter setter method is not called, the default parameter value is used. All model parameters are verified. Each setter method accepts a value to assign to a parameter and returns this object, so that multiple setter calls can be chained.

    Default values: Number of features 2, Thermometer type Array (unchangeable), Minkowski norm 1, initial learning rate 1.7, annealing rate 0.05, precision bits 10, subset training mode Dynamic. To get the default value of any parameter, create a ModelConfig object and then use an appropriate getter method. See descriptions of setter methods for a discussion of recommended values of each parameter.
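
The chaining behavior described above (each setter validates its argument, stores it, and returns this object) can be sketched as follows. DemoConfig is a hypothetical, simplified stand-in written only for this illustration; it is not the RLink ModelConfig class, and it copies just two of the documented defaults and validation ranges (initial learning rate 1.7 and greater than 1, precision bits 10 and between 1 and 16).

```java
// Simplified stand-in for illustration only; this is NOT the RLink ModelConfig class.
final class DemoConfig {
    private double learnRate = 1.7;  // documented default initial learning rate
    private int precisionBits = 10;  // documented default number of precision bits

    // Validates the value, stores it, and returns this object to allow chaining.
    DemoConfig learnRate(double value) {
        if (!(value > 1.0)) {
            throw new IllegalArgumentException("learning rate must be greater than 1");
        }
        this.learnRate = value;
        return this;
    }

    // Validates the value, stores it, and returns this object to allow chaining.
    DemoConfig precision(int value) {
        if (value < 1 || value > 16) {
            throw new IllegalArgumentException("precision bits must be between 1 and 16");
        }
        this.precisionBits = value;
        return this;
    }

    double getLearningRate() { return learnRate; }

    int getPrecisionBits() { return precisionBits; }

    public static void main(String[] args) {
        // Setters that are not called leave the default value in place.
        DemoConfig cfg = new DemoConfig().learnRate(1.5).precision(8);
        System.out.println(cfg.getLearningRate() + " " + cfg.getPrecisionBits());
    }
}
```

With the real class, the same pattern would presumably read along the lines of new ModelConfig(5).learnRate(1.5).precision(8).build(), based on the signatures listed on this page.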

    • Constructor Detail

      • ModelConfig

        public ModelConfig​(int nFeatures)
        Creates a model configuration with the default parameters recommended for the specified number of features. To reduce model size, subsets are not used with 12 or more features. Precision is reduced starting at 15 features, but never below 6 precision bits. To reduce training time, no generated examples are used with 17 or more features.
        Parameters:
        nFeatures - the number of model features (elements in a feature vector). Warning: more than 18 features can result in a huge model (tens of gigabytes or more) and very long training times.
        Throws:
        java.lang.IllegalArgumentException - if number of features is not between 1 and 28.
      • ModelConfig

        public ModelConfig()
        Creates a model configuration with all default parameters, which are recommended for up to 11 features. Use this constructor when the number of features is not known at the time this object is created.
      • ModelConfig

        public ModelConfig​(ModelConfig other)
        Creates a deep copy of the given object.
        Parameters:
        other - the object to copy. Not null.
    • Method Detail

      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
        Returns:
        a string with values of the main parameters stored in this object.
      • nFeatures

        public ModelConfig nFeatures​(int nFeatures)
        Sets the number of model features. Preserves the useSubsets parameter, unless nFeatures == 1 was used, in which case useSubsets is set to true.
        Parameters:
        nFeatures - the number of model features (elements in a feature vector). Warning: more than 12 features (more than 15 features if useSubsets(false) is called) with other parameters set to defaults produces a huge model (tens of gigabytes) and may result in an out-of-memory error.
        Returns:
        this model configuration.
        Throws:
        java.lang.IllegalArgumentException - if number of features is not between 1 and 28.
      • useSubsets

        public ModelConfig useSubsets​(boolean value)
        Sets the model to take subsets into account (default), or to not use subsets and support more features. The number of features must be set first.
        Parameters:
        value - if true, the model is large and uses subsets. If false, the model is small, does not use subsets, and supports more features.
        Returns:
        this model configuration.
      • norm

        public ModelConfig norm​(double value)
        Sets the Minkowski norm value used for combining feature values. Typically this method is not used. Any non-default norm value is highly experimental.
        Parameters:
        value - the Minkowski norm value.
        Returns:
        this model configuration.
        Throws:
        java.lang.IllegalArgumentException - if the norm value is not positive.
      • learnRate

        public ModelConfig learnRate​(double value)
        Sets the initial learning rate, which defines how quickly the internal model state changes during training in response to incorrectly classified examples. This learning rate is used for the first model iteration and decreases with every subsequent iteration. If it is modified, the annealing rate may also need to be adjusted; see annealRate(double).
        Parameters:
        value - the initial learning rate. Use values a little larger than 1.0, e.g. 1.7. Use a smaller value (e.g. 1.5) if useSubsets(false) is called. Avoid values above 2.0, since training may take too long or be inaccurate. Avoid values very close to 1.0, since learning may be too slow to reach the best classifications in a reasonable number of iterations.
        Returns:
        this model configuration.
        Throws:
        java.lang.IllegalArgumentException - if value is not greater than 1.
      • annealRate

        public ModelConfig annealRate​(double value)
        Sets the annealing rate which defines the speed of learning rate decrease with each training iteration.
        Parameters:
        value - the annealing rate. Larger values decrease the learning rate faster. A value of 0 keeps the learning rate constant. A value of 0.05 gives the learning rate a half-life of about 15 iterations. A small change in this value can make a big difference in how fast the learning rate decreases. Avoid large values, since they may make the learning rate negligible after just a few iterations.
        Returns:
        this model configuration.
        Throws:
        java.lang.IllegalArgumentException - if value is below 0.
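
To get a feel for how sensitive the half-life is to the annealing rate, here is a rough sketch. This page does not state the decay formula, so the code assumes a plain exponential decay, rate(t) = rate(0) * exp(-annealRate * t); that assumption and the AnnealingHalfLife helper are hypothetical, and the real RLink schedule may differ. Under this assumption a rate of 0.05 gives a half-life of roughly 14 iterations, in the neighborhood of the "about 15" quoted above, and doubling the rate halves the half-life.

```java
// Hypothetical helper for illustration only; not part of RLink.
final class AnnealingHalfLife {
    // ASSUMPTION: rate(t) = rate(0) * exp(-annealRate * t).
    // Solving exp(-annealRate * t) = 0.5 gives t = ln(2) / annealRate.
    static double halfLife(double annealRate) {
        if (annealRate < 0) {
            throw new IllegalArgumentException("annealing rate must not be negative");
        }
        // For annealRate == 0 this is positive infinity: the rate never halves.
        return Math.log(2.0) / annealRate;
    }

    public static void main(String[] args) {
        for (double a : new double[] {0.0, 0.025, 0.05, 0.1}) {
            System.out.printf("annealRate=%.3f -> half-life of about %.1f iterations%n",
                    a, halfLife(a));
        }
    }
}
```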
      • precision

        public ModelConfig precision​(int value)
        Sets the number of precision bits, which defines the precision of the internal model weights. Higher precision may help distinguish similar examples with different labels. Recommended values are from 6 to 10. The amount of memory used by the model is proportional to 2^(precision). Lowering the number of precision bits (while staying within the recommended range) can therefore be used to support additional feature(s) while keeping the model size reasonable.
        Parameters:
        value - the number of precision bits.
        Returns:
        this model configuration.
        Throws:
        java.lang.IllegalArgumentException - if value is not between 1 and 16.
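
Since memory use is stated to be proportional to 2^(precision), the effect of changing the number of precision bits can be estimated as a simple ratio. The sketch below is only that arithmetic; PrecisionMemory and relativeSize are hypothetical names, and the result is a relative factor, not an absolute model size.

```java
// Hypothetical helper for illustration only; not part of RLink.
final class PrecisionMemory {
    // Relative memory factor when moving from fromBits to toBits,
    // using only the stated proportionality: memory ~ 2^(precision bits).
    static double relativeSize(int fromBits, int toBits) {
        return Math.pow(2.0, toBits - fromBits);
    }

    public static void main(String[] args) {
        // Dropping from the default 10 bits to 8 bits: factor 2^(8-10).
        System.out.println(relativeSize(10, 8));
        // Dropping to the lowest recommended value, 6 bits: factor 2^(6-10).
        System.out.println(relativeSize(10, 6));
    }
}
```

Each bit removed halves the memory estimate, so going from 10 to 8 bits cuts it to a quarter.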
      • falseSubsets

        public ModelConfig falseSubsets​(int[] subsets)
        Sets the always false subsets. The model always predicts examples of the given subsets and their subsets as False (score 0, confidence 1). Specifying always false subsets reduces the size of the model.
        Parameters:
        subsets - the always false subsets. May be null. Each bit is 1 for a present feature and 0 for a missing feature. The lowest bit corresponds to the last model feature.
        Returns:
        this model configuration.
        Throws:
        java.lang.IllegalArgumentException - if any subset is negative.
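
The bit layout described for the subsets parameter (1 for a present feature, 0 for a missing one, lowest bit for the last model feature) can be sketched as a small encoder. SubsetMasks and toMask are hypothetical helper names introduced for this illustration; they are not part of RLink.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of RLink.
final class SubsetMasks {
    // present[i] is true when feature i (0-based, in model order) is present.
    // Shifting left before adding each bit places the last feature in the lowest bit.
    static int toMask(boolean[] present) {
        int mask = 0;
        for (boolean p : present) {
            mask = (mask << 1) | (p ? 1 : 0);
        }
        return mask;
    }

    public static void main(String[] args) {
        // Three-feature model: first and last feature present -> binary 101.
        int firstAndLast = toMask(new boolean[] {true, false, true});
        // Three-feature model: only the last feature present -> binary 001.
        int lastOnly = toMask(new boolean[] {false, false, true});
        int[] falseSubsets = {firstAndLast, lastOnly};
        System.out.println(Arrays.toString(falseSubsets)); // prints [5, 1]
    }
}
```

An array built this way has the shape that falseSubsets(int[]) is documented to accept.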
      • subsetTrain

        public ModelConfig subsetTrain​(SubsetTrainConfig value)
        Sets the subset training configuration. The STDynamic subset training mode uses training data augmentation and is recommended in most cases. If the datasets contain abundant examples from all subsets that can ever be encountered in the data table, then the STNone mode is recommended.
        Parameters:
        value - the subset training configuration. Not null.
        Returns:
        this model configuration.
      • getNFeatures

        public int getNFeatures()
        Returns:
        the number of features stored in this object.
      • isUsingSubsets

        public boolean isUsingSubsets()
        Returns:
        true if the model is large and uses subsets, false if the model is small.
      • getNorm

        public double getNorm()
        Returns:
        Minkowski norm value stored in this object.
      • getLearningRate

        public double getLearningRate()
        Returns:
        initial learning rate stored in this object.
      • getAnnealingRate

        public double getAnnealingRate()
        Returns:
        the annealing rate stored in this object.
      • getPrecisionBits

        public int getPrecisionBits()
        Returns:
        number of precision bits stored in this object.
      • getFalseSubsets

        public int[] getFalseSubsets()
        Returns:
        a copy of the false subsets array. May be null.
      • getSubsetTrainConfig

        public SubsetTrainConfig getSubsetTrainConfig()
        Returns:
        a copy of the subset training configuration.
      • build

        public int build​(int customNFeatures)
        Creates an RLink model using the parameters stored in this object and the given number of features. Use when the number of features is not known when setting up the ModelConfig object. Note that RLink.destroyModels() must be called after using this method. Creation parameter values are guaranteed to be valid (verified in setter methods).
        Parameters:
        customNFeatures - the number of model features to use for building this model. The number of features stored in this object is ignored. The useSubsets parameter is preserved (if the number of features is above 1).
        Returns:
        the index of the created model to be used in RLink methods.
        Throws:
        java.lang.IllegalArgumentException - if number of features is not between 1 and 28, or if the number of features in the dynamic parameter arrays does not match the given number of features.
      • build

        public int build()
        Creates an RLink model using the parameters stored in this object. Note that RLink.destroyModels() must be called after using this method. Creation parameter values are guaranteed to be valid (verified in setter methods).
        Returns:
        the index of the created model to be used in RLink methods.
        Throws:
        java.lang.IllegalArgumentException - if the number of features in the dynamic parameter arrays does not match the number of features assigned to this object.