Class RLink


  • public final class RLink
    extends java.lang.Object
    Communicates with the rlink_jni library through JNI to create, load, save, train, and evaluate RLink models. Several models can be managed at the same time. When a new model is created or read from a file, a model ID is returned; it is then passed as modelId to all other methods to refer to that specific model. Model training and evaluation are done with individual feature vectors. Most methods are wrappers around native functions and throw exceptions when the native functions return error codes.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  RLink.SubsetTrainMode
      Defines the methods used to generate training examples for subsets of a given training example.
      static class  RLink.ThermometerType
      Type of thermometers used in RLink model.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static double EMPTY_SCORE
      The value -1 used for the empty feature score (when it cannot be calculated).
      static double MAX_SCORE
      The maximum non-empty feature score, and the maximum model score (1).
      static double MIN_SCORE
      The minimum non-empty feature score, and the minimum model score (0).
      static double NO_CONFIDENCE_MEASURE
      The value returned if no confidence measure was calculated or if the requested confidence measure is not supported by the given model.
    • Method Summary

      Modifier and Type Method Description
      static void beginIteration​(int modelId)
      Must be called before each training iteration with the entire training dataset.
      static int createModel​(int nFeatures, double p, double learningRate, int precisionBits, RLink.ThermometerType thermometerType, int maxSubsetSize, int[] falseSubsets)
      Creates a new untrained RLink model.
      static void destroyModels()
      Destroys all RLink models created by the RLink static methods.
      static void endIteration​(int modelId)
      Must be called after each training iteration with the entire training dataset.
      static double[] getFalseInsertScores​(int modelId)  
      static double[] getFalseRemoveLimits​(int modelId)  
      static int getFeatureCount​(int modelId)
      Gets the number of model features.
      static java.lang.String getID​(int modelId)
      Return the unique ID for this model.
      static double getInitialLearningRate​(int modelId)
      Returns the initial learning rate, or -1 for file versions below RFV5.
      static java.lang.String getMetadata​(int modelId)
      Returns the notes (meta-data) of the model.
      static double getMissingInfoLimit​(int modelId)  
      static double getNegativeTaper​(int modelId)  
      static double getNorm​(int modelId)  
      static double getPositiveTaper​(int modelId)  
      static int getPrecisionBits​(int modelId)
      Returns number of precision bits, or -1 for file versions below RFV5.
      static int getSkippedCount​(int modelId)
      Skipped count is applicable only for TAPER subset training mode.
      static RLink.SubsetTrainMode getSubsetTrainMode​(int modelId)  
      static RLink.ThermometerType getThermometerType​(int modelId)
      Returns the thermometer type, or throws an exception for file versions below RFV5 (thermometer type is -1, which is invalid).
      static double getThreshold​(int modelId)
      Return the threshold for this model.
      static int getTrainedCount​(int modelId)
      Trained count is applicable only for TAPER subset training mode.
      static double[] getTrueInsertScores​(int modelId)  
      static double[] getTrueRemoveLimits​(int modelId)  
      static java.lang.String getVersion​(int modelId)
      Return the version ID for this model.
      static void learn​(int modelId, double[] featureValues, boolean label)
      Trains the existing model with the given training example (feature vector and label).
      static RLinkOut predict​(int modelId, double[] featureValues)
      Predicts the score (and label) using default options.
      static RLinkOut predict​(int modelId, double[] featureValues, PredictOptions predictOpts)
      Predicts the score (and label), calculates the requested confidence and significance.
      static int read​(java.lang.String fileName)
      Loads a model from file.
      static void setAnnealingRate​(int modelId, double value)
      Sets the annealing rate (the speed of the learning rate decrease with each training iteration).
      static void setDynamic​(int modelId, double[] trueInsertScores, double[] falseInsertScores, double[] trueRemoveLimits, double[] falseRemoveLimits)
      Sets parameters for removal and insertion of scores to generate related training vectors when using SubsetTrainMode.DYNAMIC.
      static void setHeader​(java.lang.String inFile, java.lang.String outFile, java.lang.String metaData, java.lang.String version, double threshold)
      Copy a model file, updating the header values.
      static void setID​(int modelId, java.lang.String id)
      Set the ID of the model.
      static void setMetadata​(int modelId, java.lang.String metadata)
      Sets the notes (meta-data) field for an existing model.
      static void setMissingInfoLimit​(int modelId, double value)
      Sets the percentage of values that may be missing in a generated training example for a subset.
      static void setSubsetTrainMode​(int modelId, RLink.SubsetTrainMode stMode)
      Sets the subset training mode of an existing model.
      static void setTaper​(int modelId, double positiveTaper, double negativeTaper)
      Set parameters for the tapering function that generates examples for subsets.
      static void setThreshold​(int modelId, double threshold)
      Set the cutoff threshold for the model.
      static void setVersion​(int modelId, java.lang.String version)
      Set the version ID of the model.
      static void verifyMissingInfoLimit​(double value)  
      static void verifyModelId​(int modelId)
      Verifies that the model with the given modelId actually exists.
      static void verifyTaper​(double positiveTaper, double negativeTaper)  
      static void write​(int modelId, java.lang.String fileName)
      Saves the existing model to file.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • EMPTY_SCORE

        public static final double EMPTY_SCORE
        The value -1 used for the empty feature score (when it cannot be calculated).
        See Also:
        Constant Field Values
      • MIN_SCORE

        public static final double MIN_SCORE
        The minimum non-empty feature score, and the minimum model score (0).
        See Also:
        Constant Field Values
      • MAX_SCORE

        public static final double MAX_SCORE
        The maximum non-empty feature score, and the maximum model score (1).
        See Also:
        Constant Field Values
      • NO_CONFIDENCE_MEASURE

        public static final double NO_CONFIDENCE_MEASURE
        The value returned if no confidence measure was calculated or if the requested confidence measure is not supported by the given model.
        See Also:
        Constant Field Values
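The score constants above suggest a simple pre-check for feature vectors before they are passed to learn or predict. The following is a minimal sketch under stated assumptions, not part of RLink: the class FeatureVectorCheck is hypothetical, and it redefines the constants locally with the documented values (-1, 0 and 1) so that it compiles without the native library. Treating every feature value as either empty or within [0, 1] is inferred from the field descriptions.

```java
// Hypothetical helper; mirrors the documented constant values instead of importing RLink.
final class FeatureVectorCheck {
    static final double EMPTY_SCORE = -1.0; // documented value of RLink.EMPTY_SCORE
    static final double MIN_SCORE = 0.0;    // documented value of RLink.MIN_SCORE
    static final double MAX_SCORE = 1.0;    // documented value of RLink.MAX_SCORE

    private FeatureVectorCheck() {}

    /** Returns true if every value is EMPTY_SCORE or lies within [MIN_SCORE, MAX_SCORE]. */
    static boolean isValid(double[] featureValues) {
        for (double v : featureValues) {
            boolean empty = v == EMPTY_SCORE;
            boolean inRange = v >= MIN_SCORE && v <= MAX_SCORE; // false for NaN
            if (!empty && !inRange) {
                return false;
            }
        }
        return true;
    }

    /** Counts the features whose score is marked as empty. */
    static int countEmpty(double[] featureValues) {
        int count = 0;
        for (double v : featureValues) {
            if (v == EMPTY_SCORE) {
                count++;
            }
        }
        return count;
    }
}
```

A caller might run isValid on each vector while loading a dataset and use countEmpty to report how many features could not be calculated.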
    • Method Detail

      • destroyModels

        public static void destroyModels()
        Destroys all RLink models created by the RLink static methods. No error codes are used, so this native method is public. Client code must call this method once the RLink class will no longer be used.
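Because destroyModels must be called once RLink is no longer needed, a try/finally block is one way to make sure the cleanup happens even when an exception is thrown. The sketch below is only an illustration: the Models interface is a hypothetical stand-in that mirrors the documented read and destroyModels signatures so the code runs without the JNI library, and LifecycleSketch, ModelTask and withModel are names invented for this example.

```java
import java.io.FileNotFoundException;

// Hypothetical sketch; Models stands in for the static RLink methods of the same names.
final class LifecycleSketch {
    /** Mirrors the documented signatures of RLink.read and RLink.destroyModels. */
    interface Models {
        int read(String fileName) throws FileNotFoundException;
        void destroyModels();
    }

    /** Work to perform with a loaded model, identified by its model ID. */
    interface ModelTask<T> {
        T apply(int modelId);
    }

    private LifecycleSketch() {}

    /** Loads a model, runs the task, and always destroys all models afterwards. */
    static <T> T withModel(Models models, String fileName, ModelTask<T> task)
            throws FileNotFoundException {
        try {
            int modelId = models.read(fileName);
            return task.apply(modelId);
        } finally {
            // Destroys ALL models, so this pattern only fits code that owns every model.
            models.destroyModels();
        }
    }
}
```

Since destroyModels affects every model, this pattern suits a single owner of the RLink class, such as a batch job, rather than code that shares models with other components.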
      • verifyModelId

        public static void verifyModelId​(int modelId)
        Verifies that the model with the given modelId actually exists.
        Parameters:
        modelId - ID of the model that was created or read from file.
        Throws:
        java.lang.ArrayIndexOutOfBoundsException - if the model with the given modelId does not exist
      • createModel

        public static int createModel​(int nFeatures,
                                      double p,
                                      double learningRate,
                                      int precisionBits,
                                      RLink.ThermometerType thermometerType,
                                      int maxSubsetSize,
                                      int[] falseSubsets)
        Creates a new untrained RLink model. ModelConfig.build() must be used from other packages.
        Parameters:
        nFeatures - number of model features.
        p - Minkowski norm value for combining feature values.
        learningRate - the learning rate.
        precisionBits - defines the precision of internal model weights.
        thermometerType - type of thermometers used by the model. Use ARRAY.
        maxSubsetSize - if it equals nFeatures, a large model is created that is able to take subsets into account. If it is 1, a small model is created that can support more features. Other values are not supported.
        falseSubsets - subsets that are always false. May be null.
        Returns:
        the ID of the new model
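The maxSubsetSize rule described above (either nFeatures or 1, nothing else) can be checked before the native call is made. The following is a sketch of such a check, not RLink's own validation: SubsetSizeCheck and ModelKind are hypothetical names, and throwing IllegalArgumentException for unsupported values is an assumption, since the documentation does not say how createModel reports them.

```java
// Hypothetical helper, not part of RLink: classifies the documented maxSubsetSize rule.
final class SubsetSizeCheck {
    enum ModelKind { LARGE, SMALL }

    private SubsetSizeCheck() {}

    /**
     * Returns LARGE when maxSubsetSize equals nFeatures and SMALL when it is 1.
     * With a single feature both rules coincide; this sketch reports LARGE.
     */
    static ModelKind classify(int nFeatures, int maxSubsetSize) {
        if (nFeatures < 1) {
            throw new IllegalArgumentException("nFeatures must be positive, got " + nFeatures);
        }
        if (maxSubsetSize == nFeatures) {
            return ModelKind.LARGE;
        }
        if (maxSubsetSize == 1) {
            return ModelKind.SMALL;
        }
        // Assumption: unsupported values are rejected with IllegalArgumentException.
        throw new IllegalArgumentException(
                "maxSubsetSize must be 1 or nFeatures (" + nFeatures + "), got " + maxSubsetSize);
    }
}
```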
      • setSubsetTrainMode

        public static void setSubsetTrainMode​(int modelId,
                                              RLink.SubsetTrainMode stMode)
        Sets the subset training mode of an existing model. If this method is not called, the default subset training mode is TAPER. For model creation, this is typically set in ModelConfig. If the model is read from a file, the subset training mode is set to the default. This method can be used to change it before any further training of the loaded model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        stMode - the new subset training mode. Not null.
      • getSubsetTrainMode

        public static RLink.SubsetTrainMode getSubsetTrainMode​(int modelId)
        Returns:
        the subset training mode of an existing model.
      • verifyTaper

        public static void verifyTaper​(double positiveTaper,
                                       double negativeTaper)
        Throws:
        java.lang.IllegalArgumentException - if any taper value is not between 0 and 1.
      • setTaper

        public static void setTaper​(int modelId,
                                    double positiveTaper,
                                    double negativeTaper)
        Sets the parameters of the tapering function that generates examples for subsets. This is only relevant when SubsetTrainMode.TAPER is used. Larger taper values result in more generated examples being skipped. For model creation, this is typically set in ModelConfig. If the model is read from a file, the taper values are set to their defaults. This method can be used to change them before any further training of the loaded model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        positiveTaper - steepness of the tapering function for examples with True labels.
        negativeTaper - steepness of the tapering function for examples with False labels.
        Throws:
        java.lang.IllegalArgumentException - if any taper value is not between 0 and 1.
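The range rule shared by verifyTaper and setTaper can be illustrated with a small stand-alone check. This is a sketch rather than the library's implementation: TaperCheck is a hypothetical class, and treating the bounds 0 and 1 as inclusive is an assumption, because the documentation only says "between 0 and 1".

```java
// Hypothetical re-implementation of the documented range check; not RLink's own code.
final class TaperCheck {
    private TaperCheck() {}

    /** Throws IllegalArgumentException if either taper value lies outside [0, 1]. */
    static void verifyTaper(double positiveTaper, double negativeTaper) {
        requireUnitInterval("positiveTaper", positiveTaper);
        requireUnitInterval("negativeTaper", negativeTaper);
    }

    private static void requireUnitInterval(String name, double value) {
        // Assumption: both bounds are inclusive. The negated form also rejects NaN.
        if (!(value >= 0.0 && value <= 1.0)) {
            throw new IllegalArgumentException(name + " must be between 0 and 1, got " + value);
        }
    }
}
```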
      • getPositiveTaper

        public static double getPositiveTaper​(int modelId)
        Returns:
        the tapering value for true-labeled items
      • getNegativeTaper

        public static double getNegativeTaper​(int modelId)
        Returns:
        the tapering value for false-labeled items
      • verifyMissingInfoLimit

        public static void verifyMissingInfoLimit​(double value)
        Throws:
        java.lang.IllegalArgumentException - if the missing info limit is not between 0 and 1.
      • setMissingInfoLimit

        public static void setMissingInfoLimit​(int modelId,
                                               double value)
        Sets the percentage of values that may be missing in a generated training example for a subset, expressed as a value between 0 and 1. This is only relevant when SubsetTrainMode.FIXED is used. For model creation, this is typically set in ModelConfig. If the model is read from a file, the missing info limit is set to its default. This method can be used to change it before any further training of the loaded model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        value - percentage of values that may be missing in a generated example, between 0 and 1.
        Throws:
        java.lang.IllegalArgumentException - if the missing info limit is not between 0 and 1 or the modelId is invalid.
      • getMissingInfoLimit

        public static double getMissingInfoLimit​(int modelId)
        Parameters:
        modelId - ID of the model that was created or read from file.
        Returns:
        the percentage of values that may be missing in a generated training example for a subset. This is only relevant when the model was trained in SubsetTrainMode.FIXED mode.
        Throws:
        java.lang.IllegalArgumentException - if the modelId is invalid.
        java.lang.IllegalStateException - if model was not trained in SubsetTrainMode.FIXED mode.
      • setDynamic

        public static void setDynamic​(int modelId,
                                      double[] trueInsertScores,
                                      double[] falseInsertScores,
                                      double[] trueRemoveLimits,
                                      double[] falseRemoveLimits)
        Sets parameters for removal and insertion of scores to generate related training vectors when using SubsetTrainMode.DYNAMIC. For model creation, this is typically set in STDynamic. This method can be used to change the parameters before any further training of the loaded model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        trueInsertScores - scores to insert into true-labeled vectors when filling in a missing score, one per feature. Each score should be close to 1.0 and must be between 0.0 and 1.0, or RL_DYN_NO_INSERT, or RL_DYN_INSERT_DEFAULT.
        falseInsertScores - scores to insert into false-labeled vectors when filling in a missing score, one per feature. Each score should be close to 0.0 and must be between 0.0 and 1.0, or RL_DYN_NO_INSERT, or RL_DYN_INSERT_DEFAULT.
        trueRemoveLimits - limits above which scores will not be removed from true-labeled vectors, one per feature. It is recommended that each limit be no higher than 0.75. Each must be between 0.0 and 1.0, or RL_DYN_NO_REMOVE.
        falseRemoveLimits - limits below which scores will not be removed from false-labeled vectors, one per feature. It is recommended that each limit be no lower than 0.50. Each must be between 0.0 and 1.0, or RL_DYN_NO_REMOVE.
        Throws:
        java.lang.IllegalArgumentException - if length of any given array does not match the number of features for this model, or an array contains an invalid value.
        java.lang.ArrayIndexOutOfBoundsException - if modelId is invalid.
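The array rules for setDynamic (length equal to the feature count, each value within [0, 1] or a permitted special constant) can be sketched as a pre-check. Everything here is illustrative: DynamicParamCheck is a hypothetical class, and because this page does not give the numeric values of RL_DYN_NO_INSERT, RL_DYN_INSERT_DEFAULT or RL_DYN_NO_REMOVE, the caller passes the permitted special values in explicitly.

```java
// Hypothetical pre-check for one setDynamic array; not RLink's own validation.
final class DynamicParamCheck {
    private DynamicParamCheck() {}

    /**
     * Verifies that values has exactly featureCount entries and that each entry is
     * within [0.0, 1.0] or equal to one of the caller-supplied special values.
     */
    static void verify(String name, double[] values, int featureCount, double... specialValues) {
        if (values == null || values.length != featureCount) {
            throw new IllegalArgumentException(
                    name + " must contain exactly " + featureCount + " values");
        }
        for (int i = 0; i < values.length; i++) {
            double v = values[i];
            boolean inRange = v >= 0.0 && v <= 1.0;
            boolean special = false;
            for (double s : specialValues) {
                if (v == s) {
                    special = true;
                }
            }
            if (!inRange && !special) {
                throw new IllegalArgumentException(name + "[" + i + "] is invalid: " + v);
            }
        }
    }
}
```

For example, verify("trueInsertScores", scores, featureCount, noInsert, insertDefault) would be called once per array, where noInsert and insertDefault are whatever values the real constants hold.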
      • getTrueInsertScores

        public static double[] getTrueInsertScores​(int modelId)
        Returns:
        the scores to insert into true-labeled vectors when filling in a missing score.
      • getFalseInsertScores

        public static double[] getFalseInsertScores​(int modelId)
        Returns:
        the scores to insert into false-labeled vectors when filling in a missing score.
      • getTrueRemoveLimits

        public static double[] getTrueRemoveLimits​(int modelId)
        Returns:
        the scores above which values will not be removed from true-labeled vectors
      • getFalseRemoveLimits

        public static double[] getFalseRemoveLimits​(int modelId)
        Returns:
        the scores below which values will not be removed from false-labeled vectors
      • write

        public static void write​(int modelId,
                                 java.lang.String fileName)
                          throws java.io.IOException
        Saves the existing model to file. Uses the same format as the file that was read.
        Parameters:
        modelId - ID of the model that was created or read from file.
        fileName - name of the model binary file.
        Throws:
        java.lang.IllegalArgumentException - if fileName is null.
        java.io.IOException
      • read

        public static int read​(java.lang.String fileName)
                        throws java.io.FileNotFoundException
        Loads a model from file. The subset training mode, the positive and negative taper, and the missing info limit are set to their default values. See the description of the ModelConfig class for the default values.
        Parameters:
        fileName - name of the model binary file.
        Returns:
        the model ID of the newly loaded model
        Throws:
        java.io.FileNotFoundException - if the specified model file does not exist.
        java.lang.IllegalArgumentException - if fileName is null.
      • setHeader

        public static void setHeader​(java.lang.String inFile,
                                     java.lang.String outFile,
                                     java.lang.String metaData,
                                     java.lang.String version,
                                     double threshold)
        Copies a model file from one location to another, updating one or more of the header values. This is much faster than loading the model into memory and then writing it out. The fastest way to update header values is to use this method to copy the file, delete the old file, and then rename the new file to the old name.
        Parameters:
        inFile - path to input file.
        outFile - path to output file.
        metaData - new meta data value for header. If null, the meta data is not updated.
        version - new version value for header. If null, the version is not updated.
        threshold - new threshold value for header. If negative, the threshold is not updated.
        Throws:
        java.lang.IllegalArgumentException - on any errors reading or writing the files.
        java.lang.NullPointerException - if a required argument is null or on other errors.
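The copy, delete and rename sequence described for setHeader can be sketched with java.nio.file. This is a minimal sketch under stated assumptions: HeaderCopier is a hypothetical interface that mirrors the documented setHeader signature so the example runs without the native library, and the ".tmp" sibling file name is an arbitrary choice. Files.move with REPLACE_EXISTING is used here to combine "delete the old file" and "rename the new file" into one call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of an in-place header update built on the documented setHeader call.
final class HeaderUpdateSketch {
    /** Mirrors RLink.setHeader(inFile, outFile, metaData, version, threshold). */
    interface HeaderCopier {
        void copy(String inFile, String outFile, String metaData, String version, double threshold);
    }

    private HeaderUpdateSketch() {}

    /** Writes an updated copy next to modelFile, then replaces modelFile with it. */
    static void updateInPlace(Path modelFile, String metaData, String version,
                              double threshold, HeaderCopier copier) throws IOException {
        // Arbitrary temporary name in the same directory, so the final move is a rename.
        Path tmp = modelFile.resolveSibling(modelFile.getFileName() + ".tmp");
        try {
            copier.copy(modelFile.toString(), tmp.toString(), metaData, version, threshold);
            // Replaces the old file with the updated copy.
            Files.move(tmp, modelFile, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Removes a leftover temporary file if the copy or the move failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

With the real library, the copier argument could be the method reference RLink::setHeader, assuming the class is on the classpath.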
      • setMetadata

        public static void setMetadata​(int modelId,
                                       java.lang.String metadata)
        Sets the notes (meta-data) field for an existing model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        metadata - the text of meta-data. May be null to clear meta-data.
      • getMetadata

        public static java.lang.String getMetadata​(int modelId)
        Returns the notes (meta-data) of the model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        Returns:
        notes (meta-data) from an existing model.
      • getThreshold

        public static double getThreshold​(int modelId)
        Returns the threshold for this model. If no threshold was set for this model, -1.0 is returned.
        Parameters:
        modelId - ID of the model that was created or read from file.
        Returns:
        the threshold value for this model.
      • setThreshold

        public static void setThreshold​(int modelId,
                                        double threshold)
        Set the cutoff threshold for the model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        threshold - the threshold value for this model. The value -1.0 indicates that the threshold from the model should not be used.
        Throws:
        java.lang.IllegalArgumentException - if threshold is greater than 1.0.
      • getVersion

        public static java.lang.String getVersion​(int modelId)
        Return the version ID for this model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        Returns:
        the version string for this model, or null if it is not set.
      • setVersion

        public static void setVersion​(int modelId,
                                      java.lang.String version)
        Set the version ID of the model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        version - version ID as a string. This should not be null.
      • getID

        public static java.lang.String getID​(int modelId)
        Return the unique ID for this model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        Returns:
        the ID string for this model, or null if it is not set.
      • setID

        public static void setID​(int modelId,
                                 java.lang.String id)
        Set the ID of the model.
        Parameters:
        modelId - ID of the model that was created or read from file.
        id - unique ID. This should not be null.
      • getSkippedCount

        public static int getSkippedCount​(int modelId)
        Skipped count is applicable only for TAPER subset training mode. This parameter is reset to 0 after a model has been loaded from file.
        Parameters:
        modelId - ID of the model that was created or read from file.
        Returns:
        the number of generated examples that were skipped during learning.
      • getTrainedCount

        public static int getTrainedCount​(int modelId)
        Trained count is applicable only for TAPER subset training mode. This parameter is reset to 0 after a model has been loaded from file.
        Parameters:
        modelId - ID of the model that was created or read from file.
        Returns:
        the number of generated examples that were used (not skipped) during learning.
      • getFeatureCount

        public static int getFeatureCount​(int modelId)
        Gets the number of model features.
        Parameters:
        modelId - ID of the model that was created or read from file.
        Returns:
        the number of features in the specified model.
      • getNorm

        public static double getNorm​(int modelId)
        Returns:
        the Minkowski norm value for combining feature values used by the specified model.
      • getPrecisionBits

        public static int getPrecisionBits​(int modelId)
        Returns number of precision bits, or -1 for file versions below RFV5.
        Returns:
        the precision of internal model weights of the specified model.
      • getInitialLearningRate

        public static double getInitialLearningRate​(int modelId)
        Returns the initial learning rate, or -1 for file versions below RFV5.
        Returns:
        the initial learning rate of the specified model.
      • setAnnealingRate

        public static void setAnnealingRate​(int modelId,
                                            double value)
        Sets the annealing rate (the speed of the learning rate decrease with each training iteration).
        Parameters:
        value - the annealing rate. Larger values decrease the learning rate faster. 0 means the learning rate stays the same.
      • getThermometerType

        public static RLink.ThermometerType getThermometerType​(int modelId)
        Returns the thermometer type, or throws an exception for file versions below RFV5 (thermometer type is -1, which is invalid).
        Returns:
        the thermometer type used by the specified model.
        Throws:
        java.lang.IllegalArgumentException - for file versions below RFV5.
      • predict

        public static RLinkOut predict​(int modelId,
                                       double[] featureValues,
                                       PredictOptions predictOpts)
        Predicts the score (and label), calculates the requested confidence and significance.
        Parameters:
        modelId - ID of the model that was created or read from file.
        featureValues - the feature vector.
        predictOpts - the options to use for this prediction. If null, the default options are used.
        Returns:
        the prediction of the existing model for the given feature vector.
      • learn

        public static void learn​(int modelId,
                                 double[] featureValues,
                                 boolean label)
        Trains the existing model with the given training example (feature vector and label).
        Parameters:
        modelId - ID of the model to be trained.
        featureValues - the feature vector.
        label - the actual label to be learned for this feature vector.
      • beginIteration

        public static void beginIteration​(int modelId)
        Must be called before each training iteration with the entire training dataset.
        Parameters:
        modelId - ID of the model being trained.
      • endIteration

        public static void endIteration​(int modelId)
        Must be called after each training iteration with the entire training dataset.
        Parameters:
        modelId - ID of the model being trained.
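The calling order implied by beginIteration, learn and endIteration can be sketched as a training loop in which each iteration passes over the entire dataset. This is an illustration under stated assumptions, not code from the library: the Trainer interface is a hypothetical stand-in mirroring the three documented signatures so the loop runs without JNI, and TrainingLoopSketch, Example and train are names invented for this example.

```java
import java.util.List;

// Hypothetical sketch; Trainer stands in for the static RLink methods of the same names.
final class TrainingLoopSketch {
    /** Mirrors RLink.beginIteration, RLink.learn and RLink.endIteration. */
    interface Trainer {
        void beginIteration(int modelId);
        void learn(int modelId, double[] featureValues, boolean label);
        void endIteration(int modelId);
    }

    /** One labeled training example: a feature vector and its label. */
    static final class Example {
        final double[] featureValues;
        final boolean label;

        Example(double[] featureValues, boolean label) {
            this.featureValues = featureValues;
            this.label = label;
        }
    }

    private TrainingLoopSketch() {}

    /** Runs the given number of iterations, each over the entire dataset. */
    static void train(Trainer trainer, int modelId, List<Example> dataset, int iterations) {
        for (int i = 0; i < iterations; i++) {
            trainer.beginIteration(modelId);   // before each pass over the dataset
            for (Example example : dataset) {
                trainer.learn(modelId, example.featureValues, example.label);
            }
            trainer.endIteration(modelId);     // after each pass over the dataset
        }
    }
}
```

How many iterations to run, and whether to stop early, is left to the caller; this page does not prescribe either.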