Class ModelSettings

  • All Implemented Interfaces:
    FeatureListener

    public final class ModelSettings
    extends java.lang.Object
    implements FeatureListener
    Stores all settings for a single Patterns Learn model.xml file.
    • Method Detail

      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • getVersion

        public int getVersion()
        No setter: version is not configurable. New files will always be saved using the latest XML format.
        Returns:
        XML version of the file.
      • getName

        public java.lang.String getName()
        Returns:
        the name of the model.
      • setNote

        public void setNote​(java.lang.String value)
        Parameters:
        value - the note for this model.
      • getNote

        public java.lang.String getNote()
        Returns:
        the note for this model, or empty string if the note was not assigned.
      • getPrevModelName

        public java.lang.String getPrevModelName()
        No setter: previous model name is set automatically when saving a copy of the model.
        Returns:
        the name of the model that this model originated from. Null for current model.
      • setFieldType

        public void setFieldType​(int fieldIndex,
                                 int fieldType,
                                 float invalidValues)
        Assigns the given field type to the field with the given index. The field definitions must already be set.
        Parameters:
        fieldIndex - the index of the field to which the field type is assigned.
        fieldType - the field type value. Must be one of the supported constants in NetricsTable: FLDTYP_TEXT, FLDTYP_SRCHTEXT, FLDTYP_INT, FLDTYP_FLOAT, FLDTYP_DATE, FLDTYP_SRCHDATE.
        invalidValues - the percentage of field values that cannot be converted to the specified field type, expressed as a value between 0 and 1. Should be 0 for text field types.
        Throws:
        java.lang.IllegalArgumentException - if fieldType is not one of the supported field types, or if invalidValues is not between 0 and 1.
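        The invalidValues argument can be illustrated with a small standalone sketch. It assumes the value is a fraction between 0 and 1 (consistent with the exception condition above) and that "cannot be converted" means a parse failure. The class and method names here are hypothetical and are not part of the library; how the library itself computes this value is not documented here.

```java
import java.util.List;

class InvalidValueFraction {
    /** Returns the fraction (0..1) of values that cannot be parsed as an int. */
    static float invalidIntFraction(List<String> values) {
        if (values.isEmpty()) {
            return 0f; // nothing to convert, so nothing is invalid
        }
        int invalid = 0;
        for (String v : values) {
            try {
                Integer.parseInt(v.trim());
            } catch (NumberFormatException e) {
                invalid++; // value cannot be converted to an integer
            }
        }
        return (float) invalid / values.size();
    }

    public static void main(String[] args) {
        List<String> column = List.of("12", "7", "n/a", "42");
        // One of four values fails to parse.
        System.out.println(invalidIntFraction(column)); // prints 0.25
    }
}
```

        A value computed this way could then be passed as the third argument of setFieldType together with NetricsTable.FLDTYP_INT.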
      • getFieldType

        public int getFieldType​(int fieldIndex)
        Returns:
        field type for the field with the given fieldIndex. It is one of the supported NetricsTable.FLDTYP_... constants.
      • getFieldInvalidValues

        public float getFieldInvalidValues​(int fieldIndex)
        Returns:
        the percentage of invalid field values for the field with the given fieldIndex, expressed as a value between 0 and 1.
      • setFieldIgnore

        public void setFieldIgnore​(int fieldIndex,
                                   boolean value)
        Assigns the given Ignore value to the field with the given fieldIndex. The field names must already be set.
        Throws:
        java.lang.IllegalArgumentException - if the given field is the key field.
      • getFieldIgnore

        public boolean getFieldIgnore​(int fieldIndex)
        Returns:
        Ignore value for the field with the given fieldIndex.
      • getNIgnoredFields

        public int getNIgnoredFields()
        Returns:
        the number of fields marked as ignored.
      • getFeatureQuery

        public FeatureQuery getFeatureQuery()
        The returned reference can be used to add, modify, or delete features. Note that reassigning the data file or the key field recreates the feature query, so the reference must be obtained again.
        Returns:
        the internal FeatureQuery reference that contains all features for this model, or null if the field names (i.e. the data file) or the key field have not been assigned to this object.
      • getNFeatures

        public int getNFeatures()
        Returns:
        the number of user-defined features.
      • getNQlts

        public int getNQlts()
        Returns:
        the number of model features (querylets).
      • beforeFeatureChange

        public void beforeFeatureChange()
        Called by FeatureQuery before a feature is added, deleted, or replaced. Deletes any saved feature scores for all pairs and any cached table subsets, because they no longer match the feature query.
        Specified by:
        beforeFeatureChange in interface FeatureListener
        Throws:
        java.lang.IllegalStateException - if a saved trained model exists. A changed feature query (which is exportable) would become incompatible with the saved trained model.
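        The callback contract described above can be sketched with stand-in types. Everything in this block (Listener, Features, Settings, addRejected) is a hypothetical stand-in for FeatureListener, FeatureQuery, and ModelSettings, written only to show the order of events: the listener is notified first, and an IllegalStateException from it prevents the change.

```java
import java.util.ArrayList;
import java.util.List;

class ListenerSketch {
    /** Hypothetical stand-in for FeatureListener. */
    interface Listener {
        void beforeFeatureChange();
    }

    /** Hypothetical stand-in for FeatureQuery: notifies the listener before each change. */
    static class Features {
        private final List<String> names = new ArrayList<>();
        private final Listener listener;

        Features(Listener listener) {
            this.listener = listener;
        }

        void add(String name) {
            listener.beforeFeatureChange(); // may throw, which rejects the change
            names.add(name);
        }

        int size() {
            return names.size();
        }
    }

    /** Hypothetical stand-in for ModelSettings. */
    static class Settings implements Listener {
        boolean hasTrainedModel;
        final List<double[]> cachedScores = new ArrayList<>();

        @Override
        public void beforeFeatureChange() {
            if (hasTrainedModel) {
                throw new IllegalStateException("a saved trained model exists");
            }
            cachedScores.clear(); // cached scores no longer match the features
        }
    }

    /** Tries to add a feature and returns true if the change was rejected. */
    static boolean addRejected(Features features, String name) {
        try {
            features.add(name);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
```

        In this sketch a caller that wants to change features after training must first discard the trained model. Whether and how the real class allows that is not covered by this page.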
      • removeRecPair

        public void removeRecPair​(DataPartition partition,
                                  int index)
        Deletes a record pair with the given index from the specified dataset.
        Parameters:
        partition - the dataset to be used.
      • removeRecPair

        public void removeRecPair​(RecPairId recPairId)
        Deletes a record pair with the given keys. The pair may be in any dataset.
        Parameters:
        recPairId - the ID of the pair to be deleted. Not null.
        Throws:
        java.lang.IllegalArgumentException - if the given record pair is not found in any dataset.
      • removeRecPair

        public void removeRecPair​(DataPartition partition,
                                  RecPairId recPairId)
        Deletes a record pair with the given keys from the specified dataset.
        Parameters:
        partition - the dataset to be used.
        recPairId - the ID of the pair to be deleted. Not null.
        Throws:
        java.lang.IllegalArgumentException - if the given record pair is not found in this dataset.
      • clearRecPairs

        public void clearRecPairs()
        Deletes all record pairs from all datasets.
      • clearRecPairs

        public void clearRecPairs​(DataPartition partition)
        Deletes all record pairs from the specified dataset.
        Parameters:
        partition - the dataset to be used.
      • hasRecPairs

        public boolean hasRecPairs()
        Returns:
        true if any dataset contains at least one record pair.
      • getNRecPairs

        public int getNRecPairs​(DataPartition partition)
        Parameters:
        partition - the dataset to be used.
        Returns:
        the number of record pairs in the specified dataset.
      • getNUsedRecPairs

        public int getNUsedRecPairs()
        Returns:
        the number of record pairs in datasets used for model training.
      • getNRecPairs

        public int getNRecPairs()
        Returns:
        the number of record pairs in all datasets.
      • getNLabeledRecPairs

        public int getNLabeledRecPairs​(DataPartition partition)
        Parameters:
        partition - the dataset to be used.
        Returns:
        the number of record pairs that have boolean labels in the specified dataset.
      • minNLabeledRecPairs

        public int minNLabeledRecPairs​(DataPartition partition)
        Calculates the minimum number of labeled record pairs recommended for the given partition. The minimum increases with the number of features. The model should not be trained with fewer pairs.
        Parameters:
        partition - the dataset to be used.
        Returns:
        the minimum number of labeled record pairs recommended for the partition.
      • getRecPair

        public RecPair getRecPair​(DataPartition partition,
                                  int index)
        Parameters:
        partition - the dataset to be used. Not null.
        Returns:
        a record pair with the given index from the specified dataset. It is a new object with information copied from XML.
      • getRecPair

        public RecPair getRecPair​(RecPairId recPairId)
        Parameters:
        recPairId - the ID of the record pair. Not null.
        Returns:
        a record pair with the given keys from any dataset, or null if not found.
      • getDataset

        public java.util.List<RecPair> getDataset​(DataPartition partition)
        Converts record pairs in the given partition from XML structures to RecPair objects. This may be slow for large datasets, so avoid calling this method unnecessarily.
        Parameters:
        partition - the dataset to be used.
        Returns:
        all record pairs in the specified dataset.
      • getRecPairs

        public java.util.List<RecPair> getRecPairs​(java.util.Set<DataPartition> partitions)
        Converts record pairs in the specified datasets from XML structures to RecPair objects. This may be slow for large datasets, so avoid calling this method unnecessarily.
        Parameters:
        partitions - the datasets to be included. Not null. Typical values are DataPartition.usedValues() and DataPartition.modelValues().
        Returns:
        a single list with all record pairs taken from the specified datasets.
      • getAllRecPairs

        public java.util.List<RecPair> getAllRecPairs()
        Converts record pairs in all datasets from XML structures to RecPair objects. This may be slow for large datasets, so avoid calling this method unnecessarily.
        Returns:
        a single list with all record pairs taken from all datasets.
      • setRecPairLabel

        public void setRecPairLabel​(DataPartition partition,
                                    int index,
                                    RecPair.Label label)
        Sets the label of an existing record pair with the given index in the specified dataset.
        Parameters:
        partition - the dataset to be used.
        Throws:
        java.lang.NullPointerException - if label is null.
      • setRecPairLabel

        public void setRecPairLabel​(RecPairId recPairId,
                                    RecPair.Label label)
        Sets the label of an existing record pair with the given keys (in any dataset).
        Parameters:
        recPairId - the ID of the existing record pair. Not null.
        Throws:
        java.lang.NullPointerException - if label is null.
        java.lang.IllegalArgumentException - if the record pair is not found in any dataset.
      • setRecPairReview

        public void setRecPairReview​(DataPartition partition,
                                     int index,
                                     boolean review)
        Sets the review attribute of an existing record pair with the given index in the specified dataset.
        Parameters:
        partition - the dataset to be used.
      • setRecPairReview

        public void setRecPairReview​(RecPairId recPairId,
                                     boolean review)
        Sets the review attribute of an existing record pair with the given keys (in any dataset).
        Parameters:
        recPairId - the ID of the existing record pair. Not null.
        Throws:
        java.lang.IllegalArgumentException - if the record pair is not found in any dataset.
      • saveFeatureScoresToCsv

        public void saveFeatureScoresToCsv​(DataPartition partition,
                                           java.nio.file.Path csvFile)
                                    throws java.io.IOException
        Saves the pairs that have feature scores from the specified dataset to the CSV file. For each pair, the feature scores are saved followed by the label (0 or 1). The first row contains column titles.
        Parameters:
        partition - the dataset to be used.
        Throws:
        java.io.IOException - if file cannot be created or I/O error occurs during writing.
        java.lang.IllegalStateException - if featureQuery has not been created.
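        The row layout described above (a title row, then the feature scores followed by a 0 or 1 label) can be sketched without the library. The class name, method name, column titles, and the "label" header are all hypothetical. The real method's titles, number formatting, and quoting rules are not specified on this page.

```java
import java.util.ArrayList;
import java.util.List;

class FeatureScoreCsvSketch {
    /** Builds CSV lines: one title row, then one row per pair (scores, then 0/1 label). */
    static List<String> toCsvLines(List<String> featureNames,
                                   List<double[]> scores,
                                   List<Boolean> labels) {
        List<String> lines = new ArrayList<>();
        // First row: column titles (assumed to be feature names plus "label").
        lines.add(String.join(",", featureNames) + ",label");
        for (int i = 0; i < scores.size(); i++) {
            StringBuilder row = new StringBuilder();
            for (double score : scores.get(i)) {
                row.append(score).append(',');
            }
            row.append(labels.get(i) ? 1 : 0); // label written as 1 (true) or 0 (false)
            lines.add(row.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<double[]> scores = new ArrayList<>();
        scores.add(new double[] {0.5, 1.0});
        for (String line : toCsvLines(List.of("f1", "f2"), scores, List.of(true))) {
            System.out.println(line);
        }
    }
}
```

        The resulting lines could be written to a file with java.nio.file.Files.write, which throws java.io.IOException on failure in the same way the method above is documented to.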
      • saveDatasetToCsv

        public void saveDatasetToCsv​(DataPartition partition,
                                     java.nio.file.Path csvFile)
                              throws java.io.IOException
        Saves record pairs from the specified dataset to the CSV file. The saved pair format is "key1,key2,label".
        Parameters:
        partition - the dataset to be used.
        Throws:
        java.io.IOException - if file cannot be created or I/O error occurs during writing.
      • saveAllRecPairsToCsv

        public void saveAllRecPairsToCsv​(java.nio.file.Path csvFile)
                                  throws java.io.IOException
        Saves record pairs from all datasets to the CSV file. The saved pair format is "key1,key2,label".
        Throws:
        java.io.IOException - if file cannot be created or I/O error occurs during writing.
      • calcDatasetResult

        public DatasetResult calcDatasetResult​(DataPartition partition,
                                               double threshold)
        Calculates statistics for a custom threshold from the model predictions stored in one dataset.
        Parameters:
        partition - identifies the dataset used to calculate statistics.
        threshold - the threshold that separates 'true' and 'false' model scores.
        Returns:
        an object that contains statistics for this dataset.
        Throws:
        java.lang.IllegalStateException - if trained model has not been saved.
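        As a rough illustration of how a threshold separates 'true' and 'false' scores, the sketch below counts prediction outcomes for a given threshold. It is not the library's implementation. The class and method names are hypothetical, the contents of DatasetResult are not documented here, and treating a score equal to the threshold as 'true' is an assumption.

```java
class ThresholdStatsSketch {
    /** Returns {truePositives, falsePositives, trueNegatives, falseNegatives}. */
    static int[] confusion(double[] scores, boolean[] labels, double threshold) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predicted = scores[i] >= threshold; // assumption: ties count as 'true'
            if (predicted && labels[i]) {
                tp++;
            } else if (predicted) {
                fp++;
            } else if (labels[i]) {
                fn++;
            } else {
                tn++;
            }
        }
        return new int[] {tp, fp, tn, fn};
    }

    /** Fraction of pairs whose prediction disagrees with the label. */
    static double errorRate(int[] c) {
        int total = c[0] + c[1] + c[2] + c[3];
        return total == 0 ? 0.0 : (double) (c[1] + c[3]) / total;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.6, 0.4, 0.1};
        boolean[] labels = {true, false, true, false};
        int[] c = confusion(scores, labels, 0.5);
        System.out.println(errorRate(c)); // prints 0.5
    }
}
```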
      • optimizeScoreThreshold

        public double optimizeScoreThreshold​(DataPartition partition)
        Optimizes the score threshold to minimize the error rate and maximize the separation of the correct predictions.
        Parameters:
        partition - the dataset to be used.
        Returns:
        the optimal score threshold for the given dataset.
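        One possible reading of "minimize the error rate and maximize the separation" is sketched below: candidate thresholds are the midpoints between adjacent distinct scores, the candidate with the fewest errors wins, and ties go to the candidate lying in the widest gap between scores. This is an assumption made for illustration only. The library's actual optimization procedure is not described on this page, and all names in the block are hypothetical.

```java
import java.util.Arrays;

class ThresholdSearchSketch {
    /** Counts pairs whose prediction (score >= t) disagrees with the label. */
    static int errors(double[] scores, boolean[] labels, double t) {
        int n = 0;
        for (int i = 0; i < scores.length; i++) {
            if ((scores[i] >= t) != labels[i]) {
                n++;
            }
        }
        return n;
    }

    /** Picks the midpoint threshold with the fewest errors, then the widest gap. */
    static double bestThreshold(double[] scores, boolean[] labels) {
        double[] sorted = scores.clone();
        Arrays.sort(sorted);
        double best = 0.5; // fallback when there are fewer than two distinct scores
        int bestErrors = Integer.MAX_VALUE;
        double bestGap = -1.0;
        for (int i = 1; i < sorted.length; i++) {
            double gap = sorted[i] - sorted[i - 1];
            if (gap <= 0) {
                continue; // identical scores cannot be separated
            }
            double t = (sorted[i] + sorted[i - 1]) / 2;
            int e = errors(scores, labels, t);
            if (e < bestErrors || (e == bestErrors && gap > bestGap)) {
                best = t;
                bestErrors = e;
                bestGap = gap;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] scores = {0.1, 0.2, 0.7, 0.9};
        boolean[] labels = {false, false, true, true};
        // The chosen threshold lies midway between 0.2 and 0.7.
        System.out.println(bestThreshold(scores, labels));
    }
}
```

        Scanning only the midpoints is sufficient in this sketch because the error count can change only when the threshold crosses a score.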
      • hasTrainedModel

        public boolean hasTrainedModel()
        Call this method before requesting any other information about the trained model.
        Returns:
        true if a trained model has been assigned to this ModelSettings. This also implies that the model binary file was saved to the model directory.
      • getModelFileName

        public java.lang.String getModelFileName()
        Returns:
        name of binary model file. Trained model must be assigned.
      • getModelTrainDate

        public java.util.Date getModelTrainDate()
        Returns:
        timestamp when model was trained. Trained model must be assigned.
      • getModelTrainIter

        public int getModelTrainIter()
        Returns:
        model training iteration. Trained model must be assigned.
      • setModelScoreThreshold

        public void setModelScoreThreshold​(double value)
        Sets the prediction score threshold of the trained model. This is needed only when a custom threshold is used (the default threshold of 0.5 is saved with the trained model). This does not change the saved training results for datasets. A trained model must be assigned.
        Throws:
        java.lang.IllegalArgumentException - if threshold value is out of range.
      • getModelScoreThreshold

        public double getModelScoreThreshold()
        Returns:
        the prediction score threshold that has been saved for the trained model. Trained model must be assigned.
      • getDatasetResult

        public DatasetResult getDatasetResult​(DataPartition partition)
        Returns a saved dataset result.
        Parameters:
        partition - the partition that is used in training and has a saved dataset result.
        Returns:
        a newly created DatasetResult with information copied from the trained model in XML. All saved dataset results use the default threshold of 0.5.
        Throws:
        java.lang.IllegalStateException - if a trained model has not been saved.
        java.lang.IllegalArgumentException - if the given partition is not used in training.
      • createTrainedModelHints

        public java.util.List<Hint> createTrainedModelHints()
                                                     throws com.netrics.likeit.NetricsException,
                                                            java.io.IOException
        Creates hints that depend on model training results. May take a long time.
        Returns:
        a list of created hints of various derived types. List is empty if model has not been trained or no applicable hints can be created.
        Throws:
        java.io.IOException - if an I/O error occurs while communicating with the server.
        com.netrics.likeit.NetricsException - if the server indicates that an error has occurred.
        java.lang.IllegalStateException - if model trainer has not been created.