TIBCO® Patterns 6.1.2

Class LowConfPairFinder

  • java.lang.Object
    • com.tibco.patterns.learn.api.autopair.LowConfPairFinder

  • public final class LowConfPairFinder
    extends java.lang.Object
    Finds new low confidence pairs to be labeled and added to the project. Searches the data table using records taken from the same data table, and constructs new pairs from the low confidence matching records that are found. This helps train score ranges of the model that are untrained, insufficiently trained, or conflicting.

    While this object is in use, do not save a new trained model using the same ModelTrainer object, because doing so would remove the existing model from the server.

    • Constructor Detail

      • LowConfPairFinder

        public LowConfPairFinder(RecPairMap recPairMap,
                                 FeatureQuery featureQuery,
                                 ModelTrainer mTrainer,
                                 java.lang.String startPos)
                          throws com.netrics.likeit.NetricsException,
                                 java.io.IOException
        Constructs the object that can find new low confidence pairs.
        Parameters:
        recPairMap - contains all record pairs of the active model. Not null. The LOW_CONF dataset must be cleared before the next() method is used.
        featureQuery - contains features of the active model. Not null.
        mTrainer - model trainer that has the data table and model loaded to the server, has the model created or loaded in the ModelTrainer, and has the scorer created.
        startPos - the initial position in the data table. If null or the position is at the end of the table, starts at the first record (by ID).
        Throws:
        java.lang.IllegalStateException - if the table or model has not been loaded to the server, or if the model was not created or loaded in the ModelTrainer, or if the table is empty, or if the feature confidence measure is not used.
        com.netrics.likeit.NetricsException - if the server indicates that an error has occurred.
        java.io.IOException - if an I/O error occurs while communicating with the server.
    • Method Detail

      • getDftMaxPairsPerQuery

        public static int getDftMaxPairsPerQuery()
        Returns:
        Default maximum number of pairs that are returned for one originating record.
      • getDftMaxFalsePairsPerQuery

        public static int getDftMaxFalsePairsPerQuery()
        Returns:
        Default maximum number of False pairs returned for one originating record.
      • getDftMaxConfThreshold

        public static double getDftMaxConfThreshold()
        Returns:
        Default maximum confidence limit of found pairs.
      • setInitialConfThreshold

        public void setInitialConfThreshold(double initialConfThreshold)
        Sets the initial confidence limit of all returned pairs. Resets the current confidence limit to the same value. Only pairs with confidence at or below this confidence limit are returned. Using lower values for this parameter can limit the number of returned pairs and help focus on pairs with very low confidence first. If too few pairs are found, the confidence threshold is automatically increased.
        Throws:
        java.lang.IllegalArgumentException - if the parameter value is not between the minimum and maximum confidence limits.
      • suggestInitialConfThreshold

        public void suggestInitialConfThreshold(double initialConfThreshold)
        Sets the initial confidence threshold after restricting it within valid limits.
        See Also:
        setInitialConfThreshold(double)
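        The difference between the strict setter and the lenient "suggest" variant can be illustrated with a small standalone sketch. It does not use the TIBCO classes: the class name, the method names, and the bounds 0.0 and 1.0 are assumptions made for illustration, since this page does not state the actual minimum and maximum limits.

```java
class ConfThresholdSketch {
    // Hypothetical bounds; the real minimum and maximum limits are not given on this page.
    static final double MIN_CONF = 0.0;
    static final double MAX_CONF = 1.0;

    /** Strict variant (like setInitialConfThreshold): rejects out-of-range values. */
    static double strict(double value) {
        if (Double.isNaN(value) || value < MIN_CONF || value > MAX_CONF) {
            throw new IllegalArgumentException("threshold out of range: " + value);
        }
        return value;
    }

    /** Lenient variant (like suggestInitialConfThreshold): clamps the value into range. */
    static double lenient(double value) {
        return Math.max(MIN_CONF, Math.min(MAX_CONF, value));
    }

    public static void main(String[] args) {
        System.out.println(lenient(1.5));  // clamped to the upper bound
        System.out.println(lenient(-0.2)); // clamped to the lower bound
        try {
            strict(1.5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

        The lenient form is convenient when the value comes from a saved session or user input and a hard failure is not wanted.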
      • setMaxConfTheshold

        public void setMaxConfTheshold(double maxConfThreshold)
        Sets the maximum confidence limit of all returned pairs. Resets the current confidence limit to the initial confidence limit. If too few pairs are found, the current confidence limit is automatically increased, but it is never larger than the maximum confidence limit.
        Throws:
        java.lang.IllegalArgumentException - if the value is below the minimum or the initial confidence threshold.
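        The interplay of the initial, current, and maximum confidence limits described above can be modeled in a few lines. This is a standalone sketch, not the library's implementation: the class, its method names, and in particular the fixed increment step are assumptions, because this page does not say by how much the current limit is raised when too few pairs are found.

```java
class ThresholdModelSketch {
    private final double step; // hypothetical increment; the real rule is not documented here
    private double initial;
    private double max;
    private double current;

    ThresholdModelSketch(double initial, double max, double step) {
        if (initial > max || step <= 0) {
            throw new IllegalArgumentException("need initial <= max and step > 0");
        }
        this.initial = initial;
        this.max = max;
        this.step = step;
        this.current = initial;
    }

    /** Setting the initial limit also resets the current limit to it. */
    void setInitial(double value) {
        if (value > max) {
            throw new IllegalArgumentException("initial above maximum");
        }
        initial = value;
        current = value;
    }

    /** Setting the maximum resets the current limit back to the initial limit. */
    void setMax(double value) {
        if (value < initial) {
            throw new IllegalArgumentException("maximum below initial");
        }
        max = value;
        current = initial;
    }

    /** Raises the current limit when too few pairs are found, never above the maximum. */
    void raiseAfterTooFewPairs() {
        current = Math.min(max, current + step);
    }

    double getCurrent() {
        return current;
    }

    public static void main(String[] args) {
        ThresholdModelSketch m = new ThresholdModelSketch(0.25, 0.75, 0.25);
        m.raiseAfterTooFewPairs();
        m.raiseAfterTooFewPairs();
        m.raiseAfterTooFewPairs(); // already capped at the maximum
        System.out.println(m.getCurrent());
        m.setMax(1.0);             // resets the current limit to the initial limit
        System.out.println(m.getCurrent());
    }
}
```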
      • setSavedPairs

        public void setSavedPairs(java.util.List<RecPair> savedPairs)
                           throws com.netrics.likeit.NetricsException,
                                  java.io.IOException
        Recalculates model predictions for the saved pairs using the current model and remembers the pairs whose confidence is still below the current confidence threshold. Should be called after the initial confidence threshold is set, and before processing any input records.
        Parameters:
        savedPairs - unprocessed pairs saved from the last LowConfPairFinder. Not null.
        Throws:
        com.netrics.likeit.NetricsException - if the server indicates that an error has occurred.
        java.io.IOException - if an I/O error occurs.
      • setDebugOutput

        public void setDebugOutput(boolean value)
        Parameters:
        value - if true, query debug information is output.
      • isDebugOutput

        public boolean isDebugOutput()
        Returns:
        true if query debug information is output.
      • processNextInputRec

        public int processNextInputRec()
                                throws com.netrics.likeit.NetricsException,
                                       java.io.IOException
        Finds low confidence pairs for the next input record in the table. Scores, predicts, and adds the found pairs (excluding duplicates) to the common data structure. This is the only method that can be called from the concurrent producer thread.
        Returns:
        the number of found pairs for this input record that were saved in this object.
        Throws:
        com.netrics.likeit.NetricsException - if the server indicates that an error has occurred.
        java.io.IOException - if an I/O error occurs.
      • next

        public RecPair next(RecPair.Label prevLabel,
                            DataPartition prevPartition)
        Returns the next constructed pair to be labeled. Removes that pair from this object. The pair is not in the project (such pairs are skipped). This is the main method that can be called from a concurrent consumer thread. The LOW_CONF dataset must be cleared before this method is used since any pairs already in that dataset will not be returned (unless the method is used to save the found pairs one by one into the previously cleared LOW_CONF dataset).
        Parameters:
        prevLabel - the label that the user assigned to the previously returned pair. Used to determine what pairs to return next based on label balance. Can be null if no previous pair exists, or if this functionality is not needed.
        prevPartition - the dataset that the previously returned pair was added to. If it is FSUBSETS, no more pairs from the last pair's subset or its subsets will be returned. Can be null if no previous pair exists or if this functionality is not needed.
        Returns:
        the pair to be labeled, or null if there are no more pairs or all remaining pairs were skipped.
      • hasNext

        public boolean hasNext()
        Checks if more constructed pairs exist. Even if they exist, next() may return null if all of the remaining pairs need to be skipped (e.g. they are already in the project). Can be called from a concurrent consumer thread.
        Returns:
        true if there are more constructed pairs.
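        The threading contract described for processNextInputRec(), next(), and hasNext() suggests a producer/consumer arrangement. The sketch below shows one way such a loop could be structured. It does not use the TIBCO classes: FakeFinder is a hypothetical stand-in that only mimics the contract, pairs and labels are plain strings, prevPartition is omitted, and the producer processes a fixed number of records because this page does not say which thread may call hasMoreInputRecords().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class ProducerConsumerSketch {

    /** Hypothetical stand-in for the finder; it only mimics the threading contract. */
    static class FakeFinder {
        private final ConcurrentLinkedQueue<String> found = new ConcurrentLinkedQueue<>();
        private int nextRecord = 0; // touched by the producer thread only

        /** Producer side: "finds" exactly one pair per input record. */
        int processNextInputRec() {
            found.add("pair-" + nextRecord++);
            return 1;
        }

        /** Consumer side: true if stored pairs exist. */
        boolean hasNext() {
            return !found.isEmpty();
        }

        /** Consumer side: may return null; prevLabel is ignored by this stand-in. */
        String next(String prevLabel) {
            return found.poll();
        }
    }

    /** Runs one producer thread and consumes on the calling thread. */
    static List<String> run(int recordCount) {
        FakeFinder finder = new FakeFinder();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < recordCount; i++) {
                finder.processNextInputRec();
            }
        });
        producer.start();

        List<String> returned = new ArrayList<>();
        String prevLabel = null; // no previous pair on the first call
        // Keep going while the producer may still add pairs or stored pairs remain.
        while (producer.isAlive() || finder.hasNext()) {
            String pair = finder.next(prevLabel);
            if (pair == null) {
                Thread.yield(); // nothing available yet, or everything was skipped
                continue;
            }
            returned.add(pair);
            prevLabel = "TRUE"; // placeholder for the label a user would assign
        }
        return returned;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // prints [pair-0, pair-1, pair-2, pair-3, pair-4]
    }
}
```

        Note that the consumer treats a null result as "try again" rather than "finished", since next() may return null even when hasNext() is true.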
      • getNStoredPairs

        public int getNStoredPairs()
        Can be called from a concurrent consumer thread.
        Returns:
        the number of found pairs that are currently stored in this object.
      • getNReturnedPairs

        public int getNReturnedPairs()
        Can only be used in the consumer thread.
        Returns:
        the number of pairs that were already returned from this object.
      • getNReturnedFalsePairs

        public int getNReturnedFalsePairs()
        Can only be used in the consumer thread.
        Returns:
        the number of False pairs that were already returned from this object and whose label was reported back to this object (in the next() call).
      • getNSkippedThreshold

        public int getNSkippedThreshold()
        Can only be used in the consumer thread.
        Returns:
        the number of pairs that were not returned, but skipped because the threshold was adjusted and the pair confidence is now above the threshold.
      • getNSkippedSimilar

        public int getNSkippedSimilar()
        Can only be used in the consumer thread.
        Returns:
        the number of pairs that were not returned, but skipped because the feature values were very similar to the last returned pair.
      • getNSkippedExists

        public int getNSkippedExists()
        Can only be used in the consumer thread.
        Returns:
        the number of pairs that were not returned, but skipped because they already exist in the project.
      • getNSkippedFalse

        public int getNSkippedFalse()
        Can only be used in the consumer thread.
        Returns:
        the number of pairs that were not returned, but skipped because their subsets were previously added to the Always False dataset.
      • getNSkippedPairs

        public int getNSkippedPairs()
        Can only be used in the consumer thread.
        Returns:
        the number of pairs that were not returned, but skipped for any reason.
      • hasMoreInputRecords

        public boolean hasMoreInputRecords()
        Returns:
        true if there are more input records in the table that are left to process.
      • getNProcessedInputRecs

        public int getNProcessedInputRecs()
        Get the number of processed records. Can be called from a concurrent consumer thread.
        Returns:
        the number of processed input records (total number of queries).
      • getLastTablePos

        public java.lang.String getLastTablePos()
        Returns:
        the position of the next record to be processed in the table, as it existed at the time the last found pair was returned. If no pairs were ever returned, returns the current position in the table.
      • getLastConfThreshold

        public double getLastConfThreshold()
        Gets the last confidence threshold. Must only be used from the consumer thread.
        Returns:
        the confidence threshold used at the time the last found pair was returned. If no pairs were ever returned, returns the current confidence threshold.
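        The values returned by getLastTablePos() and getLastConfThreshold() look suited to resuming a later session, for example by passing the position as startPos to the constructor and the threshold to suggestInitialConfThreshold(double). That workflow is an assumption; this page does not prescribe it. The standalone sketch below only shows how the two values could be stored and read back with java.util.Properties; the class name and property keys are hypothetical.

```java
import java.util.Properties;

class FinderCheckpointSketch {
    // Hypothetical property keys chosen for this example.
    static final String KEY_POS = "lastTablePos";
    static final String KEY_CONF = "lastConfThreshold";

    /** Stores the two values; a null position is represented by leaving the key out. */
    static Properties save(String lastTablePos, double lastConfThreshold) {
        Properties p = new Properties();
        if (lastTablePos != null) {
            p.setProperty(KEY_POS, lastTablePos);
        }
        p.setProperty(KEY_CONF, Double.toString(lastConfThreshold));
        return p;
    }

    /** Returns the saved position, or null (meaning: start at the first record). */
    static String restoreStartPos(Properties p) {
        return p.getProperty(KEY_POS);
    }

    /** Returns the saved threshold, or the fallback if none was saved. */
    static double restoreThreshold(Properties p, double fallback) {
        String s = p.getProperty(KEY_CONF);
        return s == null ? fallback : Double.parseDouble(s);
    }

    public static void main(String[] args) {
        Properties p = save("rec-42", 0.35);
        System.out.println(restoreStartPos(p));
        System.out.println(restoreThreshold(p, 0.5));
    }
}
```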