Class GroupingEngine


  • public class GroupingEngine
    extends java.lang.Object
    Implements the grouping algorithms.
    • Field Detail

      • debugging

        public boolean debugging
        Internal use only.
    • Constructor Detail

      • GroupingEngine

        public GroupingEngine()
        Creates a GroupingEngine object with default options.
    • Method Detail

      • setSkipThreshold

        public final void setSkipThreshold(float skip_threshold)
        Sets the skip threshold. Grouping ignores all links with a score below the skip threshold.

        Default is 0.0. Valid range is 0.0 to 1.0, inclusive.

        The skip threshold cannot be changed after a link has been added.
        Parameters:
        skip_threshold - Grouping ignores all links with scores below this value.
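        The filtering rule described above can be illustrated with a small stand-alone sketch. It is not the engine's implementation: the Link class and applySkipThreshold method are hypothetical names introduced only for this example, and the sketch assumes a link whose score equals the threshold is kept, since the text says only scores below the threshold are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipThresholdSketch {
    /** Hypothetical link record for illustration; not part of the documented API. */
    public static final class Link {
        public final String key1;
        public final String key2;
        public final float score;

        public Link(String key1, String key2, float score) {
            this.key1 = key1;
            this.key2 = key2;
            this.score = score;
        }
    }

    /** Returns the links whose score is at or above the skip threshold. */
    public static List<Link> applySkipThreshold(List<Link> links, float skipThreshold) {
        // The documented valid range is 0.0 to 1.0, inclusive.
        if (skipThreshold < 0.0f || skipThreshold > 1.0f) {
            throw new IllegalArgumentException("skip threshold must be in [0.0, 1.0]");
        }
        List<Link> kept = new ArrayList<>();
        for (Link link : links) {
            // Only scores below the threshold are dropped (assumption: equality is kept).
            if (link.score >= skipThreshold) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        links.add(new Link("a", "b", 0.9f));
        links.add(new Link("b", "c", 0.4f));
        for (Link link : applySkipThreshold(links, 0.5f)) {
            System.out.println(link.key1 + " - " + link.key2 + " (" + link.score + ")");
        }
    }
}
```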
      • getSkipThreshold

        public final float getSkipThreshold()
        Returns:
        The skip threshold. See setSkipThreshold(float).
      • setMergeThreshold

        public final void setMergeThreshold(float merge_threshold)
        Sets the merge threshold. Links that would link subgroups, but have a score strictly below this threshold, are ignored.

        Default is 0.0. Valid range is 0.0 to 1.0, inclusive.

        The merge threshold cannot be changed after processing has started.
        Parameters:
        merge_threshold - Links that would link subgroups, but have a score strictly below this threshold, are ignored.
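        One possible reading of the merge rule is sketched below with a union-find structure. This is an illustration under stated assumptions, not the engine's algorithm: it assumes a "subgroup" is any group that already holds two or more keys, and it processes links in the order they are added. The class MergeThresholdSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeThresholdSketch {
    private final Map<String, String> parent = new HashMap<>();
    private final Map<String, Integer> size = new HashMap<>();
    private final float mergeThreshold;

    public MergeThresholdSketch(float mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    /** Finds the representative key of the group, registering unseen keys as singletons. */
    private String find(String key) {
        parent.putIfAbsent(key, key);
        size.putIfAbsent(key, 1);
        String root = key;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every key on the path directly at the root.
        String current = key;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    /** Returns true if the link was accepted, false if the merge rule rejected it. */
    public boolean addLink(String key1, String key2, float score) {
        String root1 = find(key1);
        String root2 = find(key2);
        if (root1.equals(root2)) {
            return true; // both keys are already in the same group
        }
        // Assumption: a "subgroup" is a group that already has at least two keys.
        boolean joinsTwoSubgroups = size.get(root1) > 1 && size.get(root2) > 1;
        if (joinsTwoSubgroups && score < mergeThreshold) {
            return false; // strictly below the merge threshold: ignored
        }
        // Union by size: attach the smaller group under the larger one.
        if (size.get(root1) < size.get(root2)) {
            String tmp = root1;
            root1 = root2;
            root2 = tmp;
        }
        parent.put(root2, root1);
        size.put(root1, size.get(root1) + size.get(root2));
        return true;
    }

    public boolean sameGroup(String key1, String key2) {
        return find(key1).equals(find(key2));
    }

    public static void main(String[] args) {
        MergeThresholdSketch groups = new MergeThresholdSketch(0.8f);
        groups.addLink("a", "b", 0.9f);
        groups.addLink("c", "d", 0.9f);
        System.out.println(groups.addLink("b", "c", 0.5f)); // rejected in this sketch
        System.out.println(groups.sameGroup("a", "d"));
    }
}
```

        In this sketch a weak link can still attach a single new key to an existing group; only links that would join two established groups are held to the merge threshold.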
      • getMergeThreshold

        public final float getMergeThreshold()
        Returns:
        The merge threshold. See setMergeThreshold(float).
      • setScorePrecision

        public final void setScorePrecision(int precision)
        Sets the score precision. Default is 3. Valid range is 1 to 6, inclusive.
        Grouping truncates scores to the specified precision.
        For example, with a precision of 2:
         0.912 and 0.917 are considered equal. Both truncate to 0.91.
         0.82999 and 0.830 are considered unequal. 0.82999 truncates to 0.82 but 0.830 truncates to 0.83.

        A perfect score (1.0) is never considered equal to an imperfect score, regardless of precision.

        The precision cannot be changed after a link has been added.
        Parameters:
        precision - The new score precision.
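        The truncation examples above can be reproduced with a short stand-alone sketch. It is an illustration only, not the engine's code: the class and method names are hypothetical, and the sketch truncates the decimal text of the float (via Float.toString and BigDecimal) so that a value such as 0.83f is not pulled down to 0.82 by binary rounding error.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ScoreTruncationSketch {
    /** Truncates (never rounds up) a score to the given number of decimal places. */
    public static BigDecimal truncate(float score, int precision) {
        // The documented valid range for the precision is 1 to 6, inclusive.
        if (precision < 1 || precision > 6) {
            throw new IllegalArgumentException("precision must be in [1, 6]");
        }
        // Float.toString yields the shortest decimal text that identifies the float,
        // so 0.83f is parsed as exactly 0.83 rather than 0.829999983...
        return new BigDecimal(Float.toString(score)).setScale(precision, RoundingMode.DOWN);
    }

    /** Two scores are treated as equal when their truncated values match. */
    public static boolean sameScore(float a, float b, int precision) {
        return truncate(a, precision).compareTo(truncate(b, precision)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sameScore(0.912f, 0.917f, 2));   // both truncate to 0.91
        System.out.println(sameScore(0.82999f, 0.830f, 2)); // 0.82 versus 0.83
        System.out.println(sameScore(1.0f, 0.9999f, 2));    // 1.00 versus 0.99
    }
}
```

        Because truncation never rounds up, an imperfect score always truncates to a value below 1.0, which is consistent with the rule that a perfect score never equals an imperfect one.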
      • getScorePrecision

        public final int getScorePrecision()
        Returns:
        The score precision. See setScorePrecision(int).
      • associateKeys

        public void associateKeys(java.lang.String key1,
                                  java.lang.String key2,
                                  float score,
                                  java.lang.String linkId)
        Associates two keys, with a specified score.
        Usually, this is called once for each link.
        Parameters:
        key1 - The first key. Cannot be null.
        key2 - The second key. Cannot be null.
        score - The score. Must be between 0.0 and 1.0, inclusive.
        linkId - Optional, identifies the link. Link-ids are included in the grouping-engine output.
        Typically, link-ids are unique, but that is up to the application. The most common use of link-ids is as a reference back to the source of the link, e.g. a query result.
      • associateKeys

        public void associateKeys(java.lang.String key1,
                                  java.lang.String key2,
                                  float score)
        Like associateKeys(String,String,float,String), but without a link-id.

        Associates two keys, with a specified score.
        Usually, this is called once for each link.
        Parameters:
        key1 - The first key. Cannot be null.
        key2 - The second key. Cannot be null.
        score - The score. Must be between 0.0 and 1.0, inclusive.
      • getGroups

        public final java.util.Iterator<Group> getGroups()
        Begins formation of groups and returns an iterator over the groups.
        The iterator can be scanned once and cannot be reset.
        Links cannot be added after calling this.
        Returns:
        An iterator over the groups. This iterator can be scanned once and cannot be reset.
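        The ordering constraints stated in this section (configure, then add links, then form groups) can be summarized as a small state machine. The stand-in class below is a sketch, not GroupingEngine itself. It assumes that "processing has started" means the point at which groups are requested, and that a violation raises IllegalStateException; the documentation does not say how the real engine reacts. LifecycleSketch and startGrouping are hypothetical names.

```java
public class LifecycleSketch {
    private enum Phase { CONFIGURING, LINKING, GROUPING }

    private Phase phase = Phase.CONFIGURING;
    private float skipThreshold = 0.0f;
    private float mergeThreshold = 0.0f;
    private long inputLinkCount = 0;

    /** Allowed only before the first link is added. */
    public void setSkipThreshold(float skipThreshold) {
        if (phase != Phase.CONFIGURING) {
            throw new IllegalStateException("skip threshold is fixed once a link has been added");
        }
        this.skipThreshold = skipThreshold;
    }

    /** Allowed until grouping starts (assumption: that is when processing begins). */
    public void setMergeThreshold(float mergeThreshold) {
        if (phase == Phase.GROUPING) {
            throw new IllegalStateException("merge threshold is fixed once processing has started");
        }
        this.mergeThreshold = mergeThreshold;
    }

    /** Allowed until grouping starts; validates the documented argument rules. */
    public void associateKeys(String key1, String key2, float score) {
        if (phase == Phase.GROUPING) {
            throw new IllegalStateException("links cannot be added after groups are requested");
        }
        if (key1 == null || key2 == null) {
            throw new IllegalArgumentException("keys cannot be null");
        }
        if (score < 0.0f || score > 1.0f) {
            throw new IllegalArgumentException("score must be in [0.0, 1.0]");
        }
        phase = Phase.LINKING;
        inputLinkCount++;
    }

    /** Stands in for getGroups(): after this call no more links are accepted. */
    public void startGrouping() {
        phase = Phase.GROUPING;
    }

    public float getSkipThreshold() { return skipThreshold; }

    public float getMergeThreshold() { return mergeThreshold; }

    public long getInputLinkCount() { return inputLinkCount; }

    public static void main(String[] args) {
        LifecycleSketch engine = new LifecycleSketch();
        engine.setSkipThreshold(0.2f);
        engine.associateKeys("a", "b", 0.9f);
        engine.setMergeThreshold(0.5f); // still permitted in this sketch
        engine.startGrouping();
        System.out.println(engine.getInputLinkCount());
    }
}
```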
      • getGroupCount

        public long getGroupCount()
        Returns:
        the number of groups formed. Unavailable until the groups returned by getGroups() have all been processed.
      • getInputLinkCount

        public long getInputLinkCount()
        Returns:
        the total number of links processed.
      • getLinkCount

        public long getLinkCount()
        Returns:
        the number of links used to form groups.
      • getKeyCount

        public long getKeyCount()
        Returns:
        the number of keys found.