Class NetricsWeightedDictionary

    • Constructor Summary

      Constructors 
      Constructor Description
      NetricsWeightedDictionary(java.lang.String name)
      A weighted dictionary is used to weight terms that should have more or less relevance than other terms.
      NetricsWeightedDictionary(java.lang.String name, java.lang.String filename, java.lang.String encoding)
      Create a weighted dictionary from a CSV file.
    • Constructor Detail

      • NetricsWeightedDictionary

        public NetricsWeightedDictionary(java.lang.String name)
        A weighted dictionary is used to weight terms that should have more or less relevance than other terms. For instance, in a table of company names, the term "company" might be given less weight because it is less indicative of a match: it is shared by many unrelated company names.
        Parameters:
        name - The name of the weighted dictionary to be created.
      • NetricsWeightedDictionary

        public NetricsWeightedDictionary(java.lang.String name,
                                         java.lang.String filename,
                                         java.lang.String encoding)
        Create a weighted dictionary from a CSV file. The server reads the file, so the file must be accessible to the server. Each line defines one equivalence class, with items separated by commas. Each line must contain at least two items: the first item is the weight and the remaining items are the terms to be weighted. Do not call addEquivalenceClass on a dictionary created with this constructor; it throws an IllegalStateException.
        Parameters:
        name - The name of the dictionary to be created.
        filename - The name of the file (on the server) from which to read the thesaurus. The file must be located inside the server's loadable-data directory.
        encoding - The character encoding used in the file. Supported encodings are "UTF-8" and "LATIN1". Default: "LATIN1".
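The file layout described for this constructor (weight first, then the terms of one equivalence class) can be sketched with a small helper that prepares such a file. This is a minimal sketch, not part of the Netrics API: the class WeightedCsvSketch and its methods are hypothetical, the weights are illustrative values only, the quoting rule is a common CSV convention that is assumed rather than confirmed for the server's parser, and "LATIN1" is assumed to correspond to ISO-8859-1.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Netrics API: builds CSV lines in the
// documented layout (weight first, then the terms of one equivalence class).
final class WeightedCsvSketch {

    // Formats one equivalence class as a single CSV line.
    static String toCsvLine(double weight, String... terms) {
        if (terms.length < 1) {
            throw new IllegalArgumentException("a line needs a weight and at least one term");
        }
        StringBuilder line = new StringBuilder(Double.toString(weight));
        for (String term : terms) {
            line.append(',').append(quote(term));
        }
        return line.toString();
    }

    // Quotes a term only when it contains a comma, a quote, or a line break.
    // Assumption: the server accepts this common CSV quoting convention.
    static String quote(String term) {
        if (term.contains(",") || term.contains("\"") || term.contains("\n") || term.contains("\r")) {
            return "\"" + term.replace("\"", "\"\"") + "\"";
        }
        return term;
    }

    // Maps the encoding names from the constructor documentation to Java charsets.
    static Charset charsetFor(String encoding) {
        switch (encoding) {
            case "UTF-8":
                return StandardCharsets.UTF_8;
            case "LATIN1":
                return StandardCharsets.ISO_8859_1; // assumes LATIN1 means ISO-8859-1
            default:
                throw new IllegalArgumentException("unsupported encoding: " + encoding);
        }
    }

    // Writes the prepared lines to a file in the requested encoding.
    static void write(Path file, List<String> lines, String encoding) throws IOException {
        Files.write(file, lines, charsetFor(encoding));
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList(
                toCsvLine(0.2, "company", "co"),
                toCsvLine(0.5, "incorporated", "inc"));
        Path file = Files.createTempFile("weighted-dictionary", ".csv");
        write(file, lines, "UTF-8");
        System.out.println("Wrote " + lines.size() + " classes to " + file);
    }
}
```

The resulting file would still need to be placed in the server's loadable-data directory before its name is passed as the filename argument.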
    • Method Detail

      • addEquivalenceClass

        public void addEquivalenceClass(java.lang.String[] terms,
                                        double weight)
        Add an equivalence class of terms with an associated weight.

        Thesauri retrieved from a server via NetricsServerInterface.getThesaurus(String) cannot be modified. Thesauri created from a file via NetricsWeightedDictionary(String, String, String) cannot be modified.
        Parameters:
        terms - The terms in the equivalence class. All strings in the array are considered equal for the purpose of record scoring.
        weight - The weight assigned to the terms in this class.
        Throws:
        java.lang.IllegalStateException - if the thesaurus was created from a file or retrieved from a server.
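The modification rule above can be illustrated with a small stand-in class. This is a sketch only: DictionaryGuardSketch is hypothetical and is not the Netrics class; it models just the documented behavior that a dictionary created in memory accepts new classes, while one created from a file or retrieved from a server rejects them with an IllegalStateException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the Netrics class: models the documented rule that
// only dictionaries built in memory accept new equivalence classes.
final class DictionaryGuardSketch {
    private final boolean readOnly;
    private final List<String[]> classes = new ArrayList<>();
    private final List<Double> weights = new ArrayList<>();

    // readOnly = true stands for "created from a file or retrieved from a server".
    DictionaryGuardSketch(boolean readOnly) {
        this.readOnly = readOnly;
    }

    // Same parameter order as the documented method: terms first, then weight.
    void addEquivalenceClass(String[] terms, double weight) {
        if (readOnly) {
            throw new IllegalStateException("dictionary cannot be modified");
        }
        classes.add(terms.clone());
        weights.add(weight);
    }

    int size() {
        return classes.size();
    }

    public static void main(String[] args) {
        DictionaryGuardSketch editable = new DictionaryGuardSketch(false);
        editable.addEquivalenceClass(new String[] {"company", "co"}, 0.2);
        System.out.println("classes: " + editable.size());

        DictionaryGuardSketch fromFile = new DictionaryGuardSketch(true);
        try {
            fromFile.addEquivalenceClass(new String[] {"bank"}, 1.5);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Callers that may receive a dictionary from either source can guard the call the same way, by catching IllegalStateException around addEquivalenceClass.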
      • addClassesFrom

        public int addClassesFrom(NetricsFieldedReader rsrc)
                           throws NetricsFileFormatException,
                                  NetricsException
        Add a set of equivalence classes from a fielded record source. Each fielded record is treated as one equivalence class. The first entry of each record must be the class weight, given as the string representation of a floating-point value, and the remaining entries are the terms. Each record must therefore have at least two entries.
        Specified by:
        addClassesFrom in class NetricsBaseThesaurus
        Parameters:
        rsrc - a NetricsFieldedReader object that provides the equivalence classes.
        Returns:
        the number of classes added.
        Throws:
        NetricsFileFormatException - if there was an error reading records from the source.
        NetricsException - if an equivalence class has fewer than two entries or the first entry is not a floating-point value.
        See Also:
        NetricsFieldedReader, NetricsCSVReader
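The record rules for addClassesFrom (at least two entries, with the first one parseable as a floating-point weight) can be checked ahead of time with a sketch like the following. The class ClassRecordSketch is hypothetical, records are modeled as plain String arrays rather than a NetricsFieldedReader, and IllegalArgumentException stands in for NetricsException. It also assumes that Java's Double.parseDouble is a reasonable approximation of what counts as a floating-point value; the library's own parser may accept a different set of strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-check, not part of the Netrics API: applies the documented
// record rules to records modeled as String arrays.
final class ClassRecordSketch {

    // Returns the number of records that satisfy the rules.
    // Throws IllegalArgumentException (a stand-in for NetricsException) on the
    // first record that has fewer than two entries or a non-numeric weight.
    static int countValidClasses(List<String[]> records) {
        int added = 0;
        for (String[] record : records) {
            if (record.length < 2) {
                throw new IllegalArgumentException(
                        "record " + added + " has fewer than two entries");
            }
            try {
                Double.parseDouble(record[0]); // first entry is the class weight
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                        "record " + added + " has a non-numeric weight: " + record[0], e);
            }
            added++;
        }
        return added;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"0.2", "company", "co"},
                new String[] {"1.5", "bank"});
        System.out.println("valid classes: " + countValidClasses(records));
    }
}
```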
      • setCharmap

        public void setCharmap(java.lang.String name)
        Set the character map for this thesaurus.

        Set the character map that is used to map all class entries. This should be the same character map assigned to the table fields this thesaurus will be applied to. It defaults to the standard character map.

        Overrides:
        setCharmap in class NetricsBaseThesaurus
        Parameters:
        name - The name of an existing character map.
      • setExactMatchMode

        public void setExactMatchMode()
        Select exact match mode.

        Calling this method sets the match mode to exact matching; the default is inexact matching. In exact match mode, terms must match exactly (after the character map is applied) for the thesaurus match and any associated weighting or penalty to be applied. In the default inexact mode, a match is applied even if the term in the query or record differs slightly from the term in the thesaurus. The amount of difference allowed depends on the length of the term: short terms must match exactly, while longer terms allow one or two character differences.

        One place where exact mode may be appropriate is when you have terms in your thesaurus that are very similar but must not be confused.
        Overrides:
        setExactMatchMode in class NetricsBaseThesaurus
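The difference between the two match modes can be illustrated with a rough model. This is a sketch under stated assumptions, not the server's algorithm: MatchModeSketch is hypothetical, the comparison uses plain Levenshtein edit distance as a stand-in for whatever the server actually computes, and the length thresholds (up to 4 characters: no differences, up to 8: one, longer: two) are invented for illustration because the documentation does not state them.

```java
// Hypothetical model of exact versus inexact matching; not the Netrics algorithm.
final class MatchModeSketch {

    // Assumed thresholds: the real cut-off lengths are not documented here.
    static int allowedDifferences(int termLength) {
        if (termLength <= 4) {
            return 0; // short terms must match exactly
        }
        if (termLength <= 8) {
            return 1;
        }
        return 2;
    }

    // Classic two-row Levenshtein edit distance.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Both strings are assumed to have had the character map applied already.
    static boolean matches(String thesaurusTerm, String candidate, boolean exactMode) {
        if (exactMode) {
            return thesaurusTerm.equals(candidate);
        }
        return editDistance(thesaurusTerm, candidate) <= allowedDifferences(thesaurusTerm.length());
    }

    public static void main(String[] args) {
        System.out.println(matches("north", "worth", false));
        System.out.println(matches("north", "worth", true));
    }
}
```

Under these assumed thresholds, "north" and "worth" would be treated as a match in inexact mode but not in exact mode, which is the kind of confusion between similar thesaurus terms that exact mode is meant to prevent.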