Package com.netrics.likeit
Class NetricsCombinedThesaurus
java.lang.Object
    com.netrics.likeit.NetricsBaseThesaurus
        com.netrics.likeit.NetricsCombinedThesaurus
public class NetricsCombinedThesaurus extends NetricsBaseThesaurus
Define a combined thesaurus.
This class is used to define an ibi™ Patterns - Search combined thesaurus. A combined thesaurus combines the features of a standard NetricsThesaurus with those of a NetricsWeightedDictionary.
Constructor Summary
- NetricsCombinedThesaurus(java.lang.String name): Create an empty combined thesaurus.
- NetricsCombinedThesaurus(java.lang.String name, java.lang.String filename, java.lang.String encoding): Create a combined thesaurus from a server-side CSV file.
Method Summary
- int addClassesFrom(NetricsFieldedReader rsrc): Add a set of classes from a fielded source.
- void addEquivalenceClass(java.lang.String[] terms, double weight, double penalty): Add a set of related terms with associated weight and penalty.
- void setCharmap(java.lang.String name): Set the character map for this thesaurus.
- void setExactMatchMode(): Select exact match mode.
Methods inherited from class com.netrics.likeit.NetricsBaseThesaurus
toString
Constructor Detail
NetricsCombinedThesaurus
public NetricsCombinedThesaurus(java.lang.String name)
Create an empty combined thesaurus.
Records must be added using the addEquivalenceClass or addClassesFrom methods.
A combined thesaurus can be used to define both substitutions, as with a standard NetricsThesaurus, and weighted terms, as with a NetricsWeightedDictionary. The names of NetricsThesaurus, NetricsWeightedDictionary and NetricsCombinedThesaurus objects are kept in a common pool, so creating a NetricsCombinedThesaurus with the same name as an existing NetricsThesaurus or NetricsWeightedDictionary overwrites that object on the server.
Parameters:
name - The name of the combined thesaurus to be created.
See Also:
NetricsThesaurus, NetricsWeightedDictionary
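Because the three object kinds share one name pool, the most recently created object wins. The sketch below models that rule with a plain Java map. `SharedNamePool`, `register` and `kindOf` are hypothetical names used only for illustration; they are not part of the Netrics API, and the server's actual bookkeeping is not described in this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the server-side name pool; NOT part of the Netrics API.
// Thesauri, weighted dictionaries and combined thesauri share one namespace.
class SharedNamePool {
    // Object name -> kind of object currently stored under that name.
    private final Map<String, String> objects = new HashMap<>();

    /**
     * Stores an object under the given name and returns the kind of object it
     * replaced, or null if the name was not in use yet.
     */
    String register(String name, String kind) {
        return objects.put(name, kind);
    }

    /** Returns the kind of object currently stored under the name, or null. */
    String kindOf(String name) {
        return objects.get(name);
    }
}
```

For example, registering "names" as a NetricsThesaurus and then as a NetricsCombinedThesaurus makes the second call return "NetricsThesaurus", and the name then refers only to the combined thesaurus.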
NetricsCombinedThesaurus
public NetricsCombinedThesaurus(java.lang.String name, java.lang.String filename, java.lang.String encoding)
Create a combined thesaurus from a server-side CSV file.
Each line of the file is an equivalence class, and each comma-separated field of the line is an entry in that class. For a combined thesaurus, each line must have at least three entries: the first is the weight, the second is the substitution penalty, and the remaining entries are the terms of the class. Do not call addEquivalenceClass or addClassesFrom on a thesaurus created with this constructor; doing so throws an exception.
A combined thesaurus can be used to define both substitutions, as with a standard NetricsThesaurus, and weighted terms, as with a NetricsWeightedDictionary. The names of NetricsThesaurus, NetricsWeightedDictionary and NetricsCombinedThesaurus objects are kept in a common pool, so creating a NetricsCombinedThesaurus with the same name as an existing NetricsThesaurus or NetricsWeightedDictionary overwrites that object on the server.
Parameters:
name - The name of the combined thesaurus to be created.
filename - The name of the file (on the server) from which to read the thesaurus. The file must be located inside the server's loadable-data directory.
encoding - The character encoding used in the file. Currently supported encodings are "UTF-8" and "LATIN1". Default: "LATIN1".
See Also:
NetricsThesaurus, NetricsWeightedDictionary
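The file layout above (weight, penalty, then one or more terms per line) can be produced with a small helper before the file is copied into the server's loadable-data directory. This is a sketch under stated assumptions: `CombinedThesaurusFile` and its methods are hypothetical, "LATIN1" is assumed to correspond to Java's ISO-8859-1 charset, and because the server reader's quoting rules are not documented here, terms that would need CSV quoting are rejected rather than escaped.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, NOT part of the Netrics API: prepares a file in the
// layout "weight,penalty,term1[,term2,...]", one equivalence class per line.
class CombinedThesaurusFile {

    /** Maps the documented encoding names to Java charsets (LATIN1 assumed to be ISO-8859-1). */
    static Charset charsetFor(String encoding) {
        if (encoding == null || encoding.equals("LATIN1")) {
            return StandardCharsets.ISO_8859_1; // documented default
        }
        if (encoding.equals("UTF-8")) {
            return StandardCharsets.UTF_8;
        }
        throw new IllegalArgumentException("Unsupported encoding: " + encoding);
    }

    /** Formats one equivalence class as a CSV line with at least three fields. */
    static String toLine(double weight, double penalty, String... terms) {
        if (terms.length == 0) {
            throw new IllegalArgumentException("A class needs at least one term");
        }
        StringBuilder line = new StringBuilder();
        line.append(weight).append(',').append(penalty);
        for (String term : terms) {
            // Reject anything that would need CSV quoting (assumed unsupported).
            if (term.indexOf(',') >= 0 || term.indexOf('"') >= 0
                    || term.indexOf('\n') >= 0 || term.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("Term needs quoting: " + term);
            }
            line.append(',').append(term);
        }
        return line.toString();
    }

    /** Writes the lines in the chosen encoding. */
    static void write(Path file, List<String> lines, String encoding) throws IOException {
        Files.write(file, lines, charsetFor(encoding));
    }
}
```

For instance, `toLine(2.0, 0.5, "IBM", "International Business Machines")` yields the line `2.0,0.5,IBM,International Business Machines`.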
Method Detail
addEquivalenceClass
public void addEquivalenceClass(java.lang.String[] terms, double weight, double penalty)
Add a set of related terms with associated weight and penalty.
Thesauri retrieved from a server via NetricsServerInterface.getThesaurus(String) cannot be modified, nor can thesauri created from a file via NetricsCombinedThesaurus(String, String, String).
Parameters:
terms - The terms to be weighted.
weight - The weight of the terms in this class.
penalty - The substitution penalty for terms in this class.
Throws:
java.lang.IllegalStateException - if the thesaurus was created from a file or retrieved from a server.
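The modification rule can be summarized with a small in-memory model. This is a hypothetical sketch, not the Netrics implementation: `CombinedThesaurusModel`, `createEmpty` and `createReadOnly` are illustrative names, where `createEmpty` stands for the single-argument constructor and `createReadOnly` for a thesaurus loaded from a file or retrieved from a server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory model, NOT the Netrics implementation. Only a thesaurus
// built empty accepts new classes; read-only ones throw IllegalStateException.
class CombinedThesaurusModel {

    /** One equivalence class: its terms plus the class weight and substitution penalty. */
    static final class EquivalenceClass {
        final List<String> terms;
        final double weight;
        final double penalty;

        EquivalenceClass(String[] terms, double weight, double penalty) {
            // Copy the array so later changes by the caller do not leak in.
            this.terms = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(terms)));
            this.weight = weight;
            this.penalty = penalty;
        }
    }

    private final boolean readOnly;
    private final List<EquivalenceClass> classes = new ArrayList<>();

    private CombinedThesaurusModel(boolean readOnly) {
        this.readOnly = readOnly;
    }

    /** Stands for NetricsCombinedThesaurus(String name). */
    static CombinedThesaurusModel createEmpty() {
        return new CombinedThesaurusModel(false);
    }

    /** Stands for a thesaurus loaded from a file or retrieved from a server. */
    static CombinedThesaurusModel createReadOnly() {
        return new CombinedThesaurusModel(true);
    }

    void addEquivalenceClass(String[] terms, double weight, double penalty) {
        if (readOnly) {
            throw new IllegalStateException(
                "thesaurus was created from a file or retrieved from a server");
        }
        classes.add(new EquivalenceClass(terms, weight, penalty));
    }

    List<EquivalenceClass> classes() {
        return Collections.unmodifiableList(classes);
    }
}
```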
addClassesFrom
public int addClassesFrom(NetricsFieldedReader rsrc) throws NetricsFileFormatException, NetricsException
Add a set of classes from a fielded source.
This adds a set of equivalence classes from a fielded record source. Each fielded record is treated as one equivalence class. The first two entries must be the class weight and the class penalty (as string representations of floating-point values), so each equivalence class (record) must have at least three entries.
Specified by:
addClassesFrom in class NetricsBaseThesaurus
Parameters:
rsrc - a NetricsFieldedReader object that provides the equivalence classes.
Returns:
the number of classes added.
Throws:
NetricsFileFormatException - if there was an error reading records from the source.
NetricsException - if an equivalence class has fewer than three entries or the first two are not valid floating-point values.
See Also:
NetricsCSVReader
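The record rules above can be illustrated on records that are already split into fields. In this hypothetical sketch, `FieldedClassLoader` and `List<String[]>` stand in for the real class and for NetricsFieldedReader, and `IllegalArgumentException` stands in for NetricsException. Validating every record before adding any is a design choice of the sketch; whether the real method is all-or-nothing is not stated here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the Netrics implementation: applies the documented
// record rules to pre-split records. IllegalArgumentException stands in for
// NetricsException.
class FieldedClassLoader {

    /** A validated class: weight, penalty and the remaining fields as terms. */
    static final class ParsedClass {
        final double weight;
        final double penalty;
        final List<String> terms;

        ParsedClass(double weight, double penalty, List<String> terms) {
            this.weight = weight;
            this.penalty = penalty;
            this.terms = terms;
        }
    }

    private final List<ParsedClass> classes = new ArrayList<>();

    /** Validates every record, then adds them all and returns how many were added. */
    int addClassesFrom(List<String[]> records) {
        List<ParsedClass> parsed = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            String[] fields = records.get(i);
            if (fields.length < 3) {
                throw new IllegalArgumentException("record " + i + " has fewer than 3 entries");
            }
            double weight = parseNumber(fields[0], i, "weight");
            double penalty = parseNumber(fields[1], i, "penalty");
            List<String> terms = new ArrayList<>();
            for (int f = 2; f < fields.length; f++) {
                terms.add(fields[f]);
            }
            parsed.add(new ParsedClass(weight, penalty, terms));
        }
        classes.addAll(parsed); // reached only if every record was valid
        return parsed.size();
    }

    List<ParsedClass> classes() {
        return classes;
    }

    private static double parseNumber(String text, int record, String what) {
        try {
            return Double.parseDouble(text);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "record " + record + ": " + what + " is not a valid number: " + text, e);
        }
    }
}
```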
setCharmap
public void setCharmap(java.lang.String name)
Set the character map for this thesaurus.
Sets the character map used to map all class entries. This should be the same character map assigned to the table fields this thesaurus will be applied to. It defaults to the standard character map.
Overrides:
setCharmap in class NetricsBaseThesaurus
Parameters:
name - The name of an existing character map.
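The reason for using the same character map on both sides can be shown with a toy model in which a character map is simply a string-to-string function applied before comparison. This is a hypothetical illustration: `CharmapDemo`, `FOLDING` and `IDENTITY` are invented names, and the folding rules (lower-casing, hyphen to space) are assumptions that do not reproduce any real server-defined character map.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical illustration, NOT a real Netrics character map: a character map
// is modelled as a function applied to text before comparison.
class CharmapDemo {

    /** Example map (an assumption): lower-case and turn hyphens into spaces. */
    static final UnaryOperator<String> FOLDING =
        s -> s.toLowerCase(Locale.ROOT).replace('-', ' ');

    /** Example map that leaves text unchanged. */
    static final UnaryOperator<String> IDENTITY = s -> s;

    /** True if the mapped thesaurus entry equals the mapped field value. */
    static boolean entryMatches(String entry, UnaryOperator<String> entryMap,
                                String fieldValue, UnaryOperator<String> fieldMap) {
        return entryMap.apply(entry).equals(fieldMap.apply(fieldValue));
    }
}
```

With `FOLDING` on both sides, the entry "Saint-Louis" matches the field value "SAINT LOUIS"; if the entry is left unmapped while the field is folded, the same pair no longer matches.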
setExactMatchMode
public void setExactMatchMode()
Select exact match mode.
Calling this method sets the match mode to exact matching; by default the match mode is inexact. In exact match mode, terms must match exactly (after the character map is applied) for the thesaurus match and any associated weighting or penalty to be applied. In the default inexact mode, a match is applied even if the term in the query or record differs slightly from the term in the thesaurus. The amount of difference allowed depends on the length of the term: short terms must match exactly, while longer terms allow one or two character differences.
Exact mode may be appropriate, for example, when your thesaurus contains terms that are very similar but must not be confused.
Overrides:
setExactMatchMode in class NetricsBaseThesaurus
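The difference between the two modes can be sketched with an edit-distance comparison. This is not the Netrics matching algorithm: the sketch assumes Levenshtein distance as the measure of "character differences", and the length thresholds in `allowedDifferences` (up to 4 characters exact, up to 8 one difference, longer two) are invented for illustration because the actual cut-offs are not documented here. `TermMatchSketch` is a hypothetical name.

```java
// Hypothetical sketch, NOT the Netrics algorithm: exact versus inexact term
// matching using Levenshtein distance and ASSUMED length thresholds.
class TermMatchSketch {

    /** Classic Levenshtein edit distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Assumed allowance: up to 4 chars exact, up to 8 one difference, longer two. */
    static int allowedDifferences(String thesaurusTerm) {
        int n = thesaurusTerm.length();
        if (n <= 4) return 0;
        if (n <= 8) return 1;
        return 2;
    }

    /** Both strings are assumed to have been passed through the character map already. */
    static boolean matches(String thesaurusTerm, String text, boolean exactMode) {
        if (exactMode) {
            return thesaurusTerm.equals(text);
        }
        return editDistance(thesaurusTerm, text) <= allowedDifferences(thesaurusTerm);
    }
}
```

Under these assumed thresholds, "jonathan" and "jonathon" match in inexact mode but not in exact mode, while the short term "jon" never matches "jan". Pairs like the first one are where exact mode helps if the two names must stay distinct.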