Package com.netrics.likeit
Class NetricsThesaurus
java.lang.Object
  com.netrics.likeit.NetricsBaseThesaurus
    com.netrics.likeit.NetricsThesaurus
public class NetricsThesaurus extends NetricsBaseThesaurus
This class contains a list of synonyms that can be used while searching.
-
-
Constructor Summary
Constructors
Constructor Description
NetricsThesaurus(java.lang.String name)
A thesaurus is used to equate terms which are not typographically similar.
NetricsThesaurus(java.lang.String name, java.lang.String filename, java.lang.String encoding)
Create a thesaurus of synonyms from a CSV file.
-
Method Summary
Modifier and Type Method Description
int addClassesFrom(NetricsFieldedReader rsrc)
Add a set of classes from a fielded source.
void addEquivalenceClass(java.lang.String[] terms)
Add an array of synonyms.
void setCharmap(java.lang.String name)
Set the character map for this thesaurus.
void setExactMatchMode()
Select exact match mode.
-
Methods inherited from class com.netrics.likeit.NetricsBaseThesaurus
toString
-
Constructor Detail
-
NetricsThesaurus
public NetricsThesaurus(java.lang.String name)
A thesaurus is used to equate terms which are not typographically similar. For instance, Dick is a commonly used nickname for Richard, and the two terms should be considered equivalent. This can be accomplished by loading a thesaurus in which these two terms share an equivalence class. In general, all terms which are synonyms but are not typographically similar should be included in a thesaurus.
If a term is included in two equivalence classes, it is considered to be a synonym for all terms in both classes. However, the terms in the first class are not considered to be synonyms of the terms in the second class. For instance, although duck might be a synonym for both bird and crouch, bird and crouch are not considered synonyms of each other.
- Parameters:
name
- Name of the thesaurus.
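The non-transitive behavior described above can be illustrated with a small model. This is a minimal sketch under stated assumptions: EquivalenceClassSketch and its areSynonyms method are hypothetical names invented for illustration, are not part of com.netrics.likeit, and say nothing about how the server actually stores or scores thesaurus terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the documented semantics; not the real NetricsThesaurus.
final class EquivalenceClassSketch {
    private final List<Set<String>> classes = new ArrayList<>();

    // Register one equivalence class: a group of mutually synonymous terms.
    void addEquivalenceClass(String... terms) {
        classes.add(new HashSet<>(Arrays.asList(terms)));
    }

    // Two terms are synonyms only if at least one class contains both of them.
    boolean areSynonyms(String a, String b) {
        for (Set<String> equivalenceClass : classes) {
            if (equivalenceClass.contains(a) && equivalenceClass.contains(b)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        EquivalenceClassSketch thesaurus = new EquivalenceClassSketch();
        thesaurus.addEquivalenceClass("duck", "bird");
        thesaurus.addEquivalenceClass("duck", "crouch");

        System.out.println(thesaurus.areSynonyms("duck", "bird"));   // true
        System.out.println(thesaurus.areSynonyms("duck", "crouch")); // true
        System.out.println(thesaurus.areSynonyms("bird", "crouch")); // false
    }
}
```

Because membership is checked one class at a time, a term shared by two classes never links the remaining terms of those classes to each other.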
-
NetricsThesaurus
public NetricsThesaurus(java.lang.String name, java.lang.String filename, java.lang.String encoding)
Create a thesaurus of synonyms from a CSV file. In this case, the equivalence classes are loaded from a CSV file read by the server. Each line of the file is one equivalence class, and its terms are separated by commas. Do not call addEquivalenceClass on a thesaurus created with this constructor; it will throw an IllegalStateException.
- Parameters:
name
- The name of the thesaurus to be created.
filename
- The name of the file (on the server) from which to read the thesaurus. The file must be located inside the server's loadable-data directory.
encoding
- The character encoding used in the file. Currently supported encodings are "UTF-8" and "LATIN1". DEFAULT: "LATIN1"
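To make the file layout concrete, the following sketch splits CSV text into equivalence classes, one class per line. It only illustrates the layout described above; the real file is read and parsed by the server. ThesaurusCsvLayoutSketch is a hypothetical name, and the trimming of whitespace, the skipping of blank lines, and the lack of quote handling are assumptions of this sketch, not documented server behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the one-class-per-line layout; the server does the real parsing.
final class ThesaurusCsvLayoutSketch {

    // Split CSV text into equivalence classes: one line per class, comma-separated terms.
    static List<String[]> parse(String csv) {
        List<String[]> classes = new ArrayList<>();
        for (String line : csv.split("\\R")) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            String[] terms = line.split(",");
            for (int i = 0; i < terms.length; i++) {
                terms[i] = terms[i].trim(); // assumption: surrounding whitespace is dropped
            }
            classes.add(terms);
        }
        return classes;
    }

    public static void main(String[] args) {
        String csv = "Richard,Dick\nduck,bird\nduck,crouch\n";
        for (String[] equivalenceClass : parse(csv)) {
            System.out.println(String.join(" = ", equivalenceClass));
        }
    }
}
```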
-
-
Method Detail
-
addEquivalenceClass
public void addEquivalenceClass(java.lang.String[] terms)
Add an array of synonyms.
Thesauri retrieved from a server via NetricsServerInterface.getThesaurus(String) cannot be modified. Thesauri created from a file via NetricsThesaurus(String, String, String) cannot be modified.
- Parameters:
terms
- All Strings which are elements of the array are considered to be equal for the purpose of record scoring.
- Throws:
java.lang.IllegalStateException
- if the thesaurus was created from a file or retrieved from a server.
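The read-only contract above can be sketched with a stand-in class. ModifiableThesaurusSketch, its readOnly flag, and classCount are hypothetical and exist only so the example runs without the Netrics library; the sketch mimics the documented IllegalStateException and is not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics the documented contract; not the real NetricsThesaurus.
final class ModifiableThesaurusSketch {
    private final boolean readOnly;
    private final List<String[]> classes = new ArrayList<>();

    // readOnly models a thesaurus created from a file or retrieved from a server.
    ModifiableThesaurusSketch(boolean readOnly) {
        this.readOnly = readOnly;
    }

    void addEquivalenceClass(String[] terms) {
        if (readOnly) {
            throw new IllegalStateException("thesaurus cannot be modified");
        }
        classes.add(terms.clone());
    }

    int classCount() {
        return classes.size();
    }

    public static void main(String[] args) {
        ModifiableThesaurusSketch local = new ModifiableThesaurusSketch(false);
        local.addEquivalenceClass(new String[] {"Richard", "Dick"});
        System.out.println("classes added: " + local.classCount());

        ModifiableThesaurusSketch fromFile = new ModifiableThesaurusSketch(true);
        try {
            fromFile.addEquivalenceClass(new String[] {"duck", "bird"});
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```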
-
addClassesFrom
public int addClassesFrom(NetricsFieldedReader rsrc) throws NetricsFileFormatException, NetricsException
Add a set of classes from a fielded source.
This adds a set of equivalence classes from a fielded record source. Each fielded record is considered one equivalence class. Each equivalence class must have at least one entry.
- Specified by:
addClassesFrom
in class NetricsBaseThesaurus
- Parameters:
rsrc
- a NetricsFieldedReader object that provides the equivalence classes.
- Returns:
- the number of classes added.
- Throws:
NetricsFileFormatException
- if there was an error reading records from the source.
NetricsException
- if an equivalence class has fewer than one entry.
- See Also:
NetricsFieldedReader, NetricsCSVReader
-
setCharmap
public void setCharmap(java.lang.String name)
Set the character map for this thesaurus.
Set the character map that is used to map all class entries. This should be the same character map assigned to the table fields this thesaurus will be applied to. It defaults to the standard character map.
- Overrides:
setCharmap
in class NetricsBaseThesaurus
- Parameters:
name
- The name of an existing character map.
-
setExactMatchMode
public void setExactMatchMode()
Select exact match mode.
Calling this method sets the match mode to exact matching. By default, the match mode is inexact. In exact match mode, terms must match exactly (after the character map is applied) for the thesaurus match and any associated weighting or penalty to be applied. In the default inexact mode, a match is applied even if the term in the query or record differs slightly from the term in the thesaurus. The amount of difference allowed between the thesaurus term and the term in the record or query is determined by the length of the term: short terms must match exactly, while longer terms allow one or two character differences.
Exact mode may be appropriate when your thesaurus contains terms that are very similar but must not be confused.
- Overrides:
setExactMatchMode
in class NetricsBaseThesaurus
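The difference between the two modes can be sketched as a length-dependent tolerance. Everything in this example is an assumption made for illustration: MatchModeSketch is a hypothetical class, the length thresholds (fewer than 5 characters, fewer than 10 characters) are invented, and plain edit distance stands in for whatever difference measure the server really uses. Character map handling is omitted.

```java
// Hypothetical illustration of exact vs. inexact thesaurus matching; not the server's algorithm.
final class MatchModeSketch {

    // Assumed thresholds: the real length cutoffs are not documented here.
    static int allowedDifferences(int termLength) {
        if (termLength < 5) {
            return 0; // short terms must match exactly
        }
        if (termLength < 10) {
            return 1;
        }
        return 2;
    }

    // Classic Levenshtein edit distance, used here as a stand-in difference measure.
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(Math.min(current[j - 1] + 1, previous[j] + 1),
                                      previous[j - 1] + cost);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    // In exact mode only identical terms match; otherwise the tolerance depends on term length.
    static boolean matches(String thesaurusTerm, String candidate, boolean exactMode) {
        if (exactMode) {
            return thesaurusTerm.equals(candidate);
        }
        return editDistance(thesaurusTerm, candidate) <= allowedDifferences(thesaurusTerm.length());
    }

    public static void main(String[] args) {
        System.out.println(matches("Richard", "Richerd", false)); // true under the assumed thresholds
        System.out.println(matches("Richard", "Richerd", true));  // false
        System.out.println(matches("Dick", "Dock", false));       // false: short term, no tolerance
    }
}
```

Under these assumptions, two thesaurus terms that differ by a single character could both match the same query term in inexact mode, which is the kind of confusion exact mode avoids.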
-
-