Class NetricsQuery
- java.lang.Object
-
- com.netrics.likeit.NetricsQuery
-
- All Implemented Interfaces:
java.io.Serializable
public class NetricsQuery extends java.lang.Object implements java.io.Serializable
NetricsQuery implements the hierarchical query structure. Queries take the form of a hierarchical tree. There are currently five ways to create basic scores: a simple query comparison, a cognate query comparison, a date comparison, a predicate querylet, and the Attributes Query. These are explained in further detail below. These scores can then be combined by a set of query operators: AND, OR, NOT, RLINK, WGTBYFIELD, REF, MATCH and FIRSTVALID. For each of the score combiners, you pass the NetricsQuery "children" to be combined by that operator. It follows that all the "leaves" in the tree must be score generators, and all the "branches" must be score combiners. To create the most basic search, use code like the following:
NetricsSearchCfg tblCfg = new NetricsSearchCfg("test");
tblCfg.setNetricsQuery(NetricsQuery.Simple("Michael Phelps", null, null));
NetricsSearchResponse resp = si.search(tblCfg, null);
The above code will search the table "test" (all searchable text fields) for the string "Michael Phelps." It's just that easy. A much more complicated search might look like the following:

String[] fnames = {"name1"};
NetricsQuery nq1 = NetricsQuery.Simple("rec1f1", fnames, null);
String[] queries = {"rec1f2", "rec1f1"};
String[] fnames2 = {"name1", "name2"};
NetricsQuery nq2 = NetricsQuery.Cognate(queries, fnames2, null, 0.8);
String[] fnames3 = {"date"};
NetricsQuery nq3 = NetricsQuery.Custom("10/12/2001", fnames3, null, NetricsQuery.CS_DATE);
NetricsQuery nq4 = NetricsQuery.Predicate(
    "DATE \"2001/01/01\" <= $\"date\" and $\"date\" <= DATE \"2001/12/31\"");
NetricsQuery[] nqs = {nq1, nq2, nq3, nq4};
NetricsQuery nq = NetricsQuery.And(null, nqs);
The above query combines four different querylets using an AND (which performs a simple average of the individual scores). The first querylet is a simple string comparison, the second is a cognate string comparison, the third is a date comparison, and the fourth is a predicate comparison. Notice that a NetricsQuery object is created by a static class method of the NetricsQuery class. One or more static methods exist for each type of query, both score generators and score combiners. For more details see the individual method descriptions.
- See Also:
- Serialized Form
-
-
Constructor Summary
NetricsQuery()
-
Method Summary

static NetricsQuery And(double[] weights, NetricsQuery[] nqs)
  Create an AND Query Expression Node.
static NetricsQuery And(double[] weights, NetricsQuery[] nqs, double[] ignore_scores)
  Create an AND Query Expression Node with ignore scores.
static NetricsQuery And(double[] weights, NetricsQuery[] nqs, double[] ignore_scores, double[] reject_scores)
  Create an AND Query Expression Node with ignore scores and reject scores.
static NetricsQuery Attributes(java.lang.String[] attr_values, java.lang.String[] attr_names, double[] attr_weights)
  Create a Variable Attributes Query node.
static NetricsQuery Cognate(java.lang.String[] qstrs, java.lang.String[] fldnames, double[] fldweights, double noncogwgt)
  Create a Cognate Query Expression Node.
static NetricsQuery Cognate(java.lang.String[] qstrs, java.lang.String[] fldnames, double[] fldweights, double noncogwgt, double empty_field_penalty)
  Create a Cognate Query Expression Node.
static NetricsQuery Custom(java.lang.String qstr, java.lang.String[] fldnames, double[] fldweights, int customtype)
  Create a Custom Query node.
static NetricsQuery FirstValid(ConfidenceQlt[] conf_qlets, boolean invalid_only)
  Create a First Valid query expression node.
NetricsQuery getReference()
  Obtain a reference to a named querylet.
static NetricsQuery MatchCase(NetricsQuery[] querylets, double core_strength, double[] thresholds, double[] weights, double[] penalty_weights)
  Create a Matchcase query expression node.
void noConfidence()
  Turn off the reporting of confidence measures and significance values.
static NetricsQuery Not(NetricsQuery nq)
  Create a NOT Query Expression Node.
static NetricsQuery Or(double[] weights, NetricsQuery[] nqs)
  Create an OR Query Expression Node.
static NetricsQuery Predicate(NetricsPredicate pred)
  Create a Predicate Query Expression Node.
static NetricsQuery Predicate(java.lang.String expr)
  Create a Predicate Query Expression Node from a string predicate.
static NetricsQuery Reference(java.lang.String querylet_name)
  Create a reference to a named querylet.
static NetricsQuery Rlink(java.lang.String modelname, NetricsQuery[] nqs)
  Create an RLINK Query Expression Node.
static NetricsQuery Rlink(java.lang.String modelname, NetricsQuery[] nqs, boolean use_threshold)
  Create an RLINK Query Expression Node.
static NetricsQuery RlinkWithAlt(java.lang.String model_name, NetricsQuery[] qlets, boolean use_model_threshold, double confidence_cutoff, double learn_match_cutoff, NetricsQuery alternate_query, double alternate_match_cutoff, boolean preindexit)
  Create an Rlink query with an alternate query.
void scoreType(int scoreType)
  This specifies the type of score to be used for ordering records.
void setEmptyScore(double score)
  Sets the score a comparison gets when empty data is encountered.
void setGroup(java.lang.String qlet_group)
  Assign a group to this querylet.
void setInvalidScore(double score)
  Sets the score a comparison gets when invalid data is encountered.
void setMatchEmpty(boolean value)
  Sets the flag to match empty values.
void setName(java.lang.String qlet_name)
  Assign a name to this querylet.
void setXparm(int id, double value)
  Use this method only on the advice of your TIBCO representative.
void setXparm(int id, int value)
  Use this method only on the advice of your TIBCO representative.
static NetricsQuery Simple(java.lang.String qstr, java.lang.String[] fldnames, double[] fldweights)
  Create a Simple Query Expression Node.
java.lang.String toString()
  Print out the query for debugging.
void useCharmap(java.lang.String charmap_name)
  Assign a character map for this query.
void useEphemeralThesaurus(com.netrics.likeit.NetricsBaseThesaurus thes_def, double theswgt)
  Assign an ephemeral thesaurus to be used by this NetricsQuery.
void useFeatureConfidence()
  Use the feature based confidence measure.
void useRlinkThreshold()
  Set an RLink query to use the Model cutoff threshold.
void useThesaurus(java.lang.String thesname, double theswgt)
  Assign a thesaurus to be used by the NetricsQuery object.
static NetricsQuery Wgtbyfield(java.lang.String fieldName, NetricsQuery nq)
  Create a weighted field query expression node.
-
-
-
Field Detail
-
CS_NONE
public static final int CS_NONE
Not a custom scorer.
- See Also:
- Constant Field Values

CS_DATE
public static final int CS_DATE
Custom Date comparison scorer.
- See Also:
- Constant Field Values
-
-
Method Detail
-
Custom
public static NetricsQuery Custom(java.lang.String qstr, java.lang.String[] fldnames, double[] fldweights, int customtype)
Create a Custom Query node. Currently there is only one custom query type: NetricsQuery.CS_DATE. To perform a date search, pass the query date in "qstr", the single date field in the "fldnames" vector, set fldweights to null, and set the customtype to NetricsQuery.CS_DATE. A date comparison will look for inversions between month and day, misplaced century information, and other qualities that a normal string comparison would not take into account.
- Parameters:
qstr - the query string
fldnames - list of fields to query; for a date query this must have exactly one field in the list.
fldweights - list of weights for fields; for a date query this should be null.
customtype - type of the custom query; for a date query this should be NetricsQuery.CS_DATE.
- Returns:
- a custom query object
-
Simple
public static NetricsQuery Simple(java.lang.String qstr, java.lang.String[] fldnames, double[] fldweights)
Create a Simple Query Expression Node. This is the basic TIBCO® Patterns - Search query and is used to compare an arbitrary query string against a field set in a table.
- Parameters:
qstr - the query string
fldnames - list of fields to query
fldweights - list of weights for fields
- Returns:
- a simple query object
-
And
public static NetricsQuery And(double[] weights, NetricsQuery[] nqs)
Create an AND Query Expression Node. This is used to combine multiple NetricsQuery scores into a single score. The combination is usually a simple average, but can be a weighted average if the weights parameter is supplied.
- Parameters:
weights - list of floats that are the weights for the sub expressions of this query
nqs - array of NetricsQuery objects that are the sub expressions for this AND
- Returns:
- an And query object
-
And
public static NetricsQuery And(double[] weights, NetricsQuery[] nqs, double[] ignore_scores)
Create an AND Query Expression Node with ignore scores. This is used to combine multiple NetricsQuery scores into a single score. The combination is usually a simple average, but can be a weighted average if the weights parameter is supplied.
This version allows the caller to specify a set of scores, one per sub expression, that define a lower bound for the sub expression score. If the score is below the specified lower bound the sub expression will be ignored completely when computing the average or weighted average for the output score.
There are two special score values: -1.0 is the special score used to indicate a record should be rejected. Typically this is used to reject records when a search on empty or invalid data is performed (see the methods setInvalidScore(double) and setEmptyScore(double) for how to change the score assigned in such cases). Instead of rejecting a record, a sub expression can be ignored when empty or invalid data is encountered by setting the empty and invalid scores to -1.0 and then using the ignore threshold to ignore scores of -1.0. -2.0 indicates that no ignore threshold processing is to be performed for this sub expression.

A cautionary note on ignore scores: ignore threshold scores should be used with extreme caution. Ignoring a low score in one sub expression will likely boost the score of a record with a poor match on that sub expression (i.e. a score below the threshold) above the score of a similar record with a good match on that sub expression (i.e. above the threshold). Thus, when used inappropriately, ignore score thresholds tend to push poorer matches above better matches. This is especially true if ignore thresholds are applied to more than one sub expression. Ignore thresholds should therefore only be applied in the rare cases where a low score indicates the sub expression is completely irrelevant and the record should score above close but imperfect matches on the sub expression. Another valid use is to ignore empty fields as described above, although normally it is better to just let the default score of 0.0 be averaged into the output score.
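To make the weighted average and ignore-threshold behavior concrete, here is a small self-contained Java sketch of the logic described above. This is an illustrative model only, not the actual server-side scorer; the class and method names are invented for this example.

```java
// Illustrative model of the AND combiner's weighted average with ignore
// thresholds, as described above. Not the real implementation.
public class AndIgnoreSketch {

    // scores: one score per sub expression.
    // weights: per-sub-expression weights, or null for a simple average.
    // ignoreScores: per-sub-expression ignore thresholds, or null for none;
    //               a value of -2.0 disables the test for that sub expression.
    public static double andScore(double[] scores, double[] weights,
                                  double[] ignoreScores) {
        double sum = 0.0, weightSum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (ignoreScores != null && ignoreScores[i] != -2.0
                    && scores[i] < ignoreScores[i]) {
                continue; // below the ignore threshold: drop from the average
            }
            double w = (weights == null) ? 1.0 : weights[i];
            sum += w * scores[i];
            weightSum += w;
        }
        return (weightSum == 0.0) ? 0.0 : sum / weightSum;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.2};
        // Plain average of both sub expressions.
        System.out.println(andScore(scores, null, null));
        // Second sub expression ignored because its score 0.2 < threshold 0.3.
        System.out.println(andScore(scores, null, new double[]{-2.0, 0.3}));
    }
}
```

Note how ignoring the low second score lifts the result from the plain average 0.55 to 0.9, which is exactly the boosting effect the cautionary note warns about.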
- Parameters:
weights - list of floats that are the weights for the sub expressions of this query
nqs - array of NetricsQuery objects that are the sub expressions for this AND
ignore_scores - array of floats that are the ignore scores for the sub expressions of this AND, or null to skip the ignore tests
- Returns:
- an And query object
- See Also:
setEmptyScore(double), setInvalidScore(double)
-
And
public static NetricsQuery And(double[] weights, NetricsQuery[] nqs, double[] ignore_scores, double[] reject_scores)
Create an AND Query Expression Node with ignore scores and reject scores. This is used to combine multiple NetricsQuery scores into a single score. The combination is usually a simple average, but can be a weighted average if the weights parameter is supplied.
This version allows the caller to specify a set of ignore scores and/or a set of reject scores. Ignore scores define a lower bound for the sub expression score: if the score is below the specified lower bound, the sub expression is ignored completely when computing the average or weighted average for the output score. Reject scores also define a lower bound, but if the sub expression score falls below a reject score the entire record is rejected. (More precisely, the record is assigned a score of -1.0, which causes the record to be rejected unless the score is ignored in a higher-level AND expression.)
Both ignore scores and reject scores support two special score values: -1.0 is the special score used to indicate a record should be rejected. Typically this is used to reject records when a search on empty or invalid data is performed (see the methods setInvalidScore(double) and setEmptyScore(double) for how to change the score assigned in such cases). Instead of rejecting a record, a sub expression can be ignored when empty or invalid data is encountered by setting the empty and invalid scores to -1.0 and then using the ignore threshold to ignore scores of -1.0. -2.0 indicates that no ignore or reject threshold processing is to be performed for this sub expression.

If either the ignore scores or the reject scores are null, then no ignore or reject tests, respectively, are performed for this AND. If a sub expression is assigned both an ignore score and a reject score, then the lesser value takes precedence over its range and the greater value applies to the range from the lesser to the greater. For example, if the reject score for sub expression 1 is 0.2 and the ignore score is 0.4, then records with a sub expression 1 score below 0.2 are rejected, and records with a score from 0.2 to 0.4 ignore sub expression 1. Conversely, if the ignore score is 0.2 and the reject score is 0.4, then scores below 0.2 cause sub expression 1 to be ignored, whereas scores from 0.2 to 0.4 cause the record to be rejected.
A cautionary note on ignore scores: Ignore threshold scores should be used with extreme caution. Ignoring a low score in one sub expression will likely boost the score of a record with a poor match on that sub expression (i.e. a score below the threshold) above the score of a similar record with a good match on that sub expression (i.e. above the threshold). Thus when used inappropriately ignore score thresholds will tend to push poorer matches above better matches. This is especially true if ignore thresholds are applied to more than one sub expression.
Further care is needed when using both ignore thresholds and reject thresholds on the same set of sub expressions. Ignore thresholds tend to favor records with high scores for one expression and very low scores for the others. Reject thresholds will pass a record only if all sub expressions are above the given threshold values. Thus combining ignore thresholds and reject thresholds can result in few or no records being returned.
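The precedence rule for a sub expression that carries both an ignore score and a reject score can be sketched as follows. This is an illustrative model of the behavior described above, with invented names; it is not the actual implementation.

```java
// Illustrative model of the documented ignore/reject precedence when one
// sub expression has both thresholds. Not the real implementation.
public class IgnoreRejectSketch {
    public static final int KEEP = 0;   // score contributes normally
    public static final int IGNORE = 1; // sub expression dropped from the average
    public static final int REJECT = 2; // whole record rejected (score -1.0)

    // The lesser threshold applies below itself; the greater threshold
    // applies to the range between the two thresholds.
    public static int classify(double score, double ignoreScore, double rejectScore) {
        double lesser = Math.min(ignoreScore, rejectScore);
        double greater = Math.max(ignoreScore, rejectScore);
        if (score < lesser) {
            return (lesser == rejectScore) ? REJECT : IGNORE;
        }
        if (score < greater) {
            return (greater == rejectScore) ? REJECT : IGNORE;
        }
        return KEEP;
    }

    public static void main(String[] args) {
        // The worked example from the text: reject score 0.2, ignore score 0.4.
        System.out.println(classify(0.1, 0.4, 0.2)); // rejected (below 0.2)
        System.out.println(classify(0.3, 0.4, 0.2)); // ignored (0.2 to 0.4)
        System.out.println(classify(0.5, 0.4, 0.2)); // kept (above 0.4)
    }
}
```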
- Parameters:
weights - list of floats that are the weights for the sub expressions of this query
nqs - array of NetricsQuery objects that are the sub expressions for this AND
ignore_scores - array of floats that are the ignore scores for the sub expressions of this AND, or null to skip the ignore tests
reject_scores - array of floats that are the reject scores for the sub expressions of this AND, or null to skip the reject tests
- Returns:
- an And query object
- See Also:
setEmptyScore(double), setInvalidScore(double)
-
Or
public static NetricsQuery Or(double[] weights, NetricsQuery[] nqs)
Create an OR Query Expression Node. This is used to choose between multiple NetricsQuery scores by taking the maximum score. To weight the individual scores, use the weights parameter.
- Parameters:
weights - list of floats that are the weights for the sub expressions of this query
nqs - list of NetricsQuery objects that are the sub expressions for this OR.
- Returns:
- an Or query object
-
Not
public static NetricsQuery Not(NetricsQuery nq)
Create a NOT Query Expression Node. Takes the complement of the NetricsQuery score, e.g. a 0.1 becomes a 0.9.
- Parameters:
nq - NetricsQuery object that is the sub expression for this NOT.
- Returns:
- a Not query object
-
Wgtbyfield
public static NetricsQuery Wgtbyfield(java.lang.String fieldName, NetricsQuery nq)
Create a weighted field query expression node. This replaces record weights from earlier versions of TIBCO® Patterns - Search software. It is used to modify the score of a match based on a value in the record that was matched, providing a way of favoring certain records over others. Essentially, the score of the passed-in NetricsQuery is multiplied by the value in the indicated field of the record. The indicated field must have a field type of FLOAT. If the resulting score is greater than 1.0 it is set to 1.0; if it is less than 0.0 it is set to 0.0.
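The adjustment described above amounts to a multiply-and-clamp. A minimal self-contained sketch (illustrative only; the class and method names are invented):

```java
// Illustrative model of the Wgtbyfield score adjustment: the query score is
// multiplied by the record's FLOAT field value and clamped to [0.0, 1.0].
public class WgtbyfieldSketch {
    public static double weightedScore(double queryScore, double fieldValue) {
        double s = queryScore * fieldValue;
        return Math.max(0.0, Math.min(1.0, s));
    }

    public static void main(String[] args) {
        System.out.println(weightedScore(0.8, 0.5)); // downweighted record
        System.out.println(weightedScore(0.8, 2.0)); // favored record, clamped at 1.0
    }
}
```

A record weight above 1.0 favors a record (its scores are boosted toward the clamp at 1.0), while a weight below 1.0 penalizes it.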
- Parameters:
fieldName - name of the field that contains the record weights
nq - the query to weight
- Returns:
- a weighted query
-
MatchCase
public static NetricsQuery MatchCase(NetricsQuery[] querylets, double core_strength, double[] thresholds, double[] weights, double[] penalty_weights) throws java.lang.IllegalArgumentException
Create a Matchcase query expression node. This creates a Matchcase score combiner. This combiner is intended for use in queries that match records. It is not intended for "search" queries that are looking for the closest matches to a search term.
When matching records it is often the case that a record is deemed to match if some core set of fields matches "well enough". For example, when matching person records it may be considered a match if there is a strong match on the name and SSN. The other fields may tend to support or refute the match, but they are not strictly needed to determine it. There may be a number of these core sets of fields; in the person-match case, in addition to name and SSN, a match on name, DOB and address may also suffice to establish that the records represent the same person. This match case score combiner represents one match case, i.e. one set of core fields. A complete match query typically consists of a number of these match case queries OR'ed together to produce the full record matching query.
A match case then consists of a set of core querylets and a set of secondary querylets. The core querylets have a core strength. This defines the maximum score of a perfect match on just the core querylets with no other supporting or refuting data from the secondary querylets. The core querylets also have a threshold value. If one or more of the core querylets is below its threshold value, the record is not considered to have matched "well enough" and thus is not considered an instance of this match case. The score returned by the MatchCase combiner is zero if the match case threshold criteria are not met.
The secondary querylets may either add to the basic core match score, or subtract from it, depending on whether their score is above or below a threshold score. This threshold score is the boundary between a value that is considered to have matched and one that has not.
Secondary querylets have two other parameters associated with them: a reward factor and a penalty factor. These determine by how much a match above or below the threshold value will increase or decrease the core score. If a secondary querylet has a reward factor of 1.0 and a perfect match, it would push the core score to 1.0. Similarly, if a secondary querylet has a penalty factor of 1.0 and is a perfect non-match, the core score would be pulled down to 0.0. Generally the reward and penalty factors are much less than 1.0; they should adjust the score up or down by a small factor, not push it to one extreme or the other. Note that rewards and penalties are accumulated across all secondary querylets. Thus if the sum of reward or penalty factors is greater than one, the sum of all rewards or penalties could push the score outside of the allowed range of 0.0 to 1.0. If this happens the score is truncated to the allowed range.
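To show how the pieces fit together, the following sketch models one plausible reading of the scoring just described. The exact server-side formula is not given here; in particular, the linear normalization of the reward and penalty terms is an assumption made for illustration only.

```java
// Hypothetical model of a MatchCase score, for illustration only. The exact
// formula is not documented here; the linear reward/penalty normalization
// below is an assumption.
public class MatchCaseSketch {

    // coreScores: scores of the core querylets (assumed already above threshold).
    // coreStrength: maximum score of a perfect core-only match.
    // secondaryScores / thresholds / rewards / penalties: per secondary querylet.
    public static double score(double[] coreScores, double coreStrength,
                               double[] secondaryScores, double[] thresholds,
                               double[] rewards, double[] penalties) {
        // Core contribution: simple average scaled by the core strength.
        double core = 0.0;
        for (double s : coreScores) core += s;
        core = coreStrength * (core / coreScores.length);

        // Secondary querylets add rewards above their threshold and
        // subtract penalties below it; adjustments accumulate.
        double adj = 0.0;
        for (int i = 0; i < secondaryScores.length; i++) {
            double s = secondaryScores[i], t = thresholds[i];
            if (s >= t) {
                adj += rewards[i] * (s - t) / (1.0 - t);  // assumed normalization
            } else {
                adj -= penalties[i] * (t - s) / t;        // assumed normalization
            }
        }
        // Accumulated rewards/penalties can overshoot; truncate to [0, 1].
        return Math.max(0.0, Math.min(1.0, core + adj));
    }
}
```

Under this model a reward factor of 1.0 with a perfect secondary match adds the full remaining headroom (truncated at 1.0), and a penalty factor of 1.0 with a perfect non-match removes the full core score (truncated at 0.0), matching the limiting cases in the text.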
- Parameters:
querylets - an array of all querylets, both core and secondary.
core_strength - the core match strength factor as described above. It must be >= 0.0 and <= 1.0.
thresholds - the threshold values for all querylets. This also defines which querylets are core and which are secondary: threshold values less than 0.0 (negative) denote core querylets, and the threshold used is the absolute value; threshold values greater than or equal to 0.0 denote secondary querylets. These values must be >= -1.0 and <= 1.0.
weights - for values corresponding to core querylets (i.e. whose threshold value is less than 0.0) this is a querylet weighting factor similar to the weighting factor for the And score combiner. For values corresponding to secondary querylets this is the reward factor. These values must be >= 0.0 and <= 1.0.
penalty_weights - the penalty weight factor for the secondary querylets. Values corresponding to core querylets are ignored. If this value is null the penalty weights default to the reward weights. These values must be >= 0.0 and <= 1.0 for secondary querylets; other values are ignored and may be any value.
- Returns:
- a match-case query object
- Throws:
java.lang.IllegalArgumentException - if a required argument is null or if an argument has an invalid value as described above.
-
FirstValid
public static NetricsQuery FirstValid(ConfidenceQlt[] conf_qlets, boolean invalid_only) throws java.lang.IllegalArgumentException
Create a First Valid query expression node. This creates a First Valid score combiner. This combiner is intended for use with Rlink queries. The Rlink query returns a confidence measure that is based on how well trained the model is for the particular case presented. A low confidence value indicates low reliability of the score output. When the confidence is low it may be desirable to employ an alternative query for that particular comparison; the First Valid score combiner provides a means of doing so.
Note that all current score generators produce a confidence measure of 1.0 (the highest possible). All score combiners, except Rlink and the First Valid score combiner, return the minimum confidence score of their input querylets as their confidence score. The Rlink combiner produces a confidence score based on how well trained the Learn Model is for the example evaluated. The First Valid score combiner returns the confidence measure of the selected querylet.
The First Valid score combiner allows you to assign a minimum confidence measure to each querylet. The querylets are scanned in order, the first querylet that meets or exceeds its minimum confidence measure is used as the output of the First Valid combiner. Note that it is the first querylet that meets the criteria that is selected, not the querylet with the highest confidence measure. If no querylet meets its minimum confidence measure the First Valid combiner returns a reject score (-1.0), so the record is not returned.
The standard use case is to provide an alternative query to be used for those cases where the Learn Model is poorly trained and may not be reliable. The first querylet is the Rlink query that uses the Learn Model, the second querylet is a query that does not use a Learn Model. When the Learn Model is unsure of its result, as indicated by a confidence measure below the given cutoff value, the First Valid combiner will return the result of the alternative query. When the Learn Model is sure of its prediction, as indicated by a confidence measure at or above the given cutoff, the result of the Learn Model is used. In this way an incompletely trained Learn Model can be safely used. You get the benefit of the greater accuracy of the Learn Model for those situations it has been trained for, and you get the results of a predefined query for those situations where the Learn Model has not been well trained. A larger number of alternative queries that use different Learn Models could also be employed.
The meaning of a particular score may be different depending on which querylet returned the score. For example a score of 0.6 from a Learn Model typically indicates a match between the two items. The same 0.6 score from a standard AND of simple and cognate queries generally indicates a fairly poor match. To correct for this difference in the meaning of the scores the First Valid combiner "normalizes" the scores to a common score range. For each querylet a match score cutoff must be provided. This score represents the level that forms the boundary between scores that represent a match and those that do not. If an alternative querylet is selected the scores for that querylet are normalized such that its match score cutoff is set equivalent to the match score cutoff of the first querylet. For example, in a typical case the Learn Model querylet may have a cutoff of 0.5, and the alternative querylet a cutoff of 0.85. If the alternative querylet is selected and its match score is 0.85, the score returned by the First Valid combiner is adjusted to 0.5. A score of 0.8 is adjusted to about 0.47 and a score of 0.99 is adjusted to about 0.96. This adjustment of scores allows a common cutoff score, or other processing rules based on scores, to be applied to the output score of the First Valid query regardless of which querylet is selected by the First Valid combiner.
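The worked numbers above are consistent with a piecewise-linear mapping that pins the selected querylet's match cutoff to the first querylet's match cutoff. This is an inferred model; the exact interpolation used by the server is not stated here, and the names below are invented for this example.

```java
// Inferred piecewise-linear score normalization for the First Valid combiner:
// scores below the alternate querylet's cutoff map linearly onto
// [0, primaryCutoff]; scores above it map linearly onto [primaryCutoff, 1].
// Illustrative only.
public class FirstValidNormSketch {
    public static double normalize(double score, double altCutoff, double primaryCutoff) {
        if (score <= altCutoff) {
            return score * (primaryCutoff / altCutoff);
        }
        return primaryCutoff
                + (score - altCutoff) * ((1.0 - primaryCutoff) / (1.0 - altCutoff));
    }

    public static void main(String[] args) {
        // The example from the text: primary cutoff 0.5, alternate cutoff 0.85.
        System.out.println(normalize(0.85, 0.85, 0.5)); // the cutoff maps to 0.5
        System.out.println(normalize(0.80, 0.85, 0.5)); // about 0.47
        System.out.println(normalize(0.99, 0.85, 0.5)); // roughly 0.96-0.97
    }
}
```

With this mapping a single downstream cutoff can be applied to the combiner's output regardless of which querylet was selected, which is the point of the normalization.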
As the input querylets to the First Valid score combiner likely use the same match data, it is often unnecessary and wasteful to evaluate all of the querylets in the prefilter stage of matching. A no prefilter flag is provided for each querylet. If set the querylet is ignored by the prefilter. This can dramatically improve performance and may help in accuracy. Note that it is invalid to set the no prefilter flag for all querylets. At least one of the equivalent querylets must be passed to the prefilter.
A combination of a query record and a record that is returned with a low confidence from a Learn Model can form a useful training pair for the model. Training on this pair can fill in the poorly trained region, allowing the Learn Model to predict with greater confidence the next time such an example is encountered. To facilitate retrieving such records the invalid only flag can be set to true. This reverses the sense of the First Valid combiner so that it returns only records that failed to meet the confidence criteria for at least one querylet. All other records are assigned the reject score of -1.0.
- Parameters:
conf_qlets - an array of the input "Confidence" querylets. The ConfidenceQlt class adds the additional parameters to each query: minimum confidence measure, match score cutoff and the no prefilter flag.
invalid_only - if true the logic for this combiner is inverted: the first querylet whose confidence is at or below the confidence threshold is used.
- Returns:
- a NetricsQuery object for the First Valid combiner.
- Throws:
java.lang.IllegalArgumentException - if a required argument is null.
- See Also:
ConfidenceQlt
-
Attributes
public static NetricsQuery Attributes(java.lang.String[] attr_values, java.lang.String[] attr_names, double[] attr_weights)
Create a Variable Attributes Query node. This is used to query a set of Variable Attributes values. It is equivalent to an And of a set of simple queries on the attribute values, with an empty score of -1.0 and ignore scores of -1.0 on the And. That is, it is an And of a match on each of the listed attributes where missing attribute values in the record are ignored.
The same options accepted by a Simple query can be applied to this query, including using a thesaurus (which will apply to all attributes) and setting the empty score. Note that by setting the empty score to something other than -1.0, the default behavior of ignoring missing attribute values can be changed so that missing values lower the overall score rather than being ignored.
The attr_weights argument allows a weight factor to be applied to each of the attributes. This behaves as the querylet weights do in the And query; it adjusts the relative importance of a particular attribute value but does not reduce the final score.
- Parameters:
attr_values - the list of attribute values to be matched.
attr_names - the list of names of the attributes to be matched. This must correspond to attr_values.
attr_weights - the list of weighting factors for each attribute. This may be null, in which case all attributes get equal weights. If given, this must match the attr_values and attr_names arrays.
- Returns:
- A new NetricsQuery object for this query.
-
Rlink
public static NetricsQuery Rlink(java.lang.String modelname, NetricsQuery[] nqs)
Create an RLINK Query Expression Node. The RLink query uses a TIBCO® Patterns - Search Learn Model to intelligently combine querylet scores to produce a single match score. Its use is similar to that of the AND score combiner.
The named model must have been trained with the same set of querylets as passed to the RLINK combiner.
- Parameters:
modelname - the name of the Learn Model to use
nqs - list of NetricsQuery objects that are the sub expressions for this RLINK expression.
- Returns:
- an RLink query object
-
Rlink
public static NetricsQuery Rlink(java.lang.String modelname, NetricsQuery[] nqs, boolean use_threshold)
Create an RLINK Query Expression Node. The RLink query uses a Patterns Learn Model to intelligently combine querylet scores to produce a single match score. Its use is similar to that of the AND score combiner.
The named model must have been trained with the same set of querylets as passed to the RLINK combiner.
This call allows the use threshold flag to be set. A Patterns Learn Model file may have a cutoff threshold score stored in it. If the use threshold flag is set, this score is used as the absolute cutoff score for the query: no records with scores below this value are returned. This supersedes any cutoff specified in the search options. If the named Patterns Learn Model does not contain a threshold score, this flag is quietly ignored.
- Parameters:
modelname - the name of the Learn Model to use
nqs - list of NetricsQuery objects that are the sub expressions for this RLINK expression.
use_threshold - if true the threshold value stored in the model is used as the absolute cutoff score.
- Returns:
- an RLink query object
-
RlinkWithAlt
public static NetricsQuery RlinkWithAlt(java.lang.String model_name, NetricsQuery[] qlets, boolean use_model_threshold, double confidence_cutoff, double learn_match_cutoff, NetricsQuery alternate_query, double alternate_match_cutoff, boolean preindexit) throws java.lang.IllegalArgumentException
Create an Rlink query with an alternate query. This directly implements the standard use case for a FirstValid query combiner: an Rlink query with an alternate standard query that is used when the Rlink query encounters a case where the model is poorly trained. The query returned is a FirstValid combiner with two sub-queries, the Rlink query and the alternate.
- Parameters:
model_name
- the name of the Learn Model to be used by the Rlink query.qlets
- the querylets for the Rlink query. The number of querylets must match the number of features in the Learn Model.use_model_threshold
- if this true and the model contains a threshold value that is used as the cutoff.confidence_cutoff
- the confidence threshold for the Rlink query.learn_match_cutoff
- the match strength cutoff value for the Rlink query. This is used only to normalize the Alternate Query scores.alternate_query
- the alternate query to be used when the Rlink query has low confidence.alternate_match_cutoff
- the match strength cutoff for the alternate query, used to normalize its scores.preindexit
- if true the alternate query is passed to the prefilter otherwise only the Rlink query is passed to the prefilter.- Returns:
- an rlink-with-alt query object
- Throws:
java.lang.IllegalArgumentException
- if an argument is invalid as per the ConfidenceQlt class or FirstValid method.
- See Also:
ConfidenceQlt
,Rlink(String, NetricsQuery[], boolean)
,FirstValid(com.netrics.likeit.ConfidenceQlt[], boolean)
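A sketch of a typical call, with illustrative model and field names; the cutoff values are chosen only for demonstration:

```java
// Querylets, one per feature of the (hypothetical) Learn Model.
NetricsQuery f1 = NetricsQuery.Simple("John", new String[]{"first_name"}, null);
NetricsQuery f2 = NetricsQuery.Simple("Smith", new String[]{"last_name"}, null);
NetricsQuery[] qlets = {f1, f2};

// Fallback query used when the model's confidence is low.
NetricsQuery alt = NetricsQuery.And(null, qlets);

NetricsQuery q = NetricsQuery.RlinkWithAlt(
    "name_model", // Learn Model name
    qlets,        // querylets; count must match the model's feature count
    false,        // do not use the model's stored threshold
    0.3,          // confidence cutoff for the Rlink query
    0.5,          // match cutoff used to normalize the alternate's scores
    alt,          // the alternate query
    0.5,          // match cutoff for the alternate query
    true);        // also pass the alternate query to the prefilter
```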
-
useRlinkThreshold
public void useRlinkThreshold() throws NetricsException
Set an RLink query to use the Model cutoff threshold. If the Learn Model has a threshold encoded into it, that value is used as an absolute cutoff threshold for the entire query. This supersedes any cutoff encoded in the search options.
- Throws:
NetricsException
- If the query is not an RLink Query, if the use model threshold value is already set and has a value of false, or if the query is invalid.
-
noConfidence
public void noConfidence() throws NetricsException
Turn off the reporting of confidence measures and significance values. This is used only with RLINK queries. By default, an estimate of the prediction confidence is calculated and returned. For true predictions (scores at or above 0.5), a set of "significance" scores is calculated for each feature. Computing these values takes some time and resources, and for most scenarios they are not needed. This method can be used to turn off the calculation of these values, saving a small amount of processing time on each query.
- Throws:
NetricsException
- if this is not an RLINK query.
-
useFeatureConfidence
public void useFeatureConfidence() throws NetricsException
Use the feature based confidence measure. This is used only with RLINK queries.
When a TIBCO® Patterns - Search Learn Model calculates a score it can also calculate a measure of how confident it is that the score generated is reliable. The reliability of the score depends on how well trained the model is on similar pairs of records.
There are a number of different confidence measures that can be used; the measures available depend on the release version of the model. Confidence measures cannot be calculated for model versions earlier than RFV3. The feature confidence measure is available only for model versions RFV6 or higher (corresponding to TIBCO® Patterns - Search release 5.4; note that it is the model version, not the TIBCO® Patterns - Search version, that matters: a release 5.4 TIBCO® Patterns - Search server using a model generated by a previous release does not support feature confidence measures). The feature confidence measure produces the most reliable measure of how well trained the model is for a particular pair of records. However, it is also by far the most expensive to compute, and therefore it is not the default measure.
Generally the feature confidence measure is used in applications that are specifically looking for poorly trained pairs of records, such as applications implementing a continuous learning capability or model training platforms. It could also be used in troubleshooting applications to help determine why a model came up with a questionable prediction.
Calling this method will cause the feature based confidence measure to be used rather than the default measure.
- Throws:
NetricsException
- if this is not an RLINK query.
-
Cognate
public static NetricsQuery Cognate(java.lang.String[] qstrs, java.lang.String[] fldnames, double[] fldweights, double noncogwgt)
Create a Cognate Query Expression Node. A cognate query is similar to the Simple query, but allows for cross-field matching. It uses the same string comparison algorithm but performs the match with multiple query strings. For instance, a first and last name can be searched against the first and last name fields of the database. This aids in finding transposed fields (where a first name occurs in the last name field or vice versa). This method uses the default empty field penalty of 1.0.
- Parameters:
qstrs
- the list of query strings. The length of the qstrs array must always equal the length of the fldnames array.
fldnames
- list of fields to query.
fldweights
- list of weights for fields. The fldweights array, if given, must be the same length as the fldnames array. If null, default field weights of 1.0 are used.
noncogwgt
- penalty applied to cross-field (non-cognate) field matches. Scores for non-cognate matches are reduced by this factor. For example, if noncogwgt is set to 0.8, the score of a first name exact match in the last name field would be reduced to 0.8.
- Returns:
- a cognate query
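For example, a sketch of matching "John Smith" against first and last name fields while tolerating transposed names (the field names are illustrative):

```java
String[] qstrs = {"John", "Smith"};
String[] flds  = {"first_name", "last_name"};
// null field weights -> default weight 1.0 for each field;
// cross-field (non-cognate) matches are scaled by 0.8.
NetricsQuery nq = NetricsQuery.Cognate(qstrs, flds, null, 0.8);
```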
-
Cognate
public static NetricsQuery Cognate(java.lang.String[] qstrs, java.lang.String[] fldnames, double[] fldweights, double noncogwgt, double empty_field_penalty)
Create a Cognate Query Expression Node. This creates a cognate query as described for the
Cognate(String[], String[], double[], double)
static method, adding the empty field penalty parameter. The empty field penalty applies in situations where the query input has fewer or more non-empty fields than the record. A typical case is matching names where there are first, middle, and last name fields. Often the middle name field is not populated. Given a query without a middle name, you would like to match records with a middle name without a large penalty for the unmatched middle name in the record. Given a query with a middle name, you would like to match records that do not have a middle name without a large penalty for the unmatched middle name in the query. Setting the empty field penalty lets you define how much the match is penalized for the unmatched data in these situations.
A cognate query allows for cross-field matching. Therefore it is not valid to reduce penalties only for unmatched data in the middle name field of the record when the query middle name field is empty. The record may have the first name in the middle name field and the middle name in the first name field; the middle name field of the record gets matched, while the first name field is left unmatched. In this case we want to reduce the penalty for the unmatched data in the first name field. The general rule is that when the record or query has more unpopulated fields than the other, the adjustment for empty field matches is applied to the fields with the least proportion of matched data. See the TIBCO® Patterns - Search Concepts Guide for a full explanation of how the penalty is applied, with examples.
A penalty of 1.0 implies the full penalty should be applied for unmatched data; that is, there is no adjustment because of empty fields. This is the default, and the behavior prior to the availability of this feature.
A penalty factor of 0.0 implies no penalty is applied for unmatched data that can be attributed to an empty field. For example, with a penalty factor of 0.0 a match of "John", "Quincy", "Adams" against "John", "", "Adams" would return a 1.0 score. It is generally recommended that an empty field penalty of 0.0 not be used. Some small penalty should normally be applied so that records that match in their entirety score higher than those that left some query or record data unmatched.
- Parameters:
qstrs
- the list of query strings
fldnames
- list of fields to query
fldweights
- list of weights for fields
noncogwgt
- penalty applied to cross-field (non-cognate) field matches
empty_field_penalty
- the penalty applied for unmatched data that can be associated with an empty field. If a value less than 0.0 is given, the default empty field penalty of 1.0 is used.
- Returns:
- a cognate query
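For example, a sketch of the "John Quincy Adams" case from above, where unmatched data attributable to an empty middle name field is penalized at only 0.1 of the normal penalty (the field names are illustrative):

```java
String[] qstrs = {"John", "", "Adams"};
String[] flds  = {"first_name", "middle_name", "last_name"};
// noncogwgt 0.8; empty_field_penalty 0.1 applies only a small residual
// penalty for data left unmatched because of the empty field.
NetricsQuery nq = NetricsQuery.Cognate(qstrs, flds, null, 0.8, 0.1);
```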
-
useThesaurus
public void useThesaurus(java.lang.String thesname, double theswgt) throws NetricsException
Assign a thesaurus to be used by the NetricsQuery object. The thesaurus and the query fields should use the same character map.
- Parameters:
thesname
- the name of the thesaurus
theswgt
- the weight to give thesaurus matches
- Throws:
NetricsException
- if this query doesn't support thesauri
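For example, a sketch that assumes a thesaurus named "nicknames" has already been loaded on the server (the thesaurus and field names are illustrative):

```java
NetricsQuery nq = NetricsQuery.Simple("Bob", new String[]{"first_name"}, null);
// Thesaurus matches (e.g. "Robert" for "Bob") score at 0.9 of an exact match.
nq.useThesaurus("nicknames", 0.9);
```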
-
useEphemeralThesaurus
public void useEphemeralThesaurus(com.netrics.likeit.NetricsBaseThesaurus thes_def, double theswgt) throws NetricsException
Assign an ephemeral thesaurus to be used by this NetricsQuery. An ephemeral thesaurus is one that exists only for the duration of a single query and is accessible only by that query. The thesaurus name, although required for consistency, is not used; thus an ephemeral thesaurus may have the same name as a permanent thesaurus or another ephemeral thesaurus without causing interference.
Ephemeral thesauri are intended for cases where the possible substitutions or weighted terms for a query are generated dynamically based on the query, rather than being a fixed set of substitutions or weightings that can be encoded into a static thesaurus.
- Parameters:
thes_def
- A thesaurus object (an object of any class extending NetricsBaseThesaurus).
theswgt
- the weight to give thesaurus matches
- Throws:
NetricsException
- if this query doesn't support thesauri
- See Also:
NetricsThesaurus
,NetricsWeightedDictionary
,NetricsCombinedThesaurus
-
useCharmap
public void useCharmap(java.lang.String charmap_name) throws NetricsException
Assign a character map for this query. Use this method with extreme caution.
This method is only applicable to Simple, Cognate and Attribute queries. If called on any other query type it throws an exception.
Normally the character map applied to a query is determined by the character mapping applied to the fields. However an attempt to query across fields using different character maps will throw a parameter conflict error. In general the character map used by the query should always be the same as the character map for the fields being queried. Using different character maps can result in very poor or completely erroneous match response. The conflict error is thrown to prevent inadvertently obtaining erroneous results due to mismatched character maps.
However, in some rare circumstances, if the character maps are very similar, it may be possible and desirable to perform a search across fields with different character maps. This method allows the user to override the conflict error and explicitly set the character map to be used for this query.
- Parameters:
charmap_name
- the name of the character map to be used.
- Throws:
NetricsException
- if this query does not support character maps
-
setXparm
public void setXparm(int id, int value) throws NetricsException
Use this method only on the advice of your TIBCO representative.
- Parameters:
id
- identifies the expert parameter
value
- value of the expert parameter
- Throws:
NetricsException
- if this query does not support expert parameters
-
setXparm
public void setXparm(int id, double value) throws NetricsException
Use this method only on the advice of your TIBCO representative.
- Parameters:
id
- identifies the expert parameter
value
- value of the expert parameter
- Throws:
NetricsException
- if this query does not support expert parameters
-
Predicate
public static NetricsQuery Predicate(NetricsPredicate pred)
Create a Predicate Query Expression Node. The predicate is evaluated on the record: if it is true, the NetricsQuery is given a score of 1.0; if it is false, a 0.0 score is used. This is a simple comparison for use when other queries would be both slower and more cumbersome (such as a gender comparison). It can also be used when inexact matching is not required.
The value returned by the predicate expression must be either a boolean value or a floating point value in the range 0.0 to 1.0. If a floating point value is returned it is used as the score for this query expression.
- Parameters:
pred
- the NetricsPredicate object to evaluate
- Returns:
- a predicate query
-
Predicate
public static NetricsQuery Predicate(java.lang.String expr)
Create a Predicate Query Expression Node from a string predicate.
- Parameters:
expr
- the string predicate expression
- Returns:
- a predicate query
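For example, a sketch combining a string predicate with a fuzzy name match, in the style of the date-range predicate shown in the class overview (the field name "date" is illustrative):

```java
NetricsQuery name = NetricsQuery.Simple("Michael Phelps", null, null);
NetricsQuery inYear = NetricsQuery.Predicate(
    "DATE \"2001/01/01\" <= $\"date\" and $\"date\" <= DATE \"2001/12/31\"");
NetricsQuery[] nqs = {name, inYear};
// AND averages the fuzzy name score with the 0.0/1.0 predicate score.
NetricsQuery nq = NetricsQuery.And(null, nqs);
```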
-
Reference
public static NetricsQuery Reference(java.lang.String querylet_name)
Create a reference to a named querylet. Query references avoid the calculation expense of using identical querylets in multiple places within a query. As the returned querylet is only a reference, most options (setName(String)
,scoreType(int)
, etc.) cannot be set on it.
- Parameters:
querylet_name
- The name of another querylet.
- Returns:
- A querylet that refers to another querylet by name.
- See Also:
setName(String)
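A sketch of naming a querylet and then referring to it elsewhere in the same query tree (the names are illustrative):

```java
NetricsQuery base = NetricsQuery.Simple("Smith", new String[]{"last_name"}, null);
base.setName("last_name_match");
// Elsewhere in the query tree, reuse the score without recomputing it:
NetricsQuery ref = NetricsQuery.Reference("last_name_match");
```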
-
getReference
public NetricsQuery getReference()
Obtain a reference to a named querylet. Query references avoid the calculation expense of using identical querylets in multiple places within a query.
- Returns:
- A reference querylet which refers to this querylet.
- See Also:
setName(String)
-
setName
public void setName(java.lang.String qlet_name)
Assign a name to this querylet. This call associates the given name with this query and is used to identify a particular node in a query tree. The name is associated with this particular query node only; it is not associated with any sub-nodes under this query or any higher-level nodes that include this query.
A name may be associated with any query type, so it can identify a node anywhere in the query tree, whether a leaf or a score combiner node.
Currently the primary use for assigning a name to a query is to retrieve the match score for that query. A complex query may have many levels in its query tree. The search results return only the top-level scores and the querylet scores received by the top-level query. If there is a need to retrieve the score for a node at a lower level, the node should be assigned a name using this method. The match score for the node can then be retrieved from the search results using the assigned name.
- Parameters:
qlet_name
- The name assigned to this querylet. The name is case sensitive. It may be any valid string, but its UTF-8 encoded value is limited to 999 bytes. All querylet names assigned within a query tree must be unique. This implies that a NetricsQuery object that has been assigned a name cannot be used in multiple places in a query tree.
- See Also:
NetricsSearchResult.getNamedQltScore(java.lang.String)
,NetricsSearchResult.getNamedQltScoresArray()
,NetricsSearchResult.getNamedQlts()
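A sketch of naming a sub-query and reading its score back from the results (the table name, the server instance si, and the result handling are illustrative):

```java
NetricsQuery nameQ = NetricsQuery.Simple("Michael Phelps", null, null);
nameQ.setName("name_score");

NetricsSearchCfg cfg = new NetricsSearchCfg("test");
cfg.setNetricsQuery(nameQ);
NetricsSearchResponse resp = si.search(cfg, null);
// For each NetricsSearchResult in the response, the named querylet's
// score can then be retrieved:
//   double s = result.getNamedQltScore("name_score");
```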
-
setMatchEmpty
public void setMatchEmpty(boolean value) throws NetricsException
Sets the flag to match empty values. If this is set to true, an empty value in the query will match an empty field value, resulting in a 1.0 score. If this is false (the default), matching such values results in an empty score. The flag applies to Simple, Cognate, Date, and Attribute queries. It does not apply to Predicate queries.
- Parameters:
value
- the value of the Match Empty flag.
- Throws:
NetricsException
- See Also:
setEmptyScore(double)
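For example, a sketch that lets an empty query value count as a perfect match against an empty field (the field name is illustrative):

```java
NetricsQuery nq = NetricsQuery.Simple("", new String[]{"middle_name"}, null);
// An empty query value matching an empty field now scores 1.0 instead of
// producing an empty score.
nq.setMatchEmpty(true);
```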
-
setEmptyScore
public void setEmptyScore(double score)
Sets the score a comparison receives when empty data is encountered. A comparison will receive this score when empty data is encountered in the query or in the field value. If both are empty, the query returns an empty score by default, or it can return a 1.0 score; see
setMatchEmpty(boolean)
. The special score -1.0 can be used to indicate the record should be rejected. The default setting is appropriate for most situations, so it is rarely necessary to set this.
- Parameters:
score
- the score to be assigned.
- See Also:
setMatchEmpty(boolean)
-
setInvalidScore
public void setInvalidScore(double score)
Sets the score a comparison receives when invalid data is encountered. A comparison will receive this score when invalid data in the record is encountered or certain other errors occur when matching the record. The special score -1.0 can be used to indicate the record is to be rejected. The default setting is appropriate for most situations, so it is rarely necessary to set this.
- Parameters:
score
- the score to be assigned.
-
setGroup
public void setGroup(java.lang.String qlet_group)
Assign a group to this querylet. This call associates the given group name with this query and all sub-queries of this query.
Grouping is used to identify portions of a query that are to be considered independent match cases. Each identified group is treated as a separate match case within the overall match. This pertains only to join queries, where a group is associated with a match on a particular child record.
Typically this is used when the goal is to find the best match to an input record that consists of a parent and a set of child records, where there are two or more instances of a child record from the same table. The desire is to find the best match to any one of the multiple child records. A separate querylet would be constructed for each instance of the child record, and these would then be ORed together.
In the above match case, the early phase of the matching process will try to find a record that best matches all of the OR cases, leading to poor results. Assigning separate group names to the querylets generated from the separate input child records lets the early phase know these are independent match cases. If it is necessary to generate multiple querylets for a particular child record, where one is not a sub-query of the other, then both should be assigned the same group name.
A querylet cannot be assigned to two different groups. Remember that assigning a group name to a NetricsQuery object also assigns the name to all sub-queries of that object. So if a different group name has been assigned to one of the sub-queries, an exception will be thrown when the query is executed.
- Parameters:
qlet_group
- The group name assigned to this querylet. The name is case sensitive. It may be any valid string, but its UTF-8 encoded value is limited to 999 bytes.
-
scoreType
public void scoreType(int scoreType)
This specifies the type of score to be used for ordering records.
- Parameters:
scoreType
- Score types include the following:
Normal (pass in NetricsSearchOpts.SCORE_NORMAL)
This is the standard TIBCO® Patterns - Search search. This type of search looks for the query text inside the record text. The presence of extra information in the record not found in the query does not penalize the record score. Use this score type for a substring or keyword search.
Symmetric (pass in NetricsSearchOpts.SCORE_SYMMETRIC)
A symmetric search compares the full texts of both the query and record and evaluates their similarity. If the record contains information not present in the query, the score will be lower than if that information had not been present. Use this score only when the query represents the entirety of the text expected to be found. This is typically used in record matching operations.
Reverse (pass in NetricsSearchOpts.SCORE_REVERSE)
Reverse scoring functions the same as the normal search, but with the roles of the record and query reversed. Records are selected based on how well they match some piece of the query, with no penalty for unmatched sections of the query. Use this score type to categorize documents by using the document as a query against a table consisting of records of known keywords.
Minimum (pass in NetricsSearchOpts.MINIMUM)
The minimum of the normal and reverse scores. This is similar to the Symmetric score, but gives a larger penalty for unmatched data. The use cases for this score type are limited.
Maximum (pass in NetricsSearchOpts.MAXIMUM)
The maximum of the normal and reverse scores. This can be used where either the record or the query may be a subset of the other: as long as one of them is matched well, it is considered a match.
Scoreit (pass in NetricsSearchOpts.SCORE_IT)
This is a measure of the informational difference between query and record, somewhat similar to the symmetric score. It is very rarely used.
All six score types are computed and returned for each record and can be accessed using the getNormMatchScore, getRevMatchScore, getSymMatchScore, getMinMatchScore, getMaxMatchScore, and getITMatchScore NetricsSearchResult methods. The getMatchScore method holds a copy of whichever match score was used to select and sort the records in the list.
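For example, a sketch selecting the symmetric score, as is typical in record matching (the query text is illustrative):

```java
NetricsQuery nq = NetricsQuery.Simple("Michael Phelps", null, null);
// Rank and select records by the symmetric score rather than the
// default normal score.
nq.scoreType(NetricsSearchOpts.SCORE_SYMMETRIC);
```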
-
toString
public java.lang.String toString()
Print out the query for debugging. The toString method is implemented for debugging purposes only.
- Overrides:
toString
in class java.lang.Object
-
-