ibi Patterns .NET API
NetricsQuery objects are used to implement the complex query structures introduced with release 4.1.
Public Member Functions | |
| void | useRlinkThreshold () |
| Set an RLink query to use the Model cutoff Threshold. | |
| void | noConfidence () |
| Turn off the reporting of confidence measures and significance values. | |
| void | useFeatureConfidence () |
| Use the feature based confidence measure. | |
| void | useThesaurus (String thesname, double theswgt) |
| Assign a thesaurus to be used by the NetricsQuery object. The thesaurus and the query fields should use the same character map. | |
| void | useEphemeralThesaurus (NetricsBaseThesaurus thes_def, double theswgt) |
| Assign an ephemeral thesaurus to be used by this NetricsQuery. | |
| void | useCharmap (String charmap_name) |
| Assign a character map for this query. | |
| void | setXparm (int id, int val) |
| Use this method only on the advice of your ibi Patterns support representative. | |
| void | setXparm (int id, double val) |
| Use this method only on the advice of your ibi Patterns support representative. | |
| NetricsQuery | getReference () |
| Obtain a reference to a named querylet. | |
| void | setMatchEmpty (bool value) |
| Sets the flag to match empty values. | |
| void | setEmptyScore (double score) |
| Sets the score a comparison gets when empty data is encountered. | |
| void | setInvalidScore (double score) |
| Sets the score a comparison gets when an error occurs. | |
| void | setName (String qlet_name) |
| Assign a name to this querylet. | |
| void | setGroup (String qlet_group) |
| Assign a group to this querylet. | |
| void | scoreType (NetricsSearchOpts.score_types scoreType) |
| This specifies the type of score to be used for ordering records. | |
| void | scoreType (int scoreType) |
| Obsolete. Use NetricsQuery.scoreType(NetricsSearchOpts.score_types) instead. | |
Static Public Member Functions | |
| static NetricsQuery | Date (String date, String fieldName) |
| A date comparison will look for inversions between month and day, misplaced century information, and other qualities that a normal string comparison would not take into account. | |
| static NetricsQuery | Date (DateTime date, String fieldName) |
| A date comparison will look for inversions between month and day, misplaced century information, and other qualities that a normal string comparison would not take into account. | |
| static NetricsQuery | Custom (String qstr, String[] fldnames, double[] fldweights, int customtype) |
| Create a Custom Query node. | |
| static NetricsQuery | Simple (String qstr, String[] fldnames, double[] fldweights) |
| Create a Simple Query Expression Node. | |
| static NetricsQuery | Simple (String qstr, String[] fldnames) |
| A basic query on multiple fields, with no field weights. See NetricsQuery.Simple(string, string[], double[]). |
| static NetricsQuery | Simple (String qstr, String fldname) |
| A basic query on a single field, with no field weight. See NetricsQuery.Simple(string, string[], double[]). |
| static NetricsQuery | Simple (String qstr) |
| A basic query across all searchable fields. See NetricsQuery.Simple(string, string[], double[]). |
| static NetricsQuery | And (double[] weights, NetricsQuery[] nqs) |
| Create an AND Query Expression Node. | |
| static NetricsQuery | And (double[] weights, NetricsQuery[] nqs, double[] ignore_scores) |
| Create an AND Query Expression Node with ignore scores. | |
| static NetricsQuery | And (double[] weights, NetricsQuery[] nqs, double[] ignore_scores, double[] reject_scores) |
| Create an AND Query Expression Node with ignore scores and reject scores. | |
| static NetricsQuery | Or (double[] weights, NetricsQuery[] nqs) |
| Create an OR Query Expression Node. | |
| static NetricsQuery | Not (NetricsQuery nq) |
| Create a NOT Query Expression Node. | |
| static NetricsQuery | Wgtbyfield (String fieldName, NetricsQuery nq) |
| Create a weighted field query expression node. | |
| static NetricsQuery | MatchCase (NetricsQuery[] querylets, double core_strength, double[] thresholds, double[] weights, double[] penalty_weights) |
| Create a Matchcase query expression node. | |
| static NetricsQuery | FirstValid (ConfidenceQlt[] conf_qlets, bool invalid_only) |
| Create a First Valid query expression node. | |
| static NetricsQuery | Attributes (String[] attr_values, String[] attr_names, double[] attr_weights) |
| Create a Variable Attributes Query node. | |
| static NetricsQuery | Rlink (String modelname, NetricsQuery[] nqs) |
| Create an RLINK Query Expression Node. | |
| static NetricsQuery | Rlink (String modelname, NetricsQuery[] nqs, bool use_threshold) |
| Create an RLINK Query Expression Node. | |
| static NetricsQuery | RlinkWithAlt (String model_name, NetricsQuery[] qlets, bool use_model_threshold, double confidence_cutoff, double learn_match_cutoff, NetricsQuery alternate_query, double alternate_match_cutoff, bool preindexit) |
| Create an Rlink query with an alternate query. | |
| static NetricsQuery | Cognate (String[] qstrs, String[] fldnames, double[] fldweights, double noncogwgt) |
| Create a Cognate Query Expression Node. | |
| static NetricsQuery | Cognate (String[] qstrs, String[] fldnames, double[] fldweights, double noncogwgt, double empty_field_penalty) |
| Create a Cognate Query Expression Node. | |
| static NetricsQuery | Predicate (NetricsPredicate pred) |
| Create a Predicate Query Expression Node. |
| static NetricsQuery | Predicate (String expr) |
| Create a Predicate Query Expression Node from a string predicate. |
| static NetricsQuery | Reference (String querylet_name) |
| Create a reference to a named querylet. | |
Static Public Attributes | |
| static int | CS_NONE = 0 |
| Set to not use a custom scoring function. | |
| static int | CS_DATE = 1 |
| Set to use a date scoring function. | |
NetricsQuery objects are used to implement the complex query structures introduced with release 4.1.
Queries now take the form of a hierarchical tree. There are currently five ways to create basic scores: a simple query comparison, a cognate query comparison, a date comparison, a predicate querylet, and the Attributes query. These are explained in further detail below. The resulting scores can then be combined by five more operators: AND, OR, NOT, WGTBYFIELD and RLINK (this last is how to use the Learn Model). For each of the score combiners, you will need to pass the NetricsQuery "children" which will be combined by that operator. It follows that all the "leaves" in the tree must be score generators, and all the "branches" must be score combiners.
To create the most basic search, use code like the following:
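A minimal sketch of such a search. NetricsQuery.Simple(String) is documented below; the server object and its search call are placeholders for your client library's search entry point, which is not part of this class:

```csharp
// Build the simplest possible query: one string matched against
// all searchable text fields of the table.
NetricsQuery q = NetricsQuery.Simple("Michael Phelps");

// The search invocation below is a hypothetical sketch -- consult your
// client documentation for the exact server interface and search method
// provided by your release.
// NetricsSearchResult result = server.Search("test", q, searchOpts);
```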
The above code will search the table "test" (all searchable text fields) for the string "Michael Phelps." It's just that easy. A much more complicated search might look like the following:
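One hedged sketch of such a query. The field names ("first_name", "birth_date", etc.), the date format, and the predicate expression syntax are illustrative assumptions, not part of this API; the constructors themselves are documented below:

```csharp
// Four querylets combined with an AND (simple average of the scores).
NetricsQuery simple  = NetricsQuery.Simple("Michael Phelps",
                                           new String[] { "name" });

NetricsQuery cognate = NetricsQuery.Cognate(
    new String[] { "Michael", "Phelps" },        // query strings
    new String[] { "first_name", "last_name" },  // fields, same length
    new double[] { 1.0, 1.0 },                   // field weights
    0.8);                                        // non-cognate penalty

NetricsQuery date = NetricsQuery.Date("06/30/1985", "birth_date");

NetricsQuery pred = NetricsQuery.Predicate("gender = 'M'");  // illustrative syntax

NetricsQuery query = NetricsQuery.And(
    new double[] { 1.0, 1.0, 1.0, 1.0 },         // equal weights => simple average
    new NetricsQuery[] { simple, cognate, date, pred });
```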
The above query combines four different querylets using an AND (which performs a simple average of the individual scores). The first querylet is a simple string comparison, the second is a cognate string comparison, the third is a date comparison, and the fourth is a predicate comparison. See below for more about each kind of query.
static NetricsQuery And (double[] weights, NetricsQuery[] nqs)  [inline, static]
Create an AND Query Expression Node.
This is used to combine multiple NetricsQuery scores into a single score. This is usually performed with a simple average, but can be a weighted average if the user utilizes the weights parameter.
This sample shows an AND query using 2 querylets that are then And'd together.
| weights | list of floats that are the weights for the sub expressions of this query |
| nqs | array of NetricsQuery objects that are the sub expressions for this AND |
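A short sketch of a two-querylet AND, using the documented signature. The field names are illustrative assumptions:

```csharp
NetricsQuery q1 = NetricsQuery.Simple("John",  new String[] { "first_name" });
NetricsQuery q2 = NetricsQuery.Simple("Smith", new String[] { "last_name" });

// Weight the last-name match twice as heavily as the first-name match,
// producing a weighted rather than simple average.
NetricsQuery both = NetricsQuery.And(
    new double[] { 1.0, 2.0 },
    new NetricsQuery[] { q1, q2 });
```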
static NetricsQuery And (double[] weights, NetricsQuery[] nqs, double[] ignore_scores)  [inline, static]
Create an AND Query Expression Node with ignore scores.
This is used to combine multiple NetricsQuery scores into a single score. This is usually performed with a simple average, but can be a weighted average if the user utilizes the weights parameter.
This version allows the caller to specify a set of scores, one per sub expression, that define a lower bound for the sub expression score. If the score is below the specified lower bound the sub expression will be ignored completely when computing the average or weighted average for the output score.
There are two special score values: -1.0 is the special score used to indicate a record should be rejected. Typically this is used to reject records when a search on empty or invalid data is performed (see methods NetricsQuery.setInvalidScore(double) and NetricsQuery.setEmptyScore(double) for how to change the score assigned in such cases). Instead of rejecting a record a sub expression can be ignored if empty or invalid data is encountered by setting the empty and invalid scores to -1.0 and then using the ignore threshold to ignore scores of -1.0. -2.0 can be used as an indicator that no ignore threshold processing is to be performed for this sub expression.
A cautionary note on ignore scores: Ignore threshold scores should be used with extreme caution. Ignoring a low score in one sub expression will likely boost the score of a record with a poor match on that sub expression (i.e. a score below the threshold) above the score of a similar record with a good match on that sub expression (i.e. above the threshold). Thus when used inappropriately ignore score thresholds will tend to push poorer matches above better matches. This is especially true if ignore thresholds are applied to more than one sub expression. Thus ignore thresholds should only be applied in the rare cases where a low score indicates the sub expression is completely irrelevant and the record should score above close but imperfect matches on the sub expression. Another valid use would be to ignore empty fields as described above, although normally it is better to just let the default score of 0.0 be averaged into the output score.
| weights | array of floats that are the weights for the sub expressions of this query |
| nqs | array of NetricsQuery objects that are the sub expressions for this AND |
| ignore_scores | array of floats that are the ignore-scores for the sub expressions for this AND, or null to skip the ignore tests |
Sample code using And method
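A hedged sketch of the empty-field idiom described above (field names are illustrative assumptions):

```csharp
NetricsQuery name = NetricsQuery.Simple("John Smith",  new String[] { "name" });
NetricsQuery ssn  = NetricsQuery.Simple("123-45-6789", new String[] { "ssn" });

// Treat an empty SSN as "no information": score empty data as -1.0 on the
// querylet, then use an ignore score of -1.0 so that score is ignored by
// the AND instead of rejecting the record. -2.0 disables the ignore test
// for the name querylet.
ssn.setEmptyScore(-1.0);

NetricsQuery query = NetricsQuery.And(
    new double[] { 1.0, 1.0 },
    new NetricsQuery[] { name, ssn },
    new double[] { -2.0, -1.0 });   // ignore scores, one per sub expression
```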
static NetricsQuery And (double[] weights, NetricsQuery[] nqs, double[] ignore_scores, double[] reject_scores)  [inline, static]
Create an AND Query Expression Node with ignore scores and reject scores.
This is used to combine multiple NetricsQuery scores into a single score. This is usually performed with a simple average, but can be a weighted average if the user utilizes the weights parameter.
This version allows the caller to specify a set of ignore scores and/or a set of reject scores. Ignore scores define a lower bound for the sub expression score. If the score is below the specified lower bound the sub expression will be ignored completely when computing the average or weighted average for the output score. Reject scores define a lower bound also. But in the case of the reject score if the sub expression score is below the reject score the entire record will be rejected. (More precisely the record is assigned a score of -1.0, which causes the record to be rejected unless the score is ignored in a higher level AND expression.)
Both ignore scores and reject scores support two special score values: -1.0 is the special score used to indicate a record should be rejected. Typically this is used to reject records when a search on empty or invalid data is performed (see methods NetricsQuery.setInvalidScore(double) and NetricsQuery.setEmptyScore(double) for how to change the score assigned in such cases). Instead of rejecting a record a sub expression can be ignored if empty or invalid data is encountered by setting the empty and invalid scores to -1.0 and then using the ignore threshold to ignore scores of -1.0. -2.0 can be used as an indicator that no ignore threshold or reject threshold processing is to be performed for this sub expression.
If either of the ignore scores or reject scores are null then no ignore or reject test respectively are performed for this AND. If a sub expression is assigned both an ignore score and a reject score then the lesser value takes precedence over its range, the greater value is applied to the range from the lesser to the greater. E.g. if the reject score for sub expression 1 is 0.2 and the ignore score for sub expression 1 is 0.4 then records with a sub expression 1 score less than 0.2 are rejected and records with a score from 0.2 to 0.4 will ignore sub expression 1. Conversely if the ignore score is 0.2 and the reject score 0.4 then scores less than 0.2 will cause sub expression 1 to be ignored whereas scores from 0.2 to 0.4 will cause the record to be rejected.
A cautionary note on ignore scores: Ignore threshold scores should be used with extreme caution. Ignoring a low score in one sub expression will likely boost the score of a record with a poor match on that sub expression (i.e. a score below the threshold) above the score of a similar record with a good match on that sub expression (i.e. above the threshold). Thus when used inappropriately ignore score thresholds will tend to push poorer matches above better matches. This is especially true if ignore thresholds are applied to more than one sub expression.
Further care is needed when using both ignore thresholds and reject thresholds on the same set of sub expressions. Ignore thresholds tend to favor records with high scores for one expression and very low scores for the others. Reject thresholds will pass a record only if all sub expressions are above the given threshold values. Thus combining ignore thresholds and reject thresholds can result in few or no records being returned.
| weights | array of floats that are the weights for the sub expressions of this query |
| nqs | array of NetricsQuery objects that are the sub expressions for this AND |
| ignore_scores | array of floats that are the ignore-scores for the sub expressions for this AND, or null to skip the ignore tests |
| reject_scores | array of floats that are the reject-scores for the sub expressions for this AND, or null to skip the reject tests |
Sample code using And method
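A sketch combining an ignore threshold on one sub expression with a reject threshold on another, per the rules above (field names are illustrative assumptions):

```csharp
NetricsQuery name = NetricsQuery.Simple("John Smith", new String[] { "name" });
NetricsQuery addr = NetricsQuery.Simple("12 Main St", new String[] { "address" });

// Reject any record whose name score falls below 0.2; ignore the address
// querylet entirely when its score is below 0.4. -2.0 disables the
// corresponding test for a sub expression.
NetricsQuery query = NetricsQuery.And(
    new double[] { 1.0, 1.0 },
    new NetricsQuery[] { name, addr },
    new double[] { -2.0, 0.4 },      // ignore scores
    new double[] { 0.2, -2.0 });     // reject scores
```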
static NetricsQuery Attributes (String[] attr_values, String[] attr_names, double[] attr_weights)  [inline, static]
Create a Variable Attributes Query node.
This is used to query a set of Variable Attribute values. It is equivalent to an And of a set of simple queries on the attribute values, with an empty score of -1.0 and ignore scores of -1.0 on the And. That is, it is an And of a match on each of the listed attributes where missing attribute values in the record are ignored.
The same options accepted by a Simple query can be applied to this query. That includes using a thesaurus, which will apply to all attributes, and setting the empty score. Note that by setting the empty score to something other than -1.0, the default behavior of ignoring missing attribute values can be changed so that missing values lower the overall score rather than being ignored.
The attr_weights argument allows a weight factor to be applied to each of the attributes. This behaves as the querylet weights do in the And query, it adjusts the relative importance of a particular attribute value but it does not reduce the final score.
| attr_values | the list of attribute values to be matched. |
| attr_names | the list of names of the attribute to be matched. This must correspond to the attr_values. |
| attr_weights | the list of weighting factors for each attribute. This may be null, in which case all attributes get equal weights. If given this must match the attr_values and attr_names arrays. |
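A minimal sketch using the documented signature (the attribute names and values are illustrative assumptions):

```csharp
// Match two variable attributes; missing attribute values in a record are
// ignored by default, per the description above.
NetricsQuery attrs = NetricsQuery.Attributes(
    new String[] { "red",   "large" },   // attr_values
    new String[] { "color", "size"  },   // attr_names, parallel to attr_values
    null);                               // null => equal weights
```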
static NetricsQuery Cognate (String[] qstrs, String[] fldnames, double[] fldweights, double noncogwgt)  [inline, static]
Create a Cognate Query Expression Node.
A cognate query is a more complicated version of a Simple query. It uses the same string comparison algorithm but performs the match with multiple query strings. For instance, a first and last name can be searched against the first and last name fields of the database. This aids in finding transposed fields (where a first name occurs in the last name field or vice versa). The length of the qstrs array must always equal the length of the fldnames array. The fldweights array, if it is used, must also be that length. The noncogwgt is a penalty applied to diagonal (non-cognate) field matches: scores for non-cognate matches are reduced by this factor. For example, if noncogwgt is set to 0.8, the score of an exact match of the first name against the last name field would be reduced to 0.8.
This sample shows how to perform a search of an ibi™ Patterns - Search table using a Cognate query.
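A hedged sketch of such a search (the field names are illustrative assumptions):

```csharp
// First/last name cognate search: either query string may match either
// field, with non-cognate matches scaled by the 0.8 penalty factor.
NetricsQuery cog = NetricsQuery.Cognate(
    new String[] { "Michael",    "Phelps"    },   // qstrs
    new String[] { "first_name", "last_name" },   // fldnames, same length
    new double[] { 1.0, 1.0 },                    // fldweights
    0.8);                                         // noncogwgt
```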
| qstrs | the list of query strings |
| fldnames | list of fields to query |
| fldweights | list of weights for field |
| noncogwgt | weight for diagonal fields in the query |
static NetricsQuery Cognate (String[] qstrs, String[] fldnames, double[] fldweights, double noncogwgt, double empty_field_penalty)  [inline, static]
Create a Cognate Query Expression Node.
This creates a cognate query as described for the NetricsQuery.Cognate(String[], String[], double[], double) static method, adding the empty field penalty parameter.
The empty field penalty applies in situations where the query input has a fewer or greater number of non-empty fields than the record. A typical case is matching names where there are first, middle and last name fields. Very often the middle name field is not populated. Given a query without a middle name, you'd like to match records with a middle name without a large penalty for the unmatched middle name in the record. Given a query with a middle name, you'd like to match records that do not have a middle name without a large penalty for the unmatched middle name in the query. Setting the empty_field_penalty lets you define how much the match is penalized for the unmatched data in these situations.
A cognate query allows for cross field matching. Therefore it is not valid to reduce penalties only for unmatched data in the middle name field of the record if the query middle name field is empty. The record may have the first name in the middle name field, and the middle name in the first name field. The middle name field of the record gets matched, the first name field is left unmatched. In this case we want to reduce the penalty for the unmatched data in the first name field. The general rule is when the record or query has more unpopulated fields than the other, the adjustment for empty field matches is applied to the fields with the least proportion of matched data. See the ibi™ Patterns - Search Concepts Guide for a full explanation of how the penalty is applied, with examples.
A penalty factor of 1.0 implies the full penalty should be applied for unmatched data. This means there is no adjustment because of empty fields. This is the default, and the behavior previous to the availability of this feature.
A penalty factor of 0.0 implies no penalty is applied for unmatched data that can be attributed to an empty field. For example, with a penalty factor of 0.0 a match of "John", "Quincy", "Adams" against "John", "", "Adams" would return a 1.0 score.
It is generally recommended that an empty field penalty of 0.0 not be used. Some small penalty should normally be applied so that records that match in their entirety score higher than those that left some query or record data unmatched.
| qstrs | the list of query strings |
| fldnames | list of fields to query |
| fldweights | list of weights for field |
| noncogwgt | penalty applied to diagonal (non-cognate) field matches |
| empty_field_penalty | the penalty applied for unmatched data that can be associated with an empty field. If a value less than 0.0 is given the default empty field penalty is used. |
static NetricsQuery Custom (String qstr, String[] fldnames, double[] fldweights, int customtype)  [inline, static]
Create a Custom Query node.
Currently there is only one custom query type: NetricsQuery.CS_DATE. To perform a date search, pass the query date in "qstr", the single date field in the "fldnames" vector, set fldweights to null, and set customtype to NetricsQuery.CS_DATE. A date comparison will look for inversions between month and day, misplaced century information, and other qualities that a normal string comparison would not take into account.
This sample shows how to perform a search using a date field and the Custom Query.
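A minimal sketch following the directions above (the field name and date format are illustrative assumptions):

```csharp
// Date search via the Custom node: a single date field, null field
// weights, and the CS_DATE custom type, as directed above.
NetricsQuery dq = NetricsQuery.Custom(
    "06/30/1985",                   // qstr: the query date
    new String[] { "birth_date" },  // the single date field
    null,                           // fldweights: null per the directions
    NetricsQuery.CS_DATE);          // customtype
```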
| qstr | the query string |
| fldnames | list of fields to query |
| fldweights | list of weights for field |
| customtype | type of the custom query |
static NetricsQuery Date (String date, String fieldName)  [inline, static]
A date comparison will look for inversions between month and day, misplaced century information, and other qualities that a normal string comparison would not take into account.
| date | the date to search for |
| fieldName | fieldName field to search |
static NetricsQuery Date (DateTime date, String fieldName)  [inline, static]
A date comparison will look for inversions between month and day, misplaced century information, and other qualities that a normal string comparison would not take into account.
| date | the date to search for |
| fieldName | fieldName field to search |
static NetricsQuery FirstValid (ConfidenceQlt[] conf_qlets, bool invalid_only)  [inline, static]
Create a First Valid query expression node.
This creates a First Valid score combiner. This combiner is intended for use with Rlink queries. The Rlink query returns a confidence measure that is based on how well trained the model is for the particular case presented. A low confidence measure indicates a low confidence in the reliability of the score output. When the confidence is low it may be desirable to employ a different query for this particular comparison. The First Valid score combiner provides a means of doing so.
Note that all current score generators produce a confidence measure of 1.0 (the highest possible). All score combiners, except Rlink and the First Valid score combiner, return the minimum confidence score of their input querylets as their confidence score. The Rlink combiner produces a confidence score based on how well trained the Learn Model is for the example evaluated. The First Valid score combiner returns the confidence measure of the selected querylet.
The First Valid score combiner allows you to assign a minimum confidence measure to each querylet. The querylets are scanned in order, the first querylet that meets or exceeds its minimum confidence measure is used as the output of the First Valid combiner. Note that it is the first querylet that meets the criteria that is selected, not the querylet with the highest confidence measure. If no querylet meets its minimum confidence measure the First Valid combiner returns a reject score (-1.0), so the record is not returned.
The standard use case is to provide an alternative query to be used for those cases where the Learn Model is poorly trained and may not be reliable. The first querylet is the Rlink query employing the Learn Model, the second querylet is a fixed query that does not use a Learn Model. When the Learn Model is unsure of its results, as indicated by a confidence measure below a given cutoff value, the First Valid combiner would return the results of the alternative query. When the Learn Model is sure of its prediction, as indicated by a confidence measure at or above the given cutoff, the results of the Learn Model are used. In this way an incompletely trained Learn Model can be safely used. You get the benefit of the greater accuracy of the Learn Model for those situations it has been trained for, you get the results of a predefined query for those situations where the Learn Model has not been well trained. In theory a number of alternative queries could be employed that use different Learn Models.
The meaning of a particular score may be different depending on which querylet returned the score. For example a score of 0.6 from a Learn Model typically indicates a match between the two items. The same 0.6 score from a standard AND of simple and cognate queries generally indicates a fairly poor match. To correct for this difference in the meaning of the scores the First Valid combiner "normalizes" the scores to a common score range. For each querylet a match score cutoff must be provided. This score represents the level that forms the boundary between scores that represent a match and those that do not. If an alternative querylet is selected the scores for that querylet are normalized such that its match score cutoff is set equivalent to the match score cutoff of the first querylet. So in a typical case the Learn Model querylet may have a cutoff of 0.5, and the fixed querylet a cutoff of 0.85. If the fixed query is selected and its match score is 0.85, the score returned by the First Valid combiner is adjusted to 0.5. A score of 0.8 is adjusted to about 0.47 and a score of 0.99 is adjusted to about 0.96. This adjustment of scores allows a common cutoff score, or other processing rules based on scores, to be applied regardless of which querylet is selected by the First Valid combiner.
As the input querylets to the First Valid score combiner likely use the same match data, it is often unnecessary and wasteful to apply all of the querylets in the prefilter stage of matching. A no prefilter flag is provided for each querylet. If set the querylet is ignored by the prefilter. This can dramatically improve performance and may help in accuracy. Note it is invalid to set the no prefilter flag for all querylets. At least one must be passed to the prefilter.
A combination of a query and a record that returns a low confidence from a Learn Model forms an ideal training pair for the model. Training on this pair can fill in the poorly trained region, allowing the Learn Model to predict with confidence the next time such an example is encountered. To facilitate retrieving such pairs the invalid only flag can be set to true. This reverses the sense of the First Valid combiner so that it returns only records that failed to meet the confidence criteria for at least one querylet. All other records are assigned the reject score of -1.0.
| conf_qlets | this is an array of the input "Confidence" querylets. The ConfidenceQlt class adds the three additional parameters to each query: minimum confidence measure, match score cutoff and the no prefilter flag. |
| invalid_only | If true the logic for this combiner is inverted: the first querylet whose confidence is at or below its confidence threshold is used. |
NetricsQuery getReference ()  [inline]
Obtain a reference to a named querylet.
Query references avoid the calculation expense of using identical querylets in multiple places within a query.
static NetricsQuery MatchCase (NetricsQuery[] querylets, double core_strength, double[] thresholds, double[] weights, double[] penalty_weights)  [inline, static]
Create a Matchcase query expression node.
This creates a Matchcase score combiner. This combiner is intended for use in queries that match records. It is not intended for "search" queries that are looking for the closest matches to a search term.
When matching records it is often the case that a record is deemed to match if some core set of fields match "well enough". For example when matching person records it may be considered a match if there is a strong match on the name and SSN. The other fields may tend to support or refute the match, but they are not strictly needed to determine the match. There may be a number of these core sets of fields. In the case of person match in addition to name and SSN a match on name, DOB and address may also suffice to establish these records represent the same person. This match case score combiner represents one match case, i.e. one set of core fields. A complete match query typically consists of a number of these match case queries OR'ed together to produce the full record matching query.
A match case then consists of a set of core querylets and a set of secondary querylets. The core querylets have a core strength. This defines the maximum score of a perfect match on just the core querylets with no other supporting or refuting data from the secondary querylets. The core querylets also have a threshold value. If one or more of the core querylets score below their threshold value, the record is not considered to have matched "well enough", and thus is not considered an instance of this match case. The score returned by the MatchCase combiner is zero if the match case threshold criteria are not met.
The secondary querylets may either add to the basic core match score, or subtract from it, depending on whether their score is above or below a threshold score. This threshold score is the boundary between a value that is considered to have matched and one that has not.
Secondary querylets have two other parameters associated with them: a reward factor and a penalty factor. These determine by how much a match above or below the threshold value will increase or decrease the core score. If a secondary querylet has a reward factor of 1.0, and a perfect match, it would push the core score to 1.0. Similarly, if a secondary querylet has a penalty factor of 1.0 and is a perfect non-match, the core score would be pulled down to 0.0. Generally the reward and penalty factors are much less than 1.0; they should adjust the score up or down by a small factor, not push it to one extreme or the other. Note that rewards and penalties are accumulated across all secondary querylets. Thus if the sum of the reward or penalty factors is greater than one, the accumulated rewards or penalties could push the score outside of the allowed range of 0.0 to 1.0. If this happens the score is truncated to the allowed range.
| querylets | this is an array of all querylets, both core and secondary. |
| core_strength | this is the core match strength factor as described above. It must be >= 0.0 and <= 1.0 |
| thresholds | this provides the threshold values for all querylets. It also defines which querylets are considered core querylets and which are secondary. Threshold values that are less than 0.0 (negative) are considered core querylets. The threshold is the absolute value. Threshold values greater than or equal to 0.0 are considered secondary querylets. These values must be >= -1.0 and <= 1.0. |
| weights | For values corresponding to the core querylets (i.e. whose threshold value is less than 0.0) this is a querylet weighting factor similar to the weighting factor for the And score combiner. For values corresponding to secondary querylets this is the reward factor. These values must be >= 0.0 and <= 1.0. |
| penalty_weights | This is the penalty weight factor for the secondary querylets. Values corresponding to core querylets are ignored. If this value is null the penalty weights default to the reward weights. These values must be >= 0.0 and <= 1.0 for secondary querylets. Other values are ignored and may be any value. |
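A hedged sketch of one match case following the parameter rules above. The field names and the particular threshold and weight values are illustrative assumptions:

```csharp
// Core querylets: name and ssn (negative thresholds mark them as core;
// the absolute value is the threshold). Secondary: address and dob.
NetricsQuery[] qlets = {
    NetricsQuery.Simple("John Smith",  new String[] { "name"    }),
    NetricsQuery.Simple("123-45-6789", new String[] { "ssn"     }),
    NetricsQuery.Simple("12 Main St",  new String[] { "address" }),
    NetricsQuery.Simple("06/30/1985",  new String[] { "dob"     })
};

NetricsQuery mc = NetricsQuery.MatchCase(
    qlets,
    0.8,                                     // core_strength
    new double[] { -0.7, -0.9, 0.5, 0.5 },   // thresholds: <0 => core querylet
    new double[] {  1.0,  1.0, 0.1, 0.1 },   // core weights / secondary rewards
    new double[] {  0.0,  0.0, 0.1, 0.15 }); // secondary penalties (core entries ignored)
```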
void noConfidence ()  [inline]
Turn off the reporting of confidence measures and significance values.
This is used only with RLINK queries. By default an estimate of the prediction confidence is calculated and returned. For true predictions (scores at or above 0.5) a set of "significance" scores is calculated for each feature. Computing these values takes some time and resources. For most scenarios these values are not needed. This method can be used to turn off the calculation of these values, saving a small amount of processing time on each query.
static NetricsQuery Not (NetricsQuery nq)  [inline, static]
Create a NOT Query Expression Node.
Takes the complement of the NetricsQuery score - i.e. a 0.1 becomes a 0.9.
| nq | NetricsQuery object that is the sub expression for this NOT. |
static NetricsQuery Or (double[] weights, NetricsQuery[] nqs)  [inline, static]
Create an OR Query Expression Node.
This is used to choose between multiple NetricsQuery scores by taking the maximum score. To weight the individual scores, the user can utilize the weights parameter.
This sample shows an OR query using 2 querylets that are then Or'd together.
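A sketch of a two-querylet OR using the documented signature (field names and weights are illustrative assumptions):

```csharp
NetricsQuery byName  = NetricsQuery.Simple("Acme Corp", new String[] { "name"  });
NetricsQuery byAlias = NetricsQuery.Simple("Acme Corp", new String[] { "alias" });

// OR takes the maximum of the (weighted) sub-expression scores;
// here alias matches are slightly discounted relative to name matches.
NetricsQuery either = NetricsQuery.Or(
    new double[] { 1.0, 0.9 },
    new NetricsQuery[] { byName, byAlias });
```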
| weights | list of floats that are the weights for the sub expressions of this query |
| nqs | list of NetricsQuery objects that are the sub expressions for this OR |
static NetricsQuery Predicate (NetricsPredicate pred)  [inline, static]
Create a Predicate Query Expression Node.
The predicate is evaluated on the record; if it is true, the NetricsQuery is given a score of 1.0, and if it is false, a score of 0.0. This is a simple comparison for use when other queries are both slower and more cumbersome (like a gender comparison). It can also be used when inexact matching is not required.
The value returned by the predicate expression must be either a boolean value or a floating point value in the range 0.0 to 1.0. If a floating point value is returned it is used as the score for this query expression.
| pred | the NetricsPredicate object to evaluate |
NetricsQuery.Predicate(String)
|
inlinestatic |
Create a Predicate Query Expression Node from a string predicate.
This sample shows an AND query using both a Simple and a Predicate query, and highlights the difference between a predicate filter and a predicate query.
| expr | the string predicate expression |
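A hypothetical sketch of the sample described above. The `And` factory name and the predicate expression syntax are assumptions; NetricsQuery.Predicate(String) is documented on this page.

```csharp
// Fuzzy name match AND'd with an exact gender predicate. The
// predicate scores 1.0 when true and 0.0 when false, so it acts
// as a cheap exact comparison inside the fuzzy query.
NetricsQuery name   = NetricsQuery.Simple("Pat Jones", new string[] { "name" });
NetricsQuery gender = NetricsQuery.Predicate("gender = 'F'");  // syntax illustrative

NetricsQuery q = NetricsQuery.And(
    new double[] { 1.0, 1.0 },
    new NetricsQuery[] { name, gender });
```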
|
inlinestatic |
Create a reference to a named querylet.
Query references avoid the calculation expense of using identical querylets in multiple places within a query.
As the returned querylet is only a reference, most options (setName(String), scoreType(int), etc.) cannot be set on it.
| querylet_name | The name of another querylet. |
|
inlinestatic |
Create an RLINK Query Expression Node.
The Rlink query uses an ibi™ Patterns - Search Learn Model to intelligently combine querylet scores to produce a single match score. Its use is similar to that of the AND score combiner.
The named model must have been trained with the same set of querylets as passed to the RLINK combiner.
| modelname | the name of the Learn Model to use |
| nqs | list of NetricsQuery objects that are the sub expressions for this RLINK expression. The number of querylets must be the same as the number of features in the model. |
|
inlinestatic |
Create an RLINK Query Expression Node.
The Rlink query uses an ibi™ Patterns - Search Learn Model to intelligently combine querylet scores to produce a single match score. Its use is similar to that of the AND score combiner.
The named model must have been trained with the same set of querylets as passed to the RLINK combiner.
This call allows the use threshold flag to be set. A Patterns Learn Model file may have a cutoff threshold score stored in it. If the use threshold flag is set, this score is used as the absolute cutoff score for the query. No records with scores below this value are returned. This supersedes any cutoff specified in the search options. If the named Patterns Learn Model does not contain a threshold score, this flag is quietly ignored.
| modelname | the name of the Learn Model to use |
| nqs | list of NetricsQuery objects that are the sub expressions for this RLINK expression. The number of querylets must be the same as the number of features in the model. |
| use_threshold | if true the threshold value stored in the model is used as the absolute cut off score. |
|
inlinestatic |
Create an Rlink query with an alternate query.
This directly implements the standard use case for a FirstValid query combiner, that is, an Rlink query with an alternate standard query that is used when the Rlink query encounters a case where the model is poorly trained. The query returned is a FirstValid combiner with two sub-queries: the Rlink query and the alternate.
| model_name | the name of the Learn Model to be used by the Rlink query. |
| qlets | the querylets for the Rlink query. The number of querylets must match the number of features in the Learn Model. |
| use_model_threshold | if this is true and the model contains a threshold value, that value is used as the cutoff. |
| confidence_cutoff | the confidence threshold for the Rlink query. |
| learn_match_cutoff | the match strength cutoff value for the Rlink query. This is used only to normalize the Alternate Query scores. |
| alternate_query | the alternate query to be used when the Rlink query has low confidence. |
| alternate_match_cutoff | the match strength cutoff for the alternate query, used to normalize its scores. |
| preindexit | if true, the alternate query is passed to the prefilter; otherwise only the Rlink query is passed to the prefilter. |
|
inline |
This specifies the type of score to be used for ordering records.
This can only be applied to fuzzy-text queries (Simple, Cognate, Attribute, and Date).
Score types include the following:
Normal (pass in NetricsSearchOpts.score_types.SCORE_NORMAL). This is the standard search with all weights and penalties applied. This type of search looks for the query text inside the record text. The presence of extra information in the record not found in the query does not penalize the record score. Use this score type for a substring or keyword search.
Symmetric (pass in NetricsSearchOpts.score_types.SCORE_SYMMETRIC). A symmetric search compares the full texts of both the query and record and evaluates their similarity. If the record contains information not present in the query, the score will be lower than if that information had not been present. Use this score only when the query represents the entirety of the text expected to be found. This is typically used in record matching operations.
Reverse (pass in NetricsSearchOpts.score_types.SCORE_REVERSE). Reverse scoring functions the same as the normal search, but with the roles of the record and query reversed. Records are selected based on how well they match some piece of the query, with no penalty for unmatched sections of the query. Use this score type to categorize documents by using the document as a query against a table consisting of records of known keywords.
Minimum (pass in NetricsSearchOpts.score_types.MINIMUM). The minimum of the normal and reverse scores. This is similar to the Symmetric score, but gives a larger penalty for unmatched data. The use cases for this score type are limited.
Maximum (pass in NetricsSearchOpts.score_types.MAXIMUM). The maximum of the normal and reverse scores. This can be used where either the record or the query may be a subset of the other; as long as one of them is matched well, it is considered a match.
Scoreit (pass in NetricsSearchOpts.score_types.SCORE_IT). This is a measure of the informational difference between query and record, somewhat similar to the Symmetric score. It is very rarely used.
All six score types are computed and returned for each record and can be accessed using the getNormMatchScore, getRevMatchScore, getSymMatchScore, getMinMatchScore, getMaxMatchScore and getITMatchScore NetricsSearchResult methods. The getMatchScore method holds a copy of whichever match score was used to select and sort the records in the list.
| InvalidOperationException | If applied to an unsupported query type. |
This sample shows an AND query using a score type of Symmetric.
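The sample is not included on this page; a minimal sketch, in which the `And` factory name is an assumption, might look like:

```csharp
// AND of two querylets, scored symmetrically so that record content
// not present in the query lowers the score (typical for record
// matching rather than substring search).
NetricsQuery first = NetricsQuery.Simple("Maria", new string[] { "first_name" });
NetricsQuery last  = NetricsQuery.Simple("Garcia", new string[] { "last_name" });

NetricsQuery q = NetricsQuery.And(
    new double[] { 1.0, 1.0 },
    new NetricsQuery[] { first, last });

q.scoreType(NetricsSearchOpts.score_types.SCORE_SYMMETRIC);
```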
|
inline |
Sets the score a comparison gets when empty data is encountered.
A comparison will receive this score when empty data is encountered. In some situations it is more appropriate to set this score to 0.0 or -1.0, and the user is therefore allowed to configure this setting.
The empty score can be set globally with NetricsSearchOpts.setEmptyScore(double) or on an individual NetricsQuery querylet.
This sample shows an AND query that sets the empty score to 0.2 on one of the querylets.
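A sketch of the sample described above; the `And` factory name and the field names are illustrative, while setEmptyScore(double) is documented here.

```csharp
// Missing address data should count as a weak partial match (0.2)
// rather than the default empty score.
NetricsQuery name = NetricsQuery.Simple("Chris Lee", new string[] { "name" });
NetricsQuery addr = NetricsQuery.Simple("9 Elm St", new string[] { "addr" });
addr.setEmptyScore(0.2);

NetricsQuery q = NetricsQuery.And(
    new double[] { 1.0, 1.0 },
    new NetricsQuery[] { name, addr });
```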
|
inline |
Assign a group to this querylet.
This call associates the given group name with this query and all sub-queries of this query.
Grouping is used to identify portions of a query that are to be considered independent match cases. Each identified group is treated as a separate match case within the overall match. This only pertains to join queries, where a group is associated with a match on a particular child record.
Typically this is used where it is desired to find the best match to an input record that consists of a parent and a set of child records, where there are two or more instances of a child record from the same table. The desire is to find the best match to any one of the multiple child records. A separate querylet would be constructed for each instance of the child record, and then they would be ORed together.
In the above match case the early phase of the matching process will try to find a record that best matches all of the OR cases, leading to poor results. Assigning separate group names to the querylets generated from the separate input child records lets the early phase know these are independent match cases. If it is necessary to generate multiple querylets for a particular child record, where one of them is not going to be a sub-query of the other, then both should be assigned the same group name.
A querylet cannot be assigned to two different groups. Remember that assigning a group name to a NetricsQuery object also assigns the name to all sub-queries of that object. So if a different group name has been assigned to one of the sub-queries, an exception will be thrown when the query is executed.
| qlet_group | The group name assigned to this querylet. The name is letter case sensitive. It may be any valid string. The UTF-8 encoded value of the name is limited to 999 bytes. |
|
inline |
Sets the score a comparison gets when an error occurs.
A comparison will receive this score when an error occurs. In some situations it is more appropriate to set this score to 0.0 or -1.0, and the user is therefore allowed to configure this setting.
This sample demonstrates a custom Date query, setting the score to 0.5 if the date field contains invalid data.
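A sketch of such a query; the Date(String, String) factory is documented in the member list above, while the date format shown is an assumption.

```csharp
// Date querylet on a birth_date field; records whose date field
// contains invalid data score 0.5 instead of the default error score.
NetricsQuery dob = NetricsQuery.Date("1984-03-12", "birth_date");
dob.setInvalidScore(0.5);
```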
|
inline |
Sets the flag to match empty values.
If this is set to true, an empty value in the query will match an empty field value, resulting in a score of 1.0. If this is false (the default), matching such values will result in the empty score. The flag applies to Simple, Cognate, Date and Attribute queries. It does not apply to Predicate queries.
| value | the value of the Match Empty flag. |
|
inline |
Assign a name to this querylet.
This call associates the given name with this query. This is used to identify a particular node in a query tree. The name is associated with this particular query node; it is not associated with any sub-nodes under this query or any higher level nodes that include this query.
A name may be associated with any query type, so it can identify a node anywhere on the query tree, a leaf or any score combiner node.
Currently the primary use for assigning a name to a query is to retrieve the match score for that query. A complex query may have many levels on its query tree. The search results return only the top level score and the querylet scores the top level query received. If there is a need to retrieve the score for a node at a lower level, that node should be assigned a name using this method. The match score for the node can then be retrieved from the search results using the assigned name.
| qlet_name | The name assigned to this querylet. The name is letter case sensitive. It may be any valid string. The UTF-8 encoded value of the name is limited to 999 bytes. All querylet names assigned within a query tree must be unique. This implies that a NetricsQuery object that has been assigned a name cannot be used in multiple places in a query tree. |
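A brief sketch of the naming use case described above; the field name is illustrative, and the result-retrieval call is not shown (see NetricsSearchResult).

```csharp
// Name an inner querylet so its individual match score can later be
// retrieved from the search results by this name.
NetricsQuery city = NetricsQuery.Simple("Springfield", new string[] { "city" });
city.setName("city_score");
```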
|
inlinestatic |
NetricsQuery.Simple(string, string[], double[]) A basic query on a single field.
| qstr | the query string |
|
inlinestatic |
NetricsQuery.Simple(string, string[], double[]) A basic query on a single field, with no field weights.
| qstr | the query string |
| fldname | name of the field to query |
|
inlinestatic |
NetricsQuery.Simple(string, string[], double[]) A basic query on multiple fields, with no field weights.
| qstr | the query string |
| fldnames | names of the fields to query |
|
inlinestatic |
Create a Simple Query Expression Node.
This is the basic query and is used to compare an arbitrary query string against a field set in a table.
This sample shows how to perform a search of a ibi Patterns - Search table using a Simple query.
| qstr | the query string |
| fldnames | list of fields to query |
| fldweights | list of weights for field |
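The sample referenced above is not reproduced on this page; a minimal sketch of the full-form overload, with illustrative table and field names and the server setup and search invocation omitted, might be:

```csharp
// One query string compared against two fields, with per-field
// weights favoring the primary company-name field.
NetricsQuery q = NetricsQuery.Simple(
    "Acme Widget Co",
    new string[] { "company", "dba_name" },   // fields to compare
    new double[] { 1.0, 0.5 });               // per-field weights
```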
|
inline |
Assign a character map for this query.
Use this method with extreme caution.
This method is only applicable to Simple, Cognate and Attribute queries. If called on any other query type it throws an exception.
Normally the character map applied to a query is determined by the character mapping applied to the fields. However, an attempt to query across fields that use different character maps will throw a parameter conflict error. In general the character map used by the query should always be the same as the character map for the fields being queried. Using different character maps can result in very poor or completely erroneous match results. The conflict error is thrown to prevent inadvertently obtaining erroneous results due to mismatched character maps.
However, in some rare circumstances, if the character maps are very similar, it may be possible and desirable to perform a search across fields with different character maps. This method allows the user to override the conflict error and explicitly set the character map to be used for this query.
| charmap_name | the name of the character map to be used. |
|
inline |
Assign an ephemeral thesaurus to be used by this NetricsQuery.
An ephemeral thesaurus is one that exists only for the duration of a single query and is accessible only by that query. The thesaurus name, although required for consistency, is not used. Thus an ephemeral thesaurus may have the same name as a permanent thesaurus or another ephemeral thesaurus without causing interference.
Ephemeral thesauri are intended for cases where the possible substitutions or weighted terms for a query are generated dynamically based on the query, rather than being a fixed set of substitutions or weightings that can be encoded into a static thesaurus.
| thes_def | A Thesaurus object (any extension of NetricsBaseThesaurus). |
| theswgt | the weight to give thesaurus matches. |
|
inline |
Use the feature based confidence measure.
This is used only with RLINK queries.
When ibi™ Patterns - Search Learn Model calculates a score it can also calculate a measure of how confident it is that the score generated is reliable. The reliability of the score depends on how well trained the model is on similar pairs of records.
There are a number of different confidence measures that can be used. The measures available depend on the release version of the model. Confidence measures cannot be calculated for model versions earlier than RFV3. The feature confidence measure is available only on model versions RFV6 or higher (corresponding to ibi™ Patterns - Search release 5.4; note that it is the model version, not the ibi™ Patterns - Search version, that matters: a release 5.4 ibi™ Patterns - Search server using a model generated by a previous release does not support feature confidence measures). The feature confidence measure produces the most reliable measure of how well trained the model is for a particular pair of records. However, it is also by far the most expensive to compute, and is therefore not the default measure.
Generally the feature confidence measure is used in applications that are specifically looking for poorly trained pairs of records, such as applications implementing a continuous learning capability or model training platforms. It can also be used in troubleshooting to help determine why a model came up with a questionable prediction.
Calling this method will cause the feature based confidence measure to be used rather than the default measure.
|
inline |
Set an RLink query to use the Model cutoff Threshold.
If the Learn Model has a threshold encoded into it, that value is used as an absolute cutoff threshold for the entire query. This supersedes any cutoff encoded in the search options.
This method throws a NetricsException if the query is not an RLink Query, if the use model threshold value is already set for this query and has a value of false, or if this query is invalid.
|
inline |
Assign a thesaurus to be used by the NetricsQuery object. The thesaurus and the query fields should use the same character map.
This sample shows an AND query where one of the querylets uses a Thesaurus.
| thesname | the name of the thesaurus |
| theswgt | the weight to give thesaurus matches |
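A sketch of the sample described above; the `And` factory name and the thesaurus name are illustrative, while useThesaurus(String, double) is documented here.

```csharp
// One querylet in an AND consults a named thesaurus (e.g. nickname
// substitutions); matches made through the thesaurus are weighted 0.8.
NetricsQuery first = NetricsQuery.Simple("Bill", new string[] { "first_name" });
first.useThesaurus("nicknames", 0.8);

NetricsQuery last = NetricsQuery.Simple("Nguyen", new string[] { "last_name" });
NetricsQuery q = NetricsQuery.And(
    new double[] { 1.0, 1.0 },
    new NetricsQuery[] { first, last });
```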
|
inlinestatic |
Create a weighted field query expression node.
This replaces record weights from earlier versions of ibi™ Patterns - Search.
Essentially the user can pass in a previously constructed NetricsQuery to be weighted by the value of a specified field in each record in the result set. It is assumed that the value of that field is a floating point value.
This sample shows how to perform a search of an ibi™ Patterns - Search table using a Simple query and applying the Wgtbyfield method to influence the similarity score of each record in the result set.
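A minimal sketch of the weighted-field use case; the method name Wgtbyfield comes from the description above, but its exact signature and the field names shown are assumptions.

```csharp
// Weight a Simple query's similarity score by a numeric "priority"
// field stored in each record of the result set.
NetricsQuery basic = NetricsQuery.Simple("blue widget", new string[] { "descr" });
NetricsQuery q = NetricsQuery.Wgtbyfield(basic, "priority");
```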