Characteristics of ibi Patterns - Search Machine Learning Models

A machine learning model can be created using the ibi Patterns - Search Learn UI. See the ibi Patterns - Search Learn UI User's Guide for details. You can also write your own application for creating and training a model using the ibi Patterns - Search Learn API (a Java library), or have TIBCO solutions engineers create and train a model for you.

Every machine learning model uses a fixed set of features representing the information that might be relevant when making a particular type of decision. (In the Sample Problem: Record Equivalence, this was a set of field-to-field comparison scores produced by ibi Patterns - Search queries.) To model the human decision, the chosen set of features should include all the information used to make a "Yes" or "No" decision, though it might also include irrelevant or marginally relevant features.
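As an illustration of what a fixed feature set looks like, the following minimal sketch builds one comparison score per field for a pair of records. The field names are hypothetical, and the scoring function uses Python's standard difflib as a stand-in; it is not the scoring performed by ibi Patterns - Search queries.

```python
from difflib import SequenceMatcher

# Hypothetical field list; a real feature set is chosen by the model designer.
FIELDS = ("first_name", "last_name", "address")

def field_score(a, b):
    """Stand-in similarity in [0.0, 1.0]; NOT the ibi Patterns - Search score."""
    if not a or not b:
        # Assumption for this sketch: a missing value scores 0.0.
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def feature_vector(rec_a, rec_b):
    """One comparison score per field, always in the same order."""
    return [field_score(rec_a.get(f, ""), rec_b.get(f, "")) for f in FIELDS]
```

The point of the sketch is that every pair of records yields a vector of the same length and the same field order, which is what allows a model to be trained and later evaluated on it.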

Defining the features is a crucial factor in developing a successful application using the ibi Patterns - Search Machine Learning Platform. It is important to define the feature set so that all the relevant information is available to the Learn Model. The consistency of the feature vectors and their labels that are used for training is also very important. While the machine learning algorithm is tolerant of accidental mislabeling of a few feature vectors used in training, the human "Yes" or "No" decisions should be based only on the information that is included in the features. Before training your own model, read the documentation provided with the ibi Patterns - Search Learn UI or the Learn API carefully. You can also consult the TIBCO experts while defining the features.
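One simple way to look for inconsistent training data is to search for identical feature vectors that were given both labels. The sketch below assumes training examples are available as (feature vector, label) pairs, with True for "Yes" and False for "No"; this representation is an assumption made for illustration and is not the Learn API's data format.

```python
from collections import defaultdict

def conflicting_labels(examples):
    """Return feature vectors labeled both True ("Yes") and False ("No").

    `examples` is an iterable of (features, label) pairs; this layout is
    hypothetical and chosen only for the sketch.
    """
    labels_by_vector = defaultdict(set)
    for features, label in examples:
        labels_by_vector[tuple(features)].add(label)
    return [vec for vec, labels in labels_by_vector.items() if len(labels) > 1]
```

A few conflicts found this way may be harmless mislabeling. Many conflicts can suggest that the labeling decisions relied on information that is not captured in the features.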

For a Learn Model to perform well, it must be trained with a sufficient number of examples covering all the different types of situations that might come up when it is in use. For example, in the Sample Problem: Record Equivalence, if an incoming record has no address information, the Learn Model can make a good prediction of whether this record matches another only if it was trained with examples where the address information is missing. A Learn Model is said to be well trained if it was trained with a sufficient number of examples for all of the different matching situations that are likely to come up in real data. In practice, manually finding enough training examples to produce a well-trained Learn Model can be difficult. The Learn UI makes this task much easier. In particular, the low confidence pair finder feature of the Learn UI automatically finds and presents for labeling training examples that, if consistently labeled, ensure the Learn Model is well trained for the data loaded in the Learn UI.
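The idea of covering every matching situation can be illustrated with a rough sketch that groups training pairs by which fields are missing and counts the examples in each group. The field names and the min_examples threshold are hypothetical, and this is not how the low confidence pair finder works; it only shows the kind of gap (for example, no training pairs with a missing address) that leaves a model poorly trained.

```python
from collections import Counter

# Hypothetical field list used only for this sketch.
FIELDS = ("first_name", "last_name", "address")

def situation(rec_a, rec_b):
    """Fields that are empty or absent in at least one record of the pair."""
    return tuple(f for f in FIELDS if not rec_a.get(f) or not rec_b.get(f))

def coverage(training_pairs):
    """Count training pairs per missing-field situation."""
    return Counter(situation(a, b) for a, b in training_pairs)

def is_covered(counts, rec_a, rec_b, min_examples=5):
    """True if enough training pairs share this pair's situation."""
    return counts[situation(rec_a, rec_b)] >= min_examples
```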

A Learn Model that is well trained for the records loaded into the Learn UI data table might not be well trained for a full production set of data. The production data might contain matching situations that do not exist in the Learn UI data table. In these situations, the model might make incorrect predictions. To help detect and handle such situations, the Learn Model can output a confidence value with each prediction it makes. This is an approximate measure of how well trained the Learn Model is for this particular matching situation. The confidence value is a number between 0.0 and 1.0.

A confidence value of 0.0 indicates the Learn Model had no training at all for this matching situation, or that the training was completely contradictory, and thus this prediction is of very low confidence.
A confidence value of 1.0 indicates the Learn Model was thoroughly trained for this situation with consistent training examples and thus this prediction is of very high confidence.
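An application might use the confidence value to decide whether to accept a prediction or send it for manual review. The following is a minimal sketch of that idea; the function name, the return values, and the 0.8 threshold are assumptions for illustration, and a suitable threshold depends on the application and on the confidence measure in use.

```python
def route(prediction, confidence, min_confidence=0.8):
    """Accept a prediction only when its confidence is high enough.

    `prediction` is the model's Yes/No answer as a bool, and `confidence`
    is the value in [0.0, 1.0] reported alongside it. The threshold is a
    hypothetical default.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence < min_confidence:
        return "review"  # the model was poorly trained for this situation
    return "match" if prediction else "no_match"
```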

Depending on the release version of ibi Patterns - Search that produced your Learn Model, different confidence measures might be available. Different confidence measures have different reliability. The most reliable measure, as of the 5.5 release of ibi Patterns - Search, is the feature-based confidence measure. However, this measure is also expensive to compute, and might not be suitable for applications with high query loads.

A Learn Model is loaded into the ibi Patterns - Search server. Like an in-memory table, a Learn Model is a named in-memory object. Any number of Learn Models can exist at the same time in the same instance of an ibi Patterns - Search server, provided enough free memory is available.

Every model is built to process a specific set of features. Therefore, the feature vectors evaluated by a particular trained model must represent the same features as the ones used to train the model. When matching records, this means that the record structure and the query used when training the model must be identical to the record structure and the query used for evaluating feature vectors with the trained model.
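A client application can guard against this kind of mismatch by recording the feature names used at training time and comparing them before evaluation. The sketch below assumes features are identified by an ordered list of names; that representation and the function name are hypothetical and are not part of the ibi Patterns - Search API.

```python
def check_features(trained_features, incoming_features):
    """Raise ValueError unless both feature lists match in name and order."""
    trained = list(trained_features)
    incoming = list(incoming_features)
    if trained != incoming:
        raise ValueError(
            "feature mismatch: model was trained on %r but received %r"
            % (trained, incoming)
        )
```

The comparison is order-sensitive on purpose: the same features presented in a different order would still be a different feature vector from the model's point of view.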