Package Contents
The Learn API library is delivered as a JAR file in the following location:
<install-home>\learnapi\lib\TIB_tps_learn_api.jar
All Java packages in this API are hierarchically structured under com.tibco.patterns.learn. The package names mentioned below omit this prefix.
Installation
Copy the TIB_tps_learn_api.jar file to a location of your choice and add an entry to the Java class path for this JAR file.
The RLink JNI library is required to use the Learn API. This library is delivered as <install-home>\learn_api\bin\rlink_jni.dll. The directory that contains rlink_jni.dll must be added to the Windows PATH environment variable.
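A small diagnostic program can confirm both installation steps before you run a Learn application. The class below is a hypothetical sketch and is not part of the Learn API. It looks for TIB_tps_learn_api.jar in the java.class.path system property, optionally checks PATH for a directory you pass in, and calls System.loadLibrary("rlink_jni"). It assumes the native library is loaded under the name rlink_jni. On Windows, the JVM's default java.library.path normally includes the PATH directories, which is why adding the directory to PATH lets the library be found.

```java
import java.io.File;
import java.util.Arrays;
import java.util.regex.Pattern;

/**
 * Hypothetical installation check (not part of the Learn API). Verifies that
 * TIB_tps_learn_api.jar is on the class path, that a given directory is on PATH,
 * and that the RLink JNI library can be loaded.
 */
class LearnSetupCheck {

    static final String LEARN_JAR = "TIB_tps_learn_api.jar";

    /** Last path component, accepting both '\' and '/' separators. */
    static String fileName(String path) {
        int cut = Math.max(path.lastIndexOf('\\'), path.lastIndexOf('/'));
        return path.substring(cut + 1);
    }

    /** Removes surrounding whitespace and trailing '\' or '/' characters. */
    static String normalizeDir(String dir) {
        String s = dir.trim();
        while (s.endsWith("\\") || s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    /** True if some class path entry names the given JAR file (case-insensitive, as on Windows). */
    static boolean jarOnClassPath(String classPath, String separator, String jarName) {
        return Arrays.stream(classPath.split(Pattern.quote(separator)))
                .anyMatch(entry -> fileName(entry.trim()).equalsIgnoreCase(jarName));
    }

    /** True if the PATH value contains the directory (case-insensitive, trailing separators ignored). */
    static boolean dirOnPath(String pathValue, String separator, String dir) {
        String wanted = normalizeDir(dir);
        return Arrays.stream(pathValue.split(Pattern.quote(separator)))
                .anyMatch(entry -> normalizeDir(entry).equalsIgnoreCase(wanted));
    }

    /** Tries to load rlink_jni (rlink_jni.dll on Windows) from java.library.path. */
    static boolean rlinkLoadable() {
        try {
            System.loadLibrary("rlink_jni");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String classPath = System.getProperty("java.class.path", "");
        System.out.println("Learn API JAR on class path: "
                + jarOnClassPath(classPath, File.pathSeparator, LEARN_JAR));
        if (args.length > 0) { // optional argument: directory that should contain rlink_jni.dll
            String path = System.getenv("PATH");
            System.out.println(args[0] + " on PATH: "
                    + dirOnPath(path == null ? "" : path, File.pathSeparator, args[0]));
        }
        System.out.println("rlink_jni loadable: " + rlinkLoadable());
    }
}
```

Run it with the same class path as your application and pass the directory that contains rlink_jni.dll as the argument. Run it as a separate process: a native library can be bound to only one class loader, so loading it here and again from the Learn API inside the same JVM could fail.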
Compatibility
The Learn API is compiled to be compatible with Java version 1.8 or higher. Your Java virtual machine and compiler must support version 1.8 or higher in order to use this API.
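A check that must itself run on Java 1.8 cannot use Runtime.version(), which exists only in Java 9 and later. It can instead read the java.specification.version system property, which is "1.8" on Java 8 and "9", "11", "17", and so on afterward. The helper below is a hypothetical sketch and is not part of the Learn API.

```java
/** Hypothetical helper that checks the running JVM against the Java 1.8 minimum. */
class JavaVersionCheck {

    /**
     * Returns the major Java version from a java.specification.version string:
     * "1.8" -> 8 (legacy 1.x scheme), "11" -> 11, "17" -> 17.
     */
    static int majorVersion(String spec) {
        String s = spec.trim();
        if (s.startsWith("1.")) {
            s = s.substring(2); // legacy scheme: "1.8" means Java 8
        }
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        return Integer.parseInt(s.substring(0, end));
    }

    /** True if the specification version meets the Learn API minimum of Java 1.8. */
    static boolean supportsLearnApi(String spec) {
        return majorVersion(spec) >= 8;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java " + spec + " supported: " + supportsLearnApi(spec));
    }
}
```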
Learn API functionality that involves tables and their records requires a running instance of a TIBCO Patterns - Search server, and the version of that server must be the same as the version of the Learn API. Binary Learn model files saved by the Learn API can be loaded only into a TIBCO Patterns - Search server whose version is the same as or later than the version of the Learn API.
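Both version rules reduce to simple comparisons. The sketch below is hypothetical: it assumes versions are dotted numeric strings such as 6.1.0 (placeholder values, not actual release numbers) and that "the same version" means every numeric component is equal. Check the product documentation for the exact rule. Obtaining the server version is left to the caller.

```java
/** Hypothetical sketch of the Learn API / server version rules (not part of the Learn API). */
class LearnVersionRules {

    /** Compares dotted numeric versions; missing components count as 0, so "6.1" equals "6.1.0". */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Table and record functionality: the server version must equal the Learn API version. */
    static boolean tableFunctionsSupported(String apiVersion, String serverVersion) {
        return compare(apiVersion, serverVersion) == 0;
    }

    /** Saved binary model files: the server version must be the same as or later than the API version. */
    static boolean canLoadSavedModel(String apiVersion, String serverVersion) {
        return compare(serverVersion, apiVersion) >= 0;
    }
}
```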
Related Documentation
The Learn UI application that uses the Learn API is part of TIBCO Patterns - Search. See the Learn UI Guide document for related information about training and evaluating Learn models.
The TIBCO Patterns - Search Concepts Guide provides relevant information about Learn models, loading the model to the server and using it for evaluation, the search query types used in feature querylets, the language used to construct predicate expressions, and so on.
Learn API Overview
The TIBCO Patterns - Search Learn API provides functionality to train and evaluate Learn machine learning models in TIBCO Patterns - Search. See the TIBCO Patterns - Search Concepts Guide for general information about Learn models, acceptable inputs and outputs, and the types of problems that such models can be applied to.
The Learn API can be used at several levels. You can also extend the API by subclassing some of its classes to suit your specific needs. However, most users need to use only the following packages directly:
- api.project
- api.feature
- modelconfig
- training package: CO... classes
- rlink package: PredictOptions, RlinkOut, and Conf... classes
The core functionality of training and evaluating Learn models is implemented in the rlink_jni library. At the lowest level, the Learn API rlink package uses the rlink_jni library to perform the basic operations of creating, saving, loading, training, and evaluating a Learn model. This package supports multiple models in memory at the same time. It trains and evaluates models one feature vector at a time, and it is typically not used directly. The modelconfig package supports it by providing convenient means of creating Learn models.
At the next API level, the training package provides automated model training and evaluation functionality that should be used instead of the simple functions in the rlink package. The training package lets you train and validate a Learn model using datasets that contain feature vectors and their labels. A model is trained on the Training dataset, and the performance of the trained model is measured on the Validation dataset. Together, the two datasets are called an experiment. Model training typically requires a number of iterations over the Training dataset. The package provides several implementations of the ConvergenceObserver interface that monitor the training process, gather statistics for each training iteration, and stop the training when certain criteria are satisfied:
- COErrorRate stops the training when the error rate for the Validation dataset reaches a certain level.
- COErrorRateMin finds the training iteration with the minimum validation error rate, so that the model can then be retrained for exactly that number of iterations. This is the recommended approach.
- COScoreChange stops the training when the maximum change in prediction scores for any validation example becomes small enough.
You can extend the relevant classes to gather additional statistics or to use different criteria for stopping the training.
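The three stopping strategies can be illustrated with a small, self-contained sketch. The StopCriterion interface and the ErrorRateTarget, MinErrorTracker, and ScoreChangeLimit classes are hypothetical stand-ins that mirror the behavior described for COErrorRate, COErrorRateMin, and COScoreChange. They are not the Learn API's ConvergenceObserver interface or its implementations, whose signatures may differ. Iterations are numbered from 1, and results are replayed from recorded values rather than produced by a real model.

```java
/** Hypothetical stand-in for a convergence observer; NOT the Learn API's ConvergenceObserver. */
interface StopCriterion {
    /** Called after each training iteration; returns true when training should stop. */
    boolean update(int iteration, double validationErrorRate, double[] validationScores);
}

/** Similar in spirit to COErrorRate: stop once the validation error rate reaches a target. */
class ErrorRateTarget implements StopCriterion {
    private final double target;

    ErrorRateTarget(double target) { this.target = target; }

    public boolean update(int iteration, double errorRate, double[] scores) {
        return errorRate <= target;
    }
}

/** Similar in spirit to COErrorRateMin: remember the iteration with the lowest validation error. */
class MinErrorTracker implements StopCriterion {
    private int bestIteration = -1;
    private double bestError = Double.POSITIVE_INFINITY;

    public boolean update(int iteration, double errorRate, double[] scores) {
        if (errorRate < bestError) {
            bestError = errorRate;
            bestIteration = iteration;
        }
        return false; // never stops early: run a fixed budget, then retrain for bestIteration iterations
    }

    int bestIteration() { return bestIteration; }
}

/** Similar in spirit to COScoreChange: stop when no validation score moves by epsilon or more. */
class ScoreChangeLimit implements StopCriterion {
    private final double epsilon;
    private double[] previous;

    ScoreChangeLimit(double epsilon) { this.epsilon = epsilon; }

    public boolean update(int iteration, double errorRate, double[] scores) {
        boolean stop = false;
        if (previous != null) {
            double maxChange = 0.0;
            for (int i = 0; i < scores.length; i++) {
                maxChange = Math.max(maxChange, Math.abs(scores[i] - previous[i]));
            }
            stop = maxChange < epsilon;
        }
        previous = scores.clone();
        return stop;
    }
}

class StopCriteriaDemo {
    /** Replays recorded per-iteration results; returns the 1-based iteration at which c stops, or -1. */
    static int firstStop(StopCriterion c, double[] errorRates, double[][] scores) {
        for (int it = 1; it <= errorRates.length; it++) {
            if (c.update(it, errorRates[it - 1], scores[it - 1])) {
                return it;
            }
        }
        return -1;
    }
}
```

For example, with validation error rates 0.30, 0.20, 0.12, 0.15, and 0.18, ErrorRateTarget(0.15) stops at iteration 3, and MinErrorTracker runs all five iterations and then reports iteration 3 as the number of iterations to retrain for.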
With the packages described above, you can train a model on feature vectors whose individual feature values are supplied by the client application. A common application is to use a Learn model to predict the match between two records in a data table (a record pair). In this case each feature value is the output score of a TIBCO Patterns - Search querylet defined for one or more fields in that data table. The recpair package uses datasets and experiments that are composed of labeled record pairs. Each record pair stores its feature scores: the values of the feature vector that is used to train the Learn model. Feature scores are calculated as the match scores of individual querylets. A querylet (or model feature) is a generic TIBCO Patterns - Search query that contributes one feature score to the feature vector. Simple, Cognate, Date, and Predicate querylets are supported by the api.feature package (see below). A running instance of the TIBCO Patterns - Search server is required to calculate feature scores when only the record keys are given. After the feature scores are calculated, the model training process is the same as described above for the training package.
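The record-pair structure can be sketched as a labeled pair of record keys plus one feature score per querylet. ToyQuerylet, ToyRecordPair, and ToyQuerylets are hypothetical names, and the exact-match and token-overlap (Jaccard) scores are simple local stand-ins for querylet match scores. In the Learn API these scores are calculated by a running TIBCO Patterns - Search server, not by code like this.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical querylet stand-in: scores how well two records match on some field, in [0, 1]. */
interface ToyQuerylet {
    double score(Map<String, String> left, Map<String, String> right);
}

/** Hypothetical labeled record pair that stores its feature vector (one score per querylet). */
class ToyRecordPair {
    final String leftKey;
    final String rightKey;
    final boolean isMatch;        // training label supplied by the user
    final double[] featureScores; // feature vector used to train the model

    ToyRecordPair(String leftKey, String rightKey, boolean isMatch, double[] featureScores) {
        this.leftKey = leftKey;
        this.rightKey = rightKey;
        this.isMatch = isMatch;
        this.featureScores = featureScores;
    }

    /** Computes one feature score per querylet, in querylet order. */
    static double[] computeFeatures(List<ToyQuerylet> querylets,
                                    Map<String, String> left, Map<String, String> right) {
        double[] v = new double[querylets.size()];
        for (int i = 0; i < v.length; i++) {
            v[i] = querylets.get(i).score(left, right);
        }
        return v;
    }
}

/** Toy scoring functions standing in for server-side querylets. */
class ToyQuerylets {
    /** 1.0 if both field values are non-empty and equal ignoring case, else 0.0. */
    static ToyQuerylet exact(String field) {
        return (a, b) -> {
            String x = a.getOrDefault(field, "");
            String y = b.getOrDefault(field, "");
            return !x.isEmpty() && x.equalsIgnoreCase(y) ? 1.0 : 0.0;
        };
    }

    /** Jaccard overlap of the lower-cased, whitespace-separated tokens in a field. */
    static ToyQuerylet tokenOverlap(String field) {
        return (a, b) -> {
            Set<String> x = tokens(a.getOrDefault(field, ""));
            Set<String> y = tokens(b.getOrDefault(field, ""));
            if (x.isEmpty() && y.isEmpty()) {
                return 0.0;
            }
            Set<String> inter = new HashSet<>(x);
            inter.retainAll(y);
            Set<String> union = new HashSet<>(x);
            union.addAll(y);
            return (double) inter.size() / union.size();
        };
    }

    private static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

The design keeps the querylet list ordered, because position i in every feature vector must always come from the same querylet for the trained model to make sense.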
The highest level of the API supports creating, saving, and loading Learn projects and performing even more automated and convenient model training using the data saved in Learn project files. Some of this functionality is used by the Learn UI application, which is part of TIBCO Patterns - Search. The main package that provides access to this functionality is api.project. It includes classes that manage all files in the Learn project directory, including the main XML file for the Learn project, the XML file for each saved model, trained model binary files, the data table, thesauri, and so on. Properties can be assigned in and read from each XML file. This package also implements more automated model training functionality that simplifies the training steps in a client application, including launching and stopping the TIBCO Patterns - Search server, loading the data table and thesauri into the server, calculating feature scores, and generating model training suggestions.
The following packages supplement the functionality related to Learn projects.
The api.feature package manages the XML structure of all features and querylets. This package is always used when features are represented by TIBCO Patterns - Search querylets, so it is also used by the recpair package.
The api.autopair package contains functionality for automatically finding new low confidence record pairs in the data table. New pairs are then presented to the user for labeling.
The api.hint package contains classes that generate all types of model training suggestions and assist in processing them. It also contains classes that calculate statistics for a given dataset and help identify whether a new record pair is likely to be useful for training the model.
The jaxb package contains utilities for XML structures, while the jaxb.project and jaxb.model packages contain generated JAXB classes that support the XML files for the project and for each saved model. The XML schemas for these files are stored in the schema directory. Finally, the shared package contains general-purpose utilities that are used by many other Learn API packages.
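The autopair and hint packages described above both operate on record pairs that carry a model score and possibly a user label. The hypothetical sketch below shows one simple way to pick low-confidence candidates for labeling and to count pairs by subset and label. It assumes scores in [0, 1], treats an unlabeled pair as low confidence when its score lies within a margin of a decision threshold, and uses a free-form subset name. The Learn API's actual criteria and data types are not described here and may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical scored record pair: model score in [0, 1] and an optional user label. */
class ScoredPair {
    final String leftKey;
    final String rightKey;
    final double score;
    final Boolean label;  // null means the pair has not been labeled yet
    final String subset;  // hypothetical grouping name used for statistics

    ScoredPair(String leftKey, String rightKey, double score, Boolean label, String subset) {
        this.leftKey = leftKey;
        this.rightKey = rightKey;
        this.score = score;
        this.label = label;
        this.subset = subset;
    }
}

class PairTriage {
    /** Unlabeled pairs whose score is within margin of the threshold (an assumed notion of low confidence). */
    static List<ScoredPair> lowConfidence(List<ScoredPair> pairs, double threshold, double margin) {
        List<ScoredPair> out = new ArrayList<>();
        for (ScoredPair p : pairs) {
            if (p.label == null && Math.abs(p.score - threshold) <= margin) {
                out.add(p);
            }
        }
        return out;
    }

    /** Counts pairs per subset and per label ("match", "non-match", "unlabeled"). */
    static Map<String, Map<String, Integer>> countBySubsetAndLabel(List<ScoredPair> pairs) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (ScoredPair p : pairs) {
            String label = p.label == null ? "unlabeled" : (p.label ? "match" : "non-match");
            counts.computeIfAbsent(p.subset, k -> new TreeMap<>()).merge(label, 1, Integer::sum);
        }
        return counts;
    }
}
```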
| Package | Description |
|---|---|
| com.tibco.patterns.learn.api.autopair | Contains functionality for finding and generating new record pairs automatically. |
| com.tibco.patterns.learn.api.feature | Manages feature query, features and querylets that are stored in Patterns Learn model.xml file. |
| com.tibco.patterns.learn.api.hint | Generation of all types of model training suggestions, filtering of data table for a specific subset in a suggestion, analysis of datasets by subset and label. |
| com.tibco.patterns.learn.api.project | The main package to support Patterns Learn projects and automated model training. |
| com.tibco.patterns.learn.modelconfig | Configuration and options for creating and using Learn models. |
| com.tibco.patterns.learn.recpair | Management of datasets of record pairs, calculation of feature scores for record pairs. |
| com.tibco.patterns.learn.rlink | Creation, loading and saving of RLink models; model training and evaluation using individual feature vectors. |
| com.tibco.patterns.learn.training | Functionality to train, validate, monitor statistics and stop the training of RLink models using datasets of feature vectors. |