Configuring the Data File and Its Fields
The data file must be in CSV format. The first row of the file must contain field names.
The data file should contain a representative sample of the records in the table on the TIBCO Patterns server where the trained model will eventually be used. That table can be much larger than the data file. The data file should have enough records to create a large variety of record pairs, and at least 100,000 records to use the Low Confidence Pair Finder efficiently. Do not assign an empty data file. With an empty file, you cannot create record pairs or train a model.
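The requirements above can be checked before a file is assigned. The following is a minimal sketch, not part of the Learn UI: the function name `check_data_file` and the `MIN_RECORDS` constant are hypothetical, and the check covers only the points stated in this section (a field-name row, at least one record, and the 100,000-record guideline).

```python
import csv
import io

# Guideline stated above for efficient use of the Low Confidence Pair Finder.
MIN_RECORDS = 100_000


def check_data_file(stream, min_records=MIN_RECORDS):
    """Check a CSV stream and return (field_names, record_count, warnings).

    Raises ValueError if the file has no field-name row or no records.
    """
    reader = csv.reader(stream)
    header = next(reader, None)
    if not header or not any(name.strip() for name in header):
        raise ValueError("data file is empty or has no field-name row")

    # csv.reader yields an empty list for a blank line; skip those.
    record_count = sum(1 for row in reader if row)
    if record_count == 0:
        raise ValueError("data file has field names but no records")

    warnings = []
    if record_count < min_records:
        warnings.append(
            f"only {record_count} records; at least {min_records} are "
            "recommended for the Low Confidence Pair Finder"
        )
    return header, record_count, warnings


# Usage with a tiny in-memory sample (hypothetical data).
sample = io.StringIO("id,name,dob\n1,Ann Lee,1980-02-01\n2,Anne Lee,1980-02-01\n")
fields, count, notes = check_data_file(sample)
print(fields, count, notes)
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to the function. The encoding is an assumption and should match the actual data file.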
| 1. | On the Project tab, click Assign. |
| 2. | Click Browse and locate the data file. The Learn UI lets you associate the selected data file with the project in one of the following ways: |
From the project directory
Select Copy data file to project directory. The file is copied to the project directory, and the project accesses the copy. This makes the project folder completely portable, so it can easily be copied to another computer.
From a different location
Do not select Copy data file to project directory. The file is linked to the project without being copied. Use this option to create several Learn projects on the same system that share one data file, without creating multiple copies of it.
Figure 9: Assign Data File
Reviewing List of Fields
After the data file is assigned, the list of fields from the file is displayed on the Data tab. Statistics for all fields are also displayed. These statistics might help determine the appropriate field type and also whether a field is useful in determining a record match.
Figure 10: Reviewing the List of Fields 
You can perform the following operations on this tab:
| 3. | Select key field |
Choose the key field for the data table in the Key column. The field selected must contain a unique value for every record.
| 4. | Change field type |
The default field type is Searchable Text. To change it, click the field type and choose the new field type from the drop-down list. Fields that contain date values, for example, Date of Birth, should generally be changed to the Date or Searchable Date type. Fields to be compared as numbers, such as size or weight, should be assigned the Integer or Floating Point field type. However, numeric ID fields, such as order numbers, phone numbers, and ZIP codes, are best left as Searchable Text fields so that they are compared as text.
The statistics calculated on the Data tab for a Searchable Date field must be identical to the statistics calculated when the Date field type is selected for the same field.
A custom filter that is applied on the Pair Selection tab to a field of type Date must be preserved when the field type is changed to Searchable Date, and vice versa.
| 5. | Ignore fields |
Selecting the Ignore checkbox for a field eliminates this field from the pair selection and learning process. Fields that are never useful for matching can be marked as ignored.
Ignored fields are not displayed on any of the other tabs, which avoids clutter when you examine records and pairs.
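As a rough illustration of how field statistics can guide these choices, the sketch below summarizes each field of a CSV file outside the Learn UI. It is an assumption-laden example: the function `summarize_fields` and the statistics it reports (`filled`, `distinct`, `key_candidate`) are hypothetical and are not the statistics that the Data tab displays. A field is flagged as a key candidate only if every record has a non-empty, unique value, which matches the key field requirement described above.

```python
import csv
import io


def summarize_fields(stream):
    """Return {field: {"filled": n, "distinct": n, "key_candidate": bool}}."""
    reader = csv.DictReader(stream)
    fields = reader.fieldnames or []
    filled = {name: 0 for name in fields}
    distinct = {name: set() for name in fields}
    total = 0

    for row in reader:
        total += 1
        for name in fields:
            # Missing trailing values come back as None; treat them as empty.
            value = (row.get(name) or "").strip()
            if value:
                filled[name] += 1
                distinct[name].add(value)

    summary = {}
    for name in fields:
        n_distinct = len(distinct[name])
        summary[name] = {
            "filled": filled[name],
            "distinct": n_distinct,
            # A key field needs a unique, non-empty value in every record.
            "key_candidate": total > 0
            and filled[name] == total
            and n_distinct == total,
        }
    return summary


# Usage with a tiny in-memory sample (hypothetical data).
sample = io.StringIO("id,name,country\n1,Ann Lee,US\n2,Anne Lee,US\n3,,US\n")
for field, stats in summarize_fields(sample).items():
    print(field, stats)
```

In this sample, only `id` qualifies as a key candidate. A field such as `country`, which holds a single distinct value, gives no help in telling records apart and might be a candidate for the Ignore checkbox, although that decision depends on the full table rather than on a small sample.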