Terminology
Before using TIBCO Clarity, review the following terminology.
Dataset
A collection of raw data from one or more data sources. A dataset can contain more than one project.
See Project.
Project
A project contains either an entire dataset or a portion of a dataset. Various validation and transformation rules can be applied to a project.
See Dataset.
Rows/Records
A mode that determines how your source data can be organized:
- In rows mode, each single data row is treated as an independent piece of data.
- In records mode, each object is treated as an independent piece of data, which means a single object can contain more than one row.
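The difference between the two modes can be sketched in Python. This is only an illustration of the concept, not TIBCO Clarity's implementation: the sample data, the `group_into_records` helper, and the convention that a blank first cell continues the previous record are all assumptions made for the example.

```python
# Hypothetical sample data. Assumption: a blank first cell means
# "this row belongs to the record started by an earlier row".
rows = [
    ["Alice", "alice@example.com"],
    ["", "alice@work.example.com"],
    ["Bob", "bob@example.com"],
]

def group_into_records(rows):
    """Group rows into records; a row with an empty first cell joins the previous record."""
    records = []
    for row in rows:
        if row[0] or not records:
            records.append([row])      # a non-blank first cell starts a new record
        else:
            records[-1].append(row)    # a blank first cell extends the current record
    return records

records = group_into_records(rows)
print(len(rows))     # number of independent units in rows mode
print(len(records))  # number of independent units in records mode
```

Here the same three rows form two records, because the second row is treated as part of the first object.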
Undo/Redo
TIBCO Clarity saves all the operations performed on a project. Use the Undo/Redo function to revert data to a previous state, or to reapply steps that were already performed.
See Project.
Predefined data type
A data type defined by TIBCO Clarity based on basic data types, such as String and Integer.
Custom data type
A user-defined data type that is based on a predefined data type and adds extra constraints.
Facet
A facet is a single defining aspect that helps determine the set of values for a simple type. By applying facets on a particular column, you can filter down to a subset of rows and understand data in greater detail.
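As a rough illustration of faceting on a column, the sketch below counts the distinct values in a column and then filters the rows down to one of them. The sample rows and the helper names `facet_counts` and `filter_by_facet` are hypothetical and are not part of TIBCO Clarity.

```python
from collections import Counter

# Hypothetical sample rows.
rows = [
    {"name": "Ann", "country": "US"},
    {"name": "Ben", "country": "DE"},
    {"name": "Cho", "country": "US"},
]

def facet_counts(rows, column):
    """Count how often each distinct value occurs in the given column."""
    return Counter(row[column] for row in rows)

def filter_by_facet(rows, column, value):
    """Keep only the rows whose column matches the selected facet value."""
    return [row for row in rows if row[column] == value]

counts = facet_counts(rows, "country")
us_rows = filter_by_facet(rows, "country", "US")
print(counts)
print(us_rows)
```

The counts give an overview of the column, and selecting one value narrows the data to the matching subset of rows.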
Cluster
A process that finds items that represent the same value but are spelled slightly differently.
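One simple way to picture clustering is to reduce each value to a normalized key and group values that share a key. The sketch below uses a fingerprint-style key (lowercase, punctuation removed, words sorted); this technique is chosen only for illustration and is not necessarily the method TIBCO Clarity uses. It catches differences in case, punctuation, and word order, not genuine misspellings. The function names and sample values are hypothetical.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Build a normalized key: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = value.strip().lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))

def cluster(values):
    """Group values by fingerprint and return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical sample values.
values = ["Acme Inc.", "acme inc", "Inc. Acme", "Globex"]
clusters = cluster(values)
print(clusters)
```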
Switchable groups
A group that merges several data columns to detect duplicates.
Look-up table
A table defined to help transform source data into a desired format.
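Conceptually, a look-up table maps source values to the values you want instead. The sketch below models it as a Python dictionary; the table contents and the `apply_lookup` helper are hypothetical, and leaving unmatched values unchanged is an assumption made for the example.

```python
# Hypothetical look-up table: source value -> desired value.
lookup = {"NY": "New York", "CA": "California"}

def apply_lookup(values, table):
    """Replace each value found in the table; leave unmatched values unchanged."""
    return [table.get(value, value) for value in values]

source = ["NY", "CA", "Texas"]
transformed = apply_lookup(source, lookup)
print(transformed)
```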
Data profile
A process to get an assessment of the current state of data and information about errors that the data contains.
Data transformation
A process where source data is changed from its given format into the format expected by an appropriate application.
Dedup
A process to find duplicated or similar records in data. It is short for deduplication.
Dependency check
A process to explore dependencies among data columns. You can group some data columns as a Key and others as a Value, and then check whether the Key columns uniquely determine the Value columns.
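The idea behind a dependency check can be sketched as follows: for every combination of Key values, record the first Value combination seen, and report any Key that later appears with a different Value. This is a minimal illustration under assumed names (`check_dependency`, the sample columns and rows), not TIBCO Clarity's implementation.

```python
def check_dependency(rows, key_cols, value_cols):
    """Return the Key tuples that map to more than one distinct Value tuple."""
    seen = {}
    violations = []
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        value = tuple(row[col] for col in value_cols)
        if key in seen and seen[key] != value and key not in violations:
            violations.append(key)     # same Key, different Value
        seen.setdefault(key, value)    # remember the first Value seen for this Key
    return violations

# Hypothetical sample rows: does "zip" uniquely determine "city"?
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]

violations = check_dependency(rows, key_cols=["zip"], value_cols=["city"])
print(violations)
```

An empty result means the Key columns uniquely determine the Value columns in the data examined; any entry in the result is a Key for which the dependency does not hold.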
Batch processing