Terminology

Dataset

A collection of raw data from one or more data sources. A dataset can contain more than one project.

Project

A project contains either an entire dataset or a portion of a dataset. Various validation and transformation rules can be applied to a project.

See Dataset.

rows/records

A mode that determines how your source data can be organized:

In rows mode, each data row is treated as an independent piece of data.
In records mode, each record is treated as an independent piece of data, which means that a single record can span more than one row.
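The difference between the two modes can be illustrated with a small sketch. It assumes a grouping rule that this glossary does not state: a row whose first (key) cell is non-blank starts a new record, and the following rows with a blank key cell belong to that record. The function name `group_into_records` is hypothetical and is not part of TIBCO Clarity.

```python
from typing import List, Optional

Row = List[Optional[str]]

def group_into_records(rows: List[Row], key_index: int = 0) -> List[List[Row]]:
    """Group rows into records (assumed rule, not confirmed for TIBCO Clarity):
    a row with a non-blank key cell starts a new record, and each following
    row with a blank key cell is attached to the current record."""
    records: List[List[Row]] = []
    for row in rows:
        key = row[key_index]
        starts_new_record = key is not None and key.strip() != ""
        if starts_new_record or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return records

rows: List[Row] = [
    ["Alice", "alice@example.com"],
    [None, "alice@work.example.com"],
    ["Bob", "bob@example.com"],
]

print(len(rows))                      # rows mode: 3 independent pieces of data
print(len(group_into_records(rows)))  # records mode: 2 records
```

In records mode, Alice's two rows travel together as one unit, so an operation that keeps or discards her record affects both rows.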

Undo/Redo

TIBCO Clarity saves all the operations performed on a project. Use the Undo function to revert the data to a previous state, and use the Redo function to reapply steps that were undone.

See Project.
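One way to model an operation history is to keep a snapshot of the data after every operation plus a pointer to the current snapshot. The sketch below is a hypothetical illustration of Undo/Redo semantics, not how TIBCO Clarity stores its history; the `History` class and its methods are invented for this example.

```python
from typing import Callable, List

Operation = Callable[[List[str]], List[str]]

class History:
    """Hypothetical snapshot-based undo/redo history for a project's data."""

    def __init__(self, data: List[str]) -> None:
        self._states: List[List[str]] = [list(data)]  # state 0 is the original data
        self._pos = 0                                 # index of the current state

    @property
    def data(self) -> List[str]:
        return self._states[self._pos]

    def apply(self, op: Operation) -> None:
        # Performing a new operation discards any states that were undone.
        del self._states[self._pos + 1:]
        self._states.append(op(list(self.data)))
        self._pos += 1

    def undo(self) -> None:
        if self._pos > 0:
            self._pos -= 1

    def redo(self) -> None:
        if self._pos < len(self._states) - 1:
            self._pos += 1

h = History([" a", "b "])
h.apply(lambda d: [s.strip() for s in d])  # operation 1: trim white space
h.apply(lambda d: [s.upper() for s in d])  # operation 2: convert to upper case
h.undo()
print(h.data)  # ['a', 'b']
h.redo()
print(h.data)  # ['A', 'B']
```

Storing full snapshots keeps the logic simple; a real tool may instead store the operations themselves and recompute states, which uses less memory on large projects.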

Predefined data type

A data type that TIBCO Clarity defines based on basic data types, such as String and Integer.

Custom data type

A data type that you define based on a predefined data type, with extra constraints added.
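The relationship between predefined and custom data types can be sketched as "parse with the base type, then check the extra constraint". The `PREDEFINED` table, the `CustomType` class, and the example types `Age` and `USZip` are hypothetical and only illustrate the idea; they are not TIBCO Clarity's type system.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical stand-ins for predefined types: each maps a type name to a
# parser that raises ValueError when the raw text does not fit the type.
PREDEFINED: Dict[str, Callable[[str], Any]] = {"String": str, "Integer": int}

@dataclass
class CustomType:
    name: str
    base: str                          # name of the predefined type it builds on
    constraint: Callable[[Any], bool]  # extra rule applied to the parsed value

    def validate(self, raw: str) -> bool:
        try:
            value = PREDEFINED[self.base](raw)
        except ValueError:
            return False  # the value fails the predefined type itself
        return bool(self.constraint(value))

age = CustomType("Age", "Integer", lambda v: 0 <= v <= 150)
us_zip = CustomType("USZip", "String", lambda v: re.fullmatch(r"\d{5}", v) is not None)

print(age.validate("42"), age.validate("200"), age.validate("abc"))  # True False False
print(us_zip.validate("94105"), us_zip.validate("9410"))              # True False
```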

Facet

A facet is a single defining aspect that helps determine the set of values for a simple type. By applying facets to a particular column, you can filter the data down to a subset of rows and examine it in greater detail.
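The filtering behavior described above can be sketched as a simple text facet: count the distinct values in a column, then keep only the rows that carry a chosen value. The functions `text_facet` and `select` are hypothetical names for this illustration.

```python
from collections import Counter
from typing import Dict, List

Row = Dict[str, str]

def text_facet(rows: List[Row], column: str) -> Counter:
    """Count how many rows carry each distinct value in `column`."""
    return Counter(row.get(column) for row in rows)

def select(rows: List[Row], column: str, value: str) -> List[Row]:
    """Filter down to the subset of rows whose `column` equals `value`."""
    return [row for row in rows if row.get(column) == value]

rows = [{"city": "Paris"}, {"city": "London"}, {"city": "Paris"}]
print(text_facet(rows, "city"))           # Counter({'Paris': 2, 'London': 1})
print(len(select(rows, "city", "Paris")))  # 2
```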

Cluster

A process that finds values that refer to the same item but are spelled slightly differently.
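A common clustering technique is key collision: normalize each value into a "fingerprint" key and group the values whose keys collide. The sketch below uses one such fingerprint (strip accents, lowercase, remove punctuation, sort the unique words). Whether TIBCO Clarity uses this exact method is an assumption, and the function names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Dict, List

def fingerprint(value: str) -> str:
    """Normalize a value so that trivial spelling variants get the same key."""
    normalized = unicodedata.normalize("NFKD", value)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")  # drop accents
    cleaned = ascii_only.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))  # unique words in sorted order

def cluster(values: List[str]) -> List[List[str]]:
    """Group values by fingerprint and return only groups with 2+ members."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

print(cluster(["New York", "new york.", "York, New", "Boston"]))
# [['New York', 'new york.', 'York, New']]
```

Fingerprinting catches differences in case, punctuation, accents, and word order, but not typos such as "Nwe York"; those need a distance-based method such as edit distance.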

Switchable groups

A group that merges several data columns to detect duplicates.

Look-up table

A look-up table is defined to help transform source data into a desired format.
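At its simplest, a look-up table is a mapping from source values to the values you want. The sketch below assumes a hypothetical country-code table and a hypothetical `apply_lookup` helper; values with no entry in the table are left unchanged.

```python
from typing import Dict, List

# Hypothetical look-up table: source codes -> desired display format.
COUNTRY_LOOKUP: Dict[str, str] = {
    "US": "United States",
    "USA": "United States",
    "UK": "United Kingdom",
}

def apply_lookup(values: List[str], table: Dict[str, str]) -> List[str]:
    # Normalize the key (trim, upper-case) before looking it up;
    # values without a match are returned as-is.
    return [table.get(v.strip().upper(), v) for v in values]

print(apply_lookup(["usa", "UK", "France"], COUNTRY_LOOKUP))
# ['United States', 'United Kingdom', 'France']
```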

Data profile

A process that assesses the current state of data and reports the errors that the data contains.
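A data profile is typically a set of per-column statistics. The sketch below profiles one column that is expected to hold integers; the chosen statistics and the name `profile_column` are illustrative assumptions, not the exact metrics TIBCO Clarity reports.

```python
from typing import Dict, List, Optional

def _is_integer(text: str) -> bool:
    try:
        int(text)
        return True
    except ValueError:
        return False

def profile_column(values: List[Optional[str]]) -> Dict[str, int]:
    """Summarize one column that is expected to hold integers."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null": len(values) - len(present),
        "empty": sum(1 for v in present if v.strip() == ""),
        "distinct": len(set(present)),
        # Non-empty values that do not parse as integers count as errors.
        "invalid": sum(1 for v in present if v.strip() != "" and not _is_integer(v)),
    }

print(profile_column(["1", "2", None, "", "x", "2"]))
# {'count': 6, 'null': 1, 'empty': 1, 'distinct': 4, 'invalid': 1}
```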

Data transformation

A process in which source data is changed from its given format into the format expected by the target application.

Dedup

Short for deduplication. A process to find duplicate or similar records in data.
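Finding similar (not just identical) records needs a similarity measure. The sketch below compares every pair of records with the standard library's `difflib.SequenceMatcher` and reports pairs above a threshold. The 0.85 threshold, the representation of each record as one combined string, and the name `find_similar` are assumptions for illustration; TIBCO Clarity's matching algorithm may differ.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple

def find_similar(records: List[str], threshold: float = 0.85) -> List[Tuple[int, int, float]]:
    """Compare every pair of records and report pairs whose similarity ratio
    (0.0 to 1.0) reaches `threshold`. This is O(n^2), so it only suits small data."""
    matches = []
    for i, j in combinations(range(len(records)), 2):
        ratio = SequenceMatcher(None, records[i].lower(), records[j].lower()).ratio()
        if ratio >= threshold:
            matches.append((i, j, ratio))
    return matches

records = [
    "John Smith, 12 Main St",
    "Jon Smith, 12 Main St.",
    "Mary Jones, 4 Elm Rd",
]
for i, j, ratio in find_similar(records):
    print(f"possible duplicates: {i} and {j} ({ratio:.2f})")
```

For large datasets, tools usually avoid comparing every pair by first grouping records into blocks (for example, by postal code) and comparing only within each block.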

Dependency check

A process to explore dependencies among data columns. You can group some data columns as a Key and others as a Value, and then check whether the Key columns uniquely determine the Value columns.
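The check described above is a functional-dependency test: the Key columns determine the Value columns if no Key combination appears with more than one Value combination. The sketch below returns the violating Key combinations; an empty result means the dependency holds. The function name and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]

def check_dependency(rows: List[Row], key: Sequence[str],
                     value: Sequence[str]) -> Dict[Tuple[str, ...], Set[Tuple[str, ...]]]:
    """Return each Key combination that maps to more than one Value combination."""
    seen: Dict[Tuple[str, ...], Set[Tuple[str, ...]]] = {}
    for row in rows:
        k = tuple(row[c] for c in key)
        v = tuple(row[c] for c in value)
        seen.setdefault(k, set()).add(v)
    return {k: vs for k, vs in seen.items() if len(vs) > 1}

rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]
print(check_dependency(rows, key=["zip"], value=["city"]))
# zip 10001 maps to two different cities, so zip does not determine city here
```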

Batch processing

Batch processing applies the data management operations performed on one project to the whole dataset.

See Project and Dataset.
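Conceptually, batch processing replays a recorded list of operations, in order, over every row of the dataset instead of only the project's rows. The sketch below assumes row-level operations only; the operation functions and `run_batch` are hypothetical, and operations that need the whole column (such as clustering) would require a different design.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Operation = Callable[[Row], Row]

# Hypothetical operations a user might have worked out on a project.
def trim_name(row: Row) -> Row:
    return {**row, "name": row["name"].strip()}

def upper_country(row: Row) -> Row:
    return {**row, "country": row["country"].upper()}

def run_batch(dataset: List[Row], operations: List[Operation]) -> List[Row]:
    """Replay the recorded operations, in order, on every row of the dataset."""
    output = []
    for row in dataset:
        for op in operations:
            row = op(row)  # each operation returns a new row; the input is untouched
        output.append(row)
    return output

dataset = [
    {"name": " Ann ", "country": "fr"},
    {"name": "Bo", "country": "de"},
    {"name": "Cy  ", "country": "it"},
]
project = dataset[:2]                 # a project holding a portion of the dataset
recorded = [trim_name, upper_country]
print(run_batch(project, recorded))   # check the result on the project first
print(run_batch(dataset, recorded))   # then apply the same steps to the whole dataset
```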

null/empty/blank

null

A field without any value.

empty

A field with no white space, or with one or more white spaces. For example, "", " ", or "   ".

blank

Either a field without any value (null), or an empty field with no white space ("").
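The three definitions overlap, which is easiest to see as three predicates. In this sketch, Python's `None` stands for null, and the function names are hypothetical. Note that `str.strip()` also treats tabs and newlines as white space; whether TIBCO Clarity does the same is an assumption.

```python
from typing import Optional

def is_null(value: Optional[str]) -> bool:
    # null: the field has no value at all.
    return value is None

def is_empty(value: Optional[str]) -> bool:
    # empty: a value with no characters, or only white space ("", " ", "   ").
    # str.strip() also removes tabs and newlines (assumed to count as white space).
    return value is not None and value.strip() == ""

def is_blank(value: Optional[str]) -> bool:
    # blank: null, or an empty value with no white space ("").
    return value is None or value == ""

for v in [None, "", " ", "   ", "a"]:
    print(repr(v), is_null(v), is_empty(v), is_blank(v))
```

Under these definitions, "" is both empty and blank, " " is empty but not blank, and null is blank but not empty.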