About Data Lineage
When you work with resources in Studio, there are often many relationships and dependencies between them. A composite view, for example, depends on resources such as data sources and procedures, which might in turn depend on other data sources. Other views might reference the view and thus be dependent on it. Security policies, triggers, and other resources might also be dependencies. Understanding these interdependencies is useful in analyzing where data comes from, how it is derived, and how it is used.
To help you understand resource dependencies, Studio provides a data lineage graph that supports TDV modeling and data visualization capabilities. The TDV data lineage feature helps you:
Trace where data originates—From data sources through published resources, you can trace which specific data sources contribute to each column.
Understand how data is used and/or transformed—You can determine how the data is used or transformed by TDV resources and which data contributes to the published resources.
Discover the impact of change and analyze dependencies—If you delete or rename a resource, the data lineage graph for related resources graphically indicates which dependent resources are impacted.
You can obtain the data lineage for all resource types, but lineage for these resources can be especially useful:
Data sources
Tables
Composite views
Models
Procedures
Definition sets
Published services
System resources
See Lineage Information for Different Resource Types for what is displayed for each resource type along with examples.
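The difference between dependencies (what a resource uses) and references (what uses the resource) can be pictured as a directed graph. The following is a minimal Python sketch, not TDV's implementation or API. The resource paths and the `DEPENDS_ON` mapping are hypothetical and only mirror the composite-view example at the start of this topic.

```python
from collections import defaultdict

# Hypothetical resources: each key depends on the resources in its list.
DEPENDS_ON = {
    "/shared/views/orders_view": [
        "/shared/sources/orders_db",
        "/shared/procs/lookup_region",
    ],
    "/shared/procs/lookup_region": ["/shared/sources/region_db"],
    "/shared/views/orders_summary": ["/shared/views/orders_view"],
}


def build_references(depends_on):
    """Invert the dependency edges: resource -> resources that reference it."""
    referenced_by = defaultdict(list)
    for resource, dependencies in depends_on.items():
        for dependency in dependencies:
            referenced_by[dependency].append(resource)
    return dict(referenced_by)


REFERENCED_BY = build_references(DEPENDS_ON)

view = "/shared/views/orders_view"
print("Dependencies:", DEPENDS_ON.get(view, []))
print("References:", REFERENCED_BY.get(view, []))
```

A single set of "depends on" edges is enough to answer both questions, because the references are simply the same edges read in the opposite direction.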
Determine Where Data Comes From
The data lineage graph helps you discover:
What data sources contribute to each column? The data lineage graph helps you understand how many systems a view accesses and, if more data is requested, where TDV gets it.
What are the platforms for each data source? This can help with optimizations like push down, data ship, and caching.
How many records are pulled from each data source? You can combine query plan information with data source lineage to help with data ship optimizations.
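As an illustration of the first two questions, the sketch below walks column-level lineage upstream until it reaches columns with no further lineage, then reports which data sources and platforms feed each column. It is a simplified model, not how TDV stores lineage. All resource names, column names, and platforms are hypothetical, and the lineage is assumed to be acyclic.

```python
# Hypothetical column-level lineage:
# (resource, column) -> upstream (resource, column) pairs.
COLUMN_LINEAGE = {
    ("customer_orders", "customer_name"): [("crm_customers", "name")],
    ("customer_orders", "order_total"): [
        ("erp_orders", "amount"),
        ("erp_orders", "tax"),
    ],
    ("published_orders", "customer_name"): [("customer_orders", "customer_name")],
    ("published_orders", "order_total"): [("customer_orders", "order_total")],
}

# Hypothetical mapping of physical tables to (data source, platform).
TABLE_SOURCE = {
    "crm_customers": ("crm_ds", "PostgreSQL"),
    "erp_orders": ("erp_ds", "Oracle"),
}


def source_columns(resource, column):
    """Walk upstream until reaching columns that have no further lineage."""
    upstream = COLUMN_LINEAGE.get((resource, column))
    if not upstream:
        return {(resource, column)}
    found = set()
    for up_resource, up_column in upstream:
        found |= source_columns(up_resource, up_column)
    return found


def contributing_sources(resource, columns):
    """Map each column of a resource to the data sources that feed it."""
    result = {}
    for column in columns:
        tables = {table for table, _ in source_columns(resource, column)}
        result[column] = sorted(TABLE_SOURCE[table] for table in tables)
    return result


report = contributing_sources("published_orders", ["customer_name", "order_total"])
for column, sources in report.items():
    print(column, "<-", sources)

systems = {source for sources in report.values() for source in sources}
print("Distinct systems accessed:", len(systems))
```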
Determine How Data Is Used
For any defined data source, you can open a data lineage graph that shows all TDV resources that use that data. You can then expand the resources in the lineage graph to see the columns that are participating in the lineage.
Determine How Data Is Transformed
When data is modified by procedures or transformed, the data lineage graph can help you trace the resources involved and how the changes are propagated. From the data lineage graph, you can easily open the editor for any of the resources to discover exactly what the changes are. For example, you can:
See that a projection is concatenated in a prior view, then CAST to a date, then used in a GROUP BY clause.
See that a column is used in a JOIN or is used in a filter.
See that a column is used in an aggregate or analytical function.
Analyze Changes and Dependencies
The data lineage graph helps you determine:
What will be affected by a proposed change (rename, deletion, addition) to a column or resource? Upstream references are important in this use case, including which calls or references to the column might also need to change (for example, if the column is renamed). Where the column is published is also important, because a change can break external systems that consume it.
How published resources or columns are provided to a consuming application. You can debug issues by analyzing the downstream dependencies back to the originating system.
Which resources you need privileges for because of resource dependencies. See “About Managing Dependency Privileges” in the TDV Administration Guide, which states that a user or group must have privileges to access all dependencies. The lineage graph can help you investigate the resources for which you need privileges.
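The change-impact and privilege questions above both reduce to walking the dependency graph, in opposite directions. The following Python sketch is an assumption-laden illustration rather than TDV functionality: the resource names, the `PUBLISHED` set, and the helper functions are all hypothetical.

```python
from collections import deque

# Hypothetical dependency edges: each resource -> resources it depends on.
DEPENDS_ON = {
    "orders_view": ["orders_db"],
    "orders_summary": ["orders_view"],
    "published_orders": ["orders_summary"],
    "audit_proc": ["orders_db"],
}
PUBLISHED = {"published_orders"}  # hypothetical set of published resources


def reachable(start, edges):
    """Breadth-first walk returning every resource reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in edges.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def invert(edges):
    """Turn 'depends on' edges into 'is referenced by' edges."""
    inverted = {}
    for resource, dependencies in edges.items():
        for dependency in dependencies:
            inverted.setdefault(dependency, []).append(resource)
    return inverted


def impact_of_change(resource):
    """Resources that reference the changed resource, directly or indirectly."""
    impacted = reachable(resource, invert(DEPENDS_ON))
    return impacted, impacted & PUBLISHED


def missing_privileges(resource, granted):
    """Dependencies of a resource that are not in the granted set."""
    return reachable(resource, DEPENDS_ON) - set(granted)


impacted, published_hit = impact_of_change("orders_view")
print("Impacted:", sorted(impacted))
print("Published resources at risk:", sorted(published_hit))
print(
    "Missing privileges:",
    sorted(missing_privileges("published_orders", ["orders_summary"])),
)
```

Flagging published resources separately reflects the point that a change reaching a published column can break external consumers, not only other TDV resources.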
Lineage Panel Buttons and Controls
The following table lists the unique controls in the Lineage panel and what they do.
Label | Use to...
Show Detail | Display the definition of the resource in a new panel beneath the resource lineage graph. This button does not appear on the Lineage panel when it is opened from within a resource editor.
Save to File | Open a dialog box to save the dependencies in a file. See Exporting Lineage Reports to a Comma-Separated-Values File.
Hide/Show Dependencies | A toggle button that hides or shows dependencies for a selected resource when enabled. By default, dependencies are displayed.
Hide/Show References | A toggle button that hides or shows references for a selected resource when enabled. By default, references are displayed.
Hide/Show Cached Tables/Views | A toggle button that hides or shows cached tables and views for direct lineage. By default, cached tables and views are displayed.
Show/Hide Indirect Links | A toggle button that shows or hides WHERE/GROUP BY/FROM and other indirect links when enabled. By default, indirect links are hidden.
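One way to reason about the four toggle buttons is as filters applied to the links in the graph. The sketch below models that idea in Python. It is not Studio's implementation: the `Link` and `LineageOptions` classes, their field names, and the sample links are hypothetical, and only the default values are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str             # resource the link starts from
    target: str             # resource the link points to
    role: str               # "dependency" or "reference" for the selected resource
    indirect: bool = False  # True for WHERE / GROUP BY / FROM style links
    cached: bool = False    # True when the target is a cached table or view


@dataclass
class LineageOptions:
    # Defaults follow the table: everything is shown except indirect links.
    show_dependencies: bool = True
    show_references: bool = True
    show_cached: bool = True
    show_indirect: bool = False


def visible_links(links, options):
    """Keep only the links that the current toggle settings allow."""
    kept = []
    for link in links:
        if link.role == "dependency" and not options.show_dependencies:
            continue
        if link.role == "reference" and not options.show_references:
            continue
        if link.cached and not options.show_cached:
            continue
        if link.indirect and not options.show_indirect:
            continue
        kept.append(link)
    return kept


links = [
    Link("orders_view", "orders_db", "dependency"),
    Link("orders_view", "regions", "dependency", indirect=True),
    Link("orders_summary", "orders_view", "reference"),
    Link("orders_view", "orders_cache", "dependency", cached=True),
]

print(len(visible_links(links, LineageOptions())))
print(len(visible_links(links, LineageOptions(show_indirect=True))))
```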