Getting Column Dependencies Using the API
TDV provides a system procedure that you can use against a specified composite view to get its column dependencies, discover where the data comes from, and see how it is derived. The data can come from many different resource types and paths including tables, views, procedures, packaged queries, delimited files, and Web service operations. Some of these possibilities are illustrated here:
If the view depends on a procedure, web service, or packaged query, that is the end of the dependency analysis for that data. No further analysis is done through a procedure, web service, or packaged query.
After you request the column dependencies for a view, the GetColumnDependencies procedure results tell you, for each column in the view, what data source originated the data and what views transformed or filtered the data as shown in this example:
In this example, the column lineage data results might be as shown here:
The GetColumnDependencies procedure lets you:
• | View the resources involved in creating column results going all the way back to the data source or procedure. |
• | Find out which columns have been transformed and how. |
• | Find out which columns are not transformed. |
• | Discover the original data source type. |
• | Find out which columns involve constants. |
• | See who originally created the resources involved and when. |
• | Create views on top of the procedure to analyze and aggregate data source information. |
• | Publish the GetColumnDependencies procedure: |
— | As a new Data Service. You can then access it with the supported protocols like JDBC, ODBC, ADO.NET. |
— | As a new Web service which you can access using REST, SOAP, and the other supported protocols. |
To get the column dependencies for a view
1. | In Studio, open the view containing the columns you want to learn about. |
2. | Click the Info tab and copy the full contents of the Name field which is the path for the view. |
3. | In the localhost directory, navigate to the lib/resource folder and open the GetColumnDependencies procedure. |
4. | Click Execute. Studio prompts you to enter the input parameters for this procedure. |
5. | Paste the view pathpathname copied earlier into the resourcePath field. |
6. | Enter the other input values for the procedure as shown in the table. |
Parameter |
Value Description |
resourcePath |
Required Enter the path of the resource to be analyzed. The supported resource types are TDV SQL Views in plain or published form. |
columnFilter |
Optional. Specify if the column lineage results should be filtered. If empty, no filtering occurs. To filter the results, enter a comma-separated sequence of case-insensitive column names, indicating the columns whose dependencies should be analyzed. |
ignoreCaches |
Optional. Specify if the analysis should ignore whether depended resources are cached or not. Enter one of these values: true—Do not return any caches as part of the column lineage. false—The default. If blank or false, any existing caches in the column lineage are returned in the procedure results. |
recursively |
Optional. Specify if the analysis should be done recursively all the way to the source level of dependency or only return one level of dependency. Enter one of these values: true—Return all levels of dependencies down to the original source level. false—The default. If blank or false, return only a single level of column dependency. |
The analysis is done based on the view definition, independently of whether the view is cached or not.
7. | Click OK. |
TDV executes the procedure and displays the column dependencies in the lower Result For columnDependencies panel.
The contents of the columns are described below:
Column |
Description |
|||||||||
columnName |
The name of the resource column having the column dependency encoded in the row. |
|||||||||
dependencyDatasourcePath |
The path to the data source containing the resource owning the dependency. Empty if not applicable. |
|||||||||
dependencyDatasourceType |
The type of the data source containing the resource owning the dependency. Empty if not applicable. |
|||||||||
dependencyResourcePath |
The path to the resource owning the dependency. Empty if not applicable. |
|||||||||
The type of the resource that owns the dependency. Empty if not applicable. The set of data source types consists of all the data source adapter names accepted by TDV. The table and procedure resource types accepted by TDV is as follows: Database Table |
||||||||||
dependencyIdentifier |
The column name if the dependency is on a column; otherwise, a literal. |
|||||||||
dependencyKind |
One of the following:
|
|||||||||
derivationKind |
One of the following:
|
|||||||||
cardinalityInfo |
When applicable, one of the following:
|
|||||||||
derivations |
When the dependent column is not a direct projection of the dependency, this field denotes how the dependent column is derived. |
9. | Optionally, save the results as a Comma-Separated Values (*.csv) file by clicking Save to File on the Result For columnDependencies panel. |
10. | Optionally, open any of the involved resources to see who originally created the resource and when on the resource Info tab. |