Creating Cardinality Statistics for a View

User Guide > Performance Tuning > Creating Cardinality Statistics for Cost-Based Optimization > Creating Cardinality Statistics for a View

If you intend to use the view for caching, you can collect cardinality statistics for it.

Note: You must set up the cache before you define the cardinality statistics gathering profile. See Setting Up Caching for more information.

For the TDV parallel option, you must gather statistics for the column that is the numeric simple key for the view. This column should be either the one identified as the primary key column by the execution plan, or the primary key that you have defined using the Indexes tab.

To collect cardinality statistics for a view

1. From the resource tree, open the view for which you want to collect cardinality statistics.

2. Select the Caching tab.

3. Click Create Cache.

For more information on creating a cache, see Caching to a File Target.

4. If required by Studio, define the Cardinality Statistics for the data source. For more information, see Creating Cardinality Statistics on a Data Source.

5. Select the Cardinality Statistics tab for the view.

6. Click Create Statistics.

7. Check Enable to allow use of the gathered table statistics to optimize execution plans.

You do not need to check the Enable check box to configure statistics gathering, but you do need to check that box for the optimizer to use the statistics. To disable statistics gathering and use of existing statistics, clear the Enable check box and save.

8. Optionally, define cardinality override settings:

— Minimum Cardinality—The minimum number of returned rows you would expect from a SQL SELECT on this table. Zero (0) is a valid number.

— Maximum Cardinality—The maximum number of rows the resource could return.

— Expected Cardinality—Typical number of unique rows returned from this table.

9. Specify the schedule for collecting cardinality statistics: On Cache Refresh or Manual. You can use the Gather Now button at any time.

10. Optional. Set the Statistics Gathering Timeout value.

When the table statistics gathering timeout is set to “-1” the data source statistics gathering timeout is used. A value of 0 (the default) indicates that there is no time limit.

If the statistics processing is already returning data from a full table scan, the timeout setting does not stop processing immediately; instead, the timeout triggers an attempt to make statistical estimates based on the subset of data already returned.

11. Specify your table statistics gathering preferences:

— None—Do not gather statistics for the table.

— Use datasource settings—Gather statistics according to the definitions specified for this data source.

— Gather table boundary statistics—Performs SELECT COUNT (*) on each specified table to count its rows.

— Specify gathering for specific columns—Control statistics gathering column by column.

12. Optionally, select the Specify gathering for specific columns option and then specify column gathering rules and manipulate the values for MIN, MAX, and distinct.

a. In the Gathering Rule column, double-click to select one of the first two options in the table below to gather statistics.

Gathering Rule Option	Choose to...
Gather all statistics	Collects all other statistics plus a full table scan of numeric data types to build a histogram. Or, builds a full string index for string data types. BLOB, CLOB and other data types that yield meaningless statistical data is not configurable for statistical evaluation.
Gather boundary statistics	Collect table boundary statistics and gather minimum and maximum, and count distinct values for numeric data types. For string data types, calculate the number of distinct values for tables with fewer than ten thousand rows. To reduce the cost of gathering statistics for column boundaries, apply one or more assumptions.
Do not gather statistics	Stop gathering statistics on the column (default).

b. Use one or more of the following fields to override and avoid processing of statistics in the database.

This can be a valuable way to avoid gathering column boundary statistics and have TDV use the values you specify when the queries are run.

When TDV gathers statistics, it retrieves all data from the source and builds a histogram and string index based on this data. This operation can be expensive, depending on the number of rows and columns, network bandwidth, and so on. If you have limited time for statistics gathering, overriding the values can be useful.

These column values are used to create a uniform distribution at run time for the evaluation of the selections. If a value is specified for a column, statistics are not gathered for that column.

— Minimum Value—An option to override the minimum value for the column.

— Maximum Value—An option to override the maximum value for the column.

— Distinct Count—An option to override the number of distinct values for the column.

— Num of Buckets—An option to increase or decrease the granularity of the statistics.

13. Save your selections.