Teradata 14 Statistics - What and When to Collect

This blog post covers the 'what' and 'when' of Teradata statistics collection, specifically Teradata V14.Statistics collection is changing quite a lot under Teradata V14.

To quote the Teradata V14 Release Definition:

"The syntax, options, and methods of collecting database statistics are changed for Release 14.0."

Also, note the following performance regression (p56 Teradata V14 Release Definition):

"There is an increased CPU cost for running COLLECT STATISTICS on systems with 20 or more AMPs per node, when compared with pre-13.10 releases. This increase is a side effect of overall system optimization and a fix is not planned."

Let's hope the CPU impact is not too dramatic, and is offset by the improved functionality. Fingers crossed.

Teradata's Advice

New opportunities for statistics collection in Teradata 14.0Statistics collection recommendations – Teradata 14.0

As you are no doubt aware, best practice for statistics collection changes over time. Summary advice for TD v14 statistics is below.

When To Use Full Teradata 14 Statistics

You should collect full statistics in these cases:

  • UPI and USI columns used in non-equality tests.

  • most NUPIs (see below).

  • non-index search columns used in WHERE clauses.

  • all NUSIs.

  • partition column for PPI tables.

  • PARTITION for PPI tables.

  • (PARTITION, PI) for PPI tables.

  • (PARTITION, PI, partition column) for PPI tables.

  • small table index, join and search columns.

Statistics collection on large tables with NUPIs can be very costly. This should be done only if needed.

  • In the absence of collected statistics, NUPI's are assumed to be 75% unique.

  • A NUPI that is < 60% unique or > 90% will benefit from collected statistics.

  • In other cases random AMP sampling is likely to be sufficient and explicit statistics collection can be avoided.

This means a random sample will often be good enough.

When to Use Sample Teradata 14 Statistics

You should use sample statistics in these cases:

  • UPI columns.

  • USI columns.

When to Use Randomly Sampled Teradata 14 Statistics

There is often no need to collect statistics in these cases:

  • evenly distributed NUPIs.

  • UPI equality tests.

  • USI equality tests.

Specific to TD v14

Note these TD v14 hints:

  • refresh SUMMARY statistics after data loads.

  • PARTTITION statistics no longer needed on NPPI tables.

  • refresh at the table level once statistics have been collected, single scan of the table.

  • The HELP STATISTICS COLUMN/INDEX statement is replaced by a new SHOW STATISTICS.

  • Use the new USING clause to fine tune statistics collection options at an individual column level.

  • following significant changes to column demographics, statistics should be deleted and re-collected.

  • Adjust the statistics sample % at job level rather than the system level (DBS Control).

  • Multi-column statistics must be collected in the same sequence as exists in the underlying table.

New TD v14.10 features

Here are some new features available with v14.10:

  • AutoStats will automate the analysis and collection of statistics. What on, when and the sample %.

  • Table level thresholds can be set confirming the number of days or the % of change required in a table before statistics are re-collected.

  • New statistics system management tool accessed via a new Viewpoint portal. This can be used to identify missing, stale or unused statistics.

Collection Frequency/Approach

  • Teradata suggest a 10% change number as a rough guide to decide when to refresh statistics.

  • Unlikely to be beneficial to refresh statistics unless there has been sufficient changes to data demographics. Will just result in wasted CPU cycles.

  • Not refreshing statistics can also be detrimental. The CPU cycles saved can be dwarfed by the poor explain plans that can result.

  • Optimal frequency is as little/often as required to ensure a good query plan.