Teradata Tips - The Query Optimizer Likes Simple SQL
This is the second of our 'Teradata tips' series. Enjoy!
We all know that Teradata SQL is easy to write, right? Sadly, bad SQL is even easier to write. By ‘bad’ SQL we mean SQL that generates an inefficient query plan.
No matter how we submit a Teradata SQL query, whether we use SQL Assistant or BTEQ or some other SQL tool, we are telling the Teradata system what we want.The Teradata optimizer (or optimiser for us Brits) has the often unenviable task of turning the submitted SQL query into an execution plan. This describes how the Teradata system will physically execute the SQL query against the database.
To see the execution plan for any query simply prefix the SQL with the ‘EXPLAIN’ modifier and re-submit to Teradata. The response is how Teradata will execute the query. But, all you Teradata tips readers knew that, right?
It can be quite a challenge for the optimiser to turn a piece of arbitrary SQL into an efficient execution plan. The Teradata optimiser is ‘cost based’ and aims to generate the most efficient query plan, as you might expect.
Although the Teradata optimiser is undoubtedly one of the best in the market, sometimes even Teradata generates bad query plans. The difference between a ‘good’ (efficient) plan and a ‘bad’ (inefficient) plan can be the difference between a query that takes seconds/minutes and one that takes hours...or even days! A bad plan will simply consume resources and keep running until it hits a limit of some sorts - often spool.
A lot of what the Teradata optimiser has to do is to decide the sequence of table joins in the query. This is largely a function of the number of tables referenced in the query.
Consider the following relationship between the number of tables in a query and the number of ways in which they can be joined:
Number of Tables
Possible Join Orders
The number of possible join orders is the factorial of the number or tables in the query. As more tables are added, even one at a time, the number of possible join orders that the optimiser has to consider soon becomes very large.
We can help the optimiser in its quest to generate efficient query plans by not submitting massive SQL queries that join lots and lots of tables together just because we can - you know who you are ;-)
With this in mind, may we suggest the following Teradata tips:
• DO NOT join more than 6 fact (non-reference) tables together in a single SQL statement
• DO use work tables to store intermediate results - break that massive join into multiple smaller joins with intermediate results stored in tables - it's what the optimiser does anyway under the covers via spool tables
So, Teradata tips readers, be warned: it is not big or clever to write monster SQL queries, just because we can, and 'let the optimiser figure it out'. That will only work up to a point, after which the chances of an inefficient plan become too high.
To reiterate, Teradata tips #2 : the Teradata query optimiser likes simple SQL.
That *does not* mean the optimiser can't cope with complex SQL. It means that the chances of a 'good' query pan reduce as SQL query complexity increases. The number of tables in a query is key determinant of query complexity.
As the man said:
“Simplicity is the ultimate sophistication” Leonardo Da Vinci (1452-1519)
Who are we to argue?
Thanks for reading the latest in VLDB's Teradata Tips series :-)