
Try and Understand SQL Query Plans
As a declarative language, when we fire off SQL for execution we’re telling the database what we want, and the database figures out the how, which is very useful.
Turning the submitted SQL into an execution plan, known as query optimisation, is a very complex topic. A lot of folk assume (wrongly) that the database ‘just runs the SQL’. Not even close, I’m afraid.
The plan that the database query optimiser generates in response to your SQL being submitted is the ‘how.’ The query plan is the key determinant of how long your query will take to run.
Think of a satnav having to compute a route. Going from Liverpool to London via Glasgow would make no sense, but you’d still get there…eventually. That would be considered a bad plan.
A bad query plan can lead to extended execution times, whereas a good plan can lead to shorter execution times.
The good news is that we can interrogate the plan prior to query execution. As most SQL jockeys should be aware, it is typically just a case of prefixing your query with the word EXPLAIN and running it as usual. No doubt it’s not that simple on Oracle (eh Jeff?).
What comes back from running an EXPLAIN is where the fun starts!
Luckily for me, I started my SQL career on Teradata, way back in 1989. Teradata’s EXPLAIN plans are by far the easiest to understand that I have yet encountered. I’ve been sketching them out with a pen and paper ever since I first ran queries.
Each database has it’s own EXPLAIN plan format. Some are just plain ugly, eh Postgres? The common thread though is that they all represent exactly how your query will execute when you remove the EXPLAIN and re-submit the SQL. Cool eh!
For any given database the challenge is to understand the ‘red flags’ in the EXPLAIN output that might be the cause of long running queries.
Specific to Teradata, we learned many years ago to look for things like large fact table re-distributions or the duplication of large dimension tables. There are whole sessions at Teradata conferences covering ‘explain the explain.’ I’ve even given talks on this topic myself.
A word of caution: take all timings in EXPLAIN plans with a massive pinch of salt, no matter how much the DBMS vendor bleats on about them being oh so accurate. It doesn’t matter — it’s the sequence of events we’re interested in, not the predicted elapse time.
Don’t be afraid of the EXPLAIN plan. Get in the habit of drawing it out with a pen and paper and a lot will be revealed. Then do the research into what are considered expensive operations specific to your particular platform. The answer is in there, you’ve just got to find it.
What you can do about a ‘bad plan’ is a post for another day!
#sql #data #mysql #explain #queryoptimisation #optimiser #teradata #postgresql

