MySQL guru Sheeri Kritzer listed eight SQL best practices for database professionals who are hard at work on their projects.
Kritzer’s list began with a preface on why she tries to live by the eight SQL rules in her post about best practices:
In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.”
Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.
Kritzer started the list by suggesting database developers always use explicit joins. “If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join – I consider that a mistake,” she wrote.
One person asked about that practice in a comment about the post. Peter Zaitsev of the MySQL Performance Blog said this:
I personally feel this is syntax matter, unless you use LEFT JOIN. Some people find it more readable, some not – I would for example like to see clauses grouped by tables as for inner join there is no difference between where and on clauses for optimizer and I prefer to see how query can be executed easily.
Kritzer agreed in her response that the use of explicit joins is definitely syntax. “I feel that specifying INNER vs. LEFT or RIGHT or CROSS helps them understand that they’re not saying “I want you to join the tables,” ie, take an action, they’re saying, “I want the result if you join the tables this way,” ie, “this is what I want back.”
Database pros should not be afraid of JOINS. If good indexing has been used for the database, Krtizer said JOINS are not necessarily resource-intensive. And when a correlated subquery can be replaced with a JOIN, it should be replaced.
Developers should also always define field names. “No using SELECT * or INSERT INTO table VALUES. It’s a pain,” she wrote. For reporting, always use the timestamp from the database server, in case those of web servers are different.
Reporting will go more smoothly if one stores IPs as integers with INET_ATON and retrieves them with INET_NTOA, Kritzer also wrote. On the topic of reporting, she considered the impact of network traffic and made this best practice suggestion:
If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting – instead of making 50 connections for states in the US, get them all at once.
If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.
“Running a query in a loop is usually a bad idea,” Kritzer said of those. “Consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.”
—
Add to Del.icio.us | Digg | Yahoo! My Web | Furl
David Utter is a staff writer for Murdok covering technology and business.