ThatJeffSmith

SQL Magic Trick: Getting Row Count Without Fetching All Rows

So while hanging out with a bunch of co-workers last week, someone mentioned they had this trick. The challenge is knowing how many rows you have in your result set before you actually read them all. You see, Oracle has no way of telling us how many records are in our data set before we actually fetch the entire data set.

Yes, Oracle gives us %ROWCOUNT, but that only tells us ‘…how many rows affected so far.’ So when you start to fetch the records back, you know how many you have processed. And, you know there are more left.

Wouldn’t it be nice if you could get the size of the record set and have that available as you process the records?

Well, maybe you do.

Of course, nothing is free. Especially something as ‘cool’ as this.

Welcome to the world of analytics. Tom Kyte, of AskTom fame, has said this of analytics -

Analytics rock
Analytics roll

been thinking about writing a book just about analytics…

Analytic functions are usually the answer to some challenging set based questions and puzzles. I’ll know when the answer involves analytics, and jump over to Google for a quick example. How they work is a bit beyond the scope of a quick blog post, but I’ll try a bit.

From the Docs -

“Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.

Analytic functions are commonly used to compute cumulative, moving, centered, and reporting aggregates.”

You should really, really KNOW about analytic functions.

I feel the same way about regular expressions. While it’s not expected that you master them, you need to know when they are called for and how to figure out how to use them as needed. It’s what separates the intermediate hackers from the professionals.

So here’s the ‘trick code’ and an example:

SELECT MainSQL.* FROM (
SELECT t.*,
       ROW_NUMBER() OVER ( ORDER BY 1 ) Row_Number,
       COUNT(1) OVER ( ORDER BY 1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) Row_Count
FROM (
select * from beer -- YOUR QUERY HERE                                          
) t
ORDER BY 1
) MainSQL
WHERE Row_Number >= 1
  AND Row_Number < ( 1 + 50 )
ORDER BY Row_Number;

So everything HAS to happen in the SELECT. What we have is a window being defined for the entire dataset. Or that’s how I read the ‘RANGED BETEWEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING’ bit. So how does this look when we run it?

Analytic functions aren't magic, but they are fun to use

So the ‘cool’ part is seeing the ‘Row Number’ and the ‘Row Count’ as the dataset is retrieved.

But how much does this cost?

Well, for my simple BEER table query, it cost ’641′ – that being the difference of the plan cost for the two queries.

The plans are radically different - nothing is free!

Just because the plans are radically different and the analytic version costs much much more, does this mean you shouldn’t use it? Heck no! Just test your code first, and assume nothing.

This query won’t actually retrieve all the rows either. The ‘AND Row_Number < ( 1 + 50 )' bit ensures that. You could increase the '50' to something higher I suppose. But maybe we should dig a bit deeper? Sometimes plan cost doesn't' tell the whole story. What about IO, Sorts, Memory, CPU, and all that jazz?

Metric Analytic Function Query With Row Numbers Plain Query
logical read bytes from cache 1040384 24576
no work – consistent read gets 121 1
session logical reads 127 3
sorts (memory) 4 2
sorts (rows) 13281 1276
table scan blocks gotten 121 1
table scan rows gotten 11955 121

These numbers scare me – enough that I wouldn’t put this in my application without a lot of testing and without a lot of thinking about whether this is actually the best solution to the problem at hand. But I’m still really glad I was exposed to this, and will now have another trick ‘up my sleeve!’