SQL Magic Trick: Getting Row Count Without Fetching All Rows

thatjeffsmith Database Stuff 8 Comments

Tell Others About This Story:

So while hanging out with a bunch of co-workers last week, someone mentioned they had this trick. The challenge is knowing how many rows you have in your result set before you actually read them all. You see, Oracle has no way of telling us how many records are in our data set before we actually fetch the entire data set.

Yes, Oracle gives us %ROWCOUNT, but that only tells us ‘…how many rows affected so far.’ So when you start to fetch the records back, you know how many you have processed. And, you know there are more left.

Wouldn’t it be nice if you could get the size of the record set and have that available as you process the records?

Well, maybe you do.

Of course, nothing is free. Especially something as ‘cool’ as this.

Welcome to the world of analytics. Tom Kyte, of AskTom fame, has said this of analytics

Analytics rock
Analytics roll

been thinking about writing a book just about analytics…

Analytic functions are usually the answer to some challenging set based questions and puzzles. I’ll know when the answer involves analytics, and jump over to Google for a quick example. How they work is a bit beyond the scope of a quick blog post, but I’ll try a bit.

From the Docs

“Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.

Analytic functions are commonly used to compute cumulative, moving, centered, and reporting aggregates.”

You should really, really KNOW about analytic functions.

I feel the same way about regular expressions. While it’s not expected that you master them, you need to know when they are called for and how to figure out how to use them as needed. It’s what separates the intermediate hackers from the professionals.

So here’s the ‘trick code’ and an example:

SELECT MainSQL.* FROM (
SELECT t.*,
       ROW_NUMBER() OVER ( ORDER BY 1 ) ROW_NUMBER,
       COUNT(1) OVER ( ORDER BY 1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) ROW_COUNT
FROM (
SELECT * FROM beer -- YOUR QUERY HERE                                          
) t
ORDER BY 1
) MainSQL
WHERE ROW_NUMBER >= 1
  AND ROW_NUMBER < ( 1 + 50 )
ORDER BY ROW_NUMBER;

So everything HAS to happen in the SELECT. What we have is a window being defined for the entire dataset. Or that’s how I read the ‘RANGED BETEWEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING’ bit. So how does this look when we run it?

Analytic functions aren't magic, but they are fun to use

So the ‘cool’ part is seeing the ‘Row Number’ and the ‘Row Count’ as the dataset is retrieved.

But how much does this cost?

Well, for my simple BEER table query, it cost ‘641’ – that being the difference of the plan cost for the two queries.

The plans are radically different - nothing is free!

Just because the plans are radically different and the analytic version costs much much more, does this mean you shouldn’t use it? Heck no! Just test your code first, and assume nothing.

This query won’t actually retrieve all the rows either. The ‘AND Row_Number < ( 1 + 50 )' bit ensures that. You could increase the '50' to something higher I suppose. But maybe we should dig a bit deeper? Sometimes plan cost doesn't' tell the whole story. What about IO, Sorts, Memory, CPU, and all that jazz?

Metric Analytic Function Query With Row Numbers Plain Query
logical read bytes from cache

1040384

24576
no work – consistent read gets

121

1
session logical reads

127

3
sorts (memory)

4

2
sorts (rows)

13281

1276
table scan blocks gotten

121

1
table scan rows gotten

11955

121

These numbers scare me – enough that I wouldn’t put this in my application without a lot of testing and without a lot of thinking about whether this is actually the best solution to the problem at hand. But I’m still really glad I was exposed to this, and will now have another trick ‘up my sleeve!’

Related Posts Plugin for WordPress, Blogger...
Tell Others About This Story:

Comments 8

  1. Just one note to your co-workers statement that this will be fast for almost all real web applications 😉 :

    My company has a webshop with a couple hundred thousand products. We have a search facility based on Oracle Text.
    Suppose I as a customer happen to start a query on a common word so the “where contains()” clause will find 132788 products.
    If I place such a query as “mainsql” with an analytic construct to find the total count, Oracle Text will have to access all those thousands of products…

    As a customer in such a case I would not like to wait even 3-4 seconds (web users have no patience) to get “Now showing 1-50 of 132788 products”.
    I would prefer to blindingly fast get “Now showing 1-50 of many products – we suggest you add an extra word to narrow your search.”

    Anyway – that’s opinion 😉

    1. JeffS Post
      Author
        1. JeffS Post
          Author
  2. JeffS Post
    Author

    Kim, I think you win for the longest comment reply ever on my blog! Thanks so much for sharing your insight and pointing out Tom’s take on pagination.

    Your note regarding the engine having to access all the rows in order to count them is something I hope no one glosses over when they read these ‘tricks.’ It’s not really a trick then, is it?

    My co-worker who originally shared the analytic approach with me for this row count puzzle supplied a simple, yet very important caveat: ‘For almost all queries for real web applications, this is so fast and does not make a time difference in terms of with or without the windowing.’ Yeah, don’t build this into your data warehouse reports without doing a bit of testing!

  3. Hi, Jeff

    Analytics sure rock’n’roll 😉

    You could combine your trick with Tom Kyte’s (and others) way to “paginate” result sets:
    http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:127412348064

    Something like this:


    SQL> variable from_row number
    SQL> variable to_row number
    SQL> begin
    2 :from_row := 1;
    3 :to_row := 3;
    4 end;
    5 /

    PL/SQL procedure successfully completed.

    SQL> select s2.*
    2 from (
    3 select s1.*
    4 , rownum rnum
    5 from (
    6 select main.*
    7 , count(*) over () row_count
    8 from (
    9 select empno
    10 , ename
    11 from scott.emp
    12 ) main
    13 order by main.ename
    14 ) s1
    15 where rownum = :from_row
    18 ;

    EMPNO ENAME ROW_COUNT RNUM
    ---------- ---------- ---------- ----------
    7876 ADAMS 14 1
    7499 ALLEN 14 2
    7698 BLAKE 14 3

    SQL> begin
    2 :from_row := 4;
    3 :to_row := 6;
    4 end;
    5 /

    PL/SQL procedure successfully completed.

    SQL>
    SQL> select s2.*
    2 from (
    3 select s1.*
    4 , rownum rnum
    5 from (
    6 select main.*
    7 , count(*) over () row_count
    8 from (
    9 select empno
    10 , ename
    11 from scott.emp
    12 ) main
    13 order by main.ename
    14 ) s1
    15 where rownum = :from_row
    18 ;

    EMPNO ENAME ROW_COUNT RNUM
    ---------- ---------- ---------- ----------
    7782 CLARK 14 4
    7902 FORD 14 5
    7900 JAMES 14 6

    SQL> explain plan for
    2 select s2.*
    3 from (
    4 select s1.*
    5 , rownum rnum
    6 from (
    7 select main.*
    8 , count(*) over () row_count
    9 from (
    10 select empno
    11 , ename
    12 from scott.emp
    13 ) main
    14 order by main.ename
    15 ) s1
    16 where rownum = :from_row
    19 ;

    Explained.

    SQL> select * from table(dbms_xplan.display);

    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    Plan hash value: 4140508541

    ------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 14 | 644 | 3 (0)| 00:00:01 |
    |* 1 | VIEW | | 14 | 644 | 3 (0)| 00:00:01 |
    |* 2 | COUNT STOPKEY | | | | | |
    | 3 | VIEW | | 14 | 462 | 3 (0)| 00:00:01 |
    | 4 | WINDOW SORT | | 14 | 140 | 3 (0)| 00:00:01 |
    | 5 | TABLE ACCESS FULL| EMP | 14 | 140 | 3 (0)| 00:00:01 |

    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------
    ------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    1 - filter("S2"."RNUM">=TO_NUMBER(:FROM_ROW))
    2 - filter(ROWNUM<=TO_NUMBER(:TO_ROW))

    A couple points here:

    “COUNT(*) OVER ()” is a short cut to avoid writing range between unbounded preceding and unbounded following. It also makes the order by unnecessary and makes it clear you simply wish a total count without any regard to a particular “window” of data.

    Using ROWNUM for the filtering rather than ROW_NUMBER gives Oracle the possibility of doing COUNT STOPKEY in the access plan. (I believe maybe the optimizer is sufficiently intelligent in recent versions that it may do the same thing with ROW_NUMBER – I am not sure 🙂

    But beware…

    Even though the ROWNUM (or ROW_NUMBER) pagination/filtering makes it possible not to FETCH all the data in order to know the total count of rows, the Oracle SQL engine still has to ACCESS all the rows in order to count them. In many cases it will still be expensive, but for example if your query has columns with expensive function calls then it is possible that the counting of the rows is relatively cheap compared with what you save because only the rows up to :TO_ROW will actually call the functions.

    So in general I personally would advise using this magic trick with caution. Don’t use it all over your application simply because your web developer says “it’s cool to be able to write page 1 of 2705432 pages” 🙂

Leave a Reply

Your email address will not be published. Required fields are marked *