[UPDATED] Pagination on DB results

Recently at work I faced this classic problem:

The dataset in the DB is too large to be fetched by your Java app without hitting an OutOfMemoryError.

This is where pagination comes into play: load results page by page.

Of course, with a complicated WHERE clause it is slower.

The solution is clear; the question is the implementation.

I am a lazy developer who is used to pagination APIs like the one provided by Hibernate, so I expected the answer to be simpler.

Since I am using Oracle 11g with Spring's JdbcTemplate, the details of the methods seem quite scattered. So here I have formulated a generalized answer:

Besides Hibernate pagination, we can use SQL pagination or JDBC pagination.

SQL pagination

There are two basic approaches:

  • operating on a piecemeal result set (a new query for each page)
  • operating on the full result set

The way to do it is specific to each SQL dialect.

For MySQL and many other databases, it can be done with LIMIT and OFFSET.
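For instance, with Spring's JdbcTemplate this could look roughly like the sketch below (the employees table, its columns and EmployeeRowMapper are made up for illustration, not from my project):

[sourcecode language="java"]
// Sketch only: table, columns and EmployeeRowMapper are illustrative.
public List<Employee> findPage(JdbcTemplate jdbcTemplate, int page, int pageSize) {
    // each page issues a new query (the "piecemeal result set" approach)
    String sql = "select id, name from employees order by id limit ? offset ?";
    return jdbcTemplate.query(sql, new Object[] { pageSize, page * pageSize },
            new EmployeeRowMapper());
}
[/sourcecode]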

Postgresql: http://microjet.ath.cx/WebWiki/ResultPaginationWithPostgresql.html

In Oracle, it uses the same form as a "Top-N query" (e.g. who are the 5 highest-paid employees?), which the optimizer handles well:

[sourcecode language="sql"]
select *
  from ( select a.*, rownum rnum
           from ( YOUR_QUERY_GOES_HERE -- including the order by
                ) a
          where rownum <= MAX_ROWS )
 where rnum >= MIN_ROWS
[/sourcecode]
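As a hedged sketch of how MIN_ROWS and MAX_ROWS map to a page number and page size when called from JdbcTemplate (again, the table, columns and EmployeeRowMapper are made up):

[sourcecode language="java"]
// Sketch only: for a 0-based page, MIN_ROWS = page * pageSize + 1 and MAX_ROWS = (page + 1) * pageSize.
public List<Employee> findPageOracle(JdbcTemplate jdbcTemplate, int page, int pageSize) {
    String sql =
          "select * from ( select a.*, rownum rnum "
        + "  from ( select id, name from employees order by id ) a "
        + "  where rownum <= ? ) "
        + "where rnum >= ?";
    int maxRows = (page + 1) * pageSize;
    int minRows = page * pageSize + 1;
    return jdbcTemplate.query(sql, new Object[] { maxRows, minRows }, new EmployeeRowMapper());
}
[/sourcecode]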

Here is a very detailed explanation of ROWNUM.

Similar SO Thread

JDBC pagination

The question that comes to mind is: when I execute the SQL, how is the result loaded, immediately or on request? Same as this SO thread.

First we need to understand some JDBC basics, as documented by Oracle.

Per the javadoc of Statement.execute():

execute: Returns true if the first object that the query returns is a ResultSet object. Use this method if the query could return one or more ResultSet objects. Retrieve the ResultSet objects returned from the query by repeatedly calling Statement.getResultSet.

We access data in a ResultSet via a cursor. Note that this cursor is different from a DB cursor; it is a pointer initially positioned before the first row of data.

The data is fetched on request; when you call execute() you are fetching for the first time.
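A minimal plain-JDBC sketch of what this looks like (the connection and the query are placeholders):

[sourcecode language="java"]
// Sketch only: connection and the query are placeholders.
Statement stmt = connection.createStatement();
boolean hasResultSet = stmt.execute("select id, name from employees");
if (hasResultSet) {
    ResultSet rs = stmt.getResultSet();
    // the cursor starts before the first row; next() advances it,
    // fetching more rows from the DB as needed
    while (rs.next()) {
        System.out.println(rs.getLong("id") + " " + rs.getString("name"));
    }
    rs.close();
}
stmt.close();
[/sourcecode]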

Then, how much data is loaded at a time? It is configurable.

One can use the setFetchSize() method on Statement or ResultSet to control how many rows the driver fetches from the DB at a time, i.e. how big the blocks it retrieves at once are.

For example, assume the total result is 1000 rows. If the fetch size is 100, fetching the 1st row will load 100 rows from the DB, and the 2nd to 100th rows will be read from local memory; to query the 101st row, another 100 rows will be loaded into memory.
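A sketch of that scenario with setFetchSize() (again, the query and connection are placeholders):

[sourcecode language="java"]
// Sketch only: assumes a result of about 1000 rows, as in the example above.
Statement stmt = connection.createStatement();
stmt.setFetchSize(100);   // hint: fetch 100 rows per round trip
ResultSet rs = stmt.executeQuery("select id, name from employees order by id");
while (rs.next()) {
    // rows 1-100 come from the first fetch; advancing to row 101
    // triggers another round trip for the next 100 rows, and so on
}
rs.close();
stmt.close();
[/sourcecode]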

From the JavaDoc of setFetchSize():

Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement. If the value specified is zero, then the hint is ignored. The default value is zero.

Note the word "hint": it can be overridden by the driver-specific implementation.

This is also what the "Limit Rows to 100" feature in clients like SQL Developer is based on.

To complete the whole solution and scroll through results, one needs to consider the ResultSet types and the ScrollableCursor API.

One can find an example implementation in this post from Oracle,

which is from the book Oracle TopLink Developer's Guide:

Example 112 JDBC Driver Fetch Size


[sourcecode language="java"]
ReadAllQuery query = new ReadAllQuery();
query.setReferenceClass(Employee.class);
query.setSelectionCriteria(new ExpressionBuilder().get("id").greaterThan(100));

// Set the JDBC fetch size
query.setFetchSize(50);

// Configure the query to return results as a ScrollableCursor
query.useScrollableCursor();

// Execute the query
ScrollableCursor cursor = (ScrollableCursor) session.executeQuery(query);

// Iterate over the results
while (cursor.hasNext()) {
    System.out.println(cursor.next().toString());
}
cursor.close();
[/sourcecode]


After all this, the question boils down to:

Which is the better way to do pagination?

Note that the SQL must have an ORDER BY for the SQL approach to make sense; otherwise it is possible to show some rows again on the next page.

It seems they are not mutually exclusive. When one uses JDBC to execute a paging SQL, one also needs to consider the fetch size. (I wonder whether setting the fetch size equal to the page size is optimal, or whether the SQL pagination then becomes meaningless?)

Below are some points from PostgreSQL's documentation on its JDBC driver, and from other SO answers.

 

The major difference is that the JDBC way requires holding the connection open during the fetching. This may not be suitable in a stateless web application, for example.

For the SQL way:

  • the syntax is SQL specific and may not be easy to maintain.

For the JDBC way (see the sketch after this list):

  • The connection to the server must be using the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later.
  • The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.
  • The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.
  • The query given must be a single statement, not multiple statements strung together with semicolons.
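Put together, those requirements translate to something like this sketch (the query and connection are placeholders):

[sourcecode language="java"]
// Sketch only: query and connection are placeholders.
connection.setAutoCommit(false);   // cursor-based fetching needs an open transaction
Statement stmt = connection.createStatement(
        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(50);             // stream 50 rows at a time instead of loading everything
ResultSet rs = stmt.executeQuery("select id, name from employees"); // a single statement only
while (rs.next()) {
    // process the current row
}
rs.close();
stmt.close();
connection.commit();               // ending the transaction also closes the backend cursor
[/sourcecode]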

 

[UPDATED] Memory consumption for FetchSize

Some points to note after production testing: to my surprise, I found that the JDBC driver allocates memory according to the fetch size. It creates lots of objects in JDBC whether or not that many records actually exist; e.g. setFetchSize(10000) will pre-allocate 10000 row buffers according to the schema (more memory if the DB fields are longer).

Thus it is advised either to predetermine the record count with a COUNT query, or to increase the fetch size in a greedy manner (not possible with queryForList, but possible if you use Spring's queryForRowSet; see the related SO thread). You check whether the number of rows read exceeds a threshold, then increase the fetch size; finer multi-level control is also possible. The trade-off here is between the thresholds and the number of round trips needed (a single large fetch size is fastest).

e.g. do a check like this:

[sourcecode language="java"]
int level = 0;
int rowNum = 0;
while (resultSet.next()) {
    rowNum++;
    if (level == 0 && rowNum > level1Threshold) {
        resultSet.setFetchSize(level2Threshold);
        level = 1;
    } else if (level == 1 && rowNum > level2Threshold) {
        resultSet.setFetchSize(maxPagingSize);
        level = 2;
    }
    // do sth with the current row
}
[/sourcecode]

 

 

More optimizations tested

Using Spring's result set wrapper ResultSetWrappingSqlRowSet may be faster and easier, since queryForList creates objects both in JDBC and in the List you get back, while this approach lets you control the cursor as you loop over the result and create objects on demand for the list you build (the JDBC-level memory hit from the fetch size still applies).
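As a hedged sketch (not the exact code from my project), one way to wrap the live cursor in ResultSetWrappingSqlRowSet is via a ResultSetExtractor; the Employee class and the column names here are made up:

[sourcecode language="java"]
// Sketch only: Employee and the column names are illustrative.
List<Employee> results = jdbcTemplate.query(
        "select id, name from employees order by id",
        new ResultSetExtractor<List<Employee>>() {
            public List<Employee> extractData(ResultSet rs) throws SQLException {
                SqlRowSet rowSet = new ResultSetWrappingSqlRowSet(rs); // wraps the open cursor
                List<Employee> list = new ArrayList<Employee>();
                while (rowSet.next()) {
                    // objects are created on demand, row by row
                    list.add(new Employee(rowSet.getLong("id"), rowSet.getString("name")));
                }
                return list;
            }
        });
[/sourcecode]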

The speed improvement from setFetchSize is, of course, more relevant when actually materializing objects, not when merely looping over a ResultSetWrappingSqlRowSet.

i.e. faster for

    while (resultSet.next()) {
        results.add(this.rowMapper.mapRow(this.resultSet.getResultSet(), rowNum++));
    }

but not

    while (resultSet.next()) {
        // do nothing
    }


Some further reading

This post is about performance tuning with optimal fetch size:

http://webmoli.com/2009/02/01/jdbc-performance-tuning-with-optimal-fetch-size/

http://blog.lishman.com/2008/03/jdbc-fetch-size.html

