Unix Technical Forum

Limit vs setMaxRows issue

This is a discussion on Limit vs setMaxRows issue within the pgsql Interfaces jdbc forums, part of the PostgreSQL category; --> Hi, When using the ps.setMaxRows() call on PreparedStatement, the jdbc driver sets the row limit via the "raw" postgres ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Interfaces jdbc

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-16-2008, 12:09 AM
Sebastiaan van Erk
 
Posts: n/a
Default Limit vs setMaxRows issue

Hi,

When using the ps.setMaxRows() call on PreparedStatement, the jdbc
driver sets the row limit via the "raw" postgres protocol. However, in
the words of Tom Lane, "the row limit in the protocol only says how many
rows to deliver in the first batch. The presumption is that you'll
eventually grab the rest, and so the query is planned on that basis."

What this means that when we do the following query:

select action_id from actions order by action_id

with a ps.setMaxRows(100), it takes about 1.8 seconds for the query to
complete. However, if we do the following query:

select action_id from actions order by action_id limit 100

without any ps.setMaxRows() the query only takes 0.156 seconds. This is
more than a factor of 10 faster.

I'm not 100% sure of what JDBC says about setMaxRows (it's kind of
ambiguous in the java doc, as usual), but as far as I can tell, if you
call setMaxRows on the prepared statement there is no way in to ever
retrieve more than that number of rows. If this is indeed the case, it
seems to me that currently there is a mismatch between the JDBC api and
the postgresql api, and JDBC should somehow tell postgres that this is a
hard limit and it should not plan for a second batch.

Therefore, my question is: is this a bug? It is not feasable for me to
add LIMIT clauses to all the SQL queries in my code, so if this IS a
bug, I hope it can be fixed. If it is NOT a bug, is there an alternative
workaround that does not involve changing all of my sql statements?

Thanks in advance,
Sebastiaan



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-16-2008, 12:09 AM
Dave Cramer
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

Sebastiaan,

I believe the setMaxRows will use a cursor, because you have an order
by on the cursor it will have to be fully materialized

Try it without the order by

Dave
On 21-Jun-06, at 5:11 AM, Sebastiaan van Erk wrote:

> Hi,
>
> When using the ps.setMaxRows() call on PreparedStatement, the jdbc
> driver sets the row limit via the "raw" postgres protocol. However,
> in the words of Tom Lane, "the row limit in the protocol only says
> how many rows to deliver in the first batch. The presumption is
> that you'll eventually grab the rest, and so the query is planned
> on that basis."
>
> What this means that when we do the following query:
>
> select action_id from actions order by action_id
>
> with a ps.setMaxRows(100), it takes about 1.8 seconds for the query
> to complete. However, if we do the following query:
>
> select action_id from actions order by action_id limit 100
>
> without any ps.setMaxRows() the query only takes 0.156 seconds.
> This is more than a factor of 10 faster.
>
> I'm not 100% sure of what JDBC says about setMaxRows (it's kind of
> ambiguous in the java doc, as usual), but as far as I can tell, if
> you call setMaxRows on the prepared statement there is no way in to
> ever retrieve more than that number of rows. If this is indeed the
> case, it seems to me that currently there is a mismatch between the
> JDBC api and the postgresql api, and JDBC should somehow tell
> postgres that this is a hard limit and it should not plan for a
> second batch.
>
> Therefore, my question is: is this a bug? It is not feasable for me
> to add LIMIT clauses to all the SQL queries in my code, so if this
> IS a bug, I hope it can be fixed. If it is NOT a bug, is there an
> alternative workaround that does not involve changing all of my sql
> statements?
>
> Thanks in advance,
> Sebastiaan
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-16-2008, 12:09 AM
Sebastiaan van Erk
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

Hi,

I'm sorry, but I don't really know what you mean with setMaxRows using a
cursor.

Note that the *same query* (including the order by) is fast when it
contains the LIMIT 100 sql appended to it. So this query *also* does an
order by. The only difference between the queries is that one uses the
setMaxRows() call to limit the number of rows in the result set, and the
other uses SQL.

Greetings,
Sebastiaan


Dave Cramer wrote:
> Sebastiaan,
>
> I believe the setMaxRows will use a cursor, because you have an order
> by on the cursor it will have to be fully materialized
>
> Try it without the order by
>
> Dave
> On 21-Jun-06, at 5:11 AM, Sebastiaan van Erk wrote:
>
>> Hi,
>>
>> When using the ps.setMaxRows() call on PreparedStatement, the jdbc
>> driver sets the row limit via the "raw" postgres protocol. However,
>> in the words of Tom Lane, "the row limit in the protocol only says
>> how many rows to deliver in the first batch. The presumption is that
>> you'll eventually grab the rest, and so the query is planned on that
>> basis."
>>
>> What this means that when we do the following query:
>>
>> select action_id from actions order by action_id
>>
>> with a ps.setMaxRows(100), it takes about 1.8 seconds for the query
>> to complete. However, if we do the following query:
>>
>> select action_id from actions order by action_id limit 100
>>
>> without any ps.setMaxRows() the query only takes 0.156 seconds. This
>> is more than a factor of 10 faster.
>>
>> I'm not 100% sure of what JDBC says about setMaxRows (it's kind of
>> ambiguous in the java doc, as usual), but as far as I can tell, if
>> you call setMaxRows on the prepared statement there is no way in to
>> ever retrieve more than that number of rows. If this is indeed the
>> case, it seems to me that currently there is a mismatch between the
>> JDBC api and the postgresql api, and JDBC should somehow tell
>> postgres that this is a hard limit and it should not plan for a
>> second batch.
>>
>> Therefore, my question is: is this a bug? It is not feasable for me
>> to add LIMIT clauses to all the SQL queries in my code, so if this IS
>> a bug, I hope it can be fixed. If it is NOT a bug, is there an
>> alternative workaround that does not involve changing all of my sql
>> statements?
>>
>> Thanks in advance,
>> Sebastiaan
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: Don't 'kill -9' the postmaster
>>

>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-16-2008, 12:09 AM
A.M.
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

On Wed, June 21, 2006 12:00 pm, Kris Jurka wrote:

>
> I'm not sure how you would like the driver to tell the server that it
> doesn't want more than setMaxRows rows. The defined API for this is using
> LIMIT in your sql query. The driver cannot do this for you (at least
> without parsing your query) because the query may already have a LIMIT or
> it may be something like an INSERT into a VIEW that has a DO INSTEAD
> SELECT rule on it. If you're suggesting that we extended the
> frontend/backend protocol to include this extra information than that's
> definitely a feature request, not a bug report.


The backend protocol already supports maximum row limit if you use the
extended protocol.

I guess it's a legacy thing in the driver, but there really is no reason
to use the simple query protocol at all on recent postgresqls.

-M


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-16-2008, 12:09 AM
Tom Lane
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

"A.M." <agentm@themactionfaction.com> writes:
> On Wed, June 21, 2006 12:00 pm, Kris Jurka wrote:
>> If you're suggesting that we extended the
>> frontend/backend protocol to include this extra information than that's
>> definitely a feature request, not a bug report.


> The backend protocol already supports maximum row limit if you use the
> extended protocol.


No, it would take a protocol change to add such a thing out-of-line
(that is, not as a LIMIT clause in the query text). The reason is that
the planning is done at PARSE time, or at the latest BIND time. The
row limit field in the EXECUTE message comes far too late to affect the
query plan. EXECUTE's row limit was not meant as anything except a way
to fetch a query result in segments, avoiding the grab-it-all-at-once,
run-out-of-memory syndrome. It is definitely *not* meant to imply that
the client doesn't intend to fetch the whole query result eventually.

I don't have a lot of sympathy for the OP's position that he shouldn't
have to use a LIMIT clause for this ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-16-2008, 12:09 AM
Oliver Jowett
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

Sebastiaan van Erk wrote:

> Therefore, my question is: is this a bug? It is not feasable for me to
> add LIMIT clauses to all the SQL queries in my code, so if this IS a
> bug, I hope it can be fixed. If it is NOT a bug, is there an alternative
> workaround that does not involve changing all of my sql statements?


It's not a bug. setMaxRows() is essentially a hint, there's certainly no
requirement that the driver will go off and add LIMIT clauses to your
query, the minimal implementation won't change query execution at all
and will just limit rows coming back out of the ResultSet.. It might be
nice to add LIMIT but that would require the driver to parse query
strings which gets very complicated and isn't going to catch all the
cases anyway. You'll be getting at least some improvement with the
existing driver because the whole resultset isn't being transferred and
processed, even if the plan is still assuming you will grab all the data.

If your queries need a LIMIT clause to get decent performance then your
safest bet is to add a LIMIT clause yourself. You can keep the
setMaxRows() as well if you like..

-O

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-16-2008, 12:09 AM
Sebastiaan van Erk
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

Hi,

Thanks for the helpful replies (thanks also to Oliver Jowett).

As it seems to me to be the case that a setMaxRows call actually limits
the data you can access to that specific number of rows and it is
impossible to ever get more rows, it seems to me to be a waste of time
and a loss of performance if the backend does not know this and prepares
the result as if everything will (eventually) be returned.

I am not suggesting that the driver parse queries and add a LIMIT clause
itself. This would make the driver exceedingly complex, it would
duplicate logic in the driver that is already in postgres itself and
cause an extra maintenance nightmare. Furthermore it would probably
introduce many new bugs, cause lots of work, and all for functionality
(i.e., limit the resultset to n rows) that *already exists* in postgres
itself.

So I guess it is indeed a feature request then; that the backend
protocol supports limiting the resultset without having to alter the
query, and that this limit is indeed a hard limit [i will never ask for
more rows] (instead of a soft limit [i might ask for the other rows]).
Considering how all the *functionality* at least is already implemented,
this should not be too much work, I imagine.

The reason I would like to see this feature (instead of adding the LIMIT
manually) is for cross database compatibility (which is the essence of
JDBC). Basically, setMaxRows is portable, LIMIT is not. Since I am part
of a team developing a cross-database application in which performance
is often important, this feature is quite important to us. Currently
postgres is slow for us on simple index queries on large data sets (1.8
seconds for the first 100 primary keys only of a table of 43000 rows);
and unfortunately, these kinds of queries are very common in our
application.

Regards,
Sebastiaan

Kris Jurka wrote:
>
>
> On Wed, 21 Jun 2006, Sebastiaan van Erk wrote:
>
>> I'm not 100% sure of what JDBC says about setMaxRows (it's kind of
>> ambiguous in the java doc, as usual), but as far as I can tell, if
>> you call setMaxRows on the prepared statement there is no way in to
>> ever retrieve more than that number of rows. If this is indeed the
>> case, it seems to me that currently there is a mismatch between the
>> JDBC api and the postgresql api, and JDBC should somehow tell
>> postgres that this is a hard limit and it should not plan for a
>> second batch.
>>
>> Therefore, my question is: is this a bug? It is not feasable for me
>> to add LIMIT clauses to all the SQL queries in my code, so if this IS
>> a bug, I hope it can be fixed. If it is NOT a bug, is there an
>> alternative workaround that does not involve changing all of my sql
>> statements?
>>

>
> I'm not sure how you would like the driver to tell the server that it
> doesn't want more than setMaxRows rows. The defined API for this is
> using LIMIT in your sql query. The driver cannot do this for you (at
> least without parsing your query) because the query may already have a
> LIMIT or it may be something like an INSERT into a VIEW that has a DO
> INSTEAD SELECT rule on it. If you're suggesting that we extended the
> frontend/backend protocol to include this extra information than
> that's definitely a feature request, not a bug report.
>
> Kris Jurka


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-16-2008, 12:09 AM
Mark Lewis
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

> The reason I would like to see this feature (instead of adding the LIMIT
> manually) is for cross database compatibility (which is the essence of
> JDBC). Basically, setMaxRows is portable, LIMIT is not. Since I am part
> of a team developing a cross-database application in which performance
> is often important, this feature is quite important to us. Currently
> postgres is slow for us on simple index queries on large data sets (1.8
> seconds for the first 100 primary keys only of a table of 43000 rows);
> and unfortunately, these kinds of queries are very common in our
> application.


JDBC is a little too low-level to give true database independence; you
can write portable queries, but you're severely restricted when it comes
to functionality supported by most databases but not in a standardized
way, such as limits, locking, performance hinting, sequences/serials,
etc.

For simple, non-performance critical apps you can mostly get away with
it (as we did for a while with some of our products). But for anything
more sophisticated, your application really needs a way to deal with
database-specific SQL.

On newer projects we use Hibernate HQL, which has been a major boon in
terms of database portability and performance.

-- Mark Lewis

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-16-2008, 12:09 AM
David Wall
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

What about the setFetchSize method? My impression is that using
setFetchSize along with setMaxRows would do the trick on the back end as
it resulted in the cursor mechanism that didn't retrieve the complete
result set. Is that not the case in PG 8.1 (at least with the few
caveats listed at http://jdbc.postgresql.org/documenta...1/query.html)?

David


> It's not a bug. setMaxRows() is essentially a hint, there's certainly
> no requirement that the driver will go off and add LIMIT clauses to
> your query, the minimal implementation won't change query execution at
> all and will just limit rows coming back out of the ResultSet.. It
> might be nice to add LIMIT but that would require the driver to parse
> query strings which gets very complicated and isn't going to catch all
> the cases anyway. You'll be getting at least some improvement with the
> existing driver because the whole resultset isn't being transferred
> and processed, even if the plan is still assuming you will grab all
> the data.
>
> If your queries need a LIMIT clause to get decent performance then
> your safest bet is to add a LIMIT clause yourself. You can keep the
> setMaxRows() as well if you like..




---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-16-2008, 12:09 AM
Sebastiaan van Erk
 
Posts: n/a
Default Re: Limit vs setMaxRows issue

Hi,

We actually do use a part of hibernate (the dialects) with some custom
additions, especially for creating tables, indexes, etc. However, since
the user of the application creates the data model (and changes it
runtime), and since the user is actually working in a tables and
columns paradigm, we cannot really use the ORM or HQL parts of hibernate.

Anyway, apart from the joys (i.e. difficulties) in writing portable SQL
(oh what fun ;-)), I still think it would be a nice improvement to the
JDBC driver is setMaxRows did actually hard limit the number of returned
rows, considering it hardly make sense in this case for postgres to
prepare to return more rows than specified.

I'm willing to help or even code it if someone points me in the right
direction. The JDBC part should be easy; I'm not sure about the postgres
side, but I imagine it's not too difficult either. The hard part is
probably getting the protocol extended and getting people to agree that
it's a good idea to change the protocol, etc... ;-) (although it's a
pretty non-intrusive, backwards compatible change).

Regards,
Sebastiaan

Mark Lewis wrote:
>> The reason I would like to see this feature (instead of adding the LIMIT
>> manually) is for cross database compatibility (which is the essence of
>> JDBC). Basically, setMaxRows is portable, LIMIT is not. Since I am part
>> of a team developing a cross-database application in which performance
>> is often important, this feature is quite important to us. Currently
>> postgres is slow for us on simple index queries on large data sets (1.8
>> seconds for the first 100 primary keys only of a table of 43000 rows);
>> and unfortunately, these kinds of queries are very common in our
>> application.
>>

>
> JDBC is a little too low-level to give true database independence; you
> can write portable queries, but you're severely restricted when it comes
> to functionality supported by most databases but not in a standardized
> way, such as limits, locking, performance hinting, sequences/serials,
> etc.
>
> For simple, non-performance critical apps you can mostly get away with
> it (as we did for a while with some of our products). But for anything
> more sophisticated, your application really needs a way to deal with
> database-specific SQL.
>
> On newer projects we use Hibernate HQL, which has been a major boon in
> terms of database portability and performance.
>
> -- Mark Lewis
>
>
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:13 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com