This is a discussion on statement caching proof of concept within the pgsql Interfaces jdbc forums, part of the PostgreSQL category; --> This is just proof of concept. More work has to be done to make it build properly and work ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| This is just proof of concept. More work has to be done to make it build properly and work properly under different jdk's Couple of questions. 1) What to do if there are multiple concurrent requests per connection for the same statement? 1) we could just allow it 2) we could return a non-cacheable preparedstatement 3) throw an exception 2) Is it enough to cache prepared statements or should we cache statements too? Note: this work is based completely on apache's dbcp statement caching implementation and this will be noted in the final code. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Mon, 19 Jun 2006, Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it build > properly and work properly under different jdk's That's neat, but isn't statement caching something that belongs to the application server, not the JDBC driver? Also, it seems to me that the changes to AbstractJdbc2Connection and AbstractJdbc2Statement are completely unrelated to statement caching. > Note: this work is based completely on apache's dbcp statement caching > implementation and this will be noted in the final code. What kind of licensing issues does that bring us? - Heikki ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On 19-Jun-06, at 2:07 PM, Heikki Linnakangas wrote: > On Mon, 19 Jun 2006, Dave Cramer wrote: > >> This is just proof of concept. More work has to be done to make it >> build properly and work properly under different jdk's > > That's neat, but isn't statement caching something that belongs to > the application server, not the JDBC driver? There's an interesting question. However the way it has been done it can be very easily removed, or added. > > Also, it seems to me that the changes to AbstractJdbc2Connection > and AbstractJdbc2Statement are completely unrelated to statement > caching. > >> Note: this work is based completely on apache's dbcp statement >> caching implementation and this will be noted in the final code. > > What kind of licensing issues does that bring us? None, their license is more or less bsd. > > - Heikki > ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it > build properly and work properly under different jdk's Interesting. How is it intended to be used? Automatically for every prepared statement? How does this compare to other JDBC implementations? And, why would this be a part of the Postgres driver? It seems more like an add-on to JDBC drivers in general. I do use a pool for prepared statements myself, actually quite like the one you wrote. But it is part of my general database related code library. Don't get me wrong, if there is a clean way to boost the performance of the driver without allocating too many resources, i'd like to see it in the driver. But if it is complicated to use, or tries do to "clever" things behind my back, i'm not so sure. Maybe activate it via a parameter like preparedThreshold? > Couple of questions. > > 1) What to do if there are multiple concurrent requests per > connection for the same statement? > 1) we could just allow it > 2) we could return a non-cacheable preparedstatement > 3) throw an exception Should do whatever a normal prepared statement would do in this situation, imho. That is, if i have code that uses normal prepared statements, and is updated to use these cached statements, it should not fail in any new ways. Or either the different behaviour it must be very well documented. > 2) Is it enough to cache prepared statements or should we cache > statements too? Actually, if i don't use a prepared statement, i want just this: do it once, then forget about it. Though it could be nice to boost stupid code that doesn't know about prepared statements; which is not really a problem of the driver. > Note: this work is based completely on apache's dbcp statement > caching implementation and this will be noted in the final code. What does that mean? Till ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On 19-Jun-06, at 2:57 PM, till toenges wrote: > Dave Cramer wrote: >> This is just proof of concept. More work has to be done to make it >> build properly and work properly under different jdk's > > Interesting. How is it intended to be used? Automatically for every > prepared statement? How does this compare to other JDBC > implementations? Essentially yes, however this is all good feedback. The way it's implemented we could use it or not on a per connection basis. > > And, why would this be a part of the Postgres driver? It seems more > like > an add-on to JDBC drivers in general. I do use a pool for prepared > statements myself, actually quite like the one you wrote. But it is > part > of my general database related code library. Other drivers use statement caching namely Oracle. > > Don't get me wrong, if there is a clean way to boost the > performance of > the driver without allocating too many resources, i'd like to see > it in > the driver. But if it is complicated to use, or tries do to "clever" > things behind my back, i'm not so sure. Maybe activate it via a > parameter like preparedThreshold? All good feedback > >> Couple of questions. >> >> 1) What to do if there are multiple concurrent requests per >> connection for the same statement? >> 1) we could just allow it >> 2) we could return a non-cacheable preparedstatement >> 3) throw an exception > > Should do whatever a normal prepared statement would do in this > situation, imho. That is, if i have code that uses normal prepared > statements, and is updated to use these cached statements, it > should not > fail in any new ways. Or either the different behaviour it must be > very > well documented. I'd agree the driver should not fail. > >> 2) Is it enough to cache prepared statements or should we cache >> statements too? > > Actually, if i don't use a prepared statement, i want just this: do it > once, then forget about it. Though it could be nice to boost stupid > code > that doesn't know about prepared statements; which is not really a > problem of the driver. > >> Note: this work is based completely on apache's dbcp statement >> caching implementation and this will be noted in the final code. > > What does that mean? Nothing much, I am just giving credit where credit is due. I borrowed most of the concepts from them. Their license like ours allows this. Dave > > > Till > ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Hmmm. You know, an interesting alternative way to get the same kind of performance boost: cache just the handles to the server-side prepared statements, but not the Java PreparedStatement instances. I'm just suggesting the idea because it seems like it would be simpler to implement (just a pool and some ref-counting), and you wouldn't need to worry about any special concurrency issues, because you'd be dealing with distinct PreparedStatement object that just happen to refer to the same server-side statement. It seems like it would be less error-prone to only cache the expensive part, and leave the rest of the stateful stuff in PreparedStatement alone. But on the other hand, I'm not in a position to offer an alternative implementation, and working code trumps vaporware any day -- Mark Lewis On Mon, 2006-06-19 at 12:55 -0400, Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it > build properly and work properly under different jdk's > > Couple of questions. > > 1) What to do if there are multiple concurrent requests per > connection for the same statement? > 1) we could just allow it > 2) we could return a non-cacheable preparedstatement > 3) throw an exception > > 2) Is it enough to cache prepared statements or should we cache > statements too? > > Note: this work is based completely on apache's dbcp statement > caching implementation and this will be noted in the final code. > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Dave Cramer wrote: > Other drivers use statement caching namely Oracle. Are there any special things you have to do or care about to use cached statements there? My experience with the oracle driver is limited. >>> 1) What to do if there are multiple concurrent requests per >>> connection for the same statement? >>> 1) we could just allow it >>> 2) we could return a non-cacheable preparedstatement >>> 3) throw an exception >> Should do whatever a normal prepared statement would do in this >> situation, imho. That is, if i have code that uses normal prepared >> statements, and is updated to use these cached statements, it >> should not >> fail in any new ways. Or either the different behaviour it must be >> very >> well documented. > > I'd agree the driver should not fail. The current situation allows to use a prepared statement from several threads at once, or more than once before reading the result in a single thread. There is no way to get all the results back reliably in this case. Since the JDBC specifications don't demand anything more (afaik), solution 1 seems ok. A slightly more advanced solution might be to create a map of lists of prepared statements, instead of a map of prepared statements. If i have a prepared statement "SELECT 1" and this is in high demand by hundreds of threads (silly example), it would be possible to have more than one prepared statement with the same sql. That would be more like solution 2, with better performance in concurrent applications. I like Mark Lewis idea of caching the handles, but i have no real knowledge about the postgres api, the overhead of creating a new prepared statement object (could easily offset the saved memory in a "heavy duty" application) and how well this would work; or how this would influence the specific problem of multiple concurrent requests. Certainly sounds a bit more difficult to implement. How does it combine with methods like get*MetaData()? And what happens with methods like setQueryTimeout()? They change the behaviour of the cached statement, and don't get reset between use. Especially interesting in the case of automatic caching of prepared statements, where the user expects a new, "clean" statement. How does the Oracle driver handle this, for example? Ok, just looked at the source again. The method clearWarnings() is called before returning the statement. There could be an extended clearCachedStatement() method, or something like this to reset everything neccessary. Till ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| till toenges wrote: > The current situation allows to use a prepared statement from several > threads at once, or more than once before reading the result in a single > thread. There is no way to get all the results back reliably in this > case. Since the JDBC specifications don't demand anything more (afaik), > solution 1 seems ok. Oversight on my part: If the caching is automatic, an application that originally used several prepared statements with the same sql and now gets only one from the cache would very probably misbehave. Therefore "just allowing it" is not an option. In fact, i have no precise idea how a statement would know that it is ready for reuse. An application could keep a reference to the statement and run it again and again. WeakReferences and ReferenceQueues could be used. Always interesting to play with the garbage collector ;-) Maybe the postgres api has a way to identify individual calls and then Mark Lewis idea of just caching the handle and creating individual prepared statement objects upon use is the right way to go. Till ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Dave Cramer wrote: > This is just proof of concept. More work has to be done to make it build > properly and work properly under different jdk's Isn't there a bunch of statement state (things like fetch size, max rows, etc) that have defined defaults and this cache implementation will not provide? The "wrapper" implementation approach suffers from the usual difficulty that the "back links" such as ResultSet.getStatement() point to the wrong object. It's actually quite a bit of work to get this right.. The cached statements are vulnerable to buggy apps that mutate the statement after close, since there's no interception of those methods to check whether the wrapper statement has been closed. What exactly is the performance bottleneck you're trying to avoid by having the statement pool? If it's the parse/plan cost, I think Mark's suggestion of putting the cache at the protocol level may be simpler. If it's overall statement cost, you might be better off with a generic wrapper that is not postgresql-specific at all. -O ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| ||||
| On 19-Jun-06, at 7:15 PM, Oliver Jowett wrote: > Dave Cramer wrote: >> This is just proof of concept. More work has to be done to make it >> build properly and work properly under different jdk's > > Isn't there a bunch of statement state (things like fetch size, max > rows, etc) that have defined defaults and this cache implementation > will not provide? Good point, these would all have to be reset to their defaults as well. > > The "wrapper" implementation approach suffers from the usual > difficulty that the "back links" such as ResultSet.getStatement() > point to the wrong object. It's actually quite a bit of work to get > this right.. Since result sets only live as long as the statement, wouldn't they point to the statement that is still open ? > > The cached statements are vulnerable to buggy apps that mutate the > statement after close, since there's no interception of those > methods to check whether the wrapper statement has been closed. No question, and I would certainly not make this the default behaviour. The user would have to turn on caching. > > What exactly is the performance bottleneck you're trying to avoid > by having the statement pool? If it's the parse/plan cost, I think > Mark's suggestion of putting the cache at the protocol level may be > simpler. If it's overall statement cost, you might be better off > with a generic wrapper that is not postgresql-specific at all. How does the generic wrapper solve the problems above ? I would think they all suffer from the same problems ? > > -O > ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| Thread Tools | |
| Display Modes | |
|
|