vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi Tom, Tom Lane schreef: > On the surface this looks like a reasonable plan choice. If you like > you can try the other two basic types of join plan by turning off > enable_hashjoin, which will likely drive the planner to use a merge > join, and then also turn off enable_mergejoin to get a nested loop > (or if it thinks nested loop is second best, turn off enable_nestloop > to see the behavior with a merge join). The problem is that the query logically requests all records ( as in "select * from a join") from the database but actually displays (in practise) in 97% of the time the first 1000 records and at most the first 50.000 records 99.99999999999999% of the time by scrolling (using "page down) in the gui and an occasional "jump to record xxxx" through something called a locator) (both percentages tested!). If I do the same query with a "limit 60.000" or if I do a "set enable_seqscan = off" the query returns in 0.3 secs. Otherwise it lasts for 20 secs (which is too much for the user to wait for, given the circumstances). I cannot change the query (it is geneated by a tool called Clarion) but it something like (from the psqlodbc_xxx.log): "... declare SQL_CUR01 cursor for SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer; fetch 100 in SQL_CUR01; ...." PostgreSQL does the planning (and than executes accordingly) to the query and not the "fetch 100". Changing the query with a "limit whatever" prohibits scrolling after the size of the resultset. If Postgres should delay the planning of the actual query untill the fetch it could choose the quick solution. Another solution would be to "advise" PostgreSQL which index etc (whatever etc means ;-)) to use ( as in the mailing from Silke Trissl in the performance list on 09-02-05). > What's important in comparing different plan alternatives is the ratios > of estimated costs to actual elapsed times. If the planner is doing its > job well, those ratios should be similar across all the alternatives > (which implies of course that the cheapest-estimate plan is also the > cheapest in reality). If not, it may be appropriate to fool with the > planner's cost estimate parameters to try to line up estimates and > reality a bit better. I I really do a "select *" and display the result, the planner is right (tested with "set enable_seqscan = off" and "set enable_seqscan = on). Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: J.Kraaijeveld@Askesis.nl web: www.askesis.nl ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| Tom Lane wrote: > "Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes: > > I cannot change the query (it is geneated by a tool called Clarion) but it something like (from the psqlodbc_xxx.log): > > "... > > declare SQL_CUR01 cursor for > > SELECT A.ordernummer, B.klantnummer FROM "orders" A LEFT OUTER JOIN "klt_alg" B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer; > > fetch 100 in SQL_CUR01; > > ..." > > Well, the planner does put some emphasis on startup time when dealing > with a DECLARE CURSOR plan; the problem you face is just that that > correction isn't large enough. (From memory, I think it optimizes on > the assumption that 10% of the estimated rows will actually be fetched; > you evidently want a setting of 1% or even less.) Ouch. Is this really a reasonable assumption? I figured the primary use of a cursor was to fetch small amounts of data at a time from a large table, so 10% seems extremely high as an average fetch size. Or is the optimization based on the number of rows that will be fetched by the cursor during the cursor's lifetime (as opposed to in a single fetch)? Also, one has to ask what the consequences are of assuming a value too low versus too high. Which ends up being worse? > We once talked about setting up a GUC variable to control the percentage > of a cursor that is estimated to be fetched: > http://archives.postgresql.org/pgsql...0/msg01108.php > It never got done but that seems like the most reasonable solution to > me. Or keep some statistics on cursor usage, and adjust the value dynamically based on actual cursor queries (this might be easier said than done, of course). -- Kevin Brown kevin@sysexperts.com ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| |||
| Greg Stark wrote: > Kevin Brown <kevin@sysexperts.com> writes: > > >>Ouch. Is this really a reasonable assumption? I figured the primary >>use of a cursor was to fetch small amounts of data at a time from a >>large table, so 10% seems extremely high as an average fetch size. Or >>is the optimization based on the number of rows that will be fetched >>by the cursor during the cursor's lifetime (as opposed to in a single >>fetch)? >> >>Also, one has to ask what the consequences are of assuming a value too >>low versus too high. Which ends up being worse? > > > This is one of the things the planner really cannot know. Ultimately it's the > kind of thing for which hints really are necessary. Oracle distinguishes > between the "minimize total time" versus "minimize startup time" with > /*+ ALL_ROWS */ and /*+ FIRST_ROWS */ hints, for example. > > I would also find it reasonable to have hints to specify a selectivity for > expressions the optimizer has no hope of possibly being able to estimate. > Things like "WHERE myfunction(col1,col2,?) /*+ 10% */" > > Not to mention that hints would be helpful if you want to specify a particular index for a specific query (case in point, testing plans and response of various indices without having to drop and create other ones). This is a bit of functionality that I'd like to see. ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Kevin Brown <kevin@sysexperts.com> writes: > Ouch. Is this really a reasonable assumption? I figured the primary > use of a cursor was to fetch small amounts of data at a time from a > large table, so 10% seems extremely high as an average fetch size. Or > is the optimization based on the number of rows that will be fetched > by the cursor during the cursor's lifetime (as opposed to in a single > fetch)? > > Also, one has to ask what the consequences are of assuming a value too > low versus too high. Which ends up being worse? This is one of the things the planner really cannot know. Ultimately it's the kind of thing for which hints really are necessary. Oracle distinguishes between the "minimize total time" versus "minimize startup time" with /*+ ALL_ROWS */ and /*+ FIRST_ROWS */ hints, for example. I would also find it reasonable to have hints to specify a selectivity for expressions the optimizer has no hope of possibly being able to estimate. Things like "WHERE myfunction(col1,col2,?) /*+ 10% */" -- greg ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| ||||
| Greg Stark wrote: > > Kevin Brown <kevin@sysexperts.com> writes: > > Also, one has to ask what the consequences are of assuming a value too > > low versus too high. Which ends up being worse? > > This is one of the things the planner really cannot know. Ultimately it's the > kind of thing for which hints really are necessary. Oracle distinguishes > between the "minimize total time" versus "minimize startup time" with > /*+ ALL_ROWS */ and /*+ FIRST_ROWS */ hints, for example. Well, the planner *can* know the actual value to use in this case, or at least a close approximation, but the system would have to gather some information about cursors during fetches. At the very least, it will know how many rows were actually fetched by the cursor in question, and it will also hopefully know how many rows were returned by the query being executed. Store the ratio of the two in a list, then store the list itself into a table (or something) at backend exit time. -- Kevin Brown kevin@sysexperts.com ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |