This is a discussion on Synchronized scans within the Pgsql Patches forums, part of the PostgreSQL category; --> Heikki Linnakangas <heikki@enterprisedb.com> writes: > I don't think anyone can reasonably expect to get the same ordering when > ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > I don't think anyone can reasonably expect to get the same ordering when > the same query issued twice in general, but within the same transaction > it wouldn't be that unreasonable. If we care about that, we could keep > track of starting locations per transaction, only do the synchronization > on the first scan in a transaction, and start subsequent scans from the > same page as the first one. I think the real problem here is that the first scan is leaving state behind that changes the behavior of the next scan. Which can have no positive benefit, since obviously the first scan is not still proceeding; the best you can hope for is that it's a no-op and the worst case is that it actively pessimizes things. Why doesn't the patch remove the shmem entry at scan termination? > I think the warning on LIMIT without ORDER BY is a good idea, regardless > of the synchronized scans patch. I seriously doubt that can be done in any way that doesn't both warn about perfectly-safe cases and fail to warn about other unsafe ones. Furthermore, it's not uncommon for people to do "SELECT * ... LIMIT 1" just to remind themselves of column names or whatever. Do we really want the thing to be so nannyish? I was envisioning simply a stronger warning in the SELECT reference page ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> I don't think anyone can reasonably expect to get the same ordering when >> the same query issued twice in general, but within the same transaction >> it wouldn't be that unreasonable. If we care about that, we could keep >> track of starting locations per transaction, only do the synchronization >> on the first scan in a transaction, and start subsequent scans from the >> same page as the first one. > > I think the real problem here is that the first scan is leaving state > behind that changes the behavior of the next scan. Which can have no > positive benefit, since obviously the first scan is not still > proceeding; the best you can hope for is that it's a no-op and the worst > case is that it actively pessimizes things. Why doesn't the patch > remove the shmem entry at scan termination? Because there's no reason why it should, and it would require a lot more bookkeeping. There can be many scanners on the same table, so we'd need to implement some kind of reference counting, which means having to reliably decrement the counter when a scan terminates. In any case if there actually is a concurrent scan, you'd still see different ordering. Removing the entry when a scan is over would just make it harder to trigger. >> I think the warning on LIMIT without ORDER BY is a good idea, regardless >> of the synchronized scans patch. > > I seriously doubt that can be done in any way that doesn't both warn > about perfectly-safe cases and fail to warn about other unsafe ones. > Furthermore, it's not uncommon for people to do "SELECT * ... LIMIT 1" > just to remind themselves of column names or whatever. Do we really > want the thing to be so nannyish? It really depends on how many false negatives and positives it gives. If too many, it's just annoying, but if it's reasonably accurate I think it would be OK to remind people running queries like that. > I was envisioning simply a stronger warning in the SELECT reference page ... I doubt the people that would be bitten by this read the SELECT reference page. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Jun 4, 2007, at 15:24 , Heikki Linnakangas wrote: > I don't think anyone can reasonably expect to get the same ordering > when the same query issued twice in general, but within the same > transaction it wouldn't be that unreasonable. The order rows are returned without an ORDER BY clause *is* implementation dependent, and is not guaranteed, at least by the spec. Granted, LIMIT without ORDER BY (and DISTINCT for that matter) brings this into sharp relief. > I think the warning on LIMIT without ORDER BY is a good idea, > regardless of the synchronized scans patch. I'm not saying this isn't a good idea, but are there other places where there might be gotchas for the unwary, such as DISTINCT without ORDER BY or (for an unrelated example) UNION versus UNION ALL? How many of these types of messages would be useful? Michael Glaesemann grzm seespotcode net ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| On Mon, 2007-06-04 at 16:42 -0400, Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: > > I don't think anyone can reasonably expect to get the same ordering when > > the same query issued twice in general, but within the same transaction > > it wouldn't be that unreasonable. If we care about that, we could keep > > track of starting locations per transaction, only do the synchronization > > on the first scan in a transaction, and start subsequent scans from the > > same page as the first one. > > I think the real problem here is that the first scan is leaving state > behind that changes the behavior of the next scan. Which can have no > positive benefit, since obviously the first scan is not still > proceeding; the best you can hope for is that it's a no-op and the worst > case is that it actively pessimizes things. Why doesn't the patch > remove the shmem entry at scan termination? > Sounds like a reasonable idea to me. We could add the PID to the data structure so that it would only remove the hint if it's the one that set the hint. I think we'd just need to call a function to do that from heap_endscan(), correct? However, we couldn't 100% guarantee that the state would be cleared. A backend could be killed in the middle of a scan. The case you're worried about seems very narrow to me, and I think "actively pessimizes" is too strong. Unless I misunderstand, "the best you can hope for" no-op happens in all cases except a most bizarre one: that in which you're executing repeated identical LIMIT queries with no ORDER BY; and the tuples returned occupy more than 128K (16 pages is the reporting period) but fewer would be effective to cache; and the table in question is larger than the large table threshold. I'm just trying to add some perspective about what we're fixing, here. But it's fair to say that a scan should clear any state when it's done. Regards, Jeff Davis ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Michael Glaesemann wrote: >> I think the warning on LIMIT without ORDER BY is a good idea, >> regardless of the synchronized scans patch. > > I'm not saying this isn't a good idea, but are there other places where > there might be gotchas for the unwary, such as DISTINCT without ORDER BY > or (for an unrelated example) UNION versus UNION ALL? How many of these > types of messages would be useful? LIMIT without ORDER BY is worse because it not only returns tuples in different order, but it can return different tuples altogether when you run it multiple times. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Mon, 2007-06-04 at 22:09 +0100, Heikki Linnakangas wrote: > > I think the real problem here is that the first scan is leaving state > > behind that changes the behavior of the next scan. Which can have no > > positive benefit, since obviously the first scan is not still > > proceeding; the best you can hope for is that it's a no-op and the worst > > case is that it actively pessimizes things. Why doesn't the patch > > remove the shmem entry at scan termination? > > Because there's no reason why it should, and it would require a lot more > bookkeeping. There can be many scanners on the same table, so we'd need > to implement some kind of reference counting, which means having to > reliably decrement the counter when a scan terminates. > That's what I thought at first, and why I didn't do it. Right now I'm thinking we could just add the PID to the hint, so that it would only remove its own hint. Would that work? It's still vulnerable to a backend being killed and the hint hanging around. However, the next scan would clear get it back to normal. Reference counting would cause weirdness if a backend died or something, because it would never decrement to 0. > In any case if there actually is a concurrent scan, you'd still see > different ordering. Removing the entry when a scan is over would just > make it harder to trigger. > Agreed. I don't know for sure whether that's good or bad, but it would make the nondeterminism less immediately visible. Regards, Jeff Davis ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Jeff Davis wrote: > On Mon, 2007-06-04 at 22:09 +0100, Heikki Linnakangas wrote: >>> I think the real problem here is that the first scan is leaving state >>> behind that changes the behavior of the next scan. Which can have no >>> positive benefit, since obviously the first scan is not still >>> proceeding; the best you can hope for is that it's a no-op and the worst >>> case is that it actively pessimizes things. Why doesn't the patch >>> remove the shmem entry at scan termination? >> Because there's no reason why it should, and it would require a lot more >> bookkeeping. There can be many scanners on the same table, so we'd need >> to implement some kind of reference counting, which means having to >> reliably decrement the counter when a scan terminates. >> > > That's what I thought at first, and why I didn't do it. Right now I'm > thinking we could just add the PID to the hint, so that it would only > remove its own hint. Would that work? Were you thinking of storing the PID of the backend that originally created the hint, or updating the PID every time the hint is updated? In any case, we still wouldn't know if there's other scanners still running. We could just always remove the hint when a scan ends, and rely on the fact that if there's other scans still running they will put the hint back very quickly. There would then be a small window where there's no hint but a scan is active, and a new scan starting during that window would fail to synchronize with the other scanners. > It's still vulnerable to a backend being killed and the hint hanging > around. However, the next scan would clear get it back to normal. Oh, did you mean that the PID would be updated whenever a new scan starts? So that the PID stored would always be the PID of the latest scanner. That might work pretty well, though a small scan with a LIMIT, or any other situation where some scans run faster than others, might clear the hint prematurely while other scans are still running. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Mon, 2007-06-04 at 22:57 +0100, Heikki Linnakangas wrote: > > That's what I thought at first, and why I didn't do it. Right now I'm > > thinking we could just add the PID to the hint, so that it would only > > remove its own hint. Would that work? > > Were you thinking of storing the PID of the backend that originally > created the hint, or updating the PID every time the hint is updated? In > any case, we still wouldn't know if there's other scanners still running. > My thought was that every time the location was reported by a backend, it would store 3 pieces of information, not 2: * relfilenode * the PID of the backend that created or updated this particular hint last * the location Then, on heap_endscan() (if that's the right place), we find the hint, and if the PID matches, we remove it. If not, it does nothing. This would only matter when there weren't other scans. When concurrent scans were happening, chances are the PID wouldn't match anyway, and thus not be removed. Regards, Jeff Davis ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Jun 4, 2007, at 16:34 , Heikki Linnakangas wrote: > LIMIT without ORDER BY is worse because it not only returns tuples > in different order, but it can return different tuples altogether > when you run it multiple times. Wouldn't DISTINCT ON suffer from the same issue without ORDER BY? Michael Glaesemann grzm seespotcode net ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| Jeff Davis <pgsql@j-davis.com> writes: > My thought was that every time the location was reported by a backend, > it would store 3 pieces of information, not 2: > * relfilenode > * the PID of the backend that created or updated this particular hint > last > * the location > Then, on heap_endscan() (if that's the right place), we find the hint, > and if the PID matches, we remove it. If not, it does nothing. > This would only matter when there weren't other scans. When concurrent > scans were happening, chances are the PID wouldn't match anyway, and > thus not be removed. But note that barring backend crash, once all the scans are done it is guaranteed that the hint will be removed --- somebody will be last to update the hint, and therefore will remove it when they do heap_endscan, even if others are not quite done. This is good in the sense that later-starting backends won't be fooled into starting at what is guaranteed to be the most pessimal spot, but it's got a downside too, which is that there will be windows where seqscans are in process but a newly started scan won't see them. Maybe that's a killer objection. When exactly is the hint updated? I gathered from something Heikki said that it's set after processing X amount of data, but I think it might be better to set it *before* processing X amount of data. That is, the hint means "I'm going to be scanning at least <threshold> blocks starting here", not "I have scanned <threshold> blocks ending here", which seems like the interpretation that's being used at the moment. What that would mean is that successive "LIMIT 1000" calls would in fact all start at the same place, barring interference from other backends. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |