vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > The problem I try to solve is something along: a bunch of clients try to update a count, and ONE of them must initialize > the count if it does not exist... this can't be done with current infrastructure without race conditions. The solution without merge but a unique key in other db's is: update if no rows updated insert if duplicate key update if no rows updated goto insert note, that the counter updates need to be of the form set x = x + ? where key=y do you see a potential race condition with this ? In pg you also need a savepoint before the insert for this to work. Depending on the ratio of insert vs update we also start with insert when the insert succeeds more that 50% (I would use a higher percentage with pg though): insert if duplicate key update if no rows updated goto insert Andreas ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Yes, these algorithms are clear to me, but they don't work for batch updates in postgres without savepoints before each row insert/update, which is not good for performance (not to mention on older postgres versions without savepoint support it won't work at all). If there is a way of no race condition, no performance penalty, that would be something new and useful. I just guess the MERGE would provide that. Cheers, Csaba. On Thu, 2005-11-17 at 12:34, Zeugswetter Andreas DCP SD wrote: > > The problem I try to solve is something along: a bunch of clients try > to update a count, and ONE of them must initialize > the count if it > does not exist... this can't be done with current infrastructure without > race conditions. > > The solution without merge but a unique key in other db's is: > > update > if no rows updated > insert > if duplicate key > update > if no rows updated goto insert > > note, that the counter updates need to be of the form set x = x + ? > where key=y > do you see a potential race condition with this ? > In pg you also need a savepoint before the insert for this to work. > > Depending on the ratio of insert vs update we also start with insert > when > the insert succeeds more that 50% (I would use a higher percentage with > pg though): > > insert > if duplicate key > update > if no rows updated goto insert > > Andreas ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote: > Yes, these algorithms are clear to me, but they don't work for batch > updates in postgres without savepoints before each row insert/update, > which is not good for performance (not to mention on older postgres > versions without savepoint support it won't work at all). If there is a > way of no race condition, no performance penalty, that would be > something new and useful. I just guess the MERGE would provide that. Well, then you guess wrong. This isn't what MERGE is for. MERGE is just a neat way of specifying the UPDATE and INSERT cases in the same statement. It doesn't remove the possibility duplicate inserts and thus primary key violations. If someone wants to make extensions to MERGE so that it can avoid the race condition and avoid the duplicate key violations, that's fine. But be aware that this is outside of the spec. It may be a useful addition, but perhaps we should consider MERGE and REPLACE as completely seperate targets. MERGE has a whole join construction with subqueries that would be a pain to make work in a way that is truly serialisable. REPLACE deals with only one row and tries to solve the race for that case only. Much easier to consider them seperately, no? I guess what's really irritating is that this clearly exposes the case listed in the docs as "Why SERIALIZABLE isn't in all cases". If we could solve that for MERGE, we could probably solve it in the general case too. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQFDfHhxIB7bNG8LQkwRAhIwAJwPmzE2GHrqzPujkkj2I5 r6OlVo5QCeN4st Ka50Vh0AnXuj4pBt27V6j7I= =7rb7 -----END PGP SIGNATURE----- |
| |||
| OK, in this case I don't care about either MERGE or REPLACE, but for an UPSERT which does the locking :-) Cheers, Csaba. On Thu, 2005-11-17 at 13:32, Martijn van Oosterhout wrote: > On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote: > > Yes, these algorithms are clear to me, but they don't work for batch > > updates in postgres without savepoints before each row insert/update, > > which is not good for performance (not to mention on older postgres > > versions without savepoint support it won't work at all). If there is a > > way of no race condition, no performance penalty, that would be > > something new and useful. I just guess the MERGE would provide that. > > Well, then you guess wrong. This isn't what MERGE is for. MERGE is just > a neat way of specifying the UPDATE and INSERT cases in the same > statement. It doesn't remove the possibility duplicate inserts and thus > primary key violations. > > If someone wants to make extensions to MERGE so that it can avoid the > race condition and avoid the duplicate key violations, that's fine. But > be aware that this is outside of the spec. It may be a useful addition, > but perhaps we should consider MERGE and REPLACE as completely seperate > targets. > > MERGE has a whole join construction with subqueries that would be a > pain to make work in a way that is truly serialisable. REPLACE deals > with only one row and tries to solve the race for that case only. Much > easier to consider them seperately, no? > > I guess what's really irritating is that this clearly exposes the case > listed in the docs as "Why SERIALIZABLE isn't in all cases". If we > could solve that for MERGE, we could probably solve it in the general > case too. > > Have a nice day, ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Csaba Nagy <nagy@ecircle-ag.com> writes: > OK, in this case I don't care about either MERGE or REPLACE, but for an > UPSERT which does the locking :-) This is exactly the point --- pretty much nobody has come to us and asked for a feature that does what Peter and Martijn say MERGE does. (I haven't bothered to look at the 2003 spec, I'm assuming they read it correctly.) What we *have* been asked for, over and over, is an insert-or-update feature that's not so tedious and inefficient as the savepoint-insert-rollback-update kluge. That's what we ought to be concentrating on providing. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| * Tom Lane (tgl@sss.pgh.pa.us) wrote: > This is exactly the point --- pretty much nobody has come to us and > asked for a feature that does what Peter and Martijn say MERGE does. > (I haven't bothered to look at the 2003 spec, I'm assuming they read it > correctly.) What we *have* been asked for, over and over, is an > insert-or-update feature that's not so tedious and inefficient as the > savepoint-insert-rollback-update kluge. That's what we ought to be > concentrating on providing. I guess to be clear on what this distinction actually is, specifically: MERGE under SQL2003 doesn't appear to be intended to be used concurrently. For data warehousing situations this can be just fine such as in my case where I get a monthly update of some information and need to merge that update in with the prior information. In this case there's only one MERGE running and I'd hope it'd be faster than doing check for existance, insert/update on each row in plpgsql or something (since there'd be multiple index lookups, etc, I think). Concurrent MERGEs running *can* fail, just like whole transactions which do the check/insert/update can fail. REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a transaction which is supposed to not fail but instead do locking to ensure that it doesn't fail. This requires predicate locking to be efficient because you want to tell the concurrent transaction "if you have the same key as me, just wait a second and you can do an update 'cause I'm going to create the key if it doesn't exist before I'm done". I think REPLACE/INSERT ON DUPLICATE UPDATE is definitely harder to do than MERGE because of the idea that it isn't supposed to fail generally. I think SQL2003 MERGE would be reasonably easy to do and to get the efficiency benefits out of it (assuming there are some to be had in the end). I don't think MERGE can really be made to be both though, in which case it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON DUPLICATE UPDATE something else. Perhaps a special form of MERGE where you know it's going to be doing that locking. I really don't like the idea of making the SQL2003 version of MERGE be the MERGE special case (by requiring someone to take a table lock ahead of time or do something else odd). Thanks, Stephen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) iD8DBQFDfJ6SrzgMPqB3kigRAjXWAJ9R/50PoocURxvi74g7dwhIO4akgQCcDEDG 4hGZAVR/9Age8pFtEOp4kfo= =F91e -----END PGP SIGNATURE----- |
| |||
| On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: > REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a > transaction which is supposed to not fail but instead do locking to > ensure that it doesn't fail. This requires predicate locking to be > efficient because you want to tell the concurrent transaction "if you > have the same key as me, just wait a second and you can do an update > 'cause I'm going to create the key if it doesn't exist before I'm done". Is the requirement for predicate locking, over and above a unique constraint on an index that involves the record key, to deal with the scenario of two inserts executing at the same time, both before commit? Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ .. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Unless you have a table lock, INSERT has to be before UPDATE, think UPDATE, UPDATE (both fail), INSERT, INSERT. --------------------------------------------------------------------------- Zeugswetter Andreas DCP SD wrote: > > The problem I try to solve is something along: a bunch of clients try > to update a count, and ONE of them must initialize > the count if it > does not exist... this can't be done with current infrastructure without > race conditions. > > The solution without merge but a unique key in other db's is: > > update > if no rows updated > insert > if duplicate key > update > if no rows updated goto insert > > note, that the counter updates need to be of the form set x = x + ? > where key=y > do you see a potential race condition with this ? > In pg you also need a savepoint before the insert for this to work. > > Depending on the ratio of insert vs update we also start with insert > when > the insert succeeds more that 50% (I would use a higher percentage with > pg though): > > insert > if duplicate key > update > if no rows updated goto insert > > Andreas > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| mark@mark.mielke.cc wrote: > On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: > > REPLACE/INSERT ON DUPLICATE UPDATE appears to essentially be a > > transaction which is supposed to not fail but instead do locking to > > ensure that it doesn't fail. This requires predicate locking to be > > efficient because you want to tell the concurrent transaction "if you > > have the same key as me, just wait a second and you can do an update > > 'cause I'm going to create the key if it doesn't exist before I'm done". > > Is the requirement for predicate locking, over and above a unique > constraint on an index that involves the record key, to deal with > the scenario of two inserts executing at the same time, both before > commit? No. If you have a primary key you can easily prevent duplicates. You need a table lock or predicate locking to prevent duplicates if you do not have a primary key. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| ||||
| Tom Lane wrote: > Csaba Nagy <nagy@ecircle-ag.com> writes: > > OK, in this case I don't care about either MERGE or REPLACE, but for an > > UPSERT which does the locking :-) > > This is exactly the point --- pretty much nobody has come to us and > asked for a feature that does what Peter and Martijn say MERGE does. > (I haven't bothered to look at the 2003 spec, I'm assuming they read it > correctly.) What we *have* been asked for, over and over, is an > insert-or-update feature that's not so tedious and inefficient as the > savepoint-insert-rollback-update kluge. That's what we ought to be > concentrating on providing. I am confused over the various options. I have heard these syntaxes: SQL2003 MERGE MySQL REPLACE http://dev.mysql.com/doc/refman/5.1/en/replace.html MySQL INSERT VIOLATION ... UPSERT So it seems MERGE does not have the use-case we most need, though it can be bent to do it. (Given their MATCH syntax, it doesn't seem there is any logic that it tries INSERT first). Looking at the MySQL URL above, REPLACE has three possible syntaxes with normal (DELETE), SET (UPDATE), and SELECT. Is this the direction we need to go? I don't like INSERT ... VIOLATION because I would like a new keyword for this. Is UPSERT the same as REPLACE? Should we use UPSERT instead? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |