This is a discussion on Re: MERGE vs REPLACE within the pgsql Hackers forums, part of the PostgreSQL category; --> On Thu, 17 Nov 2005, Bruce Momjian wrote: > Unless you have a table lock, INSERT has to be ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Thu, 17 Nov 2005, Bruce Momjian wrote: > Unless you have a table lock, INSERT has to be before UPDATE, think > UPDATE, UPDATE (both fail), INSERT, INSERT. No matter what operation you start with you need a loop that try insert/update until one of them succeed like in this example: http://www.postgresql.org/docs/8.1/s...UPSERT-EXAMPLE Without a loop you might not get to execute neither the insert nor the update. Why? Think about this example: BEGIN INSERT <- fail because there is a row already <- before we manage to do the update someone delete the row (which we can see in the default transaction isolation level) UPDATE <- fail because there is no row so we will loop and try the insert again <- before we manage to do the insert someone else does an insert INSERT <- fail because there is a row already <- before we manage to do the update someone delete the row ..... You might need to loop any number of times before you manage to perform one of the two operations. Which operation you should start with depends on which of the two cases is the common one. -- /Dennis Björklund ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Oh, good point. I was thinking just about concurrent MERGEs. However, it is more complicated than that. By definitaion you can not see changes from other transactions while your statement is being run (even if you increment CommandCounter), so to be atomic, you would still see the row even though some other transaction had deleted it. I think we avoid that now because UPDATE, (which is a DELETE then INSERT) chains the tuples together so others see the activity happening. Seems like we are going to have to peek at rows like we do now for INSERT and peek at index rows, if I remember correctly. I can't think of any other place in the code where we loop around other backend's activity like this. This could be tricky. --------------------------------------------------------------------------- Dennis Bjorklund wrote: > On Thu, 17 Nov 2005, Bruce Momjian wrote: > > > Unless you have a table lock, INSERT has to be before UPDATE, think > > UPDATE, UPDATE (both fail), INSERT, INSERT. > > No matter what operation you start with you need a loop that try > insert/update until one of them succeed like in this example: > > http://www.postgresql.org/docs/8.1/s...UPSERT-EXAMPLE > > Without a loop you might not get to execute neither the insert nor the > update. Why? Think about this example: > > BEGIN > > INSERT <- fail because there is a row already > > <- before we manage to do the update someone > delete the row (which we can see in the > default transaction isolation level) > > UPDATE <- fail because there is no row so we will loop > and try the insert again > > <- before we manage to do the insert someone else does > an insert > > INSERT <- fail because there is a row already > > <- before we manage to do the update someone > delete the row > .... > > > You might need to loop any number of times before you manage to perform > one of the two operations. Which operation you should start with depends > on which of the two cases is the common one. > > -- > /Dennis Bj?rklund > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Bruce Momjian <pgman@candle.pha.pa.us> writes: > Oh, good point. I was thinking just about concurrent MERGEs. However, > it is more complicated than that. By definitaion you can not see > changes from other transactions while your statement is being run (even > if you increment CommandCounter), so to be atomic, you would still see > the row even though some other transaction had deleted it. We would have to use the same semantics we use now for read-committed UPDATE, that is look at the latest version of the row even though this would not normally be visible to the transaction's snapshot. In the case of a serializable transaction, no doubt we should fail if any concurrent change actually happens. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Oh, good point. I was thinking just about concurrent MERGEs. However, > > it is more complicated than that. By definitaion you can not see > > changes from other transactions while your statement is being run (even > > if you increment CommandCounter), so to be atomic, you would still see > > the row even though some other transaction had deleted it. > > We would have to use the same semantics we use now for read-committed > UPDATE, that is look at the latest version of the row even though this > would not normally be visible to the transaction's snapshot. > > In the case of a serializable transaction, no doubt we should fail if > any concurrent change actually happens. I have some psaudocode to explain what we want for this feature, whatever syntax we choose: Start Check unique index Found lock row for update if zero rows, return to start if more than one row, fail update row Notfound create savepoint insert row into heap lock index page if conflicting index entry, abort savepoint, return to start add index entry unlock index page While the "notfound" case might look strange, we actually use this exact method for inserts now, see ExecInsert() and _bt_doinsert(). Particularly see this comment in the second function: /* * If we're not allowing duplicates, make sure the key isn't already in * the index. * * NOTE: obviously,_bt_check_unique can only detect keys that are already in * the index; so it cannot defend against concurrent insertions of the * same key. We protect against that by means of holding a write lock on * the target page. Any other would-be inserter of the same key must * acquire a write lock on the same target page, so only one would-be * inserter can be making the check at one time. Furthermore, once we are * past the check we hold write locks continuously until we have performed * our insertion, so no later inserter can fail to see our insertion. * (This requires some care in _bt_insertonpg.) * * If we must wait for another xact, we release the lock while waiting, and * then must start over completely. */ Here is the unique check error from _bt_check_unique(): ereport(ERROR, (errcode(ERRCODE_UNIQUE_VIOLATION), errmsg("duplicate key violates unique constraint \"%s\"", RelationGetRelationName(rel)))); I think the problem here is that it is going to longjump() back to postgres.c (and out of your code loop). While we have savepoints, I think they only work coming from client applications, rather than inside our code. Ideally you would like to be able to say: savepoint(); func(); rollback_to_savepoint(); but you can't, so I think you are going to have to factor out that unique error callback and return a failure code to the caller. I suppose some boolean flag need to be added to _bt_doinsert(), but that is called via a function pointer for the index type, so you are going to have to update the insert function signatures for all access methods. The good news is that only btree supports unique indexes, according to the documentation ("Only B-tree currently supports unique indexes") so for the other access methods the extra parameter is just ignored. Another issue is multiple unique indexes. What if the first unique index matches one row, but a different row matches the second unique indexed column? Fail because unique checks do not identify exactly one row? Or the _new_ value for the second indexed column conflicts with the second unique index. The MERGE/REPLACE should fail. The UPDATE block will handle this on its own, but the INSERT block will need to check for that an really error out, rather than return to the caller, so the loop in ExecInsertIndexTuples() has to restart on unique failure _only_ on the first index check, not the subsequent ones. One simplification would be to allow MERGE/REPLACE only on a table that has a single unique index. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Thu, Nov 17, 2005 at 09:30:43PM -0500, Bruce Momjian wrote: > > Is the requirement for predicate locking, over and above a unique > > constraint on an index that involves the record key, to deal with > > the scenario of two inserts executing at the same time, both before > > commit? > > No. If you have a primary key you can easily prevent duplicates. You > need a table lock or predicate locking to prevent duplicates if you do > not have a primary key. AFAIK you can also accomplish this without a table lock as long as you have a unique index on the right set of fields and those fields are also NOT NULL. ISTM it would be good to support that case as well, since you might want to MERGE based on something other than the PK. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: > I don't think MERGE can really be made to be both though, in which case > it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON > DUPLICATE UPDATE something else. Perhaps a special form of MERGE where > you know it's going to be doing that locking. I really don't like the > idea of making the SQL2003 version of MERGE be the MERGE special case > (by requiring someone to take a table lock ahead of time or do something > else odd). Anyone know off-hand what the big 3 do? If the industry consensus is that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then it's probably better to follow that lead. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Jim C. Nasby wrote: > On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote: > >>I don't think MERGE can really be made to be both though, in which case >>it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON >>DUPLICATE UPDATE something else. Perhaps a special form of MERGE where >>you know it's going to be doing that locking. I really don't like the >>idea of making the SQL2003 version of MERGE be the MERGE special case >>(by requiring someone to take a table lock ahead of time or do something >>else odd). > > > Anyone know off-hand what the big 3 do? If the industry consensus is > that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then > it's probably better to follow that lead. It was already said here that oracle and db2 both use MERGE, dunno about mssql. And yes merge CAN be used to do REPLACE (oracle uses their dummy table for this, we can use the fact that FROM clause isn't required in postgres). -- Regards Petr Jelinek (PJMODOS) ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| > > And yes merge CAN be used to do REPLACE (oracle uses their dummy table > for this, we can use the fact that FROM clause isn't required in postgres). > the FROM clause is required by default (starting with 8.1) unless you change a postgresql.conf parameter. and i don't think that idea will have any fan... -- regards, Jaime Casanova (DBA: DataBase Aniquilator ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| On Tue, Nov 22, 2005 at 04:20:12AM +0100, Petr Jelinek wrote: > It was already said here that oracle and db2 both use MERGE, dunno about > mssql. > > And yes merge CAN be used to do REPLACE (oracle uses their dummy table > for this, we can use the fact that FROM clause isn't required in postgres). Statements about MERGE on the web: http://www.dba-oracle.com/oracle_tips_rittman_merge.htm http://databasejournal.com/features/...le.php/3322041 http://certcities.com/editorial/colu...ditorialsID=51 http://publib.boulder.ibm.com/infoce...oc/sqls578.htm http://www.jdixon.dotnetdevelopersjo..._statement.htm http://publib.boulder.ibm.com/infoce...n/r0010873.htm http://expertanswercenter.techtarget...978700,00.html Not one (*not one!*) of these mentions any special handling of duplicate keys. They even go to pains to say that any errors cause everything to rollback. The last one is especially interesting: : Is there any way to capture errors from a MERGE statement? Also, is : there any way to know how many records were inserted or updated for the : MERGE statement like SQL%ROWCOUNT? Any assistance greatly appreciated. : : You capture errors the same way you would if you were doing regular : INSERT and UPDATE statements....with exception handlers. Just include a : WHEN OTHERS exception handler in the block where your MERGE statement : is and have to display SQLCODE and SQLERRM if an error occurs. Then you : can figure out which specific errors are occurring and create : individual exception handlers for those. There are even places that tell you how to decompose your MERGE into an INSERT plus UPDATE statement. The real advantage of MERGE is that the semantics prevent your updating a row you just inserted, which is harder in the general case but easy if the executor is handling the rows one at a time. Rather than trying to make MERGE do something it wasn't designed for, we should probably be spending our efforts on triggers for error conditions. Maybe something like: CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz(); Where baz would be passed NEW and OLD just like a normal trigger and if the trigger return NULL, the update is ignored. In the meantime the function can divert the insert to another table if it likes. This seems like a much more workable and useful addition. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQFDgvmqIB7bNG8LQkwRAr0eAKCBybl6qs2NN6kXG15tiG g+nObK0QCeP+2S +S9F/7PZ70V8CQmZqqMn6sE= =QsDZ -----END PGP SIGNATURE----- |
| ||||
| Jaime Casanova wrote: > > > > And yes merge CAN be used to do REPLACE (oracle uses their dummy table > > for this, we can use the fact that FROM clause isn't required in postgres). > > > > the FROM clause is required by default (starting with 8.1) unless you > change a postgresql.conf parameter. > > and i don't think that idea will have any fan... No, it is not, try SELECT 1. Oracle requires SELECT 1 FROM dual. The change in 8.1 is that SELECT pg_class.relname no longer works. You have to do SELECT relname FROM pg_class. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |