Unix Technical Forum

Re: MERGE vs REPLACE

This is a discussion on Re: MERGE vs REPLACE within the pgsql Hackers forums, part of the PostgreSQL category; --> On Thu, 17 Nov 2005, Bruce Momjian wrote: > Unless you have a table lock, INSERT has to be ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read

Reply

 

LinkBack Thread Tools Display Modes
  #11 (permalink)  
Old 04-11-2008, 07:52 AM
Dennis Bjorklund
 
Posts: n/a
Default Re: MERGE vs REPLACE

On Thu, 17 Nov 2005, Bruce Momjian wrote:

> Unless you have a table lock, INSERT has to be before UPDATE, think
> UPDATE, UPDATE (both fail), INSERT, INSERT.


No matter what operation you start with you need a loop that try
insert/update until one of them succeed like in this example:

http://www.postgresql.org/docs/8.1/s...UPSERT-EXAMPLE

Without a loop you might not get to execute neither the insert nor the
update. Why? Think about this example:

BEGIN

INSERT <- fail because there is a row already

<- before we manage to do the update someone
delete the row (which we can see in the
default transaction isolation level)

UPDATE <- fail because there is no row so we will loop
and try the insert again

<- before we manage to do the insert someone else does
an insert

INSERT <- fail because there is a row already

<- before we manage to do the update someone
delete the row
.....


You might need to loop any number of times before you manage to perform
one of the two operations. Which operation you should start with depends
on which of the two cases is the common one.

--
/Dennis Björklund


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 04-11-2008, 07:52 AM
Bruce Momjian
 
Posts: n/a
Default Re: MERGE vs REPLACE


Oh, good point. I was thinking just about concurrent MERGEs. However,
it is more complicated than that. By definitaion you can not see
changes from other transactions while your statement is being run (even
if you increment CommandCounter), so to be atomic, you would still see
the row even though some other transaction had deleted it.

I think we avoid that now because UPDATE, (which is a DELETE then
INSERT) chains the tuples together so others see the activity happening.

Seems like we are going to have to peek at rows like we do now for
INSERT and peek at index rows, if I remember correctly. I can't think
of any other place in the code where we loop around other backend's
activity like this.

This could be tricky.

---------------------------------------------------------------------------

Dennis Bjorklund wrote:
> On Thu, 17 Nov 2005, Bruce Momjian wrote:
>
> > Unless you have a table lock, INSERT has to be before UPDATE, think
> > UPDATE, UPDATE (both fail), INSERT, INSERT.

>
> No matter what operation you start with you need a loop that try
> insert/update until one of them succeed like in this example:
>
> http://www.postgresql.org/docs/8.1/s...UPSERT-EXAMPLE
>
> Without a loop you might not get to execute neither the insert nor the
> update. Why? Think about this example:
>
> BEGIN
>
> INSERT <- fail because there is a row already
>
> <- before we manage to do the update someone
> delete the row (which we can see in the
> default transaction isolation level)
>
> UPDATE <- fail because there is no row so we will loop
> and try the insert again
>
> <- before we manage to do the insert someone else does
> an insert
>
> INSERT <- fail because there is a row already
>
> <- before we manage to do the update someone
> delete the row
> ....
>
>
> You might need to loop any number of times before you manage to perform
> one of the two operations. Which operation you should start with depends
> on which of the two cases is the common one.
>
> --
> /Dennis Bj?rklund
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>


--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #13 (permalink)  
Old 04-11-2008, 07:52 AM
Tom Lane
 
Posts: n/a
Default Re: MERGE vs REPLACE

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Oh, good point. I was thinking just about concurrent MERGEs. However,
> it is more complicated than that. By definitaion you can not see
> changes from other transactions while your statement is being run (even
> if you increment CommandCounter), so to be atomic, you would still see
> the row even though some other transaction had deleted it.


We would have to use the same semantics we use now for read-committed
UPDATE, that is look at the latest version of the row even though this
would not normally be visible to the transaction's snapshot.

In the case of a serializable transaction, no doubt we should fail if
any concurrent change actually happens.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #14 (permalink)  
Old 04-11-2008, 07:53 AM
Bruce Momjian
 
Posts: n/a
Default Re: MERGE vs REPLACE

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Oh, good point. I was thinking just about concurrent MERGEs. However,
> > it is more complicated than that. By definitaion you can not see
> > changes from other transactions while your statement is being run (even
> > if you increment CommandCounter), so to be atomic, you would still see
> > the row even though some other transaction had deleted it.

>
> We would have to use the same semantics we use now for read-committed
> UPDATE, that is look at the latest version of the row even though this
> would not normally be visible to the transaction's snapshot.
>
> In the case of a serializable transaction, no doubt we should fail if
> any concurrent change actually happens.


I have some psaudocode to explain what we want for this feature,
whatever syntax we choose:

Start
Check unique index
Found
lock row for update
if zero rows, return to start
if more than one row, fail
update row
Notfound
create savepoint
insert row into heap
lock index page
if conflicting index entry, abort savepoint, return to start
add index entry
unlock index page

While the "notfound" case might look strange, we actually use this exact
method for inserts now, see ExecInsert() and _bt_doinsert().
Particularly see this comment in the second function:

/*
* If we're not allowing duplicates, make sure the key isn't already in
* the index.
*
* NOTE: obviously,_bt_check_unique can only detect keys that are already in
* the index; so it cannot defend against concurrent insertions of the
* same key. We protect against that by means of holding a write lock on
* the target page. Any other would-be inserter of the same key must
* acquire a write lock on the same target page, so only one would-be
* inserter can be making the check at one time. Furthermore, once we are
* past the check we hold write locks continuously until we have performed
* our insertion, so no later inserter can fail to see our insertion.
* (This requires some care in _bt_insertonpg.)
*
* If we must wait for another xact, we release the lock while waiting, and
* then must start over completely.
*/

Here is the unique check error from _bt_check_unique():

ereport(ERROR,
(errcode(ERRCODE_UNIQUE_VIOLATION),
errmsg("duplicate key violates unique constraint \"%s\"",
RelationGetRelationName(rel))));

I think the problem here is that it is going to longjump() back to
postgres.c (and out of your code loop). While we have savepoints, I
think they only work coming from client applications, rather than inside
our code. Ideally you would like to be able to say:

savepoint();
func();
rollback_to_savepoint();

but you can't, so I think you are going to have to factor out that
unique error callback and return a failure code to the caller. I
suppose some boolean flag need to be added to _bt_doinsert(), but that
is called via a function pointer for the index type, so you are going to
have to update the insert function signatures for all access methods.
The good news is that only btree supports unique indexes, according to
the documentation ("Only B-tree currently supports unique indexes") so
for the other access methods the extra parameter is just ignored.

Another issue is multiple unique indexes. What if the first unique
index matches one row, but a different row matches the second unique
indexed column? Fail because unique checks do not identify exactly one
row?

Or the _new_ value for the second indexed column conflicts with the
second unique index. The MERGE/REPLACE should fail. The UPDATE block
will handle this on its own, but the INSERT block will need to check for
that an really error out, rather than return to the caller, so the loop
in ExecInsertIndexTuples() has to restart on unique failure _only_ on
the first index check, not the subsequent ones.

One simplification would be to allow MERGE/REPLACE only on a table that
has a single unique index.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #15 (permalink)  
Old 04-11-2008, 07:54 AM
Jim C. Nasby
 
Posts: n/a
Default Re: MERGE vs REPLACE

On Thu, Nov 17, 2005 at 09:30:43PM -0500, Bruce Momjian wrote:
> > Is the requirement for predicate locking, over and above a unique
> > constraint on an index that involves the record key, to deal with
> > the scenario of two inserts executing at the same time, both before
> > commit?

>
> No. If you have a primary key you can easily prevent duplicates. You
> need a table lock or predicate locking to prevent duplicates if you do
> not have a primary key.


AFAIK you can also accomplish this without a table lock as long as you
have a unique index on the right set of fields and those fields are also
NOT NULL. ISTM it would be good to support that case as well, since you
might want to MERGE based on something other than the PK.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #16 (permalink)  
Old 04-11-2008, 07:54 AM
Jim C. Nasby
 
Posts: n/a
Default Re: MERGE vs REPLACE

On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
> I don't think MERGE can really be made to be both though, in which case
> it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
> DUPLICATE UPDATE something else. Perhaps a special form of MERGE where
> you know it's going to be doing that locking. I really don't like the
> idea of making the SQL2003 version of MERGE be the MERGE special case
> (by requiring someone to take a table lock ahead of time or do something
> else odd).


Anyone know off-hand what the big 3 do? If the industry consensus is
that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then
it's probably better to follow that lead.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #17 (permalink)  
Old 04-11-2008, 07:55 AM
Petr Jelinek
 
Posts: n/a
Default Re: MERGE vs REPLACE

Jim C. Nasby wrote:
> On Thu, Nov 17, 2005 at 10:15:30AM -0500, Stephen Frost wrote:
>
>>I don't think MERGE can really be made to be both though, in which case
>>it should really be the SQL2003 MERGE and we can make REPLACE/INSERT ON
>>DUPLICATE UPDATE something else. Perhaps a special form of MERGE where
>>you know it's going to be doing that locking. I really don't like the
>>idea of making the SQL2003 version of MERGE be the MERGE special case
>>(by requiring someone to take a table lock ahead of time or do something
>>else odd).

>
>
> Anyone know off-hand what the big 3 do? If the industry consensus is
> that merge should actually be REPLACE/INSERT ON DUPLICATE UPDATE then
> it's probably better to follow that lead.


It was already said here that oracle and db2 both use MERGE, dunno about
mssql.

And yes merge CAN be used to do REPLACE (oracle uses their dummy table
for this, we can use the fact that FROM clause isn't required in postgres).

--
Regards
Petr Jelinek (PJMODOS)

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #18 (permalink)  
Old 04-11-2008, 07:56 AM
Jaime Casanova
 
Posts: n/a
Default Re: MERGE vs REPLACE

>
> And yes merge CAN be used to do REPLACE (oracle uses their dummy table
> for this, we can use the fact that FROM clause isn't required in postgres).
>


the FROM clause is required by default (starting with 8.1) unless you
change a postgresql.conf parameter.

and i don't think that idea will have any fan...


--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #19 (permalink)  
Old 04-11-2008, 07:56 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: MERGE vs REPLACE

On Tue, Nov 22, 2005 at 04:20:12AM +0100, Petr Jelinek wrote:
> It was already said here that oracle and db2 both use MERGE, dunno about
> mssql.
>
> And yes merge CAN be used to do REPLACE (oracle uses their dummy table
> for this, we can use the fact that FROM clause isn't required in postgres).


Statements about MERGE on the web:

http://www.dba-oracle.com/oracle_tips_rittman_merge.htm
http://databasejournal.com/features/...le.php/3322041
http://certcities.com/editorial/colu...ditorialsID=51
http://publib.boulder.ibm.com/infoce...oc/sqls578.htm
http://www.jdixon.dotnetdevelopersjo..._statement.htm
http://publib.boulder.ibm.com/infoce...n/r0010873.htm
http://expertanswercenter.techtarget...978700,00.html

Not one (*not one!*) of these mentions any special handling of
duplicate keys. They even go to pains to say that any errors cause
everything to rollback. The last one is especially interesting:

: Is there any way to capture errors from a MERGE statement? Also, is
: there any way to know how many records were inserted or updated for the
: MERGE statement like SQL%ROWCOUNT? Any assistance greatly appreciated.
:
: You capture errors the same way you would if you were doing regular
: INSERT and UPDATE statements....with exception handlers. Just include a
: WHEN OTHERS exception handler in the block where your MERGE statement
: is and have to display SQLCODE and SQLERRM if an error occurs. Then you
: can figure out which specific errors are occurring and create
: individual exception handlers for those.

There are even places that tell you how to decompose your MERGE into an
INSERT plus UPDATE statement. The real advantage of MERGE is that the
semantics prevent your updating a row you just inserted, which is
harder in the general case but easy if the executor is handling the
rows one at a time.

Rather than trying to make MERGE do something it wasn't designed for,
we should probably be spending our efforts on triggers for error
conditions. Maybe something like:

CREATE TRIGGER foo AFTER ERROR ON bar EXECUTE baz();

Where baz would be passed NEW and OLD just like a normal trigger and if
the trigger return NULL, the update is ignored. In the meantime the
function can divert the insert to another table if it likes. This seems
like a much more workable and useful addition.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQFDgvmqIB7bNG8LQkwRAr0eAKCBybl6qs2NN6kXG15tiG g+nObK0QCeP+2S
+S9F/7PZ70V8CQmZqqMn6sE=
=QsDZ
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #20 (permalink)  
Old 04-11-2008, 07:56 AM
Bruce Momjian
 
Posts: n/a
Default Re: MERGE vs REPLACE

Jaime Casanova wrote:
> >
> > And yes merge CAN be used to do REPLACE (oracle uses their dummy table
> > for this, we can use the fact that FROM clause isn't required in postgres).
> >

>
> the FROM clause is required by default (starting with 8.1) unless you
> change a postgresql.conf parameter.
>
> and i don't think that idea will have any fan...


No, it is not, try SELECT 1. Oracle requires SELECT 1 FROM dual. The
change in 8.1 is that SELECT pg_class.relname no longer works. You have to
do SELECT relname FROM pg_class.

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:01 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com