This is a discussion on Stored procedure deadlock on self with subquery within the SQL Server forums, part of the Microsoft SQL Server category; --> Here's a really weird one for any SQL Server gurus out there... We have observed (SQL Server 2000) scenarios ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Here's a really weird one for any SQL Server gurus out there... We have observed (SQL Server 2000) scenarios where a stored procedure which (a) begins a transaction (b) inserts some rows into a table (c) re-queries another table using a subquery which references the inserted table (correlated or not) will deadlock itself at point (c), even when it is the only task on the server. [I use the term 'deadlock' because I can't think of anything else appropriate, even though, as stated, this is the ONLY task executing, so this isn't a classical deadlock situation] A typical deadlocking scenario would be (assume insert_table is the table into which some rows are being inserted) begin transaction insert insert_table(col....) select (col....) from some_other_table /* this following query will deadlock and never complete */ select some_other_table.col from some_other_table where not exists (select * from insert_table where some_other_table.col = insert_table.col ) Whereas if the offending second query in the sequence is rewritten as a join e.g select some_other_table.col from some_other_table left join insert_table on some_other_table.col = insert_table.col where insert_table.col is null the query will not deadlock. If the subquery is an aggregate function, a deadlock will also not occur. If the transaction is committed prior to executing the blocking query, then it will not block (hardly surprising; if it did, there'd be major problems with SQL Server!). Note that this is a canonical case of a much more complex SP, and that simplified test cases often will not deadlock; you need a significant amount of data, typically 30,000 rows or more to see the problem. The blocking query is, in real life, used to drive a subsequent table insert operation, but this is not relevant to the problem. We conclude that there is some problem, possibly involving catalogue contention, if a temporary table must be created in a subquery within a transaction in a stored procedure, and if the subquery involves references to a table for which locks have been acquired. Note that the lock timeout will never trigger and a deadlock victim is never chosen, presumably because the deadlock occurs entirely within the same SPID. Locking hints and transaction isolation level setting does not affect the result. Note also that the exact same queries, executed as a TSQL batch, do not deadlock; you must place them in a stored procedure. Recovery mode for the database is SIMPLE, and the problem is portable across databases; it can also be exhibited with MSDE/2000, and is independent of whether or not the database server is the local machine or not. Has anyone else experienced this problem and/or know of a workaround, other than those mentioned here?. It does look awfully like a bug with SQL Server, since a single task should never be able to deadlock itself, surely. |
| |||
| Andrew Mayo (ajmayo@my-deja.com) writes: > A typical deadlocking scenario would be (assume insert_table is the > table into which some rows are being inserted) > > begin transaction > > insert insert_table(col....) select (col....) from some_other_table > > /* this following query will deadlock and never complete */ > > select some_other_table.col > from some_other_table > where not exists (select * > from insert_table > where some_other_table.col = insert_table.col ) > >... > If the transaction is committed prior to executing the blocking query, > then it will not block (hardly surprising; if it did, there'd be major > problems with SQL Server!). >... > We conclude that there is some problem, possibly involving catalogue > contention, if a temporary table must be created in a subquery within > a transaction in a stored procedure, and if the subquery involves > references to a table for which locks have been acquired. Without a reproducible scenarion, it is very difficult to comment. Of course, a real mean person can get the above to deadlock itself: in the SELECT call a UDF which calls xp_cmdshell that tries to retrieve a row from insert_table. This will cause a big fat deadlock, that SQL Server of course cannot detect. I would not really expect this to the problem, though. What I would recomment as a first shot is to use my routine aba_lockinfo, http://www.sommarskog.se/sqlutil/aba_lockinfo.html. This routine gives you a good overview of what's going on in the server. In particular, you can see if there is some blocking, and in such what you are blocked by. You can also keep an eye at the cpu, io and memory columns to determine wheter the process is working. Another place to look is the SQL Server error log. Particularly I would be interested if there is any talk of UMS scheduler. Unless any of what I have set leads you to a revelation, I would recommend you to open a case with Microsoft, as it appears to be too complex to pursue the actual case over the newsgroups. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp |
| |||
| Erland Sommarskog <esquel@sommarskog.se> wrote in message news:<Xns96141CD2380CYazorman@127.0.0.1>... > Without a reproducible scenarion, it is very difficult to comment. Of > course, a real mean person can get the above to deadlock itself: in the > SELECT call a UDF which calls xp_cmdshell that tries to retrieve a > row from insert_table. This will cause a big fat deadlock, that SQL > Server of course cannot detect. I would not really expect this to > the problem, though. Thanks very much; that's a useful hint. We found, after further exploration, that rephrasing the subqueries as self joins did NOT resolve the problem, although a simple subquery expressed as a single left outer join DID cause the SP to stop blocking. However, more complex self-joins caused the problem to reappear. We concluded that the problem is likely triggered by the query plan requiring a temp table to be dynamically created in order to either satisfy the complex join condition or process the subquery. We have now worked around the problem by taking the code out of a transaction entirely and building to a temp table. We then serialise the SP at top and bottom by means of an exclusive lock on the permanent table we are rebuilding. This ensures that any caller of the SP will not regain control until the work table it rebuilds is either verified up-to-date or has been rebuilt. The downside is that in rare circumstances, two simultaneous callers will both rebuilt temp tables, and then one of these tasks will discover that the permanent work table was rebuilt and discard their work. This will occur very rarely and is an acceptable tradeoff. So, the current hypothesis for this bug is that, within an SP (a) Begin a transaction (b) Perform an insert operation on empty table t which populates a reasonable rowcount - say, 40,000 rows (note that in our tests, table t had a PK but no other indexes or constraints, e.g foreign keys etc - nor are identity columns involved) (c) Within the same transaction, perform a second insert operation on table t which involves a subquery or self join involving table t (plus at least one other table), sufficiently complex to require the query analyser to dynamically create a temp table as part of the execution plan. We believe that the SP will 'lock up' at step (c) and that this is highly likely to be a catalogue serialisation issue, rather than something like page lock or table lock collision. If I get time, I will attempt to produce a canonical test case. |
| |||
| Andrew Mayo (ajmayo@my-deja.com) writes: > (a) Begin a transaction > (b) Perform an insert operation on empty table t which populates a > reasonable rowcount - say, 40,000 rows (note that in our tests, table > t had a PK but no other indexes or constraints, e.g foreign keys etc - > nor are identity columns involved) > (c) Within the same transaction, perform a second insert operation on > table t which involves a subquery or self join involving table t (plus > at least one other table), sufficiently complex to require the query > analyser to dynamically create a temp table as part of the execution > plan. All I can say is that this sounds highly unlikely. As for the loopback possibility, it occurred to me that there are a few more possibilies where this could creep in. You mentioned that the queries were complex, and if they refer to views or functions the loopback maybe hidden. Here are a couple a ways that you could haev a loopback: * A function that calls xp_cmdshell to do some dynamic SQL. (Very poor programming practice!) * A function or a view that uses OPENQUERY to receive the result set from a stored procedure. (Not very good programming practice either.) * Your query accesses views on a remote server, which in their turn refer back to tables on your own server. I will have to admit that it would be very interesting to see the output from a procedure like aba_lockinfo when you have this situation. I would say that my main hypothesis is that the query simply takes a lot time. But then again, I have not been at the site. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp |
| |||
| Erland Sommarskog <esquel@sommarskog.se> wrote in message news:<Xns961480782A4FYazorman@127.0.0.1>... > All I can say is that this sounds highly unlikely. > > As for the loopback possibility, it occurred to me that there are a > few more possibilies where this could creep in. You mentioned that the > queries were complex, and if they refer to views or functions the loopback > maybe hidden. Here are a couple a ways that you could haev a loopback: > > * A function that calls xp_cmdshell to do some dynamic SQL. (Very poor > programming practice!) > * A function or a view that uses OPENQUERY to receive the result set > from a stored procedure. (Not very good programming practice either.) > * Your query accesses views on a remote server, which in their turn > refer back to tables on your own server. > Well, although the data and tables are more complex than my simplified example, they are not significantly more complex, and they do not do any of the above. Also, if the queries are done out of a transaction they complete in about 10 seconds and if done in the transaction, they never complete. However, I'm trying to create a canonical test case, so far unsuccessfully, to pin down what the heck is going on, and will repost if and when I can do this. |
| |||
| I am probably misunderstanding/oversimplifying your scenario, however it seems to me that 1) You do these two queries in a transaction 2) The insert table is empty, and you load all rows with the first ftransaction - this will put write locks on all pages on the table (making it unable to be read until the transaction doing the insert is closed) 3) You then try to join to the table which is not possible because it is locked by the previous insert 4) This appears to be confirmed by the fact that running the query in a 2nd transaction works fine 5) If this is correct - then testing the transaction allowing dirty reads (READ UNCOMMITTED) should allow it to work If this is correct then you have a case of blocking locks, which is normal/correct behaviour (though not obvious in this case) Julian 8^) "Andrew Mayo" <ajmayo@my-deja.com> wrote in message news:2b20cd9f.0503080848.539a012d@posting.google.c om... > Here's a really weird one for any SQL Server gurus out there... > > We have observed (SQL Server 2000) scenarios where a stored procedure > which > > (a) begins a transaction > (b) inserts some rows into a table > (c) re-queries another table using a subquery which references the > inserted table (correlated or not) > > will deadlock itself at point (c), even when it is the only task on > the server. > > [I use the term 'deadlock' because I can't think of anything else > appropriate, even though, as stated, this is the ONLY task executing, > so this isn't a classical deadlock situation] > > A typical deadlocking scenario would be (assume insert_table is the > table into which some rows are being inserted) > > begin transaction > > insert insert_table(col....) select (col....) from some_other_table > > /* this following query will deadlock and never complete */ > > select some_other_table.col > from some_other_table > where not exists (select * > from insert_table > where some_other_table.col = insert_table.col ) > > > Whereas if the offending second query in the sequence is rewritten as > a join > > e.g > > select some_other_table.col > from some_other_table > left join insert_table > on some_other_table.col = insert_table.col > where insert_table.col is null > > the query will not deadlock. > > If the subquery is an aggregate function, a deadlock will also not > occur. > > If the transaction is committed prior to executing the blocking query, > then it will not block (hardly surprising; if it did, there'd be major > problems with SQL Server!). > > Note that this is a canonical case of a much more complex SP, and that > simplified test cases often will not deadlock; you need a significant > amount of data, typically 30,000 rows or more to see the problem. The > blocking query is, in real life, used to drive a subsequent table > insert operation, but this is not relevant to the problem. > > We conclude that there is some problem, possibly involving catalogue > contention, if a temporary table must be created in a subquery within > a transaction in a stored procedure, and if the subquery involves > references to a table for which locks have been acquired. > > Note that the lock timeout will never trigger and a deadlock victim is > never chosen, presumably because the deadlock occurs entirely within > the same SPID. > > Locking hints and transaction isolation level setting does not affect > the result. Note also that the exact same queries, executed as a TSQL > batch, do not deadlock; you must place them in a stored procedure. > > Recovery mode for the database is SIMPLE, and the problem is portable > across databases; it can also be exhibited with MSDE/2000, and is > independent of whether or not the database server is the local machine > or not. > > Has anyone else experienced this problem and/or know of a workaround, > other than those mentioned here?. It does look awfully like a bug with > SQL Server, since a single task should never be able to deadlock > itself, surely. |
| |||
| "julian_muir" <julian_muir@yahho.com> wrote in message news:<4232f716$1$89169$ed2619ec@ptn-nntp-reader01.plus.net>... > I am probably misunderstanding/oversimplifying your scenario, however it > seems to me that [snip] > 5) If this is correct - then testing the transaction allowing dirty reads > (READ UNCOMMITTED) should allow it to work > > If this is correct then you have a case of blocking locks, which is > normal/correct behaviour (though not obvious in this case) > I did consider this possibility. However, setting isolation level hinting in either or both SQL statements did not change things. I still strongly suspect catalogue contention, I have to say. Certainly, the first insert statement will acquire a bunch of write locks. Possibly these will even be escalated to a table-level lock. In fact,you can explicitly specify such a lock hint without changing the behaviour of the SP.(i.e it will still hang up) Now the important point that maybe I didn't stress significantly enough here is that this behaviour ONLY occurs in a STORED PROCEDURE. If it were related to deadlocking issues, there is no reason why it would be different if you executed it as part of a single TSQL batch, within a transaction, since the exact same locks would be acquired. Also, we should bear in mind that the locks you are acquiring are YOUR OWN locks. Because of this, you should never be blocked - and, in fact, you can do a transactional insert, then, without committing, do a SELECT on the table you just inserted into quite successfully and without blocking, as you might expect. (whether you actually see anything in the select will of course depend on whether you want committed reads or dirty reads, for example). After all, if it didn't work that way, you'd begin a transaction and then get blocked on your own locks - so, for example, two singleton updates on the same row within a single transaction would fail because the first update would acquire a write lock and by this theory, the second update should be blocked by it. Of course, this does not happen because, as I said, they are YOUR locks and YOU are not blocked by them, only other processes. You can do what you like; its YOUR transaction, and in this test scenario there are no other database users to complicate matters. Further testing currently confirms my working hypothesis; namely, that a whole bunch of things need to happen (a) The base table into which inserts are made needs to be truncated prior to executing the query (b) I *think* the query optimiser must then make a decision to recompile the SP (or at least rebuild the execution plan) based on a statistics change (need to check this area) (c) The subquery or self-join in the second statement (that blocks) must be sufficiently complex to require a temporary table to be created in order to execute the query plan. A really simple self-join, for instance, makes the problem go away, but a more complex join will promptly bring it back. As soon as I can produce a simple reproducible example I will post it. |
| ||||
| Andrew Mayo (ajmayo@my-deja.com) writes: > I still strongly suspect catalogue contention, I have to say. And I still think this sounds unlikely. > Now the important point that maybe I didn't stress significantly > enough here is that this behaviour ONLY occurs in a STORED PROCEDURE. > > If it were related to deadlocking issues, there is no reason why it > would be different if you executed it as part of a single TSQL batch, > within a transaction, since the exact same locks would be acquired. For various reasons you can get different query plans when you use a stored procedure and when you have a loose batch. The two chief reasons for these possibilities: 1) Parameter sniffing. (If you SP has parameters the values of these can affect the query plan.) 2) Different settings. (Recall that some settings are saved with the stored procedure.) > As soon as I can produce a simple reproducible example I will post it. Once you have it, I would be very interested in seeing it. However, I fear that it might be a time-consuming task to construct a repro. And, as I said before, it would be interesting to see lock and process information from this situation. -- Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se Books Online for SQL Server SP3 at http://www.microsoft.com/sql/techinf...2000/books.asp |