This is a discussion on Slow insert when table is almost "full". IBM Informix call 376023 within the Informix forums, part of the Database Server Software category; --> IDS 9.21 FC4 on HP-UX 11.11 We had a problem with a slow-running process twice on two months. On ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| IDS 9.21 FC4 on HP-UX 11.11 We had a problem with a slow-running process twice on two months. On each occasion, we could see that the rate of insert into a particular table was abnormally slow. We could also see that, in each case, the number of used pages was equal to the number of allocated ones. The working hypothesis is that the engine is spending a long time looking for small quantities of free space within the existing pages. This is a table that is added to daily, then older rows purged monthly. But we'd never had this problem previously in several years of running, nor did the number of extents (34 in the first occurrence, 4 in the second) seem of significance, nor have we had any problems with any of the other 600-odd tables taking new extents. Following a restore to a test system I was able to reproduce the problem in "lab"conditions: a simple INSERT INTO runs at around 4000 inserts per minute when the table is nearly full: if rebuilt (ALTER FRAGMENT ... INIT IN) the inserts then run at 128,000 inserts per minute. So the insert rate is more than 30 times slower when the problem is being experienced. The table itself is only about 3GBytes in size, but is in a database server of about 400GBytes. Unfortunately, the problem cannot be reproduced by an onunload/onload (the insert rate is normal following this, which may itself give useful clues), so it can only be investigated with a lot of set-up effort [no free EMC BCVs available I reproduce the first part of an oncheck -pt for the affected table below. As mentioned, if the table is rebuilt, the inserts immediately run at their normal rate. Any ideas gratefully ridiculed :-) thanks Neil TBLspace Report for live:root.dsinvlines Physical Address e00014 Creation date 07/12/2003 19:38:14 TBLspace Flags 802 Row Locking TBLspace use 4 bit bit-maps Maximum row size 219 Number of special columns 0 Number of keys 0 Number of extents 4 Current serial value 1 First extent size 4 Next extent size 500000 Number of pages allocated 1500004 Number of pages used 1500004 Number of data pages 1430215 Number of rows 12871602 Partition partnum 9437201 Partition lockid 9437201 |
| |||
| I'm surprised that no-one clever has responded yet. I'll state some blindingly obvious things for you to ridicule... I guess this is a simple table, with no blobs/etc. I also guess there is no intentional fragmentation. Is the indexing simple? Is it possible to remove the indexes when it's going slow and see if that helps. Does the fabled 'update stats' help at all in that case? If your hypothesis is correct, then wouldn't the deletion of rows at the end of the month result in large blocks becoming available. Of course if, over time, you add (numerically) more rows in successive months, then fragmentation will start to happen as time goes by... If you are fragmenting on date, is that working correctly? Is there any insert trigger being fired when these rows go in? Is the insert going straight into the table, or via a view? Any CHECK constraints? So, there's the obvious things done... Yes, I'd agree that your hypothesis looks reasonable, but wonder whether it's data or index fragmentation that may be responsible. Maybe you can pursue this in your lab? If you can get it a bit repeatable, then perhaps you could try some things before you get to loading the data in. Good luck, I can't see this being a fast thing to debug, if only because of the quantities of data involved and the non-repeatability of it. A suggestion for testing in the lab, if you can do it, would be to 'dd' the dbspace before trying anything, then at least you can get back to a known state. In message <bjg81r$i58fo$1@ID-162943.news.uni-berlin.de>, Neil Truby <neil.truby@ardenta.com> writes >IDS 9.21 FC4 on HP-UX 11.11 > >We had a problem with a slow-running process twice on two months. On each >occasion, we could see that the rate of insert into a particular table was >abnormally slow. We could also see that, in each case, the number of used >pages was equal to the number of allocated ones. > >The working hypothesis is that the engine is spending a long time looking >for small quantities of free space within the existing pages. This is a >table that is added to daily, then older rows purged monthly. But we'd >never had this problem previously in several years of running, nor did the >number of extents (34 in the first occurrence, 4 in the second) seem of >significance, nor have we had any problems with any of the other 600-odd >tables taking new extents. > >Following a restore to a test system I was able to reproduce the problem in >"lab"conditions: a simple INSERT INTO runs at around 4000 inserts per minute >when the table is nearly >full: if rebuilt (ALTER FRAGMENT ... INIT IN) the inserts then run at >128,000 inserts per minute. So the insert rate is more than 30 times slower >when the problem is being experienced. > >The table itself is only about 3GBytes in size, but is in a database server >of about 400GBytes. Unfortunately, the problem cannot be reproduced by an >onunload/onload (the insert rate is normal following this, which may itself >give useful clues), so it can only be investigated with a lot of set-up >effort [no free EMC BCVs available > >I reproduce the first part of an oncheck -pt for the affected table below. >As mentioned, if the table is rebuilt, the inserts immediately run at their >normal rate. > >Any ideas gratefully ridiculed :-) > >thanks >Neil > >TBLspace Report for live:root.dsinvlines > > Physical Address e00014 > Creation date 07/12/2003 19:38:14 > TBLspace Flags 802 Row Locking > TBLspace use 4 bit bit-maps > Maximum row size 219 > Number of special columns 0 > Number of keys 0 > Number of extents 4 > Current serial value 1 > First extent size 4 > Next extent size 500000 > Number of pages allocated 1500004 > Number of pages used 1500004 > Number of data pages 1430215 > Number of rows 12871602 > Partition partnum 9437201 > Partition lockid 9437201 > > -- Andrew Lennard andy@kontron.demon.co.uk |
| |||
| "Andy Lennard" <andy@kontron.demon.co.uk> wrote in message news:qNvFE8JIFIX$EwY7@kontron.demon.co.uk... > I'm surprised that no-one clever has responded yet. You have, Andy. Oh, I see what you mean ... :-) > I guess this is a simple table, with no blobs/etc. Yes. > I also guess there is no intentional fragmentation. Yes, that's right, there isn't. > Is the indexing simple? Is it possible to remove the indexes when it's > going slow and see if that helps. Does the fabled 'update stats' help at > all in that case? Hmmm. Indexing is simple. I haven't tried dropping one. I doubt if it'll make any difference, because the indexes are detatched. Update stats? Well, Mark Denham suggested an "update statistcis low", and he's clever, so I'll try that too .... > If your hypothesis is correct, then wouldn't the deletion of rows at the > end of the month result in large blocks becoming available. Of course > if, over time, you add (numerically) more rows in successive months, > then fragmentation will start to happen as time goes by... You'd have thought so. The 2nd occurrence occurred only six weeks after we'd rebuilt the table following the first occurrence. > If you are fragmenting on date, is that working correctly? n/a > Is there any insert trigger being fired when these rows go in? No. > Is the insert going straight into the table, or via a view? Direct to a table. > Any CHECK constraints? No, only a primary (and therefore unique) key > Yes, I'd agree that your hypothesis looks reasonable, but wonder whether > it's data or index fragmentation that may be responsible. Maybe you can > pursue this in your lab? If you can get it a bit repeatable, then > perhaps you could try some things before you get to loading the data in. Indeed. I'd suspect data, since the problem seems to coincide with the number of pages allocated being equal to the number used, as per the oncheck output. But who knows ...? > A suggestion for testing in the lab, if you can do it, would be to 'dd' > the dbspace before trying anything, then at least you can get back to a > known state. You reckon? Only by dropping all the other application dbspaces first I'd have thought ... and even then it's timestamps would be out of step with rootdbs, llogdbs, physdbs etc .... Thanks for stimulating my mind, I'll let you know how your suggestions go. cheers Neil |