Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > Informix

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 05-16-2008, 01:44 PM
Andrew Ford
 
Posts: n/a
Default Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 - Red HatEL5

I'm running into some very strange high checkpoints and really poor
performance towards the end of an index build. Can anyone shed some light
on what could be going on?

Things start off really great. Scan threads, xchng threads and psortproc
threads hum along just fine writing out my temporary files to PSORT_DBTEMP.
But when it comes time to actually put my new index pages on disk things
start getting weird. I start experiencing very long checkpoints (700
seconds followed by 350 seconds). I realize that these are non blocking
checkpoints and I shouldn't be concerned but read/write performance goes
right into the terlet. Before the checkpoints I'm reading about 30K pages a
second and writing 10K pages a second, during and after the checkpoints I'm
reading about 2K pages a second and writing 200 pages a second.

So the really complicated stuff that I would expect to take the longest time
(scanning the data pages and doing all of the sorting) completes in about 10
minutes while the easier stuff that I would expect to go really quickly
(writing the index pages to disk) is taking 40 minutes.

IDS 11.10.UC2W2 - Red Hat EL5

115 million rows - Unique index key size = 40 bytes

524288 2K buffers, 127 LRUs, 127 Cleaners

Getting the same results with RTO_SERVER_RESTART enabled and disabled with
CKPTINVL set as low as 60 seconds and as high as 10 minutes.

Getting the same results with lru_min and lru_max set as low as 5/10 and as
high as 70/80.

Andrew


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 05-16-2008, 01:44 PM
Neil Truby
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 - Red HatEL5


"Andrew Ford" <aford@networkip.net> wrote in message
news:mailman.1093.1210707144.20610.informix-list@iiug.org...
> I'm running into some very strange high checkpoints and really poor
> performance towards the end of an index build. Can anyone shed some light
> on what could be going on?
>
> Things start off really great. Scan threads, xchng threads and psortproc
> threads hum along just fine writing out my temporary files to
> PSORT_DBTEMP. But when it comes time to actually put my new index pages on
> disk things start getting weird. I start experiencing very long
> checkpoints (700 seconds followed by 350 seconds). I realize that these
> are non blocking checkpoints and I shouldn't be concerned but read/write
> performance goes right into the terlet. Before the checkpoints I'm
> reading about 30K pages a second and writing 10K pages a second, during
> and after the checkpoints I'm reading about 2K pages a second and writing
> 200 pages a second.
>
> So the really complicated stuff that I would expect to take the longest
> time (scanning the data pages and doing all of the sorting) completes in
> about 10 minutes while the easier stuff that I would expect to go really
> quickly (writing the index pages to disk) is taking 40 minutes.
>
> IDS 11.10.UC2W2 - Red Hat EL5
>
> 115 million rows - Unique index key size = 40 bytes
>
> 524288 2K buffers, 127 LRUs, 127 Cleaners
>
> Getting the same results with RTO_SERVER_RESTART enabled and disabled with
> CKPTINVL set as low as 60 seconds and as high as 10 minutes.
>
> Getting the same results with lru_min and lru_max set as low as 5/10 and
> as high as 70/80.
>
> Andrew


I've no particualr experience on this version, but just one thing I've seen
before in others: is the dbspace to which the indexes being written very
fragmented: ie could IDS be spending time scanning for free space?


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 05-16-2008, 01:44 PM
Andrew Ford
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -RedHatEL5

From: "Neil Truby" <neil.truby@ardenta.com>

> is the dbspace to which the indexes being written very fragmented: ie
> could IDS be spending time scanning for free space?


I can guarantee the dbspace is not fragmented. The dbspace is empty before
the index is created, the table extent sizes are 1GB, so my index extents
should be large enough to make this not a allocating many small index
extents problem either.

I looks to me, and I could be way way off here, that the non blocking
checkpoint is attempting to flush the dirty pages to disk and the LRU
cleaners are still trying to do their business and they are involved in a
battle royale for resources because we're flooding the buffer cache with
these 3 million new index pages we just created and nobody wins.

I am using large (to me anyhow) chunks, 100 GB for the data dbspace and 100
GB for the index dbspace and I do have a decent number of buffers (over
500,000 2K buffers = 1 GB) Maybe something is not optimized and IDS is
spinning its wheels trying to sort something before it dumps it to disk.

I also see the main_loop() thread relatively often in the onstat -g act
output during these silly long checkpoints. This is the checkpoint trying
to do its thing, correct?

The search continues...

Thanks,

Andrew

----- Original Message -----
From: "Neil Truby" <neil.truby@ardenta.com>
Newsgroups: comp.databases.informix
To: <informix-list@iiug.org>
Sent: Tuesday, May 13, 2008 5:06 PM
Subject: Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -
RedHatEL5


>
> "Andrew Ford" <aford@networkip.net> wrote in message
> news:mailman.1093.1210707144.20610.informix-list@iiug.org...
>> I'm running into some very strange high checkpoints and really poor
>> performance towards the end of an index build. Can anyone shed some
>> light on what could be going on?
>>
>> Things start off really great. Scan threads, xchng threads and psortproc
>> threads hum along just fine writing out my temporary files to
>> PSORT_DBTEMP. But when it comes time to actually put my new index pages
>> on disk things start getting weird. I start experiencing very long
>> checkpoints (700 seconds followed by 350 seconds). I realize that these
>> are non blocking checkpoints and I shouldn't be concerned but read/write
>> performance goes right into the terlet. Before the checkpoints I'm
>> reading about 30K pages a second and writing 10K pages a second, during
>> and after the checkpoints I'm reading about 2K pages a second and writing
>> 200 pages a second.
>>
>> So the really complicated stuff that I would expect to take the longest
>> time (scanning the data pages and doing all of the sorting) completes in
>> about 10 minutes while the easier stuff that I would expect to go really
>> quickly (writing the index pages to disk) is taking 40 minutes.
>>
>> IDS 11.10.UC2W2 - Red Hat EL5
>>
>> 115 million rows - Unique index key size = 40 bytes
>>
>> 524288 2K buffers, 127 LRUs, 127 Cleaners
>>
>> Getting the same results with RTO_SERVER_RESTART enabled and disabled
>> with CKPTINVL set as low as 60 seconds and as high as 10 minutes.
>>
>> Getting the same results with lru_min and lru_max set as low as 5/10 and
>> as high as 70/80.
>>
>> Andrew

>
> I've no particualr experience on this version, but just one thing I've
> seen before in others: is the dbspace to which the indexes being written
> very fragmented: ie could IDS be spending time scanning for free space?
>
> _______________________________________________
> Informix-list mailing list
> Informix-list@iiug.org
> http://www.iiug.org/mailman/listinfo/informix-list



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 05-16-2008, 01:44 PM
jprenaut@yahoo.com
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -RedHatEL5

On May 13, 5:31 pm, "Andrew Ford" <af...@networkip.net> wrote:
> From: "Neil Truby" <neil.tr...@ardenta.com>
>
> > is the dbspace to which the indexes being written very fragmented: ie
> > could IDS be spending time scanning for free space?

>
> I can guarantee the dbspace is not fragmented. The dbspace is empty before
> the index is created, the table extent sizes are 1GB, so my index extents
> should be large enough to make this not a allocating many small index
> extents problem either.
>
> I looks to me, and I could be way way off here, that the non blocking
> checkpoint is attempting to flush the dirty pages to disk and the LRU
> cleaners are still trying to do their business and they are involved in a
> battle royale for resources because we're flooding the buffer cache with
> these 3 million new index pages we just created and nobody wins.
>
> I am using large (to me anyhow) chunks, 100 GB for the data dbspace and 100
> GB for the index dbspace and I do have a decent number of buffers (over
> 500,000 2K buffers = 1 GB) Maybe something is not optimized and IDS is
> spinning its wheels trying to sort something before it dumps it to disk.
>
> I also see the main_loop() thread relatively often in the onstat -g act
> output during these silly long checkpoints. This is the checkpoint trying
> to do its thing, correct?
>
> The search continues...
>
> Thanks,
>
> Andrew
>
> ----- Original Message -----
> From: "Neil Truby" <neil.tr...@ardenta.com>
>
> Newsgroups: comp.databases.informix
> To: <informix-l...@iiug.org>
> Sent: Tuesday, May 13, 2008 5:06 PM
> Subject: Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -
> RedHatEL5
>
> > "Andrew Ford" <af...@networkip.net> wrote in message
> >news:mailman.1093.1210707144.20610.informix-list@iiug.org...
> >> I'm running into some very strange high checkpoints and really poor
> >> performance towards the end of an index build. Can anyone shed some
> >> light on what could be going on?

>
> >> Things start off really great. Scan threads, xchng threads and psortproc
> >> threads hum along just fine writing out my temporary files to
> >> PSORT_DBTEMP. But when it comes time to actually put my new index pages
> >> on disk things start getting weird. I start experiencing very long
> >> checkpoints (700 seconds followed by 350 seconds). I realize that these
> >> are non blocking checkpoints and I shouldn't be concerned but read/write
> >> performance goes right into the terlet. Before the checkpoints I'm
> >> reading about 30K pages a second and writing 10K pages a second, during
> >> and after the checkpoints I'm reading about 2K pages a second and writing
> >> 200 pages a second.

>
> >> So the really complicated stuff that I would expect to take the longest
> >> time (scanning the data pages and doing all of the sorting) completes in
> >> about 10 minutes while the easier stuff that I would expect to go really
> >> quickly (writing the index pages to disk) is taking 40 minutes.

>
> >> IDS 11.10.UC2W2 - Red Hat EL5

>
> >> 115 million rows - Unique index key size = 40 bytes

>
> >> 524288 2K buffers, 127 LRUs, 127 Cleaners

>
> >> Getting the same results with RTO_SERVER_RESTART enabled and disabled
> >> with CKPTINVL set as low as 60 seconds and as high as 10 minutes.

>
> >> Getting the same results with lru_min and lru_max set as low as 5/10 and
> >> as high as 70/80.

>
> >> Andrew

>
> > I've no particualr experience on this version, but just one thing I've
> > seen before in others: is the dbspace to which the indexes being written
> > very fragmented: ie could IDS be spending time scanning for free space?

>
> > _______________________________________________
> > Informix-list mailing list
> > Informix-l...@iiug.org
> >http://www.iiug.org/mailman/listinfo/informix-list


What does onstat -g ckp show for the long checkpoints? I supposed
secondly if you are using interval checkpoints then people might need
to get used to seeing long checkpoint durations since I believe the
checkpoint duration includes the non-blocked time when the buffer pool
is being written. How are you measuring the reads and writes, and
where are those reads and writes occuring, on the disk containing the
file system where you are writing with PSORT_DBTEMP or to the disks
where your chunks are? If you think it's contention between LRU
flushers and the checkpoint, well, I wouldn't think that would happen
since the LRU cleaners would actually stop cleaning LRU's when the
checkpoint came up, and then the LRU cleaners are assigned chunks to
clean during the checkpoint (well that's the way we used to do it, I
haven't looked at 11 to see if there was any significant changes to
this code). I'm wondering if the problem is you have all this index
going to 1 chunk of 1 dbspace. Where in the past with chunk limits of
2gig you would have multiple page cleaners working in parallel to
clean multiple chunks, but if your system is now just 1 chunk, then I
think the code would still only assign 1 LRU cleaner to clean all the
pages in that chunk, so it's now you aren't getting nearly the amount
of parallel processing as before. During the checkpoint if you look
at onstat -g act how many flush_sub threads are running, that's the
LRU cleaners? Also in the onstat -F how many page cleaners do you see
assigned to doing chunk cleaning? Again if I remember off the top of
my head that would be a "state" of "C".

Jacques
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 05-16-2008, 01:44 PM
jprenaut@yahoo.com
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -RedHatEL5

On May 13, 7:02 pm, jpren...@yahoo.com wrote:
> On May 13, 5:31 pm, "Andrew Ford" <af...@networkip.net> wrote:
>
>
>
> > From: "Neil Truby" <neil.tr...@ardenta.com>

>
> > > is the dbspace to which the indexes being written very fragmented: ie
> > > could IDS be spending time scanning for free space?

>
> > I can guarantee the dbspace is not fragmented. The dbspace is empty before
> > the index is created, the table extent sizes are 1GB, so my index extents
> > should be large enough to make this not a allocating many small index
> > extents problem either.

>
> > I looks to me, and I could be way way off here, that the non blocking
> > checkpoint is attempting to flush the dirty pages to disk and the LRU
> > cleaners are still trying to do their business and they are involved in a
> > battle royale for resources because we're flooding the buffer cache with
> > these 3 million new index pages we just created and nobody wins.

>
> > I am using large (to me anyhow) chunks, 100 GB for the data dbspace and 100
> > GB for the index dbspace and I do have a decent number of buffers (over
> > 500,000 2K buffers = 1 GB) Maybe something is not optimized and IDS is
> > spinning its wheels trying to sort something before it dumps it to disk.

>
> > I also see the main_loop() thread relatively often in the onstat -g act
> > output during these silly long checkpoints. This is the checkpoint trying
> > to do its thing, correct?

>
> > The search continues...

>
> > Thanks,

>
> > Andrew

>
> > ----- Original Message -----
> > From: "Neil Truby" <neil.tr...@ardenta.com>

>
> > Newsgroups: comp.databases.informix
> > To: <informix-l...@iiug.org>
> > Sent: Tuesday, May 13, 2008 5:06 PM
> > Subject: Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -
> > RedHatEL5

>
> > > "Andrew Ford" <af...@networkip.net> wrote in message
> > >news:mailman.1093.1210707144.20610.informix-list@iiug.org...
> > >> I'm running into some very strange high checkpoints and really poor
> > >> performance towards the end of an index build. Can anyone shed some
> > >> light on what could be going on?

>
> > >> Things start off really great. Scan threads, xchng threads and psortproc
> > >> threads hum along just fine writing out my temporary files to
> > >> PSORT_DBTEMP. But when it comes time to actually put my new index pages
> > >> on disk things start getting weird. I start experiencing very long
> > >> checkpoints (700 seconds followed by 350 seconds). I realize that these
> > >> are non blocking checkpoints and I shouldn't be concerned but read/write
> > >> performance goes right into the terlet. Before the checkpoints I'm
> > >> reading about 30K pages a second and writing 10K pages a second, during
> > >> and after the checkpoints I'm reading about 2K pages a second and writing
> > >> 200 pages a second.

>
> > >> So the really complicated stuff that I would expect to take the longest
> > >> time (scanning the data pages and doing all of the sorting) completes in
> > >> about 10 minutes while the easier stuff that I would expect to go really
> > >> quickly (writing the index pages to disk) is taking 40 minutes.

>
> > >> IDS 11.10.UC2W2 - Red Hat EL5

>
> > >> 115 million rows - Unique index key size = 40 bytes

>
> > >> 524288 2K buffers, 127 LRUs, 127 Cleaners

>
> > >> Getting the same results with RTO_SERVER_RESTART enabled and disabled
> > >> with CKPTINVL set as low as 60 seconds and as high as 10 minutes.

>
> > >> Getting the same results with lru_min and lru_max set as low as 5/10 and
> > >> as high as 70/80.

>
> > >> Andrew

>
> > > I've no particualr experience on this version, but just one thing I've
> > > seen before in others: is the dbspace to which the indexes being written
> > > very fragmented: ie could IDS be spending time scanning for free space?

>
> > > _______________________________________________
> > > Informix-list mailing list
> > > Informix-l...@iiug.org
> > >http://www.iiug.org/mailman/listinfo/informix-list

>
> What does onstat -g ckp show for the long checkpoints? I supposed
> secondly if you are using interval checkpoints then people might need
> to get used to seeing long checkpoint durations since I believe the
> checkpoint duration includes the non-blocked time when the buffer pool
> is being written. How are you measuring the reads and writes, and
> where are those reads and writes occuring, on the disk containing the
> file system where you are writing with PSORT_DBTEMP or to the disks
> where your chunks are? If you think it's contention between LRU
> flushers and the checkpoint, well, I wouldn't think that would happen
> since the LRU cleaners would actually stop cleaning LRU's when the
> checkpoint came up, and then the LRU cleaners are assigned chunks to
> clean during the checkpoint (well that's the way we used to do it, I
> haven't looked at 11 to see if there was any significant changes to
> this code). I'm wondering if the problem is you have all this index
> going to 1 chunk of 1 dbspace. Where in the past with chunk limits of
> 2gig you would have multiple page cleaners working in parallel to
> clean multiple chunks, but if your system is now just 1 chunk, then I
> think the code would still only assign 1 LRU cleaner to clean all the
> pages in that chunk, so it's now you aren't getting nearly the amount
> of parallel processing as before. During the checkpoint if you look
> at onstat -g act how many flush_sub threads are running, that's the
> LRU cleaners? Also in the onstat -F how many page cleaners do you see
> assigned to doing chunk cleaning? Again if I remember off the top of
> my head that would be a "state" of "C".
>
> Jacques


Oh, I would also try and get several onstat -g stk all's while the
checkpoint is occuring and look for the flush_sub that's assigned to
clean the chunk with your index in it and see what the stack for that
flush_sub is, is it waiting for a particular resource, is it waiting
for i/o, etc...

Jacques
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 05-16-2008, 01:44 PM
Andrew Ford
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -RedHatEL5

Thanks for the reply Jacques,

> What does onstat -g ckp show for the long checkpoints?


Critical
Sections Physical Log Logical Log
Clock Total Flush Block #
Ckpt Wait Long # Dirty Dskflu Total Avg Total Avg
Interval Time Trigger LSN Time Time Time Waits
Time Time Time Buffers /Sec Pages /Sec Pages /Sec
174 08:49:07 CKPTINTVL 40:0xa48e370 28.5 27.1 0.0 5
1.3 1.2 1.3 29967 1105 46 0 1 0
175 09:01:51 CKPTINTVL 40:0xa48f428 704.9 703.7 0.0 5
0.9 1.1 1.7 35210 50 28 0 30 0
176 09:08:27 CKPTINTVL 40:0xa4b2658 366.2 366.1 0.0 0
0.0 0.0 0.0 37169 101 86 0 35 0


> I supposed secondly if you are using interval checkpoints then
> people might need to get used to seeing long checkpoint durations


I agree completely and I wouldn't be worried if performance didn't tank
during these long checkpoints.


> How are you measuring the reads and writes, and where are those
> reads and writes occuring, on the disk containing the file system
> where you are writing with PSORT_DBTEMP or to the disks where your chunks
> are?


I'm just eyeballing the reads/writes per second via an onstat -r 1 -D.

The PSORT_DBTEMP filesystem is on a different physical disk than the
database chunks. Also, the long checkpoints and poor performance continue
after the SORT files in PSORT_DBTEMP are gone.

> If you think it's contention between LRU flushers and the checkpoint,
> well, I wouldn't think that would happen since the LRU cleaners would
> actually stop cleaning LRU's when the checkpoint came up...


IBM Informix Dynamic Server Version 11.10.UC2W2 -- On-Line (CKPT INP) --
Up 00:07:53 -- 1704708 Kbytes
Blocked:CKPT


Fg Writes LRU Writes Chunk Writes
0 171931 4700

address flusher state data # LRU Chunk Wakeups Idle Tim
8c58c5dc 0 C 4 16 194 641 442.592
8c58cba0 1 I 0 13 1 454 444.043
8c58d164 2 L e9 15 0 452 438.621
8c58d728 3 L 1b 14 0 451 439.441
8c58dcec 4 I 0 17 0 454 439.822
8c58e2b0 5 L 1 22 0 460 442.573
8c58e874 6 L cb 18 0 454 438.354

....

8c5b8544 122 L 15 13 0 453 442.079
8c5b8b08 123 L d1 13 0 455 444.793
8c5b90cc 124 L b 10 0 449 440.193
8c5b9690 125 L 4f 14 0 450 440.198
8c5b9c54 126 L e5 12 0 446 437.391
states: Exit Idle Chunk Lru


Must be new to 11.10 and non blocking checkpoints. Cleaners in state C and
state L.
I thought this might be part of the problem but now, I don't know. More on
this later.


> During the checkpoint if you look at onstat -g act how many flush_sub
> threads are running, that's the LRU cleaners?


flush_sub threads don't appear (they're not idle for sure from the onstat -F
output above)

kaio threads dominate the active threads followed by the main_loop() thread
and then the index building threads (xchg, psortproc)

> Also in the onstat -F how many page cleaners do you see assigned to doing
> chunk cleaning?


Just the 1 for the 1 index chunk with dirty pages (index building is the
only activity on the machine at the time)

> I would also try and get several onstat -g stk all's while the
> checkpoint is occuring


>From what I can tell, they all look like this:


Stack for thread: 30 flush_sub(1)
base: 0x8d3eb000
len: 36864
pc: 0x088a58bb
tos: 0x8d3f3a60
state: sleeping
vp: 1

0x088a58bb (oninit)yield_processor_mvp(0x8d3cba14, 0x8d3f3ba0, 0x0,
0xf63510, 0x310025, 0x1)
0x088a6a1f (oninit)mt_yield(0xffffffff, 0x8cc779d8, 0x0, 0x0, 0x8ddf1b28,
0x8dd8dd58)
0x088b5f22 (oninit)mt_aio_wait(0x8cb6f018, 0x8cb6f018, 0x8cb6f018,
0x88a6a1f, 0x8c2ba098, 0x0)
0x088b77fe (oninit)mt_aio_start(0x8cb6f018, 0x1, 0x9, 0x616ccc, 0x4, 0x1)
0x08680372 (oninit)aclean_chunk(0x4, 0x8c3d7080, 0x8d3f3eac, 0x0, 0x0, 0x0)
0x08680ae2 (oninit)flush_sub(0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
0x088ade7f (oninit)startup (0x1b, 0x8d3f6fa8, 0x8d3eafa8, 0x2fa8,
0x1a7e4ef9, 0x0)
0x00000000 (*nosymtab*)0x(nil)


I'm running the index build again to see what I can see and I see this:

Checkpoint in progress - 1 C cleaner
No LRU cleaning activity (LRU dirty pages < lru_min_dirty so no need)
single sqlexec thread reading my new index pages (automatic update
statistics?)
very poor write performance and not so good read performance to the same
chunk (sqlexec thread reading, checkpoint writing dirty pages)

It looks like the badness starts when the sqlexec thread starts reading the
index pages while a checkpoint is in progress and goes away when the
checkpoint finishes (no more dirty buffers to clean.) Could this just be a
result of contention at the disk level from trying to read and write to the
same device? Seems a little too bad to be just that.

I mean, I guess this isn't the end of the world. I would mostly like to
understand what is going on and if this is expected behaviour.

Thanks for the help,

Andrew

----- Original Message -----
From: <jprenaut@yahoo.com>
Newsgroups: comp.databases.informix
To: <informix-list@iiug.org>
Sent: Tuesday, May 13, 2008 7:02 PM
Subject: Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -
RedHatEL5


> On May 13, 5:31 pm, "Andrew Ford" <af...@networkip.net> wrote:
>> From: "Neil Truby" <neil.tr...@ardenta.com>
>>
>> > is the dbspace to which the indexes being written very fragmented: ie
>> > could IDS be spending time scanning for free space?

>>
>> I can guarantee the dbspace is not fragmented. The dbspace is empty
>> before
>> the index is created, the table extent sizes are 1GB, so my index extents
>> should be large enough to make this not a allocating many small index
>> extents problem either.
>>
>> I looks to me, and I could be way way off here, that the non blocking
>> checkpoint is attempting to flush the dirty pages to disk and the LRU
>> cleaners are still trying to do their business and they are involved in a
>> battle royale for resources because we're flooding the buffer cache with
>> these 3 million new index pages we just created and nobody wins.
>>
>> I am using large (to me anyhow) chunks, 100 GB for the data dbspace and
>> 100
>> GB for the index dbspace and I do have a decent number of buffers (over
>> 500,000 2K buffers = 1 GB) Maybe something is not optimized and IDS is
>> spinning its wheels trying to sort something before it dumps it to disk.
>>
>> I also see the main_loop() thread relatively often in the onstat -g act
>> output during these silly long checkpoints. This is the checkpoint
>> trying
>> to do its thing, correct?
>>
>> The search continues...
>>
>> Thanks,
>>
>> Andrew
>>
>> ----- Original Message -----
>> From: "Neil Truby" <neil.tr...@ardenta.com>
>>
>> Newsgroups: comp.databases.informix
>> To: <informix-l...@iiug.org>
>> Sent: Tuesday, May 13, 2008 5:06 PM
>> Subject: Re: Create Index Mysteriously Slow at the End - IDS
>> 11.10.UC2W2 -
>> RedHatEL5
>>
>> > "Andrew Ford" <af...@networkip.net> wrote in message
>> >news:mailman.1093.1210707144.20610.informix-list@iiug.org...
>> >> I'm running into some very strange high checkpoints and really poor
>> >> performance towards the end of an index build. Can anyone shed some
>> >> light on what could be going on?

>>
>> >> Things start off really great. Scan threads, xchng threads and
>> >> psortproc
>> >> threads hum along just fine writing out my temporary files to
>> >> PSORT_DBTEMP. But when it comes time to actually put my new index
>> >> pages
>> >> on disk things start getting weird. I start experiencing very long
>> >> checkpoints (700 seconds followed by 350 seconds). I realize that
>> >> these
>> >> are non blocking checkpoints and I shouldn't be concerned but
>> >> read/write
>> >> performance goes right into the terlet. Before the checkpoints I'm
>> >> reading about 30K pages a second and writing 10K pages a second,
>> >> during
>> >> and after the checkpoints I'm reading about 2K pages a second and
>> >> writing
>> >> 200 pages a second.

>>
>> >> So the really complicated stuff that I would expect to take the
>> >> longest
>> >> time (scanning the data pages and doing all of the sorting) completes
>> >> in
>> >> about 10 minutes while the easier stuff that I would expect to go
>> >> really
>> >> quickly (writing the index pages to disk) is taking 40 minutes.

>>
>> >> IDS 11.10.UC2W2 - Red Hat EL5

>>
>> >> 115 million rows - Unique index key size = 40 bytes

>>
>> >> 524288 2K buffers, 127 LRUs, 127 Cleaners

>>
>> >> Getting the same results with RTO_SERVER_RESTART enabled and disabled
>> >> with CKPTINVL set as low as 60 seconds and as high as 10 minutes.

>>
>> >> Getting the same results with lru_min and lru_max set as low as 5/10
>> >> and
>> >> as high as 70/80.

>>
>> >> Andrew

>>
>> > I've no particualr experience on this version, but just one thing I've
>> > seen before in others: is the dbspace to which the indexes being
>> > written
>> > very fragmented: ie could IDS be spending time scanning for free space?

>>
>> > _______________________________________________
>> > Informix-list mailing list
>> > Informix-l...@iiug.org
>> >http://www.iiug.org/mailman/listinfo/informix-list

>
> What does onstat -g ckp show for the long checkpoints? I supposed
> secondly if you are using interval checkpoints then people might need
> to get used to seeing long checkpoint durations since I believe the
> checkpoint duration includes the non-blocked time when the buffer pool
> is being written. How are you measuring the reads and writes, and
> where are those reads and writes occuring, on the disk containing the
> file system where you are writing with PSORT_DBTEMP or to the disks
> where your chunks are? If you think it's contention between LRU
> flushers and the checkpoint, well, I wouldn't think that would happen
> since the LRU cleaners would actually stop cleaning LRU's when the
> checkpoint came up, and then the LRU cleaners are assigned chunks to
> clean during the checkpoint (well that's the way we used to do it, I
> haven't looked at 11 to see if there was any significant changes to
> this code). I'm wondering if the problem is you have all this index
> going to 1 chunk of 1 dbspace. Where in the past with chunk limits of
> 2gig you would have multiple page cleaners working in parallel to
> clean multiple chunks, but if your system is now just 1 chunk, then I
> think the code would still only assign 1 LRU cleaner to clean all the
> pages in that chunk, so it's now you aren't getting nearly the amount
> of parallel processing as before. During the checkpoint if you look
> at onstat -g act how many flush_sub threads are running, that's the
> LRU cleaners? Also in the onstat -F how many page cleaners do you see
> assigned to doing chunk cleaning? Again if I remember off the top of
> my head that would be a "state" of "C".
>
> Jacques
> _______________________________________________
> Informix-list mailing list
> Informix-list@iiug.org
> http://www.iiug.org/mailman/listinfo/informix-list



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 05-16-2008, 01:44 PM
jprenaut@yahoo.com
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -RedHatEL5

On May 13, 9:22 pm, "Andrew Ford" <af...@networkip.net> wrote:
> Thanks for the reply Jacques,
>
> > What does onstat -g ckp show for the long checkpoints?

>
> Critical
> Sections Physical Log Logical Log
> Clock Total Flush Block #
> Ckpt Wait Long # Dirty Dskflu Total Avg Total Avg
> Interval Time Trigger LSN Time Time Time Waits
> Time Time Time Buffers /Sec Pages /Sec Pages /Sec
> 174 08:49:07 CKPTINTVL 40:0xa48e370 28.5 27.1 0.0 5
> 1.3 1.2 1.3 29967 1105 46 0 1 0
> 175 09:01:51 CKPTINTVL 40:0xa48f428 704.9 703.7 0.0 5
> 0.9 1.1 1.7 35210 50 28 0 30 0
> 176 09:08:27 CKPTINTVL 40:0xa4b2658 366.2 366.1 0.0 0
> 0.0 0.0 0.0 37169 101 86 0 35 0
>
> > I supposed secondly if you are using interval checkpoints then
> > people might need to get used to seeing long checkpoint durations

>
> I agree completely and I wouldn't be worried if performance didn't tank
> during these long checkpoints.
>
> > How are you measuring the reads and writes, and where are those
> > reads and writes occuring, on the disk containing the file system
> > where you are writing with PSORT_DBTEMP or to the disks where your chunks
> > are?

>
> I'm just eyeballing the reads/writes per second via an onstat -r 1 -D.
>
> The PSORT_DBTEMP filesystem is on a different physical disk than the
> database chunks. Also, the long checkpoints and poor performance continue
> after the SORT files in PSORT_DBTEMP are gone.
>
> > If you think it's contention between LRU flushers and the checkpoint,
> > well, I wouldn't think that would happen since the LRU cleaners would
> > actually stop cleaning LRU's when the checkpoint came up...

>
> IBM Informix Dynamic Server Version 11.10.UC2W2 -- On-Line (CKPT INP) --
> Up 00:07:53 -- 1704708 Kbytes
> Blocked:CKPT
>
> Fg Writes LRU Writes Chunk Writes
> 0 171931 4700
>
> address flusher state data # LRU Chunk Wakeups Idle Tim
> 8c58c5dc 0 C 4 16 194 641 442.592
> 8c58cba0 1 I 0 13 1 454 444.043
> 8c58d164 2 L e9 15 0 452 438.621
> 8c58d728 3 L 1b 14 0 451 439.441
> 8c58dcec 4 I 0 17 0 454 439.822
> 8c58e2b0 5 L 1 22 0 460 442.573
> 8c58e874 6 L cb 18 0 454 438.354
>
> ...
>
> 8c5b8544 122 L 15 13 0 453 442.079
> 8c5b8b08 123 L d1 13 0 455 444.793
> 8c5b90cc 124 L b 10 0 449 440.193
> 8c5b9690 125 L 4f 14 0 450 440.198
> 8c5b9c54 126 L e5 12 0 446 437.391
> states: Exit Idle Chunk Lru
>
> Must be new to 11.10 and non blocking checkpoints. Cleaners in state C and
> state L.
> I thought this might be part of the problem but now, I don't know. More on
> this later.
>
> > During the checkpoint if you look at onstat -g act how many flush_sub
> > threads are running, that's the LRU cleaners?

>
> flush_sub threads don't appear (they're not idle for sure from the onstat -F
> output above)
>
> kaio threads dominate the active threads followed by the main_loop() thread
> and then the index building threads (xchg, psortproc)
>
> > Also in the onstat -F how many page cleaners do you see assigned to doing
> > chunk cleaning?

>
> Just the 1 for the 1 index chunk with dirty pages (index building is the
> only activity on the machine at the time)
>
> > I would also try and get several onstat -g stk all's while the
> > checkpoint is occuring
> >From what I can tell, they all look like this:

>
> Stack for thread: 30 flush_sub(1)
> base: 0x8d3eb000
> len: 36864
> pc: 0x088a58bb
> tos: 0x8d3f3a60
> state: sleeping
> vp: 1
>
> 0x088a58bb (oninit)yield_processor_mvp(0x8d3cba14, 0x8d3f3ba0, 0x0,
> 0xf63510, 0x310025, 0x1)
> 0x088a6a1f (oninit)mt_yield(0xffffffff, 0x8cc779d8, 0x0, 0x0, 0x8ddf1b28,
> 0x8dd8dd58)
> 0x088b5f22 (oninit)mt_aio_wait(0x8cb6f018, 0x8cb6f018, 0x8cb6f018,
> 0x88a6a1f, 0x8c2ba098, 0x0)
> 0x088b77fe (oninit)mt_aio_start(0x8cb6f018, 0x1, 0x9, 0x616ccc, 0x4, 0x1)
> 0x08680372 (oninit)aclean_chunk(0x4, 0x8c3d7080, 0x8d3f3eac, 0x0, 0x0, 0x0)
> 0x08680ae2 (oninit)flush_sub(0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
> 0x088ade7f (oninit)startup (0x1b, 0x8d3f6fa8, 0x8d3eafa8, 0x2fa8,
> 0x1a7e4ef9, 0x0)
> 0x00000000 (*nosymtab*)0x(nil)


If you are always catching the flush_sub threads with this stack, then
they have submitted an I/O request to the KAIO threads (which is what
you say you see dominate the -g act output). So it appears there
might be some sort of physical disk I/O problem or contention? If you
look at sar output or something similiar which might report some sort
of average service time, can you get something to compare sometime
when things seem ok, compared to when this slow down occurs? I
suppose there could be a problem with the kaio threads themselves, but
that would be harder to determine. Might try and use pstack or
procstack maybe on linux to get stacks off the oninit processes that
have the kaio threads on them to see if you can catch the kaio threads
executing and get stacks for what they are doing.


>
> I'm running the index build again to see what I can see and I see this:
>
> Checkpoint in progress - 1 C cleaner
> No LRU cleaning activity (LRU dirty pages < lru_min_dirty so no need)
> single sqlexec thread reading my new index pages (automatic update
> statistics?)
> very poor write performance and not so good read performance to the same
> chunk (sqlexec thread reading, checkpoint writing dirty pages)
>
> It looks like the badness starts when the sqlexec thread starts reading the
> index pages while a checkpoint is in progress and goes away when the
> checkpoint finishes (no more dirty buffers to clean.) Could this just be a
> result of contention at the disk level from trying to read and write to the
> same device? Seems a little too bad to be just that.
>
> I mean, I guess this isn't the end of the world. I would mostly like to
> understand what is going on and if this is expected behaviour.
>
> Thanks for the help,
>
> Andrew
>
> ----- Original Message -----
> From: <jpren...@yahoo.com>
>
> Newsgroups: comp.databases.informix
> To: <informix-l...@iiug.org>
> Sent: Tuesday, May 13, 2008 7:02 PM
> Subject: Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -
> RedHatEL5
>
> > On May 13, 5:31 pm, "Andrew Ford" <af...@networkip.net> wrote:
> >> From: "Neil Truby" <neil.tr...@ardenta.com>

>
> >> > is the dbspace to which the indexes being written very fragmented: ie
> >> > could IDS be spending time scanning for free space?

>
> >> I can guarantee the dbspace is not fragmented. The dbspace is empty
> >> before
> >> the index is created, the table extent sizes are 1GB, so my index extents
> >> should be large enough to make this not a allocating many small index
> >> extents problem either.

>
> >> I looks to me, and I could be way way off here, that the non blocking
> >> checkpoint is attempting to flush the dirty pages to disk and the LRU
> >> cleaners are still trying to do their business and they are involved ina
> >> battle royale for resources because we're flooding the buffer cache with
> >> these 3 million new index pages we just created and nobody wins.

>
> >> I am using large (to me anyhow) chunks, 100 GB for the data dbspace and
> >> 100
> >> GB for the index dbspace and I do have a decent number of buffers (over
> >> 500,000 2K buffers = 1 GB) Maybe something is not optimized and IDS is
> >> spinning its wheels trying to sort something before it dumps it to disk..

>
> >> I also see the main_loop() thread relatively often in the onstat -g act
> >> output during these silly long checkpoints. This is the checkpoint
> >> trying
> >> to do its thing, correct?

>
> >> The search continues...

>
> >> Thanks,

>
> >> Andrew

>
> >> ----- Original Message -----
> >> From: "Neil Truby" <neil.tr...@ardenta.com>

>
> >> Newsgroups: comp.databases.informix
> >> To: <informix-l...@iiug.org>
> >> Sent: Tuesday, May 13, 2008 5:06 PM
> >> Subject: Re: Create Index Mysteriously Slow at the End - IDS
> >> 11.10.UC2W2 -
> >> RedHatEL5

>
> >> > "Andrew Ford" <af...@networkip.net> wrote in message
> >> >news:mailman.1093.1210707144.20610.informix-list@iiug.org...
> >> >> I'm running into some very strange high checkpoints and really poor
> >> >> performance towards the end of an index build. Can anyone shed some
> >> >> light on what could be going on?

>
> >> >> Things start off really great. Scan threads, xchng threads and
> >> >> psortproc
> >> >> threads hum along just fine writing out my temporary files to
> >> >> PSORT_DBTEMP. But when it comes time to actually put my new index
> >> >> pages
> >> >> on disk things start getting weird. I start experiencing very long
> >> >> checkpoints (700 seconds followed by 350 seconds). I realize that
> >> >> these
> >> >> are non blocking checkpoints and I shouldn't be concerned but
> >> >> read/write
> >> >> performance goes right into the terlet. Before the checkpoints I'm
> >> >> reading about 30K pages a second and writing 10K pages a second,
> >> >> during
> >> >> and after the checkpoints I'm reading about 2K pages a second and
> >> >> writing
> >> >> 200 pages a second.

>
> >> >> So the really complicated stuff that I would expect to take the
> >> >> longest
> >> >> time (scanning the data pages and doing all of the sorting) completes
> >> >> in
> >> >> about 10 minutes while the easier stuff that I would expect to go
> >> >> really
> >> >> quickly (writing the index pages to disk) is taking 40 minutes.

>
> >> >> IDS 11.10.UC2W2 - Red Hat EL5

>
> >> >> 115 million rows - Unique index key size = 40 bytes

>
> >> >> 524288 2K buffers, 127 LRUs, 127 Cleaners

>
> >> >> Getting the same results with RTO_SERVER_RESTART enabled and disabled
> >> >> with CKPTINVL set as low as 60 seconds and as high as 10 minutes.

>
> >> >> Getting the same results with lru_min and lru_max set as low as 5/10
> >> >> and
> >> >> as high as 70/80.

>
> >> >> Andrew

>
> >> > I've no particualr experience on this version, but just one thing I've
> >> > seen before in others: is the dbspace to which the indexes being
> >> > written
> >> > very fragmented: ie could IDS be spending time scanning for free space?

>
> >> > _______________________________________________
> >> > Informix-list mailing list
> >> > Informix-l...@iiug.org
> >> >http://www.iiug.org/mailman/listinfo/informix-list

>
> > What does onstat -g ckp show for the long checkpoints? I supposed
> > secondly if you are using interval checkpoints then people might need
> > to get used to seeing long checkpoint durations since I believe the
> > checkpoint duration includes

>
> ...
>
> read more »


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 05-16-2008, 01:44 PM
Andrew Ford
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -RedHatEL5


From: <jprenaut@yahoo.com>

> If you are always catching the flush_sub threads with this stack, then
> they have submitted an I/O request to the KAIO threads


I thought it would be interesting if I disabled KAIO and went with AIOVPs.

Totally different behaviour with the AIOVPs.

Scanning the data goes well enough, 20K pages per second. When that is done
I see simultaneous reads from the index dbspace and writes to the physical
log at a rate of 200 to 1000 pages per second each with almost exactly 1
read for every 1 write, the physical log and index dbspace are on different
physical disks. I don't see this physical logging with KAIO enabled. LRU
flushing kicks in from time to time and some pages are written to the index
dbspace.

This goes on for a while until I force a checkpoint. At this point the
index reads and physical log writes stop, the LRU queues become very dirty
and LRU flushing starts writing pages to the index dbspace at a rate of 2000
to 5000 pages per second.

When index pages stop being written to the index dbspace we start reading
from the index dbspace again, very slowly at times. Sometimes as slow as 20
pages per second (no additional I/O going on.) Something strange also
happens, LRU flushing stops completely even though there are multiple LRU
queues that are 35% dirty (lru_min is 10, lru_max is 20.)

The index dbspace reads continue until all of the index pages are read and
the create index statement completes.

Does using AIOVPs cause different behaviour when creating indexes than KAIO?
It looks like it does, but should it?

Everything else with the engine I have tested works great and this isn't
really a show stopper for us since we don't rebuild these large indexes
unless we're migrating to new hardware, but it is strange.

Andrew


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 05-16-2008, 01:44 PM
TBP
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -RedHatEL5

Andrew Ford wrote:
> From: <jprenaut@yahoo.com>
>
>>If you are always catching the flush_sub threads with this stack, then
>>they have submitted an I/O request to the KAIO threads

>
>
> I thought it would be interesting if I disabled KAIO and went with AIOVPs.
>
> Totally different behaviour with the AIOVPs.
>
> Scanning the data goes well enough, 20K pages per second. When that is done
> I see simultaneous reads from the index dbspace and writes to the physical
> log at a rate of 200 to 1000 pages per second each with almost exactly 1
> read for every 1 write, the physical log and index dbspace are on different
> physical disks. I don't see this physical logging with KAIO enabled. LRU
> flushing kicks in from time to time and some pages are written to the index
> dbspace.
>
> This goes on for a while until I force a checkpoint. At this point the
> index reads and physical log writes stop, the LRU queues become very dirty
> and LRU flushing starts writing pages to the index dbspace at a rate of 2000
> to 5000 pages per second.
>
> When index pages stop being written to the index dbspace we start reading
> from the index dbspace again, very slowly at times. Sometimes as slow as 20
> pages per second (no additional I/O going on.) Something strange also
> happens, LRU flushing stops completely even though there are multiple LRU
> queues that are 35% dirty (lru_min is 10, lru_max is 20.)
>
> The index dbspace reads continue until all of the index pages are read and
> the create index statement completes.
>
> Does using AIOVPs cause different behaviour when creating indexes than KAIO?
> It looks like it does, but should it?
>
> Everything else with the engine I have tested works great and this isn't
> really a show stopper for us since we don't rebuild these large indexes
> unless we're migrating to new hardware, but it is strange.
>
> Andrew
>
>

How many CPU VPs?
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 05-16-2008, 01:44 PM
jprenaut@yahoo.com
 
Posts: n/a
Default Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -RedHatEL5

On May 14, 1:21 am, "Andrew Ford" <af...@networkip.net> wrote:
> From: <jpren...@yahoo.com>
>
> > If you are always catching the flush_sub threads with this stack, then
> > they have submitted an I/O request to the KAIO threads

>
> I thought it would be interesting if I disabled KAIO and went with AIOVPs.
>
> Totally different behaviour with the AIOVPs.
>
> Scanning the data goes well enough, 20K pages per second. When that is done
> I see simultaneous reads from the index dbspace and writes to the physical
> log at a rate of 200 to 1000 pages per second each with almost exactly 1
> read for every 1 write, the physical log and index dbspace are on different
> physical disks. I don't see this physical logging with KAIO enabled. LRU
> flushing kicks in from time to time and some pages are written to the index
> dbspace.


Have you dropped the index to rebuild it, or just building a new
similiar index? If you dropped the index, the new physical logging
could be because the engine is now seeing that the pages it's using
for the new index used to be valid previously existing pages so it has
to now physically log them. When you first create a new chunk for a
dbspace the engine doesn't go out and initialize all the page headers,
so then the 1st time one of those pages is used, we bypass physical
logging since the page doesn't make any sense, so we know that the
page doens't need to be logged because there wasn't anything of value
there before. So that's a possibility on the new physical log
activity. Also, I think there is some performance code in the area
that we don't need to physicallly log a dbspace until a table/
partition is created in the dbspace or maybe a table/partition was
dropped. I can't recall that off the top of my head.

>
> This goes on for a while until I force a checkpoint. At this point the
> index reads and physical log writes stop, the LRU queues become very dirty
> and LRU flushing starts writing pages to the index dbspace at a rate of 2000
> to 5000 pages per second.
>
> When index pages stop being written to the index dbspace we start reading
> from the index dbspace again, very slowly at times. Sometimes as slow as 20
> pages per second (no additional I/O going on.) Something strange also
> happens, LRU flushing stops completely even though there are multiple LRU
> queues that are 35% dirty (lru_min is 10, lru_max is 20.)
>
> The index dbspace reads continue until all of the index pages are read and
> the create index statement completes.
>
> Does using AIOVPs cause different behaviour when creating indexes than KAIO?
> It looks like it does, but should it?
>


It shouldn't. The only difference between AIO and KAIO should be
which threads are doing the work, and the specifics on how each of
those threads does it's work.


> Everything else with the engine I have tested works great and this isn't
> really a show stopper for us since we don't rebuild these large indexes
> unless we're migrating to new hardware, but it is strange.
>
> Andrew


I agree something is off here. It might be worth opening a PMR with
support for this as it sounds like there could be a problem.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump