05-16-2008, 02:44 PM
jprenaut@yahoo.com
Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 - RedHatEL5

On May 13, 5:31 pm, "Andrew Ford" <af...@networkip.net> wrote:
> From: "Neil Truby" <neil.tr...@ardenta.com>
>
> > is the dbspace to which the indexes are being written very fragmented,
> > i.e. could IDS be spending time scanning for free space?

>
> I can guarantee the dbspace is not fragmented. The dbspace is empty before
> the index is created, the table extent sizes are 1GB, so my index extents
> should be large enough that allocating many small index extents is not
> the problem either.
>
> It looks to me, and I could be way way off here, that the non blocking
> checkpoint is attempting to flush the dirty pages to disk and the LRU
> cleaners are still trying to do their business and they are involved in a
> battle royale for resources because we're flooding the buffer cache with
> these 3 million new index pages we just created and nobody wins.
>
> I am using large (to me anyhow) chunks, 100 GB for the data dbspace and 100
> GB for the index dbspace and I do have a decent number of buffers (over
> 500,000 2K buffers = 1 GB). Maybe something is not optimized and IDS is
> spinning its wheels trying to sort something before it dumps it to disk.
>
> I also see the main_loop() thread relatively often in the onstat -g act
> output during these silly long checkpoints. This is the checkpoint trying
> to do its thing, correct?
>
> The search continues...
>
> Thanks,
>
> Andrew
>
> ----- Original Message -----
> From: "Neil Truby" <neil.tr...@ardenta.com>
>
> Newsgroups: comp.databases.informix
> To: <informix-l...@iiug.org>
> Sent: Tuesday, May 13, 2008 5:06 PM
> Subject: Re: Create Index Mysteriously Slow at the End - IDS 11.10.UC2W2 -
> RedHatEL5
>
> > "Andrew Ford" <af...@networkip.net> wrote in message
> >news:mailman.1093.1210707144.20610.informix-list@iiug.org...
> >> I'm running into some very strange high checkpoints and really poor
> >> performance towards the end of an index build. Can anyone shed some
> >> light on what could be going on?

>
> >> Things start off really great. Scan threads, xchng threads and psortproc
> >> threads hum along just fine writing out my temporary files to
> >> PSORT_DBTEMP. But when it comes time to actually put my new index pages
> >> on disk things start getting weird. I start experiencing very long
> >> checkpoints (700 seconds followed by 350 seconds). I realize that these
> >> are non blocking checkpoints and I shouldn't be concerned but read/write
> >> performance goes right into the terlet. Before the checkpoints I'm
> >> reading about 30K pages a second and writing 10K pages a second, during
> >> and after the checkpoints I'm reading about 2K pages a second and writing
> >> 200 pages a second.

>
> >> So the really complicated stuff that I would expect to take the longest
> >> time (scanning the data pages and doing all of the sorting) completes in
> >> about 10 minutes while the easier stuff that I would expect to go really
> >> quickly (writing the index pages to disk) is taking 40 minutes.

>
> >> IDS 11.10.UC2W2 - Red Hat EL5

>
> >> 115 million rows - Unique index key size = 40 bytes

>
> >> 524288 2K buffers, 127 LRUs, 127 Cleaners

>
> >> Getting the same results with RTO_SERVER_RESTART enabled and disabled
> >> with CKPTINVL set as low as 60 seconds and as high as 10 minutes.

>
> >> Getting the same results with lru_min and lru_max set as low as 5/10 and
> >> as high as 70/80.

>
> >> Andrew

>
> > I've no particular experience on this version, but just one thing I've
> > seen before in others: is the dbspace to which the indexes are being
> > written very fragmented, i.e. could IDS be spending time scanning for
> > free space?

>
> > _______________________________________________
> > Informix-list mailing list
> > Informix-l...@iiug.org
> >http://www.iiug.org/mailman/listinfo/informix-list
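
To put the quoted numbers side by side, here is a rough back-of-the-envelope sketch. It uses only the figures given in the posts above (buffer count, page size, the "3 million" index page estimate, and the before/after page rates); the variable and function names are made up for illustration, and nothing here is measured independently.

```python
# Figures quoted in the thread; the names below are illustrative only.
PAGE_SIZE_BYTES = 2 * 1024   # "2K buffers"
BUFFERS = 524_288            # buffer count from the original post
INDEX_PAGES = 3_000_000      # "3 million new index pages" (the poster's estimate)


def slowdown(before: float, after: float) -> float:
    """Return how many times slower the 'after' rate is than the 'before' rate."""
    return before / after


buffer_pool_gib = BUFFERS * PAGE_SIZE_BYTES / 2**30
index_to_pool_ratio = INDEX_PAGES / BUFFERS
read_slowdown = slowdown(30_000, 2_000)   # pages/sec before vs. during checkpoints
write_slowdown = slowdown(10_000, 200)

print(f"buffer pool: {buffer_pool_gib:.2f} GiB")
print(f"index pages / buffers: {index_to_pool_ratio:.1f}x")
print(f"reads {read_slowdown:.0f}x slower, writes {write_slowdown:.0f}x slower")
```

So the new index is several times larger than the whole buffer pool, and the write rate falls off much more sharply than the read rate.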


What does onstat -g ckp show for the long checkpoints? Secondly, I
suppose that if you are using interval checkpoints, people may need to
get used to seeing long checkpoint durations, since I believe the
checkpoint duration includes the non-blocked time when the buffer pool
is being written.

How are you measuring the reads and writes, and where are they
occurring: on the disk containing the file system that PSORT_DBTEMP
points to, or on the disks where your chunks are?

If you think it's contention between the LRU flushers and the
checkpoint, well, I wouldn't expect that to happen, since the LRU
cleaners stop cleaning LRUs when the checkpoint comes up and are then
assigned chunks to clean during the checkpoint (that's the way we used
to do it; I haven't looked at 11 to see whether there were any
significant changes to this code).

I'm wondering if the problem is that all of this index is going to 1
chunk of 1 dbspace. In the past, with a chunk limit of 2 GB, you would
have had multiple page cleaners working in parallel to clean multiple
chunks. If your system now has just 1 chunk, then I think the code
would still assign only 1 LRU cleaner to clean all the pages in that
chunk, so you aren't getting nearly the amount of parallel processing
you did before.

During the checkpoint, if you look at onstat -g act, how many flush_sub
threads (those are the LRU cleaners) are running? Also, in onstat -F,
how many page cleaners do you see assigned to chunk cleaning? Again, if
I remember correctly off the top of my head, that would be a "state" of
"C".
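
If it helps, the two counts asked about above could be pulled out of captured onstat output with something like the sketch below. This is only a sketch under assumptions: the helper names are hypothetical, and it assumes that flush_sub threads appear by name in the onstat -g act output and that the onstat -F output has a header row containing the word "state" followed by whitespace-separated rows. Check both against the real output on your version before trusting the numbers.

```python
import subprocess


def run_onstat(*args: str) -> str:
    """Run onstat with the given arguments and return its text output."""
    result = subprocess.run(
        ["onstat", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def count_flush_sub(act_output: str) -> int:
    """Count lines of 'onstat -g act' output that mention a flush_sub thread."""
    return sum(1 for line in act_output.splitlines() if "flush_sub" in line)


def count_chunk_cleaners(f_output: str) -> int:
    """Count page cleaners whose 'state' column reads C in 'onstat -F' output.

    Assumption: a header row contains the word 'state', and the rows that
    follow are whitespace-separated with the state in the same position.
    """
    state_col = None
    count = 0
    for line in f_output.splitlines():
        fields = line.split()
        if state_col is None:
            # Still looking for the header row that names the columns.
            if "state" in fields:
                state_col = fields.index("state")
            continue
        if len(fields) > state_col and fields[state_col] == "C":
            count += 1
    return count


def report() -> None:
    """Print both counts; call this while a long checkpoint is in progress."""
    print("flush_sub threads active:", count_flush_sub(run_onstat("-g", "act")))
    print("cleaners in state C:", count_chunk_cleaners(run_onstat("-F")))
```

Calling report() a few times during one of the long checkpoints should show whether more than one cleaner is ever doing chunk writes at the same time.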

Jacques