Unix Technical Forum

10gR1 - Checkpoint not complete -

This is a discussion on 10gR1 - Checkpoint not complete - within the Oracle Database forums, part of the Database Server Software category; --> Hey guys, I thought I try to get some help in trying to get this problem resolved. I've struggled ...


Go Back   Unix Technical Forum > Database Server Software > Oracle Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-25-2008, 09:09 AM
p.santos000@gmail.com
 
Posts: n/a
Default 10gR1 - Checkpoint not complete -

Hey guys,
I thought I try to get some help in trying to get this problem
resolved.
I've struggled with the 'Checkpoint not complete' error message
before, and thought I had it
resolved, but I can't seem to make it go way.

I'm currently running 10.1.0.4 on Solaris 64bit.
From my analysis of this issue, I get the "Checkpoint not complete"
message nearly evertime
during a log switch. From looking at the checkpoint error timestamps
and the log switches,
99% of the time the CKPT process fails during the log swith.

I've always been reluctant to checkpoint too much, but maybe I need
to increase checkpoints,
sot that the entire process is faster during log switches ?????

Here is the data. (log siwtches are occurring around 20+ minutes)

V$LOG
=================

GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
|LASTED_MIN

----------|----------|----------|---|---------------|--------------|----------
1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
21
2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
26
3| 32200| 600|NO |CURRENT |10/27/06 10:11|
6
4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
27
5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
26
6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
40
7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
29
8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
10


ALERT_LOG (timestamps match log swith)
=================
--- These all happen during log switch...

Fri Oct 27 08:05:31 2006
Thread 1 cannot allocate new log, sequence 32195
Checkpoint not complete
Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log


Fri Oct 27 08:45:18 2006
Thread 1 cannot allocate new log, sequence 32196
Checkpoint not complete
Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log


Fri Oct 27 09:24:21 2006
Thread 1 cannot allocate new log, sequence 32198
Checkpoint not complete
Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log


Fri Oct 27 09:44:59 2006
Thread 1 cannot allocate new log, sequence 32199
Checkpoint not complete
Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log


Fri Oct 27 10:11:13 2006
Thread 1 cannot allocate new log, sequence 32200
Checkpoint not complete
Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log


KEY PARAMETERS
=====================

NAME |TYPE |VALUE
-------------------------------------------|----------------|--------
log_checkpoint_interval |integer |80000
log_checkpoint_timeout |integer |1800
fast_start_mttr_target |integer |0
log_buffer |integer |524288
disk_asynch_io |boolean |TRUE

** Haven't configured fast_start_mttr_target because log_checK*
parameters
were already set when I upgrade from 8.1.7 and so It was just one less
thing to
worry about configuring...

AWR report from 9-10 AM this morning
========================
Statistic |Total
----------------------------------------------------------- | ......
DBWR checkpoints | 4
background checkpoints completed | 3
background checkpoints started | 2

- the way I see it I'm probably not checkpointing enough .. which
causes the ckpt
process during log switches to take too long .. is this accurate ?

BTW - fixing application to generate less redo, has been explored and
we can't do any more on that front .... We generate anywhere between
200K - 1MB of redo per second ... averaging out around 600-700K per
second.

as always any feedback is helpful...

-peter

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-25-2008, 09:09 AM
Mark D Powell
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -



On Oct 27, 10:49 am, p.santos...@gmail.com wrote:
> Hey guys,
> I thought I try to get some help in trying to get this problem
> resolved.
> I've struggled with the 'Checkpoint not complete' error message
> before, and thought I had it
> resolved, but I can't seem to make it go way.
>
> I'm currently running 10.1.0.4 on Solaris 64bit.
> From my analysis of this issue, I get the "Checkpoint not complete"
> message nearly evertime
> during a log switch. From looking at the checkpoint error timestamps
> and the log switches,
> 99% of the time the CKPT process fails during the log swith.
>
> I've always been reluctant to checkpoint too much, but maybe I need
> to increase checkpoints,
> sot that the entire process is faster during log switches ?????
>
> Here is the data. (log siwtches are occurring around 20+ minutes)
>
> V$LOG
> =================
>
> GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
> |LASTED_MIN
>
> ----------|----------|----------|---|---------------|--------------|-------*---
> 1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
> 21
> 2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
> 26
> 3| 32200| 600|NO |CURRENT |10/27/06 10:11|
> 6
> 4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
> 27
> 5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
> 26
> 6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
> 40
> 7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
> 29
> 8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
> 10
>
> ALERT_LOG (timestamps match log swith)
> =================
> --- These all happen during log switch...
>
> Fri Oct 27 08:05:31 2006
> Thread 1 cannot allocate new log, sequence 32195
> Checkpoint not complete
> Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
>
> Fri Oct 27 08:45:18 2006
> Thread 1 cannot allocate new log, sequence 32196
> Checkpoint not complete
> Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
>
> Fri Oct 27 09:24:21 2006
> Thread 1 cannot allocate new log, sequence 32198
> Checkpoint not complete
> Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
>
> Fri Oct 27 09:44:59 2006
> Thread 1 cannot allocate new log, sequence 32199
> Checkpoint not complete
> Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
>
> Fri Oct 27 10:11:13 2006
> Thread 1 cannot allocate new log, sequence 32200
> Checkpoint not complete
> Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
>
> KEY PARAMETERS
> =====================
>
> NAME |TYPE |VALUE
> -------------------------------------------|----------------|--------
> log_checkpoint_interval |integer |80000
> log_checkpoint_timeout |integer |1800
> fast_start_mttr_target |integer |0
> log_buffer |integer |524288
> disk_asynch_io |boolean |TRUE
>
> ** Haven't configured fast_start_mttr_target because log_checK*
> parameters
> were already set when I upgrade from 8.1.7 and so It was just one less
> thing to
> worry about configuring...
>
> AWR report from 9-10 AM this morning
> ========================
> Statistic |Total
> ----------------------------------------------------------- | ......
> DBWR checkpoints | 4
> background checkpoints completed | 3
> background checkpoints started | 2
>
> - the way I see it I'm probably not checkpointing enough .. which
> causes the ckpt
> process during log switches to take too long .. is this accurate ?
>
> BTW - fixing application to generate less redo, has been explored and
> we can't do any more on that front .... We generate anywhere between
> 200K - 1MB of redo per second ... averaging out around 600-700K per
> second.
>
> as always any feedback is helpful...
>
> -peter


Peter, if I am reading your post correctly your online redo logs are
600M. If the log block was meansured in 8k database blocks then you
have set the interval to about 625M but if the OS block size behind
this parameter is only 512 bytes then you have set it to around 40M. I
suggest setting the value up and see if it makes any difference.

>> LOG_CHECKPOINT_INTERVAL specifies the frequency of checkpoints in terms of the number of redo log file blocks that can exist between an incrementalcheckpoint and the last block written to the redo log. This number refers to physical operating system blocks, not database blocks. <<


HTH -- Mark D Powell --

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-25-2008, 09:09 AM
hpuxrac
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -

p.santos000@gmail.com wrote:
> Hey guys,
> I thought I try to get some help in trying to get this problem
> resolved.
> I've struggled with the 'Checkpoint not complete' error message
> before, and thought I had it
> resolved, but I can't seem to make it go way.
>
> I'm currently running 10.1.0.4 on Solaris 64bit.
> From my analysis of this issue, I get the "Checkpoint not complete"
> message nearly evertime
> during a log switch. From looking at the checkpoint error timestamps
> and the log switches,
> 99% of the time the CKPT process fails during the log swith.
>
> I've always been reluctant to checkpoint too much, but maybe I need
> to increase checkpoints,
> sot that the entire process is faster during log switches ?????
>
> Here is the data. (log siwtches are occurring around 20+ minutes)
>
> V$LOG
> =================
>
> GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
> |LASTED_MIN
>
> ----------|----------|----------|---|---------------|--------------|----------
> 1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
> 21
> 2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
> 26
> 3| 32200| 600|NO |CURRENT |10/27/06 10:11|
> 6
> 4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
> 27
> 5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
> 26
> 6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
> 40
> 7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
> 29
> 8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
> 10
>
>
> ALERT_LOG (timestamps match log swith)
> =================
> --- These all happen during log switch...
>
> Fri Oct 27 08:05:31 2006
> Thread 1 cannot allocate new log, sequence 32195
> Checkpoint not complete
> Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
>
>
> Fri Oct 27 08:45:18 2006
> Thread 1 cannot allocate new log, sequence 32196
> Checkpoint not complete
> Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
>
>
> Fri Oct 27 09:24:21 2006
> Thread 1 cannot allocate new log, sequence 32198
> Checkpoint not complete
> Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
>
>
> Fri Oct 27 09:44:59 2006
> Thread 1 cannot allocate new log, sequence 32199
> Checkpoint not complete
> Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
>
>
> Fri Oct 27 10:11:13 2006
> Thread 1 cannot allocate new log, sequence 32200
> Checkpoint not complete
> Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
>
>
> KEY PARAMETERS
> =====================
>
> NAME |TYPE |VALUE
> -------------------------------------------|----------------|--------
> log_checkpoint_interval |integer |80000
> log_checkpoint_timeout |integer |1800
> fast_start_mttr_target |integer |0
> log_buffer |integer |524288
> disk_asynch_io |boolean |TRUE
>
> ** Haven't configured fast_start_mttr_target because log_checK*
> parameters
> were already set when I upgrade from 8.1.7 and so It was just one less
> thing to
> worry about configuring...
>
> AWR report from 9-10 AM this morning
> ========================
> Statistic |Total
> ----------------------------------------------------------- | ......
> DBWR checkpoints | 4
> background checkpoints completed | 3
> background checkpoints started | 2
>
> - the way I see it I'm probably not checkpointing enough .. which
> causes the ckpt
> process during log switches to take too long .. is this accurate ?
>
> BTW - fixing application to generate less redo, has been explored and
> we can't do any more on that front .... We generate anywhere between
> 200K - 1MB of redo per second ... averaging out around 600-700K per
> second.
>
> as always any feedback is helpful...


Have you looked at the docs and questions Tom Kyte has produced in
regards to this at http://asktom.oracle.com yet?

There are several good discussions search using " checkpoint not
complete ".

Tom writes it up much better than my simple take on it, but the typical
tradeoff's to get away from this situation invovle these
considerations: 1) adding more logs 2) taking checkpoints more
frequently and 3) increasing the size of your logs.

Read Tom's description for more relevant details.

Or of course you can look at oracle support on metalink.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-25-2008, 09:09 AM
joel garry
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -


p.santos000@gmail.com wrote:
> Hey guys,
> I thought I try to get some help in trying to get this problem
> resolved.
> I've struggled with the 'Checkpoint not complete' error message
> before, and thought I had it
> resolved, but I can't seem to make it go way.
>
> I'm currently running 10.1.0.4 on Solaris 64bit.
> From my analysis of this issue, I get the "Checkpoint not complete"
> message nearly evertime
> during a log switch. From looking at the checkpoint error timestamps
> and the log switches,
> 99% of the time the CKPT process fails during the log swith.
>
> I've always been reluctant to checkpoint too much, but maybe I need
> to increase checkpoints,
> sot that the entire process is faster during log switches ?????
>
> Here is the data. (log siwtches are occurring around 20+ minutes)
>
> V$LOG
> =================
>
> GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
> |LASTED_MIN
>
> ----------|----------|----------|---|---------------|--------------|----------
> 1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
> 21
> 2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
> 26
> 3| 32200| 600|NO |CURRENT |10/27/06 10:11|
> 6
> 4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
> 27
> 5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
> 26
> 6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
> 40
> 7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
> 29
> 8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
> 10
>
>
> ALERT_LOG (timestamps match log swith)
> =================
> --- These all happen during log switch...
>
> Fri Oct 27 08:05:31 2006
> Thread 1 cannot allocate new log, sequence 32195
> Checkpoint not complete
> Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
>
>
> Fri Oct 27 08:45:18 2006
> Thread 1 cannot allocate new log, sequence 32196
> Checkpoint not complete
> Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
>
>
> Fri Oct 27 09:24:21 2006
> Thread 1 cannot allocate new log, sequence 32198
> Checkpoint not complete
> Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
>
>
> Fri Oct 27 09:44:59 2006
> Thread 1 cannot allocate new log, sequence 32199
> Checkpoint not complete
> Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
>
>
> Fri Oct 27 10:11:13 2006
> Thread 1 cannot allocate new log, sequence 32200
> Checkpoint not complete
> Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
>
>
> KEY PARAMETERS
> =====================
>
> NAME |TYPE |VALUE
> -------------------------------------------|----------------|--------
> log_checkpoint_interval |integer |80000
> log_checkpoint_timeout |integer |1800
> fast_start_mttr_target |integer |0
> log_buffer |integer |524288
> disk_asynch_io |boolean |TRUE
>
> ** Haven't configured fast_start_mttr_target because log_checK*
> parameters
> were already set when I upgrade from 8.1.7 and so It was just one less
> thing to
> worry about configuring...


Are you saying you used 817 parameters for 10g? I hope I'm misreading
that. See the last sentence in
http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html

In addition to what Mark said, you should post what your top waits are,
and show SGA. I expect some issues on the size of your SGA and your
log_buffer.

Also you may want to research the use of async i/o on your particular
system.

>
> AWR report from 9-10 AM this morning
> ========================
> Statistic |Total
> ----------------------------------------------------------- | ......
> DBWR checkpoints | 4
> background checkpoints completed | 3
> background checkpoints started | 2
>
> - the way I see it I'm probably not checkpointing enough .. which
> causes the ckpt
> process during log switches to take too long .. is this accurate ?
>
> BTW - fixing application to generate less redo, has been explored and
> we can't do any more on that front .... We generate anywhere between
> 200K - 1MB of redo per second ... averaging out around 600-700K per
> second.
>
> as always any feedback is helpful...


You may need more and smaller logs with a bigger log buffer.

45 minutes for a checkpoint not to complete is rather worrisome. Your
application code may be doing something that keeps fighting over dirty
buffers - do you have something that repeatedly updates a small number
of rows?

jg
--
@home.com is bogus.
So which DS should I buy the kids?
http://www.signonsandiego.com/uniont..._1b27sony.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-25-2008, 09:09 AM
joel garry
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -


p.santos000@gmail.com wrote:
> Hey guys,
> I thought I try to get some help in trying to get this problem
> resolved.
> I've struggled with the 'Checkpoint not complete' error message
> before, and thought I had it
> resolved, but I can't seem to make it go way.
>
> I'm currently running 10.1.0.4 on Solaris 64bit.
> From my analysis of this issue, I get the "Checkpoint not complete"
> message nearly evertime
> during a log switch. From looking at the checkpoint error timestamps
> and the log switches,
> 99% of the time the CKPT process fails during the log swith.
>
> I've always been reluctant to checkpoint too much, but maybe I need
> to increase checkpoints,
> sot that the entire process is faster during log switches ?????
>
> Here is the data. (log siwtches are occurring around 20+ minutes)
>
> V$LOG
> =================
>
> GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
> |LASTED_MIN
>
> ----------|----------|----------|---|---------------|--------------|----------
> 1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
> 21
> 2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
> 26
> 3| 32200| 600|NO |CURRENT |10/27/06 10:11|
> 6
> 4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
> 27
> 5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
> 26
> 6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
> 40
> 7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
> 29
> 8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
> 10
>
>
> ALERT_LOG (timestamps match log swith)
> =================
> --- These all happen during log switch...
>
> Fri Oct 27 08:05:31 2006
> Thread 1 cannot allocate new log, sequence 32195
> Checkpoint not complete
> Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
>
>
> Fri Oct 27 08:45:18 2006
> Thread 1 cannot allocate new log, sequence 32196
> Checkpoint not complete
> Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
>
>
> Fri Oct 27 09:24:21 2006
> Thread 1 cannot allocate new log, sequence 32198
> Checkpoint not complete
> Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
>
>
> Fri Oct 27 09:44:59 2006
> Thread 1 cannot allocate new log, sequence 32199
> Checkpoint not complete
> Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
>
>
> Fri Oct 27 10:11:13 2006
> Thread 1 cannot allocate new log, sequence 32200
> Checkpoint not complete
> Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
>
>
> KEY PARAMETERS
> =====================
>
> NAME |TYPE |VALUE
> -------------------------------------------|----------------|--------
> log_checkpoint_interval |integer |80000
> log_checkpoint_timeout |integer |1800
> fast_start_mttr_target |integer |0
> log_buffer |integer |524288
> disk_asynch_io |boolean |TRUE
>
> ** Haven't configured fast_start_mttr_target because log_checK*
> parameters
> were already set when I upgrade from 8.1.7 and so It was just one less
> thing to
> worry about configuring...


Are you saying you used 817 parameters for 10g? I hope I'm misreading
that. See the last sentence in
http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html

In addition to what Mark said, you should post what your top waits are,
and show SGA. I expect some issues on the size of your SGA and your
log_buffer.

Also you may want to research the use of async i/o on your particular
system.

>
> AWR report from 9-10 AM this morning
> ========================
> Statistic |Total
> ----------------------------------------------------------- | ......
> DBWR checkpoints | 4
> background checkpoints completed | 3
> background checkpoints started | 2
>
> - the way I see it I'm probably not checkpointing enough .. which
> causes the ckpt
> process during log switches to take too long .. is this accurate ?
>
> BTW - fixing application to generate less redo, has been explored and
> we can't do any more on that front .... We generate anywhere between
> 200K - 1MB of redo per second ... averaging out around 600-700K per
> second.
>
> as always any feedback is helpful...


You may need more and smaller logs with a bigger log buffer.

45 minutes for a checkpoint not to complete is rather worrisome. Your
application code may be doing something that keeps fighting over dirty
buffers - do you have something that repeatedly updates a small number
of rows?

jg
--
@home.com is bogus.
So which DS should I buy the kids?
http://www.signonsandiego.com/uniont..._1b27sony.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-25-2008, 09:09 AM
joel garry
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -


p.santos000@gmail.com wrote:
> Hey guys,
> I thought I try to get some help in trying to get this problem
> resolved.
> I've struggled with the 'Checkpoint not complete' error message
> before, and thought I had it
> resolved, but I can't seem to make it go way.
>
> I'm currently running 10.1.0.4 on Solaris 64bit.
> From my analysis of this issue, I get the "Checkpoint not complete"
> message nearly evertime
> during a log switch. From looking at the checkpoint error timestamps
> and the log switches,
> 99% of the time the CKPT process fails during the log swith.
>
> I've always been reluctant to checkpoint too much, but maybe I need
> to increase checkpoints,
> sot that the entire process is faster during log switches ?????
>
> Here is the data. (log siwtches are occurring around 20+ minutes)
>
> V$LOG
> =================
>
> GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
> |LASTED_MIN
>
> ----------|----------|----------|---|---------------|--------------|----------
> 1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
> 21
> 2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
> 26
> 3| 32200| 600|NO |CURRENT |10/27/06 10:11|
> 6
> 4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
> 27
> 5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
> 26
> 6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
> 40
> 7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
> 29
> 8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
> 10
>
>
> ALERT_LOG (timestamps match log swith)
> =================
> --- These all happen during log switch...
>
> Fri Oct 27 08:05:31 2006
> Thread 1 cannot allocate new log, sequence 32195
> Checkpoint not complete
> Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
>
>
> Fri Oct 27 08:45:18 2006
> Thread 1 cannot allocate new log, sequence 32196
> Checkpoint not complete
> Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
>
>
> Fri Oct 27 09:24:21 2006
> Thread 1 cannot allocate new log, sequence 32198
> Checkpoint not complete
> Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
>
>
> Fri Oct 27 09:44:59 2006
> Thread 1 cannot allocate new log, sequence 32199
> Checkpoint not complete
> Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
>
>
> Fri Oct 27 10:11:13 2006
> Thread 1 cannot allocate new log, sequence 32200
> Checkpoint not complete
> Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
>
>
> KEY PARAMETERS
> =====================
>
> NAME |TYPE |VALUE
> -------------------------------------------|----------------|--------
> log_checkpoint_interval |integer |80000
> log_checkpoint_timeout |integer |1800
> fast_start_mttr_target |integer |0
> log_buffer |integer |524288
> disk_asynch_io |boolean |TRUE
>
> ** Haven't configured fast_start_mttr_target because log_checK*
> parameters
> were already set when I upgrade from 8.1.7 and so It was just one less
> thing to
> worry about configuring...


Are you saying you used 817 parameters for 10g? I hope I'm misreading
that. See the last sentence in
http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html

In addition to what Mark said, you should post what your top waits are,
and show SGA. I expect some issues on the size of your SGA and your
log_buffer.

Also you may want to research the use of async i/o on your particular
system.

>
> AWR report from 9-10 AM this morning
> ========================
> Statistic |Total
> ----------------------------------------------------------- | ......
> DBWR checkpoints | 4
> background checkpoints completed | 3
> background checkpoints started | 2
>
> - the way I see it I'm probably not checkpointing enough .. which
> causes the ckpt
> process during log switches to take too long .. is this accurate ?
>
> BTW - fixing application to generate less redo, has been explored and
> we can't do any more on that front .... We generate anywhere between
> 200K - 1MB of redo per second ... averaging out around 600-700K per
> second.
>
> as always any feedback is helpful...


You may need more and smaller logs with a bigger log buffer.

45 minutes for a checkpoint not to complete is rather worrisome. Your
application code may be doing something that keeps fighting over dirty
buffers - do you have something that repeatedly updates a small number
of rows?

jg
--
@home.com is bogus.
So which DS should I buy the kids?
http://www.signonsandiego.com/uniont..._1b27sony.html

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-25-2008, 09:09 AM
p.santos000@gmail.com
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -

Mark,
Your right my logs are 600M and the 'Checkpoint not complete' message
only appears during log switches. What I get from this is that a
checkpoint was in process, when
the log switch was initiated right ???

If that's the case, then I do want to checkpoint more often, so that
it completes
faster right ?

Won't increasing LOG_CHECKPOINT_INTERVAL cause less checkpoints ?

Now I think I'm getting confused...

thanks for the feedback!
-peter



Mark D Powell wrote:
> On Oct 27, 10:49 am, p.santos...@gmail.com wrote:
> > Hey guys,
> > I thought I try to get some help in trying to get this problem
> > resolved.
> > I've struggled with the 'Checkpoint not complete' error message
> > before, and thought I had it
> > resolved, but I can't seem to make it go way.
> >
> > I'm currently running 10.1.0.4 on Solaris 64bit.
> > From my analysis of this issue, I get the "Checkpoint not complete"
> > message nearly evertime
> > during a log switch. From looking at the checkpoint error timestamps
> > and the log switches,
> > 99% of the time the CKPT process fails during the log swith.
> >
> > I've always been reluctant to checkpoint too much, but maybe I need
> > to increase checkpoints,
> > sot that the entire process is faster during log switches ?????
> >
> > Here is the data. (log siwtches are occurring around 20+ minutes)
> >
> > V$LOG
> > =================
> >
> > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
> > |LASTED_MIN
> >
> > ----------|----------|----------|---|---------------|--------------|-------*---
> > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
> > 21
> > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
> > 26
> > 3| 32200| 600|NO |CURRENT |10/27/06 10:11|
> > 6
> > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
> > 27
> > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
> > 26
> > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
> > 40
> > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
> > 29
> > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
> > 10
> >
> > ALERT_LOG (timestamps match log swith)
> > =================
> > --- These all happen during log switch...
> >
> > Fri Oct 27 08:05:31 2006
> > Thread 1 cannot allocate new log, sequence 32195
> > Checkpoint not complete
> > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
> >
> > Fri Oct 27 08:45:18 2006
> > Thread 1 cannot allocate new log, sequence 32196
> > Checkpoint not complete
> > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
> >
> > Fri Oct 27 09:24:21 2006
> > Thread 1 cannot allocate new log, sequence 32198
> > Checkpoint not complete
> > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
> >
> > Fri Oct 27 09:44:59 2006
> > Thread 1 cannot allocate new log, sequence 32199
> > Checkpoint not complete
> > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
> >
> > Fri Oct 27 10:11:13 2006
> > Thread 1 cannot allocate new log, sequence 32200
> > Checkpoint not complete
> > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
> >
> > KEY PARAMETERS
> > =====================
> >
> > NAME |TYPE |VALUE
> > -------------------------------------------|----------------|--------
> > log_checkpoint_interval |integer |80000
> > log_checkpoint_timeout |integer |1800
> > fast_start_mttr_target |integer |0
> > log_buffer |integer |524288
> > disk_asynch_io |boolean |TRUE
> >
> > ** Haven't configured fast_start_mttr_target because log_checK*
> > parameters
> > were already set when I upgrade from 8.1.7 and so It was just one less
> > thing to
> > worry about configuring...
> >
> > AWR report from 9-10 AM this morning
> > ========================
> > Statistic |Total
> > ----------------------------------------------------------- | ......
> > DBWR checkpoints | 4
> > background checkpoints completed | 3
> > background checkpoints started | 2
> >
> > - the way I see it I'm probably not checkpointing enough .. which
> > causes the ckpt
> > process during log switches to take too long .. is this accurate ?
> >
> > BTW - fixing application to generate less redo, has been explored and
> > we can't do any more on that front .... We generate anywhere between
> > 200K - 1MB of redo per second ... averaging out around 600-700K per
> > second.
> >
> > as always any feedback is helpful...
> >
> > -peter

>
> Peter, if I am reading your post correctly your online redo logs are
> 600M. If the log block was meansured in 8k database blocks then you
> have set the interval to about 625M but if the OS block size behind
> this parameter is only 512 bytes then you have set it to around 40M. I
> suggest setting the value up and see if it makes any difference.
>
> >> LOG_CHECKPOINT_INTERVAL specifies the frequency of checkpoints in terms of the number of redo log file blocks that can exist between an incremental checkpoint and the last block written to the redo log. This number refers to physical operating system blocks, not database blocks. <<

>
> HTH -- Mark D Powell --


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 02-25-2008, 09:09 AM
p.santos000@gmail.com
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -

Thanks for the feedback,
I'll be sure to check out Tom's site.. probably tonight.
I have increased the logs files from 200MB to their current value of
600MB.
Unfortunately, our application generates lots of redo, and there is
very little
I can do about it. Adding more log files is really not going to
help.
At this point, I have to find a better way to balance checkpointing
with log switches (one could of course argue that increasing the size
effects that).

thanks for your input.
-peter



hpuxrac wrote:
> p.santos000@gmail.com wrote:
> > Hey guys,
> > I thought I try to get some help in trying to get this problem
> > resolved.
> > I've struggled with the 'Checkpoint not complete' error message
> > before, and thought I had it
> > resolved, but I can't seem to make it go way.
> >
> > I'm currently running 10.1.0.4 on Solaris 64bit.
> > From my analysis of this issue, I get the "Checkpoint not complete"
> > message nearly evertime
> > during a log switch. From looking at the checkpoint error timestamps
> > and the log switches,
> > 99% of the time the CKPT process fails during the log swith.
> >
> > I've always been reluctant to checkpoint too much, but maybe I need
> > to increase checkpoints,
> > sot that the entire process is faster during log switches ?????
> >
> > Here is the data. (log siwtches are occurring around 20+ minutes)
> >
> > V$LOG
> > =================
> >
> > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
> > |LASTED_MIN
> >
> > ----------|----------|----------|---|---------------|--------------|----------
> > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
> > 21
> > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
> > 26
> > 3| 32200| 600|NO |CURRENT |10/27/06 10:11|
> > 6
> > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
> > 27
> > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
> > 26
> > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
> > 40
> > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
> > 29
> > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
> > 10
> >
> >
> > ALERT_LOG (timestamps match log swith)
> > =================
> > --- These all happen during log switch...
> >
> > Fri Oct 27 08:05:31 2006
> > Thread 1 cannot allocate new log, sequence 32195
> > Checkpoint not complete
> > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
> >
> >
> > Fri Oct 27 08:45:18 2006
> > Thread 1 cannot allocate new log, sequence 32196
> > Checkpoint not complete
> > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
> >
> >
> > Fri Oct 27 09:24:21 2006
> > Thread 1 cannot allocate new log, sequence 32198
> > Checkpoint not complete
> > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
> >
> >
> > Fri Oct 27 09:44:59 2006
> > Thread 1 cannot allocate new log, sequence 32199
> > Checkpoint not complete
> > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
> >
> >
> > Fri Oct 27 10:11:13 2006
> > Thread 1 cannot allocate new log, sequence 32200
> > Checkpoint not complete
> > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
> >
> >
> > KEY PARAMETERS
> > =====================
> >
> > NAME |TYPE |VALUE
> > -------------------------------------------|----------------|--------
> > log_checkpoint_interval |integer |80000
> > log_checkpoint_timeout |integer |1800
> > fast_start_mttr_target |integer |0
> > log_buffer |integer |524288
> > disk_asynch_io |boolean |TRUE
> >
> > ** Haven't configured fast_start_mttr_target because log_checK*
> > parameters
> > were already set when I upgrade from 8.1.7 and so It was just one less
> > thing to
> > worry about configuring...
> >
> > AWR report from 9-10 AM this morning
> > ========================
> > Statistic |Total
> > ----------------------------------------------------------- | ......
> > DBWR checkpoints | 4
> > background checkpoints completed | 3
> > background checkpoints started | 2
> >
> > - the way I see it I'm probably not checkpointing enough .. which
> > causes the ckpt
> > process during log switches to take too long .. is this accurate ?
> >
> > BTW - fixing application to generate less redo, has been explored and
> > we can't do any more on that front .... We generate anywhere between
> > 200K - 1MB of redo per second ... averaging out around 600-700K per
> > second.
> >
> > as always any feedback is helpful...

>
> Have you looked at the docs and questions Tom Kyte has produced in
> regards to this at http://asktom.oracle.com yet?
>
> There are several good discussions search using " checkpoint not
> complete ".
>
> Tom writes it up much better than my simple take on it, but the typical
> tradeoff's to get away from this situation invovle these
> considerations: 1) adding more logs 2) taking checkpoints more
> frequently and 3) increasing the size of your logs.
>
> Read Tom's description for more relevant details.
>
> Or of course you can look at oracle support on metalink.


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 02-25-2008, 09:09 AM
p.santos000@gmail.com
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -

Joel,

Here is what v$sgainfo looks like.

NAME | BYTES
--------------------------------|----------------
Fixed SGA Size | 1,309,712
Redo Buffers | 524,288
Buffer Cache Size | 2,147,483,648
Shared Pool Size | 1,073,741,824
Large Pool Size | 218,103,808
Java Pool Size | 33,554,432
Streams Pool Size | 0
Granule Size | 16,777,216
Maximum SGA Size | 6,442,450,944
Startup overhead in Shared Pool | 100,663,296
Free SGA Memory Available | 2,952,790,016

Our application does a lot of very quirky things that are painful for a
DBA, but
until we redesign our new application next year, I just have to find
ways to
deal with the thrashing of the database.

For example, our customers at any point during the day can update
in batches of 500,000 any column in their tables. While at the same
time users can go only and update their own information... so the batch
updates blocks these single record updates etc ... customers have
their own tables ..causing lots of similar sql ... this is the reason
for a 2GB buffer cache.

So anytime the top 5 wait events change .. we can be really slow, then
all of the sudden 3-5 customers start batch updating columns with
anywhere from 15MM to 40MM records per table. .. this is in part of the
reason we generate so much redo.

Our top five events this morning were these.

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) DB Time
Wait Class
------------------------------ ------------ ----------- ---------
--------------
db file sequential read 4,598,287 28,970 35.84
User I/O
CPU time 20,952 25.92
buffer busy waits 16,131 15,356 19.00
Concurrency
db file parallel write 34,904 13,730 16.99
System I/O
db file scattered read 292,277 6,568 8.13
User I/O
-------------------------------------------------------------

'buffer busy waits' was because of a batch update running in
parallel... probably updating records in same/similar blocks.

I understand that the LOG_CHECKPOINT parameters changed and although
the recommendation is to use FAST_START_MTTR_TARGET, when I upgraded
from 8.1.7 to 10.1.0.4 .. I had other battles to fight.

-peter







joel garry wrote:
> p.santos000@gmail.com wrote:
> > Hey guys,
> > I thought I try to get some help in trying to get this problem
> > resolved.
> > I've struggled with the 'Checkpoint not complete' error message
> > before, and thought I had it
> > resolved, but I can't seem to make it go way.
> >
> > I'm currently running 10.1.0.4 on Solaris 64bit.
> > From my analysis of this issue, I get the "Checkpoint not complete"
> > message nearly evertime
> > during a log switch. From looking at the checkpoint error timestamps
> > and the log switches,
> > 99% of the time the CKPT process fails during the log swith.
> >
> > I've always been reluctant to checkpoint too much, but maybe I need
> > to increase checkpoints,
> > sot that the entire process is faster during log switches ?????
> >
> > Here is the data. (log siwtches are occurring around 20+ minutes)
> >
> > V$LOG
> > =================
> >
> > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME
> > |LASTED_MIN
> >
> > ----------|----------|----------|---|---------------|--------------|----------
> > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24|
> > 21
> > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44|
> > 26
> > 3| 32200| 600|NO |CURRENT |10/27/06 10:11|
> > 6
> > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12|
> > 27
> > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39|
> > 26
> > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05|
> > 40
> > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45|
> > 29
> > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14|
> > 10
> >
> >
> > ALERT_LOG (timestamps match log swith)
> > =================
> > --- These all happen during log switch...
> >
> > Fri Oct 27 08:05:31 2006
> > Thread 1 cannot allocate new log, sequence 32195
> > Checkpoint not complete
> > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log
> >
> >
> > Fri Oct 27 08:45:18 2006
> > Thread 1 cannot allocate new log, sequence 32196
> > Checkpoint not complete
> > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log
> >
> >
> > Fri Oct 27 09:24:21 2006
> > Thread 1 cannot allocate new log, sequence 32198
> > Checkpoint not complete
> > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log
> >
> >
> > Fri Oct 27 09:44:59 2006
> > Thread 1 cannot allocate new log, sequence 32199
> > Checkpoint not complete
> > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log
> >
> >
> > Fri Oct 27 10:11:13 2006
> > Thread 1 cannot allocate new log, sequence 32200
> > Checkpoint not complete
> > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log
> >
> >
> > KEY PARAMETERS
> > =====================
> >
> > NAME |TYPE |VALUE
> > -------------------------------------------|----------------|--------
> > log_checkpoint_interval |integer |80000
> > log_checkpoint_timeout |integer |1800
> > fast_start_mttr_target |integer |0
> > log_buffer |integer |524288
> > disk_asynch_io |boolean |TRUE
> >
> > ** Haven't configured fast_start_mttr_target because log_checK*
> > parameters
> > were already set when I upgrade from 8.1.7 and so It was just one less
> > thing to
> > worry about configuring...

>
> Are you saying you used 817 parameters for 10g? I hope I'm misreading
> that. See the last sentence in
> http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html
>
> In addition to what Mark said, you should post what your top waits are,
> and show SGA. I expect some issues on the size of your SGA and your
> log_buffer.
>
> Also you may want to research the use of async i/o on your particular
> system.
>
> >
> > AWR report from 9-10 AM this morning
> > ========================
> > Statistic |Total
> > ----------------------------------------------------------- | ......
> > DBWR checkpoints | 4
> > background checkpoints completed | 3
> > background checkpoints started | 2
> >
> > - the way I see it I'm probably not checkpointing enough .. which
> > causes the ckpt
> > process during log switches to take too long .. is this accurate ?
> >
> > BTW - fixing application to generate less redo, has been explored and
> > we can't do any more on that front .... We generate anywhere between
> > 200K - 1MB of redo per second ... averaging out around 600-700K per
> > second.
> >
> > as always any feedback is helpful...

>
> You may need more and smaller logs with a bigger log buffer.
>
> 45 minutes for a checkpoint not to complete is rather worrisome. Your
> application code may be doing something that keeps fighting over dirty
> buffers - do you have something that repeatedly updates a small number
> of rows?
>
> jg
> --
> @home.com is bogus.
> So which DS should I buy the kids?
> http://www.signonsandiego.com/uniont..._1b27sony.html


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 02-25-2008, 09:09 AM
hpuxrac
 
Posts: n/a
Default Re: 10gR1 - Checkpoint not complete -


p.santos000@gmail.com wrote:
> Thanks for the feedback,
> I'll be sure to check out Tom's site.. probably tonight.
> I have increased the logs files from 200MB to their current value of
> 600MB.
> Unfortunately, our application generates lots of redo, and there is
> very little
> I can do about it. Adding more log files is really not going to
> help.
> At this point, I have to find a better way to balance checkpointing
> with log switches (one could of course argue that increasing the size
> effects that).
>
> thanks for your input.
> -peter


I would do the searches on asktom and read what he has to say on the
subject sooner rather than later.

There's a good amount of information and give and take from various
different posters.

Do the research and understand what you are doing and why before making
other changes is my recommendation.

Most of the people that hang out on cdos don't appreciate people that
respond with "top posting". They much prefer people to add on
additional comments notes at the bottom.

Good luck.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump