This is a discussion on 10gR1 - Checkpoint not complete - within the Oracle Database forums, part of the Database Server Software category; --> Hey guys, I thought I try to get some help in trying to get this problem resolved. I've struggled ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hey guys, I thought I try to get some help in trying to get this problem resolved. I've struggled with the 'Checkpoint not complete' error message before, and thought I had it resolved, but I can't seem to make it go way. I'm currently running 10.1.0.4 on Solaris 64bit. From my analysis of this issue, I get the "Checkpoint not complete" message nearly evertime during a log switch. From looking at the checkpoint error timestamps and the log switches, 99% of the time the CKPT process fails during the log swith. I've always been reluctant to checkpoint too much, but maybe I need to increase checkpoints, sot that the entire process is faster during log switches ????? Here is the data. (log siwtches are occurring around 20+ minutes) V$LOG ================= GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME |LASTED_MIN ----------|----------|----------|---|---------------|--------------|---------- 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| 21 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| 26 3| 32200| 600|NO |CURRENT |10/27/06 10:11| 6 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| 27 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| 26 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| 40 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| 29 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| 10 ALERT_LOG (timestamps match log swith) ================= --- These all happen during log switch... Fri Oct 27 08:05:31 2006 Thread 1 cannot allocate new log, sequence 32195 Checkpoint not complete Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log Fri Oct 27 08:45:18 2006 Thread 1 cannot allocate new log, sequence 32196 Checkpoint not complete Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log Fri Oct 27 09:24:21 2006 Thread 1 cannot allocate new log, sequence 32198 Checkpoint not complete Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log Fri Oct 27 09:44:59 2006 Thread 1 cannot allocate new log, sequence 32199 Checkpoint not complete Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log Fri Oct 27 10:11:13 2006 Thread 1 cannot allocate new log, sequence 32200 Checkpoint not complete Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log KEY PARAMETERS ===================== NAME |TYPE |VALUE -------------------------------------------|----------------|-------- log_checkpoint_interval |integer |80000 log_checkpoint_timeout |integer |1800 fast_start_mttr_target |integer |0 log_buffer |integer |524288 disk_asynch_io |boolean |TRUE ** Haven't configured fast_start_mttr_target because log_checK* parameters were already set when I upgrade from 8.1.7 and so It was just one less thing to worry about configuring... AWR report from 9-10 AM this morning ======================== Statistic |Total ----------------------------------------------------------- | ...... DBWR checkpoints | 4 background checkpoints completed | 3 background checkpoints started | 2 - the way I see it I'm probably not checkpointing enough .. which causes the ckpt process during log switches to take too long .. is this accurate ? BTW - fixing application to generate less redo, has been explored and we can't do any more on that front .... We generate anywhere between 200K - 1MB of redo per second ... averaging out around 600-700K per second. as always any feedback is helpful... -peter |
| |||
| On Oct 27, 10:49 am, p.santos...@gmail.com wrote: > Hey guys, > I thought I try to get some help in trying to get this problem > resolved. > I've struggled with the 'Checkpoint not complete' error message > before, and thought I had it > resolved, but I can't seem to make it go way. > > I'm currently running 10.1.0.4 on Solaris 64bit. > From my analysis of this issue, I get the "Checkpoint not complete" > message nearly evertime > during a log switch. From looking at the checkpoint error timestamps > and the log switches, > 99% of the time the CKPT process fails during the log swith. > > I've always been reluctant to checkpoint too much, but maybe I need > to increase checkpoints, > sot that the entire process is faster during log switches ????? > > Here is the data. (log siwtches are occurring around 20+ minutes) > > V$LOG > ================= > > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME > |LASTED_MIN > > ----------|----------|----------|---|---------------|--------------|-------*--- > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| > 21 > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| > 26 > 3| 32200| 600|NO |CURRENT |10/27/06 10:11| > 6 > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| > 27 > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| > 26 > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| > 40 > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| > 29 > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| > 10 > > ALERT_LOG (timestamps match log swith) > ================= > --- These all happen during log switch... > > Fri Oct 27 08:05:31 2006 > Thread 1 cannot allocate new log, sequence 32195 > Checkpoint not complete > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log > > Fri Oct 27 08:45:18 2006 > Thread 1 cannot allocate new log, sequence 32196 > Checkpoint not complete > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log > > Fri Oct 27 09:24:21 2006 > Thread 1 cannot allocate new log, sequence 32198 > Checkpoint not complete > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log > > Fri Oct 27 09:44:59 2006 > Thread 1 cannot allocate new log, sequence 32199 > Checkpoint not complete > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log > > Fri Oct 27 10:11:13 2006 > Thread 1 cannot allocate new log, sequence 32200 > Checkpoint not complete > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log > > KEY PARAMETERS > ===================== > > NAME |TYPE |VALUE > -------------------------------------------|----------------|-------- > log_checkpoint_interval |integer |80000 > log_checkpoint_timeout |integer |1800 > fast_start_mttr_target |integer |0 > log_buffer |integer |524288 > disk_asynch_io |boolean |TRUE > > ** Haven't configured fast_start_mttr_target because log_checK* > parameters > were already set when I upgrade from 8.1.7 and so It was just one less > thing to > worry about configuring... > > AWR report from 9-10 AM this morning > ======================== > Statistic |Total > ----------------------------------------------------------- | ...... > DBWR checkpoints | 4 > background checkpoints completed | 3 > background checkpoints started | 2 > > - the way I see it I'm probably not checkpointing enough .. which > causes the ckpt > process during log switches to take too long .. is this accurate ? > > BTW - fixing application to generate less redo, has been explored and > we can't do any more on that front .... We generate anywhere between > 200K - 1MB of redo per second ... averaging out around 600-700K per > second. > > as always any feedback is helpful... > > -peter Peter, if I am reading your post correctly your online redo logs are 600M. If the log block was meansured in 8k database blocks then you have set the interval to about 625M but if the OS block size behind this parameter is only 512 bytes then you have set it to around 40M. I suggest setting the value up and see if it makes any difference. >> LOG_CHECKPOINT_INTERVAL specifies the frequency of checkpoints in terms of the number of redo log file blocks that can exist between an incrementalcheckpoint and the last block written to the redo log. This number refers to physical operating system blocks, not database blocks. << HTH -- Mark D Powell -- |
| |||
| p.santos000@gmail.com wrote: > Hey guys, > I thought I try to get some help in trying to get this problem > resolved. > I've struggled with the 'Checkpoint not complete' error message > before, and thought I had it > resolved, but I can't seem to make it go way. > > I'm currently running 10.1.0.4 on Solaris 64bit. > From my analysis of this issue, I get the "Checkpoint not complete" > message nearly evertime > during a log switch. From looking at the checkpoint error timestamps > and the log switches, > 99% of the time the CKPT process fails during the log swith. > > I've always been reluctant to checkpoint too much, but maybe I need > to increase checkpoints, > sot that the entire process is faster during log switches ????? > > Here is the data. (log siwtches are occurring around 20+ minutes) > > V$LOG > ================= > > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME > |LASTED_MIN > > ----------|----------|----------|---|---------------|--------------|---------- > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| > 21 > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| > 26 > 3| 32200| 600|NO |CURRENT |10/27/06 10:11| > 6 > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| > 27 > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| > 26 > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| > 40 > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| > 29 > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| > 10 > > > ALERT_LOG (timestamps match log swith) > ================= > --- These all happen during log switch... > > Fri Oct 27 08:05:31 2006 > Thread 1 cannot allocate new log, sequence 32195 > Checkpoint not complete > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log > > > Fri Oct 27 08:45:18 2006 > Thread 1 cannot allocate new log, sequence 32196 > Checkpoint not complete > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log > > > Fri Oct 27 09:24:21 2006 > Thread 1 cannot allocate new log, sequence 32198 > Checkpoint not complete > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log > > > Fri Oct 27 09:44:59 2006 > Thread 1 cannot allocate new log, sequence 32199 > Checkpoint not complete > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log > > > Fri Oct 27 10:11:13 2006 > Thread 1 cannot allocate new log, sequence 32200 > Checkpoint not complete > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log > > > KEY PARAMETERS > ===================== > > NAME |TYPE |VALUE > -------------------------------------------|----------------|-------- > log_checkpoint_interval |integer |80000 > log_checkpoint_timeout |integer |1800 > fast_start_mttr_target |integer |0 > log_buffer |integer |524288 > disk_asynch_io |boolean |TRUE > > ** Haven't configured fast_start_mttr_target because log_checK* > parameters > were already set when I upgrade from 8.1.7 and so It was just one less > thing to > worry about configuring... > > AWR report from 9-10 AM this morning > ======================== > Statistic |Total > ----------------------------------------------------------- | ...... > DBWR checkpoints | 4 > background checkpoints completed | 3 > background checkpoints started | 2 > > - the way I see it I'm probably not checkpointing enough .. which > causes the ckpt > process during log switches to take too long .. is this accurate ? > > BTW - fixing application to generate less redo, has been explored and > we can't do any more on that front .... We generate anywhere between > 200K - 1MB of redo per second ... averaging out around 600-700K per > second. > > as always any feedback is helpful... Have you looked at the docs and questions Tom Kyte has produced in regards to this at http://asktom.oracle.com yet? There are several good discussions search using " checkpoint not complete ". Tom writes it up much better than my simple take on it, but the typical tradeoff's to get away from this situation invovle these considerations: 1) adding more logs 2) taking checkpoints more frequently and 3) increasing the size of your logs. Read Tom's description for more relevant details. Or of course you can look at oracle support on metalink. |
| |||
| p.santos000@gmail.com wrote: > Hey guys, > I thought I try to get some help in trying to get this problem > resolved. > I've struggled with the 'Checkpoint not complete' error message > before, and thought I had it > resolved, but I can't seem to make it go way. > > I'm currently running 10.1.0.4 on Solaris 64bit. > From my analysis of this issue, I get the "Checkpoint not complete" > message nearly evertime > during a log switch. From looking at the checkpoint error timestamps > and the log switches, > 99% of the time the CKPT process fails during the log swith. > > I've always been reluctant to checkpoint too much, but maybe I need > to increase checkpoints, > sot that the entire process is faster during log switches ????? > > Here is the data. (log siwtches are occurring around 20+ minutes) > > V$LOG > ================= > > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME > |LASTED_MIN > > ----------|----------|----------|---|---------------|--------------|---------- > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| > 21 > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| > 26 > 3| 32200| 600|NO |CURRENT |10/27/06 10:11| > 6 > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| > 27 > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| > 26 > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| > 40 > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| > 29 > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| > 10 > > > ALERT_LOG (timestamps match log swith) > ================= > --- These all happen during log switch... > > Fri Oct 27 08:05:31 2006 > Thread 1 cannot allocate new log, sequence 32195 > Checkpoint not complete > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log > > > Fri Oct 27 08:45:18 2006 > Thread 1 cannot allocate new log, sequence 32196 > Checkpoint not complete > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log > > > Fri Oct 27 09:24:21 2006 > Thread 1 cannot allocate new log, sequence 32198 > Checkpoint not complete > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log > > > Fri Oct 27 09:44:59 2006 > Thread 1 cannot allocate new log, sequence 32199 > Checkpoint not complete > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log > > > Fri Oct 27 10:11:13 2006 > Thread 1 cannot allocate new log, sequence 32200 > Checkpoint not complete > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log > > > KEY PARAMETERS > ===================== > > NAME |TYPE |VALUE > -------------------------------------------|----------------|-------- > log_checkpoint_interval |integer |80000 > log_checkpoint_timeout |integer |1800 > fast_start_mttr_target |integer |0 > log_buffer |integer |524288 > disk_asynch_io |boolean |TRUE > > ** Haven't configured fast_start_mttr_target because log_checK* > parameters > were already set when I upgrade from 8.1.7 and so It was just one less > thing to > worry about configuring... Are you saying you used 817 parameters for 10g? I hope I'm misreading that. See the last sentence in http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html In addition to what Mark said, you should post what your top waits are, and show SGA. I expect some issues on the size of your SGA and your log_buffer. Also you may want to research the use of async i/o on your particular system. > > AWR report from 9-10 AM this morning > ======================== > Statistic |Total > ----------------------------------------------------------- | ...... > DBWR checkpoints | 4 > background checkpoints completed | 3 > background checkpoints started | 2 > > - the way I see it I'm probably not checkpointing enough .. which > causes the ckpt > process during log switches to take too long .. is this accurate ? > > BTW - fixing application to generate less redo, has been explored and > we can't do any more on that front .... We generate anywhere between > 200K - 1MB of redo per second ... averaging out around 600-700K per > second. > > as always any feedback is helpful... You may need more and smaller logs with a bigger log buffer. 45 minutes for a checkpoint not to complete is rather worrisome. Your application code may be doing something that keeps fighting over dirty buffers - do you have something that repeatedly updates a small number of rows? jg -- @home.com is bogus. So which DS should I buy the kids? http://www.signonsandiego.com/uniont..._1b27sony.html |
| |||
| p.santos000@gmail.com wrote: > Hey guys, > I thought I try to get some help in trying to get this problem > resolved. > I've struggled with the 'Checkpoint not complete' error message > before, and thought I had it > resolved, but I can't seem to make it go way. > > I'm currently running 10.1.0.4 on Solaris 64bit. > From my analysis of this issue, I get the "Checkpoint not complete" > message nearly evertime > during a log switch. From looking at the checkpoint error timestamps > and the log switches, > 99% of the time the CKPT process fails during the log swith. > > I've always been reluctant to checkpoint too much, but maybe I need > to increase checkpoints, > sot that the entire process is faster during log switches ????? > > Here is the data. (log siwtches are occurring around 20+ minutes) > > V$LOG > ================= > > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME > |LASTED_MIN > > ----------|----------|----------|---|---------------|--------------|---------- > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| > 21 > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| > 26 > 3| 32200| 600|NO |CURRENT |10/27/06 10:11| > 6 > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| > 27 > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| > 26 > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| > 40 > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| > 29 > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| > 10 > > > ALERT_LOG (timestamps match log swith) > ================= > --- These all happen during log switch... > > Fri Oct 27 08:05:31 2006 > Thread 1 cannot allocate new log, sequence 32195 > Checkpoint not complete > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log > > > Fri Oct 27 08:45:18 2006 > Thread 1 cannot allocate new log, sequence 32196 > Checkpoint not complete > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log > > > Fri Oct 27 09:24:21 2006 > Thread 1 cannot allocate new log, sequence 32198 > Checkpoint not complete > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log > > > Fri Oct 27 09:44:59 2006 > Thread 1 cannot allocate new log, sequence 32199 > Checkpoint not complete > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log > > > Fri Oct 27 10:11:13 2006 > Thread 1 cannot allocate new log, sequence 32200 > Checkpoint not complete > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log > > > KEY PARAMETERS > ===================== > > NAME |TYPE |VALUE > -------------------------------------------|----------------|-------- > log_checkpoint_interval |integer |80000 > log_checkpoint_timeout |integer |1800 > fast_start_mttr_target |integer |0 > log_buffer |integer |524288 > disk_asynch_io |boolean |TRUE > > ** Haven't configured fast_start_mttr_target because log_checK* > parameters > were already set when I upgrade from 8.1.7 and so It was just one less > thing to > worry about configuring... Are you saying you used 817 parameters for 10g? I hope I'm misreading that. See the last sentence in http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html In addition to what Mark said, you should post what your top waits are, and show SGA. I expect some issues on the size of your SGA and your log_buffer. Also you may want to research the use of async i/o on your particular system. > > AWR report from 9-10 AM this morning > ======================== > Statistic |Total > ----------------------------------------------------------- | ...... > DBWR checkpoints | 4 > background checkpoints completed | 3 > background checkpoints started | 2 > > - the way I see it I'm probably not checkpointing enough .. which > causes the ckpt > process during log switches to take too long .. is this accurate ? > > BTW - fixing application to generate less redo, has been explored and > we can't do any more on that front .... We generate anywhere between > 200K - 1MB of redo per second ... averaging out around 600-700K per > second. > > as always any feedback is helpful... You may need more and smaller logs with a bigger log buffer. 45 minutes for a checkpoint not to complete is rather worrisome. Your application code may be doing something that keeps fighting over dirty buffers - do you have something that repeatedly updates a small number of rows? jg -- @home.com is bogus. So which DS should I buy the kids? http://www.signonsandiego.com/uniont..._1b27sony.html |
| |||
| p.santos000@gmail.com wrote: > Hey guys, > I thought I try to get some help in trying to get this problem > resolved. > I've struggled with the 'Checkpoint not complete' error message > before, and thought I had it > resolved, but I can't seem to make it go way. > > I'm currently running 10.1.0.4 on Solaris 64bit. > From my analysis of this issue, I get the "Checkpoint not complete" > message nearly evertime > during a log switch. From looking at the checkpoint error timestamps > and the log switches, > 99% of the time the CKPT process fails during the log swith. > > I've always been reluctant to checkpoint too much, but maybe I need > to increase checkpoints, > sot that the entire process is faster during log switches ????? > > Here is the data. (log siwtches are occurring around 20+ minutes) > > V$LOG > ================= > > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME > |LASTED_MIN > > ----------|----------|----------|---|---------------|--------------|---------- > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| > 21 > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| > 26 > 3| 32200| 600|NO |CURRENT |10/27/06 10:11| > 6 > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| > 27 > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| > 26 > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| > 40 > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| > 29 > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| > 10 > > > ALERT_LOG (timestamps match log swith) > ================= > --- These all happen during log switch... > > Fri Oct 27 08:05:31 2006 > Thread 1 cannot allocate new log, sequence 32195 > Checkpoint not complete > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log > > > Fri Oct 27 08:45:18 2006 > Thread 1 cannot allocate new log, sequence 32196 > Checkpoint not complete > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log > > > Fri Oct 27 09:24:21 2006 > Thread 1 cannot allocate new log, sequence 32198 > Checkpoint not complete > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log > > > Fri Oct 27 09:44:59 2006 > Thread 1 cannot allocate new log, sequence 32199 > Checkpoint not complete > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log > > > Fri Oct 27 10:11:13 2006 > Thread 1 cannot allocate new log, sequence 32200 > Checkpoint not complete > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log > > > KEY PARAMETERS > ===================== > > NAME |TYPE |VALUE > -------------------------------------------|----------------|-------- > log_checkpoint_interval |integer |80000 > log_checkpoint_timeout |integer |1800 > fast_start_mttr_target |integer |0 > log_buffer |integer |524288 > disk_asynch_io |boolean |TRUE > > ** Haven't configured fast_start_mttr_target because log_checK* > parameters > were already set when I upgrade from 8.1.7 and so It was just one less > thing to > worry about configuring... Are you saying you used 817 parameters for 10g? I hope I'm misreading that. See the last sentence in http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html In addition to what Mark said, you should post what your top waits are, and show SGA. I expect some issues on the size of your SGA and your log_buffer. Also you may want to research the use of async i/o on your particular system. > > AWR report from 9-10 AM this morning > ======================== > Statistic |Total > ----------------------------------------------------------- | ...... > DBWR checkpoints | 4 > background checkpoints completed | 3 > background checkpoints started | 2 > > - the way I see it I'm probably not checkpointing enough .. which > causes the ckpt > process during log switches to take too long .. is this accurate ? > > BTW - fixing application to generate less redo, has been explored and > we can't do any more on that front .... We generate anywhere between > 200K - 1MB of redo per second ... averaging out around 600-700K per > second. > > as always any feedback is helpful... You may need more and smaller logs with a bigger log buffer. 45 minutes for a checkpoint not to complete is rather worrisome. Your application code may be doing something that keeps fighting over dirty buffers - do you have something that repeatedly updates a small number of rows? jg -- @home.com is bogus. So which DS should I buy the kids? http://www.signonsandiego.com/uniont..._1b27sony.html |
| |||
| Mark, Your right my logs are 600M and the 'Checkpoint not complete' message only appears during log switches. What I get from this is that a checkpoint was in process, when the log switch was initiated right ??? If that's the case, then I do want to checkpoint more often, so that it completes faster right ? Won't increasing LOG_CHECKPOINT_INTERVAL cause less checkpoints ? Now I think I'm getting confused... thanks for the feedback! -peter Mark D Powell wrote: > On Oct 27, 10:49 am, p.santos...@gmail.com wrote: > > Hey guys, > > I thought I try to get some help in trying to get this problem > > resolved. > > I've struggled with the 'Checkpoint not complete' error message > > before, and thought I had it > > resolved, but I can't seem to make it go way. > > > > I'm currently running 10.1.0.4 on Solaris 64bit. > > From my analysis of this issue, I get the "Checkpoint not complete" > > message nearly evertime > > during a log switch. From looking at the checkpoint error timestamps > > and the log switches, > > 99% of the time the CKPT process fails during the log swith. > > > > I've always been reluctant to checkpoint too much, but maybe I need > > to increase checkpoints, > > sot that the entire process is faster during log switches ????? > > > > Here is the data. (log siwtches are occurring around 20+ minutes) > > > > V$LOG > > ================= > > > > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME > > |LASTED_MIN > > > > ----------|----------|----------|---|---------------|--------------|-------*--- > > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| > > 21 > > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| > > 26 > > 3| 32200| 600|NO |CURRENT |10/27/06 10:11| > > 6 > > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| > > 27 > > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| > > 26 > > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| > > 40 > > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| > > 29 > > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| > > 10 > > > > ALERT_LOG (timestamps match log swith) > > ================= > > --- These all happen during log switch... > > > > Fri Oct 27 08:05:31 2006 > > Thread 1 cannot allocate new log, sequence 32195 > > Checkpoint not complete > > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log > > > > Fri Oct 27 08:45:18 2006 > > Thread 1 cannot allocate new log, sequence 32196 > > Checkpoint not complete > > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log > > > > Fri Oct 27 09:24:21 2006 > > Thread 1 cannot allocate new log, sequence 32198 > > Checkpoint not complete > > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log > > > > Fri Oct 27 09:44:59 2006 > > Thread 1 cannot allocate new log, sequence 32199 > > Checkpoint not complete > > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log > > > > Fri Oct 27 10:11:13 2006 > > Thread 1 cannot allocate new log, sequence 32200 > > Checkpoint not complete > > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log > > > > KEY PARAMETERS > > ===================== > > > > NAME |TYPE |VALUE > > -------------------------------------------|----------------|-------- > > log_checkpoint_interval |integer |80000 > > log_checkpoint_timeout |integer |1800 > > fast_start_mttr_target |integer |0 > > log_buffer |integer |524288 > > disk_asynch_io |boolean |TRUE > > > > ** Haven't configured fast_start_mttr_target because log_checK* > > parameters > > were already set when I upgrade from 8.1.7 and so It was just one less > > thing to > > worry about configuring... > > > > AWR report from 9-10 AM this morning > > ======================== > > Statistic |Total > > ----------------------------------------------------------- | ...... > > DBWR checkpoints | 4 > > background checkpoints completed | 3 > > background checkpoints started | 2 > > > > - the way I see it I'm probably not checkpointing enough .. which > > causes the ckpt > > process during log switches to take too long .. is this accurate ? > > > > BTW - fixing application to generate less redo, has been explored and > > we can't do any more on that front .... We generate anywhere between > > 200K - 1MB of redo per second ... averaging out around 600-700K per > > second. > > > > as always any feedback is helpful... > > > > -peter > > Peter, if I am reading your post correctly your online redo logs are > 600M. If the log block was meansured in 8k database blocks then you > have set the interval to about 625M but if the OS block size behind > this parameter is only 512 bytes then you have set it to around 40M. I > suggest setting the value up and see if it makes any difference. > > >> LOG_CHECKPOINT_INTERVAL specifies the frequency of checkpoints in terms of the number of redo log file blocks that can exist between an incremental checkpoint and the last block written to the redo log. This number refers to physical operating system blocks, not database blocks. << > > HTH -- Mark D Powell -- |
| |||
| Thanks for the feedback, I'll be sure to check out Tom's site.. probably tonight. I have increased the logs files from 200MB to their current value of 600MB. Unfortunately, our application generates lots of redo, and there is very little I can do about it. Adding more log files is really not going to help. At this point, I have to find a better way to balance checkpointing with log switches (one could of course argue that increasing the size effects that). thanks for your input. -peter hpuxrac wrote: > p.santos000@gmail.com wrote: > > Hey guys, > > I thought I try to get some help in trying to get this problem > > resolved. > > I've struggled with the 'Checkpoint not complete' error message > > before, and thought I had it > > resolved, but I can't seem to make it go way. > > > > I'm currently running 10.1.0.4 on Solaris 64bit. > > From my analysis of this issue, I get the "Checkpoint not complete" > > message nearly evertime > > during a log switch. From looking at the checkpoint error timestamps > > and the log switches, > > 99% of the time the CKPT process fails during the log swith. > > > > I've always been reluctant to checkpoint too much, but maybe I need > > to increase checkpoints, > > sot that the entire process is faster during log switches ????? > > > > Here is the data. (log siwtches are occurring around 20+ minutes) > > > > V$LOG > > ================= > > > > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME > > |LASTED_MIN > > > > ----------|----------|----------|---|---------------|--------------|---------- > > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| > > 21 > > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| > > 26 > > 3| 32200| 600|NO |CURRENT |10/27/06 10:11| > > 6 > > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| > > 27 > > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| > > 26 > > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| > > 40 > > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| > > 29 > > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| > > 10 > > > > > > ALERT_LOG (timestamps match log swith) > > ================= > > --- These all happen during log switch... > > > > Fri Oct 27 08:05:31 2006 > > Thread 1 cannot allocate new log, sequence 32195 > > Checkpoint not complete > > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log > > > > > > Fri Oct 27 08:45:18 2006 > > Thread 1 cannot allocate new log, sequence 32196 > > Checkpoint not complete > > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log > > > > > > Fri Oct 27 09:24:21 2006 > > Thread 1 cannot allocate new log, sequence 32198 > > Checkpoint not complete > > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log > > > > > > Fri Oct 27 09:44:59 2006 > > Thread 1 cannot allocate new log, sequence 32199 > > Checkpoint not complete > > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log > > > > > > Fri Oct 27 10:11:13 2006 > > Thread 1 cannot allocate new log, sequence 32200 > > Checkpoint not complete > > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log > > > > > > KEY PARAMETERS > > ===================== > > > > NAME |TYPE |VALUE > > -------------------------------------------|----------------|-------- > > log_checkpoint_interval |integer |80000 > > log_checkpoint_timeout |integer |1800 > > fast_start_mttr_target |integer |0 > > log_buffer |integer |524288 > > disk_asynch_io |boolean |TRUE > > > > ** Haven't configured fast_start_mttr_target because log_checK* > > parameters > > were already set when I upgrade from 8.1.7 and so It was just one less > > thing to > > worry about configuring... > > > > AWR report from 9-10 AM this morning > > ======================== > > Statistic |Total > > ----------------------------------------------------------- | ...... > > DBWR checkpoints | 4 > > background checkpoints completed | 3 > > background checkpoints started | 2 > > > > - the way I see it I'm probably not checkpointing enough .. which > > causes the ckpt > > process during log switches to take too long .. is this accurate ? > > > > BTW - fixing application to generate less redo, has been explored and > > we can't do any more on that front .... We generate anywhere between > > 200K - 1MB of redo per second ... averaging out around 600-700K per > > second. > > > > as always any feedback is helpful... > > Have you looked at the docs and questions Tom Kyte has produced in > regards to this at http://asktom.oracle.com yet? > > There are several good discussions search using " checkpoint not > complete ". > > Tom writes it up much better than my simple take on it, but the typical > tradeoff's to get away from this situation invovle these > considerations: 1) adding more logs 2) taking checkpoints more > frequently and 3) increasing the size of your logs. > > Read Tom's description for more relevant details. > > Or of course you can look at oracle support on metalink. |
| |||
| Joel, Here is what v$sgainfo looks like. NAME | BYTES --------------------------------|---------------- Fixed SGA Size | 1,309,712 Redo Buffers | 524,288 Buffer Cache Size | 2,147,483,648 Shared Pool Size | 1,073,741,824 Large Pool Size | 218,103,808 Java Pool Size | 33,554,432 Streams Pool Size | 0 Granule Size | 16,777,216 Maximum SGA Size | 6,442,450,944 Startup overhead in Shared Pool | 100,663,296 Free SGA Memory Available | 2,952,790,016 Our application does a lot of very quirky things that are painful for a DBA, but until we redesign our new application next year, I just have to find ways to deal with the thrashing of the database. For example, our customers at any point during the day can update in batches of 500,000 any column in their tables. While at the same time users can go only and update their own information... so the batch updates blocks these single record updates etc ... customers have their own tables ..causing lots of similar sql ... this is the reason for a 2GB buffer cache. So anytime the top 5 wait events change .. we can be really slow, then all of the sudden 3-5 customers start batch updating columns with anywhere from 15MM to 40MM records per table. .. this is in part of the reason we generate so much redo. Our top five events this morning were these. Top 5 Timed Events ~~~~~~~~~~~~~~~~~~ % Total Event Waits Time (s) DB Time Wait Class ------------------------------ ------------ ----------- --------- -------------- db file sequential read 4,598,287 28,970 35.84 User I/O CPU time 20,952 25.92 buffer busy waits 16,131 15,356 19.00 Concurrency db file parallel write 34,904 13,730 16.99 System I/O db file scattered read 292,277 6,568 8.13 User I/O ------------------------------------------------------------- 'buffer busy waits' was because of a batch update running in parallel... probably updating records in same/similar blocks. I understand that the LOG_CHECKPOINT parameters changed and although the recommendation is to use FAST_START_MTTR_TARGET, when I upgraded from 8.1.7 to 10.1.0.4 .. I had other battles to fight. -peter joel garry wrote: > p.santos000@gmail.com wrote: > > Hey guys, > > I thought I try to get some help in trying to get this problem > > resolved. > > I've struggled with the 'Checkpoint not complete' error message > > before, and thought I had it > > resolved, but I can't seem to make it go way. > > > > I'm currently running 10.1.0.4 on Solaris 64bit. > > From my analysis of this issue, I get the "Checkpoint not complete" > > message nearly evertime > > during a log switch. From looking at the checkpoint error timestamps > > and the log switches, > > 99% of the time the CKPT process fails during the log swith. > > > > I've always been reluctant to checkpoint too much, but maybe I need > > to increase checkpoints, > > sot that the entire process is faster during log switches ????? > > > > Here is the data. (log siwtches are occurring around 20+ minutes) > > > > V$LOG > > ================= > > > > GROUP#| SEQUENCE#| MBYTES|ARC|STATUS |FIRST_TIME > > |LASTED_MIN > > > > ----------|----------|----------|---|---------------|--------------|---------- > > 1| 32198| 600|YES|INACTIVE |10/27/06 09:24| > > 21 > > 2| 32199| 600|YES|INACTIVE |10/27/06 09:44| > > 26 > > 3| 32200| 600|NO |CURRENT |10/27/06 10:11| > > 6 > > 4| 32193| 600|YES|INACTIVE |10/27/06 07:12| > > 27 > > 5| 32194| 600|YES|INACTIVE |10/27/06 07:39| > > 26 > > 6| 32195| 600|YES|INACTIVE |10/27/06 08:05| > > 40 > > 7| 32196| 600|YES|INACTIVE |10/27/06 08:45| > > 29 > > 8| 32197| 600|YES|INACTIVE |10/27/06 09:14| > > 10 > > > > > > ALERT_LOG (timestamps match log swith) > > ================= > > --- These all happen during log switch... > > > > Fri Oct 27 08:05:31 2006 > > Thread 1 cannot allocate new log, sequence 32195 > > Checkpoint not complete > > Current log# 5 seq# 32194 mem# 0: /z1/oradata/mail/redo_5_5.log > > > > > > Fri Oct 27 08:45:18 2006 > > Thread 1 cannot allocate new log, sequence 32196 > > Checkpoint not complete > > Current log# 6 seq# 32195 mem# 0: /z0/oradata/mail/redo_6_6.log > > > > > > Fri Oct 27 09:24:21 2006 > > Thread 1 cannot allocate new log, sequence 32198 > > Checkpoint not complete > > Current log# 8 seq# 32197 mem# 0: /z0/oradata/mail/redo_8_8.log > > > > > > Fri Oct 27 09:44:59 2006 > > Thread 1 cannot allocate new log, sequence 32199 > > Checkpoint not complete > > Current log# 1 seq# 32198 mem# 0: /z1/oradata/mail/redo_1_1.log > > > > > > Fri Oct 27 10:11:13 2006 > > Thread 1 cannot allocate new log, sequence 32200 > > Checkpoint not complete > > Current log# 2 seq# 32199 mem# 0: /z0/oradata/mail/redo_2_2.log > > > > > > KEY PARAMETERS > > ===================== > > > > NAME |TYPE |VALUE > > -------------------------------------------|----------------|-------- > > log_checkpoint_interval |integer |80000 > > log_checkpoint_timeout |integer |1800 > > fast_start_mttr_target |integer |0 > > log_buffer |integer |524288 > > disk_asynch_io |boolean |TRUE > > > > ** Haven't configured fast_start_mttr_target because log_checK* > > parameters > > were already set when I upgrade from 8.1.7 and so It was just one less > > thing to > > worry about configuring... > > Are you saying you used 817 parameters for 10g? I hope I'm misreading > that. See the last sentence in > http://www.jlcomp.demon.co.uk/faq/log_checkpoint.html > > In addition to what Mark said, you should post what your top waits are, > and show SGA. I expect some issues on the size of your SGA and your > log_buffer. > > Also you may want to research the use of async i/o on your particular > system. > > > > > AWR report from 9-10 AM this morning > > ======================== > > Statistic |Total > > ----------------------------------------------------------- | ...... > > DBWR checkpoints | 4 > > background checkpoints completed | 3 > > background checkpoints started | 2 > > > > - the way I see it I'm probably not checkpointing enough .. which > > causes the ckpt > > process during log switches to take too long .. is this accurate ? > > > > BTW - fixing application to generate less redo, has been explored and > > we can't do any more on that front .... We generate anywhere between > > 200K - 1MB of redo per second ... averaging out around 600-700K per > > second. > > > > as always any feedback is helpful... > > You may need more and smaller logs with a bigger log buffer. > > 45 minutes for a checkpoint not to complete is rather worrisome. Your > application code may be doing something that keeps fighting over dirty > buffers - do you have something that repeatedly updates a small number > of rows? > > jg > -- > @home.com is bogus. > So which DS should I buy the kids? > http://www.signonsandiego.com/uniont..._1b27sony.html |
| ||||
| p.santos000@gmail.com wrote: > Thanks for the feedback, > I'll be sure to check out Tom's site.. probably tonight. > I have increased the logs files from 200MB to their current value of > 600MB. > Unfortunately, our application generates lots of redo, and there is > very little > I can do about it. Adding more log files is really not going to > help. > At this point, I have to find a better way to balance checkpointing > with log switches (one could of course argue that increasing the size > effects that). > > thanks for your input. > -peter I would do the searches on asktom and read what he has to say on the subject sooner rather than later. There's a good amount of information and give and take from various different posters. Do the research and understand what you are doing and why before making other changes is my recommendation. Most of the people that hang out on cdos don't appreciate people that respond with "top posting". They much prefer people to add on additional comments notes at the bottom. Good luck. |