This is a discussion on Checkpoint logging, revised patch within the Pgsql Patches forums, part of the PostgreSQL category.
Here's a reworked version of Greg Smith's patch for logging checkpoints. It adds a GUC variable, log_checkpoints, along the lines of log_connections and log_autovacuum. Enabling it causes two lines to be printed to the system log per checkpoint, like this:

LOG: checkpoint starting
LOG: checkpoint complete; buffers written=5869 (35.8%); write=2.081 s, sync=4.851 s, total=7.066 s

The purpose of the first message is that by inspecting the logs, you can easily see when a checkpoint is/was active. And if you experience mysterious slowdowns etc., you can see if they coincide with a checkpoint if you have log_min_duration_statement enabled. The completion message prints stats conveniently in a single line so it can be easily grepped and parsed from the logs.

The implementation uses a global CheckpointStats struct where data is collected during the checkpoint. At the end, it's spit out to the log as a nicely formatted message. I refactored PreallocXlogFiles and RemoveXlogFiles to use that struct as well for reporting the # of WAL segments removed, added and recycled.

Global variables are generally not considered good style, but IMHO in this case it's the cleanest approach. Other options would be to pass down a reference to the struct to all functions involved in checkpointing, or return individual numbers from all functions and collect them together in CreateCheckPoint. Both alternatives would complicate the function signatures significantly. There's precedent for this approach in BgWriterStats and the buffer usage counters (BufferHitCount et al).

One thing that's missing, that I originally hoped to achieve with this, is logging the cause of a checkpoint. Doing that would have required either sprinkling elogs in all callers of RequestCheckpoint, or adding a "reason" parameter to it and decoding that to a log message inside RequestCheckpoint, which seems a bit ugly just for logging purposes. Also, that would have meant printing one more log message per checkpoint, making it more chatty than I'd like it to be. Instead, I'm just printing a different log message for immediate checkpoint requests. Differentiating between immediate checkpoints triggered by a command like CHECKPOINT or CREATE DATABASE and normal checkpoints should be enough in practice.

I'm done with this.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
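The collect-then-report pattern described in the post above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the patch's code: the struct name CheckpointStatsSketch, its fields, and the helper functions are hypothetical, and the nbuffers argument stands in for the size of shared buffers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's global CheckpointStats struct. */
typedef struct CheckpointStatsSketch
{
    int    bufs_written;  /* buffers written during this checkpoint */
    double write_secs;    /* time spent in the write phase */
    double sync_secs;     /* time spent in the sync phase */
    double total_secs;    /* total checkpoint duration */
} CheckpointStatsSketch;

/* One global instance: lower-level functions update it directly, so
 * their signatures do not need an extra "stats" parameter. */
static CheckpointStatsSketch ckpt_stats;

/* Called (in this sketch) wherever a buffer is flushed. */
static void
note_buffer_written(void)
{
    ckpt_stats.bufs_written++;
}

/* Format the single-line completion message from the collected stats.
 * nbuffers is the assumed total number of shared buffers. */
static int
format_checkpoint_complete(char *buf, size_t buflen, int nbuffers)
{
    assert(buf != NULL && buflen > 0 && nbuffers > 0);

    double pct = 100.0 * ckpt_stats.bufs_written / nbuffers;

    return snprintf(buf, buflen,
                    "checkpoint complete; buffers written=%d (%.1f%%); "
                    "write=%.3f s, sync=%.3f s, total=%.3f s",
                    ckpt_stats.bufs_written, pct,
                    ckpt_stats.write_secs, ckpt_stats.sync_secs,
                    ckpt_stats.total_secs);
}
```

With 5869 buffers written and an assumed 16384 shared buffers, the formatted line matches the shape of the sample message quoted in the post. The trade-off is the one the post names: a global keeps function signatures unchanged at the cost of hidden shared state.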
| |||
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> One thing that's missing, that I originally hoped to achieve with this,
> is logging the cause of a checkpoint. Doing that would have required
> either sprinkling elogs to all callers of RequestCheckpoint, or adding a
> "reason" parameter to it and decoding that to a log message inside
> RequestCheckpoint, which seems a bit ugly just for logging purposes.

It strikes me that we could get most of the mileage needed here just by logging all of the "flags" bits. In particular, it would cost almost nothing to define an extra flag bit that's set by the bgwriter when a time-based checkpoint is triggered, and then we would have flag bits telling us:

* time-based checkpoint
* xlog-based checkpoint
* immediate checkpoint
* shutdown checkpoint

and that would really cover pretty much everything you'd need to know. Does anyone particularly care whether an immediate checkpoint is triggered by CHECKPOINT, CREATE DATABASE or DROP DATABASE? If anyone *does* care, there are plenty of leftover flag bits that we could use to pass that info, but it seems like gilding the lily.

> Also, that would have meant printing one more log message per
> checkpoint, making it more chatty than I'd like it to be.

I'm envisioning something like

LOG: checkpoint starting: immediate force wait

where we're just adding a word to the message for each flag bit. No more lines than you have.

regards, tom lane
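The "one word per flag bit" idea in the post above could look roughly like the following C sketch. The flag names, their bit values, and the order in which words are emitted are all assumptions made for illustration; they are not taken from the PostgreSQL source or from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits; names and values are illustrative only. */
#define CKPT_SKETCH_SHUTDOWN   0x01u
#define CKPT_SKETCH_IMMEDIATE  0x02u
#define CKPT_SKETCH_FORCE      0x04u
#define CKPT_SKETCH_WAIT       0x08u
#define CKPT_SKETCH_XLOG       0x10u  /* triggered by WAL volume */
#define CKPT_SKETCH_TIME       0x20u  /* triggered by timeout */

/* Build "checkpoint starting:" followed by one word per set flag bit,
 * so the cause is reported without adding an extra log line. */
static void
format_checkpoint_start(char *buf, size_t buflen, unsigned flags)
{
    static const struct
    {
        unsigned    bit;
        const char *word;
    } names[] = {
        {CKPT_SKETCH_SHUTDOWN, "shutdown"},
        {CKPT_SKETCH_IMMEDIATE, "immediate"},
        {CKPT_SKETCH_FORCE, "force"},
        {CKPT_SKETCH_WAIT, "wait"},
        {CKPT_SKETCH_XLOG, "xlog"},
        {CKPT_SKETCH_TIME, "time"},
    };
    size_t used;

    assert(buf != NULL && buflen > 0);

    used = (size_t) snprintf(buf, buflen, "checkpoint starting:");

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
    {
        /* Stop appending once the buffer is full (output is truncated). */
        if ((flags & names[i].bit) && used < buflen)
            used += (size_t) snprintf(buf + used, buflen - used,
                                      " %s", names[i].word);
    }
}
```

Passing the immediate, force and wait bits together yields a message of the same shape as the example in the post. A lookup table keeps the bit-to-word mapping in one place, so adding a leftover flag bit later means adding one table row.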
| |||
On Fri, 29 Jun 2007, Heikki Linnakangas wrote:

> LOG: checkpoint complete; buffers written=5869 (35.8%); write=2.081 s,
> sync=4.851 s, total=7.066 s

My original patch converted the buffers written to MB. Easier to estimate MB/s by eye; I really came to hate multiplying by 8K. And people who have multiple boxes with different BLCKSZ could merge their logs together and not have to worry about adjusting for that.

I found this made it simpler to present the results to non-PostgreSQL people without having to explain the buffer size. It's easy for any IT person to follow; the idea I was working toward was "see, this log entry shows the long pauses are because it's writing 700MB of data all at once here, right next to that statement that took 10 seconds we found with log_min_duration_statement".

The current equivalent of what I had would be CheckpointStats.ckpt_bufs_written * BLCKSZ / (1024*1024) formatted as "%.1f MB"

> One thing that's missing, that I originally hoped to achieve with this,
> is logging the cause of a checkpoint.

The way pg_stat_bgwriter keeps two separate counts for requested (which includes timeouts) and required (out of segments) checkpoints now satisfies the main reason I originally collected that info. I felt it was critical when I wrote that patch for it to be possible to distinguish which of the two was happening more; it's only in the nice-to-have category now.

Now, onto some slightly different things here, which were all aimed at developer-level troubleshooting. With the nice reworking for logging checkpoints better, there were three tiny things my original patch did that got lost along the way that I'd suggest Tom might want to consider putting back during the final apply. I'll include mini pseudo-diffs here for those so no one has to find them in my original patch; they're all one-liners:

1) Log every time a new WAL file was created, which ties into the recent discussion here about that being a possible performance issue. At least you can look for it happening this way:

    src/backend/access/transam/xlog.c
    --- 1856,1863 ----
                    (errcode_for_file_access(),
                     errmsg("could not create file \"%s\": %m", tmppath)));

    +   ereport(DEBUG2, (errmsg("WAL creating and filling new file on disk")));

        /*
         * Zero-fill the file. We have to do this the hard way to ensure that all

2) Add a lower-level DEBUG statement when autovacuum was finished, which helped me in several cases figure out if that was the cause of a problem (when really doing low-level testing, I would see a vacuum start, not know if it was done, and then wonder if that was the cause of a slow statement):

    *** src/backend/postmaster/autovacuum.c
    --- 811,814 ----
            do_autovacuum();
    +       ereport(DEBUG2,
    +               (errmsg("autovacuum: processing database \"%s\" complete", dbname)));
        }

3) I fixed a line in postmaster.c so it formatted fork PIDs the same way most other log statements do; most statements report it as (PID %d) and the difference in this form seemed undesirable (I spent a lot of time at DEBUG2 and these little things started to bug me):

    *** src/backend/postmaster/postmaster.c
    *** 2630,2636 ****
        /* in parent, successful fork */
        ereport(DEBUG2,
    !           (errmsg_internal("forked new backend, pid=%d socket=%d",
                                 (int) pid, port->sock)));
    --- 2630,2636 ----
        /* in parent, successful fork */
        ereport(DEBUG2,
    !           (errmsg_internal("forked new backend (PID %d) socket=%d",
                                 (int) pid, port->sock)));

Little stuff, but all things I've found valuable on several occasions, which suggests eventually someone else may appreciate them as well.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
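The buffers-to-megabytes conversion mentioned in the post above can be worked through in a short C sketch. The helper names are hypothetical, and the block size is passed as a parameter here rather than taken from the BLCKSZ compile-time constant, purely so the example is self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Convert a count of buffers to megabytes.  The arithmetic is done in
 * double: written with plain ints, bufs * blcksz could overflow for
 * large checkpoints and the division would truncate the fraction. */
static double
buffers_to_mb(long bufs_written, long blcksz)
{
    return (double) bufs_written * (double) blcksz / (1024.0 * 1024.0);
}

/* Format the result the way the post suggests: "%.1f MB". */
static int
format_buffers_mb(char *buf, size_t buflen, long bufs_written, long blcksz)
{
    assert(buf != NULL && buflen > 0 && blcksz > 0);

    return snprintf(buf, buflen, "%.1f MB",
                    buffers_to_mb(bufs_written, blcksz));
}
```

For the sample log line, 5869 buffers at an assumed 8192-byte block size is 5869 / 128 = 45.85 MB, which formats as "45.9 MB". Reporting megabytes makes logs from servers built with different block sizes directly comparable, which is the motivation given in the post.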
| |||
Greg Smith wrote:
> On Fri, 29 Jun 2007, Heikki Linnakangas wrote:
>
>> LOG: checkpoint complete; buffers written=5869 (35.8%); write=2.081
>> s, sync=4.851 s, total=7.066 s
>
> My original patch converted the buffers written to MB. Easier to
> estimate MB/s by eye; I really came to hate multiplying by 8K. And
> people who have multiple boxes with different BLCKSZ could merge their
> logs together and not have to worry about adjusting for that.
>
> I found this made it simpler to present the results to non-PostgreSQL
> people without having to explain the buffer size. It's easy for any IT
> person to follow; the idea I was working toward was "see, this log entry
> shows the long pauses are because it's writing 700MB of data all at once
> here, right next to that statement that took 10 seconds we found with
> log_min_duration_statement".
>
> The current equivalent of what I had would be
> CheckpointStats.ckpt_bufs_written * BLCKSZ / (1024*1024) formatted as
> "%.1f MB"

I don't think we currently use MB in any other log messages. If we go down that route, we need to consider switching to MB everywhere.

> 1) Log every time a new WAL file was created, which ties into the recent
> discussion here about that being a possible performance issue. At least
> you can look for it happening this way:
>
>     src/backend/access/transam/xlog.c
>     --- 1856,1863 ----
>                     (errcode_for_file_access(),
>                      errmsg("could not create file \"%s\": %m", tmppath)));
>
>     +   ereport(DEBUG2, (errmsg("WAL creating and filling new file on disk")));
>
>         /*
>          * Zero-fill the file. We have to do this the hard way to ensure that all

This could be useful.

> 2) Add a lower-level DEBUG statement when autovacuum was finished, which
> helped me in several cases figure out if that was the cause of a
> problem (when really doing low-level testing, I would see a vacuum
> start, not know if it was done, and then wonder if that was the cause of
> a slow statement):
>
>     *** src/backend/postmaster/autovacuum.c
>     --- 811,814 ----
>             do_autovacuum();
>     +       ereport(DEBUG2,
>     +               (errmsg("autovacuum: processing database \"%s\" complete", dbname)));
>         }

Did you check out log_autovacuum? Doesn't it do what you need?

> 3) I fixed a line in postmaster.c so it formatted fork PIDs the same way
> most other log statements do; most statements report it as (PID %d) and
> the difference in this form seemed undesirable (I spent a lot of time at
> DEBUG2 and these little things started to bug me):
>
>     *** src/backend/postmaster/postmaster.c
>     *** 2630,2636 ****
>         /* in parent, successful fork */
>         ereport(DEBUG2,
>     !           (errmsg_internal("forked new backend, pid=%d socket=%d",
>                                  (int) pid, port->sock)));
>     --- 2630,2636 ----
>         /* in parent, successful fork */
>         ereport(DEBUG2,
>     !           (errmsg_internal("forked new backend (PID %d) socket=%d",
>                                  (int) pid, port->sock)));

Hmm. Since it's DEBUG2 I don't care much either way. The changed message looks inconsistent to me, since socket is printed differently.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
| |||
On Sat, 30 Jun 2007, Heikki Linnakangas wrote:

> I don't think we currently use MB in any other log messages. If we go down
> that route, we need to consider switching to MB everywhere.

Are there a lot of messages that speak in terms of buffer pages though? Other than the debug-level stuff, which this checkpoint message has now been elevated out of, I don't recall seeing enough things that spoke in those terms to consider it a standard. Someone feel free to correct me if I'm wrong here.

> Did you check out log_autovacuum? Doesn't it do what you need?

I have not (most of this patch was done against 8.2 and then ported forward). If that already shows clearly the start and end of each autovacuum section, ignore that I even brought this up.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
| |||
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Here's a reworked version Greg Smith's patch for logging checkpoints.

Applied with minor editorialization.

regards, tom lane
| ||||
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Greg Smith wrote:
>> My original patch converted the buffers written to MB.

> I don't think we currently use MB in any other log messages. If we go
> down that route, we need to consider switching to MB everywhere.

I left this as submitted by Heikki, mainly because the percentage-of-shared-buffers bit seemed useful to me and it would make less sense IMHO if attached to a number of megabytes instead of a number of buffers. But it's a judgment call for sure. Any other opinions out there?

>> 1) Log every time a new WAL file was created, which ties into the recent
>> discussion here that being a possible performance issue.

> This could be useful.

Done; I put in two DEBUG2 messages, one at start and one at completion of the file creation.

>> 2) Add a lower-level DEBUG statement when autovacuum was finished,

> Did you check out log_autovacuum? Doesn't it do what you need?

I concur that log_autovacuum seems to cover this already.

>> 3) I fixed a line in postmaster.c so it formatted fork PIDs the same way
>> most other log statements do; most statements report it as (PID %d) and
>> the difference in this form seemed undesirable

> Hmm. Since it's DEBUG2 I don't care much either way. The changed message
> looks inconsistent to me, since socket is printed differently.

No strong opinion, but I left it as-is for the moment in case Alvaro is about to commit something in postmaster.c (I suspect the double-shutdown business is a bug in there). Don't want to cause merge problems for him just for cosmetic message cleanup.

regards, tom lane