This is a discussion on Re: Checkpoint cost, looks like it is WAL/CRC within the pgsql Hackers forums, part of the PostgreSQL category; --> > Here's an idea: > > We read the page that we would have backed up, calc the CRC ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > Here's an idea: > > We read the page that we would have backed up, calc the CRC and > write a short WAL record with just the CRC, not the block. When > we recover we re-read the database page, calc its CRC and > compare it with the CRC from the transaction log. If they > differ, we know that the page was torn and we know the database > needs recovery. (So we calc the CRC when we log AND when we recover). Won't work, since the page on disk may have x different contents between 2 checkpoints (bgwriter from lru). Only workable solution would imho be to write the LSN to each 512 byte block (not that I am propagating that idea). Andreas ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| "Zeugswetter Andreas DAZ SD" <ZeugswetterA@spardat.at> writes: > Only workable solution would imho be to write the LSN to each 512 > byte block (not that I am propagating that idea). We're not doing anything like that, as it would create an impossible space-management problem (or are you happy with limiting tuples to 500 bytes?). What we *could* do is calculate a page-level CRC and store it in the page header just before writing out. Torn pages would then manifest as a wrong CRC on read. No correction ability, but at least a reliable detection ability. However, this will do nothing to solve the performance problem if the core of that problem is the cost of computing page-sized CRCs :-( We still don't know enough about the situation to know what a solution might look like. Is the slowdown Josh is seeing due to the extra CPU cost of the CRCs, or the extra I/O cost, or excessive locking of the WAL-related data structures while we do this stuff, or ???. Need more data. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster |
| |||
| I wrote: > We still don't know enough about the situation to know what a solution > might look like. Is the slowdown Josh is seeing due to the extra CPU > cost of the CRCs, or the extra I/O cost, or excessive locking of the > WAL-related data structures while we do this stuff, or ???. Need more > data. Josh, is OSDL up enough that you can try another comparison run? If so, please undo the previous patch (which disabled page dumping entirely) and instead try removing this block of code, starting at about xlog.c line 620 in CVS tip: /* * Now add the backup block headers and data into the CRC */ for (i = 0; i < XLR_MAX_BKP_BLOCKS; i++) { if (dtbuf_bkp[i]) { BkpBlock *bkpb = &(dtbuf_xlg[i]); char *page; COMP_CRC32(rdata_crc, (char *) bkpb, sizeof(BkpBlock)); page = (char *) BufferGetBlock(dtbuf[i]); if (bkpb->hole_length == 0) { COMP_CRC32(rdata_crc, page, BLCKSZ); } else { /* must skip the hole */ COMP_CRC32(rdata_crc, page, bkpb->hole_offset); COMP_CRC32(rdata_crc, page + (bkpb->hole_offset + bkpb->hole_length), BLCKSZ - (bkpb->hole_offset + bkpb->hole_length)); } } } This will remove just the CRC calculation work associated with backed-up pages. Note that any attempt to recover from the WAL will fail, but I assume you don't need that for the purposes of the test run. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend |
| |||
| Tom Lane <tgl@sss.pgh.pa.us> writes: > "Zeugswetter Andreas DAZ SD" <ZeugswetterA@spardat.at> writes: > > Only workable solution would imho be to write the LSN to each 512 > > byte block (not that I am propagating that idea). > > We're not doing anything like that, as it would create an impossible > space-management problem (or are you happy with limiting tuples to > 500 bytes?). What we *could* do is calculate a page-level CRC and > store it in the page header just before writing out. Torn pages > would then manifest as a wrong CRC on read. No correction ability, > but at least a reliable detection ability. At the same time as you do the CRC you can copy the bytes to a fresh page skipping the LSNs. Likewise, when writing out the page you have to calculate the CRC; at the same time as you calculate the CRC you write out the bytes to a temporary buffer adding LSNs and write that to disk. This would be "zero-copy" if you're already scanning the bytes to calculate the CRC since you can add and remove LSNs at the same time. It does require an extra buffer to store the page in before writing and that entails some amount of cache thrashing. But maybe you could reuse the same buffer over and over again for every read/write. -- greg ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> What we *could* do is calculate a page-level CRC and >> store it in the page header just before writing out. Torn pages >> would then manifest as a wrong CRC on read. No correction ability, >> but at least a reliable detection ability. > At the same time as you do the CRC you can copy the bytes to a fresh page > skipping the LSNs. Likewise, when writing out the page you have to calculate > the CRC; at the same time as you calculate the CRC you write out the bytes to > a temporary buffer adding LSNs and write that to disk. Huh? You seem to be proposing doing *both* things, which sounds entirely pointless. BTW, I was envisioning the page CRCs as something we'd only check during crash recovery, not normal-operation reads. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| Tom, > Josh, is OSDL up enough that you can try another comparison run? Thankfully, yes. > If so, please undo the previous patch (which disabled page dumping > entirely) and instead try removing this block of code, starting > at about xlog.c line 620 in CVS tip: Will do. Results in a few days. -- Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Thu, Jul 07, 2005 at 11:36:40AM -0400, Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: > > Tom Lane <tgl@sss.pgh.pa.us> writes: > >> What we *could* do is calculate a page-level CRC and > >> store it in the page header just before writing out. Torn pages > >> would then manifest as a wrong CRC on read. No correction ability, > >> but at least a reliable detection ability. > > > At the same time as you do the CRC you can copy the bytes to a fresh page > > skipping the LSNs. Likewise, when writing out the page you have to calculate > > the CRC; at the same time as you calculate the CRC you write out the bytes to > > a temporary buffer adding LSNs and write that to disk. > > Huh? You seem to be proposing doing *both* things, which sounds entirely > pointless. > > BTW, I was envisioning the page CRCs as something we'd only check during > crash recovery, not normal-operation reads. > > regards, tom lane > Does the DB page on disk have a version number? If so, maybe we could update the WAL with the CRC+version anytime the page is update. You may need to check the log for multiple CRC+version entries to determine the torn-page status. Ken ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |
| |||
| Josh Berkus <josh@agliodbs.com> writes: >> If so, please undo the previous patch (which disabled page dumping >> entirely) and instead try removing this block of code, starting >> at about xlog.c line 620 in CVS tip: > Will do. Results in a few days. Great. BTW, don't bother testing snapshots between 2005/07/05 2300 EDT and just now --- Bruce's full_page_writes patch introduced a large random negative component into the timing ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Thu, 7 Jul 2005, Tom Lane wrote: > We still don't know enough about the situation to know what a solution > might look like. Is the slowdown Josh is seeing due to the extra CPU > cost of the CRCs, or the extra I/O cost, or excessive locking of the > WAL-related data structures while we do this stuff, or ???. Need more > data. I wonder if a different BLCKSZ would make a difference either way. Say, 1024 bytes instead of the default 8192. - Heikki ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| Tom, > Great. BTW, don't bother testing snapshots between 2005/07/05 2300 EDT > and just now --- Bruce's full_page_writes patch introduced a large > random negative component into the timing ... Ach. Starting over, then. --Josh -- Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |