Unix Technical Forum

Re: [GENERAL] Undetected corruption of table files

  #1 (permalink)  
Old 04-15-2008, 10:40 PM
Albe Laurenz
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

Tom Lane wrote:
>> Would it be an option to have a checksum somewhere in each
>> data block that is verified upon read?

>
> That's been proposed before and rejected before. See the archives ...


I searched for "checksum" and couldn't find it. Could someone
give me a pointer? I'm not talking about WAL files here.

Thanks,
Laurenz Albe

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

  #2 (permalink)  
Old 04-15-2008, 10:40 PM
Tom Lane
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

"Albe Laurenz" <all@adv.magwien.gv.at> writes:
> Tom Lane wrote:
>>> Would it be an option to have a checksum somewhere in each
>>> data block that is verified upon read?


>> That's been proposed before and rejected before. See the archives ...


> I searched for "checksum" and couldn't find it. Could someone
> give me a pointer? I'm not talking about WAL files here.


"CRC" maybe? Also, make sure your search goes all the way back; I think
the prior discussions were around the same time WAL was initially put
in, and/or when we dropped the WAL CRC width from 64 to 32 bits.
The very measurable overhead of WAL CRCs is the main thing that's
discouraged us from having page CRCs. (Well, that and the lack of
evidence that they'd actually gain anything.)

regards, tom lane


  #3 (permalink)  
Old 04-15-2008, 10:40 PM
Jonah H. Harris
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

On 8/27/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> that and the lack of evidence that they'd actually gain anything


I find it somewhat ironic that PostgreSQL strives to be fairly
non-corruptible, yet has no way to detect a corrupted page. The only
reason for not having CRCs is that it will slow down performance...
which is exactly the opposite of conventional PostgreSQL wisdom (no
performance trade-off for durability).

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/


  #4 (permalink)  
Old 04-15-2008, 10:40 PM
Trevor Talbot
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

On 8/27/07, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> On 8/27/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > that and the lack of evidence that they'd actually gain anything

>
> I find it somewhat ironic that PostgreSQL strives to be fairly
> non-corruptible, yet has no way to detect a corrupted page. The only
> reason for not having CRCs is that it will slow down performance...
> which is exactly the opposite of conventional PostgreSQL wisdom (no
> performance trade-off for durability).


But how does detecting a corrupted data page gain you any durability?
All it means is that the platform underneath screwed up, and you've
already *lost* durability. What do you do then?

It seems like the same idea as an application trying to detect RAM errors.


  #5 (permalink)  
Old 04-15-2008, 10:40 PM
Gregory Stark
 
Posts: n/a
Default Re: Undetected corruption of table files

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Albe Laurenz" <all@adv.magwien.gv.at> writes:
>> Tom Lane wrote:
>>>> Would it be an option to have a checksum somewhere in each
>>>> data block that is verified upon read?

>
>>> That's been proposed before and rejected before. See the archives ...

>
>> I searched for "checksum" and couldn't find it. Could someone
>> give me a pointer? I'm not talking about WAL files here.

>
> "CRC" maybe? Also, make sure your search goes all the way back; I think
> the prior discussions were around the same time WAL was initially put
> in, and/or when we dropped the WAL CRC width from 64 to 32 bits.
> The very measurable overhead of WAL CRCs is the main thing that's
> discouraged us from having page CRCs. (Well, that and the lack of
> evidence that they'd actually gain anything.)


I thought we determined that WAL CRCs are expensive because we have
to checksum each WAL record individually. I recall the last time this came up
I ran some microbenchmarks and found that the cost to CRC an entire 8k block
was on the order of tens of microseconds.
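A rough way to repeat that kind of measurement is sketched below. It assumes Python's zlib.crc32 as a stand-in for PostgreSQL's own CRC code, which is a different implementation, so the numbers are only indicative. The name time_block_crc is made up for this sketch.

```python
import os
import timeit
import zlib

BLOCK_SIZE = 8192  # 8k block, as in the post above


def time_block_crc(iterations: int = 10_000) -> float:
    """Return the mean time, in microseconds, to CRC-32 one block.

    Assumption: zlib.crc32 stands in for the server's CRC routine.
    """
    block = os.urandom(BLOCK_SIZE)
    total_seconds = timeit.timeit(lambda: zlib.crc32(block), number=iterations)
    return total_seconds / iterations * 1e6


if __name__ == "__main__":
    mean_us = time_block_crc()
    print(f"mean CRC-32 time per {BLOCK_SIZE}-byte block: {mean_us:.2f} us")
```

The result depends heavily on the CPU and on the CRC implementation, so it should be read as an order-of-magnitude check rather than a benchmark of PostgreSQL itself.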

The last time it came up was in the context of allowing turning off
full_page_writes but offering a guarantee that torn pages would be detected on
recovery and no later. I was a proponent of using writev to embed bytes in
each 512-byte block and Jonah said it would be no faster than a CRC (and
obviously considerably more complicated). My benchmarks showed that Jonah was
right and the CRC was cheaper than the added cost of using writev.
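The idea of embedding bytes in each 512-byte block can be illustrated with a small sketch. This is not the writev-based design discussed on the list; it simply assumes that the last byte of every sector is reserved for a stamp taken from a per-write counter, and that a page whose sectors carry different stamps was only partly written. All names here (stamp_page, is_torn) are hypothetical.

```python
SECTOR_SIZE = 512
PAGE_SIZE = 8192
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE
PAYLOAD_PER_SECTOR = SECTOR_SIZE - 1  # assumption: last byte of each sector is the stamp


def stamp_page(payload: bytes, write_counter: int) -> bytes:
    """Lay the payload out in sectors, ending each sector with the same stamp byte."""
    capacity = SECTORS_PER_PAGE * PAYLOAD_PER_SECTOR
    if len(payload) > capacity:
        raise ValueError("payload does not fit in one page")
    padded = payload.ljust(capacity, b"\x00")
    stamp = bytes([write_counter % 256])
    sectors = [
        padded[i * PAYLOAD_PER_SECTOR:(i + 1) * PAYLOAD_PER_SECTOR] + stamp
        for i in range(SECTORS_PER_PAGE)
    ]
    return b"".join(sectors)


def is_torn(page: bytes) -> bool:
    """A page counts as torn if its sectors do not all carry the same stamp."""
    stamps = {page[i * SECTOR_SIZE + SECTOR_SIZE - 1] for i in range(SECTORS_PER_PAGE)}
    return len(stamps) > 1


if __name__ == "__main__":
    old = stamp_page(b"old contents", write_counter=1)
    new = stamp_page(b"new contents", write_counter=2)
    # Simulate a crash after only the first four sectors of the new page hit disk.
    torn = new[:4 * SECTOR_SIZE] + old[4 * SECTOR_SIZE:]
    print(is_torn(new), is_torn(torn))
```

A one-byte stamp wraps around after 256 writes, so two writes exactly 256 apart would look identical; a real design would have to deal with that, and with the bookkeeping of the displaced payload bytes that makes this approach more complicated than a CRC.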

I do agree the benefits of having a CRC are overstated. Most times corruption
is caused by bad memory and a CRC will happily checksum the corrupted memory
just fine. A checksum is no guarantee. But I've also seen data corruption
caused by bad memory in an i/o controller, for example. There are always going
to be cases where it could help.
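The distinction between the two cases can be shown with a small sketch, assuming zlib.crc32 as the checksum and a single flipped bit as the corruption. The helper names (seal, verify, flip_bit) are invented for the illustration.

```python
import zlib
from typing import Tuple


def seal(page: bytes) -> Tuple[bytes, int]:
    """Compute the checksum over whatever is in memory at write time."""
    return page, zlib.crc32(page)


def verify(page: bytes, crc: int) -> bool:
    """Recompute the checksum at read time and compare."""
    return zlib.crc32(page) == crc


def flip_bit(page: bytes, bit_index: int) -> bytes:
    """Return a copy of the page with one bit inverted."""
    damaged = bytearray(page)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


if __name__ == "__main__":
    good = bytes(8192)

    # Case 1: bad RAM damages the page before the checksum is computed.
    # The checksum covers the damaged bytes, so verification passes.
    stored, crc = seal(flip_bit(good, 100))
    print("corrupted before checksum, verifies:", verify(stored, crc))

    # Case 2: the page is damaged after the checksum was computed,
    # for example in an I/O controller. Verification fails.
    stored, crc = seal(good)
    print("corrupted after checksum, verifies:", verify(flip_bit(stored, 100), crc))
```

Only the second case is caught, which is why a checksum helps with faults below the point where it is computed and not with faults above it.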

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


  #6 (permalink)  
Old 04-15-2008, 10:40 PM
Alban Hertroys
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

Jonah H. Harris wrote:
> On 8/27/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> that and the lack of evidence that they'd actually gain anything

>
> I find it somewhat ironic that PostgreSQL strives to be fairly
> non-corruptible, yet has no way to detect a corrupted page. The only
> reason for not having CRCs is that it will slow down performance...
> which is exactly the opposite of conventional PostgreSQL wisdom (no
> performance trade-off for durability).


Why? I can't say I speak for the developers, but I think the reason is
that data corruption can (with the very rare exception of undetected
programming errors) only be caused by hardware problems.

If you have a "proper" production database server, your memory has error
checking, and your RAID controller has something of the kind as well. If
not, you would probably be running the database on a filesystem that has
reliable integrity verification mechanisms.

In the worst case (all the above mechanisms fail), you have backups.

IMHO the problem is covered quite adequately. The operating system and
the hardware cover for the database, as they should; it's _their_ job.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //


  #7 (permalink)  
Old 04-15-2008, 10:40 PM
Tom Lane
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

"Trevor Talbot" <quension@gmail.com> writes:
> On 8/27/07, Jonah H. Harris <jonah.harris@gmail.com> wrote:
>> I find it somewhat ironic that PostgreSQL strives to be fairly
>> non-corruptible, yet has no way to detect a corrupted page.


> But how does detecting a corrupted data page gain you any durability?
> All it means is that the platform underneath screwed up, and you've
> already *lost* durability. What do you do then?


Indeed. In fact, the most likely implementation of this (refuse to do
anything with a page with a bad CRC) would be a net loss from that
standpoint, because you couldn't get *any* data out of a page, even if
only part of it had been zapped.

regards, tom lane


  #8 (permalink)  
Old 04-15-2008, 10:40 PM
Jonah H. Harris
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

On 8/27/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Indeed. In fact, the most likely implementation of this (refuse to do
> anything with a page with a bad CRC) would be a net loss from that
> standpoint, because you couldn't get *any* data out of a page, even if
> only part of it had been zapped.


At least you would know it was corrupted, instead of getting funky
errors and/or crashes.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/


  #9 (permalink)  
Old 04-15-2008, 10:40 PM
Decibel!
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

On Mon, Aug 27, 2007 at 12:08:17PM -0400, Jonah H. Harris wrote:
> On 8/27/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Indeed. In fact, the most likely implementation of this (refuse to do
> > anything with a page with a bad CRC) would be a net loss from that
> > standpoint, because you couldn't get *any* data out of a page, even if
> > only part of it had been zapped.


I think it'd be perfectly reasonable to have a mode where you could
bypass the check so that you could see what was in the corrupted page
(as well as deleting everything on the page so that you could "fix" the
corruption). Obviously, this should be restricted to superusers.
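A minimal sketch of such a mode follows, assuming a made-up page layout with a 4-byte CRC-32 stored in front of the page body. The flag name bypass_check and the functions write_page and read_page are hypothetical and do not correspond to any PostgreSQL setting or function; restricting the bypass to superusers is left out.

```python
import struct
import zlib

PAGE_SIZE = 8192
CRC_SIZE = 4  # assumption: CRC-32 stored little-endian in the first 4 bytes


class PageChecksumError(Exception):
    """Raised when a page's stored CRC does not match its contents."""


def write_page(payload: bytes) -> bytes:
    """Pad the payload to a full page body and prepend its CRC."""
    if len(payload) > PAGE_SIZE - CRC_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(PAGE_SIZE - CRC_SIZE, b"\x00")
    return struct.pack("<I", zlib.crc32(body)) + body


def read_page(page: bytes, bypass_check: bool = False) -> bytes:
    """Return the page body, refusing pages with a bad CRC unless bypassed."""
    (stored_crc,) = struct.unpack("<I", page[:CRC_SIZE])
    body = page[CRC_SIZE:]
    if zlib.crc32(body) != stored_crc and not bypass_check:
        raise PageChecksumError("page checksum mismatch")
    return body


if __name__ == "__main__":
    damaged = bytearray(write_page(b"row data"))
    damaged[100] ^= 0xFF  # damage one byte somewhere in the body
    try:
        read_page(bytes(damaged))
    except PageChecksumError as exc:
        print("normal read refused:", exc)
    # With the bypass, whatever survived on the page is still readable.
    print(read_page(bytes(damaged), bypass_check=True)[:8])
```

With the default setting the damaged page is refused outright; with the bypass the caller gets the raw body back and has to judge for itself which parts are still trustworthy.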

> At least you would know it was corrupted, instead of getting funky
> errors and/or crashes.


Or worse, getting what appears to be perfectly valid data, but isn't.
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


  #10 (permalink)  
Old 04-15-2008, 10:41 PM
Albe Laurenz
 
Posts: n/a
Default Re: [GENERAL] Undetected corruption of table files

Tom Lane wrote:
>>>> Would it be an option to have a checksum somewhere in each
>>>> data block that is verified upon read?

>
>>> That's been proposed before and rejected before. See the
>>> archives ...

>
> I think
> the prior discussions were around the same time WAL was initially put
> in, and/or when we dropped the WAL CRC width from 64 to 32 bits.
> The very measurable overhead of WAL CRCs is the main thing that's
> discouraged us from having page CRCs. (Well, that and the lack of
> evidence that they'd actually gain anything.)


Hmmm - silence me if I'm misunderstanding this, but the most
conclusive hit I had was a mail by you:

http://archives.postgresql.org/pgsql...0/msg01142.php

which only got affirmative feedback.

Also, there's a TODO entry:

- Add optional CRC checksum to heap and index pages

This seems to me to be exactly what I wish for...

To the best of my knowledge, the most expensive thing in databases
today is disk I/O, because CPU speed is increasing faster. Although
calculating a checksum upon writing a block to disk will
certainly incur CPU overhead, what may have seemed too expensive
a couple of years ago could be acceptable today.

I understand the argument that it's the task of hardware and
OS to see that data don't get corrupted, but it would improve
PostgreSQL's reliability if it can detect such errors and at
least issue a warning.
This wouldn't fix the underlying problem, but it would tell you
to not overwrite last week's backup tape...
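As an illustration of the "warn before you overwrite the backup" use case, here is a sketch that scans a file image page by page and reports which pages fail verification. It assumes a made-up layout with a 4-byte CRC-32 in front of each 8k page; make_page and find_bad_pages are hypothetical names, not PostgreSQL functions.

```python
import struct
import zlib
from typing import List

PAGE_SIZE = 8192
CRC_SIZE = 4  # assumption: CRC-32 stored little-endian in the first 4 bytes


def make_page(payload: bytes) -> bytes:
    """Build one page: CRC of the padded body, followed by the body."""
    body = payload.ljust(PAGE_SIZE - CRC_SIZE, b"\x00")
    return struct.pack("<I", zlib.crc32(body)) + body


def find_bad_pages(image: bytes) -> List[int]:
    """Return the numbers of all pages whose stored CRC does not match."""
    bad = []
    for page_no in range(len(image) // PAGE_SIZE):
        page = image[page_no * PAGE_SIZE:(page_no + 1) * PAGE_SIZE]
        (stored_crc,) = struct.unpack("<I", page[:CRC_SIZE])
        if zlib.crc32(page[CRC_SIZE:]) != stored_crc:
            bad.append(page_no)
    return bad


if __name__ == "__main__":
    image = bytearray(b"".join(make_page(bytes([i]) * 10) for i in range(3)))
    image[PAGE_SIZE + 500] ^= 0x01  # flip one bit in the second page
    for page_no in find_bad_pages(bytes(image)):
        print(f"WARNING: checksum mismatch in page {page_no}; keep the old backup")
```

A check like this does not repair anything, but run before a backup it would flag the damaged page while an older, intact copy still exists.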

Not all databases are on enterprise scale storage systems, and
there's also the small possibility of PostgreSQL bugs that could
be detected that way.

Yours,
Laurenz Albe

