Unix Technical Forum

Re: [GENERAL] PANIC: heap_update_redo: no block

This is a discussion on Re: [GENERAL] PANIC: heap_update_redo: no block within the pgsql Hackers forums, part of the PostgreSQL category.


  #1 (permalink)  
Old 04-12-2008, 02:45 AM
Tom Lane
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

[ redirecting to a more appropriate mailing list ]

"Alex bahdushka" <bahdushka@gmail.com> writes:

> LOG: REDO @ D/19176610; LSN D/19176644: prev D/191765E8; xid 81148979: Heap - clean: rel 1663/16386/16559898; blk 0
> LOG: REDO @ D/19176644; LSN D/191766A4: prev D/19176610; xid 81148979: Heap - move: rel 1663/16386/16559898; tid 1/1; new 0/10
> PANIC: heap_update_redo: no block: target blcknum: 1, relation(1663/16386/16559898) length: 1


I think what's happened here is that VACUUM FULL moved the only tuple
off page 1 of the relation, then truncated off page 1, and now
heap_update_redo is panicking because it can't find page 1 to replay the
move. Curious that we've not seen a case like this before, because it
seems like a generic hazard for WAL replay.

The simplest fix would be to treat WAL records as no-ops if they refer
to nonexistent pages, but that seems much too prone to hide real failure
conditions. Another thought is to remember that we ignored this record,
and then complain if we don't see a TRUNCATE that would've removed the
page. That would be pretty complicated but not impossible. Anyone have
a better idea?

regards, tom lane
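
The second idea in the post above (remember that a record was ignored, then complain unless a later truncation explains it) can be sketched roughly as follows. This is a minimal Python illustration, not PostgreSQL code: the class `InvalidPageTracker` and all of its method names are hypothetical, and a relation is represented simply by the tuple printed in the log.

```python
from collections import defaultdict


class InvalidPageTracker:
    """Remembers WAL references to pages that did not exist during replay."""

    def __init__(self):
        # relation id -> set of block numbers that were missing when touched
        self._missing = defaultdict(set)

    def note_missing(self, rel, blkno):
        """Record the reference instead of panicking immediately."""
        self._missing[rel].add(blkno)

    def forget_truncated(self, rel, new_nblocks):
        """A replayed truncation explains every missing block >= new_nblocks."""
        remaining = {b for b in self._missing[rel] if b < new_nblocks}
        if remaining:
            self._missing[rel] = remaining
        else:
            del self._missing[rel]

    def forget_dropped(self, rel):
        """A replayed relation drop explains every missing block."""
        self._missing.pop(rel, None)

    def unexplained(self):
        """Sorted (rel, blkno) pairs still unexplained at the end of replay."""
        return sorted(
            (rel, blk) for rel, blocks in self._missing.items() for blk in blocks
        )


# Hypothetical replay of the reported case: block 1 is missing when the
# "Heap - move" record is replayed, and a later truncation to one block
# accounts for it.
tracker = InvalidPageTracker()
rel = (1663, 16386, 16559898)
tracker.note_missing(rel, 1)
tracker.forget_truncated(rel, 1)
leftover = tracker.unexplained()
```

At the end of replay, anything still returned by `unexplained()` would be the case worth complaining about; whether that complaint should be a warning or a hard failure is exactly what the rest of the thread debates.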

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

  #2 (permalink)  
Old 04-12-2008, 02:45 AM
Greg Stark
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block


Tom Lane <tgl@sss.pgh.pa.us> writes:

> I think what's happened here is that VACUUM FULL moved the only tuple
> off page 1 of the relation, then truncated off page 1, and now
> heap_update_redo is panicking because it can't find page 1 to replay the
> move. Curious that we've not seen a case like this before, because it
> seems like a generic hazard for WAL replay.


This sounds familiar

http://archives.postgresql.org/pgsql...5/msg01369.php


--
greg


  #3 (permalink)  
Old 04-12-2008, 02:45 AM
Tom Lane
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

Greg Stark <gsstark@mit.edu> writes:
> This sounds familiar
> http://archives.postgresql.org/pgsql...5/msg01369.php


Hm, I had totally forgotten about that todo item :-(. Time to push it
back up the priority list.

regards, tom lane

  #4 (permalink)  
Old 04-12-2008, 02:45 AM
Tom Lane
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I think what's happened here is that VACUUM FULL moved the only tuple
>> off page 1 of the relation, then truncated off page 1, and now
>> heap_update_redo is panicking because it can't find page 1 to replay the
>> move. Curious that we've not seen a case like this before, because it
>> seems like a generic hazard for WAL replay.


> This sounds familiar
> http://archives.postgresql.org/pgsql...5/msg01369.php


After further review I've concluded that there is not a systemic bug
here, but there are several nearby local bugs. The reason it's not
a systemic bug is that this scenario is supposed to be handled by the
same mechanism that prevents torn-page writes: the first XLOG record
that touches a given page after a checkpoint is supposed to rewrite
the entire page, rather than update it incrementally. Since XLOG replay
always begins at a checkpoint, this means we should always be able to
write a fresh copy of the page, even after relation deletion or
truncation. Furthermore, during XLOG replay we are willing to create
a table (or even a whole tablespace or database directory) if it's not
there when touched. The subsequent replay of the deletion or truncation
will get rid of any unwanted data again.

Therefore, there is no systemic bug --- unless you are running with
full_page_writes=off. I assert that that GUC variable is broken and
must be removed.
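
The argument above can be made concrete with a toy replay model. The following Python sketch is only an illustration under simplifying assumptions: pages are dicts, the WAL is a list of dicts, and the names `make_wal`, `replay`, and `ReplayPanic` are hypothetical and not part of PostgreSQL. It shows that a record carrying a full-page image can recreate a page that was truncated away before the crash, whereas an incremental record cannot.

```python
class ReplayPanic(Exception):
    """Raised when an incremental record targets a block that does not exist."""


def make_wal(full_page_writes):
    """WAL written after a checkpoint: update block 1, then truncate to 1 block."""
    update = {"op": "update", "blk": 1, "change": ("slot0", "moved")}
    if full_page_writes:
        # First touch of block 1 since the checkpoint: attach the whole page.
        update["image"] = {"slot0": "moved"}
    return [update, {"op": "truncate", "nblocks": 1}]


def replay(wal, pages):
    """Replay toy WAL records against `pages`, a list of dicts (one per block)."""
    for rec in wal:
        if rec["op"] == "truncate":
            del pages[rec["nblocks"]:]
        elif "image" in rec:
            # Full-page image: extend the file if needed, then overwrite the page.
            while len(pages) <= rec["blk"]:
                pages.append({})
            pages[rec["blk"]] = dict(rec["image"])
        elif rec["blk"] < len(pages):
            key, value = rec["change"]
            pages[rec["blk"]][key] = value
        else:
            raise ReplayPanic(
                f"no block: target block {rec['blk']}, length {len(pages)}"
            )
    return pages


# The file on disk was already truncated to one block before the crash.
with_fpw = replay(make_wal(True), [{"slot0": "kept"}])
try:
    replay(make_wal(False), [{"slot0": "kept"}])
    panicked = False
except ReplayPanic:
    panicked = True
```

With the image attached, block 1 is recreated and then removed again by the replayed truncation, so the relation ends up with its single original page. Without it, replay fails in the same way as the reported PANIC.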

There are, however, a bunch of local bugs, including these:

* On a symlink-less platform (ie, Windows), TablespaceCreateDbspace is
#ifdef'd to be a no-op. This is wrong because it performs the essential
function of re-creating a tablespace or database directory if needed
during replay. AFAICS the #if can just be removed and have the same
code with or without symlinks.

* log_heap_update decides that it can set XLOG_HEAP_INIT_PAGE instead
of storing the full destination page, if the destination contains only
the single tuple being moved. This is fine, except it also resets the
buffer indicator for the *source* page, which is wrong --- that page
may still need to be re-generated from the xlog record. This is the
proximate cause of the bug report that started this thread.

* btree_xlog_split passes extend=false to XLogReadBuffer for the left
sibling, which is silly because it is going to rewrite that whole page
from the xlog record anyway. It should pass true so that there's no
complaint if the left sib page was later truncated away. This accounts
for one of the bug reports mentioned in the message cited above.

* btree_xlog_delete_page passes extend=false for the target page,
which is likewise silly because it's going to init the page (not that
there was any useful data on it anyway). This accounts for the other
bug report mentioned in the message cited above.
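
The extend flag discussed in the last two items can be sketched with a toy page lookup. This is a Python illustration only: `read_buffer` and `redo_reinit_page` are hypothetical stand-ins, not the real XLogReadBuffer signature, and pages are modelled as dicts in a list.

```python
def read_buffer(pages, blkno, extend):
    """Return the page at `blkno` in this toy model.

    With extend=True the file is grown with empty pages as needed;
    with extend=False a missing block is reported as an error.
    """
    if blkno >= len(pages):
        if not extend:
            raise LookupError(
                f"block {blkno} does not exist (length {len(pages)})"
            )
        pages.extend({} for _ in range(blkno + 1 - len(pages)))
    return pages[blkno]


def redo_reinit_page(pages, blkno, new_contents):
    """Redo for a record that rewrites the whole page from the record itself."""
    # The old contents are irrelevant, so a missing page is not an error here.
    page = read_buffer(pages, blkno, extend=True)
    page.clear()
    page.update(new_contents)


# The relation currently has one block; the record rewrites block 2,
# which had been truncated away.
pages = [{"item": "existing"}]
redo_reinit_page(pages, 2, {"item": "left sibling"})
```

The design point is that a redo routine which is about to overwrite the entire page has no reason to insist that the page already exists, so asking for extension avoids a spurious complaint after a later truncation.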

Clearly, we need to go through the xlog code with a fine tooth comb
and convince ourselves that all pages touched by any xlog record will
be properly reconstituted if they've later been truncated off. I have
not yet examined any of the code except the above.

Notice that these are each, individually, pretty low-probability
scenarios, which is why we've not seen many bug reports. If we had had
a systemic bug I'm sure we'd be seeing far more.

regards, tom lane

  #5 (permalink)  
Old 04-12-2008, 02:46 AM
Simon Riggs
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

On Mon, 2006-03-27 at 22:03 -0500, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> I think what's happened here is that VACUUM FULL moved the only tuple
> >> off page 1 of the relation, then truncated off page 1, and now
> >> heap_update_redo is panicking because it can't find page 1 to replay the
> >> move. Curious that we've not seen a case like this before, because it
> >> seems like a generic hazard for WAL replay.
>
> > This sounds familiar
> > http://archives.postgresql.org/pgsql...5/msg01369.php


Yes, I remember that also.

> After further review I've concluded that there is not a systemic bug
> here, but there are several nearby local bugs.


IMHO it's amazing to find so many bugs in a code review of existing
production code. Cool.

> The reason it's not
> a systemic bug is that this scenario is supposed to be handled by the
> same mechanism that prevents torn-page writes: the first XLOG record
> that touches a given page after a checkpoint is supposed to rewrite
> the entire page, rather than update it incrementally. Since XLOG replay
> always begins at a checkpoint, this means we should always be able to
> write a fresh copy of the page, even after relation deletion or
> truncation. Furthermore, during XLOG replay we are willing to create
> a table (or even a whole tablespace or database directory) if it's not
> there when touched. The subsequent replay of the deletion or truncation
> will get rid of any unwanted data again.


That will all work, agreed.

> The subsequent replay of the deletion or truncation
> will get rid of any unwanted data again.


Trouble is, it is not a watertight assumption that there *will be* a
subsequent truncation, even if it is a strong one. If there is not a
later truncation, we will just ignore what we ought to now know is an
error and then try to continue as if the database was fine, which it
would not be.

The overall problem is that auto extension fails to take action or
provide notification with regard to file system corruptions. Clearly we
would like xlog replay to work even in the face of strong file
corruptions, but we should make attempts to identify this situation and
notify people that this has occurred.

I'd suggest both WARNING messages in the log and something more extreme
still: anyone touching a corrupt table should receive a NOTICE saying
"database recovery displayed errors for this table" "HINT: check the
database logfiles for specific messages". Indexes should have a log
WARNING saying "database recovery displayed errors for this index"
"HINT: use REINDEX to rebuild this index".

So I guess I had better help if we agree this is beneficial.

> Therefore, there is no systemic bug --- unless you are running with
> full_page_writes=off. I assert that that GUC variable is broken and
> must be removed.


On this analysis, I would agree for current production systems. But what
this says is something deeper: we must log full pages, not because we
fear a partial page write has occurred, but because the xlog mechanism
intrinsically depends upon the existence of those full pages after each
checkpoint.

The writing of full pages in this way is a serious performance issue
that it would be good to improve upon. Perhaps this is the spur to
discuss a new xlog format that would support higher performance logging
as well as log-mining for replication?

> There are, however, a bunch of local bugs, including these:


....

> Notice that these are each, individually, pretty low-probability
> scenarios, which is why we've not seen many bug reports.


Most people don't file bug reports. If we have a recovery mode that
ignores file system corruptions we'll get even fewer, because any errors
that occur will be blamed on gamma rays or some other excuse.

> a systemic bug


Perhaps we do have one systemic problem: systems documentation.

The xlog code is distinct from other parts of the codebase in that it
has almost zero comments with it and the overall mechanisms are
relatively poorly documented in README form. Methinks there are very few
people who could attempt such a code review and even fewer who would
find any bugs by inspection. I'll think some more on that...

Best Regards, Simon Riggs


  #6 (permalink)  
Old 04-12-2008, 02:46 AM
Tom Lane
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

Simon Riggs <simon@2ndquadrant.com> writes:
> On Mon, 2006-03-27 at 22:03 -0500, Tom Lane wrote:
>> The subsequent replay of the deletion or truncation
>> will get rid of any unwanted data again.


> Trouble is, it is not a watertight assumption that there *will be* a
> subsequent truncation, even if it is a strong one.


Well, in fact we'll have correctly recreated the page, so I'm not
thinking that it's necessary or desirable to check this. What's the
point? "PANIC: we think your filesystem screwed up. We don't know
exactly how or why, and we successfully rebuilt all our data, but
we're gonna refuse to start up anyway." Doesn't seem like robust
behavior to me. If you check the archives you'll find that we've
backed off panic-for-panic's-sake behaviors in replay several times
before, after concluding they made the system less robust rather than
more so. This just seems like another one of the same.

> Perhaps we do have one systemic problem: systems documentation.


I agree on that ;-). The xlog code is really poorly documented.
I'm going to try to improve the comments for at least the xlogutils
routines while I'm fixing this.

regards, tom lane

  #7 (permalink)  
Old 04-12-2008, 02:46 AM
Jim C. Nasby
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

On Tue, Mar 28, 2006 at 10:07:35AM -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Mon, 2006-03-27 at 22:03 -0500, Tom Lane wrote:
> >> The subsequent replay of the deletion or truncation
> >> will get rid of any unwanted data again.
>
> > Trouble is, it is not a watertight assumption that there *will be* a
> > subsequent truncation, even if it is a strong one.
>
> Well, in fact we'll have correctly recreated the page, so I'm not
> thinking that it's necessary or desirable to check this. What's the
> point? "PANIC: we think your filesystem screwed up. We don't know
> exactly how or why, and we successfully rebuilt all our data, but
> we're gonna refuse to start up anyway." Doesn't seem like robust
> behavior to me. If you check the archives you'll find that we've
> backed off panic-for-panic's-sake behaviors in replay several times
> before, after concluding they made the system less robust rather than
> more so. This just seems like another one of the same.


Would the suggestion made in
http://archives.postgresql.org/pgsql...5/msg01374.php help
in this regard? (Sorry, much of this is over my head, but not everyone
may have read that...)
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

  #8 (permalink)  
Old 04-12-2008, 02:46 AM
Tom Lane
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Tue, Mar 28, 2006 at 10:07:35AM -0500, Tom Lane wrote:
>> Well, in fact we'll have correctly recreated the page, so I'm not
>> thinking that it's necessary or desirable to check this.


> Would the suggestion made in
> http://archives.postgresql.org/pgsql...5/msg01374.php help
> in this regard?


That's exactly what we are debating: whether it's still necessary/useful
to make such a check, given that we now realize the failures are just
isolated bugs and not a systemic problem with truncated files.

regards, tom lane

  #9 (permalink)  
Old 04-12-2008, 02:46 AM
Simon Riggs
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

On Tue, 2006-03-28 at 10:07 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Mon, 2006-03-27 at 22:03 -0500, Tom Lane wrote:
> >> The subsequent replay of the deletion or truncation
> >> will get rid of any unwanted data again.
>
> > Trouble is, it is not a watertight assumption that there *will be* a
> > subsequent truncation, even if it is a strong one.
>
> Well, in fact we'll have correctly recreated the page, so I'm not
> thinking that it's necessary or desirable to check this. What's the
> point?


We recreated *a* page but we are shying away from exploring *why* we
needed to in the first place. If there was no later truncation then
there absolutely should have been a page there already and the fact
there wasn't one needs to be reported.

I don't want to write that code either, I just think we should.

> "PANIC: we think your filesystem screwed up. We don't know
> exactly how or why, and we successfully rebuilt all our data, but
> we're gonna refuse to start up anyway." Doesn't seem like robust
> behavior to me.


Agreed, which is why I explicitly said we shouldn't do that.

grass_up_filesystem = on should be the only setting we support. You're
right that we can't know why it's wrong, but the sysadmin might.

> > Perhaps we do have one systemic problem: systems documentation.
>
> I agree on that ;-). The xlog code is really poorly documented.
> I'm going to try to improve the comments for at least the xlogutils
> routines while I'm fixing this.


I'll take a look also.

Best Regards, Simon Riggs


  #10 (permalink)  
Old 04-12-2008, 02:46 AM
Tom Lane
 
Posts: n/a
Default Re: [GENERAL] PANIC: heap_update_redo: no block

I wrote:
> * log_heap_update decides that it can set XLOG_HEAP_INIT_PAGE instead
> of storing the full destination page, if the destination contains only
> the single tuple being moved. This is fine, except it also resets the
> buffer indicator for the *source* page, which is wrong --- that page
> may still need to be re-generated from the xlog record. This is the
> proximate cause of the bug report that started this thread.


I have to retract that particular bit of analysis: I had misread the
log_heap_update code. It seems to be doing the right thing, and in any
case, given Alex's output

LOG: REDO @ D/19176644; LSN D/191766A4: prev D/19176610; xid 81148979: Heap - move: rel 1663/16386/16559898; tid 1/1; new 0/10

we can safely conclude that log_heap_update did not set the INIT_PAGE
bit, because the "new" tid doesn't have offset=1. (The fact that the
WAL_DEBUG printout doesn't report the bit's state is an oversight I plan
to fix, but anyway we can be pretty sure it's not set here.)
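
The inference about the INIT_PAGE bit can be checked mechanically against the log line. The Python sketch below assumes only the WAL_DEBUG text format visible in Alex's output; the helper names `parse_move` and `may_have_init_page` are hypothetical. It encodes the reasoning above: the bit is only possible when the moved tuple is the sole item on the destination page, which would put it at offset 1.

```python
import re

# Matches the "tid <blk>/<off>; new <blk>/<off>" part of a heap move record.
MOVE_RE = re.compile(r"tid (\d+)/(\d+); new (\d+)/(\d+)")


def parse_move(line):
    """Return ((old_blk, old_off), (new_blk, new_off)) from a REDO log line."""
    match = MOVE_RE.search(line)
    if match is None:
        raise ValueError("not a heap move/update record")
    old_blk, old_off, new_blk, new_off = map(int, match.groups())
    return (old_blk, old_off), (new_blk, new_off)


def may_have_init_page(line):
    """True only if the new tuple sits at offset 1 of its destination page."""
    _, (_, new_off) = parse_move(line)
    return new_off == 1


line = (
    "LOG: REDO @ D/19176644; LSN D/191766A4: prev D/19176610; "
    "xid 81148979: Heap - move: rel 1663/16386/16559898; tid 1/1; new 0/10"
)
old_tid, new_tid = parse_move(line)
init_page_possible = may_have_init_page(line)
```

For this record the new tid has offset 10, so the check comes out false, matching the conclusion that log_heap_update did not set the bit.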

What we should be seeing, and don't see, is an indication of a backup
block attached to this WAL record. Furthermore, I don't see any
indication of a backup block attached to *any* of the WAL records in
Alex's printout. The only conclusion I can draw is that he had
full_page_writes turned OFF, and as we have just realized that that
setting is completely unsafe, that is the explanation for his failure.

> Clearly, we need to go through the xlog code with a fine tooth comb
> and convince ourselves that all pages touched by any xlog record will
> be properly reconstituted if they've later been truncated off. I have
> not yet examined any of the code except the above.


I've finished going through the xlog code looking for related problems,
and AFAICS this is the score:

* full_page_writes = OFF doesn't work.

* btree_xlog_split and btree_xlog_delete_page should pass TRUE not FALSE
to XLogReadBuffer for all pages that they are going to re-initialize.

* the recently-added gist xlog code is badly broken --- it pays no
attention whatever to preventing torn pages :-(. It's not going to be
easy to fix, either, because the page split code assumes that a single
WAL record can describe changes to any number of pages, which is not
the case.

Everything else seems to be getting it right.

regards, tom lane
