Unix Technical Forum

BUG #3362: xlog corruption just after initdb on irix

#1
04-10-2008, 11:52 AM
Herve Boulouis
BUG #3362: xlog corruption just after initdb on irix


The following bug has been logged online:

Bug reference: 3362
Logged by: Herve Boulouis
Email address: amon@sockar.homeip.net
PostgreSQL version: 8.2.4
Operating system: IRIX 6.5.26
Description: xlog corruption just after initdb on irix
Details:

what I did:

- compile postgres 8.2.4 (MIPSPro 7.4.4m compiler)

export CC=c99
export CXX=CC
../configure --enable-integer-datetimes --with-perl --with-python

- compile and install, create pgsql user, ...

- initialize db

su - pgsql
export PGLIB=/usr/local/pgsql/lib
export PGDATA=$HOME/data
initdb

- trigger the bug

/etc/init.d/pgsql start
/etc/init.d/pgsql stop
/etc/init.d/pgsql start
/etc/init.d/pgsql stop
/etc/init.d/pgsql start
<signal 6 on postmaster startup, see logfile>

from that point, postmaster won't start unless another initdb is done.

- command used by the script to start/stop

su - pgsql -c "exec /usr/local/pgsql/bin/pg_ctl start -w -D '/usr/local/pgsql/data'"
su - pgsql -c "/usr/local/pgsql/bin/pg_ctl stop -D '/usr/local/pgsql/data' -s -m smart"
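
The start/stop sequence above can be scripted so the failing cycle is easy to reproduce. This is only a rough sketch: the function name cycle_until_failure and the PG_CTL / PGDATA defaults are assumptions based on the paths in this report, and it relies on "pg_ctl start -w" returning a non-zero status when the server does not come up. It is meant to be run as the pgsql user.

```shell
#!/bin/sh
# Hypothetical repro helper, not part of PostgreSQL.
# Defaults mirror the paths used in the report; override via environment.
PG_CTL=${PG_CTL:-/usr/local/pgsql/bin/pg_ctl}
PGDATA=${PGDATA:-/usr/local/pgsql/data}

# Start and stop the server up to N times (default 5).
# Stops at the first start that fails and reports which cycle it was.
cycle_until_failure() {
    max=${1:-5}
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$PG_CTL" start -w -D "$PGDATA"; then
            echo "start failed on cycle $i"
            return 1
        fi
        "$PG_CTL" stop -D "$PGDATA" -s -m smart
        i=$((i + 1))
    done
    echo "no failure in $max cycles"
}

# Call it explicitly once the cluster has been initialized:
#   cycle_until_failure 5
```

With the behaviour described in this report, the third start would be the one expected to fail.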

- logfile of the 5 start/stop operations :

* first start
Jun 5 15:22:59 6Trigin300 postgres[2007]: [1-1] LOG: could not create
IPv6 socket: Address family not supported by protocol family
Jun 5 15:22:59 6Trigin300 postgres[2007]: [2-1] LOG: could not create
socket for statistics collector: Address family not supported by protocol
family
Jun 5 15:22:59 6Trigin300 postgres[2007]: [3-1] LOG: trying another
address for the statistics collector
Jun 5 15:22:59 6Trigin300 postgres[2008]: [4-1] LOG: database system was
shut down at 2007-06-05 15:21:59 GMT
Jun 5 15:22:59 6Trigin300 postgres[2008]: [5-1] LOG: checkpoint record
is at 0/4866C8
Jun 5 15:22:59 6Trigin300 postgres[2008]: [6-1] LOG: redo record is at
0/4866C8; undo record is at 0/0; shutdown TRUE
Jun 5 15:22:59 6Trigin300 postgres[2008]: [7-1] LOG: next transaction
ID: 0/593; next OID: 10820
Jun 5 15:22:59 6Trigin300 postgres[2008]: [8-1] LOG: next MultiXactId:
1; next MultiXactOffset: 0
Jun 5 15:22:59 6Trigin300 postgres[2008]: [9-1] LOG: database system is
ready
* first stop
Jun 5 15:23:11 6Trigin300 postgres[2007]: [4-1] LOG: received smart
shutdown request
Jun 5 15:23:11 6Trigin300 postgres[2009]: [4-1] LOG: shutting down
Jun 5 15:23:11 6Trigin300 postgres[2009]: [5-1] LOG: database system is
shut down
* 2nd start
Jun 5 15:23:17 6Trigin300 postgres[2027]: [1-1] LOG: could not create
IPv6 socket: Address family not supported by protocol family
Jun 5 15:23:17 6Trigin300 postgres[2027]: [2-1] LOG: could not create
socket for statistics collector: Address family not supported by protocol
family
Jun 5 15:23:17 6Trigin300 postgres[2027]: [3-1] LOG: trying another
address for the statistics collector
Jun 5 15:23:17 6Trigin300 postgres[2028]: [4-1] LOG: database system was
shut down at 2007-06-05 15:23:11 GMT
Jun 5 15:23:17 6Trigin300 postgres[2028]: [5-1] LOG: record with zero
length at 0/486718
* vvv I believe this line is not normal, as there was a clean shutdown
Jun 5 15:23:17 6Trigin300 postgres[2028]: [6-1] LOG: invalid primary
checkpoint record
Jun 5 15:23:17 6Trigin300 postgres[2028]: [7-1] LOG: using previous
checkpoint record at 0/4866C8
Jun 5 15:23:17 6Trigin300 postgres[2028]: [8-1] LOG: redo record is at
0/4866C8; undo record is at 0/0; shutdown TRUE
Jun 5 15:23:17 6Trigin300 postgres[2028]: [9-1] LOG: next transaction
ID: 0/593; next OID: 10820
Jun 5 15:23:17 6Trigin300 postgres[2028]: [10-1] LOG: next MultiXactId:
1; next MultiXactOffset: 0
Jun 5 15:23:17 6Trigin300 postgres[2028]: [11-1] LOG: database system
was not properly shut down; automatic recovery in progress
Jun 5 15:23:17 6Trigin300 postgres[2028]: [12-1] LOG: record with zero
length at 0/486718
Jun 5 15:23:17 6Trigin300 postgres[2028]: [13-1] LOG: redo is not
required
Jun 5 15:23:17 6Trigin300 postgres[2028]: [14-1] LOG: database system is
ready
* 2nd stop
Jun 5 15:23:22 6Trigin300 postgres[2027]: [4-1] LOG: received smart
shutdown request
Jun 5 15:23:22 6Trigin300 postgres[2029]: [4-1] LOG: shutting down
Jun 5 15:23:22 6Trigin300 postgres[2029]: [5-1] LOG: database system is
shut down
* 3rd start
Jun 5 15:23:25 6Trigin300 postgres[2047]: [1-1] LOG: could not create
IPv6 socket: Address family not supported by protocol family
Jun 5 15:23:25 6Trigin300 postgres[2047]: [2-1] LOG: could not create
socket for statistics collector: Address family not supported by protocol
family
Jun 5 15:23:25 6Trigin300 postgres[2047]: [3-1] LOG: trying another
address for the statistics collector
Jun 5 15:23:25 6Trigin300 postgres[2048]: [4-1] LOG: database system was
shut down at 2007-06-05 15:23:22 GMT
Jun 5 15:23:25 6Trigin300 postgres[2048]: [5-1] LOG: record with zero
length at 0/486768
Jun 5 15:23:25 6Trigin300 postgres[2048]: [6-1] LOG: invalid primary
checkpoint record
Jun 5 15:23:25 6Trigin300 postgres[2048]: [7-1] LOG: record with zero
length at 0/486718
Jun 5 15:23:25 6Trigin300 postgres[2048]: [8-1] LOG: invalid secondary
checkpoint record
* vvv then it gets worse
Jun 5 15:23:25 2Trigin300 postgres[2048]: [9-1] PANIC: could not locate
a valid checkpoint record
Jun 5 15:23:26 6Trigin300 postgres[2047]: [4-1] LOG: startup process
(PID 2048) was terminated by signal 6
Jun 5 15:23:26 6Trigin300 postgres[2047]: [5-1] LOG: aborting startup
due to startup process failure
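
The WAL positions in the log can be compared with plain shell arithmetic. A minimal sketch, assuming a shell that accepts hexadecimal constants inside $(( )); the helper name lsn_delta is made up for illustration and is not a PostgreSQL tool:

```shell
#!/bin/sh
# lsn_delta: hypothetical helper.
# Takes two WAL positions in the "xlogid/offset" form printed in the log
# (for example 0/4866C8) and prints the byte distance between the offsets.
# Assumes both positions share the same xlogid, as they do in this log.
lsn_delta() {
    from=${1#*/}    # drop everything up to the slash, keep the hex offset
    to=${2#*/}
    echo $(( 0x$to - 0x$from ))
}

lsn_delta 0/4866C8 0/486718   # last valid checkpoint -> position read at 2nd start
lsn_delta 0/486718 0/486768   # position read at 2nd start -> position read at 3rd start
```

Both calls print 80, so each clean shutdown moved the expected checkpoint position forward by 0x50 bytes, while the record read back at that position had zero length.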

I had already noticed that problem on this platform with 8.0 or 8.1, but
didn't report it at the time.

I kept the core file and here is the backtrace :

origin300:/usr/local/pgsql/data# dbx /usr/local/pgsql/bin/postgres core
dbx version 7.3.4 (86441_Nov11 MR) Nov 11 2002 11:31:55
Core from signal SIGABRT: Abort (see abort(3c))
(dbx) where
> 0 _kill(0x800, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/xlv41/6.5.26m/work/irix/lib/libc/libc_n32_M4/signal/kill.s":15,
0xfa54258]
1 _raise(0x800, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/xlv41/6.5.26m/work/irix/lib/libc/libc_n32_M4/signal/raise.c":27,
0xfad16dc]
2 abort(0x800, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/xlv41/6.5.26m/work/irix/lib/libc/libc_n32_M4/gen/abort.c":52, 0xfa6ef60]
3 errfinish(0x0, 0x0, 0x3, 0x5, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/utils/error/elog.c":451,
0x102d9678]
4 StartupXLOG(0x800, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/access/transam/xlog.c":4776,
0x10088638]
5 BootstrapMain(0x103b7524, 0x2, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/bootstrap/bootstrap.c":423,
0x100a3cd4]
6 StartChildProcess(0x2, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/postmaster/postmaster.c":3691,
0x101e6de8]
7 PostmasterMain(0x800, 0xfa49bf0, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1,
0x0)
["/opt/install/postgresql-8.2.4/src/backend/postmaster/postmaster.c":961,
0x101e267c]
8 main(0x3, 0x6, 0x86bf8, 0x0, 0x40000, 0x0, 0x68a1, 0x0)
["/opt/install/postgresql-8.2.4/src/backend/main/main.c":188, 0x1017e684]
9 __start()
["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177,
0x1003f2a8]

If necessary, I can provide access to an irix machine if some pgsql
developers need it to test/maintain the irix port.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#2
04-10-2008, 11:52 AM
Tom Lane
Re: BUG #3362: xlog corruption just after initdb on irix

"Herve Boulouis" <amon@sockar.homeip.net> writes:
> - trigger the bug


> /etc/init.d/pgsql start
> /etc/init.d/pgsql stop
> /etc/init.d/pgsql start
> /etc/init.d/pgsql stop
> /etc/init.d/pgsql start
> <signal 6 on postmaster startup, see logfile>


> from that point, postmaster won't start unless another initdb is done.


That's pretty dang weird. It looks like it's failing to write the
shutdown checkpoint record (but is updating pg_control anyway). But if
that's the explanation then it should have failed during initdb, because
each step of initdb goes through the start/stop process.
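
One way to check that hypothesis is to look at what pg_control records after each stop and compare it with the position the next start complains about. A minimal sketch, assuming pg_controldata is installed next to pg_ctl and labels the fields "Latest checkpoint location" and "Prior checkpoint location"; the helper name checkpoint_locations is made up here:

```shell
#!/bin/sh
# checkpoint_locations: hypothetical filter.
# Prints only the checkpoint-location lines from pg_controldata output.
# It reads stdin, so it works on saved output as well as on a live run.
checkpoint_locations() {
    grep -E '^(Latest|Prior) checkpoint location:'
}

# Typical use after each stop (paths as in the report):
#   /usr/local/pgsql/bin/pg_controldata /usr/local/pgsql/data | checkpoint_locations
```

If the latest checkpoint location after a clean stop equals the position in the following "record with zero length at ..." message, that would suggest pg_control was advanced to a record that cannot be read back.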

> I had already noticed that problem on this platform with 8.0 or 8.1, but
> didn't report it at the time.


You should have mentioned it earlier :-( ... whatever the problem is
is surely now long-forgotten. Although I see we have a report of 8.1
working with IRIX 6.5, so apparently it works for some people. Are you
sure your compiler is up-to-date?

> If necessary, I can provide access to an irix machine if some pgsql
> developers need it to test/maintain the irix port.


I'm willing to take a look, but first please see if you can eliminate
the compiler-bug theory. Check to see if there's a newer compiler
version available; check whether building with CFLAGS=-O0 makes the
problem go away.
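
For the unoptimized build, something along these lines should do. It is only a sketch that reuses the configure flags from the original report; the function name rebuild_unoptimized is made up, and it assumes a fresh build directory beside the source tree and GNU make installed as "gmake":

```shell
#!/bin/sh
# rebuild_unoptimized: hypothetical wrapper around the build steps from the
# report, with optimization turned off via CFLAGS=-O0.
# Assumptions: run from a fresh build directory beside the source tree,
# and GNU make is available under the name "gmake".
rebuild_unoptimized() {
    CC=c99 CXX=CC CFLAGS=-O0 \
        ../configure --enable-integer-datetimes --with-perl --with-python &&
    gmake &&
    gmake install
}

# Call it explicitly once the build directory is ready:
#   rebuild_unoptimized
```

After installing, a new initdb followed by the same start/stop cycles would show whether the problem follows the optimizer.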

regards, tom lane

#3
04-10-2008, 11:52 AM
Simon Riggs
Re: BUG #3362: xlog corruption just after initdb on irix

On Tue, 2007-06-05 at 14:42 +0000, Herve Boulouis wrote:

> PostgreSQL version: 8.2.4
> Operating system: IRIX 6.5.26


That's not yet a supported system, but we should fix this.

> Jun 5 15:23:11 6Trigin300 postgres[2009]: [5-1] LOG: database system is
> shut down
> * 2nd start
> Jun 5 15:23:17 6Trigin300 postgres[2027]: [1-1] LOG: could not create
> IPv6 socket: Address family not supported by protocol family
> Jun 5 15:23:17 6Trigin300 postgres[2027]: [2-1] LOG: could not create
> socket for statistics collector: Address family not supported by protocol
> family
> Jun 5 15:23:17 6Trigin300 postgres[2027]: [3-1] LOG: trying another
> address for the statistics collector
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [4-1] LOG: database system was
> shut down at 2007-06-05 15:23:11 GMT
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [5-1] LOG: record with zero
> length at 0/486718
> * vvv I believe this line is not normal, as there was a clean shutdown
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [6-1] LOG: invalid primary
> checkpoint record
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [7-1] LOG: using previous
> checkpoint record at 0/4866C8
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [8-1] LOG: redo record is at
> 0/4866C8; undo record is at 0/0; shutdown TRUE
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [9-1] LOG: next transaction
> ID: 0/593; next OID: 10820
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [10-1] LOG: next MultiXactId:
> 1; next MultiXactOffset: 0
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [11-1] LOG: database system
> was not properly shut down; automatic recovery in progress
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [12-1] LOG: record with zero
> length at 0/486718
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [13-1] LOG: redo is not
> required
> Jun 5 15:23:17 6Trigin300 postgres[2028]: [14-1] LOG: database system is
> ready


Well, I'd question what's going on with the hardware/filesystem or the
settings you've chosen for fsync and wal_fsync.

If you shut it down cleanly, yet it's flaky when it comes back up, that
doesn't sound too much like a database problem to me...
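
A quick way to see which of those settings are actually active is to list the uncommented lines in postgresql.conf. This is a minimal sketch: the helper name active_sync_settings is made up, and reading "wal_fsync" as the wal_sync_method parameter is an assumption.

```shell
#!/bin/sh
# active_sync_settings: hypothetical helper.
# Lists uncommented fsync / wal_sync_method lines from a postgresql.conf.
# Anything it does not print is left at the server default.
active_sync_settings() {
    grep -E '^[[:space:]]*(fsync|wal_sync_method)[[:space:]=]' "$1"
}

# Example (data directory as in the report):
#   active_sync_settings /usr/local/pgsql/data/postgresql.conf
```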

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



#4
04-10-2008, 11:52 AM
Herve Boulouis
Re: BUG #3362: xlog corruption just after initdb on irix

On 05/06/2007 17:33, Simon Riggs wrote:
> On Tue, 2007-06-05 at 14:42 +0000, Herve Boulouis wrote:
>
> Well, I'd question what's going on with the hardware/filesystem or the
> settings you've chosen for fsync and wal_fsync.
>
> If you shut it down cleanly, yet it's flaky when it comes back up that
> doesn't sound too much like a database problem to me...


I should have specified that I have "fsync = on" in my postgresql.conf.
Pretty much everything else is default value.

Regards,

--
Herve Boulouis

#5
04-10-2008, 11:52 AM
Herve Boulouis
Re: BUG #3362: xlog corruption just after initdb on irix

On 05/06/2007 12:06, Tom Lane wrote:
>
> > I had already noticed that problem on this platform with 8.0 or 8.1, but
> > didn't report it at the time.

>
> You should have mentioned it earlier :-( ... whatever the problem is
> is surely now long-forgotten. Although I see we have a report of 8.1
> working with IRIX 6.5, so apparently it works for some people. Are you
> sure your compiler is up-to-date?


Yep, MIPSPro 7.4.4m is the latest available and probably the last one.

> > If necessary, I can provide access to an irix machine if some pgsql
> > developers need it to test/maintain the irix port.

>
> I'm willing to take a look, but first please see if you can eliminate
> the compiler-bug theory. Check to see if there's a newer compiler
> version available; check whether building with CFLAGS=-O0 makes the
> problem go away.


I'll try without optimizations.

Regards,

--
Herve Boulouis

#6
04-10-2008, 11:52 AM
Heikki Linnakangas
Re: BUG #3362: xlog corruption just after initdb on irix

Herve Boulouis wrote:
> On 05/06/2007 17:33, Simon Riggs wrote:
>> On Tue, 2007-06-05 at 14:42 +0000, Herve Boulouis wrote:
>>
>> Well, I'd question what's going on with the hardware/filesystem or the
>> settings you've chosen for fsync and wal_fsync.
>>
>> If you shut it down cleanly, yet it's flaky when it comes back up that
>> doesn't sound too much like a database problem to me...

>
> I should have specified that I have "fsync = on" in my postgresql.conf.
> Pretty much everything else is default value.


Do you mean "fsync=off"? "on" is the default.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7
04-10-2008, 11:52 AM
Herve Boulouis
Re: BUG #3362: xlog corruption just after initdb on irix

On 06/06/2007 08:24, Heikki Linnakangas wrote:
> Herve Boulouis wrote:
> >On 05/06/2007 17:33, Simon Riggs wrote:
> >>On Tue, 2007-06-05 at 14:42 +0000, Herve Boulouis wrote:
> >>
> >>Well, I'd question what's going on with the hardware/filesystem or the
> >>settings you've chosen for fsync and wal_fsync.
> >>
> >>If you shut it down cleanly, yet it's flaky when it comes back up that
> >>doesn't sound too much like a database problem to me...

> >
> >I should have specified that I have "fsync = on" in my postgresql.conf.
> >Pretty much everything else is default value.

>
> Do you mean "fsync=off"? "on" is the default.


No, I mean "fsync=on", everything else is commented out so I assume default
values.

--
Herve Boulouis

#8
04-10-2008, 11:52 AM
Herve Boulouis
Re: BUG #3362: xlog corruption just after initdb on irix

On 05/06/2007 12:06, Tom Lane wrote:
>
> I'm willing to take a look, but first please see if you can eliminate
> the compiler-bug theory. Check to see if there's a newer compiler
> version available; check whether building with CFLAGS=-O0 makes the
> problem go away.


Quick question: do you think that the --enable-integer-datetimes configure
flag could have an impact on this problem?

--
Herve Boulouis

#9
04-10-2008, 11:52 AM
Tom Lane
Re: BUG #3362: xlog corruption just after initdb on irix

Herve Boulouis <amon@sockar.homeip.net> writes:
> Quick question: do you think that the --enable-integer-datetimes configure
> flag could have an impact on this problem?


I doubt it.

regards, tom lane

#10
04-10-2008, 11:52 AM
Tom Lane
Re: BUG #3362: xlog corruption just after initdb on irix

Herve Boulouis <amon@sockar.homeip.net> writes:
> The mipspro compiler is a proprietary product and I cannot get the
> necessary updates so for now I am unable to verify this theory. Anyway,
> maybe you should add a note in the pg irix faq to use the 7.4.4m version
> of mipspro when building the sources?


Maybe, but I'd like confirmation of the theory first. Please let me
know how it goes when you get hold of 7.4.4m ...

regards, tom lane

