Unix Technical Forum

Test suite fails on alpha architecture


#11  Martin Pitt, 04-10-2008, 12:11 PM
Re: Test suite fails on alpha architecture

Hi Tom,

Tom Lane [2007-11-07 13:49 -0500]:
> Bottom line is that I see nothing here that the Postgres project can
> fix --- these are library and compiler bugs.


Thank you for your detailed analysis! I'll file bugs to the
appropriate places then.

Thanks,

Martin

--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org

#12  Falk Hueffner, 04-10-2008, 12:11 PM
Re: Test suite fails on alpha architecture

Tom Lane <tgl@sss.pgh.pa.us> writes:

> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine. I think that this
> must be a compiler bug. The first example in his diffs is just
> "select 1/0", which executes this code:
>
>     int32 arg1 = PG_GETARG_INT32(0);
>     int32 arg2 = PG_GETARG_INT32(1);
>     int32 result;
>
>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));
>
>     result = arg1 / arg2;
>
> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken. Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp). Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.


Can you create a stand-alone testcase for this?

--
Falk

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

#13  Tom Lane, 04-10-2008, 12:11 PM
Re: Test suite fails on alpha architecture

Falk Hueffner <falk@debian.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> It looks to me like Debian's compiler must be allowing the division
>> instruction to be speculatively executed before the if-test branch
>> is taken.
>
> Can you create a stand-alone testcase for this?


I don't have access to a machine on which the failure occurs, but
perhaps Martin can try it. I'd think it'd be pretty easy, say

#include <stdio.h>
#include <stdlib.h>

void
ereport(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(0);
}

int
main(int argc, char **argv)
{
    int arg1 = atoi(argv[1]);
    int arg2 = atoi(argv[2]);
    int result;

    if (arg2 == 0)
        ereport("division by zero");

    result = arg1 / arg2;

    printf("%d\n", result);

    return 0;
}


cc -g -O2 -fPIC -fno-strict-aliasing -mieee -D_GNU_SOURCE bug.c
./a.out 1 0

I would not be surprised at all if it's compile-switch dependent; these
look to be the switches Martin tested with.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

#14  Steve Langasek, 04-10-2008, 12:11 PM
Re: Test suite fails on alpha architecture

On Wed, Nov 07, 2007 at 01:49:53PM -0500, Tom Lane wrote:
> Steve Langasek <vorlon@debian.org> writes:
> > It may be specific to particular versions of glibc and the kernel. At least
> > one of the test regressions is actually due to the bug described in
> > <http://lists.debian.org/debian-alpha/2007/10/msg00014.html>; I haven't dug
> > into the rest of the failures further at this point.
>
> > But if it can be reproduced on other distros as well, all the better.
>
> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine. I think that this
> must be a compiler bug. The first example in his diffs is just
> "select 1/0", which executes this code:
>
>     int32 arg1 = PG_GETARG_INT32(0);
>     int32 arg2 = PG_GETARG_INT32(1);
>     int32 result;
>
>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));
>
>     result = arg1 / arg2;
>
> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken. Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp). Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.
>
> FWIW the Gentoo machine is running
>
> $ gcc -v
> Using built-in specs.
> Target: alpha-unknown-linux-gnu
> Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4 --host=alpha-unknown-linux-gnu --build=alpha-unknown-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
> Thread model: posix
> gcc version 4.1.2 (Gentoo 4.1.2)


Ok, and Debian is building with gcc 4.2:

$ gcc -v
Using built-in specs.
Target: alpha-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2
--enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp
--with-long-double-128 --enable-checking=release --build=alpha-linux-gnu
--host=alpha-linux-gnu --target=alpha-linux-gnu
Thread model: posix
gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)
$

Any chance of testing with a newer version of gcc on Gentoo as well to help
confirm that the compiler is to blame?

> Bottom line is that I see nothing here that the Postgres project can
> fix --- these are library and compiler bugs.


Right; though whereas the floor() bug could simply be ignored since it will
be fixed in glibc (or the kernel) when the time comes, if the other
regressions are the result of a compiler problem then ignoring those
failures would indeed mean distributing broken binaries.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
vorlon@debian.org http://www.debian.org/

#15  Jose Luis Rivero, 04-10-2008, 12:11 PM
Re: Test suite fails on alpha architecture

On Wed, Nov 07, 2007 at 02:41:51PM -0500, Steve Langasek wrote:
> On Wed, Nov 07, 2007 at 01:49:53PM -0500, Tom Lane wrote:
> > All the other diffs that Martin showed are divide-by-zero failures,
> > and I do not see any of them on Gentoo's machine. I think that this
> > must be a compiler bug. The first example in his diffs is just
> > "select 1/0", which executes this code:
> >
> >     int32 arg1 = PG_GETARG_INT32(0);
> >     int32 arg2 = PG_GETARG_INT32(1);
> >     int32 result;
> >
> >     if (arg2 == 0)
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_DIVISION_BY_ZERO),
> >                  errmsg("division by zero")));
> >
> >     result = arg1 / arg2;
> >
> > It looks to me like Debian's compiler must be allowing the division
> > instruction to be speculatively executed before the if-test branch
> > is taken. Perhaps it is supposing that this is OK because control
> > will return from ereport(), when in fact it will not (the routine
> > throws a longjmp). Since we've not seen such behavior on any other
> > platform, however, I suspect this is just a bug and not intentional.
> >
> > FWIW the Gentoo machine is running
> >
> > $ gcc -v
> > Using built-in specs.
> > Target: alpha-unknown-linux-gnu
> > Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4 --host=alpha-unknown-linux-gnu --build=alpha-unknown-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
> > Thread model: posix
> > gcc version 4.1.2 (Gentoo 4.1.2)
>
> Ok, and Debian is building with gcc 4.2:
>
> $ gcc -v
> Using built-in specs.
> Target: alpha-linux-gnu
> Configured with: ../src/configure -v
> --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
> --enable-shared --with-system-zlib --libexecdir=/usr/lib
> --without-included-gettext --enable-threads=posix --enable-nls
> --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2
> --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp
> --with-long-double-128 --enable-checking=release --build=alpha-linux-gnu
> --host=alpha-linux-gnu --target=alpha-linux-gnu
> Thread model: posix
> gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)
> $
>
> Any chance of testing with a newer version of gcc on Gentoo as well to help
> confirm that the compiler is to blame?
>


In Gentoo the testcase gives the same "division by zero" under these
gcc versions:

Current Stable:
gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)

Current Testing:
gcc version 4.2.2 (Gentoo 4.2.2 p1.0)

Feel free to add me to any open bug for this, so that I can test whatever you
need or provide more information about our platform.

Thanks.

> > Bottom line is that I see nothing here that the Postgres project can
> > fix --- these are library and compiler bugs.
>
> Right; though whereas the floor() bug could simply be ignored since it will
> be fixed in glibc (or the kernel) when the time comes, if the other
> regressions are the result of a compiler problem then ignoring those
> failures would indeed mean distributing broken binaries.
>


--
Jose Luis Rivero <yoswink@gentoo.org>
Gentoo/Doc Gentoo/Alpha


#16  Marc 'HE' Brockschmidt, 04-10-2008, 12:11 PM
Re: Test suite fails on alpha architecture

#17  Steve Langasek, 04-10-2008, 12:12 PM
Re: Test suite fails on alpha architecture

On Wed, Nov 07, 2007 at 02:44:23PM -0500, Tom Lane wrote:

> I don't have access to a machine on which the failure occurs, but
> perhaps Martin can try it. I'd think it'd be pretty easy, say
>
> #include <stdio.h>
> #include <stdlib.h>
>
> void
> ereport(const char *msg)
> {
>     fprintf(stderr, "%s\n", msg);
>     exit(0);
> }
>
> int
> main(int argc, char **argv)
> {
>     int arg1 = atoi(argv[1]);
>     int arg2 = atoi(argv[2]);
>     int result;
>
>     if (arg2 == 0)
>         ereport("division by zero");
>
>     result = arg1 / arg2;
>
>     printf("%d\n", result);
>
>     return 0;
> }
>
> cc -g -O2 -fPIC -fno-strict-aliasing -mieee -D_GNU_SOURCE bug.c
> ./a.out 1 0
>
> I would not be surprised at all if it's compile-switch dependent; these
> look to be the switches Martin tested with.


So strangely, when I first ran this test case I recall being able to
reproduce the SIGFPE; but now going back to it I'm getting the correct
"division by zero" output.

But postgresql still fails to build with the same errors as before.

FWIW, the first test suite failure involving floor() has been resolved now
in the glibc package in unstable.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
vorlon@debian.org http://www.debian.org/

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#18  Martin Pitt, 04-10-2008, 12:12 PM
Re: Test suite fails on alpha architecture

Hi,

Tom Lane [2007-11-07 13:49 -0500]:
> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine. I think that this
> must be a compiler bug. The first example in his diffs is just
> "select 1/0", which executes this code:
>
>     int32 arg1 = PG_GETARG_INT32(0);
>     int32 arg2 = PG_GETARG_INT32(1);
>     int32 result;
>
>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));
>
>     result = arg1 / arg2;
>
> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken. Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp). Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.


I tried this on a Debian Alpha porter box (thanks, Steve, for pointing
me at it) with Debian's gcc 4.2.2. Latest sid indeed still has this
bug (the floor() one is confirmed fixed), not only on Alpha, but also
on sparc.

Since the simple test case did not reproduce the error, I tried to
make a more sophisticated one which resembles more closely what
PostgreSQL does (sigsetjmp/siglongjmp instead of exit(), some macros,
etc.). Unfortunately in vain, since the test case still works
perfectly with both no compiler options and also the ones used for
PostgreSQL. I attach it here nevertheless just in case someone has
more luck than me.

So I tried to approach it from the other side: Building postgresql
with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I
get the above bug.

So I guess I'll build with -O1 for the time being on sparc and alpha
to get correct binaries until this is sorted out. Any idea what else I
could try?

Thanks,

Martin

--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org

#19  Martin Pitt, 04-10-2008, 12:12 PM
Re: Test suite fails on alpha architecture

Martin Pitt [2007-12-04 23:43 +0100]:
> So I tried to approach it from the other side: Building postgresql
> with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I
> get the above bug.


Just for the avoidance of doubt: building with gcc 4.1 and -O2 works fine.
I guess this sufficiently proves that this is a gcc 4.2 bug.

Martin
--
Martin Pitt http://www.piware.de
Ubuntu Developer http://www.ubuntu.com
Debian Developer http://www.debian.org
