This is a discussion on Test suite fails on alpha architecture within the pgsql Bugs forums, part of the PostgreSQL category.
| Hi Tom,

Tom Lane [2007-11-07 13:49 -0500]:
> Bottom line is that I see nothing here that the Postgres project can
> fix --- these are library and compiler bugs.

Thank you for your detailed analysis! I'll file bugs to the appropriate
places then.

Thanks,

Martin

--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org |
| |||
| Tom Lane <tgl@sss.pgh.pa.us> writes:

> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine. I think that this
> must be a compiler bug. The first example in his diffs is just
> "select 1/0", which executes this code:
>
>     int32 arg1 = PG_GETARG_INT32(0);
>     int32 arg2 = PG_GETARG_INT32(1);
>     int32 result;
>
>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));
>
>     result = arg1 / arg2;
>
> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken. Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp). Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.

Can you create a stand-alone testcase for this?

--
Falk

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend |
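To make the hypothesis quoted above concrete, here is a minimal C sketch of the two orderings written out by hand. It makes no claim about what gcc actually emitted on Alpha. The names `report_division_by_zero`, `div_checked_first` and `div_hoisted` are hypothetical and appear nowhere in PostgreSQL, and `exit()` is only a stand-in for the longjmp that ereport() performs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for ereport(ERROR, ...); it never returns. */
static void
report_division_by_zero(void)
{
    fprintf(stderr, "division by zero\n");
    exit(1);
}

/* Order as written in the quoted source: test first, divide afterwards. */
int
div_checked_first(int arg1, int arg2)
{
    if (arg2 == 0)
        report_division_by_zero();
    return arg1 / arg2;
}

/*
 * Order the hypothesis attributes to the miscompiled binary, written out by
 * hand: the quotient is computed before the test. With arg2 == 0 the
 * division itself is undefined behaviour in C and, on platforms where
 * integer division by zero traps, faults before the error report is reached.
 */
int
div_hoisted(int arg1, int arg2)
{
    int result = arg1 / arg2;

    if (arg2 == 0)
        report_division_by_zero();
    return result;
}
```

For any non-zero divisor the two functions return the same quotient, which is consistent with the reordering only becoming visible in divide-by-zero tests such as "select 1/0".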
| |||
| Falk Hueffner <falk@debian.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> It looks to me like Debian's compiler must be allowing the division
>> instruction to be speculatively executed before the if-test branch
>> is taken.

> Can you create a stand-alone testcase for this?

I don't have access to a machine on which the failure occurs, but
perhaps Martin can try it. I'd think it'd be pretty easy, say

#include <stdio.h>
#include <stdlib.h>

void
ereport(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(0);
}

int
main(int argc, char **argv)
{
    int arg1 = atoi(argv[1]);
    int arg2 = atoi(argv[2]);
    int result;

    if (arg2 == 0)
        ereport("division by zero");

    result = arg1 / arg2;

    printf("%d\n", result);

    return 0;
}

cc -g -O2 -fPIC -fno-strict-aliasing -mieee -D_GNU_SOURCE bug.c
./a.out 1 0

I would not be surprised at all if it's compile-switch dependent; these
look to be the switches Martin tested with.

regards, tom lane |
| |||
| On Wed, Nov 07, 2007 at 01:49:53PM -0500, Tom Lane wrote:
> Steve Langasek <vorlon@debian.org> writes:
> > It may be specific to particular versions of glibc and the kernel. At least
> > one of the test regressions is actually due to the bug described in
> > <http://lists.debian.org/debian-alpha/2007/10/msg00014.html>; I haven't dug
> > into the rest of the failures further at this point.

> > But if it can be reproduced on other distros as well, all the better.

> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine. I think that this
> must be a compiler bug. The first example in his diffs is just
> "select 1/0", which executes this code:

>     int32 arg1 = PG_GETARG_INT32(0);
>     int32 arg2 = PG_GETARG_INT32(1);
>     int32 result;

>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));

>     result = arg1 / arg2;

> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken. Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp). Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.

> FWIW the Gentoo machine is running

> $ gcc -v
> Using built-in specs.
> Target: alpha-unknown-linux-gnu
> Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4 --host=alpha-unknown-linux-gnu --build=alpha-unknown-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
> Thread model: posix
> gcc version 4.1.2 (Gentoo 4.1.2)

Ok, and Debian is building with gcc 4.2:

$ gcc -v
Using built-in specs.
Target: alpha-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp --with-long-double-128 --enable-checking=release --build=alpha-linux-gnu --host=alpha-linux-gnu --target=alpha-linux-gnu
Thread model: posix
gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)
$

Any chance of testing with a newer version of gcc on Gentoo as well to help
confirm that the compiler is to blame?

> Bottom line is that I see nothing here that the Postgres project can
> fix --- these are library and compiler bugs.

Right; though whereas the floor() bug could simply be ignored since it will
be fixed in glibc (or the kernel) when the time comes, if the other
regressions are the result of a compiler problem then ignoring those
failures would indeed mean distributing broken binaries.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
vorlon@debian.org                                   http://www.debian.org/ |
| |||
| On Wed, Nov 07, 2007 at 02:41:51PM -0500, Steve Langasek wrote:
> On Wed, Nov 07, 2007 at 01:49:53PM -0500, Tom Lane wrote:
> > All the other diffs that Martin showed are divide-by-zero failures,
> > and I do not see any of them on Gentoo's machine. I think that this
> > must be a compiler bug. The first example in his diffs is just
> > "select 1/0", which executes this code:
>
> >     int32 arg1 = PG_GETARG_INT32(0);
> >     int32 arg2 = PG_GETARG_INT32(1);
> >     int32 result;
>
> >     if (arg2 == 0)
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_DIVISION_BY_ZERO),
> >                  errmsg("division by zero")));
>
> >     result = arg1 / arg2;
>
> > It looks to me like Debian's compiler must be allowing the division
> > instruction to be speculatively executed before the if-test branch
> > is taken. Perhaps it is supposing that this is OK because control
> > will return from ereport(), when in fact it will not (the routine
> > throws a longjmp). Since we've not seen such behavior on any other
> > platform, however, I suspect this is just a bug and not intentional.
>
> > FWIW the Gentoo machine is running
>
> > $ gcc -v
> > Using built-in specs.
> > Target: alpha-unknown-linux-gnu
> > Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/alpha-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/alpha-unknown-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/alpha-unknown-linux-gnu/4.1.2/include/g++-v4 --host=alpha-unknown-linux-gnu --build=alpha-unknown-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
> > Thread model: posix
> > gcc version 4.1.2 (Gentoo 4.1.2)
>
> Ok, and Debian is building with gcc 4.2:
>
> $ gcc -v
> Using built-in specs.
> Target: alpha-linux-gnu
> Configured with: ../src/configure -v
> --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
> --enable-shared --with-system-zlib --libexecdir=/usr/lib
> --without-included-gettext --enable-threads=posix --enable-nls
> --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2
> --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp
> --with-long-double-128 --enable-checking=release --build=alpha-linux-gnu
> --host=alpha-linux-gnu --target=alpha-linux-gnu
> Thread model: posix
> gcc version 4.2.3 20071014 (prerelease) (Debian 4.2.2-3)
> $
>
> Any chance of testing with a newer version of gcc on Gentoo as well to help
> confirm that the compiler is to blame?
>

In Gentoo the testcase gives the same "division by zero" under these gcc
versions:

Current Stable: gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)
Current Testing: gcc version 4.2.2 (Gentoo 4.2.2 p1.0)

Feel free to add me if you have an open bug for this, in order to test
anything you need or provide some more information about our platform.

Thanks.

> > Bottom line is that I see nothing here that the Postgres project can
> > fix --- these are library and compiler bugs.
>
> Right; though whereas the floor() bug could simply be ignored since it will
> be fixed in glibc (or the kernel) when the time comes, if the other
> regressions are the result of a compiler problem then ignoring those
> failures would indeed mean distributing broken binaries.
>

--
Jose Luis Rivero <yoswink@gentoo.org>
Gentoo/Doc Gentoo/Alpha |
| |||
| On Wed, Nov 07, 2007 at 02:44:23PM -0500, Tom Lane wrote:
> I don't have access to a machine on which the failure occurs, but
> perhaps Martin can try it. I'd think it'd be pretty easy, say

> #include <stdio.h>
> #include <stdlib.h>

> void
> ereport(const char *msg)
> {
>     fprintf(stderr, "%s\n", msg);
>     exit(0);
> }
>
> int
> main(int argc, char **argv)
> {
>     int arg1 = atoi(argv[1]);
>     int arg2 = atoi(argv[2]);
>     int result;
>
>     if (arg2 == 0)
>         ereport("division by zero");
>
>     result = arg1 / arg2;
>
>     printf("%d\n", result);
>
>     return 0;
> }

> cc -g -O2 -fPIC -fno-strict-aliasing -mieee -D_GNU_SOURCE bug.c
> ./a.out 1 0

> I would not be surprised at all if it's compile-switch dependent; these
> look to be the switches Martin tested with.

So strangely, when I first ran this test case I recall being able to
reproduce the SIGFPE; but now going back to it I'm getting the correct
"division by zero" output. But postgresql still fails to build with the
same errors as before.

FWIW, the first test suite failure involving floor() has been resolved now
in the glibc package in unstable.

--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
vorlon@debian.org                                   http://www.debian.org/ |
| |||
| Hi,

Tom Lane [2007-11-07 13:49 -0500]:
> All the other diffs that Martin showed are divide-by-zero failures,
> and I do not see any of them on Gentoo's machine. I think that this
> must be a compiler bug. The first example in his diffs is just
> "select 1/0", which executes this code:
>
>     int32 arg1 = PG_GETARG_INT32(0);
>     int32 arg2 = PG_GETARG_INT32(1);
>     int32 result;
>
>     if (arg2 == 0)
>         ereport(ERROR,
>                 (errcode(ERRCODE_DIVISION_BY_ZERO),
>                  errmsg("division by zero")));
>
>     result = arg1 / arg2;
>
> It looks to me like Debian's compiler must be allowing the division
> instruction to be speculatively executed before the if-test branch
> is taken. Perhaps it is supposing that this is OK because control
> will return from ereport(), when in fact it will not (the routine
> throws a longjmp). Since we've not seen such behavior on any other
> platform, however, I suspect this is just a bug and not intentional.

I tried this on a Debian Alpha porter box (thanks, Steve, for pointing
me at it) with Debian's gcc 4.2.2. Latest sid indeed still has this
bug (the floor() one is confirmed fixed), not only on Alpha, but also
on sparc.

Since the simple test case did not reproduce the error, I tried to
make a more sophisticated one which resembles more closely what
PostgreSQL does (sigsetjmp/siglongjmp instead of exit(), some macros,
etc.). Unfortunately in vain, since the test case still works
perfectly with both no compiler options and also the ones used for
PostgreSQL. I attach it here nevertheless just in case someone has
more luck than me.

So I tried to approach it from the other side: Building postgresql
with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I
get above bug.

So I guess I'll build with -O1 for the time being on sparc and alpha
to get correct binaries until this is sorted out.

Any idea what else I could try?

Thanks,

Martin

--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org |
| ||||
| Martin Pitt [2007-12-04 23:43 +0100]:
> So I tried to approach it from the other side: Building postgresql
> with CFLAGS="-O0 -g" or "-O1 -g" works correctly, but with "-O2 -g" I
> get above bug.

Just FAOD, building with gcc 4.1 and -O2 works fine. I guess this
sufficiently proves that this is a gcc 4.2 bug.

Martin

--
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org |
|
|