This is a discussion on "How to investigate crash inside LIBC?" within the AIX Operating System forums, part of the Unix Operating Systems category.
Hi All.

AIX 5.2 (uname -a = "AIX {OurServer} 2 5 000AC99D4C00").

I have a problem - a crash (illegal instruction) occurred somewhere inside LIBC.A. Of course, it most probably happened because of a bug in my program (built as a shared lib), but how do I find where it happened? Could you please help me with this?

The test program does a very simple thing: it loads my shared library and exits (see the source code below). The test program and the shared lib are compiled with GCC 3.3.3. Since the test program is really simple, I assume that the source of the problem is my shared lib, but how do I find the crash point?...

GDB shows the following:

745 return 0;
(gdb)
746 }
(gdb) stepi
0x10002cf4 746 }
(gdb)
0x10002cf8 in main (argc=2, argv=0x2ff22934) at tcpkick.cpp:746
746 }
(gdb)
0x10002cfc 746 }
(gdb)
0x10002d00 746 }
(gdb)
0x10002d04 746 }
(gdb)
0x10002d08 746 }
(gdb)
0x100001dc in __start ()
(gdb)
0x100001e0 in __start ()
(gdb)
0x100001e4 in __start ()
(gdb)
0x100001e8 in __start ()
(gdb)
0x100001f0 in __start ()
(gdb)
0x1000f110 in exit ()
(gdb)
0x1000f114 in exit ()
(gdb)
0x1000f118 in exit ()
(gdb)
0x1000f11c in exit ()
(gdb)
0x1000f120 in exit ()
(gdb)
0x1000f124 in exit ()
(gdb)
0xd01e4124 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4128 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e412c in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4130 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4134 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4138 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e413c in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4140 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e417c in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4180 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4184 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4188 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e418c in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4190 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4194 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e4198 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e419c in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e41a0 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e41a4 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01e41a8 in exit () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01cef88 in _ptrgl () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01cef8c in _ptrgl () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01cef90 in _ptrgl () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01cef94 in _ptrgl () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01cef98 in _ptrgl () from /usr/lib/libc.a(shr.o)
(gdb)
0xd01cef9c in _ptrgl () from /usr/lib/libc.a(shr.o)
(gdb)

Program received signal SIGILL, Illegal instruction.
0x00000000 in ?? ()
(gdb)

WBR,
Dmitry.

ps. source code of the test program:

===begin===
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <dlfcn.h>

int main( int argc, char* argv[] )
{
    if (argc <= 1) {
        printf("\nUsage: zload lib_name [lib_name]");
        return 0;
    }
    /* load every library named on the command line; none is dlclose()d */
    for (int i=1; i<argc; i++) {
        printf("\nLoading \"%s\"... ", argv[i]);
        void* h = dlopen(argv[i], RTLD_NOW);
        if (h == NULL) {
            char* info = dlerror();
            printf("error (%d). Msg=\"%s\"", errno, (info != NULL ? info : "(null)") );
        }
    }
    printf("\ndone.\nexiting...\n");
    return 0;
}
===end===

When run, it prints:

Loading "xddapplib.so"...
done.
exiting...
Illegal instruction (core dumped)
| |||
Btw, is it possible at all to see (for debug purposes) the sources of libc?

"Dmitry Bond." <dima_ben@ukr.net> wrote in message news:1100625415.425707@moxa.united.net.ua...
> Hi All.
[...]
> (gdb)
> 0xd01cef98 in _ptrgl () from /usr/lib/libc.a(shr.o)
> (gdb)
> 0xd01cef9c in _ptrgl () from /usr/lib/libc.a(shr.o)
> (gdb)
>
| |||
"Dmitry Bond." <dima_ben@ukr.net> writes:

> AIX 5.2 (uname -a = "AIX {OurServer} 2 5 000AC99D4C00").
>
> I have a problem - crash (illegal instruction) occurred somewhere inside
> LIBC.A.

Actually, the crash is most likely happening because libc had an exit-handler registered with it, and that handler jumped to location 0.

You may wish to set a breakpoint on atexit(), and see what handlers are registered at the time of dlopen().

The problem is most likely due to incorrect building of the shared lib. How did you build it?

> Btw, is it possible at all to see (for debug purposes) sources of libc ?

No, they are not released to the public.

BTW, please do not top-post.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
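To illustrate the exit-handler theory above, here is a minimal sketch in C, not the code from the library in question. It assumes a POSIX system with fork() and waitpid(); the names g_hook, buggy_exit_handler and signal_from_buggy_exit_handler are hypothetical. A handler registered with atexit() calls through a function pointer that was never set, so exit() ends up jumping to address 0. The crash is provoked in a child process so that the caller can report which signal killed it.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback slot that a library forgot to fill in.
   volatile keeps the compiler from reasoning about the NULL value. */
static void (*volatile g_hook)(void) = NULL;

/* Exit handler that blindly calls through the pointer (the bug). */
static void buggy_exit_handler(void)
{
    g_hook();               /* jumps to address 0 while g_hook is NULL */
}

/* Runs the buggy handler in a child process so the caller survives.
   Returns the signal that killed the child, 0 if the child exited
   normally, or -1 if fork() or waitpid() failed. */
int signal_from_buggy_exit_handler(void)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        atexit(buggy_exit_handler);
        exit(0);            /* libc walks its handler list here */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Which signal is reported depends on the platform: in the trace in this thread it was SIGILL at 0x00000000, while on many other systems a jump to address 0 typically shows up as SIGSEGV.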
| |||
"Paul Pluzhnikov" <ppluzhnikov-nsp@charter.net> wrote in message news:m37jolgzin.fsf@salmon.parasoft.com...
> "Dmitry Bond." <dima_ben@ukr.net> writes:
[...]
> location 0. You may wish to set a breakpoint on atexit(), and see
> what handlers are registered at the time of dlopen().

Thank you very much for your answer. And sorry, I forgot to thank you for your answer concerning limiting the set of exported functions... :-)

> The problem is most likely due to incorrect building of the
> shared lib. How did you build it?

We are using gcc 3.3.3 and gmake 3.80 to build our binaries.

For compiling C sources we are using the command line:

    gcc -c -x c -g -fexceptions -fpic -DDEBUG -D_DEBUG -D__XORA -D__UNIX -D__AIX -DPROGNAME="xddapplib" -I/oracle/9201/xdk/include [...others -I skipped...] unit_x1.c -o unit_x1.o

For compiling C++ sources we are using:

    g++ -c -x c++ -g -fpic -DDEBUG -D_DEBUG -D__XORA -D__UNIX -D__AIX -DPROGNAME="xddapplib" -I/oracle/9201/xdk/include [...others -I skipped...] unit_x2.cpp -o unit_x2.o

And finally, we link all that stuff with:

    g++ -g -Wl,-bE:def_server.exp orautil.a libtux_stub.so -L/oracle/9201/lib32 -lclntsh -shared -o xddapplib.so unit_x1.o unit_x2.o [...others *.o...]

Can you see anything wrong here?

WBR,
Dmitry.
| |||
>>The problem is most likely due to incorrect building of the
>>shared lib. How did you build it?
>
>
> We are using gcc 3.3.3 and gmake 3.80 to build our binaries.

Was the gcc bootstrapped on this OS level, incl. maintenance level? gcc uses a kind of wrapper around the system libc (libgcc).

Does your gcc use ld from GNU binutils or the system ld?

Just a guess for a direction. What's in your gcc's specs file?

--
Uli
(Reply to ulrich <dot> link <domain-delimiter> epost <dot> de)
| |||
"Dmitry Bond." <dima_ben@ukr.net> writes:

> And finally, link all that stuff with:
[...]
> Can you see anything wrong here?

No, I can't.

One thing you can do is look for "deferred resolution" symbols (these are most likely to jump to 0), and (if there are any) verify that a definition is available for each at runtime:

    dump -Tv xddapplib.so | grep 'undef .* \.\.'

Other than that, see if my 'atexit' theory leads anywhere.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
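The runtime half of the check suggested above (is a definition actually available for a given symbol?) could be sketched roughly as follows. This assumes a POSIX-style dlopen() that accepts NULL to mean the running program together with its loaded dependencies; symbol_is_available is a hypothetical helper name, and on some systems the program must be linked with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if `name` resolves in the running program image (main
   program plus the libraries it has loaded), 0 if it does not, and
   -1 if the program's own handle could not be obtained.
   A symbol whose address is legitimately NULL is reported as 0. */
int symbol_is_available(const char *name)
{
    void *self = dlopen(NULL, RTLD_NOW);   /* handle for the running program */
    int found;

    if (self == NULL)
        return -1;
    found = (dlsym(self, name) != NULL);
    dlclose(self);
    return found;
}
```

Each symbol name that the dump command lists as undefined could be passed to such a helper after the libraries expected to define it have been loaded.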
| ||||
"Paul Pluzhnikov" <ppluzhnikov-nsp@charter.net> wrote in message news:m3fz38bjvc.fsf@salmon.parasoft.com...
> "Dmitry Bond." <dima_ben@ukr.net> writes:
[...]
> dump -Tv xddapplib.so | grep 'undef .* \.\.'
> Other than that, see if my 'atexit' theory leads anywhere.

Thank you. :-)
You were right. I found a bug in my shared lib. There was a NULL object pointer in the finalization code. The bug never showed up if some functions of the library had been called, but it occurs if the library is unloaded without having been used...

WBR,
Dmitry.
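The kind of fix described above can be sketched in C as follows. This is only an illustration under assumptions, not the actual library code: lib_state, lib_do_work, lib_finalize and lib_release_count are hypothetical names. The state object is created lazily on first use, so the finalizer has to tolerate the case where the library was loaded and unloaded without ever being called.

```c
#include <stdlib.h>

/* Hypothetical library state, allocated lazily on first real use. */
typedef struct {
    int calls;                  /* how many times the library was used */
    void (*release)(void);      /* cleanup callback stored in the object */
} lib_state;

static lib_state *g_state = NULL;
static int g_release_count = 0;

static void release_resources(void)
{
    g_release_count++;
}

/* Any "real" entry point creates the state on demand. */
int lib_do_work(void)
{
    if (g_state == NULL) {
        g_state = malloc(sizeof *g_state);
        if (g_state == NULL)
            return -1;
        g_state->calls = 0;
        g_state->release = release_resources;
    }
    g_state->calls++;
    return 0;
}

/* Finalizer: safe to run even if lib_do_work() was never called. */
void lib_finalize(void)
{
    if (g_state == NULL)
        return;                 /* library was never used: nothing to do */
    if (g_state->release != NULL)
        g_state->release();
    free(g_state);
    g_state = NULL;
}

/* Reports how many times the cleanup callback actually ran. */
int lib_release_count(void)
{
    return g_release_count;
}
```

Calling lib_finalize() before any lib_do_work() call is then a no-op, and calling it twice releases the state only once.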