| I have a single-threaded application (a daemon) which links to the pthread library on HP-UX 11.11. The daemon listens on a socket and forks off a new process to serve each incoming connection. It also does a waitpid for its children on receipt of SIGCHLD. After running for some time (2-3 hours) it hangs in ksleep. This code has been working fine on all other platforms: Solaris, Linux, AIX.

Basically, the application code calls some non-thread-safe functions. To implement a critical section around them, we use pthread_mutex_lock and pthread_mutex_unlock. Though our application is single threaded, we have critical sections because this code is part of a shared library (used by other multithreaded applications).

Is this a problem with the pthread library implementation on HP-UX 11.11? I ask because it is not reproducible on HP-UX 11.31. gdb does not show any useful information.

tusc:

sigvec(SIGCLD, 0x7f7f10a0, 0x7f7f10b0) .... = 0
Received signal 18, SIGCLD, in setpgrp(), [caught], no siginfo
setpgrp(2) .... ERR#1 EPERM
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 14006
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
sigvec(SIGCLD, 0x7f7f1650, 0x7f7f1660) .... = 0
sigprocmask(SIG_BLOCK, 0x7f7f0c6c, 0x7f7f0c8c) .... = 0
fork() .... = 14009
sigprocmask(SIG_SETMASK, 0x7f7f0c8c, NULL) .... = 0
close(5) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
sigvec(SIGCLD, 0x7f7f10a0, 0x7f7f10b0) .... = 0
select(5, 0x7f7f0c00, NULL, NULL, NULL) .... = 1
accept(4, 0x7f7f0d80, 0x7f7f0d7c) .... = 5
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
sigvec(SIGCLD, 0x7f7f10a0, 0x7f7f10b0) .... = 0
setpgrp(2) .... ERR#1 EPERM
sigprocmask(SIG_BLOCK, 0x7f7f0c6c, 0x7f7f0c8c) .... = 0
fork() .... = 14010
sigprocmask(SIG_SETMASK, 0x7f7f0c8c, NULL) .... = 0
close(5) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
sigvec(SIGCLD, 0x7f7f10a0, 0x7f7f10b0) .... = 0
select(5, 0x7f7f0c00, NULL, NULL, NULL) .... = 1
accept(4, 0x7f7f0d80, 0x7f7f0d7c) .... = 5
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
Received signal 18, SIGCLD, in waitpid(), [caught], no siginfo
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 14005
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
sigvec(SIGCLD, 0x7f7f1b10, 0x7f7f1b20) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
sigvec(SIGCLD, 0x7f7f10a0, 0x7f7f10b0) .... = 0
setpgrp(2) .... ERR#1 EPERM
sigprocmask(SIG_BLOCK, 0x7f7f0c6c, 0x7f7f0c8c) .... = 0
fork() .... = 14011
Received signal 18, SIGCLD, in sigprocmask(), [caught], no siginfo
sigprocmask(SIG_SETMASK, 0x7f7f0c8c, NULL) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 14007
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
sigvec(SIGCLD, 0x7f7f1710, 0x7f7f1720) .... = 0
close(5) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
sigvec(SIGCLD, 0x7f7f10a0, 0x7f7f10b0) .... = 0
select(5, 0x7f7f0c00, NULL, NULL, NULL) .... = 1
accept(4, 0x7f7f0d80, 0x7f7f0d7c) .... = 5
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
Received signal 18, SIGCLD, in user mode, [caught], no siginfo
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 14008
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
waitpid(-1, WIFEXITED(0), WNOHANG) .... = 0
ksleep(PTH_MUTEX_OBJECT, 0x7b031178, 0x7b031180, NULL) .... [sleeping]
( Detaching from process 8625 ("bin/rscd") )

--------------------------------------------------------------------

Reading symbols from /usr/lib/libnss_dns.1...done.
Reading symbols from /usr/lib/libnsl.1...done.
Reading symbols from /usr/lib/libxti.2...done.
---Type <return> to continue, or q <return> to quit---
Reading symbols from /usr/lib/libnss_files.1...done.
0xc01feb68 in ?? () from /lib/libc.2
(gdb) thread apply all bt
(gdb) where
#0 0xc01feb68 in ?? () from /lib/libc.2
#1 0xc003da4c in ?? () from /lib/libpthread.1
#2 0xc0210668 in ?? () from /lib/libc.2
#3 0xc0207cc8 in ?? () from /lib/libc.2
#4 0xc0207c48 in ?? () from /lib/libc.2
(gdb) thread apply all bt
(gdb) bt
#0 0xc01feb68 in ?? () from /lib/libc.2
#1 0xc003da4c in ?? () from /lib/libpthread.1
#2 0xc0210668 in ?? () from /lib/libc.2
#3 0xc0207cc8 in ?? () from /lib/libc.2
#4 0xc0207c48 in ?? () from /lib/libc.2
(gdb) thread bt
Thread ID 0 not known. Use the "info threads" command to see the IDs of currently known threads.
(gdb) thread apply all bt
(gdb) bt
#0 0xc01feb68 in ?? () from /lib/libc.2
#1 0xc003da4c in ?? () from /lib/libpthread.1
#2 0xc0210668 in ?? () from /lib/libc.2
#3 0xc0207cc8 in ?? () from /lib/libc.2
#4 0xc0207c48 in ?? () from /lib/libc.2
(gdb) where
#0 0xc01feb68 in ?? () from /lib/libc.2
#1 0xc003da4c in ?? () from /lib/libpthread.1
#2 0xc0210668 in ?? () from /lib/libc.2
#3 0xc0207cc8 in ?? () from /lib/libc.2
#4 0xc0207c48 in ?? () from /lib/libc.2
(gdb) help
List of classes of commands:
aliases -- Aliases of other commands |
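For readers who want something concrete to compare against, the accept/fork/reap arrangement described in the post can be sketched roughly as follows. This is a minimal sketch and not the poster's code: the trace does not show what the real SIGCLD handler does apart from the waitpid calls, and the sketch uses sigaction where the trace shows sigvec. The names `on_sigchld`, `install_sigchld_handler` and `reap_children` are hypothetical. The handler only sets a flag, so no library routine that might take a mutex runs in signal context; whether that has any bearing on this particular hang is an assumption, not something the thread establishes.

```c
/* Request POSIX.1-2001/XSI declarations; must come before any #include. */
#define _XOPEN_SOURCE 600
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the signal handler, cleared by reap_children(). */
static volatile sig_atomic_t got_sigchld = 0;

/* Hypothetical handler: records that a child changed state, nothing else. */
static void on_sigchld(int signo)
{
    (void)signo;
    got_sigchld = 1;
}

/* Install the handler with sigaction(); returns 0 on success, -1 on error. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;      /* restart interrupted system calls */
    return sigaction(SIGCHLD, &sa, NULL);
}

/*
 * Called from the main loop (for example after select() returns) whenever
 * got_sigchld is set. Collects every child that has already exited without
 * blocking, and returns how many were collected.
 */
int reap_children(void)
{
    int reaped = 0;
    int status;

    got_sigchld = 0;               /* clear first so a new signal is not lost */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}
```

In a daemon shaped like the one described, the main loop would call `reap_children()` after `select()` returns (including when it fails with EINTR) and before the next `fork()`.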
| |||
| There is another variation of this issue that I came across:

[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0
[13783] ksleep(RELATIVE_TIMEOUT_VALUE|PTH_SPINLOCK_OBJECT, 0x7b02f748, NULL, 0x7f7f2350) .... = -ETIMEDOUT
[13783] sched_yield() .... = 0 |
| |||
| rajeshjangam@gmail.com wrote:
> Basically, the application code calls some non-thread safe
> functions. To implement critical section around this, we use
> pthread_mutex_lock and pthread_mutex_unlock.
> Though our application is single threaded, we have critical
> sections since this code is a part of a shared library (used by
> other multithreaded applications)

Are you sure that the other multithreaded applications are making the same pthread_mutex_lock/unlock calls around their calls (if any) to the non-thread-safe functions?

If it _always_ hangs after two or three hours you might consider tuscing the entire run, although that would create a _lot_ of data, and unless the mutex actually blocks, it won't go through a system-call path.

While I won't rule out a problem with the pthread library, my first instinct (naturally, given my position) is to wonder if there were a path which slipped a mutex call somewhere. Perhaps something got nested that wasn't meant to be.

rick jones
--
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway... feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH... |
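One way to test the "something got nested" suspicion is to build the library's mutex as an error-checking mutex while debugging, so that a second lock from the same thread returns EDEADLK instead of blocking. The following is a minimal sketch under stated assumptions: it relies on the X/Open mutex type PTHREAD_MUTEX_ERRORCHECK being available, which has not been verified here for HP-UX 11.11, and the function names `init_errorcheck_mutex` and `nested_lock_result` are hypothetical.

```c
/* Request POSIX.1-2001/XSI declarations; must come before any #include. */
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <pthread.h>

/* Initialise *m as an error-checking mutex. Returns 0 or an errno value. */
int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/*
 * Lock the same mutex twice from one thread and return the result of the
 * second (nested) lock attempt, or -1 if the setup itself failed. With an
 * error-checking mutex the nested attempt fails rather than blocking.
 */
int nested_lock_result(void)
{
    pthread_mutex_t m;
    int rc;

    if (init_errorcheck_mutex(&m) != 0)
        return -1;
    if (pthread_mutex_lock(&m) != 0) {
        pthread_mutex_destroy(&m);
        return -1;
    }
    rc = pthread_mutex_lock(&m);   /* nested lock by the current owner */
    if (rc == 0)
        pthread_mutex_unlock(&m);  /* undo the nested lock if it succeeded */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Logging any non-zero return from pthread_mutex_lock in the shared library would then show whether a nested acquisition ever happens during the 2-3 hour run.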
| |||
| Rick,

My responses inline:

Yes, these are lock/unlock calls implemented around functions like "gethostbyname" and "inet_ntoa".

I removed the lock/unlock calls temporarily, rebuilt the application and tried it again. But it still hangs consistently after around 2-3 hours.

Thanks,
Rajesh |
| |||
| rajeshjangam@gmail.com wrote:
> Yes these are lock/unlock calls implemented around functions like
> "gethostbyname" and "inet_ntoa".

Um, I was under the impression that gethostbyname() was thread-safe on HP-UX. I recall being involved in questions concerning the thread-scalability of that call. What leads you to believe that HP-UX 11i gethostbyname() is not thread-safe?

Drifting... folks should be migrating from gethostbyname() to getaddrinfo() by now...

> I removed the lock/unlock calls temporarily, rebuilt the application
> and tried it again. But still it hangs consistently after around 2-3
> hours.

Well, if you can get it distilled down to a small test case, then it would be good to phone the Response Centre.

rick jones
--
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway... feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH... |
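As an illustration of the migration mentioned above, a lookup that used gethostbyname() plus inet_ntoa() could be written with getaddrinfo() and inet_ntop(), both of which write into caller-supplied storage rather than a shared static buffer. This is a minimal sketch, not the poster's code: `resolve_ipv4` is a hypothetical helper name, and it only handles IPv4 to stay close to what inet_ntoa() covers.

```c
/* Request POSIX.1-2001/XSI declarations; must come before any #include. */
#define _XOPEN_SOURCE 600
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>

/*
 * Resolve `host` to its first IPv4 address and write the dotted-quad text
 * into `out` (which should hold at least INET_ADDRSTRLEN bytes).
 * Returns 0 on success, otherwise a getaddrinfo() error code (EAI_*).
 */
int resolve_ipv4(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    const struct sockaddr_in *sin;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, like inet_ntoa() */
    hints.ai_socktype = SOCK_STREAM;  /* avoid one result per socket type */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen) == NULL)
        rc = EAI_SYSTEM;              /* errno holds the underlying error */

    freeaddrinfo(res);
    return rc;
}
```

A caller would pass a buffer such as `char buf[INET_ADDRSTRLEN];` and report failures with `gai_strerror(rc)`.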
| |||
| On Jul 2, 11:40 am, Rick Jones <rick.jon...@hp.com> wrote:
> Um, I was under the impression that gethostbyname() was thread-safe on
> HP-UX. I recall being involved in questions concerning the
> thread-scalability of that call. What leads you to believe that HP-UX
> 11i gethostbyname() is not thread-safe?

This is common code for all operating systems, so to be on the safe side it was enclosed by the lock/unlock calls. Yes, we'll consider moving to getaddrinfo now.

> Well, if you can get it distilled down to a small test case, then it
> would be good to phone the Response Centre.

Someone suggested (on one of the ITRC HP-UX forums) that I apply the pthread library cumulative patch PHCO_36229. This did not help either.

I just discovered that if I remove the linking to the pthread library completely, the application works fine without hanging.

As you suggested, I am going to create a sample program to demonstrate this particular problem and send it to HP support.

Best Regards,
Rajesh |
| ||||
| Rajesh,

rajeshjangam@gmail.com wrote:
> It hangs after running for sometime (2-3 hours) in ksleep.

The process hangs here while trying to acquire a mutex. The most likely reason is that this mutex is in a "locked" state and the lock owner has already exited without doing the unlock. Do you know if this process ever became multi-threaded before it hung?

To get more insight, would you please do the following after you attach through gdb, and post the output:

(gdb) f 1
(gdb) x /40x $r26

While it does appear to be some 'libc' mutex, it is hard to tell which mutex this is without a proper stack trace. It would be very helpful if you could come up with a test case. If not, we at least need a good stack trace. Are you running the latest version of 'gdb' from HP?

Thanks,
--Vasu |
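The situation described above, a mutex left locked by an owner that has already exited, can be reproduced in a few lines. This is a minimal sketch of that general condition only, not a reproduction of the hang in this thread: the names `lock_and_exit` and `trylock_after_owner_exit` are hypothetical, and the sketch uses pthread_mutex_trylock so that it reports the state instead of blocking.

```c
#include <errno.h>
#include <pthread.h>

/* An ordinary (non-robust) mutex shared by the two threads below. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: takes the lock and returns without ever unlocking it. */
static void *lock_and_exit(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    return NULL;                   /* the owner exits, the mutex stays locked */
}

/*
 * Start a thread that exits while holding `lock`, wait for it to finish,
 * then try to take the lock from the calling thread. Returns the
 * pthread_mutex_trylock() result, or -1 if the thread could not be run.
 */
int trylock_after_owner_exit(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, lock_and_exit, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return pthread_mutex_trylock(&lock);
}
```

A plain pthread_mutex_lock() in place of the trylock would block indefinitely, since nothing is left to release the mutex.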