We have a multi-threading application using third party tool "CPLEX" which had been running for quite a number of years without any problem. Lately, we have encountered jobs hanging. We noticed this problem after upgrading from HP-UX 11.0 to 11.11. There was no code change when upgrading to HP-UX 11.11 (only compilation in the new environment).

The following is from the stack dump. As you can see thread 1 is waiting for thread 2 to finish before joining. However, thread 2 never finishes. This problem does not always happen from run to run, but on the average happens in 1 out of 10 runs. It occurs more often when the machine is heavily used with multiple jobs running at the same time and each job uses 2 or more threads.

Any idea what's happening?

uname -a:
HP-UX carlsbad B.11.11 U 9000/800 unknown unknown HP-UX

thread 2 where
#0 0x40000000003618a4 in autoorder_par ()
#1 0x40000000001a76a4 in posforkstub ()
#2 0xc00000000006c250 in __pthread_body () from /usr/lib/pa20_64/libpthread.1
#3 0xc000000000076b0c in __pthread_start () from /usr/lib/pa20_64/libpthread.1

thread 1 where
#0 0xc000000000076dbc in ___lwp_wait_sys () from /usr/lib/pa20_64/libpthread.1
warning: reading `r3' register: No data
#1 0xc000000000076870 in _lwp_wait () from /usr/lib/pa20_64/libpthread.1
warning: reading `r3' register: No data
#2 0xc00000000006d778 in __vp_join () from /usr/lib/pa20_64/libpthread.1
warning: reading `r3' register: No data
#3 0xc00000000006c368 in pthread_join () from /usr/lib/pa20_64/libpthread.1
#4 0x40000000001a761c in CPXPparfork ()
#5 0x40000000003612c8 in BAR_neword ()
#6 0x4000000000360054 in CPXPbar_doorder ()
#7 0x4000000000353c34 in CPXPbar_main ()
#8 0x400000000026714c in CPXPcpxbar ()
#9 0x4000000000266d68 in baropt ()
#10 0x40000000002651e0 in CPXPShybbaropt ()
#11 0x400000000018408c in CPXhybbaropt ()
#12 0x40000000001ea6d4 in bsolvelp ()
#13 0x40000000001e9ff8 in CPXPsolvelp ()
#14 0x40000000001aa490 in mipsetup ()
#15 0x40000000001a8b94 in mipopt ()
#16 0x40000000001a8368 in CPXPSmipopt ()
#17 0x40000000001841bc in CPXmipopt ()
#18 0x40000000000c7608 in CPXOptimizer::Optimize (this=0x80000001000395a0, optimizati
#19 0x4000000000087390 in FAMOptimizerBase::SolveMIP (this=0x80000001000395a0, varGro
#20 0x400000000008bffc in FAMOptimizerBase::BranchAndBound (this=0x80000001000395a0,
#21 0x400000000008b218 in FAMOptimizerBase::BranchAndBound (this=0x80000001000395a0)
#22 0x40000000000a9970 in SolveProblem (optimizerObj=0x80000001000395a0, isLegFamForS
    AFfamDir=0x800003ffe80f1b38 "/export/local_fs/riad", scenario=0x800003ffe80f1a30
#23 0x40000000000a7c14 in main (argc=3, argv=0x800003ffe80f1380) at FAM_OptDriver.C:1

Nurman
| |||
In <943aa8d.0404121154.68818bb5@posting.google.com> nurman.haripin@aa.com (Nurman Haripin) writes:

[trimmed the newsgroup list - I suspect it's an operating system problem, not a general one]

>We have a multi-threading application using third party tool "CPLEX"
>which had been running for quite a number of years without any problem.
>Lately, we have encountered jobs hanging. We noticed this problem
>after upgrading from HP-UX 11.0 to 11.11. There was no code change
>when upgrading to HP-UX 11.11 (only compilation in the new
>environment).

Have you applied the latest and greatest patches for 11.11? Particularly if there are some for libc and libpthread. Do you use thread-local storage? How big did you set the thread stack size? I vaguely remember that the default stack size for threads changed from 11.0 to 11.something. At least, we got nice stack smashers back then.....

>The following is from the stack dump. As you can see thread 1 is
>waiting for thread 2 to finish before joining. However, thread 2
>never finishes. This problem does not always happen from run to run,
>but on the average happens in 1 out of 10 runs. It occurs more often
>when the machine is heavily used with multiple jobs running at the
>same time and each job uses 2 or more threads.
>Any idea what's happening?
>uname -a:
>HP-UX carlsbad B.11.11 U 9000/800 unknown unknown HP-UX
>thread 2 where
>#0 0x40000000003618a4 in autoorder_par ()
>#1 0x40000000001a76a4 in posforkstub ()
>#2 0xc00000000006c250 in __pthread_body () from
>/usr/lib/pa20_64/libpthread.1
>#3 0xc000000000076b0c in __pthread_start () from
>/usr/lib/pa20_64/libpthread.1
[del]

To me, this looks like thread 2 has never properly started, if this is the whole stack trace of it. I would first check whether this thread does anything useful in this situation at all. If not, check for patches or report the problem to HP support (if you have a support contract, that is).

HTH,
Uli
-- 
Dipl. Inf. Ulrich Teichert|e-mail: Ulrich.Teichert@gmx.de
Stormweg 24               |listening to: Noticable One (Rotten Apples)
24539 Neumuenster, Germany|Obstacle 1 (Interpol) Paranoia (N.Y. Rel-X)
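For threads that the application creates itself, the stack size question above can be taken out of the equation by setting the size explicitly instead of relying on the platform default. The following is only a minimal sketch: `worker`, `start_with_stack` and `run_worker_with_stack` are hypothetical names made up for illustration, and it does not apply to threads created inside a third-party library such as CPLEX, where the library decides the attributes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker, used only for illustration: flags that it ran. */
static void *worker(void *arg)
{
    int *done = arg;
    *done = 1;
    return NULL;
}

/* Start fn on a new thread whose stack size is requested explicitly.
 * Returns 0 on success or a pthread error code. */
int start_with_stack(pthread_t *tid, size_t stack_bytes,
                     void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc;

    assert(tid != NULL && fn != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Fails (typically with EINVAL) if stack_bytes is below the
     * platform minimum, so the error is visible at creation time. */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}

/* Run the worker with the given stack size and wait for it.
 * Returns 1 if the worker ran, -1 if the thread could not be started
 * or joined. */
int run_worker_with_stack(size_t stack_bytes)
{
    pthread_t tid;
    int done = 0;

    if (start_with_stack(&tid, stack_bytes, worker, &done) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return done;
}
```

Calling `run_worker_with_stack(1024 * 1024)` requests a 1 MiB stack; whether that is enough depends entirely on what the real thread function does.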
| |||
nurman.haripin@aa.com (Nurman Haripin) writes:

> Any idea what's happening?

Looks like your program fork()s after having created at least one thread. Fork()ing multithreaded programs is *extremely* tricky business, best avoided. Be sure you correctly lock/unlock all your mutexes in pre/post_fork, and that you execute no async-signal-unsafe functions between fork() and exec().

> Lately, we have encountered jobs hanging. We noticed this problem
> after upgrading from HP-UX 11.0 to 11.11. There was no code change

It is quite likely that the race condition leading to the current deadlock always existed, but internal libc locks under 11.0 prevented it from ever showing up.

If you can port/run your program under Linux, valgrind (with the helgrind skin) at http://valgrind.kde.org/tools.html and HP/Digital's Visual Threads at http://h21007.www2.hp.com/dspp/tech/...3,5062,00.html may provide some help.

As I just discovered, a beta release of Visual Threads is also available on HP-UX 11.22/IA-64.

Cheers,
-- 
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
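The advice above about locking and unlocking mutexes around fork() can be sketched with pthread_atfork(), assuming that is what "pre/post_fork" refers to. This is an illustration only: `state_lock`, `install_fork_handlers` and `fork_and_wait` are hypothetical names, and nothing in the thread confirms that the application or CPLEX actually calls fork().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock protecting some shared state. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the parent before fork(): take the lock so that no other
 * thread holds it at the moment the address space is copied. */
static void prepare(void)
{
    int rc = pthread_mutex_lock(&state_lock);
    assert(rc == 0);
    (void)rc;
}

/* Runs in the parent after fork(): release the lock again. */
static void parent(void)
{
    pthread_mutex_unlock(&state_lock);
}

/* Runs in the child after fork(): the child's only thread is a copy of
 * the forking thread, which owns the lock, so it can release it. */
static void child(void)
{
    pthread_mutex_unlock(&state_lock);
}

/* Register the handlers once, early in program startup.
 * Returns 0 on success or an error number. */
int install_fork_handlers(void)
{
    return pthread_atfork(prepare, parent, child);
}

/* Fork a child that exits immediately and wait for it.
 * Returns the child's exit status, or -1 on failure. */
int fork_and_wait(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: a real program would call exec() here; until then,
         * only async-signal-safe functions should be used. */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Handlers like these only cover locks the application knows about; locks held inside libraries at the time of the fork are outside their reach, which is part of why forking a multithreaded program is best avoided.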
| |||
> We have a multi-threading application using third party tool "CPLEX"

> It occurs more often
> when the machine is heavily used with multiple jobs running at the

> Any idea what's happening?

Install this patch:

PHCO_29028 1.0 libsec cumulative patch

Alain.
| |||
Hi,

Did you do a check on the thread stack size for thread 2?
Use pthreadsetstack and check.

Nurman Haripin wrote:
> We have a multi-threading application using third party tool "CPLEX"
> which had been running for quite a number of years without any problem.
> Lately, we have encountered jobs hanging. We noticed this problem
> after upgrading from HP-UX 11.0 to 11.11. There was no code change
> when upgrading to HP-UX 11.11 (only compilation in the new
> environment).
> The following is from the stack dump. As you can see thread 1 is
> waiting for thread 2 to finish before joining. However, thread 2
> never finishes. This problem does not always happen from run to run,
> but on the average happens in 1 out of 10 runs. It occurs more often
> when the machine is heavily used with multiple jobs running at the
> same time and each job uses 2 or more threads.
> Any idea what's happening?
>
> uname -a:
> HP-UX carlsbad B.11.11 U 9000/800 unknown unknown HP-UX
>
> [stack traces trimmed]
>
> Nurman
| |||
Sorry, but how do you use "pthreadsetstack"? Is it a command in the GNU debugger?

Kiran K Patel <KiranKPatel@verizon.net> wrote in message news:<5G0fc.5904$hg1.5198@nwrddc02.gnilink.net>...
> Hi,
>
> Did you do a check on the thread stack size for thread 2?
> Use pthreadsetstack and check.
>
> Nurman Haripin wrote:
> > [original post trimmed]
| ||||
In comp.sys.hp.hpux Nurman Haripin <nurman.haripin@aa.com> wrote:
> Sorry, but how do you use "pthreadsetstack"? Is it a command in the GNU
> debugger?

Could that be pthread_attr_getstacksize()?

$ man pthread_attr_getstacksize
Reformatting entry. Wait... done

 pthread_attr(3T)                                          pthread_attr(3T)
                              Pthread Library

 NAME
      pthread_attr_set*(), pthread_attr_get*() - set and get thread
      attributes

 SYNOPSIS
      #include <pthread.h>

rick jones
-- 
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
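Since pthread_attr_getstacksize() is a library call rather than a debugger command, it has to be called from a program. A small sketch of how the default stack size on a given system could be read this way follows; `default_thread_stack_size` is a hypothetical helper name, and it reports the default for a freshly initialised attribute object, not the stack size of an already running thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Store the stack size that a default-initialised attribute object
 * carries in *out_size. Returns 0 on success or a pthread error code. */
int default_thread_stack_size(size_t *out_size)
{
    pthread_attr_t attr;
    int rc;

    assert(out_size != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_getstacksize(&attr, out_size);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Building and running the same helper on the 11.0 and the 11.11 machine, and printing the value with something like `printf("%lu\n", (unsigned long)size)`, would show whether the default really differs between the two releases.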