Unix Technical Forum

Problem with system() calls in a multithreaded program on HPUX 11

#1 (permalink)
01-16-2008, 06:53 PM
Mahesh Kumar

Problem with system() calls in a multithreaded program on HPUX 11

Hello,

I am porting a multithreaded program from Solaris to HP-UX 11, in
which threads call system(). The program creates a number of threads
and runs each of them a specified number of times.
Each thread performs some task and creates a trace file. The threads
then verify the trace file against a standard one to check whether the
run was successful. The number of threads is variable.

Problem:
========
As I increase the number of threads, the program hangs while trying
to run a system() call. If I remove the system() calls altogether,
the program runs fine. Below is an explanation of the problem, followed
by the code. The program uses pthreads and runs fine on Solaris.


The explanation uses some variable names that appear in the code
given after it.

Explanation:
============
As I increase the value of noOfThreads to 3, 4 and so on, the
program hangs at around 6 or 7 threads. When the problem occurs,
two or three defunct processes are left behind. I ran "ps -f -u"
and the output was something like this (mtreg is the name of the
program):
-bash-2.05b$ ps -f -u mkumar
UID      PID  PPID  C STIME    TTY     TIME COMMAND
mkumar  1726  1190  0 00:06:12 pts/ta  0:10 mtreg
mkumar  1190  1189  0 23:04:02 pts/ta  0:01 -bash
mkumar  1731  1726  0 00:06:20 pts/ta  0:00 <defunct>
mkumar  1730  1726  2 00:06:20 pts/ta  0:00 <defunct>
mkumar  1743     0  0 00:06:20 pts/ta  0:00 mtreg
mkumar  1741  1726  0 00:06:21 pts/ta  0:00 sh -c perl strip.pl /export/home/configdev/tmp/FAAa01726mod0
mkumar  1742  1741  0 00:06:21 pts/ta  0:00 perl strip.pl /export/home/configdev/tmp/FAAa01726mod0456a.m
mkumar  1751  1190  5 00:07:48 pts/ta  0:00 ps -f -u mkumar

Before hanging, the output at the console was:
================================================================================
Running perl strip.pl /export/home/configdev/tmp/EAAa07614mod0456a.myt
Running perl strip.pl /export/home/configdev/tmp/DAAa07614mod0456a.myt
Running perl strip.pl /export/home/configdev/tmp/AAAa07614mod0456a.myt
Running perl strip.pl /export/home/configdev/tmp/CAAa07614mod0456a.myt
Finished running: perl strip.pl /export/home/configdev/tmp/EAAa07614mod0456a.myt
Running diff -w mod0456a.trc /export/home/configdev/tmp/EAAa07614mod0456a.myt > /export/home/configdev/tmp/EAAa07614mod0456a.myt.diff
Finished running diff -w mod0456a.trc /export/home/configdev/tmp/EAAa07614mod0456a.myt > /export/home/configdev/tmp/EAAa07614mod0456a.myt.diff
Running perl strip.pl /export/home/configdev/tmp/BAAa07614mod0456a.myt
Finished running: perl strip.pl /export/home/configdev/tmp/DAAa07614mod0456a.myt
Running diff -w mod0456a.trc /export/home/configdev/tmp/DAAa07614mod0456a.myt > /export/home/configdev/tmp/DAAa07614mod0456a.myt.diff
Finished running: perl strip.pl /export/home/configdev/tmp/AAAa07614mod0456a.myt
Running perl strip.pl /export/home/configdev/tmp/FAAa01726mod0456a.myt
Running diff -w mod0456a.trc /export/home/configdev/tmp/AAAa07614mod0456a.myt > /export/home/configdev/tmp/AAAa07614mod0456a.myt.diff
Finished running diff -w mod0456a.trc /export/home/configdev/tmp/DAAa07614mod0456a.myt > /export/home/configdev/tmp/DAAa07614mod0456a.myt.diff
Running diff -w mod0456a.trc /export/home/configdev/tmp/CAAa07614mod0456a.myt > /export/home/configdev/tmp/CAAa07614mod0456a.myt.diff
================================================================================

Some things that I observed:
1. I started only one mtreg process (PID 1726), but when the program
hung there was a second mtreg process with PPID 0. It was idle.
2. Each time the program hangs, there are one or more defunct
processes.
3. I am unable to kill the program once it hangs, and the system has
to be rebooted.
4. The number of threads at which the program hangs is not fixed. It
can hang at 5, 6, 7 or 8 threads. It even hung once with only 4
threads.
5. Although the last statement printed is "Running diff", the diff
has not yet started.
6. As an experiment, I removed all the system() calls and replaced
them with fclose(fopen(diffFileName, "w")), which just creates the
file without doing anything else.
This time I was able to run the program even with 10 threads, each
doing 10 iterations, and it seems that the program might run fine for
any number of threads (I checked up to 15 threads).

================================================================================

CODE (representative of the whole program; it may not compile as shown):


#include <pthread.h>
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

using std::cerr;
using std::endl;

#define noOfThreads 1
#define noOfIterations 1

char outFileName[512];        // file name for the trace file
char standardTraceFile[512];  // standard trace to compare against

// Helpers from the real program, not shown here.
void  setOutFileName();                    // sets outFileName from argc/argv
void  setStandardTraceFileName();          // sets standardTraceFile from the arguments
void* nextReq();                           // next request to hand to a thread
void  doCoreWork(const char* traceFile);   // writes the trace with actual values

void* threadStartRoutine(void* p);
bool  doOneIterationOfThread(const char* newTraceFile);
bool  verifyTrace(const char* standardTraceFile, const char* newTraceFile);

/*
 * Creates a number of threads and runs them. Waits for their
 * completion and then returns.
 */
void createThreadsAndRun()
{
    pthread_t threadList[noOfThreads];
    for (int i = 0; i < noOfThreads; i++)
    {
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
        pthread_create(&threadList[i], &attr, threadStartRoutine, nextReq());
        pthread_attr_destroy(&attr);
        cerr << "start thread " << i << endl;
    }
    for (int i = 0; i < noOfThreads; i++)
    {
        pthread_join(threadList[i], NULL);
        cerr << "finish thread " << i << endl;
    }
}

int main(int argc, char* argv[])
{
    // The arguments are not shown; these functions only represent
    // the work they are intended to perform.
    setOutFileName();
    setStandardTraceFileName();
    createThreadsAndRun();
    return 0;
}

void* threadStartRoutine(void* p)
{
    char newTraceFile[1024];  // per-thread trace file name
    char* prefix = tempnam(NULL, "");
    if (prefix == NULL)
        return NULL;
    snprintf(newTraceFile, sizeof newTraceFile, "%s%s", prefix, outFileName);
    free(prefix);  // tempnam() returns malloc()ed storage
    for (int i = 0; i < noOfIterations; i++)
    {
        // do some initializations
        if (!doOneIterationOfThread(newTraceFile))
            cerr << "Run failed: " << newTraceFile << " for run " << i << endl;
        else
            cerr << "Run successful: " << newTraceFile << " for run " << i << endl;
    }
    return NULL;
}

bool doOneIterationOfThread(const char* newTraceFile)
{
    doCoreWork(newTraceFile);
    // verify this run of the thread against the standard trace
    return verifyTrace(standardTraceFile, newTraceFile);
}

/*
 * Trace file names are given with their full path.
 * Returns true when the new trace matches the standard one.
 */
bool verifyTrace(const char* standardTraceFile, const char* newTraceFile)
{
    char cmd[512];
    char diffFileName[512];
    snprintf(diffFileName, sizeof diffFileName, "%s.diff", newTraceFile);

    // normalize the new trace file (see the note below)
    snprintf(cmd, sizeof cmd, "perl strip.pl %s", newTraceFile);
    cerr << "Running " << cmd << endl;
    system(cmd);
    cerr << "Finished running: " << cmd << endl;

    // compare it with the standard trace
    snprintf(cmd, sizeof cmd, "diff -w %s %s > %s", standardTraceFile,
             newTraceFile, diffFileName);
    cerr << "Running " << cmd << endl;
    system(cmd);
    cerr << "Finished running: " << cmd << endl;

    // an empty diff file means the traces match
    struct stat buf;
    bool haveDiffFile = (stat(diffFileName, &buf) == 0);
    unlink(diffFileName);

    return haveDiffFile && buf.st_size == 0;
}

/*
 * Note ******************
 * "perl strip.pl <new-trace-file-name>" brings the file into a
 * normalized form: it changes the values in the trace file that are
 * run-dependent, like time stamps and some other info, to a
 * predecided normal value (for example, time stamps may be converted
 * to 0x0). This makes the new trace file and the standard trace file
 * comparable. strip.pl (a perl script) performs this task by
 * substituting regular expressions.
 * Note Ends *************
 */

================================================================================

Can anyone please tell me why system() calls cause a problem on
HP-UX 11 when the same thing runs fine on Solaris? It would be
really great if you could suggest a possible solution.

Thanks and regards,

Mahesh Kumar
#2 (permalink)
01-16-2008, 06:53 PM
Joe Seigh

Re: Problem with system() calls in a multithreaded program on HPUX 11

On 11 Feb 2005 03:46:23 -0800, Mahesh Kumar <maheshkumarjha@gmail.com> wrote:

> Hello,
>
> I am porting a multithreaded program to HPUX 11 from Solaris, in
> which threads make calls to system() functions. The program basically
> creates a number of threads and runs them specified number of times.
> Each thread performs some task and creates a trace file. The threads
> then verify the trace file against standard ones to check whether the
> run was successful or not. The number of threads is variable.
>
> Problem:
> ========
> As I increase the number of threads, the program hangs, while trying
> to run some system() call. If I remove the system() calls altogether,
> the program runs fine. Below is an explanation of the problem followed
> by the code. The program uses pthreads and runs fine on Solaris.
>
>
> There may be some variable names used in explanation. These names
> appear in code given after that.
>
> Explanation:
> ============
> If I increase the value of noOfThreads to say 3, 4 and so on. The
> program hangs say around when noOfThreads is 6 or 7. Now as the
> problem occurs, two three defunct processes are created. I ran "ps -f
> -u" command and output was something like this (mtreg is the name of
> above program)
> -bash-2.05b$ ps -f -u mkumar
> UID PID PPID C STIME TTY TIME COMMAND
> mkumar 1726 1190 0 00:06:12 pts/ta 0:10 mtreg
> mkumar 1190 1189 0 23:04:02 pts/ta 0:01 -bash
> mkumar 1731 1726 0 00:06:20 pts/ta 0:00 <defunct>
> mkumar 1730 1726 2 00:06:20 pts/ta 0:00 <defunct>
> mkumar 1743 0 0 00:06:20 pts/ta 0:00 mtreg
> mkumar 1741 1726 0 00:06:21 pts/ta 0:00 sh -c perl strip.pl
> /export/home/configdev/tmp/FAAa01726mod0
> mkumar 1742 1741 0 00:06:21 pts/ta 0:00 perl strip.pl
> /export/home/configdev/tmp/FAAa01726mod0456a.m
> mkumar 1751 1190 5 00:07:48 pts/ta 0:00 ps -f -u mkumar
>

[...]
>
> Can anyone please tell me why system() calls are causing problem in
> HPUX 11 whereas the same thing runs fine on Solaris? It would be
> really great if you can suggest a possible solution?
>


Probably the SIGCHLD signal handling got messed up. You have defunct
programs that have finished but have not had their exit status collected
yet. Even though system() is supposed to be thread-safe, it's way too
sensitive to signal disposition to be used in anything but a
single-threaded program.

Change your program to be single-threaded and use fork(), exec(),
and wait(). See the Unix programming books by Stevens on how to
do it.

Using system() from threads was a major violation of the KISS rule
and you should expect to have problems when that happens. And you
should expect that we aren't going to try to make programs, that are
much more complicated than they should be, work.

--
Joe Seigh

Lock-free synchronization primitives
http://atomic-ptr-plus.sourceforge.net/
#3 (permalink)
01-16-2008, 06:53 PM
Paul Pluzhnikov

Re: Problem with system() calls in a multithreaded program on HPUX 11

maheshkumarjha@gmail.com (Mahesh Kumar) writes:

> I am porting a multithreaded program to HPUX 11 from Solaris, in
> which threads make calls to system() functions.


This is an extremely bad idea (TM).

Writing multithreaded programs that fork() correctly requires
careful use of pthread_atfork handlers for every possible lock used,
both in your code and in every library you link against.
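
For illustration, a minimal sketch of what such handlers look like for a single lock. The names `log_lock`, `install_fork_handlers` and `child_can_lock` are made up for this example; a real program would need equivalent handlers for every lock that it or its libraries might hold at fork() time, which is what makes the approach so hard to get right.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock that some thread might hold during fork(). */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread just before fork(): take the lock so that
 * no other thread can be holding it at the moment of the fork. */
static void before_fork(void)
{
    int rc = pthread_mutex_lock(&log_lock);
    assert(rc == 0);
    (void)rc;
}

/* Run after fork() in the parent and in the child respectively:
 * release the lock taken in before_fork(). */
static void after_fork_parent(void) { pthread_mutex_unlock(&log_lock); }
static void after_fork_child(void)  { pthread_mutex_unlock(&log_lock); }

/* Registers the three handlers; returns 0 on success. */
int install_fork_handlers(void)
{
    return pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

/* Forks once and reports whether the child could take log_lock.
 * Returns 0 if it could, -1 on any failure. */
int child_can_lock(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        int rc = pthread_mutex_lock(&log_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Calling `install_fork_handlers()` once at startup is enough; the handlers then run around every later fork(), including the one hidden inside system().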

> The program basically
> creates a number of threads and runs them specified number of times.


I don't see why you couldn't just fork() N copies of your program.
Your threads don't appear to exchange any info while they run ...
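
A minimal sketch of that process-per-worker layout, assuming each worker can be written as a function that returns 0 on success. `run_workers`, `worker_fn` and the two placeholder workers are names invented here, not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*worker_fn)(int worker_index);

/* Placeholder workers, only here to make the sketch self-contained. */
int worker_always_ok(int i)     { (void)i; return 0; }
int worker_fails_on_odd(int i)  { return i % 2; }

/* Forks n child processes, runs work(i) in child i, and waits for all
 * of them. Returns the number of workers that failed, or -1 if fork()
 * or waitpid() itself failed. */
int run_workers(int n, worker_fn work)
{
    int started = 0, failed = 0;

    assert(n >= 0 && work != NULL);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;                        /* start no more, reap the rest */
        if (pid == 0)
            _exit(work(i) == 0 ? 0 : 1);  /* child: do the work and leave */
        started++;
    }

    /* The parent is single-threaded, so plain waitpid() is enough. */
    for (int i = 0; i < started; i++) {
        int status;
        pid_t done;
        do {
            done = waitpid(-1, &status, 0);
        } while (done == -1 && errno == EINTR);
        if (done == -1)
            return -1;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    }
    return started == n ? failed : -1;
}
```

Each child has its own address space and its own signal dispositions, so a worker could call system() freely without disturbing its siblings.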

> Now some things that I observed are: ...


These are all consistent with a race condition, where some mutex
is held across the fork().

> 2. Each time the program hangs, there are one or more defunct
> processes.


What you want to do is attach a debugger to the parent of the
'defunct' processes and see why that parent is not wait()ing for the
zombies.

The parent is likely deadlocked somewhere.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
#4 (permalink)
01-16-2008, 06:53 PM
Stefaan A Eeckels

Re: Problem with system() calls in a multithreaded program on HPUX 11

On Fri, 11 Feb 2005 07:48:37 -0500
"Joe Seigh" <jseigh_01@xemaps.com> wrote:

> On 11 Feb 2005 03:46:23 -0800, Mahesh Kumar <maheshkumarjha@gmail.com>
> wrote:


[...]

> > Explanation:
> > ============
> > If I increase the value of noOfThreads to say 3, 4 and so on. The
> > program hangs say around when noOfThreads is 6 or 7. Now as the
> > problem occurs, two three defunct processes are created. I ran "ps
> > -f
> > -u" command and output was something like this (mtreg is the name of
> > above program)
> > -bash-2.05b$ ps -f -u mkumar
> > UID PID PPID C STIME TTY TIME COMMAND
> > mkumar 1726 1190 0 00:06:12 pts/ta 0:10 mtreg
> > mkumar 1190 1189 0 23:04:02 pts/ta 0:01 -bash
> > mkumar 1731 1726 0 00:06:20 pts/ta 0:00 <defunct>
> > mkumar 1730 1726 2 00:06:20 pts/ta 0:00 <defunct>
> > mkumar 1743 0 0 00:06:20 pts/ta 0:00 mtreg
> > mkumar 1741 1726 0 00:06:21 pts/ta 0:00 sh -c perl strip.pl
> > /export/home/configdev/tmp/FAAa01726mod0
> > mkumar 1742 1741 0 00:06:21 pts/ta 0:00 perl strip.pl
> > /export/home/configdev/tmp/FAAa01726mod0456a.m
> > mkumar 1751 1190 5 00:07:48 pts/ta 0:00 ps -f -u mkumar
> >

> [...]
> >
> > Can anyone please tell me why system() calls are causing problem in
> > HPUX 11 whereas the same thing runs fine on Solaris? It would be
> > really great if you can suggest a possible solution?
> >

>
> Probably the SIGCHLD signal handing got messed up. You have defunct
> programs that have finished but have not had their exit status
> collected
> yet. Even though system() is supposed to be thread safe it's way too
> sensitive to signal disposition to be using in anything but a a single
> threaded program.


The Solaris 9 man page says that system() isn't thread-safe:

USAGE
The system() function manipulates the signal handlers for
SIGINT, SIGQUIT, and SIGCHLD. For this reason it is not
safe to call system() in a multithreaded process. Concurrent
calls to system() will interfere destructively with the
disposition of these signals, even if they are not manipulated
by other threads in the application. See popen(3C) for
a replacement for system() that is thread-safe.

So the fact that the program runs on Solaris is pure luck.
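
To make the interference concrete, here is roughly what a traditional system() does. This is a sketch modeled on the usual textbook implementation, not on the actual Solaris or HP-UX source, and the name `classic_system` is made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmd through /bin/sh and returns the wait status, or -1 on error. */
int classic_system(const char *cmd)
{
    struct sigaction ignore, saved_int, saved_quit;
    sigset_t block_chld, saved_mask;
    pid_t pid;
    int status = -1;

    assert(cmd != NULL);

    /* Step 1: ignore SIGINT and SIGQUIT. Dispositions are per process,
     * so this affects every thread, not just the caller. */
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;
    sigaction(SIGINT, &ignore, &saved_int);
    sigaction(SIGQUIT, &ignore, &saved_quit);

    /* Step 2: block SIGCHLD while waiting (the single-threaded way). */
    sigemptyset(&block_chld);
    sigaddset(&block_chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block_chld, &saved_mask);

    pid = fork();
    if (pid == 0) {
        /* Child: undo steps 1 and 2, then run the command. */
        sigaction(SIGINT, &saved_int, NULL);
        sigaction(SIGQUIT, &saved_quit, NULL);
        sigprocmask(SIG_SETMASK, &saved_mask, NULL);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }
    if (pid > 0) {
        while (waitpid(pid, &status, 0) == -1) {
            if (errno != EINTR) {
                status = -1;
                break;
            }
        }
    }

    /* Step 3: restore what step 1 saved. If another thread ran step 1
     * in between, "saved" is SIG_IGN and the original dispositions are
     * lost for the whole process. */
    sigaction(SIGINT, &saved_int, NULL);
    sigaction(SIGQUIT, &saved_quit, NULL);
    sigprocmask(SIG_SETMASK, &saved_mask, NULL);
    return status;
}
```

With two threads inside this function at once, the save in step 1 and the restore in step 3 can interleave, which is the destructive interference the man page warns about.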

> Change your program to be single threaded and use fork(), exec(),
> and wait(). See the unix programming books by Stevens on how to
> do it.


The correct way to go about this is to use popen(). There's
no need to be so drastic and forgo multi-threading altogether,
but the OP should ask himself if it's really required to
achieve the desired result.
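
For the trace comparison in the original post, a popen()-based check might look roughly like the sketch below. It assumes that empty output means "no differences", and `command_output_is_empty` is a name invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Runs cmd through the shell via popen() and reports whether it wrote
 * anything to standard output.
 * Returns 1 if the output was empty, 0 if not, -1 on error. */
int command_output_is_empty(const char *cmd)
{
    FILE *pipe_in;
    char buf[4096];
    size_t total = 0, n;

    assert(cmd != NULL);

    pipe_in = popen(cmd, "r");
    if (pipe_in == NULL)
        return -1;

    /* Drain the pipe completely so the child never blocks on a full pipe. */
    while ((n = fread(buf, 1, sizeof buf, pipe_in)) > 0)
        total += n;

    if (pclose(pipe_in) == -1)
        return -1;
    return total == 0 ? 1 : 0;
}
```

A call such as `command_output_is_empty("diff -w standard.trc new.trc")` (file names are placeholders) would replace the redirect to a .diff file followed by stat() and unlink().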

> Using system() from threads was a major violation of the KISS rule
> and you should expect to have problems when that happens. And you
> should expect that we aren't going to try to make programs, that are
> much more complicated than they should be, work.


Well, that depends on what one considers to be "simple". At first
sight, "system()" is less complex than "popen()".

System() abuse lures the inexperienced programmer.

--
Stefaan
--
As complexity rises, precise statements lose meaning,
and meaningful statements lose precision. -- Lotfi Zadeh
#5 (permalink)
01-16-2008, 06:53 PM
Patrick TJ McPhee

Re: Problem with system() calls in a multithreaded program on HPUX 11

In article <m3mzub6rjq.fsf@salmon.parasoft.com>,
Paul Pluzhnikov <ppluzhnikov-nsp@charter.net> wrote:
% maheshkumarjha@gmail.com (Mahesh Kumar) writes:
%
% > I am porting a multithreaded program to HPUX 11 from Solaris, in
% > which threads make calls to system() functions.
%
% This is extremely bad idea (TM).
%
% Writing multithreaded programs that fork() correctly requires
% careful use of pthread_atfork handlers for every possible lock used,
% both in your code and in every library you link against.

I'd say that atfork handlers are largely useless. Whatever you do with
mutexes, for instance, is irrelevant because the child process isn't
allowed to lock or unlock them anyway. execing a new process image is
allowed, though. system() is problematic, but the problem is with
waiting for the child process to finish, rather than with forking and
execing.



--

Patrick TJ McPhee
North York Canada
ptjm@interlog.com
#6 (permalink)
01-16-2008, 06:53 PM
Mahesh Kumar

Re: Problem with system() calls in a multithreaded program on HPUX 11

ptjm@interlog.com (Patrick TJ McPhee) wrote in message news:<37a1suF55hncsU2@uni-berlin.de>...
> In article <m3mzub6rjq.fsf@salmon.parasoft.com>,
> Paul Pluzhnikov <ppluzhnikov-nsp@charter.net> wrote:
> % maheshkumarjha@gmail.com (Mahesh Kumar) writes:
> %
> % > I am porting a multithreaded program to HPUX 11 from Solaris, in
> % > which threads make calls to system() functions.
> %
> % This is extremely bad idea (TM).
> %
> % Writing multithreaded programs that fork() correctly requires
> % careful use of pthread_atfork handlers for every possible lock used,
> % both in your code and in every library you link against.
>
> I'd say that atfork handlers are largely useless. Whatever you do with
> mutexes, for instance, is irrelevant because the child process isn't
> allowed to lock or unlock them anyway. execing a new process image is
> allowed, though. system() is problematic, but the problem is with
> waiting for the child process to finish, rather than with forking and
> execing.


Well, just to make sure that there was no problem with fork and wait,
I tried replacing the system() call with the following:

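#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

int my_system(const char *command)
{
    int status;

    if (command == NULL)
        return 1;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: run the command through the shell. */
        char *argv[4];
        argv[0] = (char *)"sh";
        argv[1] = (char *)"-c";
        argv[2] = (char *)command;
        argv[3] = NULL;
        execve("/bin/sh", argv, environ);
        /* Only reached if execve() failed. _exit(), not exit(): skip
         * atexit handlers and stdio flushing in the forked child. */
        _exit(127);
    }
    /* Parent: wait for this child, retrying if a signal interrupts us. */
    for (;;) {
        if (waitpid(pid, &status, 0) == -1) {
            if (errno != EINTR)
                return -1;
        } else {
            return status;
        }
    }
}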

But it still hung when I ran it with 6 threads. (As I already said,
the number of threads at which it hangs is not consistent.) The only
difference is that there were no <defunct> processes this time.

Can anyone suggest a possible reason why it still hangs?

Thanks and regards.
Mahesh Kumar
#7 (permalink)
01-16-2008, 06:53 PM
Stefaan A Eeckels

Re: Problem with system() calls in a multithreaded program on HPUX 11

On 13 Feb 2005 23:09:37 -0800
maheshkumarjha@gmail.com (Mahesh Kumar) wrote:

> Can anyone suggest a possible reason for still hanging?


Remember that the advice was to use fork()/exec()
_AND_ to go single-threaded.

The fork() functions suspend all threads in the process before
they run. Threads that are already running and are in an
uninterruptible wait inside the kernel cause the fork() to pause
until they exit the uninterruptible wait. As a result, fork(),
and your whole process, appears to be hung.

Using fork() in a multithreaded process can result in
interrupted blocking system calls, so you should be prepared
to handle EINTR errors.
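
Handling EINTR usually means restarting the interrupted call. A minimal sketch for read(), where `read_retry` is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Like read(), but restarts the call when it is interrupted by a
 * signal before any data was transferred. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

The same loop shape applies to write(), waitpid() and other blocking calls that can fail with EINTR.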

fork() per se isn't thread-safe, and requires TLC to mesh
well with threads. If you really have to continue with
your multi-threaded approach, read the popen(3) man
page carefully. Notice where it says:

| popen() and pclose() are thread-safe. These interfaces are not
| async-cancel-safe. A cancellation point may occur when a thread is
| executing popen() or pclose().

and consider whether your objectives cannot be achieved through
a less complex approach.

Take care,

--
Stefaan
--
As complexity rises, precise statements lose meaning,
and meaningful statements lose precision. -- Lotfi Zadeh
#8 (permalink)
01-16-2008, 06:53 PM
Dan Koren

Re: Problem with system() calls in a multithreaded program on HPUX 11



The briefest answer to your question is that system() is
not MT-safe, and is almost guaranteed to break if invoked
from a multi-threaded program.

Replace 'system(cmd)' by 'pclose(popen(cmd, "w"))' and
things should work as long as 'cmd' is reasonably well
behaved.



dk
#9 (permalink)
01-16-2008, 06:53 PM
Patrick TJ McPhee

Re: Problem with system() calls in a multithreaded program on HPUX 11

In article <5bf55c06.0502132309.4dccb49a@posting.google.com>,
Mahesh Kumar <maheshkumarjha@gmail.com> wrote:

[...]

% if (waitpid(pid, &status, 0) == -1) {

If the implementation of waitpid depends on SIGCHLD being delivered
to the waiting thread, then you could have problems because the
signal could be delivered to any thread. I'm not sure if this is
what's happening here, but it's worth considering.

% said number of threads is not consistent.) The only difference being
% that there were no <defunct> processes this time.

That suggests all the waits completed. Curious.
--

Patrick TJ McPhee
North York Canada
ptjm@interlog.com
#10 (permalink)
01-16-2008, 06:54 PM
Otto Lind

Re: Problem with system() calls in a multithreaded program on HPUX 11

In article <4211a1e8@news.meer.net>,
"Dan Koren" <dankoren@yahoo.com> writes:
>
> Replace 'system(cmd)' by 'pclose(popen(cmd, "w"))' and
> things should work as long as 'cmd' is reasonably well
> behaved.


Note that this will crash if popen() fails, since passing in NULL to
pclose() will cause a segfault. I see this is used as a suggestion
in the popen man page on Solaris (in fact both examples they give
are buggy), they should really fix this so that naive users don't
write broken programs.
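
A minimal sketch of a wrapper that checks the popen() result before calling pclose(). The name `run_command` and the choice to return the exit code are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/* Runs cmd via popen()/pclose().
 * Returns the command's exit code, or -1 if the pipe could not be
 * created, pclose() failed, or the command did not exit normally. */
int run_command(const char *cmd)
{
    FILE *pipe_out;
    int status;

    assert(cmd != NULL);

    pipe_out = popen(cmd, "w");
    if (pipe_out == NULL)
        return -1;              /* never hand NULL to pclose() */

    status = pclose(pipe_out);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the original program this would stand in for each system(cmd) call, with a negative return treated as a failed run.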

Otto