Unix Technical Forum

Random pids being "16113 Killed "

This is a discussion on Random pids being "16113 Killed" within the comp.unix.solaris forums, part of the Solaris Operating System category.


  #11 (permalink)  
Old 01-12-2008, 01:49 AM
Tranz
 
Posts: n/a
Default Re: Random pids being "16113 Killed "


Thomas Schulz wrote:
> In article <1169046937.102623.318180@51g2000cwl.googlegroups. com>,
> Tranz <mcass22135@hotmail.com> wrote:
> >
> >Dexthor wrote:
> >> > Dexthor, I did think of this and we have 100s, sometimes 1000s of
> >> > simultaneous processes running, so narrowing it down could be
> >> > difficult. And if there was something running in the background
> >> > looking for patterns to kill, I would imagine it would kill the same
> >> > process every time. Whereas now it dies once and then upon restart it
> >> > is just fine.
> >> >
> >> > None of the processes run as root.
> >>
> >> For example: Suppose I am writing a smart watchdog script which reads a
> >> file or some other source for patterns: a PID string or a process string.
> >> ps -ef |grep "pattern"|awk '{ print $2}'|xargs kill
> >>
> >> If the pattern above is a PID string, and suppose it is 162, it can
> >> match any PID which has the pattern "162" in it. I think pgrep also
> >> suffers from similar shortfalls.
> >>
> >> If I were you, I would start with any processes that seem to be "shell
> >> scripts", which is where there is a good chance for mistakes.
> >>
> >> Can you sweep the server for all shell scripts and see which ones try
> >> to do a kill? Something to start with?
> >>
> >> -Dexthor.

> >
> >Ahhh ok I understand what you are saying. Interesting. Yeah, I will scan
> >for kill.
> >
> >Odd thing is I have done some experiments using the kill command. I have
> >never gotten stderr to print out "16113 Killed". It always says "Terminated".
> >

>
> If you do a 'kill -9' or a 'kill -KILL' you will get a 'Killed'.
> --
> Tom Schulz
> schulz@adi.com


I scanned all the production jobs; none have a kill command in them.
Now there still could be some that run as root. I am unable to view
those.

Also thanks about the kill -9, that does print Killed, but I still can't
get it to say 16113 Killed.
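
Dexthor's point about loose pattern matching can be shown with a small sketch. The ps listing below is made-up sample data (the PIDs and command paths are hypothetical), not output from the affected server. Note that awk -v is not supported by the old /usr/bin/awk on Solaris; nawk or /usr/xpg4/bin/awk should work there.

```shell
#!/bin/sh
# Hypothetical sample of `ps -ef` output; PIDs and commands are invented.
sample_ps() {
cat <<'EOF'
     UID   PID  PPID   C    STIME TTY         TIME CMD
   batch   162     1   0 10:00:01 ?           0:00 /opt/agent/watcher
   batch  1162     1   0 10:00:02 ?           0:00 /opt/batch/job_a
   batch 16213   162   0 10:00:03 ?           0:00 /opt/batch/job_b
EOF
}

target=162

# Loose match: grep hits every line that contains "162" anywhere,
# including longer PIDs and the PPID column.
loose=`sample_ps | grep "$target" | awk '{ print $2 }' | tr '\n' ' '`

# Exact match: compare only the PID column against the target.
exact=`sample_ps | awk -v pid="$target" '$2 == pid { print $2 }' | tr '\n' ' '`

echo "loose: $loose"
echo "exact: $exact"
```

With this sample data the loose pipeline selects 162, 1162 and 16213, while the exact comparison selects only 162. Feeding the loose list to xargs kill would take out two unrelated jobs.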

  #12 (permalink)  
Old 01-12-2008, 01:49 AM
Matt Atterbury
 
Posts: n/a
Default Re: Random pids being "16113 Killed "

"Tranz" <mcass22135@hotmail.com> writes:

> We run all of our batch thru Tivoli job scheduler, and until the past
> day I thought that might be the problem. But when this same thing happened
> at the command line, not running thru that scheduler, I began to
> believe that it was not the problem.


Are you running them from the command line as the same uid as used
by the Tivoli job scheduler? If so, it's still possible that it is
killing them - you could try running them as a uid that Tivoli
doesn't have permission to kill.

m.
  #13 (permalink)  
Old 01-12-2008, 01:50 AM
Wen-King Su
 
Posts: n/a
Default Re: Random pids being "16113 Killed "

In a previous article "Tranz" <mcass22135@hotmail.com> writes:
>
>Also thanks about the kill -9, that does put Killed, but still can't
>get it to say 16113 Killed.


% sh
$ sleep 10000 &
10811
$ kill -9 10811
$
10811 Killed
$
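
The "Killed" line in the session above is printed by the parent shell when it notices that its child died from SIGKILL. A script can get the same information from the exit status of wait. The following is a minimal sketch, assuming a shell that reports a signal death as 128 plus the signal number (sh, ksh88, bash and dash do; some shells, such as ksh93, are reported to use a different offset).

```shell
#!/bin/sh
# Start a background job, kill it with SIGKILL, then collect its status.
sleep 1000 &
pid=$!
kill -9 "$pid"

# Capture the status without tripping "set -e" style error handling.
status=0
wait "$pid" || status=$?

if [ "$status" -gt 128 ]; then
    # Assumption: status = 128 + signal number.
    sig=`expr "$status" - 128`
    signame=`kill -l "$sig"`
    echo "pid $pid died from signal $sig ($signame)"
else
    echo "pid $pid exited with status $status"
fi
```

A batch wrapper that logs this status for every job would at least record which jobs ended on signal 9 rather than exiting on their own.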

  #14 (permalink)  
Old 01-12-2008, 01:50 AM
Tranz
 
Posts: n/a
Default Re: Random pids being "16113 Killed "


Wen-King Su wrote:
> In a previous article "Tranz" <mcass22135@hotmail.com> writes:
> >
> >Also thanks about the kill -9, that does put Killed, but still can't
> >get it to say 16113 Killed.

>
> % sh
> $ sleep 10000 &
> 10811
> $ kill -9 10811
> $
> 10811 Killed
> $


Thanks Wen.

Ok so it does look like something is killing this pid and only this
pid. Now I need to figure out what is doing it.
At this point I am def assuming there is something running in cron that
is doing this. Perhaps still Tivoli.

I'll let you know what i find out.
Thanks for all the help guys.

  #15 (permalink)  
Old 01-12-2008, 01:50 AM
greek_philosophizer@hotmail.com
 
Posts: n/a
Default Re: Random pids being "16113 Killed "


Tranz wrote:

> Ok so it does look like something is killing this pid and only this
> pid. Now I need to figure out what is doing it.
> At this point I am def assuming there is something running in cron that
> is doing this. Perhaps still Tivoli.
>
> I'll let you know what i find out.
> Thanks for all the help guys.


If it is an intermittent issue, just kick off a bunch of trussed sleeps
and wait for a hit.

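COUNT=1

while [ $COUNT -le 1000 ]
do

# Run each traced sleep in the background so the loop keeps going;
# the truss output for sleep number N lands in the file named N.
truss sleep 9999999999 > $COUNT 2>&1 &

COUNT=`expr $COUNT + 1 `

done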

..

  #16 (permalink)  
Old 01-12-2008, 01:50 AM
Tranz
 
Posts: n/a
Default Re: Random pids being "16113 Killed "


greek_philosophizer@hotmail.com wrote:
> Tranz wrote:
>
> > Ok so it does look like something is killing this pid and only this
> > pid. Now I need to figure out what is doing it.
> > At this point I am def assuming there is something running in cron that
> > is doing this. Perhaps still Tivoli.
> >
> > I'll let you know what i find out.
> > Thanks for all the help guys.

>
> If it is an intermittent issue, just kick off a bunch of trussed sleeps
> and wait for a hit.
>
> COUNT=1
>
> while [ $COUNT -le 1000 ]
> do
>
> truss sleep 9999999999 > $COUNT 2>&1
>
> COUNT=`expr $COUNT + 1 `
>
> done
>
> .


Exactly what I am trying now.

  #17 (permalink)  
Old 01-12-2008, 01:50 AM
Tranz
 
Posts: n/a
Default Re: Random pids being "16113 Killed "


greek_philosophizer@hotmail.com wrote:
> Tranz wrote:
>
> > Ok so it does look like something is killing this pid and only this
> > pid. Now I need to figure out what is doing it.
> > At this point I am def assuming there is something running in cron that
> > is doing this. Perhaps still Tivoli.
> >
> > I'll let you know what i find out.
> > Thanks for all the help guys.

>
> If it is an intermittent issue, just kick off a bunch of trussed sleeps
> and wait for a hit.
>
> COUNT=1
>
> while [ $COUNT -le 1000 ]
> do
>
> truss sleep 9999999999 > $COUNT 2>&1
>
> COUNT=`expr $COUNT + 1 `
>
> done
>
> .


Message didn't show up. But I am trying something similar now.

  #18 (permalink)  
Old 01-12-2008, 01:52 AM
Tranz
 
Posts: n/a
Default Re: Random pids being "16113 Killed "

I wrote a script that spins thru pids until it gets close to 16113.
Then it calls another script that just sleeps for 99999, holding the pid.
I am hoping this will at least prove the hypothesis that this pid is
being killed by something.

I didn't end up using truss, b/c in my other test, if a kill -9 is used
it won't tell you the issuing pid that killed it. Only if a standard
kill (pid) is used will it show what pid killed it.

Here is the quick script I wrote up.

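i=0
until [ $i -gt 10000000 ]
do
    # Spawn a throwaway background job just to consume a pid.
    echo "$!" &
    pid=$!
    wait ${pid}
    # Once the pid counter is near 16113, start the long sleepers.
    if [ ${pid} -gt 16090 ] && [ ${pid} -lt 16150 ]
    then
        x=0
        until [ $x -gt 20 ]
        do
            test_loop.ksh &
            pid2=$!
            if [ ${pid2} -eq 16113 ]
            then
                echo "PID FOUND"
            fi
            x=`expr $x + 1`
        done
        # Give whatever is doing the killing time to strike.
        sleep 400
    fi
    i=`expr $i + 1`
done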

  #19 (permalink)  
Old 01-12-2008, 01:57 AM
greek_philosophizer@hotmail.com
 
Posts: n/a
Default Re: Random pids being "16113 Killed "



On Jan 19, 3:05 pm, "Tranz" <mcass22...@hotmail.com> wrote:
> I wrote a script that spins thru pids, until it gets close to 16113.
> Then it calls another script that just puts the pid to sleep for 99999.
> I am hoping this will at least prove the hypothesis that this pid is
> being killed by something.
>
> I didn't end up using truss, b/c in my other test, if a kill -9 is used
> it won't tell you the issuing pid that killed it. Only if a standard
> kill (pid) is used then it will show what pid killed it.
>
> Here is the quick script I wrote up.
>
> i=0
> until [ $i -gt 10000000 ]
> do
> echo "$!" &
> pid=$!
> wait ${pid}
> if [ ${pid} -gt 16090 ] && [ ${pid} -lt 16150 ]
> then
> x=0
> until [ $x -gt 20 ]
> do
> test_loop.ksh &
> pid2=$!
> if [ ${pid2} -eq 16113 ]
> then
> echo "PID FOUND"
> fi
> x=`expr $x + 1`
> done
> sleep 400
> fi
> i=`expr $i + 1`
> done


any luck?

..

  #20 (permalink)  
Old 01-12-2008, 02:55 AM
Tranz
 
Posts: n/a
Default Re: Random pids being "16113 Killed "

Well, after a few months of this I think we finally got a solution.

It appears some monitoring software, TNG, was the culprit.


Our working assumption now is that TNG is recycled every so
often, and the script that does it finds the pids that belong to TNG.
When it attempts to kill them, it is able to kill 1 of x. It
continues to try to kill them all, including the one that it already
killed, until all of them are dead. For whatever reason some of the
pids won't be killed, so it sits in memory killing those pids. When our
batch grabs that one pid that was killed, the TNG agent kills it off
again, thinking it is one of its own.

Now we gotta get the system guys to fix this or remove it. That will be
another problem.
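
The pid-reuse problem described above is the usual risk of killing from a saved pid list. One common mitigation is to check what the pid is running right before sending the signal. The sketch below is an illustration only: kill_if_still_mine and the name tng_agent are hypothetical, and this is not how the TNG recycle script actually works.

```shell
#!/bin/sh
# kill_if_still_mine PID EXPECTED_NAME
# Hypothetical helper: send SIGTERM only if PID currently runs EXPECTED_NAME.
kill_if_still_mine() {
    pid=$1
    expected=$2

    # ps prints nothing (and fails) if the pid no longer exists.
    actual=`ps -o comm= -p "$pid" 2>/dev/null` || actual=""
    if [ -z "$actual" ]; then
        echo "pid $pid is gone; nothing to kill"
        return 1
    fi

    # Some systems report a full path here, so compare base names.
    actual=`basename "$actual"`
    if [ "$actual" != "$expected" ]; then
        echo "pid $pid now runs '$actual', not '$expected'; skipping"
        return 1
    fi

    kill "$pid"
}

# Demo: a stand-in for a batch job that happened to reuse an old agent pid.
sleep 300 &
victim=$!

kill_if_still_mine "$victim" tng_agent || true   # name differs, job is spared
kill_if_still_mine "$victim" sleep               # name matches, job is killed
```

There is still a small window between the check and the kill in which the pid could be recycled, so this narrows the problem rather than removing it. Dropping a pid from the list as soon as it is confirmed dead would avoid retrying against a recycled pid in the first place.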
