Just discovered that all mail (local, uucp, and smtp) is now stuck in mqueue for no apparent reason. The syslog shows entries such as the ones below. Any idea what my problem might be?

Aug 30 16:07:02 casedrum sendmail[5839]: rejecting connections on daemon Daemon0: load average: 36
Aug 30 16:07:17 casedrum sendmail[5839]: rejecting connections on daemon Daemon0: load average: 36
Aug 30 16:07:37 casedrum sendmail[5839]: runqueue: Skipping queue run -- load average too high
Aug 30 16:07:37 casedrum sendmail[5839]: rejecting connections on daemon Daemon0: load average: 35
Ron Kirschner typed (on Thu, Aug 30, 2007 at 04:12:29PM -0400):
| just discovered that no all mail local, uucp, and smtp is stuck in mqueue
| for no apparent reason - syslog shows entries such as below:
| any idea what my problem might be?

Please consider the possibility that the problem is exactly what you are being told: the load average is too high.

-- 
JP
==> http://www.frappr.com/cusm <==
On Aug 30, 1:30 pm, Jean-Pierre Radley <j...@jpr.com> wrote:
> Please consider the possibility that the problem is exactly what you are
> being told: the load average is too high.

By which my esteemed colleague means that it looks like a sendmail issue, not a UnixWare issue. Sendmail's configuration file specifies load limits beyond which it will merely queue requests or reject them outright (QueueLA and RefuseLA, respectively). I'm not sure what UnixWare's defaults are, but except on a dedicated mail server they'd typically be well below the 35% range.

If you're sure that the pending mail is legitimate (that is, your system isn't originating or relaying spam), you can force a temporary override with the appropriate sendmail -O option or permanently change the limits.

--RLR
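To make the two thresholds concrete, here is a simplified shell sketch of how a load average maps onto them. It is a model, not sendmail's own logic: the function name la_action is hypothetical, and real sendmail also weighs each message's priority against its QueueFactor option before deciding to queue, so "queue" here should be read as "may queue". The usage values 8 and 12 assume a one-CPU machine; the sendmail 8.12+ documentation gives the defaults as 8 (QueueLA) and 12 (RefuseLA) multiplied by the number of online CPUs.

```shell
#!/bin/sh
# la_action LOAD QUEUE_LA REFUSE_LA
# Simplified model (an assumption, not sendmail source):
#   LOAD >= REFUSE_LA -> "refuse"  (reject new SMTP connections)
#   LOAD >= QUEUE_LA  -> "queue"   (hold mail in the queue instead of delivering)
#   otherwise         -> "deliver"
# Real sendmail also compares message priority with QueueFactor
# before queueing, which this sketch ignores.
la_action() {
    awk -v la="$1" -v q="$2" -v r="$3" 'BEGIN {
        if (la + 0 >= r + 0)      print "refuse"
        else if (la + 0 >= q + 0) print "queue"
        else                      print "deliver"
    }'
}
```

With the load of 36 from the log above, la_action 36 8 12 prints refuse, which matches the "rejecting connections" messages.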
In article <fb78p8$oco$1@aioe.org>, Ron Kirschner <ron@jedron.com> wrote:
>just discovered that no all mail local, uucp, and smtp is stuck in mqueue
>for no apparent reason - syslog shows entries such as below:
>any idea what my problem might be?

As others said, there are settings in sendmail.cf that control what happens at certain load averages. The typical defaults (at least on my 8.14.1 version) only queue messages once the load average reaches 8, and refuse connections once it reaches 12. At that point the only thing that will happen is that the machine will try to process what is already in the queue. If the queue is large, you may find that you are actually running in swap space as the system tries to manage the mess.

I had this happen when a client got hit by one of the bots, and I wound up getting spam for them at the rate of 15/second; I could barely log into the machine. That's the only problem I've found with a 100Mbit/sec connection into a 40Gb/sec tier 1 link. Later I could not log in at all and had to go to the console (about 5 miles away at the colo) and log in there, as the ssh login just wouldn't make it. At that point my load average was close to 500! There were over 100,000 messages in the queue, and it was trying to process all of them.

So check your queue. I moved everything from the queue to another place, and then slowly worked through it, getting the message IDs of anything destined for that client and then removing them. That took about 5 hours, as I didn't dare lose anything for other clients on the machine.
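One quick way to check the queue is to count its control files: sendmail keeps one qf file per queued message. This sketch assumes the queue lives in /var/spool/mqueue (check the QueueDirectory option in your sendmail.cf), and the helper name queue_count is hypothetical. It uses find rather than a shell wildcard so that a queue of 100,000 entries does not overflow a command's argument list.

```shell
#!/bin/sh
# queue_count [DIR]
# Count sendmail queue control files (qf<queue-ID>) under DIR.
# The default directory is an assumption; adjust to your QueueDirectory.
# find(1) walks the directory itself, so a huge queue cannot trigger
# "arg list too long" the way a qf* wildcard on the command line can.
queue_count() {
    # wc -l pads its output with spaces on some systems; tr strips them
    find "${1:-/var/spool/mqueue}" -type f -name 'qf*' | wc -l | tr -d ' '
}
```

On a normally quiet server, a count in the thousands would lend weight to the full-queue theory.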
So from what you have described, it sounds like your mqueue is full. Stop all sendmail processes. Move mqueue to something like mqueue-hold, recreate mqueue with the same ownership and permissions as the original, and restart sendmail. Then start working your way through the files in mqueue-hold. I had so many files that no utility would handle them easily, so I worked through them in wildcard batches: *[0-5]0, *[6-9]0, *[0-5]1, and so on. I'd grep for the destination and redirect the output to a file of message IDs so I could remove the ones I needed. When it was all over, I moved the files from mqueue-hold back to mqueue.

Lots of luck. If the above scenario is true, you have a fair amount of work ahead of you.

Bill
-- 
Bill Vermillion - bv @ wjv . com
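The grep-for-the-destination step can be sketched as two small shell functions. Both names (ids_for_dest, remove_ids) are hypothetical, and the sketch assumes a flat hold directory and the usual qf layout in which recipient lines begin with R; each queued message is a qf (control) file plus a df (body) file sharing one queue ID. Here find -exec ... {} + hands grep the file names in argument-list-sized batches, which takes the place of the hand-made *[0-5]0, *[6-9]0 wildcard batches.

```shell
#!/bin/sh
# ids_for_dest HOLD_DIR PATTERN
# Print, sorted, the queue IDs whose qf file has a recipient ("R") line
# matching PATTERN, a grep basic regular expression such as 'client\.example'.
ids_for_dest() {
    find "$1" -type f -name 'qf*' -exec grep -l -e "^R.*$2" {} + |
        sed 's|.*/qf||' |   # strip the directory and "qf" prefix, leaving the ID
        sort
}

# remove_ids HOLD_DIR < idlist
# Delete the qf and df files for each queue ID read from stdin
# (other per-message files such as xf are not touched).
# Review the ID list first: rm cannot be undone.
remove_ids() {
    while IFS= read -r id; do
        [ -n "$id" ] || continue   # skip blank lines
        rm -f -- "$1/qf$id" "$1/df$id"
    done
}
```

For example: ids_for_dest /var/spool/mqueue-hold 'client\.example' > ids.txt, look over ids.txt, then remove_ids /var/spool/mqueue-hold < ids.txt.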
----- Original Message -----
From: "ThreeStar" <sco@3starsoftware.com>
Newsgroups: comp.unix.sco.misc
Sent: Thursday, August 30, 2007 6:38 PM
Subject: Re: unixware 7.1.4 sendmail issues

> Not sure what
> Unixware's defaults are but except in a dedicated mail server they'd
> typically be well below the 35% range.

A load average of 35 is not 35% of anything. What it is is astronomical. Something somewhere on the box is running away like crazy, or has been building up over time.
Forcing the mail to go might resolve it if it just happens to be the mail server itself that is running away and making the load average so high, but we don't know anything of the sort at this point.

It might be a cron job that runs every night and hangs every night, thus adding 1 to the load average every day, like a tape backup asking /dev/null for a second tape. Or it might be a one-time fluke event, like a report that generated a zillion emails or faxes or print jobs. Or it might be a PC on the network with a virus slamming some service it found on the server, like the web server or mail or facetwin/samba/visiofs. Or maybe the machine has a RAID array in a degraded state, running very slowly and causing normal operations to pile up with no obvious problem processes to account for it. Or maybe... anything.

So the first thing is to find out which 35 or 36 processes are trying to run that the CPU can't find time to get to.

Find CPU hogs:

ps -e -o pcpu,pid,tty,args | sort -n

The worst offenders will be at the bottom of the list; don't worry about the ones that scrolled off the top.

Or if you wanna get fancy yet can't install "top" from Skunkware:

-----------------
#!/bin/ksh
# ptop - quick-n-dirty top process display
# usage: ptop [n]
# shows the top n CPU hogs. Default is 10.
# brian@aljex.com

N=${1:-10}
# redraw once a second until interrupted (Ctrl-C)
while true ; do
	clear
	uptime
	echo "Top $N CPU Hogs..."
	echo "%CPU PID TTY COMMAND"
	# highest %CPU first; ps's own header sorts as 0 and falls below the cut
	ps -e -o pcpu,pid,tty,args | sort -rn | head -n "$N"
	sleep 1
done
-----------------

Except the processes currently using the most CPU might not be offending anything. So just look at that, but don't read too much into it yet. Do look at this, though...

Find any non-sleeping processes (the NR == 1 test keeps the column headings):

ps -elf | awk 'NR == 1 || $2 != "S"'

Any non-sleepers that are eating CPU and that have been running a long time are likely culprits. Non-sleepers that aren't eating CPU might just be normal stuff that's being held up by other stuff. Be careful what you kill and why.

Brian K. White    brian@aljex.com
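A cron job that hangs on every run, as in Brian's first scenario, leaves a recognizable trace: many copies of the same command line. This sketch (the helper name repeated_cmds is hypothetical) counts identical command lines read on stdin and prints those that occur at least MIN times, most frequent first. It assumes your ps accepts -o args, which POSIX specifies.

```shell
#!/bin/sh
# repeated_cmds [MIN]
# Read one command line per line on stdin and print "COUNT COMMAND" for
# every command seen at least MIN times (default 3), highest count first.
repeated_cmds() {
    sort | uniq -c | sort -rn |
        awk -v min="${1:-3}" '$1 + 0 >= min + 0 {
            n = $1
            sub(/^[ \t]*[0-9]+[ \t]/, "")   # drop the count uniq -c prepended
            print n, $0
        }'
}
```

Typical use: ps -e -o args | sed 1d | repeated_cmds 3 (sed 1d removes the ps header line).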
I found a bunch of processes started by cron, running one of my programs that had an error. Once I knew that load average meant system load, and not anything to do with sendmail load as I had originally assumed, I found and killed the hung jobs. Thanks for the help.

"Brian K. White" <brian@aljex.com> wrote in message news:00a101c7ebb7$ab5c3d80$6d00000a@venti...
> A load average of 35 is not 35% of anything. What it is is astronomical.
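Since the fix hinged on reading load average as system load, here is a small sketch that turns an uptime line into load per CPU. The load average is, roughly, the average number of processes running or waiting to run, so dividing it by the CPU count gives processes per CPU, not a percentage. The function name load_per_cpu is hypothetical, the CPU count is passed in by hand (how to query it varies by system), and the sed pattern assumes uptime prints "load average:" or "load averages:" followed by the 1-minute figure.

```shell
#!/bin/sh
# load_per_cpu UPTIME_LINE NCPU
# Extract the 1-minute load average from an uptime(1) line and print it
# divided by NCPU (must be 1 or more), to two decimal places.
load_per_cpu() {
    printf '%s\n' "$1" |
        sed -n 's/.*load averages*: *\([0-9.]*\).*/\1/p' |
        awk -v n="$2" '{ printf "%.2f\n", $1 / n }'
}
```

For example, load_per_cpu "$(uptime)" 2 on a box reporting a load of 36 prints 18.00: eighteen processes competing for each CPU.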