Sorry for the multipost from alt.linux, but this group is better suited.

Last week I set up a NIS master on our fileserver. Clients worked fine. Then, out of the blue, the machine started to fail rather horribly after running cron.daily. Server processes continued to operate, but ssh and local logins resulted in a hanging terminal. Samba was also non-functional.

Over ssh, Ctrl-C and Ctrl-4 couldn't recover the prompt. We managed to gain the ability to execute commands on it remotely as root by a not entirely intentional backdoor, so we could investigate and fix the machine. After a while we managed to get a process list. There were a few dozen defunct sshd children and smb also had children. The most striking was a whole bunch of crond children which seemed to have hung up.

root   1177  0.0  0.0  1568  148 ?  S  Nov07  0:02 crond
root   1082  0.0  0.0  1568  108 ?  S  04:42  0:00  \_ CROND
smmsp  1084  0.0  0.0  5760  684 ?  S  04:42  0:00  |   \_ [sendmail]
root   1091  0.0  0.0  1564   64 ?  D  05:01  0:00  \_ CROND
root   1093  0.0  0.0  1564   64 ?  D  05:01  0:00  \_ CROND
root   1095  0.0  0.0  1564   64 ?  D  05:01  0:00  \_ CROND
root   1109  0.0  0.0  1564   64 ?  D  06:01  0:00  \_ CROND
root   1111  0.0  0.0  1564   64 ?  D  06:01  0:00  \_ CROND
root   1113  0.0  0.0  1564   64 ?  D  06:01  0:00  \_ CROND

The list continued like this until the present - three hanging children every hour. Syslogd was also still running, even though /var/log/messages was empty:

root   1037  0.0  0.0  1464  564 ?  S  Nov07  0:11 syslogd -m 0

Strangely, the /var/log/messages file seemed empty. We then attempted to restart sshd to see if that would allow logins. This got stuck and blocked the only way we could execute commands on the machine. Minutes later the NIS server failed as well and clients were beginning to have problems. We had to reboot the machine (after informing a few dozen customers) and expected it to break down during boot. Interestingly though, it booted and still runs normally.
But now, /var/log/messages was no longer empty - there was a rather conflicting and suspicious line at the top (note the timestamp).

Dec 1 14:42:45 tk7 syslogd 1.4.1: restart.
Dec 1 04:08:03 tk7 su(pam_unix)[1026]: session opened for user news by (uid=0)
Dec 1 04:08:03 tk7 su(pam_unix)[1026]: session closed for user news
Dec 1 04:42:00 tk7 CROND[1078]: (root) CMD (run-parts /etc/cron.monthly)
Dec 1 04:42:00 tk7 CROND[1083]: (root) CMD (root run-parts /etc/cron.monthly)

/var/log/boot.log also seems to have moved back in time:

Dec 1 15:45:47 tk7 sshd:
Dec 1 15:45:47 tk7 sshd: succeeded
Dec 1 15:45:48 tk7 tomcat: Starting tomcat:
Dec 1 15:45:48 tk7 tomcat: tomcat.
Dec 1 15:45:49 tk7 tomcat: Usage: /etc/init.d/tomcat {start|stop|restart|force-reload}
Dec 1 15:45:49 tk7 rc: Starting tomcat: failed
Dec 1 14:48:24 tk7 ntpd: succeeded
Dec 1 14:48:24 tk7 ntpd: ntpd startup succeeded
Dec 1 14:48:26 tk7 nfs: Starting NFS services: succeeded
Dec 1 14:48:26 tk7 nfs: rpc.rquotad startup succeeded
Dec 1 14:48:27 tk7 nfs: rpc.nfsd startup succeeded
Dec 1 14:48:27 tk7 nfs: rpc.mountd startup succeeded

I'm at a loss here and don't like it one bit. The machine is rather mission-critical and the possibility of a rootkit comes to mind, even though chkrootkit couldn't find a trace of it.

Does this sound familiar to someone? Any comments appreciated.

Steven Mocking
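The backwards steps in the log excerpts above can also be found mechanically, which helps when the files are longer than a screen. The following is a minimal Python sketch, not anything the poster ran: it assumes classic syslog timestamps without a year (the year is supplied as a parameter), the function name `find_backward_jumps` is made up for illustration, and the sample lines are copied from the boot.log excerpt.

```python
from datetime import datetime


def find_backward_jumps(lines, year=2005):
    """Return (previous_line, current_line, seconds_back) for every line
    whose timestamp is earlier than that of the line before it."""
    jumps = []
    prev_time, prev_line = None, None
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # too short to carry a "Mon D HH:MM:SS" prefix
        try:
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # not a syslog-style line, skip it
        if prev_time is not None and stamp < prev_time:
            seconds_back = int((prev_time - stamp).total_seconds())
            jumps.append((prev_line, line, seconds_back))
        prev_time, prev_line = stamp, line
    return jumps


# Three consecutive lines taken from the boot.log excerpt in the post.
BOOT_LOG = [
    "Dec 1 15:45:49 tk7 rc: Starting tomcat: failed",
    "Dec 1 14:48:24 tk7 ntpd: succeeded",
    "Dec 1 14:48:26 tk7 nfs: Starting NFS services: succeeded",
]

jumps = find_backward_jumps(BOOT_LOG)
for before, after, seconds_back in jumps:
    print(f"clock went back {seconds_back // 60}m{seconds_back % 60}s")
    print(f"  before: {before}")
    print(f"  after:  {after}")
```

A backwards step only shows that the system clock was changed between two log writes; it says nothing about who or what changed it.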
| |||
On Fri, 02 Dec 2005 11:44:40 +0100, Steven Mocking <s.mocking@gmail.com> wrote:

> Sorry for the multipost from alt.linux, but this group is better
> suited.
>
> Last week I set up a NIS master on our fileserver. Clients worked fine.
> Then, out of the blue, the machine started to fail rather horribly
> after running cron.daily. Server processes continued to operate, but
> ssh and local logins resulted in a hanging terminal. Samba was also
> non-functional.
>
> Over ssh, Ctrl-C and Ctrl-4 couldn't recover the prompt. We managed to
> gain the ability to execute commands on it remotely as root by a not
> entirely intentional backdoor, so we could investigate and fix the
> machine. After a while we managed to get a process list. There were a
> few dozen defunct sshd children and smb also had children. The most
> striking was a whole bunch of crond children which seemed to have hung
> up.

Ugh, posting through Google is problematic, because they wrap the lines, making the post so hard to read that you seriously risk losing the best comments.

> root 1177 0.0 0.0 1568 148 ? S Nov07 0:02 crond
> root 1082 0.0 0.0 1568 108 ? S 04:42 0:00 \_ CROND
> smmsp 1084 0.0 0.0 5760 684 ? S 04:42 0:00 | \_ [sendmail]
> root 1091 0.0 0.0 1564 64 ? D 05:01 0:00 \_ CROND
> root 1093 0.0 0.0 1564 64 ? D 05:01 0:00 \_ CROND
> root 1095 0.0 0.0 1564 64 ? D 05:01 0:00 \_ CROND
> root 1109 0.0 0.0 1564 64 ? D 06:01 0:00 \_ CROND
> root 1111 0.0 0.0 1564 64 ? D 06:01 0:00 \_ CROND
> root 1113 0.0 0.0 1564 64 ? D 06:01 0:00 \_ CROND
> The list continued like this until the present - three hanging children
> every hour. Syslogd was also still running, even though
> /var/log/messages was empty:
> root 1037 0.0 0.0 1464 564 ? S Nov07 0:11 syslogd -m 0
> Strangely, the /var/log/messages file seemed empty. We then attempted

What about /var/log/messages.* ?

Cron.daily runs logrotate, no? That would leave /var/log/messages empty.

> to restart sshd to see if that would allow logins.
> This got stuck and
> blocked the only way we could execute commands on the machine. Minutes
> later the NIS server failed as well and clients were beginning to have
> problems. We had to reboot the machine (after informing a few dozen
> customers) and expected it to break down during boot. Interestingly
> though, it booted and still runs normally.
> But now, /var/log/messages was no longer empty - there was a rather
> conflicting and suspicious line at the top (note the timestamp).
>
> Dec 1 14:42:45 tk7 syslogd 1.4.1: restart.
> Dec 1 04:08:03 tk7 su(pam_unix)[1026]: session opened for user news by
> (uid=0)

What is your geographical location? Your timezone?

> Dec 1 04:08:03 tk7 su(pam_unix)[1026]: session closed for user news
> Dec 1 04:42:00 tk7 CROND[1078]: (root) CMD (run-parts
> /etc/cron.monthly)
> Dec 1 04:42:00 tk7 CROND[1083]: (root) CMD (root run-parts
> /etc/cron.monthly)
>
> /var/log/boot.log also seems to have moved back in time:
>
> Dec 1 15:45:47 tk7 sshd:
> Dec 1 15:45:47 tk7 sshd: succeeded
> Dec 1 15:45:48 tk7 tomcat: Starting tomcat:
> Dec 1 15:45:48 tk7 tomcat: tomcat.
> Dec 1 15:45:49 tk7 tomcat: Usage: /etc/init.d/tomcat
> {start|stop|restart|force-reload}
> Dec 1 15:45:49 tk7 rc: Starting tomcat: failed

mmm. Init runs "/etc/rc.d/rc 5" (or some other runlevel), and /etc/rc.d/rc runs the scripts with "start" or "stop". I would investigate why this script is complaining. But that may not be the most urgent, this could just be some old cruft. (I don't run apache, so I am no good source here, just my two cents.)

> Dec 1 14:48:24 tk7 ntpd: succeeded

Ntpd warps your time. One hour and three minutes, it seems. Use /sbin/hwclock to check what the hardware clock is doing. What time was it on your "wall clock" when you rebooted, do you know?
> Dec 1 14:48:24 tk7 ntpd: ntpd startup succeeded
> Dec 1 14:48:26 tk7 nfs: Starting NFS services: succeeded
> Dec 1 14:48:26 tk7 nfs: rpc.rquotad startup succeeded
> Dec 1 14:48:27 tk7 nfs: rpc.nfsd startup succeeded
> Dec 1 14:48:27 tk7 nfs: rpc.mountd startup succeeded
>
> I'm at a loss here and don't like it one bit. The machine is rather
> mission-critical and the possibility of a rootkit comes to mind, even
> though chkrootkit couldn't find a trace of it.

I don't think it smells terribly strongly of a rootkit quite yet. At least, if you find the messages.1, messages.2 etc are there and have reasonable contents. A hardware problem, perhaps. A disk failure, memory failure... Some data about the hw config?

What does cron.daily do on your system? That is, what is in your /etc/cron.daily?

> Does this sound familiar to someone?

Not to me. I am in short supply of explanations of what was bugging the computer after the cron.daily. Perhaps it's a coincidence, cron.daily may be a false lead. But I would like to see suggestions from the newsgroup as to what could have made e.g. sshd behave like that.

If a rootkit is a serious threat, that is, if the computer holds data that could be easily exchanged for serious money by a resourceful attacker:

Rebuild the system from rpms, on a separate computer. That is, buy a (set of) new disk(s) and rebuild the disks containing the system software, the boot sectors, and the kernel. Supposing you already have all customer data on separate partitions, and no customer-controlled software is running with root privileges, you should be able to replace the newly built system disk with minimal disruption.

Once you have the idea of a possible rootkit bugging you, checks like rpm -V (on rpm-based distributions) are not completely satisfactory, because the rootkit may include manipulation of the rpm database. Even the kernel could be manipulated, and cover the traces no matter what you do.
You would have to build an equivalent rpm database on an uncompromised computer, and have rpm check a mirror of the server against it. But this is more work, probably, than just building a new system disk.

-Enrique
| |||
Steven Mocking <s.mocking@gmail.com> wrote:
> Over ssh, Ctrl-C and Ctrl-4 couldn't recover the prompt. We managed to
> gain the ability to execute commands on it remotely as root by a not
> entirely intentional backdoor, so we could investigate and fix the
> machine. After a while we managed to get a process list. There were a
> few dozen defunct sshd children and smb also had children. The most
> striking was a whole bunch of crond children which seemed to have hung
> up.

Well, what are they waiting on? Run ps axl and check the WCHAN column.

> root 1177 0.0 0.0 1568 148 ? S Nov07 0:02 crond
> root 1082 0.0 0.0 1568 108 ? S 04:42 0:00 \_
> CROND
> smmsp 1084 0.0 0.0 5760 684 ? S 04:42 0:00 | \_
> [sendmail]
> root 1091 0.0 0.0 1564 64 ? D 05:01 0:00 \_
> CROND
> root 1093 0.0 0.0 1564 64 ? D 05:01 0:00 \_
> CROND
> root 1095 0.0 0.0 1564 64 ? D 05:01 0:00 \_
> CROND
> root 1109 0.0 0.0 1564 64 ? D 06:01 0:00 \_
> CROND
> root 1111 0.0 0.0 1564 64 ? D 06:01 0:00 \_
> CROND
> root 1113 0.0 0.0 1564 64 ? D 06:01 0:00 \_
> CROND
> The list continued like this until the present - three hanging children
> every hour. Syslogd was also still running, even though

Shrug - you have some hardware problem. Probably a fubared disk. Maybe just a NFS mount that has gone bad. Find out.

> /var/log/messages was empty:
> root 1037 0.0 0.0 1464 564 ? S Nov07 0:11 syslogd -m 0

??? What do you mean?

> Strangely, the /var/log/messages file seemed empty. We then attempted
> to restart sshd to see if that would allow logins. This got stuck and

Well, where did it stick? Strace it.

> But now, /var/log/messages was no longer empty - there was a rather
> conflicting and suspicious line at the top (note the timestamp).
> Dec 1 14:42:45 tk7 syslogd 1.4.1: restart.
> Dec 1 04:08:03 tk7 su(pam_unix)[1026]: session opened for user news by (uid=0)
> Dec 1 04:08:03 tk7 su(pam_unix)[1026]: session closed for user news
> Dec 1 04:42:00 tk7 CROND[1078]: (root) CMD (run-parts /etc/cron.monthly)
> Dec 1 04:42:00 tk7 CROND[1083]: (root) CMD (root run-parts /etc/cron.monthly)

I see nothing suspicious. Time adjust. So? Reset the BIOS clock so the adjust won't be necessary.

> /var/log/boot.log also seems to have moved back in time:
> Dec 1 15:45:47 tk7 sshd:
> Dec 1 15:45:47 tk7 sshd: succeeded
> Dec 1 15:45:48 tk7 tomcat: Starting tomcat:
> Dec 1 15:45:48 tk7 tomcat: tomcat.
> Dec 1 15:45:49 tk7 tomcat: Usage: /etc/init.d/tomcat {start|stop|restart|force-reload}
> Dec 1 15:45:49 tk7 rc: Starting tomcat: failed
> Dec 1 14:48:24 tk7 ntpd: succeeded
> Dec 1 14:48:24 tk7 ntpd: ntpd startup succeeded
> Dec 1 14:48:26 tk7 nfs: Starting NFS services: succeeded
> Dec 1 14:48:26 tk7 nfs: rpc.rquotad startup succeeded
> Dec 1 14:48:27 tk7 nfs: rpc.nfsd startup succeeded
> Dec 1 14:48:27 tk7 nfs: rpc.mountd startup succeeded

Well, you certainly have wacky clocking.

> Does this sound familiar to someone?

You forgot to trace what was happening. There isn't any data here to point anywhere.

Peter
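The one piece of data the listing does contain is the process state column: the stuck CROND children are all in state D (uninterruptible sleep, usually waiting on I/O), which is what the WCHAN question above is getting at. Below is a minimal Python sketch that tallies D-state processes by start time from a saved "ps faux"-style listing. It assumes the eleven-column layout shown in the original post (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the function name `hung_by_start` is invented for illustration, and the sample lines are copied from the post.

```python
from collections import Counter


def hung_by_start(ps_lines):
    """Count processes in state D (uninterruptible sleep), grouped by the
    START column of a 'ps aux'-style listing."""
    counts = Counter()
    for line in ps_lines:
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # header fragment or truncated line
        stat, start = fields[7], fields[8]
        if stat.startswith("D"):
            counts[start] += 1
    return counts


# Process list as posted at the top of the thread.
PS_LINES = [
    r"root 1177 0.0 0.0 1568 148 ? S Nov07 0:02 crond",
    r"root 1082 0.0 0.0 1568 108 ? S 04:42 0:00 \_ CROND",
    r"smmsp 1084 0.0 0.0 5760 684 ? S 04:42 0:00 | \_ [sendmail]",
    r"root 1091 0.0 0.0 1564 64 ? D 05:01 0:00 \_ CROND",
    r"root 1093 0.0 0.0 1564 64 ? D 05:01 0:00 \_ CROND",
    r"root 1095 0.0 0.0 1564 64 ? D 05:01 0:00 \_ CROND",
    r"root 1109 0.0 0.0 1564 64 ? D 06:01 0:00 \_ CROND",
    r"root 1111 0.0 0.0 1564 64 ? D 06:01 0:00 \_ CROND",
    r"root 1113 0.0 0.0 1564 64 ? D 06:01 0:00 \_ CROND",
]

for start, count in sorted(hung_by_start(PS_LINES).items()):
    print(f"{start}: {count} process(es) in state D")
```

This only summarises a listing after the fact. It cannot say what the processes were blocked on; for that the WCHAN column or a trace from the live machine is still needed.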
| ||||
Enrique Perez-Terron wrote:
> > Strangely, the /var/log/messages file seemed empty. We then attempted
>
> What about /var/log/messages.* ?
>
> Cron.daily runs logrotate, no? that would leave /var/log/messages empty.

It rotates /var/log/messages at 4 in the morning - it was empty at 2 in the afternoon. Previous monthly crons produced a lot of logs after the last log in /var/log/messages.1. The older ones appeared rather normal after the reboot, but by then /var/log/messages had also materialised somehow.

> > to restart sshd to see if that would allow logins. This got stuck and
> > blocked the only way we could execute commands on the machine. Minutes
> > later the NIS server failed as well and clients were beginning to have
> > problems. We had to reboot the machine (after informing a few dozen
> > customers) and expected it to break down during boot. Interestingly
> > though, it booted and still runs normally.
>
> > But now, /var/log/messages was no longer empty - there was a rather
> > conflicting and suspicious line at the top (note the timestamp).
> >
> > Dec 1 14:42:45 tk7 syslogd 1.4.1: restart.
> > Dec 1 04:08:03 tk7 su(pam_unix)[1026]: session opened for user news by
> > (uid=0)
>
> What is your geographical location? Your timezone?

Amsterdam, Netherlands: GMT + 1 and daylight saving. And no, the timeserver isn't in Hawaii. It's a few blocks away at our provider. While an unprecedented feat of general relativity involving black holes and our office being kicked about the solar system faster than you can say "Where did the sun go?" *would* be nice to fantasize about, I'm afraid my cow-orkers wouldn't buy it.

> > Dec 1 15:45:47 tk7 sshd:
> > Dec 1 15:45:47 tk7 sshd: succeeded
> > Dec 1 15:45:48 tk7 tomcat: Starting tomcat:
> > Dec 1 15:45:48 tk7 tomcat: tomcat.
> > Dec 1 15:45:49 tk7 tomcat: Usage: /etc/init.d/tomcat
> > {start|stop|restart|force-reload}
> > Dec 1 15:45:49 tk7 rc: Starting tomcat: failed
>
> mmm.
> Init runs "/etc/rc.d/rc 5" (or some other runlevel), and
> /etc/rc.d/rc runs the scripts with "start" or "stop". I would
> investigate why this script is complaining. But that may not
> be the most urgent, this could just be some old cruft. (I don't
> run apache, so I am no good source here, just my two cents.)

Rewrote that from another distro - apparently fscking up the conditionals. Whoops, but it can't be related, since it wasn't being run until the reboot.

> > Dec 1 14:48:24 tk7 ntpd: succeeded
>
> Ntpd warps your time. One hour and three minutes, it seems. Use
> /sbin/hwclock to check what the hardware clock is doing. What time
> was it on your "wall clock" when you rebooted, do you know?

We could only execute commands through a rather hard to exploit hole in a piece of software still running on the server. When we tried to restart sshd, it blocked and no further access could be gained. All we managed to get hold of is a list of changed files in /etc, /var/log/messages (which was empty) and a ps faux listing.

> > Dec 1 14:48:24 tk7 ntpd: ntpd startup succeeded
> > Dec 1 14:48:26 tk7 nfs: Starting NFS services: succeeded
> > Dec 1 14:48:26 tk7 nfs: rpc.rquotad startup succeeded
> > Dec 1 14:48:27 tk7 nfs: rpc.nfsd startup succeeded
> > Dec 1 14:48:27 tk7 nfs: rpc.mountd startup succeeded
> >
> > I'm at a loss here and don't like it one bit. The machine is rather
> > mission-critical and the possibility of a rootkit comes to mind, even
> > though chkrootkit couldn't find a trace of it.
>
> I don't think it smells terribly strongly of a rootkit quite yet. At
> least, if you find the messages.1, messages.2 etc are there and have
> reasonable contents. A hardware problem, perhaps. A disk failure, memory
> failure... Some data about the hw config?
It's a quad 2.4 GHz Xeon (512 KB cache) with 1 GB of RAM and five SCSI disks:

Filesystem  1K-blocks      Used  Available  Use%  Mounted on
/dev/sda5      303251    186122     101468   65%  /
/dev/sdc1    35001508  25760460    7463056   78%  /home
/dev/sda9     1018298     19946     945741    3%  /tmp
/dev/sda6    10839384   7631428    2657332   75%  /usr
/dev/sda8     1035660    297728     685324   31%  /var
/dev/sdb1    35001508  31636408    1587108   96%  /data
/dev/sde1   288362876  63420984  210293828   24%  /projects
/dev/sda1     2071384   1323940     642220   68%  /scratch
/dev/sda7     2071384   1764692     201468   90%  /scratch2
/dev/sdd2    20161204   3752428   15384636   20%  /winshare
none           515112         0     515112    0%  /dev/shm

00:00.0 Host bridge: ServerWorks GCNB-LE Host Bridge (rev 32)
00:00.1 Host bridge: ServerWorks GCNB-LE Host Bridge
00:02.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
00:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:0f.0 Host bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 ISA bridge: ServerWorks GCLE Host Bridge
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 05)
01:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 (rev 07)

> What does cron.daily do on your system? That is, what is in your
> /etc/cron.daily?

ls of /etc/cron.daily:

00-logwatch  logrotate        rpm              tetex.cron
00webalizer  makewhatis.cron  slocate.cron     tmpwatch
0anacron     nis-update.cron  slrnpull-expire

Not going to cat all those scripts, but of interest is logrotate:

/usr/sbin/logrotate /etc/logrotate.conf

and nis-update.cron (because I put it there three days before the trouble started):

/usr/bin/make -C /var/yp

I've run that nis-update about five times before, without any problems - perhaps it interferes with something else in some way.

> > Does this sound familiar to someone?
>
> Not to me.
> I am in short supply of explanations of what was bugging
> the computer after the cron.daily. Perhaps it's a coincidence,
> cron.daily may be a false lead. But I would like to see suggestions
> from the newsgroup as to what could have made e.g. sshd behave like
> that.

Thanks for all the comments (and those of other people). There are some things we hadn't thought about in these replies.

> Rebuild the system from rpms, on a separate computer. That is, buy a (set of)
> new disk(s) and rebuild the disks containing the system software, the boot
> sectors, and the kernel. Supposing you already have all customer data
> on separate partitions, and no customer-controlled software is running
> with root privileges, you should be able to replace the newly built system
> disk with minimal disruption.

And have a spanking new server to play with. Good idea indeed.

> Once you have the idea of a possible rootkit bugging you, checks like
> rpm -V (on rpm-based distributions) are not completely satisfactory, because
> the rootkit may include manipulation of the rpm database. Even the kernel
> could be manipulated, and cover the traces no matter what you do. You would
> have to build an equivalent rpm database on an uncompromised computer, and
> have rpm check a mirror of the server against it. But this is more work,
> probably, than just building a new system disk.

I've been meaning to set up a monitoring box anyway. This is one of the things it might be useful for.
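The idea of checking a mirror of the server against a reference built on an uncompromised machine can be illustrated without rpm at all. The sketch below swaps in a plain SHA-256 manifest for the rpm database, so it is not the procedure described above, only the same principle in miniature. The function names `build_manifest` and `compare_manifests` are hypothetical, and the comparison is only meaningful if both manifests are computed on a trusted machine, for example the monitoring box, from a mounted copy of the server's disks.

```python
import hashlib
import os


def build_manifest(root):
    """Map each regular file under root (path relative to root) to its
    SHA-256 hex digest. Symlinks and special files are skipped."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def compare_manifests(trusted, suspect):
    """Return three sorted lists of relative paths: files whose digest
    differs, files missing from the suspect tree, and files that exist
    only in the suspect tree."""
    changed = sorted(p for p in trusted if p in suspect and trusted[p] != suspect[p])
    missing = sorted(p for p in trusted if p not in suspect)
    extra = sorted(p for p in suspect if p not in trusted)
    return changed, missing, extra
```

A typical use would be to call build_manifest on a freshly installed reference tree and on the mounted mirror, then pass both results to compare_manifests. Configuration and log files will legitimately differ, so the output needs reading by someone who knows the system rather than being treated as a verdict.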