From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ross Clarke Subject: Re: Crazy load average & unkillable processes Date: Thu, 28 Aug 2003 00:41:30 +0200 Sender: linux-admin-owner@vger.kernel.org Message-ID: <3F4D339A.8010907@geekz.za.net> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: List-Id: Content-Type: text/plain; charset="us-ascii"; format="flowed" To: Linux-Admin -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Brandon wrote: |Hi Everyone, | | I'm having some bothersome problems with a couple servers of |mine. I'm hoping some of you have some advice on how to trouble shoot |this, because my little brain is running out of ideas. | |All the servers are running Redhat 7.3, 2.4.20-19smp kernels, |apache-1.3.27, and Soft Raid-1. | |Here is what is happening, all of a sudden the server load average |climbs real high. It climbs to 100+ within a few minutes, then |constantly grows after that. The last server that had this happen was |at 375 avg when I rebooted it, which always needs to be a hard reboot - |because the shutdown -r now command doesn't do anything. | |While this is happening, I can not run commands like 'ps fax', 'pstree', |'top', 'killall' etc without them hanging . Most other commands work. I |can SSH to the server no problem. If I do a 'ps ax' I can see a list of |processes, but it always hangs before displaying them all. I narrowed it |down to anything that needs a full process list hangs. | |I wrote a script that runs 'ls -la /proc/$P', and 'cat /proc/$P/cmdline' |on each process in /proc. | |What I found is the processes that hang ps and whatnot are all owned by |apache. The script hangs on the ls -la /proc/$P whenever it hits an |apache process. The processes it hangs on can not be killed with kill |-9. The number of apache owned processes was at 250, while on a regular |server it is only at 20 or so. | |Running sar -v shows the dentunusd grow huge at about the time of the |issues: | |04:30:00 PM dentunusd file-sz %file-sz inode-sz super-sz %super-sz |dquot-sz %dquot-sz rtsig-sz %rtsig-sz |05:30:00 PM 38823 25900 12.35 24755 0 0.00 |0 0.00 7 0.68 |05:40:00 PM 39757 25854 12.33 25054 0 0.00 |0 0.00 7 0.68 |05:50:00 PM 4294967057 23526 11.22 4303 0 0.00 |0 0.00 18 1.76 | |Also, the number of sockets grows by about 3X: | |4:30:00 PM totsck tcpsck udpsck rawsck ip-frag |04:40:00 PM 136 60 5 0 0 |04:50:00 PM 112 35 5 0 0 |05:00:00 PM 121 40 7 0 0 |05:10:00 PM 126 44 5 0 0 |05:20:00 PM 115 38 5 0 0 |05:30:00 PM 119 36 8 0 0 |05:40:00 PM 120 42 6 0 0 |05:50:00 PM 526 236 5 0 1 |06:00:00 PM 531 224 5 0 0 |06:10:00 PM 535 224 5 0 0 | | |That is just about all I have come up so far. If anyone has seen this, |or can recommend on what steps I should take next, I could certainly us |the advice. | |Thank you all | |Brandon Belshaw | | | | | |- |To unsubscribe from this list: send the line "unsubscribe linux-admin" in |the body of a message to majordomo@vger.kernel.org |More majordomo info at http://vger.kernel.org/majordomo-info.html | I just had the same similiar problem twice with 2.6.0-test4, I also used to experience it on 2.4.18. I managed to get ps to list tho, before all commands stopped working, and I noticed many of the proccesses went into D and Z states. I beleive they were getting stuck in the I/O subsystem, my other filesystems were still responding since my XMMS didnt die till it hit an mp3 on my main filesystem, which was about 30 minutes after the problem started. Any currently open application was still working, until I tried to do anything that required I/O, then they died aswell. That last happened to me about 12 hours ago, and I had to recover my entire /home directory. I couldnt find out what cuased it, the first time it was MozillaFirebird that died first, the 2nd time it was vim. Also both times I tried hitting the power button to see if I could get any form of shutdown where the data would sync, both times the kernel OOPS'ed on the apmd event. Anybody got any ideas? Regards, Ross Clarke -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla Thunderbird - http://enigmail.mozdev.org iD8DBQE/TTOa1+7fkD/L8TgRAkmdAJ9ciSYT6tAQGT0Uk+RD7Y8gkbmEIwCffLIT z2SGntQl8+1sI1QRVFZtxho= =utNU -----END PGP SIGNATURE-----