From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Brandon"
Subject: Crazy load average & unkillable processes
Date: Wed, 27 Aug 2003 15:13:16 -0700
Sender: linux-admin-owner@vger.kernel.org
Message-ID: <003701c36ce8$6d963980$30dd7e42@WITech>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
List-Id:
Content-Type: text/plain; charset="us-ascii"
To: "Linux-Admin@Vger. Kernel. Org"

Hi Everyone,

I'm having some bothersome problems with a couple of my servers. I'm hoping some of you have advice on how to troubleshoot this, because my little brain is running out of ideas.

All the servers are running Redhat 7.3, 2.4.20-19smp kernels, apache-1.3.27, and software RAID-1.

Here is what is happening: all of a sudden the server's load average climbs very high. It reaches 100+ within a few minutes, then keeps growing after that. The last server this happened to was at a load average of 375 when I rebooted it. It always has to be a hard reboot, because 'shutdown -r now' doesn't do anything.

While this is happening, I cannot run commands like 'ps fax', 'pstree', 'top', or 'killall' without them hanging. Most other commands work, and I can SSH to the server with no problem. If I do a 'ps ax' I can see a list of processes, but it always hangs before displaying them all. I narrowed it down to this: anything that needs a full process list hangs.

I wrote a script that runs 'ls -la /proc/$P' and 'cat /proc/$P/cmdline' on each process in /proc. What I found is that the processes that hang ps and the like are all owned by apache. The script hangs on the 'ls -la /proc/$P' whenever it hits an apache process, and the processes it hangs on cannot be killed with kill -9. The number of apache-owned processes was at 250, while on a regular server it is only 20 or so.
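
For anyone who wants to try the same kind of /proc walk, here is a rough sketch of the idea in Python. It is not the script I actually ran: the function names (list_pids, probe) are made up for illustration, and it stats each /proc/<pid> directory and reads its cmdline rather than literally calling 'ls -la' and 'cat'. The one deliberate choice is that it prints the PID before touching it, so if a read blocks on a stuck process, the last line of output says which one.

```python
import os
import pwd


def list_pids(proc_root="/proc"):
    """Return the numeric entries under proc_root as a sorted list of ints."""
    return sorted(int(name) for name in os.listdir(proc_root) if name.isdigit())


def probe(pid, proc_root="/proc"):
    """Return (owner, cmdline) for one PID. Either read may block on a stuck process."""
    path = os.path.join(proc_root, str(pid))
    uid = os.stat(path).st_uid
    try:
        owner = pwd.getpwuid(uid).pw_name
    except KeyError:
        owner = str(uid)  # no passwd entry for this uid
    with open(os.path.join(path, "cmdline"), "rb") as f:
        raw = f.read()
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    cmdline = raw.replace(b"\0", b" ").decode("ascii", "replace").strip()
    return owner, cmdline


def main():
    for pid in list_pids():
        # Print and flush the PID first, so that if the probe hangs,
        # the last line of output names the process that caused it.
        print("probing %d ..." % pid, flush=True)
        try:
            owner, cmdline = probe(pid)
        except OSError:
            continue  # process exited, or we are not allowed to look at it
        print("  %s  %s" % (owner, cmdline or "[no cmdline]"))


if __name__ == "__main__" and os.path.isdir("/proc"):
    main()
```

Run as root, redirect the output to a file, and look at the final "probing" line after it stalls.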

Running sar -v shows dentunusd grow huge at about the time of the issues:

04:30:00 PM   dentunusd  file-sz  %file-sz  inode-sz  super-sz  %super-sz  dquot-sz  %dquot-sz  rtsig-sz  %rtsig-sz
05:30:00 PM       38823    25900     12.35     24755         0       0.00         0       0.00         7       0.68
05:40:00 PM       39757    25854     12.33     25054         0       0.00         0       0.00         7       0.68
05:50:00 PM  4294967057    23526     11.22      4303         0       0.00         0       0.00        18       1.76

Also, the number of sockets jumps by more than 4X (totsck goes from 120 to 526):

04:30:00 PM  totsck  tcpsck  udpsck  rawsck  ip-frag
04:40:00 PM     136      60       5       0        0
04:50:00 PM     112      35       5       0        0
05:00:00 PM     121      40       7       0        0
05:10:00 PM     126      44       5       0        0
05:20:00 PM     115      38       5       0        0
05:30:00 PM     119      36       8       0        0
05:40:00 PM     120      42       6       0        0
05:50:00 PM     526     236       5       0        1
06:00:00 PM     531     224       5       0        0
06:10:00 PM     535     224       5       0        0

That is just about all I have come up with so far. If anyone has seen this, or can recommend what steps I should take next, I could certainly use the advice.

Thank you all,
Brandon Belshaw
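
A note on the 4294967057 value that sar -v reports for dentunusd in the 05:50:00 PM row: it sits 239 below 2^32, which is what a small negative number looks like when a signed 32-bit value is printed as unsigned. Whether the underlying counter really is a signed 32-bit integer is an assumption here, not something checked against the kernel or sysstat source. The helper name as_signed32 is made up for this illustration.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


print(as_signed32(4294967057))  # -239
print(as_signed32(39757))       # 39757 (the 05:40:00 PM reading is unchanged)
```

If that reading is right, the unused-dentry count did not grow at all; it dropped below zero between 05:40 and 05:50.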