linux-admin.vger.kernel.org archive mirror
From: Ross Clarke <encrypted@geekz.za.net>
To: Linux-Admin <linux-admin@vger.kernel.org>
Subject: Re: Crazy load average & unkillable processes
Date: Thu, 28 Aug 2003 00:41:30 +0200
Message-ID: <3F4D339A.8010907@geekz.za.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Brandon wrote:

|Hi Everyone,
|
|    I'm having some bothersome problems with a couple of my servers.
|I'm hoping some of you have some advice on how to troubleshoot
|this, because my little brain is running out of ideas.
|
|All the servers are running Redhat 7.3, 2.4.20-19smp kernels,
|apache-1.3.27, and Soft Raid-1.
|
|Here is what is happening: all of a sudden the server load average
|climbs real high.  It climbs to 100+ within a few minutes, then
|constantly grows after that.  The last server that had this happen was
|at 375 avg when I rebooted it, which always needs to be a hard reboot,
|because the shutdown -r now command doesn't do anything.
|
|While this is happening, I cannot run commands like 'ps fax', 'pstree',
|'top', 'killall' etc. without them hanging.  Most other commands work. I
|can SSH to the server no problem.  If I do a 'ps ax' I can see a list of
|processes, but it always hangs before displaying them all. I narrowed it
|down to this: anything that needs a full process list hangs.
|
|I wrote a script that runs 'ls -la /proc/$P', and 'cat /proc/$P/cmdline'
|on each process in /proc.
|
|What I found is the processes that hang ps and whatnot are all owned by
|apache.  The script hangs on the ls -la /proc/$P whenever it hits an
|apache process.  The processes it hangs on cannot be killed with kill
|-9.  The number of apache-owned processes was at 250, while on a regular
|server it is only at 20 or so.
|
|Running sar -v shows the dentunusd grow huge at about the time of the
|issues:
|
|04:30:00 PM  dentunusd    file-sz   %file-sz   inode-sz   super-sz  %super-sz   dquot-sz  %dquot-sz   rtsig-sz  %rtsig-sz
|05:30:00 PM      38823      25900      12.35      24755          0       0.00          0       0.00          7       0.68
|05:40:00 PM      39757      25854      12.33      25054          0       0.00          0       0.00          7       0.68
|05:50:00 PM 4294967057      23526      11.22       4303          0       0.00          0       0.00         18       1.76
|
|Also, the number of sockets grows by about 3X:
|
|04:30:00 PM    totsck    tcpsck    udpsck    rawsck   ip-frag
|04:40:00 PM       136        60         5         0         0
|04:50:00 PM       112        35         5         0         0
|05:00:00 PM       121        40         7         0         0
|05:10:00 PM       126        44         5         0         0
|05:20:00 PM       115        38         5         0         0
|05:30:00 PM       119        36         8         0         0
|05:40:00 PM       120        42         6         0         0
|05:50:00 PM       526       236         5         0         1
|06:00:00 PM       531       224         5         0         0
|06:10:00 PM       535       224         5         0         0
|
|
|That is just about all I have come up with so far.  If anyone has seen
|this, or can recommend what steps I should take next, I could certainly
|use the advice.
|
|Thank you all
|
|Brandon Belshaw

I just had a similar problem twice with 2.6.0-test4, and I also used to
experience it on 2.4.18. I managed to get ps to list processes before all
commands stopped working, and I noticed that many of the processes had
gone into D and Z states. I believe they were getting stuck in the I/O
subsystem. My other filesystems were still responding: XMMS didn't die
until it hit an mp3 on my main filesystem, about 30 minutes after the
problem started. Any application that was already open kept working until
I tried to do anything that required I/O, and then it died as well.

That last happened to me about 12 hours ago, and I had to recover my
entire /home directory. I couldn't find out what caused it. The first
time it was MozillaFirebird that died first, the second time it was vim.
Both times I also tried hitting the power button to see if I could get
any form of shutdown where the data would sync, and both times the kernel
oopsed on the apmd event.
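
For anyone who wants to spot the D and Z processes without relying on
ps, here is a minimal sketch that reads the state field straight out of
/proc/<pid>/stat. It assumes that file can still be read while ps hangs,
which may not hold on an affected box (Brandon saw even ls on the /proc
entry hang). The function names parse_stat and stuck_processes are made
up for this illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def stuck_processes(proc_root="/proc", states=("D", "Z")):
    """List (pid, comm, state) for processes whose state is in `states`."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry could not be parsed
        if state in states:
            found.append((pid, comm, state))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, state in stuck_processes():
        print(f"{pid:>7} {state} {comm}")
```

Run from cron every minute or so, the output would at least show which
processes enter D state first, before the box becomes unusable.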

Anybody got any ideas?

Regards,
Ross Clarke

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla Thunderbird - http://enigmail.mozdev.org

iD8DBQE/TTOa1+7fkD/L8TgRAkmdAJ9ciSYT6tAQGT0Uk+RD7Y8gkbmEIwCffLIT
z2SGntQl8+1sI1QRVFZtxho=
=utNU
-----END PGP SIGNATURE-----
