From: Paul Bame <bame@fc.hp.com>
To: parisc-linux@parisc-linux.org
Subject: [parisc-linux] stalling system clues + parisc WCHAN hack
Date: Tue, 21 May 2002 08:35:10 -0600
Message-ID: <E17AAjL-00079M-00@paul.bame>
I doubt I'm the only one who sees parisc systems become unusably slow,
apparently because any command needing disk I/O has to wait a long time.
This isn't the same symptom as the traditional Linux problem where one
fills the buffer cache (say, by running a large tar) and then the first
interactive command is slow due to paging. In the traditional problem,
the system fairly quickly recovers normalcy; in our case it never does,
though processes eventually finish. It's as if a timeout is releasing a
needed lock or something.
FYI the load for reproducing this stalling behavior is to run several
network-based (haven't tried local) 'cvs update' runs on the Linux kernel
mixed with some diffs. The load is running on a 50+ GB partition, if that
matters, and I've seen problems with both ext2 and ext3.
It sounds like the disk is seeking in the pattern of a heartbeat, twice
a second. I think the front-panel has a heartbeat monitor with that rhythm.
So I did a quick, simple, ugly hack, mostly to arch-independent code, to
get WCHAN out of parisc (http://ftp.parisc-linux.org/patches/wchan.diff),
and ran a ps on a system which was stalling. The result is attached, as
is a copy of /proc/meminfo.
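For readers unfamiliar with WCHAN: it names the kernel function in which a sleeping process is waiting. The patch above is not reproduced here. As a rough illustration only, tools typically obtain such a name by mapping a kernel address to the nearest preceding symbol in a System.map-style table. The sketch below assumes that approach; the addresses and the helper names load_symbols and resolve are invented for the example.

```python
import bisect

# Hypothetical System.map-style entries: "address type name".
# The symbol names appear in the ps listing; the addresses are invented.
SYSTEM_MAP = """\
10120000 T schedule
10134000 T pipe_wait
10150000 T wait_on_buffer
10150400 T wait_on_page
"""


def load_symbols(text):
    """Parse the table into a list of (address, name) sorted by address."""
    symbols = []
    for line in text.splitlines():
        addr, _type, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return the name of the last symbol starting at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    return symbols[i][1] if i >= 0 else None


if __name__ == "__main__":
    table = load_symbols(SYSTEM_MAP)
    # An address a little way into the (invented) wait_on_buffer range.
    print(resolve(table, 0x10150010))
```

An address below the first symbol resolves to None, which a caller could display as "-" or as the raw address.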
The interesting clue in the 'ps' output, to me, is the 'D' processes, which I
suspect are those sleeping uninterruptibly, e.g. in down(). The most frequent
WCHAN culprits are wait_on_buffer/page. Where should we go next in solving this
problem (with the least effort too, unfortunately)?
Linux b2000 2.4.18-pa25 #22 Fri May 17 11:04:28 MDT 2002 parisc unknown
  PID CMD              S WCHAN
    1 ini              S pipe_poll
    2 [keventd]        S context_thread
    3 [ksoftirqd_CPU0] S start_context_thread
    4 [kswapd]         S kswapd
    5 [bdflush]        S start_context_thread
    6 [kupdated]       S sync_supers
    9 [mdrecoveryd]    S md_thread
   10 [kjournald]      S wait_on_buffer
   62 [kjournald]      S wait_on_buffer
   98 /sbin/dhclient-2 S datagram_poll
  110 /sbin/portmap    S tcp_poll
  175 /sbin/syslogd    D wait_on_buffer
  178 /sbin/klogd      S syslog
  182 /sbin/rpc.statd  S tcp_poll
  190 /usr/sbin/inetd  S tcp_poll
  206 nmbd -a          S pipe_poll
  208 /usr/sbin/sshd   S tcp_poll
  213 /usr/bin/X11/xfs S unix_poll
  215 /usr/sbin/ntpd   S datagram_poll
  219 /usr/sbin/atd    S wait_on_buffer
  222 /usr/sbin/cron   S wait4
  238 -bash            S wait4
  783 /usr/sbin/apache S wait4
 2748 /usr/sbin/lpd    S tcp_poll
 4356 /usr/sbin/apache S wait_for_connect
 4357 /usr/sbin/apache S wait_for_connect
 4358 /usr/sbin/apache S wait_for_connect
 4359 /usr/sbin/apache S wait_for_connect
 4360 /usr/sbin/apache S wait_for_connect
 4361 /usr/sbin/apache S wait_for_connect
 4717 /usr/sbin/sshd   S normal_poll
 4718 -bash            S read_chan
 4794 /USR/SBIN/CRON   S pipe_wait
 4795 /usr/bin/perl -w S wait4
 4797 /usr/bin/ssh b20 S tcp_poll
 4799 /usr/sbin/sshd   S unix_poll
 4800 /usr/bin/perl -w S wait4
 4802 /usr/sbin/sendma S pipe_wait
 4824 /bin/sh -eux /pr S wait4
 5088 /USR/SBIN/CRON   S pipe_wait
 5089 /bin/sh -c cd ia S wait4
 5090 /bin/sh -uex ./b S wait4
 5092 /usr/sbin/sendma S pipe_wait
 5179 /bin/sh -uex ./b S wait4
 5180 diff -urN --excl D wait_on_page
 5209 /bin/sh -eux /pr S wait4
 5210 cvs -Qfz4 -d:pse D wait_on_page
 5291 /USR/SBIN/CRON   S pipe_wait
 5292 /bin/sh -c test  S wait4
 5293 run-parts --repo S pipe_poll
 5296 /bin/sh /etc/cro S wait4
 5297 /bin/sh /usr/bin S wait4
 5311 /bin/sh /usr/bin S wait4
 5312 sort -f          S pipe_wait
 5313 /usr/lib/locate/ S pipe_wait
 5314 /usr/bin/find /  D wait_on_buffer
 5367 /bin/sh ./daemon S wait4
 5368 setiathome -nice R wait_on_buffer
 5381 ps -eo pid,cmd,s R wait_on_buffer
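
The observation about 'D' processes and their WCHAN values can be checked mechanically. The following is a minimal sketch, assuming rows in the PID, CMD, S, WCHAN layout of the listing above; the function names parse_ps_line and tally_wchan are made up for this example, and the sample rows are copied from the listing (it contains all four 'D' rows plus two others for contrast).

```python
from collections import Counter

# Rows copied from the ps listing (PID, CMD, S, WCHAN).
SAMPLE = """\
PID CMD S WCHAN
175 /sbin/syslogd D wait_on_buffer
219 /usr/sbin/atd S wait_on_buffer
5180 diff -urN --excl D wait_on_page
5210 cvs -Qfz4 -d:pse D wait_on_page
5314 /usr/bin/find / D wait_on_buffer
5368 setiathome -nice R wait_on_buffer
"""


def parse_ps_line(line):
    """Split one row into (pid, cmd, state, wchan), or None if not a data row.

    CMD may contain spaces, so the state and WCHAN fields are taken from
    the right-hand end of the row.
    """
    fields = line.split()
    if len(fields) < 4 or not fields[0].isdigit():
        return None  # header line or malformed row
    return int(fields[0]), " ".join(fields[1:-2]), fields[-2], fields[-1]


def tally_wchan(text, state="D"):
    """Count WCHAN values among processes in the given state."""
    counts = Counter()
    for line in text.splitlines():
        row = parse_ps_line(line)
        if row and row[2] == state:
            counts[row[3]] += 1
    return counts


if __name__ == "__main__":
    for wchan, n in tally_wchan(SAMPLE).most_common():
        print(f"{n:3d} {wchan}")
```

On these rows the tally is two processes in wait_on_buffer and two in wait_on_page, matching the statement that those are the most frequent WCHAN values among the 'D' processes.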
          total:      used:      free:    shared:   buffers:    cached:
Mem:   525357056  521830400    3526656          0   70672384  356921344
Swap:  511696896    5632000  506064896
MemTotal: 513044 kB
MemFree: 3444 kB
MemShared: 0 kB
Buffers: 69016 kB
Cached: 347420 kB
SwapCached: 1136 kB
Active: 112296 kB
Inactive: 329744 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 513044 kB
LowFree: 3444 kB
SwapTotal: 499704 kB
SwapFree: 494204 kB
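
To put the /proc/meminfo figures in proportion, here is a small sketch that parses the "Key: value kB" lines and works out how much of RAM is held by buffers and page cache. The function names parse_meminfo and summarize are invented for this example, and the input is a subset of the values shown above.

```python
# Subset of the /proc/meminfo values from the message (all in kB).
MEMINFO = """\
MemTotal: 513044 kB
MemFree: 3444 kB
Buffers: 69016 kB
Cached: 347420 kB
SwapTotal: 499704 kB
SwapFree: 494204 kB
"""


def parse_meminfo(text):
    """Return {key: first integer after the colon} for each 'Key: N ...' line."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def summarize(info):
    """Derive cache and swap figures from the parsed values."""
    total = info["MemTotal"]
    cache = info["Buffers"] + info["Cached"]
    return {
        "cache_kb": cache,
        "cache_pct": 100.0 * cache / total,
        "free_pct": 100.0 * info["MemFree"] / total,
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
    }


if __name__ == "__main__":
    s = summarize(parse_meminfo(MEMINFO))
    print(f"buffers+cached: {s['cache_kb']} kB ({s['cache_pct']:.1f}% of RAM)")
    print(f"free:           {s['free_pct']:.2f}% of RAM")
    print(f"swap in use:    {s['swap_used_kb']} kB")
```

With these numbers, buffers plus cache come to 416436 kB, about 81% of RAM, while under 1% is free and only 5500 kB of swap is in use.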