* [parisc-linux] stalling system clues + parisc WCHAN hack
@ 2002-05-21 14:35 Paul Bame
2002-05-22 0:40 ` Paul Bame
2002-05-22 9:28 ` Joel Soete
0 siblings, 2 replies; 3+ messages in thread
From: Paul Bame @ 2002-05-21 14:35 UTC
To: parisc-linux
I doubt I'm the only one who sees parisc systems become unusably slow,
apparently because any command needing disk I/O has to wait a long time.
This isn't the same symptom as the traditional Linux problem where one
fills the buffer cache (say, by running a large tar) and then the first
interactive command is slow due to paging. In the traditional problem,
the system fairly quickly recovers normalcy; in our case it never does,
though processes eventually finish. It's as if a timeout is releasing a
needed lock or something.
FYI the load for reproducing this stalling behavior is to run several
network-based (haven't tried local) 'cvs update' runs of the Linux kernel
tree, mixed with some diffs. The load is running on a 50+ GB partition, if
that matters, and I've seen problems with both ext2 and ext3.
It sounds like the disk is seeking in the pattern of a heartbeat, twice
a second. I think the front-panel has a heartbeat monitor with that rhythm.
So I did a quick, simple, ugly hack, mostly to arch-independent code, to
get WCHAN out of parisc (http://ftp.parisc-linux.org/patches/wchan.diff),
and ran a ps on a system which was stalling. The result is attached, as
is a copy of /proc/meminfo.
The interesting clue in the 'ps' output, to me, is the 'D' processes, which
I suspect are those that have called down_uninterruptable. The most frequent
WCHAN culprits are wait_on_buffer/page. Where to go next in solving this
problem (oh, with the least effort too, unfortunately)?
Linux b2000 2.4.18-pa25 #22 Fri May 17 11:04:28 MDT 2002 parisc unknown
  PID CMD              S WCHAN
    1 ini              S pipe_poll
    2 [keventd]        S context_thread
    3 [ksoftirqd_CPU0] S start_context_thread
    4 [kswapd]         S kswapd
    5 [bdflush]        S start_context_thread
    6 [kupdated]       S sync_supers
    9 [mdrecoveryd]    S md_thread
   10 [kjournald]      S wait_on_buffer
   62 [kjournald]      S wait_on_buffer
   98 /sbin/dhclient-2 S datagram_poll
  110 /sbin/portmap    S tcp_poll
  175 /sbin/syslogd    D wait_on_buffer
  178 /sbin/klogd      S syslog
  182 /sbin/rpc.statd  S tcp_poll
  190 /usr/sbin/inetd  S tcp_poll
  206 nmbd -a          S pipe_poll
  208 /usr/sbin/sshd   S tcp_poll
  213 /usr/bin/X11/xfs S unix_poll
  215 /usr/sbin/ntpd   S datagram_poll
  219 /usr/sbin/atd    S wait_on_buffer
  222 /usr/sbin/cron   S wait4
  238 -bash            S wait4
  783 /usr/sbin/apache S wait4
 2748 /usr/sbin/lpd    S tcp_poll
 4356 /usr/sbin/apache S wait_for_connect
 4357 /usr/sbin/apache S wait_for_connect
 4358 /usr/sbin/apache S wait_for_connect
 4359 /usr/sbin/apache S wait_for_connect
 4360 /usr/sbin/apache S wait_for_connect
 4361 /usr/sbin/apache S wait_for_connect
 4717 /usr/sbin/sshd   S normal_poll
 4718 -bash            S read_chan
 4794 /USR/SBIN/CRON   S pipe_wait
 4795 /usr/bin/perl -w S wait4
 4797 /usr/bin/ssh b20 S tcp_poll
 4799 /usr/sbin/sshd   S unix_poll
 4800 /usr/bin/perl -w S wait4
 4802 /usr/sbin/sendma S pipe_wait
 4824 /bin/sh -eux /pr S wait4
 5088 /USR/SBIN/CRON   S pipe_wait
 5089 /bin/sh -c cd ia S wait4
 5090 /bin/sh -uex ./b S wait4
 5092 /usr/sbin/sendma S pipe_wait
 5179 /bin/sh -uex ./b S wait4
 5180 diff -urN --excl D wait_on_page
 5209 /bin/sh -eux /pr S wait4
 5210 cvs -Qfz4 -d:pse D wait_on_page
 5291 /USR/SBIN/CRON   S pipe_wait
 5292 /bin/sh -c test  S wait4
 5293 run-parts --repo S pipe_poll
 5296 /bin/sh /etc/cro S wait4
 5297 /bin/sh /usr/bin S wait4
 5311 /bin/sh /usr/bin S wait4
 5312 sort -f          S pipe_wait
 5313 /usr/lib/locate/ S pipe_wait
 5314 /usr/bin/find /  D wait_on_buffer
 5367 /bin/sh ./daemon S wait4
 5368 setiathome -nice R wait_on_buffer
 5381 ps -eo pid,cmd,s R wait_on_buffer
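
For anyone who wants to repeat the WCHAN tally on a listing like the one above, here is a minimal Python sketch. It is not part of the wchan.diff patch. The function name tally_wchan and the shortened sample are made up for illustration, and the code assumes the column order PID, CMD, S, WCHAN shown in the listing.

```python
from collections import Counter


def tally_wchan(ps_text, state="D"):
    """Count WCHAN values for processes in the given state.

    Assumes the column order PID, CMD, S, WCHAN. CMD may contain
    spaces, so the state and WCHAN are read from the end of each line.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Need PID, at least one CMD word, S and WCHAN; this also
        # skips the header row, whose first field is not a number.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        proc_state, wchan = fields[-2], fields[-1]
        if proc_state == state:
            counts[wchan] += 1
    return counts


# A few rows copied from the listing, not the full output.
sample = """\
PID CMD S WCHAN
175 /sbin/syslogd D wait_on_buffer
219 /usr/sbin/atd S wait_on_buffer
5180 diff -urN --excl D wait_on_page
5210 cvs -Qfz4 -d:pse D wait_on_page
5314 /usr/bin/find / D wait_on_buffer
"""

for wchan, n in tally_wchan(sample).most_common():
    print(n, wchan)
```

The listing has four 'D' rows (PIDs 175, 5180, 5210 and 5314), split evenly between wait_on_buffer and wait_on_page.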
           total:       used:       free:     shared:    buffers:     cached:
Mem:    525357056   521830400     3526656           0    70672384   356921344
Swap:   511696896     5632000   506064896
MemTotal: 513044 kB
MemFree: 3444 kB
MemShared: 0 kB
Buffers: 69016 kB
Cached: 347420 kB
SwapCached: 1136 kB
Active: 112296 kB
Inactive: 329744 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 513044 kB
LowFree: 3444 kB
SwapTotal: 499704 kB
SwapFree: 494204 kB
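
To put the /proc/meminfo numbers above in proportion, the following Python sketch works out how much of MemTotal is held by buffers and page cache, and how much swap is in use. The helper name parse_meminfo_kb is hypothetical, and the sample holds only the fields the calculation needs, copied from the output above.

```python
def parse_meminfo_kb(text):
    """Return {field: kB} for 'Name: value kB' lines; skip anything else."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if (len(parts) == 3 and parts[0].endswith(":")
                and parts[1].isdigit() and parts[2] == "kB"):
            values[parts[0][:-1]] = int(parts[1])
    return values


# Subset of the meminfo output above.
sample = """\
MemTotal: 513044 kB
MemFree: 3444 kB
Buffers: 69016 kB
Cached: 347420 kB
SwapTotal: 499704 kB
SwapFree: 494204 kB
"""

info = parse_meminfo_kb(sample)
cache_kb = info["Buffers"] + info["Cached"]
swap_used_kb = info["SwapTotal"] - info["SwapFree"]

print("buffers+cached: %d kB (%.0f%% of MemTotal)"
      % (cache_kb, 100.0 * cache_kb / info["MemTotal"]))
print("swap used: %d kB (%d bytes)" % (swap_used_kb, swap_used_kb * 1024))
```

With these figures, buffers plus cache come to 416436 kB, roughly four fifths of MemTotal. Swap in use is 5500 kB, which matches the 5632000 bytes on the Swap: line.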
* Re: [parisc-linux] stalling system clues + parisc WCHAN hack
From: Paul Bame @ 2002-05-22 0:40 UTC
Cc: parisc-linux
False alarm.
I can only reproduce this on a B2000. A C3000 grunted through the
test load and did not get semi-permanently upset like the B2000. I
tried an A500 with a pa8600 CPU (the same as the B2000) and it was OK.
A B180 is fine too.
After a fresh boot, it took 20+ *minutes* to untar a kernel tree,
use dd to make three 32 MB files, and 'cp -a' the linux tree, with almost
all the time spent in the 'cp -a' (WCHAN 'wait_on_buffer'). All the
I/O is to a 63 GB ext3 partition on a decent SCSI disk. dmesg isn't
complaining (e.g., apparently no SCSI timeouts). There is no load on
the CPU other than the normal idle daemons. There's plenty of
free memory remaining. The CPU is normally 99% idle according to top.
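
A rough way to time the write-then-copy part of this test on another machine is sketched below in Python. It is not the procedure used above: the untar step is left out, write_files stands in for dd, shutil.copytree stands in for 'cp -a', and all function names are hypothetical.

```python
import os
import shutil
import tempfile
import time


def timed(label, func, *args):
    """Run func(*args), print and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    func(*args)
    elapsed = time.perf_counter() - start
    print("%-6s %.2f s" % (label, elapsed))
    return elapsed


def write_files(directory, count, size_bytes):
    """Write `count` zero-filled files of `size_bytes` each, then fsync."""
    chunk = b"\0" * min(size_bytes, 1024 * 1024)  # at most 1 MiB per write
    for i in range(count):
        path = os.path.join(directory, "file%d" % i)
        with open(path, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(len(chunk), remaining)
                f.write(chunk[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())


def run(workdir, count=3, size_bytes=32 * 1024 * 1024):
    """Time writing the files into workdir/src and copying them to workdir/copy."""
    src = os.path.join(workdir, "src")
    dst = os.path.join(workdir, "copy")
    os.mkdir(src)
    t_write = timed("write", write_files, src, count, size_bytes)
    # copytree copies file data and stat info; it is only an
    # approximation of 'cp -a', not an exact equivalent.
    t_copy = timed("copy", shutil.copytree, src, dst)
    return t_write, t_copy


if __name__ == "__main__":
    # Uses the default temp directory; pass dir=... to TemporaryDirectory
    # to place the files on the filesystem under test.
    with tempfile.TemporaryDirectory() as workdir:
        run(workdir)
```

The defaults mirror the three 32 MB files mentioned above. Comparing the "copy" time across machines is the interesting part, since that is where the B2000 spent nearly all of its time.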
Gaak
-P
* Re: [parisc-linux] stalling system clues + parisc WCHAN hack
From: Joel Soete @ 2002-05-22 9:28 UTC
To: Paul Bame; +Cc: parisc-linux
Hi Paul,
Paul Bame wrote:
...
>
> Linux b2000 2.4.18-pa25 #22 Fri May 17 11:04:28 MDT 2002 parisc unknown
>
> PID CMD S WCHAN
> 1 ini S pipe_poll
> 2 [keventd] S context_thread
...
> 5368 setiathome -nice R wait_on_buffer
> 5381 ps -eo pid,cmd,s R wait_on_buffer
>
>            total:       used:       free:     shared:    buffers:     cached:
> Mem:    525357056   521830400     3526656           0    70672384   356921344
> Swap:   511696896     5632000   506064896
> MemTotal: 513044 kB
> MemFree: 3444 kB
> MemShared: 0 kB
> Buffers: 69016 kB
> Cached: 347420 kB
> SwapCached: 1136 kB
> Active: 112296 kB
> Inactive: 329744 kB
> HighTotal: 0 kB
> HighFree: 0 kB
> LowTotal: 513044 kB
> LowFree: 3444 kB
> SwapTotal: 499704 kB
> SwapFree: 494204 kB
Another question:
Do you think it is normal that your system swaps (swap used: 5632000) with
512 MB of physical memory?
I also observe this on my B2000 with 256 MB (installed just last week):
I started to recompile the latest CVS kernel, and at the beginning top
showed 22 MB of memory used and 0 MB of swap used; at the end, more than
100 MB of swap was in use.
I do not understand this very well: did gcc not clean up its memory
correctly, is it a kernel problem in the management of swap space, or is
top showing wrong values?
Thanks in advance for info,
Joel