public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: "Robin H. Johnson" <robbat2@gentoo.org>
Cc: linux-kernel@vger.kernel.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>
Subject: Re: Idle loadavg of ~1, maybe MD related
Date: Sat, 5 Jan 2008 01:30:37 -0800	[thread overview]
Message-ID: <20080105013037.a354b33a.akpm@linux-foundation.org> (raw)
In-Reply-To: <20071230100614.GX17327@curie-int.orbis-terrarum.net>

On Sun, 30 Dec 2007 02:06:14 -0800 "Robin H. Johnson" <robbat2@gentoo.org> wrote:

> (Please CC me, I'm subbed to LKML).
> 
> My G5, while running practically nothing (just sshd and some to watch the
> load), has a weird cycle of load averages. I think it might be related to MD,
> simply because that's the only thing that is clocking up cputime.

Usually this is caused by a process which is stuck in "D" state
(uninterruptible sleep) when it shouldn't be.

Check this by running `ps aux' or whatever.

> A full cycle lasts approximately 27 minutes.
> Mins	Load
> 0-2		0.0-0.15 (stable, 0 level)
> 3-5		0.50, 0.80, 0.95 (fast increase)
> 6-21	0.95-1.10 (stable, 1 level)
> 22-24   0.9, 0.8, 0.1 (fast decrease, to 0 level)
> 25-27   0.2, 0.3, 0.15 (local maxima peak)
> 
> Here's a graph of it, spanning 230 minutes:
> http://dev.gentoo.org/~robbat2/20071230-g5-loadavg-bug.png
> 
> Processed data for the graph here:
> http://dev.gentoo.org/~robbat2/20071230-g5-loadavg-bug.txt
> 
> For the entire 230 minute period, there was _no_ disk I/O.
> Not recorded by iostat, nor generated.
> 
> # while true ; do uptime ; iostat -t 60 2  -N -d | tail -n15 ; done >/dev/shm/foo
> 
> Example of single output pass for the above loop:
>  00:59:37 up  8:32,  2 users,  load average: 0.02, 0.47, 0.66
> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> sda               0.00         0.00         0.00          0          0
> sdb               0.00         0.00         0.00          0          0
> md1               0.00         0.00         0.00          0          0
> md0               0.00         0.00         0.00          0          0
> md2               0.00         0.00         0.00          0          0
> md3               0.00         0.00         0.00          0          0
> vg-usr            0.00         0.00         0.00          0          0
> vg-var            0.00         0.00         0.00          0          0
> vg-tmp            0.00         0.00         0.00          0          0
> vg-opt            0.00         0.00         0.00          0          0
> vg-home           0.00         0.00         0.00          0          0
> vg-usr_src        0.00         0.00         0.00          0          0
> vg-usr_portage     0.00         0.00         0.00          0          0
> 
> This is basically 1842c7f2 from Linus's tree, my own stuff is config'd out with
> =n for the moment. And the problem does still occur in the main tree.
> 
> Snippet from the head of 'top', sorting by cputime.
> 
> top - 01:59:08 up  9:32,  2 users,  load average: 1.04, 0.87, 0.70
> Tasks:  74 total,   1 running,  73 sleeping,   0 stopped,   0 zombie
> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  :  0.0%us,  0.5%sy,  0.0%ni, 99.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  :  0.0%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Mem:  12074480k total,   292520k used, 11781960k free,    76812k buffers
> Swap:  8388536k total,        0k used,  8388536k free,   144276k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4635 root      15  -5     0    0    0 S    0  0.0   8:14.33 md3_raid1
>  3121 root      15  -5     0    0    0 S    1  0.0   3:15.12 md2_raid1
>  3098 root      15  -5     0    0    0 S    0  0.0   3:07.83 md1_raid1
>  3076 root      15  -5     0    0    0 S    0  0.0   0:45.82 md0_raid1
>   829 root      15  -5     0    0    0 D    0  0.0   0:01.85 kwindfarm
>    13 root      15  -5     0    0    0 S    0  0.0   0:01.41 ksoftirqd/3
>    18 root      15  -5     0    0    0 S    0  0.0   0:01.39 events/3
>    10 root      15  -5     0    0    0 S    0  0.0   0:00.94 ksoftirqd/2
>     1 root      20   0  1900  652  576 S    0  0.0   0:00.89 init
> 32086 root      20   0  9336 2696 2076 S    0  0.0   0:00.87 sshd

>From that I'd suspect that kwindfarm is being a bad citizen.

If a process is consistently stuck in D state, run

	echo w > /proc/sysrq-trigger

then record the resulting dmesg output so we can see where it got stuck.

  reply	other threads:[~2008-01-05  9:30 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-30 10:06 Idle loadavg of ~1, maybe MD related Robin H. Johnson
2008-01-05  9:30 ` Andrew Morton [this message]
2008-01-05 10:04   ` Robin H. Johnson
2008-01-05 10:18     ` Benjamin Herrenschmidt
2008-01-06  3:05       ` Andrew Morton
2008-01-06  3:53         ` Benjamin Herrenschmidt
2008-01-06 11:21         ` Paul Mackerras
2008-01-06 11:29           ` Robin H. Johnson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080105013037.a354b33a.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=benh@kernel.crashing.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=robbat2@gentoo.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox