From: NeilBrown <neilb@suse.de>
To: Larkin Lowrey <llowrey@nuclearwinter.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid5 device hangs in active state
Date: Wed, 29 Feb 2012 06:52:06 +1100
Message-ID: <20120229065206.60d1e2ea@notabene.brown>
In-Reply-To: <4F4D1B33.3010308@nuclearwinter.com>
On Tue, 28 Feb 2012 12:21:39 -0600 Larkin Lowrey <llowrey@nuclearwinter.com>
wrote:
> I did another sysrq dump and have attached the output.
Thanks. Unfortunately it contains nothing of value: too much of the output
has been lost from the log buffer. It seems that 'Show State' produces a lot
more noise than it used to.
You will need to boot with
log_buf_len=4M
or something like that.
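Here is a minimal sketch of capturing the 'Show State' dump. It assumes a POSIX shell run as root. The CMDLINE override, the function names and the output file name sysrq-t.txt are illustrative choices and do not come from this thread. The script warns if log_buf_len= is missing from the kernel command line, because the dump would then probably be truncated again.

```shell
#!/bin/sh
# Sketch: check for log_buf_len= on the kernel command line, then trigger
# a SysRq 't' (show task state) dump and save the kernel log to a file.
# Assumptions: CMDLINE, the function names and the default output file
# name are hypothetical; writing /proc/sysrq-trigger requires root.

CMDLINE=${CMDLINE:-/proc/cmdline}

# Print the value of log_buf_len= from the command line (empty if unset).
log_buf_len_setting() {
    tr ' ' '\n' < "$CMDLINE" | sed -n 's/^log_buf_len=//p'
}

# Trigger the task-state dump and write the whole kernel log to a file.
capture_task_dump() {
    out=${1:-sysrq-t.txt}
    if [ -z "$(log_buf_len_setting)" ]; then
        echo "warning: no log_buf_len= on the kernel command line;" \
             "the dump may be truncated" >&2
    fi
    echo t > /proc/sysrq-trigger
    dmesg > "$out"
}
```

On a GRUB2 system such as Fedora 16, you would typically add the parameter to GRUB_CMDLINE_LINUX in /etc/default/grub, run 'grub2-mkconfig -o /boot/grub2/grub.cfg', and reboot. Check your distribution's documentation for the exact paths.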
>
> Again, 'iostat -dx 1' showed 100% utilization on the LVM which uses
> /dev/md0 as a pv and /sys/block/md0/md/stripe_cache_active was 29 and
> that value did not change. There were no error messages in
> /var/log/messages or 'dmesg'.
The '29' could simply mean that md/raid5 has sent 29 requests down to lower
levels which have not yet completed.
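To test the idea that the 29 active stripes are just waiting on the member devices, you could sample stripe_cache_active alongside each member's in-flight counts. This rough sketch assumes the kernel exposes an 'inflight' file (requests in flight, as reads and writes) for each entry under /sys/block/md0/slaves. The SYSFS and MD variables and the function names are hypothetical knobs for illustration.

```shell
#!/bin/sh
# Sketch: print md0's stripe_cache_active next to each member device's
# in-flight read/write counts, once per second.
# Assumptions: SYSFS, MD and the function names are hypothetical; each
# member under slaves/ is assumed to provide an 'inflight' attribute.

SYSFS=${SYSFS:-/sys}
MD=${MD:-md0}

# One line of output: "<active stripes> <member>=<reads>/<writes> ..."
md_snapshot() {
    line=$(cat "$SYSFS/block/$MD/md/stripe_cache_active")
    for dev in "$SYSFS/block/$MD/slaves"/*; do
        [ -r "$dev/inflight" ] || continue
        # inflight holds two numbers: reads in flight, writes in flight
        set -- $(cat "$dev/inflight")
        line="$line $(basename "$dev")=$1/$2"
    done
    echo "$line"
}

# Sample forever; interrupt with Ctrl-C.
watch_md() {
    while :; do
        md_snapshot
        sleep 1
    done
}
```

Suppose a member keeps a non-zero in-flight count that never drains while iostat shows the disks idle. That would suggest the requests are stuck below md, at the device or its controller.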
>
> My suspicions lie with md0 since the stripe_cache_active value remains
> at a fixed non-zero value even though all disks are (or appear to be)
> idle. Should I be looking elsewhere? This hardware did not exhibit this
> problem before "upgrading" from Fedora 15 to Fedora 16.
My guess is a problem with one of the drive controllers. Your monthly 'sync'
puts a much heavier load on them than normal IO does: it consistently sends a
burst of requests to all devices at exactly the same time. This could trigger
race conditions that normal IO does not.
But that is just a guess. Unfortunately it is very hard to track exactly
what is going wrong in this sort of case.
I'd suggest shuffling the devices so they are on different controllers, or
perhaps replacing a controller. Then see whether the problem moves with the
devices or stays with one particular controller.
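Before shuffling devices, it helps to record which controller each member sits on, so you can compare layouts afterwards. This sketch assumes the members are listed under /sys/block/md0/slaves. It also assumes that the last PCI address in each member's resolved sysfs path belongs to its controller. That is typical for SATA/SAS host adapters but not guaranteed. SYSFS, MD and the function names are hypothetical, and 'readlink -f' is the GNU coreutils option.

```shell
#!/bin/sh
# Sketch: list each md member with the PCI address of the controller it
# appears to be attached to, taken from its resolved sysfs path.
# Assumptions: SYSFS, MD and the function names are hypothetical; the
# last PCI address (dddd:bb:dd.f) in the path is taken as the controller.

SYSFS=${SYSFS:-/sys}
MD=${MD:-md0}

# Print the last full PCI address found in a sysfs device path.
controller_of() {
    echo "$1" | tr '/' '\n' \
        | grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]$' \
        | tail -n 1
}

# Print "<member> <controller PCI address>" for every member of $MD.
md_controllers() {
    for dev in "$SYSFS/block/$MD/slaves"/*; do
        [ -e "$dev" ] || continue
        path=$(readlink -f "$dev")
        echo "$(basename "$dev") $(controller_of "$path")"
    done
}
```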
NeilBrown
>
> Thank you,
>
> --Larkin
>
> On 1/8/2012 6:26 PM, NeilBrown wrote:
> > On Sun, 08 Jan 2012 16:03:10 -0600 Larkin Lowrey <llowrey@nuclearwinter.com>
> > wrote:
> >
> >> Suggestions?
> >
> > # echo t > /proc/sysrq-trigger
> >
> > and capture the messages that go to 'dmesg'. Post them.
> >
> > Hopefully your message ring buffer is big enough to collect the entire
> > output. If it isn't, you might need to boot with
> > log_buf_len=1M
> > or similar.
> >
> > That should show what process is blocking on what.
> >
> > NeilBrown
>