From: NeilBrown <neilb@suse.de>
To: tbayly@bluehost.com
Cc: linux-raid@vger.kernel.org
Subject: Re: BUG - raid 1 deadlock on handle_read_error / wait_barrier
Date: Mon, 25 Feb 2013 09:43:50 +1100
Message-ID: <20130225094350.4b8ef084@notabene.brown>
In-Reply-To: <1361487504.4863.54.camel@linux-lxtg.site>
On Thu, 21 Feb 2013 15:58:24 -0700 Tregaron Bayly <tbayly@bluehost.com> wrote:
> Symptom:
> A RAID 1 array ends up with two threads (flush and raid1) stuck in D
> state forever. The array is inaccessible and the host must be restarted
> to restore access to the array.
>
> I have some scripted workloads that reproduce this within a maximum of a
> couple hours on kernels from 3.6.11 - 3.8-rc7. I cannot reproduce on
> 3.4.32. 3.5.7 ends up with three threads stuck in D state, but the
> stacks are different from this bug (as it's EOL maybe of interest in
> bisecting the problem?).
Can you post the 3 stacks from the 3.5.7 case? It might help get a more
complete understanding.
...
> Both processes end up in wait_event_lock_irq() waiting for favorable
> conditions in the struct r1conf to proceed. These conditions obviously
> seem to never arrive. I placed printk statements in freeze_array() and
> wait_barrier() directly before calling their respective
> wait_event_lock_irq() and this is an example output:
>
> Feb 20 17:47:35 sanclient kernel: [4946b55d-bb0a-7fce-54c8-ac90615dabc1] Attempting to freeze array: barrier (1), nr_waiting (1), nr_pending (5), nr_queued (3)
> Feb 20 17:47:35 sanclient kernel: [4946b55d-bb0a-7fce-54c8-ac90615dabc1] Awaiting barrier: barrier (1), nr_waiting (2), nr_pending (5), nr_queued (3)
> Feb 20 17:47:38 sanclient kernel: [4946b55d-bb0a-7fce-54c8-ac90615dabc1] Awaiting barrier: barrier (1), nr_waiting (3), nr_pending (5), nr_queued (3)
This is very useful, thanks. Clearly there is one 'pending' request that
isn't being counted, but also isn't being allowed to complete.
Maybe it is in pending_bio_list, and so counted in conf->pending_count.
Could you print out that value as well and try to trigger the bug again? If
conf->pending_count is non-zero, then it seems very likely that we have found
the problem.
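As a rough illustration of the extra debug output being asked for, here is a
user-space sketch, not kernel code. The struct r1conf_snapshot type and both
function names are hypothetical; only the counter names come from this thread.
It also assumes that freeze_array() in these kernels waits for
nr_pending == nr_queued + 1, which is consistent with the "one pending request
that isn't being counted" reading of the log but is not stated in the report.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space copy of the r1conf counters named in this thread. */
struct r1conf_snapshot {
    int barrier;
    int nr_waiting;
    int nr_pending;
    int nr_queued;
    int pending_count;
};

/*
 * Assumed freeze_array() wait condition: nr_pending == nr_queued + 1.
 * Returns how many pending requests are neither queued nor the one
 * currently being handled; 0 means the freeze could proceed.
 */
int unaccounted_requests(const struct r1conf_snapshot *s)
{
    assert(s != NULL);
    return s->nr_pending - (s->nr_queued + 1);
}

/* Format one debug line in the style of the posted log, plus pending_count. */
int format_state(char *buf, size_t len, const char *what,
                 const struct r1conf_snapshot *s)
{
    assert(buf != NULL && s != NULL);
    return snprintf(buf, len,
                    "%s: barrier (%d), nr_waiting (%d), nr_pending (%d), "
                    "nr_queued (%d), pending_count (%d)",
                    what, s->barrier, s->nr_waiting, s->nr_pending,
                    s->nr_queued, s->pending_count);
}
```

With the values from the first logged line (nr_pending 5, nr_queued 3),
unaccounted_requests() returns 1. In the kernel itself the equivalent change
would simply be adding conf->pending_count to the existing printk statements.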
Fixing it isn't quite so easy. 'nr_pending' counts requests from the
filesystem that are still pending. 'pending_count' counts requests down to
the underlying devices that are still pending. There isn't a 1-to-1
correspondence, so we cannot just subtract one from the other. It will
require more thought than that.
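To make the mismatch concrete, a toy user-space model may help. Everything in
it is an assumption made for illustration: struct toy_conf and the toy_submit_*
helpers are hypothetical, and the model supposes that a write is counted once
in nr_pending but queued once per mirror, while a read is counted once and
never enters pending_bio_list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy counters; names follow the thread, the model itself is an assumption. */
struct toy_conf {
    int mirrors;        /* devices in the array */
    int nr_pending;     /* requests from the filesystem still pending */
    int pending_count;  /* bios waiting in pending_bio_list */
};

/* A write is counted once, then queued once per mirror. */
void toy_submit_write(struct toy_conf *c)
{
    assert(c != NULL && c->mirrors > 0);
    c->nr_pending += 1;
    c->pending_count += c->mirrors;
}

/* A read is counted once and, in this model, never enters the list. */
void toy_submit_read(struct toy_conf *c)
{
    assert(c != NULL);
    c->nr_pending += 1;
}
```

On a two-mirror array, two writes and one read leave nr_pending at 3 and
pending_count at 4 in this model, so subtracting one from the other says
nothing useful about how many requests are stuck.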
Thanks for the thorough report,
NeilBrown