From: NeilBrown <neilb@suse.com>
To: Jinpu Wang <jinpu.wang@profitbricks.com>, Coly Li <colyli@suse.de>
Cc: linux-raid@vger.kernel.org, Shaohua Li <shli@fb.com>,
Nate Dailey <nate.dailey@stratus.com>
Subject: Re: [BUG] MD/RAID1 hung forever on freeze_array
Date: Thu, 08 Dec 2016 14:17:18 +1100
Message-ID: <871sxj2jpd.fsf@notabene.neil.brown.name>
In-Reply-To: <CAMGffE=T15eLaROLCDGBA_OxQgUZbo22LQZJuSji=Z=rZRGr6Q@mail.gmail.com>
On Thu, Dec 08 2016, Jinpu Wang wrote:
> On Tue, Nov 29, 2016 at 12:15 PM, Jinpu Wang
> <jinpu.wang@profitbricks.com> wrote:
>> On Mon, Nov 28, 2016 at 10:10 AM, Coly Li <colyli@suse.de> wrote:
>>>> On 2016/11/28 5:02 PM, Jinpu Wang wrote:
>>>> On Mon, Nov 28, 2016 at 9:54 AM, Coly Li <colyli@suse.de> wrote:
>>>>>> On 2016/11/28 4:24 PM, Jinpu Wang wrote:
>>>>>> snip
>>>>>>>>>
>>>>>>>>> every time nr_pending is 1 bigger than (nr_queued + 1), so it seems we
>>>>>>>>> forgot to increase nr_queued somewhere?
>>>>>>>>>
>>>>>>>>> I've noticed commit ccfc7bf1f09d61 ("raid1: include bio_end_io_list in
>>>>>>>>> nr_queued to prevent freeze_array hang"). It seems to have fixed a similar bug.
>>>>>>>>>
>>>>>>>>> Could you give your suggestion?
>>>>>>>>>
>>>>>>>> Sorry, forgot to mention kernel version is 4.4.28
>
> I continued debugging the bug:
>
> 20161207
> nr_pending = 948,
> nr_waiting = 9,
> nr_queued = 946, // again we need one more to finish wait_event.
> barrier = 0,
> array_frozen = 1,
> on conf->bio_end_io_list we have 91 entries.
> on conf->retry_list we have 855 entries.
This is useful. It confirms that nr_queued is correct, and that
nr_pending is consistently 1 higher than expected.
This suggests that a request has been counted in nr_pending, but hasn't
yet been submitted, or has been taken off one of the queues but has not
yet been processed.
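For reference, the wait that raid1d is stuck in only completes when the two
counters line up exactly.  Roughly (paraphrasing 4.4's drivers/md/raid1.c
from memory, so treat this as a sketch rather than the exact source):

    static void freeze_array(struct r1conf *conf, int extra)
    {
            /* Wait until every counted request has either completed or
             * been parked on retry_list/bio_end_io_list (i.e. nr_queued). */
            spin_lock_irq(&conf->resync_lock);
            conf->array_frozen = 1;
            wait_event_lock_irq_cmd(conf->wait_barrier,
                                    conf->nr_pending == conf->nr_queued + extra,
                                    conf->resync_lock,
                                    flush_pending_writes(conf));
            spin_unlock_irq(&conf->resync_lock);
    }

With the numbers above (948 vs 946 + extra) that condition can never become
true, which is exactly the "one more" you keep seeing.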
I notice that in your first email the Blocked tasks listed included
raid1d, which is blocked in freeze_array(), and a few others in
make_request() blocked on wait_barrier().
In that case nr_waiting was 100, so there should have been 100 threads
blocked in wait_barrier(). Is that correct? I assume you thought it
was pointless to list them all, which seems reasonable.
I ask because I wonder whether there might have been one thread in
make_request() that was blocked on something else. There are a couple
of places where make_request() will wait after having successfully called
wait_barrier(). If that happened, it would cause exactly the symptoms
you report. Could you check all the blocked threads carefully, please?
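If you still have a dump loaded in crash, something like the following
should print a backtrace for every task in uninterruptible sleep, which
makes it easy to spot one parked somewhere other than wait_barrier()
(assuming your crash version accepts the task-state filter):

    crash> foreach UN bt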
There are other ways that nr_pending and nr_queued can get out of sync,
though I think they would result in nr_pending being less than
nr_queued, not more.
If the presence of a bad block in the bad block log causes a request to
be split into two r1bios, and if both of those end up on one of the
queues, then they would be added to nr_queued twice, but to nr_pending
only once. We should fix that.
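To illustrate the asymmetry I mean (again paraphrased from memory of the
4.4 code, not a patch): nr_pending is bumped once per bio entering
make_request(), via wait_barrier(), while nr_queued is bumped once per
r1bio that gets parked for raid1d:

    /* make_request(): one nr_pending increment per incoming bio */
    wait_barrier(conf, bio);    /* ends by incrementing conf->nr_pending */

    /* reschedule_retry(): one nr_queued increment per parked r1bio */
    spin_lock_irqsave(&conf->device_lock, flags);
    list_add(&r1_bio->retry_list, &conf->retry_list);
    conf->nr_queued++;
    spin_unlock_irqrestore(&conf->device_lock, flags);

So if one bio ever turns into two parked r1bios without a second
nr_pending increment, the two counters drift apart.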
>
> list -H 0xffff8800b96acac0 r1bio.retry_list -s r1bio
>
> ffff8800b9791ff8
> struct r1bio {
> remaining = {
> counter = 0
> },
> behind_remaining = {
> counter = 0
> },
> sector = 18446612141670676480, // corrupted?
> start_next_window = 18446612141565972992, //ditto
I don't think this is corruption - printed in hex those values look like
kernel addresses (0xffff88...), which fits with what I explain below.
> crash> struct r1conf 0xffff8800b9792000
> struct r1conf {
....
> retry_list = {
> next = 0xffff8800afe690c0,
> prev = 0xffff8800b96acac0
> },
The pointer you started at was at the end of the list.
So this r1bio structure you are seeing is not an r1bio at all, but memory
from the middle of the r1conf being interpreted as an r1bio.
You can confirm this by noticing that retry_list in the r1bio:
> retry_list = {
> next = 0xffff8800afe690c0,
> prev = 0xffff8800b96acac0
> },
is exactly the same as the retry_list in the r1conf.
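If you want to re-walk the list, starting from the real list head should
give you sensible r1bios.  Something along these lines - the exact syntax
may need adjusting for your crash version, so treat it as a hypothetical
invocation:

    crash> struct r1conf.retry_list 0xffff8800b9792000 -o
    (note the virtual address of the retry_list member printed above)
    crash> list -H <that address> r1bio.retry_list -s r1bio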
NeilBrown