From: NeilBrown <neilb@suse.com>
To: Michael Wang <yun.wang@profitbricks.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
linux-block@vger.kernel.org, linux-raid@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>, Shaohua Li <shli@kernel.org>,
Jinpu Wang <jinpu.wang@profitbricks.com>
Subject: Re: [RFC PATCH] blk: reset 'bi_next' when bio is done inside request
Date: Tue, 04 Apr 2017 19:37:17 +1000
Message-ID: <871st8jyya.fsf@notabene.neil.brown.name>
In-Reply-To: <9be3ca00-d802-bf64-bcdc-1e76608147f0@profitbricks.com>
On Tue, Apr 04 2017, Michael Wang wrote:
> Hi, Neil
>
> On 04/03/2017 11:25 PM, NeilBrown wrote:
>> On Mon, Apr 03 2017, Michael Wang wrote:
>>
>>> blk_attempt_plug_merge() tries to merge a bio into a request and chains
>>> them through 'bi_next', but after the bio has completed inside the
>>> request, we forget to reset 'bi_next'.
>>>
>>> This leads to a BUG when all the underlying devices are removed from an
>>> md-raid1 array. A bio that once went through:
>>>
>>> md_do_sync()
>>> sync_request()
>>> generic_make_request()
>>
>> This is a read request from the "first" device.
>>
>>> blk_queue_bio()
>>> blk_attempt_plug_merge()
>>> CHAINED HERE
>>>
>>> stays chained and is later reused by:
>>>
>>> raid1d()
>>> sync_request_write()
>>> generic_make_request()
>>
>> This is a write request to some other device, isn't it?
>>
>> If sync_request_write() is using a bio that has already been used, it
>> should call bio_reset() and fill in the details again.
>> However I don't see how that would happen.
>> Can you give specific details on the situation that triggers the bug?
>
> On the storage side we map LVs through SCST to the server; on the server
> side we assemble them into multipath devices, and then assemble these dm
> devices into two raid1 arrays.
>
> The test first runs mkfs.ext4 on the raid1 and then starts fio on it. On
> the storage side we unmap all the LVs (this can happen during mkfs or
> during fio), and then on the server side we hit the BUG (reproducibly).

So I assume the initial resync is still happening at this point?
And you unmap *all* the lv's so you expect IO to fail?
I can see that the code would behave strangely if you have a
bad-block-list configured (which is the default).
Do you have a bbl? If you create the array without the bbl, does it
still crash?
>
> We confirmed the bio's path by adding tracing: it is reused in
> sync_request_write() with 'bi_next' still set from when it was chained
> inside blk_attempt_plug_merge().

I still don't see why it is re-used.
I assume you didn't explicitly ask for a check/repair (i.e. didn't write
to .../md/sync_action at all?). In that case MD_RECOVERY_REQUESTED is
not set.
So sync_request() sends only one bio to generic_make_request():
	r1_bio->bios[r1_bio->read_disk];
then sync_request_write() *doesn't* send that bio again, but does send
all the others.
So where does it reuse a bio?
>
> We also tried resetting bi_next inside sync_request_write() before
> generic_make_request(), which also works.
>
> The testing was done on 4.4, but we found that upstream also leaves
> bi_next chained after the bio completes inside the request, so we posted
> this RFC.
>
> Regarding raid1, we haven't found the place on the path where the bio is
> reset... where is it supposed to be?

I'm not sure what you mean.
We only reset bios when they are being reused.
One place is in process_checks() where bio_reset() is called before
filling in all the details.
Maybe, in sync_request_write(), before
	wbio->bi_rw = WRITE;
add something like
	if (wbio->bi_next)
		printk("bi_next != NULL i=%d read_disk=%d bi_end_io=%pf\n",
		       i, r1_bio->read_disk, wbio->bi_end_io);
that might help narrow down what is happening.
NeilBrown