From: 王金浦 <jinpuwang@gmail.com>
To: Mike Snitzer <snitzer@redhat.com>
Cc: NeilBrown <neilb@suse.com>, Mikulas Patocka <mpatocka@redhat.com>,
Jack Wang <jack.wang.usish@gmail.com>,
Lars Ellenberg <lars.ellenberg@linbit.com>,
Jens Axboe <axboe@kernel.dk>,
linux-raid <linux-raid@vger.kernel.org>,
Michael Wang <yun.wang@profitbricks.com>,
Peter Zijlstra <peterz@infradead.org>,
Jiri Kosina <jkosina@suse.cz>, Ming Lei <ming.lei@canonical.com>,
LKML <linux-kernel@vger.kernel.org>,
Zheng Liu <gnehzuil.liu@gmail.com>,
linux-block@vger.kernel.org, Takashi Iwai <tiwai@suse.de>,
"linux-bcache@vger.kernel.org" <linux-bcache@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>, Alasdair Kergon <agk@redhat.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Keith Busch <keith.busch@intel.com>,
device-mapper development <dm-devel@redhat.>
Subject: Re: [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion
Date: Thu, 5 Jan 2017 11:54:00 +0100 [thread overview]
Message-ID: <CAD9gYJ+U7drRjswfOf310v2aSgUKAnYT4STJcfxwqPhBzXEccg@mail.gmail.com> (raw)
In-Reply-To: <20170104185046.GA982@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 4392 bytes --]
2017-01-04 19:50 GMT+01:00 Mike Snitzer <snitzer@redhat.com>:
> On Wed, Jan 04 2017 at 12:12am -0500,
> NeilBrown <neilb@suse.com> wrote:
>
>> On Tue, Jan 03 2017, Jack Wang wrote:
>>
>> > 2016-12-23 12:45 GMT+01:00 Lars Ellenberg <lars.ellenberg@linbit.com>:
>> >> On Fri, Dec 23, 2016 at 09:49:53AM +0100, Michael Wang wrote:
>> >>> Dear Maintainers
>> >>>
>> >>> I'd like to ask for the status of this patch since we hit the
>> >>> issue too during our testing on md raid1.
>> >>>
>> >>> Split remainder bio_A was queued ahead, following by bio_B for
>> >>> lower device, at this moment raid start freezing, the loop take
>> >>> out bio_A firstly and deliver it, which will hung since raid is
>> >>> freezing, while the freezing never end since it waiting for
>> >>> bio_B to finish, and bio_B is still on the queue, waiting for
>> >>> bio_A to finish...
>> >>>
>> >>> We're looking for a good solution and we found this patch
>> >>> already progressed a lot, but we can't find it on linux-next,
>> >>> so we'd like to ask are we still planning to have this fix
>> >>> in upstream?
>> >>
>> >> I don't see why not, I'd even like to have it in older kernels,
>> >> but did not have the time and energy to push it.
>> >>
>> >> Thanks for the bump.
>> >>
>> >> Lars
>> >>
>> > Hi folks,
>> >
>> > As Michael mentioned, we hit a bug this patch is trying to fix.
>> > Neil suggested another way to fix it. I attached below.
>> > I personal prefer Neil's version as it's less code change, and straight forward.
>> >
>> > Could you share your comments, we can get one fix into mainline.
>> >
>> > Thanks,
>> > Jinpu
>> > From 69a4829a55503e496ce9c730d2c8e3dd8a08874a Mon Sep 17 00:00:00 2001
>> > From: NeilBrown <neilb@suse.com>
>> > Date: Wed, 14 Dec 2016 16:55:52 +0100
>> > Subject: [PATCH] block: fix deadlock between freeze_array() and wait_barrier()
>> >
>> > When we call wait_barrier, we might have some bios waiting
>> > in current->bio_list, which prevents the array_freeze call to
>> > complete. Those can only be internal READs, which have already
>> > passed the wait_barrier call (thus incrementing nr_pending), but
>> > still were not submitted to the lower level, due to generic_make_request
>> > logic to avoid recursive calls. In such case, we have a deadlock:
>> > - array_frozen is already set to 1, so wait_barrier unconditionally waits, so
>> > - internal READ bios will not be submitted, thus freeze_array will
>> > never completes.
>> >
>> > To fix this, modify generic_make_request to always sort bio_list_on_stack
>> > first with lowest level, then higher, until same level.
>> >
>> > Sent to linux-raid mail list:
>> > https://marc.info/?l=linux-raid&m=148232453107685&w=2
>> >
>>
>> This should probably also have
>>
>> Inspired-by: Lars Ellenberg <lars.ellenberg@linbit.com>
>>
>> or something that, as I was building on Lars' ideas when I wrote this.
>>
>> It would also be worth noting in the description that this addresses
>> issues with dm and drbd as well as md.
>
> I never saw this patch but certainly like the relative simplicity of the
> solution when compared with other approaches taken, e.g. (5 topmost
> commits on this branch):
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip
>
>> In fact, I think that with this patch in place, much of the need for the
>> rescue_workqueue won't exist any more. I cannot promise it can be
>> removed completely, but it should be to hard to make it optional and
>> only enabled for those few block devices that will still need it.
>> The rescuer should only be needed for a bioset which can be allocated
>> From twice in the one call the ->make_request_fn. This would include
>> raid0 for example, though raid0_make_reqest could be re-written to not
>> use a loop and to just call generic_make_request(bio) if bio != split.
>
> Mikulas, would you be willing to try the below patch with the
> dm-snapshot deadlock scenario and report back on whether it fixes that?
>
> Patch below looks to be the same as here:
> https://marc.info/?l=linux-raid&m=148232453107685&q=p3
>
> Neil and/or others if that isn't the patch that should be tested please
> provide a pointer to the latest.
>
> Thanks,
> Mike
Thanks Mike,
I've rebased the patch on to Linux-4.10-rc2, and updated the
description as Neil suggested.
If Mikulas get possitive feedback, then we can go with it.
Cheers,
Jinpu
[-- Attachment #2: 0001-block-fix-deadlock-between-freeze_array-and-wait_bar.patch --]
[-- Type: text/x-patch, Size: 2467 bytes --]
From 4ffaefb719c129ed51f9fcb235b945caf56de8d1 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.com>
Date: Wed, 14 Dec 2016 16:55:52 +0100
Subject: [PATCH] block: fix deadlock between freeze_array() and wait_barrier()
When we call wait_barrier, we might have some bios waiting
in current->bio_list, which prevents the array_freeze call to
complete. Those can only be internal READs, which have already
passed the wait_barrier call (thus incrementing nr_pending), but
still were not submitted to the lower level, due to generic_make_request
logic to avoid recursive calls. In such case, we have a deadlock:
- array_frozen is already set to 1, so wait_barrier unconditionally waits, so
- internal READ bios will not be submitted, thus freeze_array will
never completes.
To fix this, modify generic_make_request to always sort bio_list_on_stack
first with lowest level, then higher, until same level.
This would address issuses with dm and drbd as well as md.
Sent to linux-raid mail list:
https://marc.info/?l=linux-raid&m=148232453107685&w=2
Inspired-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
block/blk-core.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/block/blk-core.c b/block/blk-core.c
index 61ba08c..2f74129 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2019,9 +2019,30 @@ blk_qc_t generic_make_request(struct bio *bio)
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
if (likely(blk_queue_enter(q, false) == 0)) {
+ struct bio_list lower, same, hold;
+
+ /* Create a fresh bio_list for all subordinate requests */
+ bio_list_init(&hold);
+ bio_list_merge(&hold, &bio_list_on_stack);
+ bio_list_init(&bio_list_on_stack);
+
ret = q->make_request_fn(q, bio);
blk_queue_exit(q);
+ /* sort new bios into those for a lower level
+ * and those for the same level
+ */
+ bio_list_init(&lower);
+ bio_list_init(&same);
+ while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
+ if (q == bdev_get_queue(bio->bi_bdev))
+ bio_list_add(&same, bio);
+ else
+ bio_list_add(&lower, bio);
+ /* now assemble so we handle the lowest level first */
+ bio_list_merge(&bio_list_on_stack, &lower);
+ bio_list_merge(&bio_list_on_stack, &same);
+ bio_list_merge(&bio_list_on_stack, &hold);
bio = bio_list_pop(current->bio_list);
} else {
--
2.7.4
next prev parent reply other threads:[~2017-01-05 10:54 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-08 15:04 [PATCH 0/1] block: fix blk_queue_split() resource exhaustion Lars Ellenberg
2016-07-08 15:04 ` [PATCH 1/1] " Lars Ellenberg
2016-07-08 18:49 ` Mike Snitzer
2016-07-11 14:13 ` Lars Ellenberg
2016-07-11 14:10 ` [PATCH v2 " Lars Ellenberg
2016-07-12 2:55 ` [dm-devel] " NeilBrown
2016-07-13 2:18 ` Eric Wheeler
2016-07-13 2:32 ` Mike Snitzer
2016-07-19 9:00 ` Lars Ellenberg
2016-07-21 22:53 ` Eric Wheeler
2016-07-25 20:39 ` Jeff Moyer
2016-08-11 4:16 ` Eric Wheeler
2017-01-07 19:56 ` Lars Ellenberg
2016-09-16 20:14 ` [dm-devel] " Matthias Ferdinand
2016-09-18 23:10 ` Eric Wheeler
2016-09-19 20:43 ` Matthias Ferdinand
2016-09-21 21:08 ` bcache: discard BUG (was: [dm-devel] [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion) Eric Wheeler
2016-12-23 8:49 ` [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion Michael Wang
2016-12-23 11:45 ` Lars Ellenberg
2017-01-02 14:33 ` [dm-devel] " Jack Wang
2017-01-04 5:12 ` NeilBrown
2017-01-04 18:50 ` Mike Snitzer
2017-01-05 10:54 ` 王金浦 [this message]
2017-01-06 16:50 ` Mikulas Patocka
2017-01-06 17:34 ` Mikulas Patocka
2017-01-06 19:52 ` Mike Snitzer
2017-01-06 23:01 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAD9gYJ+U7drRjswfOf310v2aSgUKAnYT4STJcfxwqPhBzXEccg@mail.gmail.com \
--to=jinpuwang@gmail.com \
--cc=agk@redhat.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat. \
--cc=gnehzuil.liu@gmail.com \
--cc=jack.wang.usish@gmail.com \
--cc=jkosina@suse.cz \
--cc=keith.busch@intel.com \
--cc=lars.ellenberg@linbit.com \
--cc=linux-bcache@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ming.lei@canonical.com \
--cc=mingo@redhat.com \
--cc=mpatocka@redhat.com \
--cc=neilb@suse.com \
--cc=peterz@infradead.org \
--cc=snitzer@redhat.com \
--cc=tiwai@suse.de \
--cc=yun.wang@profitbricks.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).