From: Mike Snitzer <snitzer@redhat.com>
To: Mikulas Patocka <mpatocka@redhat.com>
Cc: NeilBrown <neilb@suse.com>, Jack Wang <jack.wang.usish@gmail.com>,
Lars Ellenberg <lars.ellenberg@linbit.com>,
Jens Axboe <axboe@kernel.dk>,
linux-raid <linux-raid@vger.kernel.org>,
Michael Wang <yun.wang@profitbricks.com>,
Peter Zijlstra <peterz@infradead.org>,
Jiri Kosina <jkosina@suse.cz>, Ming Lei <ming.lei@canonical.com>,
linux-kernel@vger.kernel.org, Zheng Liu <gnehzuil.liu@gmail.com>,
linux-block@vger.kernel.org, Takashi Iwai <tiwai@suse.de>,
"linux-bcache@vger.kernel.org" <linux-bcache@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>, Alasdair Kergon <agk@redhat.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Keith Busch <keith.busch@intel.com>,
device-mapper development <dm-devel@redhat.com>,
Shaohua Li <shli@kernel.org>,
Kent Overstreet <kent.overstreet@gmail.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Roland Kammerer <roland.kammerer@linbit.com>
Subject: Re: [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion
Date: Fri, 6 Jan 2017 14:52:16 -0500 [thread overview]
Message-ID: <20170106195216.GA15583@redhat.com> (raw)
In-Reply-To: <alpine.LRH.2.02.1701061151010.13944@file01.intranet.prod.int.rdu2.redhat.com>
On Fri, Jan 06 2017 at 12:34pm -0500,
Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Fri, 6 Jan 2017, Mikulas Patocka wrote:
>
> >
> >
> > On Wed, 4 Jan 2017, Mike Snitzer wrote:
> >
> > > On Wed, Jan 04 2017 at 12:12am -0500,
> > > NeilBrown <neilb@suse.com> wrote:
> > >
> > > > > Suggested-by: NeilBrown <neilb@suse.com>
> > > > > Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
> > > > > ---
> > > > > block/blk-core.c | 20 ++++++++++++++++++++
> > > > > 1 file changed, 20 insertions(+)
> > > > >
> > > > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > > > index 9e3ac56..47ef373 100644
> > > > > --- a/block/blk-core.c
> > > > > +++ b/block/blk-core.c
> > > > > @@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio)
> > > > > struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> > > > >
> > > > > if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
> > > > > + struct bio_list lower, same, hold;
> > > > > +
> > > > > + /* Create a fresh bio_list for all subordinate requests */
> > > > > + bio_list_init(&hold);
> > > > > + bio_list_merge(&hold, &bio_list_on_stack);
> > > > > + bio_list_init(&bio_list_on_stack);
> > > > >
> > > > > ret = q->make_request_fn(q, bio);
> > > > >
> > > > > blk_queue_exit(q);
> > > > > + /* sort new bios into those for a lower level
> > > > > + * and those for the same level
> > > > > + */
> > > > > + bio_list_init(&lower);
> > > > > + bio_list_init(&same);
> > > > > + while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
> > > > > + if (q == bdev_get_queue(bio->bi_bdev))
> > > > > + bio_list_add(&same, bio);
> > > > > + else
> > > > > + bio_list_add(&lower, bio);
> > > > > + /* now assemble so we handle the lowest level first */
> > > > > + bio_list_merge(&bio_list_on_stack, &lower);
> > > > > + bio_list_merge(&bio_list_on_stack, &same);
> > > > > + bio_list_merge(&bio_list_on_stack, &hold);
> > > > >
> > > > > bio = bio_list_pop(current->bio_list);
> > > > > } else {
> > > > > --
> > > > > 2.7.4
> > >
> > > Mikulas, would you be willing to try the below patch with the
> > > dm-snapshot deadlock scenario and report back on whether it fixes that?
> > >
> > > Patch below looks to be the same as here:
> > > https://marc.info/?l=linux-raid&m=148232453107685&q=p3
> > >
> > > Neil and/or others if that isn't the patch that should be tested please
> > > provide a pointer to the latest.
> > >
> > > Thanks,
> > > Mike
> >
> > The bad news is that this doesn't fix the snapshot deadlock.
> >
> > I created a test program for the snapshot deadlock bug (it was originally
> > created years ago to test for a different bug, so it contains some cruft).
> > You also need to insert "if (ci->sector_count) msleep(100);" to the end of
> > __split_and_process_non_flush to make the kernel sleep when splitting the
> > bio.
> >
> > And with the above above patch, the snapshot deadlock bug still happens.
That is really unfortunate. Would be useful to dig in and understand
why. Because ordering of the IO in generic_make_request() really should
take care of it.
<snip>
> Here I post a patch that fixes the snapshot deadlock. On schedule(), it
> redirects bios on current->bio_list to helper workqueues.
<snip old patch>
That patch is included in the series of changes sequenced at the top of
this git branch:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/log/?h=wip
At the risk of repeating myself: unfortunately it doesn't have a way
forward with the timed offload implementation (which was done to appease
Ming Lei's concern about context switching causing reduced plugging that
results in less efficient IO).
next prev parent reply other threads:[~2017-01-06 19:52 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-08 15:04 [PATCH 0/1] block: fix blk_queue_split() resource exhaustion Lars Ellenberg
2016-07-08 15:04 ` [PATCH 1/1] " Lars Ellenberg
2016-07-08 18:49 ` Mike Snitzer
2016-07-11 14:13 ` Lars Ellenberg
2016-07-11 14:10 ` [PATCH v2 " Lars Ellenberg
2016-07-12 2:55 ` [dm-devel] " NeilBrown
2016-07-13 2:18 ` Eric Wheeler
2016-07-13 2:32 ` Mike Snitzer
2016-07-19 9:00 ` Lars Ellenberg
2016-07-21 22:53 ` Eric Wheeler
2016-07-25 20:39 ` Jeff Moyer
2016-08-11 4:16 ` Eric Wheeler
2017-01-07 19:56 ` Lars Ellenberg
2016-12-23 8:49 ` Michael Wang
2016-12-23 11:45 ` Lars Ellenberg
2017-01-02 14:33 ` [dm-devel] " Jack Wang
2017-01-04 5:12 ` NeilBrown
2017-01-04 18:50 ` Mike Snitzer
2017-01-05 10:54 ` 王金浦
2017-01-06 16:50 ` Mikulas Patocka
2017-01-06 17:34 ` Mikulas Patocka
2017-01-06 19:52 ` Mike Snitzer [this message]
2017-01-06 23:01 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170106195216.GA15583@redhat.com \
--to=snitzer@redhat.com \
--cc=agk@redhat.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=gnehzuil.liu@gmail.com \
--cc=jack.wang.usish@gmail.com \
--cc=jkosina@suse.cz \
--cc=keith.busch@intel.com \
--cc=kent.overstreet@gmail.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=lars.ellenberg@linbit.com \
--cc=linux-bcache@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ming.lei@canonical.com \
--cc=mingo@redhat.com \
--cc=mpatocka@redhat.com \
--cc=neilb@suse.com \
--cc=peterz@infradead.org \
--cc=roland.kammerer@linbit.com \
--cc=shli@kernel.org \
--cc=tiwai@suse.de \
--cc=yun.wang@profitbricks.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).