linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Mike Snitzer <snitzer@redhat.com>, Mikulas Patocka <mpatocka@redhat.com>
Cc: Jack Wang <jack.wang.usish@gmail.com>,
	Lars Ellenberg <lars.ellenberg@linbit.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-raid <linux-raid@vger.kernel.org>,
	Michael Wang <yun.wang@profitbricks.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Jiri Kosina <jkosina@suse.cz>, Ming Lei <ming.lei@canonical.com>,
	linux-kernel@vger.kernel.org, Zheng Liu <gnehzuil.liu@gmail.com>,
	linux-block@vger.kernel.org, Takashi Iwai <tiwai@suse.de>,
	"linux-bcache\@vger.kernel.org" <linux-bcache@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Alasdair Kergon <agk@redhat.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Keith Busch <keith.busch@intel.com>,
	device-mapper development <dm-devel@redhat.com>,
	Shaohua Li <shli@kernel.org>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Roland Kammerer <roland.kammerer@linbit.com>
Subject: Re: [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion
Date: Sat, 07 Jan 2017 10:01:07 +1100	[thread overview]
Message-ID: <877f67lrnw.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20170106195216.GA15583@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4866 bytes --]

On Sat, Jan 07 2017, Mike Snitzer wrote:

> On Fri, Jan 06 2017 at 12:34pm -0500,
> Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>> 
>> 
>> On Fri, 6 Jan 2017, Mikulas Patocka wrote:
>> 
>> > 
>> > 
>> > On Wed, 4 Jan 2017, Mike Snitzer wrote:
>> > 
>> > > On Wed, Jan 04 2017 at 12:12am -0500,
>> > > NeilBrown <neilb@suse.com> wrote:
>> > > 
>> > > > > Suggested-by: NeilBrown <neilb@suse.com>
>> > > > > Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
>> > > > > ---
>> > > > >  block/blk-core.c | 20 ++++++++++++++++++++
>> > > > >  1 file changed, 20 insertions(+)
>> > > > >
>> > > > > diff --git a/block/blk-core.c b/block/blk-core.c
>> > > > > index 9e3ac56..47ef373 100644
>> > > > > --- a/block/blk-core.c
>> > > > > +++ b/block/blk-core.c
>> > > > > @@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio)
>> > > > >  		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>> > > > >  
>> > > > >  		if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
>> > > > > +			struct bio_list lower, same, hold;
>> > > > > +
>> > > > > +			/* Create a fresh bio_list for all subordinate requests */
>> > > > > +			bio_list_init(&hold);
>> > > > > +			bio_list_merge(&hold, &bio_list_on_stack);
>> > > > > +			bio_list_init(&bio_list_on_stack);
>> > > > >  
>> > > > >  			ret = q->make_request_fn(q, bio);
>> > > > >  
>> > > > >  			blk_queue_exit(q);
>> > > > > +			/* sort new bios into those for a lower level
>> > > > > +			 * and those for the same level
>> > > > > +			 */
>> > > > > +			bio_list_init(&lower);
>> > > > > +			bio_list_init(&same);
>> > > > > +			while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
>> > > > > +				if (q == bdev_get_queue(bio->bi_bdev))
>> > > > > +					bio_list_add(&same, bio);
>> > > > > +				else
>> > > > > +					bio_list_add(&lower, bio);
>> > > > > +			/* now assemble so we handle the lowest level first */
>> > > > > +			bio_list_merge(&bio_list_on_stack, &lower);
>> > > > > +			bio_list_merge(&bio_list_on_stack, &same);
>> > > > > +			bio_list_merge(&bio_list_on_stack, &hold);
>> > > > >  
>> > > > >  			bio = bio_list_pop(current->bio_list);
>> > > > >  		} else {
>> > > > > -- 
>> > > > > 2.7.4
>> > > 
>> > > Mikulas, would you be willing to try the below patch with the
>> > > dm-snapshot deadlock scenario and report back on whether it fixes that?
>> > > 
>> > > Patch below looks to be the same as here:
>> > > https://marc.info/?l=linux-raid&m=148232453107685&q=p3
>> > > 
>> > > Neil and/or others if that isn't the patch that should be tested please
>> > > provide a pointer to the latest.
>> > > 
>> > > Thanks,
>> > > Mike
>> > 
>> > The bad news is that this doesn't fix the snapshot deadlock.
>> > 
>> > I created a test program for the snapshot deadlock bug (it was originally 
>> > created years ago to test for a different bug, so it contains some cruft). 
>> > You also need to insert "if (ci->sector_count) msleep(100);" to the end of 
>> > __split_and_process_non_flush to make the kernel sleep when splitting the 
>> > bio.
>> > 
>> > And with the above above patch, the snapshot deadlock bug still happens.
>
> That is really unfortunate.  Would be useful to dig in and understand
> why.  Because ordering of the IO in generic_make_request() really should
> take care of it.

I *think* you might be able to resolve this by changing
__split_and_process_bio() to only ever perform a single split.  No
looping.
i.e. if the bio is too big to handle directly, then split off the front
to a new bio, which you bio_chain to the original.  The original then
has bio_advance() called to stop over the front, then
generic_make_request() so it is queued.
Then the code proceeds to __clone_and_map_data_bio() on the front that
got split off.
When that completes it *doesn't* loop round, but returns into
generic_make_request() which does the looping and makes sure to handle
the lowest-level bio next.

something vaguely like this:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5664f3..06ee0960e415 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
 
 	len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
 
+	if (len < ci->sector_count) {
+		struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+		ci->sector_count = len;
+	}
+
 	r = __clone_and_map_data_bio(ci, ti, ci->sector, &len);
 	if (r < 0)
 		return r;

though I haven't tested, and the change (if it works) should probably be
more fully integrated into surrounding code.

You probably don't want to use "fs_bio_set" either - a target-local
pool would be best.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

      reply	other threads:[~2017-01-06 23:01 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-08 15:04 [PATCH 0/1] block: fix blk_queue_split() resource exhaustion Lars Ellenberg
2016-07-08 15:04 ` [PATCH 1/1] " Lars Ellenberg
2016-07-08 18:49   ` Mike Snitzer
2016-07-11 14:13     ` Lars Ellenberg
2016-07-11 14:10   ` [PATCH v2 " Lars Ellenberg
2016-07-12  2:55     ` [dm-devel] " NeilBrown
2016-07-13  2:18       ` Eric Wheeler
2016-07-13  2:32         ` Mike Snitzer
2016-07-19  9:00           ` Lars Ellenberg
2016-07-21 22:53             ` Eric Wheeler
2016-07-25 20:39               ` Jeff Moyer
2016-08-11  4:16             ` Eric Wheeler
2017-01-07 19:56             ` Lars Ellenberg
2016-12-23  8:49     ` Michael Wang
2016-12-23 11:45       ` Lars Ellenberg
2017-01-02 14:33         ` [dm-devel] " Jack Wang
2017-01-04  5:12           ` NeilBrown
2017-01-04 18:50             ` Mike Snitzer
2017-01-05 10:54               ` 王金浦
2017-01-06 16:50               ` Mikulas Patocka
2017-01-06 17:34                 ` Mikulas Patocka
2017-01-06 19:52                   ` Mike Snitzer
2017-01-06 23:01                     ` NeilBrown [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877f67lrnw.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=agk@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=gnehzuil.liu@gmail.com \
    --cc=jack.wang.usish@gmail.com \
    --cc=jkosina@suse.cz \
    --cc=keith.busch@intel.com \
    --cc=kent.overstreet@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=lars.ellenberg@linbit.com \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@canonical.com \
    --cc=mingo@redhat.com \
    --cc=mpatocka@redhat.com \
    --cc=peterz@infradead.org \
    --cc=roland.kammerer@linbit.com \
    --cc=shli@kernel.org \
    --cc=snitzer@redhat.com \
    --cc=tiwai@suse.de \
    --cc=yun.wang@profitbricks.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).