From: Kent Overstreet <kent.overstreet@gmail.com>
To: Ming Lei <ming.lei@canonical.com>
Cc: Mike Snitzer <snitzer@redhat.com>,
"oleg.drokin@intel.com" <oleg.drokin@intel.com>,
Ming Lin-SSI <ming.l@ssi.samsung.com>,
"andreas.dilger@intel.com" <andreas.dilger@intel.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
"minchan@kernel.org" <minchan@kernel.org>,
"jkosina@suse.cz" <jkosina@suse.cz>,
kernel list <linux-kernel@vger.kernel.org>,
"jim@jtan.com" <jim@jtan.com>,
"pjk1939@linux.vnet.ibm.com" <pjk1939@linux.vnet.ibm.com>,
"axboe@fb.com" <axboe@fb.com>,
"geoff@infradead.org" <geoff@infradead.org>,
"dm-devel@redhat.com" <dm-devel@redhat.com>,
"dpark@posteo.net" <dpark@posteo.net>,
Pavel Machek <pavel@ucw.cz>,
"ngupta@vflare.org" <ngupta@vflare.org>,
"hch@lst.de" <hch@lst.de>, "agk@redhat.com" <agk@redhat.com>
Subject: Re: 4.4-final: 28 bioset threads on small notebook
Date: Tue, 23 Feb 2016 18:23:52 -0900 [thread overview]
Message-ID: <20160224032352.GA22341@kmo-pixel> (raw)
In-Reply-To: <CACVXFVN+xV5_4zktCGpQUrnOtJ1EVOLPt5v4niuj9VUGL8uUEA@mail.gmail.com>
On Wed, Feb 24, 2016 at 10:48:10AM +0800, Ming Lei wrote:
> On Tue, Feb 23, 2016 at 10:54 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> > On Mon, Feb 22 2016 at 9:55pm -0500,
> > Ming Lei <ming.lei@canonical.com> wrote:
> >
> >> On Tue, Feb 23, 2016 at 6:58 AM, Kent Overstreet
> >> <kent.overstreet@gmail.com> wrote:
> >> > On Sun, Feb 21, 2016 at 05:40:59PM +0800, Ming Lei wrote:
> >> >> On Sun, Feb 21, 2016 at 2:43 PM, Ming Lin-SSI <ming.l@ssi.samsung.com> wrote:
> >> >> >>-----Original Message-----
> >> >> >
> >> >> > So it's almost already "per request_queue"
> >> >>
> >> >> Yes, that is because of the following line:
> >> >>
> >> >> q->bio_split = bioset_create(BIO_POOL_SIZE, 0);
> >> >>
> >> >> in blk_alloc_queue_node().
> >> >>
> >> >> Looks like this bio_set doesn't need to be per-request_queue, and
> >> >> now it is only used for fast-cloning bio for splitting, and one global
> >> >> split bio_set should be enough.
> >> >
> >> > It does have to be per request queue for stacking block devices (which includes
> >> > loopback).
> >>
> >> In commit df2cb6daa4(block: Avoid deadlocks with bio allocation by
> >> stacking drivers), deadlock in this situation has been avoided already.
> >> Or are there other issues with global bio_set? I appreciate if you may
> >> explain it a bit if there are.
> >
> > Even with commit df2cb6daa4 there is still risk of deadlocks (even
> > without low memory condition), see:
> > https://patchwork.kernel.org/patch/7398411/
>
> That is definitely another problem which isn't related with low memory,
> and I guess Kent means there might be deadlock risk in case of shared
> bio_set.
>
> >
> > (you may recall you blocked this patch with concerns about performance,
> > context switches, plug merging being compromised, etc.. to which I never
> > circled back to verify your concerns)
>
> I still remember that problem:
>
> 1) Process A
> - two bio(a, b) are splitted in dm's make_request funtion
> - bio(a) is submitted via generic_make_request(), so it is staged
> in current->bio_list
> - time t1
> - before bio(b) is submitted, down_write(&s->lock) is run and
> never return
>
> 2) Process B:
> - just during time t1, wait completion of bio(a) by down_write(&s->lock)
>
> Then Process A waits the lock which is acquired by B first, and the
> two bio(a, b)
> can't reach to driver/device at all.
>
> Looks that current->bio_list is fragile to locks from make_request function,
> and moving the lock into workqueue context should be helpful.
>
> And I am happy to continue to discuss this issue further.
>
> >
> > But it illustrates the type of problems that can occur when your rescue
> > infrastructure is shared across devices (in the context of df2cb6daa4,
> > current->bio_list contains bios from multiple devices).
> >
> > If a single splitting bio_set were shared across devices there would be
> > no guarantee of forward progress with complex stacked devices (one or
> > more devices could exhaust the reserve and starve out other devices in
> > the stack). So keeping the bio_set per request_queue isn't prone to
> > failure like a shared bio_set might be.
>
> Not consider the dm lock problem, from Kent's commit(df2cb6daa4) log and
> the patch, looks forward progress can be guaranteed for stacked devices
> with same bio_set, but better to get Kent's clarification.
>
> If forward progress can be guaranteed, percpu mempool might avoid
> easy exhausting, because it is reasonable to assume that one CPU can only
> provide a certain amount of bandwidth wrt. block transfer.
Generally speaking, with potential deadlocks like this I don't bother to work
out the specific scenario, it's enough to know that there's a shared resource
and multiple users that depend on each other... if you've got that, you'll have
a deadlock.
But, if you're curious: say we've got block devices a and b, when you submit to
a the bio will get passed down to b:
for the bioset itself: if a bio gets split when submitted to a, then needs to be
split again when it's submitted to b - you're allocating twice from the same
mempool, and the first allocation can't be freed until the original bio
completes. deadlock.
with the rescuer threads it's more subtle, but you just need a scenario where
the rescuer is required twice in a row. I'm not going to bother trying to work
out the details, but it's the same principle - you can end up in a situation
where you're blocked, and you need the rescuer thread to make forward progress
(or you'd deadlock - that's why it exists, right?) - well, what happens if that
happens twice in a row, and the second time you're running out of the rescuer
thread? oops.
next prev parent reply other threads:[~2016-02-24 3:23 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-11 10:49 4.4-rc: 28 bioset threads on small notebook Pavel Machek
2015-12-11 14:08 ` Mike Snitzer
2015-12-11 17:14 ` Pavel Machek
2016-02-20 17:40 ` 4.4-final: " Pavel Machek
2016-02-20 18:42 ` Pavel Machek
2016-02-20 19:51 ` Mike Snitzer
2016-02-20 20:04 ` Pavel Machek
2016-02-20 20:38 ` Mike Snitzer
2016-02-20 20:55 ` Pavel Machek
2016-02-21 4:15 ` Kent Overstreet
2016-02-21 6:43 ` Ming Lin-SSI
2016-02-21 9:40 ` Ming Lei
2016-02-22 22:58 ` Kent Overstreet
2016-02-23 2:55 ` Ming Lei
2016-02-23 14:54 ` Mike Snitzer
2016-02-24 2:48 ` Ming Lei
2016-02-24 3:23 ` Kent Overstreet [this message]
2016-02-23 20:45 ` Pavel Machek
2017-02-06 12:53 ` v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone Pavel Machek
2017-02-06 12:53 ` Pavel Machek
2017-02-07 1:47 ` Kent Overstreet
2017-02-07 2:49 ` Kent Overstreet
2017-02-07 17:13 ` Mike Snitzer
2017-02-07 20:39 ` Pavel Machek
2017-02-08 3:12 ` Mike Galbraith
2017-02-08 3:12 ` Mike Galbraith
2017-02-08 4:58 ` Kent Overstreet
2017-02-08 6:22 ` [PATCH] block: Make rescuer threads per request_queue, not per bioset kbuild test robot
2017-02-08 6:22 ` kbuild test robot
2017-02-08 6:23 ` kbuild test robot
2017-02-08 6:57 ` v4.9, 4.4-final: 28 bioset threads on small notebook, 36 threads on cellphone Mike Galbraith
2017-02-08 6:57 ` Mike Galbraith
2017-02-08 16:34 ` Mike Snitzer
2017-02-08 16:34 ` Mike Snitzer
2017-02-09 21:25 ` Kent Overstreet
2017-02-14 16:34 ` [dm-devel] " Mikulas Patocka
2017-02-14 17:33 ` Mike Snitzer
2017-02-08 2:47 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160224032352.GA22341@kmo-pixel \
--to=kent.overstreet@gmail.com \
--cc=agk@redhat.com \
--cc=andreas.dilger@intel.com \
--cc=axboe@fb.com \
--cc=dm-devel@redhat.com \
--cc=dpark@posteo.net \
--cc=geoff@infradead.org \
--cc=hch@lst.de \
--cc=jim@jtan.com \
--cc=jkosina@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=minchan@kernel.org \
--cc=ming.l@ssi.samsung.com \
--cc=ming.lei@canonical.com \
--cc=ngupta@vflare.org \
--cc=oleg.drokin@intel.com \
--cc=pavel@ucw.cz \
--cc=pjk1939@linux.vnet.ibm.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.