From: Mike Snitzer <snitzer@redhat.com>
To: Keith Busch <kbusch@kernel.org>
Cc: axboe@kernel.dk, martin.petersen@oracle.com, jdorminy@redhat.com,
bjohnsto@redhat.com, Ming Lei <ming.lei@redhat.com>,
linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: Re: [dm-devel] [PATCH v2] block: use gcd() to fix chunk_sectors limit stacking
Date: Thu, 3 Dec 2020 12:56:57 -0500
Message-ID: <20201203175657.GA29623@redhat.com>
In-Reply-To: <20201203162738.GA3404013@dhcp-10-100-145-180.wdc.com>
On Thu, Dec 03 2020 at 11:27am -0500,
Keith Busch <kbusch@kernel.org> wrote:
> On Thu, Dec 03, 2020 at 09:33:59AM -0500, Mike Snitzer wrote:
> > On Wed, Dec 02 2020 at 10:26pm -0500,
> > Ming Lei <ming.lei@redhat.com> wrote:
> >
> > > I understand it isn't related with correctness, because the underlying
> > > queue can split by its own chunk_sectors limit further. So is the issue
> > > too many further-splitting on queue with chunk_sectors 8? then CPU
> > > utilization is increased? Or other issue?
> >
> > No, this is all about correctness.
> >
> > Seems you're confining the definition of the possible stacking so that
> > the top-level device isn't allowed to have its own hard requirements on
> > IO sizes it sends to its internal implementation. Just because the
> > underlying device can split further doesn't mean that the top-level
> > virtual driver can service larger IO sizes (not if the chunk_sectors
> > stacking throws away the hint the virtual driver provided because it
> > used lcm_not_zero).
>
> I may be missing something obvious here, but if the lower layers split
> to their desired boundary already, why does this limit need to stack?
The problematic scenario is when the topmost layer, or layers, are the
more constrained.  _That_ is why the top-level's chunk_sectors limit
cannot be relaxed.

For example (in extreme where chunk_sectors is stacked via gcd):

dm VDO target (chunk_sectors=4K)
 on dm-thin (ideally chunk_sectors=1280K, reality chunk_sectors=128K)
  on 10+2 RAID6 (chunk_sectors=128K, io_opt=1280K)
   on raid members (chunk_sectors=0)

Results in the following bottom up blk_stack_limits() stacking:

gcd(128K, 0) = 128K    -> but MD just sets chunk_sectors, no stacking is done afaik
gcd(1280K, 128K) = 128K -> this one hurts dm-thin, needless splitting
gcd(4K, 128K) = 4K     -> vdo _must_ receive 4K IOs, hurts but "this is the way" ;)

So this is one extreme that shows stacking chunk_sectors is _not_
helpful (if the resulting chunk_sectors were actually used as basis for
splitting).  Better for each layer to just impose its own chunk_sectors
without concern for the layers below.
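The arithmetic in the example above can be checked with a small userspace
sketch. It is not the kernel's blk_stack_limits(); the helper names
(gcd_u64, lcm_not_zero_u64, min_not_zero_u64, KIB_TO_SECTORS) are made up
for illustration, and sizes are assumed to be in 512-byte sectors
(4K = 8, 128K = 256, 1280K = 2560).

```c
#include <stdint.h>

/* Hypothetical helper: convert KiB to 512-byte sectors. */
#define KIB_TO_SECTORS(k) ((uint64_t)(k) * 2)

/* Euclid's algorithm; gcd(a, 0) == a, so an unset (0) limit is ignored. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Least common multiple, treating 0 as "no limit set". */
static uint64_t lcm_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a / gcd_u64(a, b) * b;
}

/* Smaller of the two values, treating 0 as "no limit set". */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}
```

With these helpers, gcd_u64(KIB_TO_SECTORS(4), KIB_TO_SECTORS(128)) keeps the
4K requirement of the top layer, while lcm_not_zero_u64() on the same inputs
yields 128K, which is how an lcm-based rule discards the more constrained
top-level value. gcd_u64(KIB_TO_SECTORS(1280), KIB_TO_SECTORS(128)) gives
128K, the value that costs dm-thin extra splitting.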
Think I'd be fine with block core removing the chunk_sectors stacking
from blk_stack_limits()...
(and as you see below, I've been forced to revert to _not_ using stacked
chunk_sectors based splitting in DM)
> Won't it also work if each layer sets their desired chunk_sectors
> without considering their lower layers? The commit that initially
> stacked chunk_sectors doesn't provide any explanation.
Yes, I think it would work. The current stacking doesn't have the
luxury of knowing which layer a blk_stack_limits() maps to. BUT within
a layer chunk_sectors really does need to be compatible/symbiotic. So
it is unfortunately all or nothing as you build up the stack.
And that all-or-nothing stacking of chunk_sectors is why I've now (just
last night, based on further review by jdorminy) had to punt on using
stacked chunk_sectors and revert DM back to doing its own fine-grained
(and varied) splitting on a per DM target basis, see:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-5.10-rcX&id=6bb38bcc33bf3093c08bd1b71e4f20c82bb60dd1
Kind of depressing that I went so far down the rabbit hole, of wanting
to lean on block core, that I lost sight of an important "tenet of DM":
+	 * Does the target need to split IO even further?
+	 * - varied (per target) IO splitting is a tenet of DM; this
+	 *   explains why stacked chunk_sectors based splitting via
+	 *   blk_max_size_offset() isn't possible here.
And it is because of this that DM is forced to lean on human creation of
an optimal IO stack, which is prone to human error when a particular
thinp "blocksize" is selected, etc.
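To make the cost of a mismatched chunk_sectors concrete, here is a rough
userspace sketch of splitting an IO at chunk boundaries. It only mimics the
general idea of chunk-based splitting; it is not blk_max_size_offset(), and
the names max_io_at_offset() and count_pieces() are hypothetical. Offsets
and lengths are assumed to be in 512-byte sectors.

```c
#include <stdint.h>

/*
 * Largest IO (in sectors) that may start at 'offset' without crossing a
 * chunk boundary, capped at 'len'. chunk_sectors == 0 means "no boundary".
 */
static uint64_t max_io_at_offset(uint64_t offset, uint64_t len,
				 uint64_t chunk_sectors)
{
	uint64_t to_boundary;

	if (chunk_sectors == 0)
		return len;

	to_boundary = chunk_sectors - (offset % chunk_sectors);
	return len < to_boundary ? len : to_boundary;
}

/* Number of pieces an IO is cut into when split at every chunk boundary. */
static unsigned int count_pieces(uint64_t offset, uint64_t len,
				 uint64_t chunk_sectors)
{
	unsigned int pieces = 0;

	while (len > 0) {
		uint64_t n = max_io_at_offset(offset, len, chunk_sectors);

		offset += n;
		len -= n;
		pieces++;
	}
	return pieces;
}
```

For an aligned 1280K IO (2560 sectors), count_pieces(0, 2560, 2560) is 1 with
dm-thin's ideal chunk_sectors, but count_pieces(0, 2560, 256) is 10 with the
stacked 128K value, and count_pieces(0, 2560, 8) is 320 with a 4K value.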
Mike
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel