From: Mike Snitzer <snitzer@redhat.com>
To: JeffleXu <jefflexu@linux.alibaba.com>
Cc: axboe@kernel.dk, xiaoguang.wang@linux.alibaba.com,
linux-block@vger.kernel.org, joseph.qi@linux.alibaba.com,
dm-devel@redhat.com, haoxu@linux.alibaba.com,
io-uring@vger.kernel.org
Subject: Re: [dm-devel] [RFC 0/3] Add support of iopoll for dm device
Date: Wed, 4 Nov 2020 10:08:48 -0500 [thread overview]
Message-ID: <20201104150847.GB32761@redhat.com> (raw)
In-Reply-To: <f165f38a-91d1-79aa-829d-a9cc69a5eee6@linux.alibaba.com>
On Wed, Nov 04 2020 at 1:47am -0500,
JeffleXu <jefflexu@linux.alibaba.com> wrote:
>
> On 11/2/20 11:28 PM, Mike Snitzer wrote:
> >On Sun, Nov 01 2020 at 10:14pm -0500,
> >JeffleXu <jefflexu@linux.alibaba.com> wrote:
> >
> >>On 10/27/20 2:53 AM, Mike Snitzer wrote:
> >>>What you detailed there isn't properly modeling what it needs to.
> >>>A given dm_target_io could result in quite a few bios (e.g. for
> >>>dm-striped we clone each bio for each of N stripes). So the fan-out,
> >>>especially if then stacked on N layers of stacked devices, to all the
> >>>various hctx at the lowest layers is like herding cats.
> >>>
> >>>But the recursion in block core's submit_bio path makes that challenging
> >>>to say the least. So much so that any solution related to enabling
> >>>proper bio-based IO polling is going to need a pretty significant
> >>>investment in fixing block core (storing __submit_bio()'s cookie during
> >>>recursion, possibly storing to driver provided memory location,
> >>>e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM
> >>>clone bio's per-bio-data).
> >>>
> >>>SO __submit_bio_noacct would become:
> >>>
> >>> retp = &ret;
> >>> if (bio->submit_cookie)
> >>> retp = bio->submit_cookie;
> >>> *retp = __submit_bio(bio);
> >>Sorry for the late reply. Exactly I missed this point before. IF you
> >>have not started working on this, I'd like to try to implement this as
> >>an RFC.
> >I did start on this line of development but it needs quite a bit more
> >work. Even the pseudo code I provided above isn't useful in the context
> >of DM clone bios that have their own per-bio-data to assist with this
> >implementation. Because the __submit_bio_noacct() recursive call
> >drivers/md/dm.c:__split_and_process_bio() makes is supplying the
> >original bio (modified to only point to remaining work).
>
> Yes I noticed this recently. Since the depth-first splitting
> introduced in commit 18a25da84354
>
> ("dm: ensure bio submission follows a depth-first tree walk"), one
> bio to dm device can be
>
> split into multiple bios to this dm device.
>
> ```
>
> one bio to dm device (dm0) = one dm_io (to nvme0) + one bio to this
> same dm device (dm0)
>
> ```
>
>
> In this case we need a mechanism to track all split sub-bios of the
> very beginning original bio.
Yes, splitting causes additional potential for sub-bios. There are
other cases that cause a 1-to-many bio generation (e.g. dm-striped) or
splitting cases where a DM target makes use of dm_accept_partial_bio
(e.g. dm-snapshot, dm-integrity, dm-writecache, etc).
> I'm doubted if this should be implemented in block layer like:
>
> ```
>
> struct bio {
>
> ...
>
> struct list_head cookies;
>
> };
>
> ```
>
> After all it's only used by bio-based queue, or more specifically
> only dm device currently.
I do think this line of work really should be handled in block core
because I cannot see any reason why MD or bcache or whatever bio-based
device wouldn't want the ability to better support io_uring (with IO
poll).
> Another design I can come up with is to maintain a global data
> structure for the very beginning
> original bio. Currently the blocking point is that now one original
> bio to the dm device (@bio of dm_submit()) can correspond to multiple
> dm_io and thus we have nowhere to place the @cookies list.
Yes, and that will always be the case. We need the design to handle an
arbitrary sprawl of splitting from a given bio. The graph of bios
resulting from that fan-out needs to be walked at various levels -- be
it the top-level original bio's submit_bio() returned cookie or some
intermediate point in the chain of bios.
The problem is the lifetime of the data structure created for a given
split bio versus layering boundaries (that come from block core's
simplistic recursion via bio using submit_bio).
> Now we have to maintain one data structure for every original bio,
> something like
>
> ```
>
> struct dm_poll_instance {
>
> ...
>
> struct list_head cookies;
>
> };
>
> ```
I do think we need a hybrid where at the point of recursion we're able
to make the associated data structure available across the recursion
boundary so that modeling the association in a chain of split bios is
possible. (e.g. struct dm_poll_data or dm_poll_instance as you named it,
_but_ that struct definition would live in block core, but would be part
of per-bio-data; so 'struct blk_poll_data' is more logical name when
elevated to block core).
It _might_ be worthwhile to see if a new BIO_ flag could be added to
allow augmenting the bio_split + bio_chain pattern to also track this
additional case of carrying additional data per-bio while creating
bio-chains. I may not be clear yet, said differently: augmenting
bio_chain to not only chain bios, but to _also_ thread/chain together
per-bio-data that lives within those chained bios. SO you have the
chain of bios _and_ the chain of potentially opaque void * that happens
to point to a list head for a list of 'struct blk_poll_data'.
Does that make sense?
> We can transfer this dm_poll_instance between split bios by
> bio->bi_private, like
>
> ```
>
> dm_submit_bio(...) {
>
> struct dm_poll_instance *ins;
>
> if (bio->bi_private)
>
> ins = bio->bi_private;
>
> else {
>
> ins = alloc_poll_instance();
>
> bio->bi_private = ins;
>
> }
>
> ...
>
> }
>
> ```
Sadly, we cannot (ab)use bi_private for this given its (ab)used via the
bio_chain() interface. It's almost like we need to add a new pointer in
the bio that isn't left for block core to hijack.
There is the well-worn pattern of saving off the original bi_private,
hooking a new endio method and then when that endio is called restoring
bi_private but we really want to avoid excessive indirect function calls
for this usecase. The entire point of implementing blk_poll support is
for performance after all.
Mike
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@redhat.com>
To: JeffleXu <jefflexu@linux.alibaba.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
dm-devel@redhat.com, joseph.qi@linux.alibaba.com,
xiaoguang.wang@linux.alibaba.com, haoxu@linux.alibaba.com,
io-uring@vger.kernel.org
Subject: Re: [RFC 0/3] Add support of iopoll for dm device
Date: Wed, 4 Nov 2020 10:08:48 -0500 [thread overview]
Message-ID: <20201104150847.GB32761@redhat.com> (raw)
In-Reply-To: <f165f38a-91d1-79aa-829d-a9cc69a5eee6@linux.alibaba.com>
On Wed, Nov 04 2020 at 1:47am -0500,
JeffleXu <jefflexu@linux.alibaba.com> wrote:
>
> On 11/2/20 11:28 PM, Mike Snitzer wrote:
> >On Sun, Nov 01 2020 at 10:14pm -0500,
> >JeffleXu <jefflexu@linux.alibaba.com> wrote:
> >
> >>On 10/27/20 2:53 AM, Mike Snitzer wrote:
> >>>What you detailed there isn't properly modeling what it needs to.
> >>>A given dm_target_io could result in quite a few bios (e.g. for
> >>>dm-striped we clone each bio for each of N stripes). So the fan-out,
> >>>especially if then stacked on N layers of stacked devices, to all the
> >>>various hctx at the lowest layers is like herding cats.
> >>>
> >>>But the recursion in block core's submit_bio path makes that challenging
> >>>to say the least. So much so that any solution related to enabling
> >>>proper bio-based IO polling is going to need a pretty significant
> >>>investment in fixing block core (storing __submit_bio()'s cookie during
> >>>recursion, possibly storing to driver provided memory location,
> >>>e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM
> >>>clone bio's per-bio-data).
> >>>
> >>>SO __submit_bio_noacct would become:
> >>>
> >>> retp = &ret;
> >>> if (bio->submit_cookie)
> >>> retp = bio->submit_cookie;
> >>> *retp = __submit_bio(bio);
> >>Sorry for the late reply. Exactly I missed this point before. IF you
> >>have not started working on this, I'd like to try to implement this as
> >>an RFC.
> >I did start on this line of development but it needs quite a bit more
> >work. Even the pseudo code I provided above isn't useful in the context
> >of DM clone bios that have their own per-bio-data to assist with this
> >implementation. Because the __submit_bio_noacct() recursive call
> >drivers/md/dm.c:__split_and_process_bio() makes is supplying the
> >original bio (modified to only point to remaining work).
>
> Yes I noticed this recently. Since the depth-first splitting
> introduced in commit 18a25da84354
>
> ("dm: ensure bio submission follows a depth-first tree walk"), one
> bio to dm device can be
>
> split into multiple bios to this dm device.
>
> ```
>
> one bio to dm device (dm0) = one dm_io (to nvme0) + one bio to this
> same dm device (dm0)
>
> ```
>
>
> In this case we need a mechanism to track all split sub-bios of the
> very beginning original bio.
Yes, splitting causes additional potential for sub-bios. There are
other cases that cause a 1-to-many bio generation (e.g. dm-striped) or
splitting cases where a DM target makes use of dm_accept_partial_bio
(e.g. dm-snapshot, dm-integrity, dm-writecache, etc).
> I'm doubted if this should be implemented in block layer like:
>
> ```
>
> struct bio {
>
> ...
>
> struct list_head cookies;
>
> };
>
> ```
>
> After all it's only used by bio-based queue, or more specifically
> only dm device currently.
I do think this line of work really should be handled in block core
because I cannot see any reason why MD or bcache or whatever bio-based
device wouldn't want the ability to better support io_uring (with IO
poll).
> Another design I can come up with is to maintain a global data
> structure for the very beginning
> original bio. Currently the blocking point is that now one original
> bio to the dm device (@bio of dm_submit()) can correspond to multiple
> dm_io and thus we have nowhere to place the @cookies list.
Yes, and that will always be the case. We need the design to handle an
arbitrary sprawl of splitting from a given bio. The graph of bios
resulting from that fan-out needs to be walked at various levels -- be
it the top-level original bio's submit_bio() returned cookie or some
intermediate point in the chain of bios.
The problem is the lifetime of the data structure created for a given
split bio versus layering boundaries (that come from block core's
simplistic recursion via bio using submit_bio).
> Now we have to maintain one data structure for every original bio,
> something like
>
> ```
>
> struct dm_poll_instance {
>
> ...
>
> struct list_head cookies;
>
> };
>
> ```
I do think we need a hybrid where at the point of recursion we're able
to make the associated data structure available across the recursion
boundary so that modeling the association in a chain of split bios is
possible. (e.g. struct dm_poll_data or dm_poll_instance as you named it,
_but_ that struct definition would live in block core, but would be part
of per-bio-data; so 'struct blk_poll_data' is more logical name when
elevated to block core).
It _might_ be worthwhile to see if a new BIO_ flag could be added to
allow augmenting the bio_split + bio_chain pattern to also track this
additional case of carrying additional data per-bio while creating
bio-chains. I may not be clear yet, said differently: augmenting
bio_chain to not only chain bios, but to _also_ thread/chain together
per-bio-data that lives within those chained bios. SO you have the
chain of bios _and_ the chain of potentially opaque void * that happens
to point to a list head for a list of 'struct blk_poll_data'.
Does that make sense?
> We can transfer this dm_poll_instance between split bios by
> bio->bi_private, like
>
> ```
>
> dm_submit_bio(...) {
>
> struct dm_poll_instance *ins;
>
> if (bio->bi_private)
>
> ins = bio->bi_private;
>
> else {
>
> ins = alloc_poll_instance();
>
> bio->bi_private = ins;
>
> }
>
> ...
>
> }
>
> ```
Sadly, we cannot (ab)use bi_private for this given its (ab)used via the
bio_chain() interface. It's almost like we need to add a new pointer in
the bio that isn't left for block core to hijack.
There is the well-worn pattern of saving off the original bi_private,
hooking a new endio method and then when that endio is called restoring
bi_private but we really want to avoid excessive indirect function calls
for this usecase. The entire point of implementing blk_poll support is
for performance after all.
Mike
next prev parent reply other threads:[~2020-11-04 16:09 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-20 6:54 [dm-devel] [RFC 0/3] Add support of iopoll for dm device Jeffle Xu
2020-10-20 6:54 ` Jeffle Xu
2020-10-20 6:54 ` [dm-devel] [RFC 1/3] block/mq: add iterator for polling hw queues Jeffle Xu
2020-10-20 6:54 ` Jeffle Xu
2020-10-20 6:54 ` [dm-devel] [RFC 2/3] block: add back ->poll_fn in request queue Jeffle Xu
2020-10-20 6:54 ` Jeffle Xu
2020-10-20 6:54 ` [dm-devel] [RFC 3/3] dm: add support for IO polling Jeffle Xu
2020-10-20 6:54 ` Jeffle Xu
2020-10-21 20:49 ` [dm-devel] " Mike Snitzer
2020-10-21 20:49 ` Mike Snitzer
2020-10-22 3:14 ` [dm-devel] " Ming Lei
2020-10-22 3:14 ` Ming Lei
2020-10-21 20:39 ` [dm-devel] [RFC 0/3] Add support of iopoll for dm device Mike Snitzer
2020-10-21 20:39 ` Mike Snitzer
2020-10-22 5:28 ` [dm-devel] " JeffleXu
2020-10-22 5:28 ` JeffleXu
2020-10-26 18:53 ` [dm-devel] " Mike Snitzer
2020-10-26 18:53 ` Mike Snitzer
2020-11-02 3:14 ` [dm-devel] " JeffleXu
2020-11-02 3:14 ` JeffleXu
2020-11-02 15:28 ` [dm-devel] " Mike Snitzer
2020-11-02 15:28 ` Mike Snitzer
2020-11-03 8:59 ` [dm-devel] " JeffleXu
2020-11-03 8:59 ` JeffleXu
2020-11-04 6:47 ` [dm-devel] " JeffleXu
2020-11-04 6:47 ` JeffleXu
2020-11-04 15:08 ` Mike Snitzer [this message]
2020-11-04 15:08 ` Mike Snitzer
2020-11-06 2:51 ` [dm-devel] " JeffleXu
2020-11-06 2:51 ` JeffleXu
2020-11-06 2:57 ` JeffleXu
2020-11-06 17:45 ` Mike Snitzer
2020-11-06 17:45 ` Mike Snitzer
2020-11-08 1:09 ` [dm-devel] " JeffleXu
2020-11-08 1:09 ` JeffleXu
2020-11-09 18:15 ` Mike Snitzer
2020-11-09 18:15 ` Mike Snitzer
2020-11-10 1:43 ` [dm-devel] " JeffleXu
2020-11-10 1:43 ` JeffleXu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201104150847.GB32761@redhat.com \
--to=snitzer@redhat.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=haoxu@linux.alibaba.com \
--cc=io-uring@vger.kernel.org \
--cc=jefflexu@linux.alibaba.com \
--cc=joseph.qi@linux.alibaba.com \
--cc=linux-block@vger.kernel.org \
--cc=xiaoguang.wang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.