From: Mike Snitzer <snitzer@redhat.com>
To: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: axboe@kernel.dk, xiaoguang.wang@linux.alibaba.com,
linux-block@vger.kernel.org, joseph.qi@linux.alibaba.com,
dm-devel@redhat.com, haoxu@linux.alibaba.com
Subject: Re: [dm-devel] [RFC 0/3] Add support of iopoll for dm device
Date: Wed, 21 Oct 2020 16:39:07 -0400 [thread overview]
Message-ID: <20201021203906.GA10896@redhat.com> (raw)
In-Reply-To: <20201020065420.124885-1-jefflexu@linux.alibaba.com>
On Tue, Oct 20 2020 at 2:54am -0400,
Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> This patch set adds support of iopoll for dm device.
>
> This is only an RFC patch. I'm really looking forward getting your
> feedbacks on if you're interested in supporting iopoll for dm device,
> or if there's a better design to implement that.
>
> Thanks.
>
>
> [Purpose]
> IO polling is an important mode of io_uring. Currently only mq devices
> support iopoll. As for dm devices, only dm-multipath is request-base,
> while others are all bio-based and have no support for iopoll.
> Supporting iopoll for dm devices can be of great meaning when the
> device seen by application is dm device such as dm-linear/dm-stripe,
> in which case iopoll is not usable for io_uring.
I appreciate you taking the initiative on this; polling support is on my
TODO so your work serves as a nice reminder to pursue this more
urgently.
> [Design Notes]
>
> cookie
> ------
> Let's start from cookie. Cookie is one important concept in iopoll. It
> is used to identify one specific request in one specific hardware queue.
> The concept of cookie is initially designed as a per-bio concept, and
> thus it doesn't work well when bio-split involved. When bio is split,
> the returned cookie is indeed the cookie of one of the split bio, and
> the following polling on this returned cookie can only guarantee the
> completion of this specific split bio, while the other split bios may
> be still uncompleted. Bio-split is also resolved for dm device, though
> in another form, in which case the original bio submitted to the dm
> device may be split into multiple bios submitted to the underlying
> devices.
>
> In previous discussion, Lei Ming has suggested that iopoll should be
> disabled for bio-split. This works for the normal bio-split (done in
> blk_mq_submit_bio->__blk_queue_split), while iopoll will never be
> usable for dm devices if this also applies for dm device.
>
> So come back to the design of the cookie for dm devices. At the very
> beginning I want to refactor the design of cookie, from 'unsigned int'
> type to the pointer type for dm device, so that cookie can point to
> something, e.g. a list containing all cookies of one original bio,
> something like this:
>
> struct dm_io { // the returned cookie points to dm_io
> ...
> struct list_head cookies;
> };
>
> In this design, we can fetch all cookies of one original bio, but the
> implementation can be difficult and buggy. For example, the
> 'struct dm_io' structure may be already freed when the returned cookie
> is used in blk_poll(). Then what if maintain a refcount in struct dm_io
> so that 'struct dm_io' structure can not be freed until blk_poll()
> called? Then the 'struct dm_io' structure will never be freed if the
> IO polling is not used at all.
I'd have to look closer at the race in the code you wrote (though you
didn't share it); but we cannot avoid properly mapping a cookie to each
split bio. Otherwise you resort to inefficiently polling everything.
Seems your attempt to have the cookie point to a dm_io object was likely
too coarse-grained (when bios are split they get their own dm_io on
recursive re-entry to DM core from ->submit_bio); but isn't having a
list of cookies still too imprecise for polling purposes? You could
easily have a list that translates to many blk-mq queues. Possibly
better than your current approach of polling everything -- but not
much.
> So finally I gived up refactoring the design of cookie and maintain all
> cookies of one original bio. The initial design of cookie gets retained.
> The returned cookie is still the cookie of one of the split bio, and thus
> it is not used at all when polling dm devices. The polling of dm device,
> is indeed iterating and polling on all hardware queues (in polling mode)
> of all underlying target devices.
>
> This design is simple and easy to implement. But I'm not sure if the
> performance can be an issue if one dm device mapped to too many target
> devices or the dm stack depth grows.
You showed 40% performance improvement but it really just does the bare
minimum of implementing polling. It is so simplistic that I really
don't think the design even serves as a starting point for what will be
needed. So this needs further design time.
What you've _done_ could serve as a stop-gap but I'd really rather we
get it properly designed from the start.
> REQ_NOWAIT
> ----------
> The polling bio will set REQ_NOWAIT in bio->bi_flags, and the device
> need to be capable of handling REQ_NOWAIT. Commit 021a24460dc2
> ("block: add QUEUE_FLAG_NOWAIT") and commit 6abc49468eea ("dm: add
> support for REQ_NOWAIT and enable it for linear target") add this
> support for dm device, and currently only dm-linear supports that.
>
> hybrid polling
> --------------
> When execute hybrid polling, cookie is used to fetch the request,
> and check if the request has completed or not. In the design for
> dm device described above, the returned cookie is still the cookie
> of one of the split bios, and thus we have no way checking if all
> the split bios have completed or not. Thus in the current
> implementation, hybrid polling is not supported for dm device.
I'll look closer at all this. But DM targets do allow for additional
target specific per-bio-data to be added to the overall per-bio-data
size (via ti->per_io_data_size). DM core _could_ internalize adding
additional space to per-bio-data for storing a unique cookie per
split/clone bio.
Conversely, block-core's bio cloning could manage this so long as it
knew how to access the offset into the established per-bio-data. But
that might be more challenging to do without impacting use-cases outside
of DM.
Thanks,
Mike
> [Performance]
> I test 4k-randread on a dm-linear mapped to only one nvme device.
>
> SSD: INTEL SSDPEDME800G4
> dm-linear:dmsetup create testdev --table "0 209715200 linear /dev/nvme0n1 0"
>
> fio configs:
> ```
> ioengine=io_uring
> iodepth=128
> numjobs=1
> thread
> rw=randread
> direct=1
> registerfiles=1
> #hipri=1 with iopoll enabled, hipri=0 with iopoll disabled
> bs=4k
> size=100G
> runtime=60
> time_based
> group_reporting
> randrepeat=0
>
> [device]
> filename=/dev/mapper/testdev
> ```
>
> The test result shows that there's ~40% performance growth when iopoll
> enabled.
>
> test | iops | bw | avg lat
> ---- | ---- | ---- | ----
> iopoll-disabled | 244k | 953MiB/s | 524us
> iopoll-enabled | 345k | 1349MiB/s | 370us
>
> [TODO]
> The store method of "io_poll"/"io_poll_delay" has not been adapted
> for dm device.
>
>
> Jeffle Xu (3):
> block/mq: add iterator for polling hw queues
> block: add back ->poll_fn in request queue
> dm: add support for IO polling
>
> block/blk-mq.c | 30 ++++++++++++++++++++++++------
> drivers/md/dm-core.h | 1 +
> drivers/md/dm-table.c | 30 ++++++++++++++++++++++++++++++
> drivers/md/dm.c | 40 ++++++++++++++++++++++++++++++++++++++++
> include/linux/blk-mq.h | 6 ++++++
> include/linux/blkdev.h | 3 +++
> 6 files changed, 104 insertions(+), 6 deletions(-)
>
> --
> 2.27.0
>
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
WARNING: multiple messages have this Message-ID (diff)
From: Mike Snitzer <snitzer@redhat.com>
To: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: axboe@kernel.dk, linux-block@vger.kernel.org,
dm-devel@redhat.com, joseph.qi@linux.alibaba.com,
xiaoguang.wang@linux.alibaba.com, haoxu@linux.alibaba.com
Subject: Re: [RFC 0/3] Add support of iopoll for dm device
Date: Wed, 21 Oct 2020 16:39:07 -0400 [thread overview]
Message-ID: <20201021203906.GA10896@redhat.com> (raw)
In-Reply-To: <20201020065420.124885-1-jefflexu@linux.alibaba.com>
On Tue, Oct 20 2020 at 2:54am -0400,
Jeffle Xu <jefflexu@linux.alibaba.com> wrote:
> This patch set adds support of iopoll for dm device.
>
> This is only an RFC patch. I'm really looking forward getting your
> feedbacks on if you're interested in supporting iopoll for dm device,
> or if there's a better design to implement that.
>
> Thanks.
>
>
> [Purpose]
> IO polling is an important mode of io_uring. Currently only mq devices
> support iopoll. As for dm devices, only dm-multipath is request-base,
> while others are all bio-based and have no support for iopoll.
> Supporting iopoll for dm devices can be of great meaning when the
> device seen by application is dm device such as dm-linear/dm-stripe,
> in which case iopoll is not usable for io_uring.
I appreciate you taking the initiative on this; polling support is on my
TODO so your work serves as a nice reminder to pursue this more
urgently.
> [Design Notes]
>
> cookie
> ------
> Let's start from cookie. Cookie is one important concept in iopoll. It
> is used to identify one specific request in one specific hardware queue.
> The concept of cookie is initially designed as a per-bio concept, and
> thus it doesn't work well when bio-split involved. When bio is split,
> the returned cookie is indeed the cookie of one of the split bio, and
> the following polling on this returned cookie can only guarantee the
> completion of this specific split bio, while the other split bios may
> be still uncompleted. Bio-split is also resolved for dm device, though
> in another form, in which case the original bio submitted to the dm
> device may be split into multiple bios submitted to the underlying
> devices.
>
> In previous discussion, Lei Ming has suggested that iopoll should be
> disabled for bio-split. This works for the normal bio-split (done in
> blk_mq_submit_bio->__blk_queue_split), while iopoll will never be
> usable for dm devices if this also applies for dm device.
>
> So come back to the design of the cookie for dm devices. At the very
> beginning I want to refactor the design of cookie, from 'unsigned int'
> type to the pointer type for dm device, so that cookie can point to
> something, e.g. a list containing all cookies of one original bio,
> something like this:
>
> struct dm_io { // the returned cookie points to dm_io
> ...
> struct list_head cookies;
> };
>
> In this design, we can fetch all cookies of one original bio, but the
> implementation can be difficult and buggy. For example, the
> 'struct dm_io' structure may be already freed when the returned cookie
> is used in blk_poll(). Then what if maintain a refcount in struct dm_io
> so that 'struct dm_io' structure can not be freed until blk_poll()
> called? Then the 'struct dm_io' structure will never be freed if the
> IO polling is not used at all.
I'd have to look closer at the race in the code you wrote (though you
didn't share it); but we cannot avoid properly mapping a cookie to each
split bio. Otherwise you resort to inefficiently polling everything.
Seems your attempt to have the cookie point to a dm_io object was likely
too coarse-grained (when bios are split they get their own dm_io on
recursive re-entry to DM core from ->submit_bio); but isn't having a
list of cookies still too imprecise for polling purposes? You could
easily have a list that translates to many blk-mq queues. Possibly
better than your current approach of polling everything -- but not
much.
> So finally I gived up refactoring the design of cookie and maintain all
> cookies of one original bio. The initial design of cookie gets retained.
> The returned cookie is still the cookie of one of the split bio, and thus
> it is not used at all when polling dm devices. The polling of dm device,
> is indeed iterating and polling on all hardware queues (in polling mode)
> of all underlying target devices.
>
> This design is simple and easy to implement. But I'm not sure if the
> performance can be an issue if one dm device mapped to too many target
> devices or the dm stack depth grows.
You showed 40% performance improvement but it really just does the bare
minimum of implementing polling. It is so simplistic that I really
don't think the design even serves as a starting point for what will be
needed. So this needs further design time.
What you've _done_ could serve as a stop-gap but I'd really rather we
get it properly designed from the start.
> REQ_NOWAIT
> ----------
> The polling bio will set REQ_NOWAIT in bio->bi_flags, and the device
> need to be capable of handling REQ_NOWAIT. Commit 021a24460dc2
> ("block: add QUEUE_FLAG_NOWAIT") and commit 6abc49468eea ("dm: add
> support for REQ_NOWAIT and enable it for linear target") add this
> support for dm device, and currently only dm-linear supports that.
>
> hybrid polling
> --------------
> When execute hybrid polling, cookie is used to fetch the request,
> and check if the request has completed or not. In the design for
> dm device described above, the returned cookie is still the cookie
> of one of the split bios, and thus we have no way checking if all
> the split bios have completed or not. Thus in the current
> implementation, hybrid polling is not supported for dm device.
I'll look closer at all this. But DM targets do allow for additional
target specific per-bio-data to be added to the overall per-bio-data
size (via ti->per_io_data_size). DM core _could_ internalize adding
additional space to per-bio-data for storing a unique cookie per
split/clone bio.
Conversely, block-core's bio cloning could manage this so long as it
knew how to access the offset into the established per-bio-data. But
that might be more challenging to do without impacting use-cases outside
of DM.
Thanks,
Mike
> [Performance]
> I test 4k-randread on a dm-linear mapped to only one nvme device.
>
> SSD: INTEL SSDPEDME800G4
> dm-linear:dmsetup create testdev --table "0 209715200 linear /dev/nvme0n1 0"
>
> fio configs:
> ```
> ioengine=io_uring
> iodepth=128
> numjobs=1
> thread
> rw=randread
> direct=1
> registerfiles=1
> #hipri=1 with iopoll enabled, hipri=0 with iopoll disabled
> bs=4k
> size=100G
> runtime=60
> time_based
> group_reporting
> randrepeat=0
>
> [device]
> filename=/dev/mapper/testdev
> ```
>
> The test result shows that there's ~40% performance growth when iopoll
> enabled.
>
> test | iops | bw | avg lat
> ---- | ---- | ---- | ----
> iopoll-disabled | 244k | 953MiB/s | 524us
> iopoll-enabled | 345k | 1349MiB/s | 370us
>
> [TODO]
> The store method of "io_poll"/"io_poll_delay" has not been adapted
> for dm device.
>
>
> Jeffle Xu (3):
> block/mq: add iterator for polling hw queues
> block: add back ->poll_fn in request queue
> dm: add support for IO polling
>
> block/blk-mq.c | 30 ++++++++++++++++++++++++------
> drivers/md/dm-core.h | 1 +
> drivers/md/dm-table.c | 30 ++++++++++++++++++++++++++++++
> drivers/md/dm.c | 40 ++++++++++++++++++++++++++++++++++++++++
> include/linux/blk-mq.h | 6 ++++++
> include/linux/blkdev.h | 3 +++
> 6 files changed, 104 insertions(+), 6 deletions(-)
>
> --
> 2.27.0
>
next prev parent reply other threads:[~2020-10-21 20:40 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-20 6:54 [dm-devel] [RFC 0/3] Add support of iopoll for dm device Jeffle Xu
2020-10-20 6:54 ` Jeffle Xu
2020-10-20 6:54 ` [dm-devel] [RFC 1/3] block/mq: add iterator for polling hw queues Jeffle Xu
2020-10-20 6:54 ` Jeffle Xu
2020-10-20 6:54 ` [dm-devel] [RFC 2/3] block: add back ->poll_fn in request queue Jeffle Xu
2020-10-20 6:54 ` Jeffle Xu
2020-10-20 6:54 ` [dm-devel] [RFC 3/3] dm: add support for IO polling Jeffle Xu
2020-10-20 6:54 ` Jeffle Xu
2020-10-21 20:49 ` [dm-devel] " Mike Snitzer
2020-10-21 20:49 ` Mike Snitzer
2020-10-22 3:14 ` [dm-devel] " Ming Lei
2020-10-22 3:14 ` Ming Lei
2020-10-21 20:39 ` Mike Snitzer [this message]
2020-10-21 20:39 ` [RFC 0/3] Add support of iopoll for dm device Mike Snitzer
2020-10-22 5:28 ` [dm-devel] " JeffleXu
2020-10-22 5:28 ` JeffleXu
2020-10-26 18:53 ` [dm-devel] " Mike Snitzer
2020-10-26 18:53 ` Mike Snitzer
2020-11-02 3:14 ` [dm-devel] " JeffleXu
2020-11-02 3:14 ` JeffleXu
2020-11-02 15:28 ` [dm-devel] " Mike Snitzer
2020-11-02 15:28 ` Mike Snitzer
2020-11-03 8:59 ` [dm-devel] " JeffleXu
2020-11-03 8:59 ` JeffleXu
2020-11-04 6:47 ` [dm-devel] " JeffleXu
2020-11-04 6:47 ` JeffleXu
2020-11-04 15:08 ` [dm-devel] " Mike Snitzer
2020-11-04 15:08 ` Mike Snitzer
2020-11-06 2:51 ` [dm-devel] " JeffleXu
2020-11-06 2:51 ` JeffleXu
2020-11-06 2:57 ` JeffleXu
2020-11-06 17:45 ` Mike Snitzer
2020-11-06 17:45 ` Mike Snitzer
2020-11-08 1:09 ` [dm-devel] " JeffleXu
2020-11-08 1:09 ` JeffleXu
2020-11-09 18:15 ` Mike Snitzer
2020-11-09 18:15 ` Mike Snitzer
2020-11-10 1:43 ` [dm-devel] " JeffleXu
2020-11-10 1:43 ` JeffleXu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201021203906.GA10896@redhat.com \
--to=snitzer@redhat.com \
--cc=axboe@kernel.dk \
--cc=dm-devel@redhat.com \
--cc=haoxu@linux.alibaba.com \
--cc=jefflexu@linux.alibaba.com \
--cc=joseph.qi@linux.alibaba.com \
--cc=linux-block@vger.kernel.org \
--cc=xiaoguang.wang@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.