From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH 4/8] ublk: add segment parameter
Date: Thu, 27 Mar 2025 09:49:39 +0800 [thread overview]
Message-ID: <Z-SusxWoRQ4W2c99@fedora> (raw)
In-Reply-To: <CADUfDZrHUbpRPkZ2UafmET7dd_0aex8o01fB4tiLkzDz5p0phw@mail.gmail.com>
On Wed, Mar 26, 2025 at 09:43:13AM -0700, Caleb Sander Mateos wrote:
> On Tue, Mar 25, 2025 at 7:17 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Tue, Mar 25, 2025 at 12:43:26PM -0700, Caleb Sander Mateos wrote:
> > > On Mon, Mar 24, 2025 at 6:16 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > On Mon, Mar 24, 2025 at 03:26:06PM -0700, Caleb Sander Mateos wrote:
> > > > > On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > > >
> > > > > > IO split is usually bad in io_uring world, since -EAGAIN is caused and
> > > > > > IO handling may have to fallback to io-wq, this way does hurt performance.
> > > > > >
> > > > > > ublk starts to support zero copy recently, for avoiding unnecessary IO
> > > > > > split, ublk driver's segment limit should be aligned with backend
> > > > > > device's segment limit.
> > > > > >
> > > > > > Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> > > > > > which number is aligned with ublk request segment number, so that big
> > > > > > memory allocation can be avoided by setting reasonable max_segments limit.
> > > > > >
> > > > > > So add segment parameter for providing ublk server chance to align
> > > > > > segment limit with backend, and keep it reasonable from implementation
> > > > > > viewpoint.
> > > > > >
> > > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > > ---
> > > > > > drivers/block/ublk_drv.c | 15 ++++++++++++++-
> > > > > > include/uapi/linux/ublk_cmd.h | 9 +++++++++
> > > > > > 2 files changed, 23 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > > index acb6aed7be75..53a463681a41 100644
> > > > > > --- a/drivers/block/ublk_drv.c
> > > > > > +++ b/drivers/block/ublk_drv.c
> > > > > > @@ -74,7 +74,7 @@
> > > > > > #define UBLK_PARAM_TYPE_ALL \
> > > > > > (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> > > > > > UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED | \
> > > > > > - UBLK_PARAM_TYPE_DMA_ALIGN)
> > > > > > + UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
> > > > > >
> > > > > > struct ublk_rq_data {
> > > > > > struct kref ref;
> > > > > > @@ -580,6 +580,13 @@ static int ublk_validate_params(const struct ublk_device *ub)
> > > > > > return -EINVAL;
> > > > > > }
> > > > > >
> > > > > > + if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> > > > > > + const struct ublk_param_segment *p = &ub->params.seg;
> > > > > > +
> > > > > > + if (!is_power_of_2(p->seg_boundary_mask + 1))
> > > > > > + return -EINVAL;
> > > > >
> > > > > Looking at blk_validate_limits(), it seems like there are some
> > > > > additional requirements? Looks like seg_boundary_mask has to be at
> > > > > least PAGE_SIZE - 1
> > > >
> > > > Yeah, it isn't done in ublk because block layer runs the check, and it
> > > > will be failed when starting the device. That said we take block layer's
> > > > default setting, which isn't good from UAPI viewpoint, since block
> > > > layer may change the default setting.
> > >
> > > Even though blk_validate_limits() rejects it, it appears to log a
> > > warning. That seems undesirable for something controllable from
> > > userspace.
> > > /*
> > > * By default there is no limit on the segment boundary alignment,
> > > * but if there is one it can't be smaller than the page size as
> > > * that would break all the normal I/O patterns.
> > > */
> > > if (!lim->seg_boundary_mask)
> > > lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
> > > if (WARN_ON_ONCE(lim->seg_boundary_mask < BLK_MIN_SEGMENT_SIZE - 1))
> > > return -EINVAL;
> >
> > Yes, it has been addressed in my local version, and we need to make it
> > a hw/sw interface.
> >
> > >
> > > >
> > > > Also it is bad to associate device property with PAGE_SIZE which is
> > > > a variable actually. The latest kernel has replaced PAGE_SIZE with 4096
> > > > for segment limits.
> > > >
> > > > I think we can take 4096 for validation here.
> > > >
> > > > > and max_segment_size has to be at least PAGE_SIZE
> > > > > if virt_boundary_mask is set?
> > > >
> > > > If virt_boundary_mask is set, max_segment_size will be ignored usually
> > > > except for some stacking devices.
> > >
> > > Sorry, I had it backwards. The requirement is if virt_boundary_mask is
> > > *not* set:
> > > /*
> > > * Stacking device may have both virtual boundary and max segment
> > > * size limit, so allow this setting now, and long-term the two
> > > * might need to move out of stacking limits since we have immutable
> > > * bvec and lower layer bio splitting is supposed to handle the two
> > > * correctly.
> > > */
> > > if (lim->virt_boundary_mask) {
> > > if (!lim->max_segment_size)
> > > lim->max_segment_size = UINT_MAX;
> > > } else {
> > > /*
> > > * The maximum segment size has an odd historic 64k default that
> > > * drivers probably should override. Just like the I/O size we
> > > * require drivers to at least handle a full page per segment.
> > > */
> > > if (!lim->max_segment_size)
> > > lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
> > > if (WARN_ON_ONCE(lim->max_segment_size < BLK_MIN_SEGMENT_SIZE))
> > > return -EINVAL;
> > > }
> >
> > Right.
> >
> > Please feel free to see if the revised patch is good:
> >
> >
> > From 0718b9f130b3bc9b9b06907c687fb5b9eea172f7 Mon Sep 17 00:00:00 2001
> > From: Ming Lei <ming.lei@redhat.com>
> > Date: Mon, 24 Mar 2025 12:33:59 +0000
> > Subject: [PATCH V2 3/8] ublk: add segment parameter
> >
> > IO split is usually bad in io_uring world, since -EAGAIN is caused and
> > IO handling may have to fallback to io-wq, this way does hurt performance.
> >
> > ublk starts to support zero copy recently, for avoiding unnecessary IO
> > split, ublk driver's segment limit should be aligned with backend
> > device's segment limit.
> >
> > Another reason is that io_buffer_register_bvec() needs to allocate bvecs,
> > which number is aligned with ublk request segment number, so that big
> > memory allocation can be avoided by setting reasonable max_segments limit.
> >
> > So add segment parameter for providing ublk server chance to align
> > segment limit with backend, and keep it reasonable from implementation
> > viewpoint.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/block/ublk_drv.c | 20 +++++++++++++++++++-
> > include/uapi/linux/ublk_cmd.h | 21 +++++++++++++++++++++
> > 2 files changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 6fa1384c6436..6367476cef2b 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -74,7 +74,7 @@
> > #define UBLK_PARAM_TYPE_ALL \
> > (UBLK_PARAM_TYPE_BASIC | UBLK_PARAM_TYPE_DISCARD | \
> > UBLK_PARAM_TYPE_DEVT | UBLK_PARAM_TYPE_ZONED | \
> > - UBLK_PARAM_TYPE_DMA_ALIGN)
> > + UBLK_PARAM_TYPE_DMA_ALIGN | UBLK_PARAM_TYPE_SEGMENT)
> >
> > struct ublk_rq_data {
> > struct kref ref;
> > @@ -580,6 +580,18 @@ static int ublk_validate_params(const struct ublk_device *ub)
> > return -EINVAL;
> > }
> >
> > + if (ub->params.types & UBLK_PARAM_TYPE_SEGMENT) {
> > + const struct ublk_param_segment *p = &ub->params.seg;
> > +
> > + if (!is_power_of_2(p->seg_boundary_mask + 1))
> > + return -EINVAL;
> > +
> > + if (p->seg_boundary_mask + 1 < UBLK_MIN_SEGMENT_SIZE)
> > + return -EINVAL;
> > + if (p->max_segment_size < UBLK_MIN_SEGMENT_SIZE)
> > + return -EINVAL;
>
> These checks look good, except they don't allow omitting
> seg_boundary_mask or max_segment_size. I can imagine a ublk server
> might want to set only some of the segment limits and leave the others
> as 0?
Any unset field in `ublk_param_segment` should be undefined behavior(or either
rejected or one default value is taken), we can document it.
Thanks for raising the issue.
`seg_boundary_mask` is very similar with `max_segment_size`, so if either
one of the two are set, the other one should get one default value if it is unset.
If either one of the above two are set, 'max_segments' can't be zero.
Thanks,
Ming
next prev parent reply other threads:[~2025-03-27 1:49 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
2025-03-24 13:48 ` [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
2025-03-24 14:32 ` Caleb Sander Mateos
2025-03-24 13:48 ` [PATCH 2/8] ublk: add helper of ublk_need_map_io() Ming Lei
2025-03-24 15:01 ` Caleb Sander Mateos
2025-03-24 13:48 ` [PATCH 3/8] ublk: truncate io command result Ming Lei
2025-03-24 15:51 ` Caleb Sander Mateos
2025-03-25 0:50 ` Ming Lei
2025-03-24 13:48 ` [PATCH 4/8] ublk: add segment parameter Ming Lei
2025-03-24 22:26 ` Caleb Sander Mateos
2025-03-25 1:15 ` Ming Lei
2025-03-25 19:43 ` Caleb Sander Mateos
2025-03-26 2:17 ` Ming Lei
2025-03-26 16:43 ` Caleb Sander Mateos
2025-03-27 1:49 ` Ming Lei [this message]
2025-03-24 13:49 ` [PATCH 5/8] ublk: document zero copy feature Ming Lei
2025-03-25 15:26 ` Caleb Sander Mateos
2025-03-26 2:29 ` Ming Lei
2025-03-26 16:48 ` Caleb Sander Mateos
2025-03-24 13:49 ` [PATCH 6/8] ublk: implement ->queue_rqs() Ming Lei
2025-03-24 17:07 ` Uday Shankar
2025-03-25 1:02 ` Ming Lei
2025-03-26 21:30 ` Caleb Sander Mateos
2025-03-27 9:15 ` Ming Lei
2025-03-24 13:49 ` [PATCH 7/8] selftests: ublk: add more tests for covering MQ Ming Lei
2025-03-24 13:49 ` [PATCH 8/8] selftests: ublk: add test for checking zero copy related parameter Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z-SusxWoRQ4W2c99@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=kbusch@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=ushankar@purestorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.