From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH] ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG)
Date: Thu, 12 Jun 2025 11:16:23 +0800 [thread overview]
Message-ID: <aEpGh41uV3AJF-dG@fedora> (raw)
In-Reply-To: <CADUfDZo-5ft7Krx==YLYbabPE+3Z1Yjrw2zcmn7VRqfx5XyFgg@mail.gmail.com>
On Wed, Jun 11, 2025 at 08:54:53AM -0700, Caleb Sander Mateos wrote:
> On Mon, Jun 9, 2025 at 7:07 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Mon, Jun 09, 2025 at 03:29:34PM -0700, Caleb Sander Mateos wrote:
> > > On Mon, Jun 9, 2025 at 5:14 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > Document recently merged feature auto buffer registration(UBLK_F_AUTO_BUF_REG).
> > > >
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > >
> > > Thanks, this is a nice explanation. Just a few suggestions.
> > >
> > > > ---
> > > > Documentation/block/ublk.rst | 67 ++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 67 insertions(+)
> > > >
> > > > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> > > > index c368e1081b41..16ffca54eed4 100644
> > > > --- a/Documentation/block/ublk.rst
> > > > +++ b/Documentation/block/ublk.rst
> > > > @@ -352,6 +352,73 @@ For reaching best IO performance, ublk server should align its segment
> > > > parameter of `struct ublk_param_segment` with backend for avoiding
> > > > unnecessary IO split, which usually hurts io_uring performance.
> > > >
> > > > +Auto Buffer Registration
> > > > +------------------------
> > > > +
> > > > +The ``UBLK_F_AUTO_BUF_REG`` feature automatically handles buffer registration
> > > > +and unregistration for I/O requests, which simplifies the buffer management
> > > > +process and reduces overhead in the ublk server implementation.
> > > > +
> > > > +This is another feature flag for using zero copy, and it is compatible with
> > > > +``UBLK_F_SUPPORT_ZERO_COPY``.
> > > > +
> > > > +Feature Overview
> > > > +~~~~~~~~~~~~~~~~
> > > > +
> > > > +This feature automatically registers request buffers to the io_uring context
> > > > +before delivering I/O commands to the ublk server and unregisters them when
> > > > +completing I/O commands. This eliminates the need for manual buffer
> > > > +registration/unregistration via ``UBLK_IO_REGISTER_IO_BUF`` and
> > > > +``UBLK_IO_UNREGISTER_IO_BUF`` commands, then IO handling in ublk server
> > > > +can avoid dependency on the two uring_cmd operations.
> > > > +
> > > > +This way not only simplifies ublk server implementation, but also makes
> > > > +concurrent IO handling becomes possible.
> > >
> > > I'm not sure what "concurrent IO handling" refers to. Any ublk server
> > > can handle incoming I/O requests concurrently, regardless of what
> > > features it has enabled. Do you mean it avoids the need for linked
> > > io_uring requests to properly order buffer registration and
> > > unregistration with the I/O operations using the registered buffer?
> >
> > Yes, if io_uring OPs depends on buffer registering & unregistering, these
> > OPs can't be issued concurrently any more, that is one io_uring constraint.
> >
> > I will add the above words.
> >
> > >
> > > > +
> > > > +Usage Requirements
> > > > +~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +1. The ublk server must create a sparse buffer table on the same ``io_ring_ctx``
> > > > + used for ``UBLK_IO_FETCH_REQ`` and ``UBLK_IO_COMMIT_AND_FETCH_REQ``.
> > > > +
> > > > +2. If uring_cmd is issued on a different ``io_ring_ctx``, manual buffer
> > > > + unregistration is required.
> > >
> > > nit: don't think this needs to be a separate point, could be combined with (1).
> >
> > OK.
> >
> > >
> > > > +
> > > > +3. Buffer registration data must be passed via uring_cmd's ``sqe->addr`` with the
> > > > + following structure::
> > >
> > > nit: extra ":"
> >
> > In reStructuredText (reST), the double colon :: serves as a literal block marker to
> > indicate preformatted text.
> >
> > >
> > > > +
> > > > + struct ublk_auto_buf_reg {
> > > > + __u16 index; /* Buffer index for registration */
> > > > + __u8 flags; /* Registration flags */
> > > > + __u8 reserved0; /* Reserved for future use */
> > > > + __u32 reserved1; /* Reserved for future use */
> > > > + };
> > >
> > > Suggest using ublk_auto_buf_reg_to_sqe_addr()? Otherwise, it seems
> > > ambiguous how this struct is "passed" in sqe->addr.
> >
> > OK
> >
> > >
> > > > +
> > > > +4. All reserved fields in ``ublk_auto_buf_reg`` must be zeroed.
> > > > +
> > > > +5. Optional flags can be passed via ``ublk_auto_buf_reg.flags``.
> > > > +
> > > > +Fallback Behavior
> > > > +~~~~~~~~~~~~~~~~~
> > > > +
> > > > +When ``UBLK_AUTO_BUF_REG_FALLBACK`` is enabled:
> > > > +
> > > > +1. If auto buffer registration fails:
> > >
> > > I would switch these. Both (1) and (2) refer to when auto buffer
> > > registration fails. So I would expect something like:
> > >
> > > If auto buffer registration fails:
> > >
> > > 1. When ``UBLK_AUTO_BUF_REG_FALLBACK`` is enabled:
> > > ...
> > > 2. If fallback is not enabled:
> > > ...
> > >
> > > > + - The uring_cmd is completed
> > >
> > > Maybe add "without registering the request buffer"?
> > >
> > > > + - ``UBLK_IO_F_NEED_REG_BUF`` is set in ``ublksrv_io_desc.op_flags``
> > > > + - The ublk server must manually register the buffer
> > >
> > > Only if it wants a registered buffer for the ublk request. Technically
> > > the ublk server could decide to fall back on user-copy, for example.
> >
> > Good catch!
> >
> > >
> > > > +
> > > > +2. If fallback is not enabled:
> > > > + - The ublk I/O request fails silently
> > >
> > > "silently" is a bit ambiguous. It's certainly not silent to the
> > > application submitting the ublk I/O. Maybe say that the ublk I/O
> > > request fails and no uring_cmd is completed to the ublk server?
> >
> > Yes, but the document focus on ublk side, and the client is generic
> > for every driver, so I guess it may be fine.
> >
> > >
> > > > +
> > > > +Limitations
> > > > +~~~~~~~~~~~
> > > > +
> > > > +- Requires same ``io_ring_ctx`` for all operations
> > >
> > > Another limitation that prevents us from adopting the auto buffer
> > > registration feature is the need to reserve a unique buffer table
> > > index for every ublk tag on the io_ring_ctx. Since the io_ring_ctx
> > > buffer table has a max size of 16K (could potentially be increased to
> > > 64K), this limit is easily reached when there are a large number of
> > > ublk devices or the ublk queue depth is large. I think we could remove
> > > this limitation in the future by adding support for allocating buffer
> > > indices on demand, analogous to IORING_FILE_INDEX_ALLOC.
> >
> > OK.
> >
> > But I guess it isn't big deal in reality since the task context should
> > be saturated easily with so big setting.
>
> I don't know about your "reality" but it's certainly a big deal for us :)
> To reduce contention on the blk-mq queues for the application
> submitting I/O to the ublk devices, we want a large number of queues
> for each ublk device. But we also want a large queue depth for each
> individual queue to avoid the async request allocation path in case
> any one application thread issues a lot of concurrent I/O to a single
> ublk device. And we have 128 ublk devices, which again all want large
> queue depths in case the application sends a lot of I/O to a single
> ublk device. The result is that concurrently each ublk server thread
> fetches 512K ublk I/Os, which is significantly above the io_ring_ctx
> buffer table limit.
Yes, you can setup 512K I/Os in single task/io_uring context, but how many
can be actively handled during unit time? The number could be much less than
512k or 16K, because it is a single pthread/io_uring/cpu core, which may be
saturated easily, so most of these IOs may wait somewhere for cpu or whatever
resource.
So when you have one nice per-task buf-index allocation algorithm, it may not
be one issue given 16K is big enough.
Thanks,
Ming
next prev parent reply other threads:[~2025-06-12 3:16 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-09 12:14 [PATCH] ublk: document auto buffer registration(UBLK_F_AUTO_BUF_REG) Ming Lei
2025-06-09 22:29 ` Caleb Sander Mateos
2025-06-10 2:06 ` Ming Lei
2025-06-11 15:54 ` Caleb Sander Mateos
2025-06-12 3:16 ` Ming Lei [this message]
2025-06-12 14:38 ` Caleb Sander Mateos
2025-06-13 1:18 ` Ming Lei
2025-06-13 1:36 ` Caleb Sander Mateos
2025-06-13 1:53 ` Ming Lei
2025-06-13 1:57 ` Caleb Sander Mateos
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aEpGh41uV3AJF-dG@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=linux-block@vger.kernel.org \
--cc=ushankar@purestorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.