public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
	Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH 5/8] ublk: document zero copy feature
Date: Wed, 26 Mar 2025 10:29:53 +0800	[thread overview]
Message-ID: <Z-NmobqY9C4o8ESL@fedora> (raw)
In-Reply-To: <CADUfDZquEOA7ckJVkBbcso2Paw9viSa9-5eQptgRgQRoxgvVqA@mail.gmail.com>

On Tue, Mar 25, 2025 at 08:26:18AM -0700, Caleb Sander Mateos wrote:
> On Mon, Mar 24, 2025 at 6:49 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Add words to explain how zero copy feature works, and why it has to be
> > trusted for handling IO read command.
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  Documentation/block/ublk.rst | 28 ++++++++++++++++++++--------
> >  1 file changed, 20 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
> > index 1e0e7358e14a..33efff25b54d 100644
> > --- a/Documentation/block/ublk.rst
> > +++ b/Documentation/block/ublk.rst
> > @@ -309,18 +309,30 @@ with specified IO tag in the command data:
> >    ``UBLK_IO_COMMIT_AND_FETCH_REQ`` to the server, ublkdrv needs to copy
> >    the server buffer (pages) read to the IO request pages.
> >
> > -Future development
> > -==================
> > -
> >  Zero copy
> >  ---------
> >
> > -Zero copy is a generic requirement for nbd, fuse or similar drivers. A
> > -problem [#xiaoguang]_ Xiaoguang mentioned is that pages mapped to userspace
> > -can't be remapped any more in kernel with existing mm interfaces. This can
> > -occurs when destining direct IO to ``/dev/ublkb*``. Also, he reported that
> > -big requests (IO size >= 256 KB) may benefit a lot from zero copy.
> > +ublk zero copy relies on io_uring's fixed kernel buffer, which provides
> > +two APIs: `io_buffer_register_bvec()` and `io_buffer_unregister_bvec`.
> 
> nit: missing parentheses after io_buffer_unregister_bvec

OK

> 
> > +
> > +ublk adds IO command of `UBLK_IO_REGISTER_IO_BUF` to call
> > +`io_buffer_register_bvec()` for ublk server to register client request
> > +buffer into io_uring buffer table, then ublk server can submit io_uring
> > +IOs with the registered buffer index. IO command of `UBLK_IO_UNREGISTER_IO_BUF`
> > +calls `io_buffer_unregister_bvec` to unregister the buffer.
> 
> Parentheses missing here too.
> It might be worth adding a note that the registered buffer and any I/O
> that uses it will hold a reference on the ublk request. For a ublk
> server implementer, I think it's useful to know that the buffer needs
> to be unregistered in order to complete the ublk request, and that the
> zero-copy I/O won't corrupt any data even if it completes after the
> buffer is unregistered.

Good point, how about the following words?

```
The ublk client IO request buffer is guaranteed to be live between calling
`io_buffer_register_bvec()` and `io_buffer_unregister_bvec()`.

And any io_uring operation which supports this kind of kernel buffer will
grab one reference of the buffer until the operation is completed.
```

> 
> > +
> > +ublk server implementing zero copy has to be CAP_SYS_ADMIN and be trusted,
> > +because it is ublk server's responsibility to make sure IO buffer filled
> > +with data, and ublk server has to handle short read correctly by returning
> > +correct bytes filled to io buffer. Otherwise, uninitialized kernel buffer
> > +will be exposed to client application.
> 
> This isn't specific to zero-copy, right? ublk devices configured with
> UBLK_F_USER_COPY also need to initialize the full request buffer. I

Right, but it is important for zero copy.

> would also drop the "handle short read" part; ublk servers don't
> necessarily issue read operations in the backend, so "short read" may
> not be applicable.

Here 'read' in 'short read' means ublk front READ command, not actual read
done in ublk server. Maybe we can make it more accurate:

```
..., and ublk server has to return correct result to ublk driver when handling
READ command, and the result has to match with how many bytes filled to the IO
buffer. Otherwise, uninitialized kernel IO buffer will be exposed to client
application.
```

Thanks,
Ming


  reply	other threads:[~2025-03-26  2:30 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-24 13:48 [PATCH 0/8] ublk: cleanup & improvement & zc follow-up Ming Lei
2025-03-24 13:48 ` [PATCH 1/8] ublk: remove two unused fields from 'struct ublk_queue' Ming Lei
2025-03-24 14:32   ` Caleb Sander Mateos
2025-03-24 13:48 ` [PATCH 2/8] ublk: add helper of ublk_need_map_io() Ming Lei
2025-03-24 15:01   ` Caleb Sander Mateos
2025-03-24 13:48 ` [PATCH 3/8] ublk: truncate io command result Ming Lei
2025-03-24 15:51   ` Caleb Sander Mateos
2025-03-25  0:50     ` Ming Lei
2025-03-24 13:48 ` [PATCH 4/8] ublk: add segment parameter Ming Lei
2025-03-24 22:26   ` Caleb Sander Mateos
2025-03-25  1:15     ` Ming Lei
2025-03-25 19:43       ` Caleb Sander Mateos
2025-03-26  2:17         ` Ming Lei
2025-03-26 16:43           ` Caleb Sander Mateos
2025-03-27  1:49             ` Ming Lei
2025-03-24 13:49 ` [PATCH 5/8] ublk: document zero copy feature Ming Lei
2025-03-25 15:26   ` Caleb Sander Mateos
2025-03-26  2:29     ` Ming Lei [this message]
2025-03-26 16:48       ` Caleb Sander Mateos
2025-03-24 13:49 ` [PATCH 6/8] ublk: implement ->queue_rqs() Ming Lei
2025-03-24 17:07   ` Uday Shankar
2025-03-25  1:02     ` Ming Lei
2025-03-26 21:30   ` Caleb Sander Mateos
2025-03-27  9:15     ` Ming Lei
2025-03-24 13:49 ` [PATCH 7/8] selftests: ublk: add more tests for covering MQ Ming Lei
2025-03-24 13:49 ` [PATCH 8/8] selftests: ublk: add test for checking zero copy related parameter Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z-NmobqY9C4o8ESL@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox