From: Ming Lei <ming.lei@redhat.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org, linux-block@vger.kernel.org,
	Kevin Wolf <kwolf@redhat.com>
Subject: Re: [PATCH V3 7/9] io_uring: support providing sqe group buffer
Date: Wed, 12 Jun 2024 08:22:07 +0800
Message-ID: <ZmjqL+2JqBUSB5vZ@fedora>
In-Reply-To: <ae3941f8-36a4-42fc-aaf8-027fe2de2d4d@gmail.com>

On Mon, Jun 10, 2024 at 03:00:23AM +0100, Pavel Begunkov wrote:
> On 5/11/24 01:12, Ming Lei wrote:
> > SQE group with REQ_F_SQE_GROUP_DEP introduces one new mechanism to share
> > resource among one group of requests, and all member requests can consume
> > the resource provided by group lead efficiently in parallel.
> > 
> > This patch uses the added sqe group feature REQ_F_SQE_GROUP_DEP to share
> > kernel buffer in sqe group:
> > 
> > - the group lead provides kernel buffer to member requests
> > 
> > - member requests use the provided buffer to do FS or network IO, or more
> > operations in future
> > 
> > - this kernel buffer is returned back after member requests use it up
> > 
> > This way looks a bit similar with kernel's pipe/splice, but there are some
> > important differences:
> > 
> > - splice is for transferring data between two FDs via pipe, and fd_out can
> > only read data from pipe; this feature can borrow buffer from group lead to
> > members, so member request can write data to this buffer if the provided
> > buffer is allowed to write to.
> > 
> > - splice implements data transfer by moving pages between subsystem and
> > pipe, that means page ownership is transferred, and this way is one of the
> > most complicated thing of splice; this patch supports scenarios in which
> > the buffer can't be transferred, and buffer is only borrowed to member
> > requests, and is returned back after member requests consume the provided
> > buffer, so buffer lifetime is simplified a lot. Especially the buffer is
> > guaranteed to be returned back.
> > 
> > - splice can't run in async way basically
> > 
> > It can help to implement generic zero copy between device and related
> > operations, such as ublk, fuse, vdpa, even network receive or whatever.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   include/linux/io_uring_types.h | 33 +++++++++++++++++++
> >   io_uring/io_uring.c            | 10 +++++-
> >   io_uring/io_uring.h            |  5 +++
> >   io_uring/kbuf.c                | 60 ++++++++++++++++++++++++++++++++++
> >   io_uring/kbuf.h                | 13 ++++++++
> >   io_uring/net.c                 | 31 +++++++++++++++++-
> >   io_uring/opdef.c               |  5 +++
> >   io_uring/opdef.h               |  2 ++
> >   io_uring/rw.c                  | 20 +++++++++++-
> >   9 files changed, 176 insertions(+), 3 deletions(-)
> > 
> ...
> > diff --git a/io_uring/net.c b/io_uring/net.c
> > index 070dea9a4eda..83fd5879082e 100644
> > --- a/io_uring/net.c
> > +++ b/io_uring/net.c
> > @@ -79,6 +79,13 @@ struct io_sr_msg {
> ...
> >   retry_bundle:
> >   	if (io_do_buffer_select(req)) {
> >   		struct buf_sel_arg arg = {
> > @@ -1132,6 +1148,11 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
> >   		if (unlikely(ret))
> >   			goto out_free;
> >   		sr->buf = NULL;
> > +	} else if (req->flags & REQ_F_GROUP_KBUF) {
> > +		ret = io_import_group_kbuf(req, user_ptr_to_u64(sr->buf),
> > +				sr->len, ITER_DEST, &kmsg->msg.msg_iter);
> > +		if (unlikely(ret))
> > +			goto out_free;
> >   	}
> >   	kmsg->msg.msg_inq = -1;
> > @@ -1334,6 +1355,14 @@ static int io_send_zc_import(struct io_kiocb *req, struct io_async_msghdr *kmsg)
> >   		if (unlikely(ret))
> >   			return ret;
> >   		kmsg->msg.sg_from_iter = io_sg_from_iter;
> > +	} else if (req->flags & REQ_F_GROUP_KBUF) {
> > +		struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
> > +
> > +		ret = io_import_group_kbuf(req, user_ptr_to_u64(sr->buf),
> > +				sr->len, ITER_SOURCE, &kmsg->msg.msg_iter);
> > +		if (unlikely(ret))
> > +			return ret;
> > +		kmsg->msg.sg_from_iter = io_sg_from_iter;
> 
> Not looking here too deeply I'm pretty sure it's buggy.
> The buffer can only be reused once the notification
> CQE completes, and there is nothing in regards to it.

OK. This isn't triggered in ublk-nbd because the buffer stays valid
until the peer's reply is received, and by then the notification has
definitely been posted.
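
For reference, the notification rule for send zc can be modeled in a few
lines of plain C. This is only a user-space sketch under stated
assumptions, not kernel code and not part of the patch: struct
zc_buf_state and the zc_buf_*() helpers are hypothetical names made up
for illustration, and the two flag values are copied from the io_uring
uapi header (IORING_CQE_F_MORE and IORING_CQE_F_NOTIF).

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/io_uring.h. */
#define CQE_F_MORE  (1U << 1)	/* IORING_CQE_F_MORE: another CQE follows */
#define CQE_F_NOTIF (1U << 3)	/* IORING_CQE_F_NOTIF: zero-copy notification */

/* Hypothetical tracker for one borrowed buffer used by a send zc request. */
struct zc_buf_state {
	bool send_done;		/* first CQE (send result) was seen */
	bool notif_done;	/* notification CQE was seen, or none is coming */
};

/* Feed the flags of each CQE that belongs to the send zc request. */
static void zc_buf_on_cqe(struct zc_buf_state *st, uint32_t cqe_flags)
{
	if (cqe_flags & CQE_F_NOTIF) {
		/* The stack no longer references the buffer pages. */
		st->notif_done = true;
		return;
	}

	st->send_done = true;
	/* Without F_MORE no notification CQE will be posted. */
	if (!(cqe_flags & CQE_F_MORE))
		st->notif_done = true;
}

/* The buffer may be handed back to its provider only when this is true. */
static bool zc_buf_can_return(const struct zc_buf_state *st)
{
	return st->send_done && st->notif_done;
}
```

In this model, a group buffer used for send zc could not be returned to
the group lead when the member request posts its first CQE; it would have
to be held until zc_buf_can_return() is true.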

I will drop send zc support from the enablement series; it can be
added later without much difficulty.


Thanks,
Ming

