All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org, Pavel Begunkov <asml.silence@gmail.com>,
	linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>,
	Keith Busch <kbusch@kernel.org>
Subject: Re: [RFC PATCH 3/7] io_uring: support to register bvec buffer to specified io_uring
Date: Fri, 2 May 2025 23:59:33 +0800	[thread overview]
Message-ID: <aBTr5fz5KOgd9RiD@fedora> (raw)
In-Reply-To: <CADUfDZoROJeDKNWOzbgEqrs_B7kU2qNWwZxfnS2TDqYxiXrY0w@mail.gmail.com>

On Thu, May 01, 2025 at 06:31:03PM -0700, Caleb Sander Mateos wrote:
> On Wed, Apr 30, 2025 at 8:34 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Mon, Apr 28, 2025 at 05:43:12PM -0700, Caleb Sander Mateos wrote:
> > > On Mon, Apr 28, 2025 at 2:44 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > Extend io_buffer_register_bvec() and io_buffer_unregister_bvec() for
> > > > supporting to register/unregister bvec buffer to specified io_uring,
> > > > which FD is usually passed from userspace.
> > > >
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >  include/linux/io_uring/cmd.h |  4 ++
> > > >  io_uring/rsrc.c              | 83 +++++++++++++++++++++++++++---------
> > > >  2 files changed, 67 insertions(+), 20 deletions(-)
> > > >
> > > > diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
> > > > index 78fa336a284b..7516fe5cd606 100644
> > > > --- a/include/linux/io_uring/cmd.h
> > > > +++ b/include/linux/io_uring/cmd.h
> > > > @@ -25,6 +25,10 @@ struct io_uring_cmd_data {
> > > >
> > > >  struct io_buf_data {
> > > >         unsigned short index;
> > > > +       bool has_fd;
> > > > +       bool registered_fd;
> > > > +
> > > > +       int ring_fd;
> > > >         struct request *rq;
> > > >         void (*release)(void *);
> > > >  };
> > > > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> > > > index 5f8ab130a573..701dd33fecf7 100644
> > > > --- a/io_uring/rsrc.c
> > > > +++ b/io_uring/rsrc.c
> > > > @@ -969,21 +969,6 @@ static int __io_buffer_register_bvec(struct io_ring_ctx *ctx,
> > > >         return 0;
> > > >  }
> > > >
> > > > -int io_buffer_register_bvec(struct io_uring_cmd *cmd,
> > > > -                           struct io_buf_data *buf,
> > > > -                           unsigned int issue_flags)
> > > > -{
> > > > -       struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> > > > -       int ret;
> > > > -
> > > > -       io_ring_submit_lock(ctx, issue_flags);
> > > > -       ret = __io_buffer_register_bvec(ctx, buf);
> > > > -       io_ring_submit_unlock(ctx, issue_flags);
> > > > -
> > > > -       return ret;
> > > > -}
> > > > -EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
> > > > -
> > > >  static int __io_buffer_unregister_bvec(struct io_ring_ctx *ctx,
> > > >                                        struct io_buf_data *buf)
> > > >  {
> > > > @@ -1006,19 +991,77 @@ static int __io_buffer_unregister_bvec(struct io_ring_ctx *ctx,
> > > >         return 0;
> > > >  }
> > > >
> > > > -int io_buffer_unregister_bvec(struct io_uring_cmd *cmd,
> > > > -                             struct io_buf_data *buf,
> > > > -                             unsigned int issue_flags)
> > > > +static inline int do_reg_unreg_bvec(struct io_ring_ctx *ctx,
> > > > +                                   struct io_buf_data *buf,
> > > > +                                   unsigned int issue_flags,
> > > > +                                   bool reg)
> > > >  {
> > > > -       struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> > > >         int ret;
> > > >
> > > >         io_ring_submit_lock(ctx, issue_flags);
> > > > -       ret = __io_buffer_unregister_bvec(ctx, buf);
> > > > +       if (reg)
> > > > +               ret = __io_buffer_register_bvec(ctx, buf);
> > > > +       else
> > > > +               ret = __io_buffer_unregister_bvec(ctx, buf);
> > >
> > > It feels like unifying __io_buffer_register_bvec() and
> > > __io_buffer_unregister_bvec() would belong better in the prior patch
> > > that changes their signatures.
> >
> > Can you share how to do above in previous patch?
> 
> I was thinking you could define do_reg_unreg_bvec() in the previous
> patch. It's a logical step once you've extracted out all the
> differences between io_buffer_register_bvec() and
> io_buffer_unregister_bvec() into the helpers
> __io_buffer_register_bvec() and __io_buffer_unregister_bvec(). But
> either way is fine.

'has_fd' and 'ring_fd' fields isn't added yet, the defined do_reg_unreg_bvec()
could be quite simple, looks no big difference, I can do that...

> 
> >
> > >
> > > >         io_ring_submit_unlock(ctx, issue_flags);
> > > >
> > > >         return ret;
> > > >  }
> > > > +
> > > > +static int io_buffer_reg_unreg_bvec(struct io_ring_ctx *ctx,
> > > > +                                   struct io_buf_data *buf,
> > > > +                                   unsigned int issue_flags,
> > > > +                                   bool reg)
> > > > +{
> > > > +       struct io_ring_ctx *remote_ctx = ctx;
> > > > +       struct file *file = NULL;
> > > > +       int ret;
> > > > +
> > > > +       if (buf->has_fd) {
> > > > +               file = io_uring_register_get_file(buf->ring_fd, buf->registered_fd);
> > > > +               if (IS_ERR(file))
> > > > +                       return PTR_ERR(file);
> > >
> > > It would be good to avoid the overhead of this lookup and
> > > reference-counting in the I/O path. Would it be possible to move this
> > > lookup to when UBLK_IO_FETCH_REQ (and UBLK_IO_COMMIT_AND_FETCH_REQ, if
> > > it specifies a different ring_fd) is submitted? I guess that might
> > > require storing an extra io_ring_ctx pointer in struct ublk_io.
> >
> > Let's start from the flexible way & simple implementation.
> >
> > Any optimization & improvement can be done as follow-up.
> 
> Sure, we can start with this as-is. But I suspect the extra
> reference-counting here will significantly decrease the benefit of the
> auto-register register feature.

The reference-counting should only be needed for registering buffer to
external ring, which may have been slow because of the cross-ring thing...

Maybe we can start automatic buffer register for ubq_daemon context only,
meantime allow to register buffer from external io_uring by adding per-io
spin_lock, which may help the per-io task Uday is working on too.

And the interface still allow to support automatic buffer register to
external io_uring since `ublk_auto_buf_reg` includes 'flags' field, we can
enable it in future when efficient implementation is figured out.

What do you think of this approach?

> 
> >
> > Each command may register buffer to different io_uring context,
> > it can't be done in UBLK_IO_FETCH_REQ stage, because new IO with same
> > tag may register buffer to new io_uring context.
> 
> Right, if UBLK_IO_COMMIT_AND_FETCH_REQ specifies a different io_uring
> fd, then we'd have to look it up anew. But it seems likely that all
> UBLK_IO_COMMIT_AND_FETCH_REQs for a single tag will specify the same
> io_uring (certainly that's how our ublk server works). And in that
> case, the I/O could just reuse the io_uring context that was looked up
> for the prior UBLK_IO_(COMMIT_AND_)FETCH_REQ.

It is a special case, and one follow-up feature flag can help to
optimize this case only.


Thanks, 
Ming


  reply	other threads:[~2025-05-02 15:59 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-28  9:44 [RFC PATCH 0/7] ublk: support to register bvec buffer automatically Ming Lei
2025-04-28  9:44 ` [RFC PATCH 1/7] io_uring: add 'struct io_buf_data' for register/unregister bvec buffer Ming Lei
2025-04-29  0:35   ` Caleb Sander Mateos
2025-04-28  9:44 ` [RFC PATCH 2/7] io_uring: add helper __io_buffer_[un]register_bvec Ming Lei
2025-04-29  0:36   ` Caleb Sander Mateos
2025-04-28  9:44 ` [RFC PATCH 3/7] io_uring: support to register bvec buffer to specified io_uring Ming Lei
2025-04-28 10:28   ` Pavel Begunkov
2025-04-29  0:46     ` Caleb Sander Mateos
2025-04-29  0:47     ` Ming Lei
2025-04-30  8:25       ` Pavel Begunkov
2025-04-30 14:44         ` Ming Lei
2025-04-29  0:43   ` Caleb Sander Mateos
2025-04-30 15:34     ` Ming Lei
2025-05-02  1:31       ` Caleb Sander Mateos
2025-05-02 15:59         ` Ming Lei [this message]
2025-05-02 21:21           ` Caleb Sander Mateos
2025-05-03  1:00             ` Ming Lei
2025-05-03 18:55               ` Caleb Sander Mateos
2025-05-06  2:45                 ` Ming Lei
2025-04-28  9:44 ` [RFC PATCH 4/7] ublk: convert to refcount_t Ming Lei
2025-04-28 17:13   ` Caleb Sander Mateos
2025-04-28  9:44 ` [RFC PATCH 5/7] ublk: prepare for supporting to register request buffer automatically Ming Lei
2025-04-29  0:50   ` Caleb Sander Mateos
2025-04-28  9:44 ` [RFC PATCH 6/7] ublk: register buffer to specified io_uring & buf index via UBLK_F_AUTO_BUF_REG Ming Lei
2025-04-29  0:52   ` Caleb Sander Mateos
2025-04-30 15:45     ` Ming Lei
2025-04-30 16:30       ` Caleb Sander Mateos
2025-05-02 14:09         ` Ming Lei
2025-04-28  9:44 ` [RFC PATCH 7/7] selftests: ublk: support UBLK_F_AUTO_BUF_REG Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBTr5fz5KOgd9RiD@fedora \
    --to=ming.lei@redhat.com \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=io-uring@vger.kernel.org \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.