From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
io-uring@vger.kernel.org, Pavel Begunkov <asml.silence@gmail.com>,
linux-block@vger.kernel.org,
Uday Shankar <ushankar@purestorage.com>,
Keith Busch <kbusch@kernel.org>
Subject: Re: [RFC PATCH 3/7] io_uring: support to register bvec buffer to specified io_uring
Date: Tue, 6 May 2025 10:45:49 +0800 [thread overview]
Message-ID: <aBl33aVsZ-s2-Kpx@fedora> (raw)
In-Reply-To: <CADUfDZoypP63aBjwUB50hZTiZ_ouN1Bt73-hHBY75xsNq9OGZQ@mail.gmail.com>
On Sat, May 03, 2025 at 11:55:05AM -0700, Caleb Sander Mateos wrote:
> On Fri, May 2, 2025 at 6:00 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Fri, May 02, 2025 at 02:21:05PM -0700, Caleb Sander Mateos wrote:
> > > On Fri, May 2, 2025 at 8:59 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > On Thu, May 01, 2025 at 06:31:03PM -0700, Caleb Sander Mateos wrote:
> > > > > On Wed, Apr 30, 2025 at 8:34 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Apr 28, 2025 at 05:43:12PM -0700, Caleb Sander Mateos wrote:
> > > > > > > On Mon, Apr 28, 2025 at 2:44 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > > > > >
> > > > > > > > Extend io_buffer_register_bvec() and io_buffer_unregister_bvec() for
> > > > > > > > supporting to register/unregister bvec buffer to specified io_uring,
> > > > > > > > which FD is usually passed from userspace.
> > > > > > > >
> > > > > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > > > > ---
> > > > > > > > include/linux/io_uring/cmd.h | 4 ++
> > > > > > > > io_uring/rsrc.c | 83 +++++++++++++++++++++++++++---------
> > > > > > > > 2 files changed, 67 insertions(+), 20 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
> > > > > > > > index 78fa336a284b..7516fe5cd606 100644
> > > > > > > > --- a/include/linux/io_uring/cmd.h
> > > > > > > > +++ b/include/linux/io_uring/cmd.h
> > > > > > > > @@ -25,6 +25,10 @@ struct io_uring_cmd_data {
> > > > > > > >
> > > > > > > > struct io_buf_data {
> > > > > > > > unsigned short index;
> > > > > > > > + bool has_fd;
> > > > > > > > + bool registered_fd;
> > > > > > > > +
> > > > > > > > + int ring_fd;
> > > > > > > > struct request *rq;
> > > > > > > > void (*release)(void *);
> > > > > > > > };
> > > > > > > > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> > > > > > > > index 5f8ab130a573..701dd33fecf7 100644
> > > > > > > > --- a/io_uring/rsrc.c
> > > > > > > > +++ b/io_uring/rsrc.c
> > > > > > > > @@ -969,21 +969,6 @@ static int __io_buffer_register_bvec(struct io_ring_ctx *ctx,
> > > > > > > > return 0;
> > > > > > > > }
> > > > > > > >
> > > > > > > > -int io_buffer_register_bvec(struct io_uring_cmd *cmd,
> > > > > > > > - struct io_buf_data *buf,
> > > > > > > > - unsigned int issue_flags)
> > > > > > > > -{
> > > > > > > > - struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> > > > > > > > - int ret;
> > > > > > > > -
> > > > > > > > - io_ring_submit_lock(ctx, issue_flags);
> > > > > > > > - ret = __io_buffer_register_bvec(ctx, buf);
> > > > > > > > - io_ring_submit_unlock(ctx, issue_flags);
> > > > > > > > -
> > > > > > > > - return ret;
> > > > > > > > -}
> > > > > > > > -EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
> > > > > > > > -
> > > > > > > > static int __io_buffer_unregister_bvec(struct io_ring_ctx *ctx,
> > > > > > > > struct io_buf_data *buf)
> > > > > > > > {
> > > > > > > > @@ -1006,19 +991,77 @@ static int __io_buffer_unregister_bvec(struct io_ring_ctx *ctx,
> > > > > > > > return 0;
> > > > > > > > }
> > > > > > > >
> > > > > > > > -int io_buffer_unregister_bvec(struct io_uring_cmd *cmd,
> > > > > > > > - struct io_buf_data *buf,
> > > > > > > > - unsigned int issue_flags)
> > > > > > > > +static inline int do_reg_unreg_bvec(struct io_ring_ctx *ctx,
> > > > > > > > + struct io_buf_data *buf,
> > > > > > > > + unsigned int issue_flags,
> > > > > > > > + bool reg)
> > > > > > > > {
> > > > > > > > - struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> > > > > > > > int ret;
> > > > > > > >
> > > > > > > > io_ring_submit_lock(ctx, issue_flags);
> > > > > > > > - ret = __io_buffer_unregister_bvec(ctx, buf);
> > > > > > > > + if (reg)
> > > > > > > > + ret = __io_buffer_register_bvec(ctx, buf);
> > > > > > > > + else
> > > > > > > > + ret = __io_buffer_unregister_bvec(ctx, buf);
> > > > > > >
> > > > > > > It feels like unifying __io_buffer_register_bvec() and
> > > > > > > __io_buffer_unregister_bvec() would belong better in the prior patch
> > > > > > > that changes their signatures.
> > > > > >
> > > > > > Can you share how to do above in previous patch?
> > > > >
> > > > > I was thinking you could define do_reg_unreg_bvec() in the previous
> > > > > patch. It's a logical step once you've extracted out all the
> > > > > differences between io_buffer_register_bvec() and
> > > > > io_buffer_unregister_bvec() into the helpers
> > > > > __io_buffer_register_bvec() and __io_buffer_unregister_bvec(). But
> > > > > either way is fine.
> > > >
> > > > 'has_fd' and 'ring_fd' fields isn't added yet, the defined do_reg_unreg_bvec()
> > > > could be quite simple, looks no big difference, I can do that...
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > io_ring_submit_unlock(ctx, issue_flags);
> > > > > > > >
> > > > > > > > return ret;
> > > > > > > > }
> > > > > > > > +
> > > > > > > > +static int io_buffer_reg_unreg_bvec(struct io_ring_ctx *ctx,
> > > > > > > > + struct io_buf_data *buf,
> > > > > > > > + unsigned int issue_flags,
> > > > > > > > + bool reg)
> > > > > > > > +{
> > > > > > > > + struct io_ring_ctx *remote_ctx = ctx;
> > > > > > > > + struct file *file = NULL;
> > > > > > > > + int ret;
> > > > > > > > +
> > > > > > > > + if (buf->has_fd) {
> > > > > > > > + file = io_uring_register_get_file(buf->ring_fd, buf->registered_fd);
> > > > > > > > + if (IS_ERR(file))
> > > > > > > > + return PTR_ERR(file);
> > > > > > >
> > > > > > > It would be good to avoid the overhead of this lookup and
> > > > > > > reference-counting in the I/O path. Would it be possible to move this
> > > > > > > lookup to when UBLK_IO_FETCH_REQ (and UBLK_IO_COMMIT_AND_FETCH_REQ, if
> > > > > > > it specifies a different ring_fd) is submitted? I guess that might
> > > > > > > require storing an extra io_ring_ctx pointer in struct ublk_io.
> > > > > >
> > > > > > Let's start from the flexible way & simple implementation.
> > > > > >
> > > > > > Any optimization & improvement can be done as follow-up.
> > > > >
> > > > > Sure, we can start with this as-is. But I suspect the extra
> > > > > reference-counting here will significantly decrease the benefit of the
> > > > > auto-register register feature.
> > > >
> > > > The reference-counting should only be needed for registering buffer to
> > > > external ring, which may have been slow because of the cross-ring thing...
> > >
> > > The current code is incrementing and decrementing the io_uring file
> > > reference count even if the remote_ctx == ctx, right? I agree it
> >
> > Yes, but it can be changed to drop the inc/dec file reference easily since we
> > have a flag field.
> >
> > > should definitely be possible to skip the reference count in that
> > > case, as this code is already running in task work context for a
> > > command on the io_uring.
> >
> > The current 'uring_cmd' instance holds one reference of the
> > io_ring_ctx instance.
> >
> > > It should also be possible to avoid atomic
> > > reference-counting in the UBLK_AUTO_BUF_REGISTERED_RING case too.
> >
> > For registering buffer to external io_ring, it is hard to avoid to grag
> > the io_uring_ctx reference when specifying the io_uring_ctx via its FD.
>
> If the io_uring is specified by a file descriptor (not using
> UBLK_AUTO_BUF_REGISTERED_RING), I agree reference counting is
> necessary.
> But the whole point of registering ring fds is to avoid reference
> counting of the io_uring file. See how IORING_ENTER_REGISTERED_RING is
> handled in io_uring_enter(). It simply indexes
> current->io_uring->registered_rings to get the file, skipping the
> fget() and fput(). Since the auto register is running in task work
> context, it should also be able to access the task-local
> registered_rings without reference counting.
registered ring requires the io_uring is registered & used in the local
pthread, which usage is still very limited.
>
> >
> > >
> > > >
> > > > Maybe we can start automatic buffer register for ubq_daemon context only,
> > > > meantime allow to register buffer from external io_uring by adding per-io
> > > > spin_lock, which may help the per-io task Uday is working on too.
> > >
> > > I'm not sure I understand why a spinlock would be required? In Uday's
> > > patch set, each ublk_io still belongs to a single task. So no
> > > additional locking should be required.
> >
> > I think it is very useful to allow to register io buffer in the
> > other(non-ubq_daemon) io_uring context by the offload style.
> >
> > Especially the register/unregister io buffer uring_cmd is for handling
> > target IO, which should have been issued in same context of target io
> > handling.
> >
> > Without one per-io spinlock, it is hard to avoid one race you mentioned:
>
> I don't believe a spinlock is necessary. It should be possible to
> avoid accessing the ublk_io at all when registering the request
> buffer. __ublk_check_and_get_req() calls kref_get_unless_zero() on the
> request, which already ensures the request is owned by the ublk server
I thought the request still may be completed & recycled before calling
__ublk_check_and_get_req(). But it can be treated as one ublk server
logic bug since use-after-free doesn't exist actually.
> and prevents it from completing while its buffer is registered. This
> is analogous to how UBLK_F_USER_COPY works;
> ublk_ch_read_iter()/ublk_ch_write_iter() can be safely called from any
> thread.
OK, spinlock isn't needed.
Thanks,
Ming
next prev parent reply other threads:[~2025-05-06 2:46 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-28 9:44 [RFC PATCH 0/7] ublk: support to register bvec buffer automatically Ming Lei
2025-04-28 9:44 ` [RFC PATCH 1/7] io_uring: add 'struct io_buf_data' for register/unregister bvec buffer Ming Lei
2025-04-29 0:35 ` Caleb Sander Mateos
2025-04-28 9:44 ` [RFC PATCH 2/7] io_uring: add helper __io_buffer_[un]register_bvec Ming Lei
2025-04-29 0:36 ` Caleb Sander Mateos
2025-04-28 9:44 ` [RFC PATCH 3/7] io_uring: support to register bvec buffer to specified io_uring Ming Lei
2025-04-28 10:28 ` Pavel Begunkov
2025-04-29 0:46 ` Caleb Sander Mateos
2025-04-29 0:47 ` Ming Lei
2025-04-30 8:25 ` Pavel Begunkov
2025-04-30 14:44 ` Ming Lei
2025-04-29 0:43 ` Caleb Sander Mateos
2025-04-30 15:34 ` Ming Lei
2025-05-02 1:31 ` Caleb Sander Mateos
2025-05-02 15:59 ` Ming Lei
2025-05-02 21:21 ` Caleb Sander Mateos
2025-05-03 1:00 ` Ming Lei
2025-05-03 18:55 ` Caleb Sander Mateos
2025-05-06 2:45 ` Ming Lei [this message]
2025-04-28 9:44 ` [RFC PATCH 4/7] ublk: convert to refcount_t Ming Lei
2025-04-28 17:13 ` Caleb Sander Mateos
2025-04-28 9:44 ` [RFC PATCH 5/7] ublk: prepare for supporting to register request buffer automatically Ming Lei
2025-04-29 0:50 ` Caleb Sander Mateos
2025-04-28 9:44 ` [RFC PATCH 6/7] ublk: register buffer to specified io_uring & buf index via UBLK_F_AUTO_BUF_REG Ming Lei
2025-04-29 0:52 ` Caleb Sander Mateos
2025-04-30 15:45 ` Ming Lei
2025-04-30 16:30 ` Caleb Sander Mateos
2025-05-02 14:09 ` Ming Lei
2025-04-28 9:44 ` [RFC PATCH 7/7] selftests: ublk: support UBLK_F_AUTO_BUF_REG Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBl33aVsZ-s2-Kpx@fedora \
--to=ming.lei@redhat.com \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=io-uring@vger.kernel.org \
--cc=kbusch@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=ushankar@purestorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.