From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org,
Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH 2/2] ublk: run auto buf unregisgering in same io_ring_ctx with register
Date: Thu, 22 May 2025 10:37:43 +0800 [thread overview]
Message-ID: <aC6N9w4ijVEkHN0l@fedora> (raw)
In-Reply-To: <CADUfDZoY7rC=SxpFnN6bqBg1SiBccSyYTsKAVe2Rx0wAxBdD6Q@mail.gmail.com>
On Wed, May 21, 2025 at 06:05:14PM -0700, Caleb Sander Mateos wrote:
> On Wed, May 21, 2025 at 5:42 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Wed, May 21, 2025 at 08:58:20AM -0700, Caleb Sander Mateos wrote:
> > > On Tue, May 20, 2025 at 7:55 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > UBLK_F_AUTO_BUF_REG requires that the buffer registered automatically
> > > > is unregistered in same `io_ring_ctx`, so check it explicitly.
> > > >
> > > > Meantime return the failure code if io_buffer_unregister_bvec() fails,
> > > > then ublk server can handle the failure in consistent way.
> > > >
> > > > Also force to clear UBLK_IO_FLAG_AUTO_BUF_REG in ublk_io_release()
> > > > because ublk_io_release() may be triggered not from handling
> > > > UBLK_IO_COMMIT_AND_FETCH_REQ, and from releasing the `io_ring_ctx`
> > > > for registering the buffer.
> > > >
> > > > Fixes: 99c1e4eb6a3f ("ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG")
> > > > Reported-by: Caleb Sander Mateos <csander@purestorage.com>
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > > drivers/block/ublk_drv.c | 35 +++++++++++++++++++++++++++++++----
> > > > include/uapi/linux/ublk_cmd.h | 3 ++-
> > > > 2 files changed, 33 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > index fcf568b89370..2af6422d6a89 100644
> > > > --- a/drivers/block/ublk_drv.c
> > > > +++ b/drivers/block/ublk_drv.c
> > > > @@ -84,6 +84,7 @@ struct ublk_rq_data {
> > > >
> > > > /* for auto-unregister buffer in case of UBLK_F_AUTO_BUF_REG */
> > > > u16 buf_index;
> > > > + unsigned long buf_ctx_id;
> > > > };
> > > >
> > > > struct ublk_uring_cmd_pdu {
> > > > @@ -1192,6 +1193,11 @@ static void ublk_auto_buf_reg_fallback(struct request *req, struct ublk_io *io)
> > > > refcount_set(&data->ref, 1);
> > > > }
> > > >
> > > > +static unsigned long ublk_uring_cmd_ctx_id(struct io_uring_cmd *cmd)
> > > > +{
> > > > + return (unsigned long)(cmd_to_io_kiocb(cmd)->ctx);
> > >
> > > Is the fact that a struct io_uring_cmd * can be passed to
> > > cmd_to_io_kiocb() an io_uring internal implementation detail? Maybe it
> > > would be good to add a helper in include/linux/io_uring/cmd.h so ublk
> > > isn't depending on io_uring internals.
> >
> > All this definition is defined in kernel public header, not sure if it is
> > big deal to add the helper, especially there is just single user.
> >
> > But I will do it.
> >
> > >
> > > Also, storing buf_ctx_id as a void * instead would allow this cast to
> > > be avoided, but not a big deal.
> > >
> > > > +}
> > > > +
> > > > static bool ublk_auto_buf_reg(struct request *req, struct ublk_io *io,
> > > > unsigned int issue_flags)
> > > > {
> > > > @@ -1211,6 +1217,8 @@ static bool ublk_auto_buf_reg(struct request *req, struct ublk_io *io,
> > > > }
> > > > /* one extra reference is dropped by ublk_io_release */
> > > > refcount_set(&data->ref, 2);
> > > > +
> > > > + data->buf_ctx_id = ublk_uring_cmd_ctx_id(io->cmd);
> > > > /* store buffer index in request payload */
> > > > data->buf_index = pdu->buf.index;
> > > > io->flags |= UBLK_IO_FLAG_AUTO_BUF_REG;
> > > > @@ -1994,6 +2002,21 @@ static void ublk_io_release(void *priv)
> > > > {
> > > > struct request *rq = priv;
> > > > struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > > > + struct ublk_io *io = &ubq->ios[rq->tag];
> > > > +
> > > > + /*
> > > > + * In case of UBLK_F_AUTO_BUF_REG, the `io_uring_ctx` for registering
> > > > + * this buffer may be released, so we reach here not from handling
> > > > + * `UBLK_IO_COMMIT_AND_FETCH_REQ`.
> > >
> > > What do you mean by this? That the io_uring was closed while a ublk
> > > I/O owned by the server still had a registered buffer?
> >
> > The buffer is registered to `io_ring_ctx A`, which is closed and the buffer
> > is used up and un-registered on `io_ring_ctx A`, so this callback is
> > triggered, but the io command isn't completed yet, which can be run from
> > `io_ring_ctx B`
> >
> > >
> > > > + *
> > > > + * Force to clear UBLK_IO_FLAG_AUTO_BUF_REG, so that ublk server
> > > > + * still may complete this IO request by issuing uring_cmd from
> > > > + * another `io_uring_ctx` in case that the `io_ring_ctx` for
> > > > + * registering the buffer is gone
> > > > + */
> > > > + if (ublk_support_auto_buf_reg(ubq) &&
> > > > + (io->flags & UBLK_IO_FLAG_AUTO_BUF_REG))
> > > > + io->flags &= ~UBLK_IO_FLAG_AUTO_BUF_REG;
> > >
> > > This is racy, since ublk_io_release() can be called on a thread other
> > > than the ubq_daemon.
> >
> > Yeah, it can be true.
> >
> > > Could we avoid touching io->flags here and
> > > instead have ublk_commit_and_fetch() check whether the reference count
> > > is already 1?
> >
> > It is still a little racy because the buffer unregister from another thread
> > can happen just after the check immediately.
>
> True. I think it might be better to just skip the unregister if the
> contexts don't match rather than returning -EINVAL. Then there is no
> race. If userspace has already closed the old io_uring context,
> skipping the unregister is the desired behavior. If userspace hasn't
> closed the old io_uring, then that's a userspace bug and they get what
> they deserve (a buffer stuck registered). If userspace wants to submit
> the UBLK_IO_COMMIT_AND_FETCH_REQ on a different io_uring for some
> reason, they can always issue an explicit UBLK_IO_UNREGISTER_IO_BUF on
> the old io_uring to unregister the buffer.
Sounds good, will add the above words on ublk_commit_and_fetch().
Follows two invariants:
- ublk_io_release() is always called once no matter if it is called
from any thread context, request can't be completed until ublk_io_release()
is called
- new UBLK_IO_COMMIT_AND_FETCH_REQ can't be issued until old request
is completed & new request comes
The merged code should work just fine.
But we need to document the feature only works on same io_ring_context.
Thanks,
Ming
next prev parent reply other threads:[~2025-05-22 2:38 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-21 2:54 [PATCH 0/2] ublk: two fixes on UBLK_F_AUTO_BUF_REG Ming Lei
2025-05-21 2:54 ` [PATCH 1/2] ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch() Ming Lei
2025-05-21 14:53 ` Caleb Sander Mateos
2025-05-21 2:55 ` [PATCH 2/2] ublk: run auto buf unregisgering in same io_ring_ctx with register Ming Lei
2025-05-21 15:58 ` Caleb Sander Mateos
2025-05-22 0:42 ` Ming Lei
2025-05-22 1:05 ` Caleb Sander Mateos
2025-05-22 2:37 ` Ming Lei [this message]
2025-05-21 19:19 ` Caleb Sander Mateos
2025-05-21 13:11 ` [PATCH 0/2] ublk: two fixes on UBLK_F_AUTO_BUF_REG Jens Axboe
2025-05-21 20:31 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aC6N9w4ijVEkHN0l@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=linux-block@vger.kernel.org \
--cc=ushankar@purestorage.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox