public inbox for linux-block@vger.kernel.org
From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>,
	Guy Eisenberg <geisenberg@nvidia.com>,
	Jared Holzman <jholzman@nvidia.com>, Yoav Cohen <yoav@nvidia.com>,
	Omri Levi <omril@nvidia.com>, Ofer Oshri <ofer@nvidia.com>
Subject: Re: [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
Date: Thu, 24 Apr 2025 09:47:19 +0800	[thread overview]
Message-ID: <aAmYJxaV1-yWEMRo@fedora> (raw)
In-Reply-To: <CADUfDZoUohFbisLOrav35kCbLB0pxi1nFWGaki_fFxxVqU6pew@mail.gmail.com>

On Wed, Apr 23, 2025 at 09:48:55AM -0700, Caleb Sander Mateos wrote:
> On Wed, Apr 23, 2025 at 8:39 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Wed, Apr 23, 2025 at 08:08:17AM -0700, Caleb Sander Mateos wrote:
> > > On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > ublk_cancel_cmd() calls io_uring_cmd_done() to complete the uring_cmd,
> > > > but task work may already have been scheduled via
> > > > io_uring_cmd_complete_in_task() to dispatch the request, in which case
> > > > a kernel crash can be triggered.
> > > >
> > > > Fix it by not trying to cancel the command if a ublk block request has
> > > > been dispatched to this slot.
> > > >
> > > > Reported-by: Jared Holzman <jholzman@nvidia.com>
> > > > Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nvidia.com/
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > >  drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
> > > >  1 file changed, 31 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > index c4d4be4f6fbd..fbfb5b815c8d 100644
> > > > --- a/drivers/block/ublk_drv.c
> > > > +++ b/drivers/block/ublk_drv.c
> > > > @@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > >         if (res != BLK_STS_OK)
> > > >                 return res;
> > > >
> > > > +       /*
> > > > +        * Order writing to rq->state in blk_mq_start_request() and
> > > > +        * reading ubq->canceling, see comment in ublk_cancel_command()
> > > > +        * wrt. the pair barrier.
> > > > +        */
> > > > +       smp_mb();
> > >
> > > Adding an mfence to every ublk I/O would be really unfortunate. Memory
> > > barriers are very expensive in a system with a lot of CPUs. Why can't
> >
> > I believe the perf effect of this single smp_mb() may not be observable;
> > per my last profiling, there are several much bigger contributors to ublk
> > overhead:
> 
> I have seen a single mfence instruction in the I/O path account for
> multiple percent of the CPU profile in the past. The cost of a fence
> scales superlinearly with the number of CPUs, so it can be a real
> parallelism bottleneck. I'm not opposed to the memory barrier if it's
> truly necessary for correctness, but I would love to consider any
> alternatives.

Thinking about it further, the added barrier is actually useless, because:

- for any request that arrives after ublk_start_cancel(), ubq->canceling is
  always observed

- this patch only addresses requests whose task work was scheduled before or
  during quiesce but has not had a chance to run yet

The single added check of `req && blk_mq_request_started(req)` should be
enough, because:

- either the started request is aborted via __ublk_abort_rq(), so the
  uring_cmd is canceled on the next pass,

or

- the uring_cmd is completed in the task-work function ublk_dispatch_req(),
  because io_uring guarantees that it is called

> 
> I have been looking at ublk zero-copy CPU profiles recently, and there
> the most expensive instructions there are the atomic
> reference-counting in ublk_get_req_ref()/ublk_put_req_ref(). I have
> some ideas to reduce that overhead.

The two can be observed in perf, but IMO they are still a small part compared
with the other contributors, so I am not sure you can get an obvious IOPS
boost in this area, especially since the atomic reference is hard to avoid.

I am preparing a patch to relax the context limit for the
register_io/unregister_io commands, which should be allowed to run from
non-ublk-queue contexts; it just needs one offload-style ublk server
implementation.

thanks,
Ming



Thread overview: 14+ messages
2025-04-23  9:24 [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-23  9:24 ` [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA Ming Lei
2025-04-23 14:44   ` Caleb Sander Mateos
2025-04-23 14:52     ` Caleb Sander Mateos
2025-04-24  1:53       ` Ming Lei
2025-04-23  9:24 ` [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-23 15:08   ` Caleb Sander Mateos
2025-04-23 15:39     ` Ming Lei
2025-04-23 16:48       ` Caleb Sander Mateos
2025-04-24  1:47         ` Ming Lei [this message]
2025-04-25  0:55           ` Caleb Sander Mateos
2025-04-24 21:10 ` [PATCH 0/2] " Jared Holzman
2025-04-25  1:43   ` Ming Lei
2025-04-25  1:53     ` Jens Axboe
