From: Ming Lei <ming.lei@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Uday Shankar <ushankar@purestorage.com>,
	"zhang, the-essence-of-life" <zhangweize9@gmail.com>,
	Caleb Sander Mateos <csander@purestorage.com>,
	Jens Axboe <axboe@kernel.dk>, Shuah Khan <shuah@kernel.org>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v2 1/2] ublk: reset per-IO canceled flag on each fetch
Date: Mon, 6 Apr 2026 19:18:48 +0800
Message-ID: <adOWmG2fGLM-43cl@fedora>
In-Reply-To: <53aec093-5494-4b4b-a103-bc166381f236@suse.de>

On Mon, Apr 06, 2026 at 09:22:13AM +0200, Hannes Reinecke wrote:
> On 4/6/26 06:25, Uday Shankar wrote:
> > If a ublk server starts recovering devices but dies before issuing fetch
> > commands for all IOs, cancellation of the fetch commands that were
> > successfully issued may never complete. This is because the per-IO
> > canceled flag can remain set even after the fetch for that IO has been
> > submitted - the per-IO canceled flags for all IOs in a queue are reset
> > together only once all IOs for that queue have been fetched. So if a
> > nonempty proper subset of the IOs for a queue are fetched when the ublk
> > server dies, the IOs in that subset will never successfully be canceled,
> > as their canceled flags remain set, and this prevents ublk_cancel_cmd
> > from actually calling io_uring_cmd_done on the commands, despite the
> > fact that they are outstanding.
> > 
> > Fix this by resetting the per-IO cancel flags immediately when each IO
> > is fetched instead of waiting for all IOs for the queue (which may never
> > happen).
> > 
> > Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> > Fixes: 728cbac5fe21 ("ublk: move device reset into ublk_ch_release()")
> > Reviewed-by: Ming Lei <ming.lei@redhat.com>
> > Reviewed-by: zhang, the-essence-of-life <zhangweize9@gmail.com>
> > ---
> >   drivers/block/ublk_drv.c | 21 +++++++++++++--------
> >   1 file changed, 13 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 3ba7da94d31499590a06a8b307ed151919a027cb..92dabeb820344107c9fadfae94396082b933d84e 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -2916,22 +2916,26 @@ static void ublk_stop_dev(struct ublk_device *ub)
> >   	ublk_cancel_dev(ub);
> >   }
> > +static void ublk_reset_io_flags(struct ublk_queue *ubq, struct ublk_io *io)
> > +{
> > +	/* UBLK_IO_FLAG_CANCELED can be cleared now */
> > +	spin_lock(&ubq->cancel_lock);
> > +	io->flags &= ~UBLK_IO_FLAG_CANCELED;
> > +	spin_unlock(&ubq->cancel_lock);
> > +}
> > +
> One wonders why we can't use 'set_bit' here, or, rather,
> convert 'flags' usage to set_bit().

It isn't necessary, because UBLK_F_PER_IO_DAEMON is enabled.

> The spinlock feels a bit silly as it's now per-io, and one would think
> that we don't have concurrent accesses to the same io...

UBLK_IO_FLAG_CANCELED is only used in the slow path, and yes, it is
supposed to be accessed concurrently.

It could be moved out of io->flags, but we do want to keep `struct ublk_io`
within a single cache line.
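
As a rough illustration of the size constraint, here is a minimal sketch
with a compile-time check. The struct layout, the names demo_io,
DEMO_CACHE_LINE and DEMO_FLAG_CANCELED, and the 64-byte line size are all
assumptions made for this example; none of this is the real `struct ublk_io`.

```c
#include <assert.h>

#define DEMO_CACHE_LINE    64    /* assumed cache line size; varies by CPU */
#define DEMO_FLAG_CANCELED 0x1u  /* hypothetical stand-in for UBLK_IO_FLAG_CANCELED */

/* Hypothetical per-IO state; the real struct ublk_io has different fields. */
struct demo_io {
	void *cmd;            /* outstanding command, if any */
	unsigned long addr;   /* buffer address */
	unsigned int flags;   /* the canceled bit shares this word with other flags */
	int res;              /* result to report */
};

/* Fail the build if the struct ever grows past one cache line. */
static_assert(sizeof(struct demo_io) <= DEMO_CACHE_LINE,
	      "demo_io must fit in a single cache line");

/* Runtime view of the same property, convenient for a quick check. */
int demo_io_fits_cache_line(void)
{
	return sizeof(struct demo_io) <= DEMO_CACHE_LINE;
}
```

Keeping the canceled state as one bit of an existing flags word costs no
extra bytes, whereas a separate field would enlarge the struct.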

> 
> >   /* reset per-queue io flags */
> >   static void ublk_queue_reset_io_flags(struct ublk_queue *ubq)
> >   {
> > -	int j;
> > -
> > -	/* UBLK_IO_FLAG_CANCELED can be cleared now */
> >   	spin_lock(&ubq->cancel_lock);
> > -	for (j = 0; j < ubq->q_depth; j++)
> > -		ubq->ios[j].flags &= ~UBLK_IO_FLAG_CANCELED;
> >   	ubq->canceling = false;
> >   	spin_unlock(&ubq->cancel_lock);
> >   	ubq->fail_io = false;
> >   }
> Similar here; as we don't loop anymore, why do we need the spinlock?
> Isn't WRITE_ONCE() sufficient here?

WRITE_ONCE() isn't enough, because we have to make sure that io->cmd is
completed only once; please see ublk_cancel_cmd().
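
To illustrate the "completed only once" requirement, here is a minimal
userspace sketch of a test-and-set done under a lock. It is not the driver
code: a pthread mutex stands in for ubq->cancel_lock, a counter stands in
for the completion call, and the names demo_io, demo_cancel_cmd,
demo_reset_io_flags and DEMO_FLAG_CANCELED are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FLAG_CANCELED 0x1u  /* hypothetical stand-in for UBLK_IO_FLAG_CANCELED */

struct demo_io {
	unsigned int flags;
	int completions;  /* counts how often the command was "completed" */
};

/* Stand-in for the per-queue cancel lock. */
static pthread_mutex_t demo_cancel_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * The check and the set form one critical section, so only the caller
 * that flips the flag goes on to complete the command. A plain store
 * cannot tell a caller whether someone else got there first.
 */
void demo_cancel_cmd(struct demo_io *io)
{
	bool done;

	assert(io != NULL);
	pthread_mutex_lock(&demo_cancel_lock);
	done = (io->flags & DEMO_FLAG_CANCELED) != 0;
	if (!done)
		io->flags |= DEMO_FLAG_CANCELED;
	pthread_mutex_unlock(&demo_cancel_lock);

	if (!done)
		io->completions++;  /* stand-in for completing io->cmd */
}

/* Clear the flag when the IO is fetched again, so it can be canceled again. */
void demo_reset_io_flags(struct demo_io *io)
{
	assert(io != NULL);
	pthread_mutex_lock(&demo_cancel_lock);
	io->flags &= ~DEMO_FLAG_CANCELED;
	pthread_mutex_unlock(&demo_cancel_lock);
}
```

In this sketch, calling demo_cancel_cmd() twice on the same IO completes it
once. If the flag is never reset after a new fetch, a later cancel skips the
completion, which is the situation the patch addresses.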

Anyway, all of these comments belong to a follow-up improvement or a new
issue; they are not a blocker for the current bug fix.


Thanks,
Ming

