From: Ming Lei <ming.lei@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Uday Shankar <ushankar@purestorage.com>,
"zhang, the-essence-of-life" <zhangweize9@gmail.com>,
Caleb Sander Mateos <csander@purestorage.com>,
Jens Axboe <axboe@kernel.dk>, Shuah Khan <shuah@kernel.org>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v2 1/2] ublk: reset per-IO canceled flag on each fetch
Date: Mon, 6 Apr 2026 19:18:48 +0800
Message-ID: <adOWmG2fGLM-43cl@fedora>
In-Reply-To: <53aec093-5494-4b4b-a103-bc166381f236@suse.de>

On Mon, Apr 06, 2026 at 09:22:13AM +0200, Hannes Reinecke wrote:
> On 4/6/26 06:25, Uday Shankar wrote:
> > If a ublk server starts recovering devices but dies before issuing fetch
> > commands for all IOs, cancellation of the fetch commands that were
> > successfully issued may never complete. This is because the per-IO
> > canceled flag can remain set even after the fetch for that IO has been
> > submitted - the per-IO canceled flags for all IOs in a queue are reset
> > together only once all IOs for that queue have been fetched. So if a
> > nonempty proper subset of the IOs for a queue are fetched when the ublk
> > server dies, the IOs in that subset will never successfully be canceled,
> > as their canceled flags remain set, and this prevents ublk_cancel_cmd
> > from actually calling io_uring_cmd_done on the commands, despite the
> > fact that they are outstanding.
> >
> > Fix this by resetting the per-IO cancel flags immediately when each IO
> > is fetched instead of waiting for all IOs for the queue (which may never
> > happen).
> >
> > Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> > Fixes: 728cbac5fe21 ("ublk: move device reset into ublk_ch_release()")
> > Reviewed-by: Ming Lei <ming.lei@redhat.com>
> > Reviewed-by: zhang, the-essence-of-life <zhangweize9@gmail.com>
> > ---
> > drivers/block/ublk_drv.c | 21 +++++++++++++--------
> > 1 file changed, 13 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 3ba7da94d31499590a06a8b307ed151919a027cb..92dabeb820344107c9fadfae94396082b933d84e 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -2916,22 +2916,26 @@ static void ublk_stop_dev(struct ublk_device *ub)
> > ublk_cancel_dev(ub);
> > }
> > +static void ublk_reset_io_flags(struct ublk_queue *ubq, struct ublk_io *io)
> > +{
> > + /* UBLK_IO_FLAG_CANCELED can be cleared now */
> > + spin_lock(&ubq->cancel_lock);
> > + io->flags &= ~UBLK_IO_FLAG_CANCELED;
> > + spin_unlock(&ubq->cancel_lock);
> > +}
> > +
> One wonders why we can't use 'set_bit' here, or, rather,
> convert 'flags' usage to set_bit().
It isn't necessary, because UBLK_F_PER_IO_DAEMON is enabled.
> The spinlock feels a bit silly as it's now per-io, and one would think
> that we don't have concurrent accesses to the same io...
UBLK_IO_FLAG_CANCELED is only used in the slow path, and yes, it is
supposed to be accessed concurrently.

It could be moved out of io->flags, but we do want to keep `struct ublk_io`
within a single cache line.
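
To illustrate the cache-line point, here is a userspace sketch only. The
struct layout, field names, and the 64-byte line size are assumptions made
for the example and do not reflect the real `struct ublk_io`. Keeping the
canceled bit inside the existing flags word costs no extra space, and a
compile-time check can guard the size budget:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size for this sketch; real hardware may differ. */
#define DEMO_CACHE_LINE 64

/* Hypothetical state bits, all packed into one flags word. */
#define DEMO_IO_FLAG_ACTIVE   0x01u
#define DEMO_IO_FLAG_CANCELED 0x02u

/* Hypothetical per-IO descriptor; not the real struct ublk_io layout. */
struct demo_packed_io {
	uint64_t buf_addr;	/* userspace buffer address */
	unsigned int flags;	/* every state bit, including CANCELED */
	int res;		/* result of the IO */
	void *cmd;		/* outstanding command, if any */
	void *req;		/* associated request, if any */
	void *task;		/* owning task, if any */
};

/* Fail the build if the descriptor outgrows one (assumed) cache line. */
_Static_assert(sizeof(struct demo_packed_io) <= DEMO_CACHE_LINE,
	       "demo_packed_io must fit in a single cache line");

/* Helper so callers can inspect the size at run time as well. */
static size_t demo_packed_io_size(void)
{
	return sizeof(struct demo_packed_io);
}
```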
>
> > /* reset per-queue io flags */
> > static void ublk_queue_reset_io_flags(struct ublk_queue *ubq)
> > {
> > - int j;
> > -
> > - /* UBLK_IO_FLAG_CANCELED can be cleared now */
> > spin_lock(&ubq->cancel_lock);
> > - for (j = 0; j < ubq->q_depth; j++)
> > - ubq->ios[j].flags &= ~UBLK_IO_FLAG_CANCELED;
> > ubq->canceling = false;
> > spin_unlock(&ubq->cancel_lock);
> > ubq->fail_io = false;
> > }
> Similar here; as we don't loop anymore, why do we need the spinlock?
> Isn't WRITE_ONCE() sufficient here?
WRITE_ONCE() isn't enough, because we have to make sure that io->cmd is
only completed once. Please see ublk_cancel_cmd().
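
For illustration, the "complete only once" requirement can be sketched in
userspace as below. This is not the driver code: every demo_* name is
hypothetical, a pthread mutex stands in for ubq->cancel_lock, and a counter
stands in for completing the command. The point is that the check and the
set of the canceled flag form one critical section, so exactly one of
several racing cancelers completes the command. A plain store cannot give
that guarantee, because two cancelers could both observe the flag as clear
before either one writes it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_IO_FLAG_CANCELED 0x1u	/* stand-in for UBLK_IO_FLAG_CANCELED */
#define DEMO_NR_CANCELERS 8

struct demo_io {
	unsigned int flags;
	atomic_int completions;		/* times the command was completed */
};

struct demo_queue {
	pthread_mutex_t cancel_lock;	/* stand-in for ubq->cancel_lock */
	struct demo_io io;
};

/* Test-and-set under the lock; only the caller that flips the bit completes. */
static void demo_cancel_cmd(struct demo_queue *q)
{
	bool done;

	pthread_mutex_lock(&q->cancel_lock);
	done = q->io.flags & DEMO_IO_FLAG_CANCELED;
	if (!done)
		q->io.flags |= DEMO_IO_FLAG_CANCELED;
	pthread_mutex_unlock(&q->cancel_lock);

	if (!done)
		atomic_fetch_add(&q->io.completions, 1);
}

/* Per-IO reset on fetch, mirroring the idea of the patch under discussion. */
static void demo_fetch(struct demo_queue *q)
{
	pthread_mutex_lock(&q->cancel_lock);
	q->io.flags &= ~DEMO_IO_FLAG_CANCELED;
	pthread_mutex_unlock(&q->cancel_lock);
}

static void *demo_canceler(void *arg)
{
	demo_cancel_cmd(arg);
	return NULL;
}

/* Race several cancelers against one IO; return the total completion count. */
static int demo_race_cancelers(struct demo_queue *q)
{
	pthread_t threads[DEMO_NR_CANCELERS];
	int i;

	for (i = 0; i < DEMO_NR_CANCELERS; i++) {
		int rc = pthread_create(&threads[i], NULL, demo_canceler, q);

		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < DEMO_NR_CANCELERS; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&q->io.completions);
}
```

In this sketch, racing the cancelers once yields a single completion. After
demo_fetch() clears the flag, a second round yields exactly one more, which
is the behavior the per-IO reset is meant to restore for a re-fetched IO.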
Anyway, these comments are about follow-up improvements or separate issues;
they are not blockers for the current bug fix.
Thanks,
Ming