* [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
@ 2025-04-23 9:24 Ming Lei
2025-04-23 9:24 ` [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA Ming Lei
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Ming Lei @ 2025-04-23 9:24 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Uday Shankar, Caleb Sander Mateos, Guy Eisenberg, Jared Holzman,
Yoav Cohen, Omri Levi, Ofer Oshri, Ming Lei
Hello Jens,
The two patches try to fix a race between io_uring_cmd_complete_in_task
and ublk_cancel_cmd; please don't apply them until Jared verifies them.
Jared, please test and see if the two can fix your crash issue on v6.15-rc3.
If you can't reproduce it on v6.15-rc3, please backport them to v6.14; I
can help with the backport if you need.
Thanks,
Ming
Ming Lei (2):
ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
ublk: fix race between io_uring_cmd_complete_in_task and
ublk_cancel_cmd
drivers/block/ublk_drv.c | 51 ++++++++++++++++++++++++++--------------
1 file changed, 34 insertions(+), 17 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
2025-04-23 9:24 [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
@ 2025-04-23 9:24 ` Ming Lei
2025-04-23 14:44 ` Caleb Sander Mateos
2025-04-23 9:24 ` [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-24 21:10 ` [PATCH 0/2] " Jared Holzman
2 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-04-23 9:24 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Uday Shankar, Caleb Sander Mateos, Guy Eisenberg, Jared Holzman,
Yoav Cohen, Omri Levi, Ofer Oshri, Ming Lei
The in-tree code calls io_uring_cmd_complete_in_task() to schedule
task_work for dispatching this request to handle
UBLK_U_IO_NEED_GET_DATA.
This is really not necessary, because the current context is exactly
the ublk queue context, so call ublk_dispatch_req() directly to handle
UBLK_U_IO_NEED_GET_DATA.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2de7b2bd409d..c4d4be4f6fbd 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1886,15 +1886,6 @@ static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
}
}
-static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id,
- int tag)
-{
- struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
- struct request *req = blk_mq_tag_to_rq(ub->tag_set.tags[q_id], tag);
-
- ublk_queue_cmd(ubq, req);
-}
-
static inline int ublk_check_cmd_op(u32 cmd_op)
{
u32 ioc_type = _IOC_TYPE(cmd_op);
@@ -2103,8 +2094,9 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
goto out;
ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
- ublk_handle_need_get_data(ub, ub_cmd->q_id, ub_cmd->tag);
- break;
+ req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag);
+ ublk_dispatch_req(ubq, req, issue_flags);
+ return -EIOCBQUEUED;
default:
goto out;
}
--
2.47.0
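The shortcut this patch takes — running the handler inline when the caller is already in the target context, instead of bouncing through task_work — can be sketched in userspace C. The names below are illustrative stand-ins only; ublk_dispatch_req() and io_uring_cmd_complete_in_task() are modeled, not reproduced:

```c
#include <pthread.h>

/* Toy model: a "queue context" owned by one thread. Work for the queue may
 * arrive from any thread; when it arrives on the owning thread, it can run
 * inline, which is the analogue of skipping io_uring_cmd_complete_in_task()
 * when we are already in the ublk queue context. */
struct queue_ctx {
	pthread_t owner;	/* thread that owns this queue, like ubq_daemon */
	int inline_runs;	/* handlers run directly */
	int deferred;		/* handlers that had to be deferred */
};

static void dispatch(struct queue_ctx *q)
{
	(void)q;		/* stands in for ublk_dispatch_req() */
}

/* Returns 1 if the work ran inline, 0 if it had to be deferred. */
static int queue_work(struct queue_ctx *q)
{
	if (pthread_equal(pthread_self(), q->owner)) {
		dispatch(q);	/* already in the queue context: call directly */
		q->inline_runs++;
		return 1;
	}
	q->deferred++;		/* would schedule task_work here */
	return 0;
}
```

Since __ublk_ch_uring_cmd() for UBLK_U_IO_NEED_GET_DATA always runs in the queue context, only the inline branch is ever taken there, which is exactly why the task_work hop was dead weight.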
* [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-23 9:24 [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-23 9:24 ` [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA Ming Lei
@ 2025-04-23 9:24 ` Ming Lei
2025-04-23 15:08 ` Caleb Sander Mateos
2025-04-24 21:10 ` [PATCH 0/2] " Jared Holzman
2 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-04-23 9:24 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Uday Shankar, Caleb Sander Mateos, Guy Eisenberg, Jared Holzman,
Yoav Cohen, Omri Levi, Ofer Oshri, Ming Lei
ublk_cancel_cmd() calls io_uring_cmd_done() to complete the uring_cmd, but
task work may already have been scheduled via io_uring_cmd_complete_in_task()
for dispatching the request, and then a kernel crash can be triggered.
Fix it by not trying to cancel the command if a ublk block request is
being dispatched to this slot.
Reported-by: Jared Holzman <jholzman@nvidia.com>
Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nvidia.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
1 file changed, 31 insertions(+), 6 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index c4d4be4f6fbd..fbfb5b815c8d 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
if (res != BLK_STS_OK)
return res;
+ /*
+ * Order writing to rq->state in blk_mq_start_request() and
+ * reading ubq->canceling, see comment in ublk_cancel_cmd()
+ * wrt. the pair barrier.
+ */
+ smp_mb();
/*
* ->canceling has to be handled after ->force_abort and ->fail_io
* is dealt with, otherwise this request may not be failed in case
@@ -1683,14 +1689,35 @@ static void ublk_start_cancel(struct ublk_queue *ubq)
ublk_put_disk(disk);
}
-static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
+static void ublk_cancel_cmd(struct ublk_queue *ubq, unsigned tag,
unsigned int issue_flags)
{
+ struct ublk_io *io = &ubq->ios[tag];
+ struct ublk_device *ub = ubq->dev;
+ struct request *req;
bool done;
if (!(io->flags & UBLK_IO_FLAG_ACTIVE))
return;
+ /*
+ * Don't try to cancel this command if the request is started for
+ * avoiding race between io_uring_cmd_done() and
+ * io_uring_cmd_complete_in_task().
+ *
+ * memory barrier is implied in ublk_start_cancel() for ordering to
+ * WRITE ubq->canceling and READ request state from
+ * blk_mq_request_started().
+ *
+ * If the request state is observed as not started, ublk_queue_rq()
+ * should observe ubq->canceling, so request can be aborted and this
+ * uring_cmd won't be used. Otherwise, this uring_cmd will be completed
+ * from the dispatch code path finally.
+ */
+ req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
+ if (req && blk_mq_request_started(req))
+ return;
+
spin_lock(&ubq->cancel_lock);
done = !!(io->flags & UBLK_IO_FLAG_CANCELED);
if (!done)
@@ -1722,7 +1749,6 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
struct ublk_queue *ubq = pdu->ubq;
struct task_struct *task;
- struct ublk_io *io;
if (WARN_ON_ONCE(!ubq))
return;
@@ -1737,9 +1763,8 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
if (!ubq->canceling)
ublk_start_cancel(ubq);
- io = &ubq->ios[pdu->tag];
- WARN_ON_ONCE(io->cmd != cmd);
- ublk_cancel_cmd(ubq, io, issue_flags);
+ WARN_ON_ONCE(ubq->ios[pdu->tag].cmd != cmd);
+ ublk_cancel_cmd(ubq, pdu->tag, issue_flags);
}
static inline bool ublk_queue_ready(struct ublk_queue *ubq)
@@ -1752,7 +1777,7 @@ static void ublk_cancel_queue(struct ublk_queue *ubq)
int i;
for (i = 0; i < ubq->q_depth; i++)
- ublk_cancel_cmd(ubq, &ubq->ios[i], IO_URING_F_UNLOCKED);
+ ublk_cancel_cmd(ubq, i, IO_URING_F_UNLOCKED);
}
/* Cancel all pending commands, must be called after del_gendisk() returns */
--
2.47.0
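The store/load pairing that the new comments describe can be modeled in userspace with C11 atomics (a sketch with illustrative names, not the kernel code): each side publishes its own flag with a full barrier before reading the other's, so in any interleaving at least one side observes the other's write and the uring_cmd is completed exactly once.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* started models rq->state after blk_mq_start_request();
 * canceling models ubq->canceling. */
struct model {
	atomic_bool started;
	atomic_bool canceling;
};

/* queue_rq path: start the request, then check for cancellation.
 * Returns true if the request must be aborted (canceling observed). */
static bool model_queue_rq(struct model *m)
{
	atomic_store(&m->started, true);		/* blk_mq_start_request() */
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	return atomic_load(&m->canceling);
}

/* cancel path: mark canceling, then check the request state. Returns true
 * if it is safe to complete the uring_cmd here (no request started). */
static bool model_cancel_cmd(struct model *m)
{
	atomic_store(&m->canceling, true);		/* ublk_start_cancel() */
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	return !atomic_load(&m->started);	/* !blk_mq_request_started() */
}
```

The two deterministic orderings show the invariant: if the request starts first, the cancel side defers to the dispatch path; if cancel wins, the queue side aborts the request.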
* Re: [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
2025-04-23 9:24 ` [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA Ming Lei
@ 2025-04-23 14:44 ` Caleb Sander Mateos
2025-04-23 14:52 ` Caleb Sander Mateos
0 siblings, 1 reply; 14+ messages in thread
From: Caleb Sander Mateos @ 2025-04-23 14:44 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Uday Shankar, Guy Eisenberg,
Jared Holzman, Yoav Cohen, Omri Levi, Ofer Oshri
On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> The in-tree code calls io_uring_cmd_complete_in_task() to schedule
> task_work for dispatching this request to handle
> UBLK_U_IO_NEED_GET_DATA.
>
> This ways is really not necessary because the current context is exactly
> the ublk queue context, so call ublk_dispatch_req() directly for handling
> UBLK_U_IO_NEED_GET_DATA.
Indeed, I was planning to make the same change!
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/block/ublk_drv.c | 14 +++-----------
> 1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 2de7b2bd409d..c4d4be4f6fbd 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -1886,15 +1886,6 @@ static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
> }
> }
>
> -static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id,
> - int tag)
> -{
> - struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
> - struct request *req = blk_mq_tag_to_rq(ub->tag_set.tags[q_id], tag);
> -
> - ublk_queue_cmd(ubq, req);
> -}
Looks like this will conflict with Uday's patch:
https://lore.kernel.org/linux-block/20250421-ublk_constify-v1-3-3371f9e9f73c@purestorage.com/
. Since that series already has reviews, I expect it will land first.
> -
> static inline int ublk_check_cmd_op(u32 cmd_op)
> {
> u32 ioc_type = _IOC_TYPE(cmd_op);
> @@ -2103,8 +2094,9 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> goto out;
> ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
> - ublk_handle_need_get_data(ub, ub_cmd->q_id, ub_cmd->tag);
> - break;
> + req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag);
> + ublk_dispatch_req(ubq, req, issue_flags);
Maybe it would make sense to factor the UBLK_IO_NEED_GET_DATA handling
out of ublk_dispatch_req()? Then ublk_dispatch_req() (called only for
incoming ublk requests) could assume the UBLK_IO_FLAG_NEED_GET_DATA
flag is not yet set, and this path wouldn't need to pay the cost of
re-checking current != ubq->ubq_daemon, ublk_need_get_data(ubq) &&
ublk_need_map_req(req), etc.
> + return -EIOCBQUEUED;
It's probably possible to return the result here synchronously to
avoid the small overhead of io_uring_cmd_done(). That may be easier to
do if the UBLK_IO_NEED_GET_DATA path is separated from
ublk_dispatch_req().
Best,
Caleb
* Re: [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
2025-04-23 14:44 ` Caleb Sander Mateos
@ 2025-04-23 14:52 ` Caleb Sander Mateos
2025-04-24 1:53 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Caleb Sander Mateos @ 2025-04-23 14:52 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Uday Shankar, Guy Eisenberg,
Jared Holzman, Yoav Cohen, Omri Levi, Ofer Oshri
On Wed, Apr 23, 2025 at 7:44 AM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > The in-tree code calls io_uring_cmd_complete_in_task() to schedule
> > task_work for dispatching this request to handle
> > UBLK_U_IO_NEED_GET_DATA.
> >
> > This ways is really not necessary because the current context is exactly
> > the ublk queue context, so call ublk_dispatch_req() directly for handling
> > UBLK_U_IO_NEED_GET_DATA.
>
> Indeed, I was planning to make the same change!
>
> >
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/block/ublk_drv.c | 14 +++-----------
> > 1 file changed, 3 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index 2de7b2bd409d..c4d4be4f6fbd 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -1886,15 +1886,6 @@ static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
> > }
> > }
> >
> > -static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id,
> > - int tag)
> > -{
> > - struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
> > - struct request *req = blk_mq_tag_to_rq(ub->tag_set.tags[q_id], tag);
> > -
> > - ublk_queue_cmd(ubq, req);
> > -}
>
> Looks like this will conflict with Uday's patch:
> https://lore.kernel.org/linux-block/20250421-ublk_constify-v1-3-3371f9e9f73c@purestorage.com/
> . Since that series already has reviews, I expect it will land first.
>
> > -
> > static inline int ublk_check_cmd_op(u32 cmd_op)
> > {
> > u32 ioc_type = _IOC_TYPE(cmd_op);
> > @@ -2103,8 +2094,9 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> > if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> > goto out;
> > ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
> > - ublk_handle_need_get_data(ub, ub_cmd->q_id, ub_cmd->tag);
> > - break;
> > + req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag);
> > + ublk_dispatch_req(ubq, req, issue_flags);
>
> Maybe it would make sense to factor the UBLK_IO_NEED_GET_DATA handling
> out of ublk_dispatch_req()? Then ublk_dispatch_req() (called only for
> incoming ublk requests) could assume the UBLK_IO_FLAG_NEED_GET_DATA
> flag is not yet set, and this path wouldn't need to pay the cost of
> re-checking current != ubq->ubq_daemon, ublk_need_get_data(ubq) &&
> ublk_need_map_req(req), etc.
>
> > + return -EIOCBQUEUED;
>
> It's probably possible to return the result here synchronously to
> avoid the small overhead of io_uring_cmd_done(). That may be easier to
> do if the UBLK_IO_NEED_GET_DATA path is separated from
> ublk_dispatch_req().
And if we can avoid using io_uring_cmd_done(), calling
ublk_fill_io_cmd() for UBLK_IO_NEED_GET_DATA would no longer be
necessary. (This was my original motivation to handle
UBLK_IO_NEED_GET_DATA synchronously; UBLK_IO_NEED_GET_DATA overwriting
io->cmd is an obstacle to introducing a struct request * field that
aliases io->cmd.)
Best,
Caleb
* Re: [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-23 9:24 ` [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
@ 2025-04-23 15:08 ` Caleb Sander Mateos
2025-04-23 15:39 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Caleb Sander Mateos @ 2025-04-23 15:08 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Uday Shankar, Guy Eisenberg,
Jared Holzman, Yoav Cohen, Omri Levi, Ofer Oshri
On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
> we may have scheduled task work via io_uring_cmd_complete_in_task() for
> dispatching request, then kernel crash can be triggered.
>
> Fix it by not trying to canceling the command if ublk block request is
> coming to this slot.
>
> Reported-by: Jared Holzman <jholzman@nvidia.com>
> Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nvidia.com/
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
> 1 file changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index c4d4be4f6fbd..fbfb5b815c8d 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> if (res != BLK_STS_OK)
> return res;
>
> + /*
> + * Order writing to rq->state in blk_mq_start_request() and
> + * reading ubq->canceling, see comment in ublk_cancel_command()
> + * wrt. the pair barrier.
> + */
> + smp_mb();
Adding an mfence to every ublk I/O would be really unfortunate. Memory
barriers are very expensive in a system with a lot of CPUs. Why can't
we rely on blk_mq_quiesce_queue() to prevent new requests from being
queued? Is the bug that ublk_uring_cmd_cancel_fn() alls
ublk_start_cancel() (which calls blk_mq_quiesce_queue()), but
ublk_cancel_dev() does not?
Best,
Caleb
* Re: [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-23 15:08 ` Caleb Sander Mateos
@ 2025-04-23 15:39 ` Ming Lei
2025-04-23 16:48 ` Caleb Sander Mateos
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-04-23 15:39 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Jens Axboe, linux-block, Uday Shankar, Guy Eisenberg,
Jared Holzman, Yoav Cohen, Omri Levi, Ofer Oshri
On Wed, Apr 23, 2025 at 08:08:17AM -0700, Caleb Sander Mateos wrote:
> On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
> > we may have scheduled task work via io_uring_cmd_complete_in_task() for
> > dispatching request, then kernel crash can be triggered.
> >
> > Fix it by not trying to canceling the command if ublk block request is
> > coming to this slot.
> >
> > Reported-by: Jared Holzman <jholzman@nvidia.com>
> > Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nvidia.com/
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
> > 1 file changed, 31 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index c4d4be4f6fbd..fbfb5b815c8d 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > if (res != BLK_STS_OK)
> > return res;
> >
> > + /*
> > + * Order writing to rq->state in blk_mq_start_request() and
> > + * reading ubq->canceling, see comment in ublk_cancel_command()
> > + * wrt. the pair barrier.
> > + */
> > + smp_mb();
>
> Adding an mfence to every ublk I/O would be really unfortunate. Memory
> barriers are very expensive in a system with a lot of CPUs. Why can't
I believe the perf effect from the little smp_mb() may not be observable;
actually there are several main contributors to ublk perf, per my last
profiling:
- security_uring_cmd()
With security_uring_cmd() removed, ublk/loop over fast nvme is close to
kernel loop.
- bio allocation & freeing
the ublk bio is allocated on one CPU and usually freed on another
- generic io_uring or block layer handling
which should be same with other io_uring application
And the ublk cost is usually pretty small compared with the above when
running workloads with batched IOs.
> we rely on blk_mq_quiesce_queue() to prevent new requests from being
> queued? Is the bug that ublk_uring_cmd_cancel_fn() alls
> ublk_start_cancel() (which calls blk_mq_quiesce_queue()), but
> ublk_cancel_dev() does not?
I guess it is because we mark ->canceling only for one ubq when the queue
is quiesced. If every queue's ->canceling were set in ublk_start_cancel(),
the issue might be avoided without this change, too.
Thanks
Ming
* Re: [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-23 15:39 ` Ming Lei
@ 2025-04-23 16:48 ` Caleb Sander Mateos
2025-04-24 1:47 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Caleb Sander Mateos @ 2025-04-23 16:48 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Uday Shankar, Guy Eisenberg,
Jared Holzman, Yoav Cohen, Omri Levi, Ofer Oshri
On Wed, Apr 23, 2025 at 8:39 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Apr 23, 2025 at 08:08:17AM -0700, Caleb Sander Mateos wrote:
> > On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
> > > we may have scheduled task work via io_uring_cmd_complete_in_task() for
> > > dispatching request, then kernel crash can be triggered.
> > >
> > > Fix it by not trying to canceling the command if ublk block request is
> > > coming to this slot.
> > >
> > > Reported-by: Jared Holzman <jholzman@nvidia.com>
> > > Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nvidia.com/
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > > drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
> > > 1 file changed, 31 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index c4d4be4f6fbd..fbfb5b815c8d 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > if (res != BLK_STS_OK)
> > > return res;
> > >
> > > + /*
> > > + * Order writing to rq->state in blk_mq_start_request() and
> > > + * reading ubq->canceling, see comment in ublk_cancel_command()
> > > + * wrt. the pair barrier.
> > > + */
> > > + smp_mb();
> >
> > Adding an mfence to every ublk I/O would be really unfortunate. Memory
> > barriers are very expensive in a system with a lot of CPUs. Why can't
>
> I believe perf effect from the little smp_mb() may not be observed, actually
> there are several main contributions for ublk perf per my last profiling:
I have seen a single mfence instruction in the I/O path account for
multiple percent of the CPU profile in the past. The cost of a fence
scales superlinearly with the number of CPUs, so it can be a real
parallelism bottleneck. I'm not opposed to the memory barrier if it's
truly necessary for correctness, but I would love to consider any
alternatives.
I have been looking at ublk zero-copy CPU profiles recently, and the
most expensive instructions there are the atomic reference-counting ops
in ublk_get_req_ref()/ublk_put_req_ref(). I have some ideas to reduce
that overhead.
Best,
Caleb
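For reference, the kind of per-request atomic reference counting being profiled looks roughly like this in userspace C11. This is only a sketch with hypothetical names; the driver itself uses refcount_t via ublk_get_req_ref()/ublk_put_req_ref():

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Each get/put is an atomic read-modify-write on a cacheline shared by
 * every CPU touching the request, which is why these instructions show
 * up prominently in profiles on large machines. */
struct req_ref {
	atomic_int count;
};

static void req_ref_init(struct req_ref *r)
{
	atomic_store(&r->count, 1);	/* owner holds the initial reference */
}

static void req_ref_get(struct req_ref *r)
{
	atomic_fetch_add(&r->count, 1);	/* the costly contended RMW */
}

/* Returns true when the last reference is dropped and the request may be
 * completed and freed. */
static bool req_ref_put(struct req_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

Ideas for reducing this overhead typically amount to taking references in batches or avoiding the shared counter on the fast path, under the constraint that the final put still observes all prior accesses.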
* Re: [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-23 16:48 ` Caleb Sander Mateos
@ 2025-04-24 1:47 ` Ming Lei
2025-04-25 0:55 ` Caleb Sander Mateos
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-04-24 1:47 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Jens Axboe, linux-block, Uday Shankar, Guy Eisenberg,
Jared Holzman, Yoav Cohen, Omri Levi, Ofer Oshri
On Wed, Apr 23, 2025 at 09:48:55AM -0700, Caleb Sander Mateos wrote:
> On Wed, Apr 23, 2025 at 8:39 AM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Wed, Apr 23, 2025 at 08:08:17AM -0700, Caleb Sander Mateos wrote:
> > > On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
> > > > we may have scheduled task work via io_uring_cmd_complete_in_task() for
> > > > dispatching request, then kernel crash can be triggered.
> > > >
> > > > Fix it by not trying to canceling the command if ublk block request is
> > > > coming to this slot.
> > > >
> > > > Reported-by: Jared Holzman <jholzman@nvidia.com>
> > > > Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nvidia.com/
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > ---
> > > > drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
> > > > 1 file changed, 31 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > index c4d4be4f6fbd..fbfb5b815c8d 100644
> > > > --- a/drivers/block/ublk_drv.c
> > > > +++ b/drivers/block/ublk_drv.c
> > > > @@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > > if (res != BLK_STS_OK)
> > > > return res;
> > > >
> > > > + /*
> > > > + * Order writing to rq->state in blk_mq_start_request() and
> > > > + * reading ubq->canceling, see comment in ublk_cancel_command()
> > > > + * wrt. the pair barrier.
> > > > + */
> > > > + smp_mb();
> > >
> > > Adding an mfence to every ublk I/O would be really unfortunate. Memory
> > > barriers are very expensive in a system with a lot of CPUs. Why can't
> >
> > I believe perf effect from the little smp_mb() may not be observed, actually
> > there are several main contributions for ublk perf per my last profiling:
>
> I have seen a single mfence instruction in the I/O path account for
> multiple percent of the CPU profile in the past. The cost of a fence
> scales superlinearly with the number of CPUs, so it can be a real
> parallelism bottleneck. I'm not opposed to the memory barrier if it's
> truly necessary for correctness, but I would love to consider any
> alternatives.
Thinking about it further, the added barrier is actually useless, because:
- for any new request coming in since ublk_start_cancel(), ubq->canceling is
always observed
- this patch only addresses requests whose task work was scheduled before or
during quiesce but has not gotten a chance to run yet
The single added check of `req && blk_mq_request_started(req)` should be
enough because:
- either the started request is aborted via __ublk_abort_rq(), so the
uring_cmd is canceled next time,
or
- the uring_cmd is completed in the task-work function ublk_dispatch_req(),
because io_uring guarantees that it is called
>
> I have been looking at ublk zero-copy CPU profiles recently, and there
> the most expensive instructions there are the atomic
> reference-counting in ublk_get_req_ref()/ublk_put_req_ref(). I have
> some ideas to reduce that overhead.
The two can be observed in perf, but IMO they are still a small part compared
with the other ones; I am not sure you can get an obvious IOPS boost in this
area, especially since the atomic reference is hard to avoid.
I am preparing a patch to relax the context limit for the
register_io/unregister_io commands, which should be allowed to run from
non-ublk-queue contexts; it just needs one offload ublk server implementation.
thanks,
Ming
* Re: [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
2025-04-23 14:52 ` Caleb Sander Mateos
@ 2025-04-24 1:53 ` Ming Lei
0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2025-04-24 1:53 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Jens Axboe, linux-block, Uday Shankar, Guy Eisenberg,
Jared Holzman, Yoav Cohen, Omri Levi, Ofer Oshri
On Wed, Apr 23, 2025 at 07:52:19AM -0700, Caleb Sander Mateos wrote:
> On Wed, Apr 23, 2025 at 7:44 AM Caleb Sander Mateos
> <csander@purestorage.com> wrote:
> >
> > On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > The in-tree code calls io_uring_cmd_complete_in_task() to schedule
> > > task_work for dispatching this request to handle
> > > UBLK_U_IO_NEED_GET_DATA.
> > >
> > > This ways is really not necessary because the current context is exactly
> > > the ublk queue context, so call ublk_dispatch_req() directly for handling
> > > UBLK_U_IO_NEED_GET_DATA.
> >
> > Indeed, I was planning to make the same change!
> >
> > >
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > > drivers/block/ublk_drv.c | 14 +++-----------
> > > 1 file changed, 3 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > index 2de7b2bd409d..c4d4be4f6fbd 100644
> > > --- a/drivers/block/ublk_drv.c
> > > +++ b/drivers/block/ublk_drv.c
> > > @@ -1886,15 +1886,6 @@ static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
> > > }
> > > }
> > >
> > > -static void ublk_handle_need_get_data(struct ublk_device *ub, int q_id,
> > > - int tag)
> > > -{
> > > - struct ublk_queue *ubq = ublk_get_queue(ub, q_id);
> > > - struct request *req = blk_mq_tag_to_rq(ub->tag_set.tags[q_id], tag);
> > > -
> > > - ublk_queue_cmd(ubq, req);
> > > -}
> >
> > Looks like this will conflict with Uday's patch:
> > https://lore.kernel.org/linux-block/20250421-ublk_constify-v1-3-3371f9e9f73c@purestorage.com/
> > . Since that series already has reviews, I expect it will land first.
> >
> > > -
> > > static inline int ublk_check_cmd_op(u32 cmd_op)
> > > {
> > > u32 ioc_type = _IOC_TYPE(cmd_op);
> > > @@ -2103,8 +2094,9 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
> > > if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
> > > goto out;
> > > ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
> > > - ublk_handle_need_get_data(ub, ub_cmd->q_id, ub_cmd->tag);
> > > - break;
> > > + req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag);
> > > + ublk_dispatch_req(ubq, req, issue_flags);
> >
> > Maybe it would make sense to factor the UBLK_IO_NEED_GET_DATA handling
> > out of ublk_dispatch_req()? Then ublk_dispatch_req() (called only for
> > incoming ublk requests) could assume the UBLK_IO_FLAG_NEED_GET_DATA
> > flag is not yet set, and this path wouldn't need to pay the cost of
> > re-checking current != ubq->ubq_daemon, ublk_need_get_data(ubq) &&
> > ublk_need_map_req(req), etc.
> >
> > > + return -EIOCBQUEUED;
> >
> > It's probably possible to return the result here synchronously to
> > avoid the small overhead of io_uring_cmd_done(). That may be easier to
> > do if the UBLK_IO_NEED_GET_DATA path is separated from
> > ublk_dispatch_req().
>
> And if we can avoid using io_uring_cmd_done(), calling
> ublk_fill_io_cmd() for UBLK_IO_NEED_GET_DATA would no longer be
> necessary. (This was my original motivation to handle
> UBLK_IO_NEED_GET_DATA synchronously; UBLK_IO_NEED_GET_DATA overwriting
> io->cmd is an obstacle to introducing a struct request * field that
> aliases io->cmd.)
All your comments are reasonable.
Here I just want to keep it simple for backport purpose, and we can clean
up them all by one followup.
Thanks,
Ming
* Re: [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-23 9:24 [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-23 9:24 ` [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA Ming Lei
2025-04-23 9:24 ` [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
@ 2025-04-24 21:10 ` Jared Holzman
2025-04-25 1:43 ` Ming Lei
2 siblings, 1 reply; 14+ messages in thread
From: Jared Holzman @ 2025-04-24 21:10 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, linux-block
Cc: Uday Shankar, Caleb Sander Mateos, Guy Eisenberg, Yoav Cohen,
Omri Levi, Ofer Oshri
On 23/04/2025 12:24, Ming Lei wrote:
> Hello Jens,
>
> The 2 patches try to fix race between between io_uring_cmd_complete_in_task
> and ublk_cancel_cmd, please don't apply until Jared verifies them.
>
> Jared, please test and see if the two can fix your crash issue on v6.15-rc3.
>
> If you can't reproduce it on v6.15-rc3, please backport them to v6.14, and I
> can help the backport if you need.
>
> Thanks,
> Ming
>
> Ming Lei (2):
> ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
> ublk: fix race between io_uring_cmd_complete_in_task and
> ublk_cancel_cmd
>
> drivers/block/ublk_drv.c | 51 ++++++++++++++++++++++++++--------------
> 1 file changed, 34 insertions(+), 17 deletions(-)
>
Hi Ming,
It's a solid fix. I ran it through our automation and it passed 300 iterations without an issue. Previously we were getting a crash after fewer than 20.
I also back-ported the patches to 6.14 and it works there too.
Will these fixes make it into 6.15? Or only 6.16?
Also, is there a 6.14 maintenance branch that could also be fixed, or is it end-of-life already?
Thanks for the help on this.
Regards,
Jared.
* Re: [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-24 1:47 ` Ming Lei
@ 2025-04-25 0:55 ` Caleb Sander Mateos
0 siblings, 0 replies; 14+ messages in thread
From: Caleb Sander Mateos @ 2025-04-25 0:55 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, linux-block, Uday Shankar, Guy Eisenberg,
Jared Holzman, Yoav Cohen, Omri Levi, Ofer Oshri
On Wed, Apr 23, 2025 at 6:47 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, Apr 23, 2025 at 09:48:55AM -0700, Caleb Sander Mateos wrote:
> > On Wed, Apr 23, 2025 at 8:39 AM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > On Wed, Apr 23, 2025 at 08:08:17AM -0700, Caleb Sander Mateos wrote:
> > > > On Wed, Apr 23, 2025 at 2:24 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > > >
> > > > > ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
> > > > > we may have scheduled task work via io_uring_cmd_complete_in_task() for
> > > > > dispatching the request, and then a kernel crash can be triggered.
> > > > >
> > > > > Fix it by not trying to cancel the command if a ublk block request is
> > > > > coming to this slot.
> > > > >
> > > > > Reported-by: Jared Holzman <jholzman@nvidia.com>
> > > > > Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nvidia.com/
> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > ---
> > > > > drivers/block/ublk_drv.c | 37 +++++++++++++++++++++++++++++++------
> > > > > 1 file changed, 31 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > > > > index c4d4be4f6fbd..fbfb5b815c8d 100644
> > > > > --- a/drivers/block/ublk_drv.c
> > > > > +++ b/drivers/block/ublk_drv.c
> > > > > @@ -1334,6 +1334,12 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> > > > > if (res != BLK_STS_OK)
> > > > > return res;
> > > > >
> > > > > + /*
> > > > > + * Order writing to rq->state in blk_mq_start_request() and
> > > > > + * reading ubq->canceling, see comment in ublk_cancel_command()
> > > > > + * wrt. the pair barrier.
> > > > > + */
> > > > > + smp_mb();
> > > >
> > > > Adding an mfence to every ublk I/O would be really unfortunate. Memory
> > > > barriers are very expensive in a system with a lot of CPUs. Why can't
> > >
> > > I believe the perf effect of this single smp_mb() may not be observable;
> > > per my last profiling, there are several bigger contributors to ublk overhead:
> >
> > I have seen a single mfence instruction in the I/O path account for
> > multiple percent of the CPU profile in the past. The cost of a fence
> > scales superlinearly with the number of CPUs, so it can be a real
> > parallelism bottleneck. I'm not opposed to the memory barrier if it's
> > truly necessary for correctness, but I would love to consider any
> > alternatives.
>
> Thinking about it further, the added barrier is actually useless, because:
>
> - for any request arriving after ublk_start_cancel(), ubq->canceling is
>   always observed
>
> - this patch only addresses requests whose task work was scheduled before or
>   during quiesce but has not had a chance to run yet
>
> The added single check of `req && blk_mq_request_started(req)` should be
> enough because:
>
> - either the started request is aborted via __ublk_abort_rq(), so the
> uring_cmd is canceled next time
>
> or
>
> - the uring_cmd is completed in the task-work function ublk_dispatch_req(),
>   because io_uring guarantees that it is called
Your logic seems reasonable to me. I'm all for eliminating the barrier!
>
> >
> > I have been looking at ublk zero-copy CPU profiles recently, and there
> > the most expensive instructions there are the atomic
> > reference-counting in ublk_get_req_ref()/ublk_put_req_ref(). I have
> > some ideas to reduce that overhead.
>
> The two can be observed in perf, but IMO they are still a small part compared
> with the other contributors; I'm not sure you can get an obvious IOPS boost in
> this area, especially since it is hard to avoid the atomic reference.
>
> I am preparing a patch to relax the context limit for the
> register_io/un_register_io commands, which should be allowed to run from
> non-ublk-queue contexts; it just needs an offload ublk server implementation.
Good guess! I was hoping to depend on the fact that buffers can only
be registered by the ubq_daemon task, so that the reference count
increment doesn't need to be atomic. But if you plan to add support
for registering ublk request buffers from other tasks, then that won't
work.
Thanks,
Caleb
* Re: [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-24 21:10 ` [PATCH 0/2] " Jared Holzman
@ 2025-04-25 1:43 ` Ming Lei
2025-04-25 1:53 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-04-25 1:43 UTC (permalink / raw)
To: Jared Holzman
Cc: Jens Axboe, linux-block, Uday Shankar, Caleb Sander Mateos,
Guy Eisenberg, Yoav Cohen, Omri Levi, Ofer Oshri
On Fri, Apr 25, 2025 at 12:10:31AM +0300, Jared Holzman wrote:
> On 23/04/2025 12:24, Ming Lei wrote:
> > Hello Jens,
> >
> > The 2 patches try to fix a race between io_uring_cmd_complete_in_task
> > and ublk_cancel_cmd, please don't apply until Jared verifies them.
> >
> > Jared, please test and see if the two can fix your crash issue on v6.15-rc3.
> >
> > If you can't reproduce it on v6.15-rc3, please backport them to v6.14, and I
> > can help the backport if you need.
> >
> > Thanks,
> > Ming
> >
> > Ming Lei (2):
> > ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA
> > ublk: fix race between io_uring_cmd_complete_in_task and
> > ublk_cancel_cmd
> >
> > drivers/block/ublk_drv.c | 51 ++++++++++++++++++++++++++--------------
> > 1 file changed, 34 insertions(+), 17 deletions(-)
> >
>
> Hi Ming,
>
It's a solid fix. I ran it through our automation and it passed 300 iterations without an issue. Previously we were getting a crash after fewer than 20.
>
Jared, thanks for the test and feedback!
> I also back-ported the patches to 6.14 and it works there too.
>
> Will these fixes make it into 6.15? Or only 6.16?
I think it is fine for v6.15 if Jens agrees.
>
> Also, is there a 6.14 maintenance branch that could be fixed, or is it end-of-life already?
Both are simple enough and may be backported to any stable tree, IMO.
thanks,
Ming
* Re: [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
2025-04-25 1:43 ` Ming Lei
@ 2025-04-25 1:53 ` Jens Axboe
0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2025-04-25 1:53 UTC (permalink / raw)
To: Ming Lei, Jared Holzman
Cc: linux-block, Uday Shankar, Caleb Sander Mateos, Guy Eisenberg,
Yoav Cohen, Omri Levi, Ofer Oshri
On 4/24/25 7:43 PM, Ming Lei wrote:
>
>> Hi Ming,
>>
>> It's a solid fix. I ran it through our automation and it passed 300 iterations without an issue. Previously we were getting a crash after fewer than 20.
>>
>
> Jared, thanks for the test and feedback!
>
>
>> I also back-ported the patches to 6.14 and it works there too.
>>
>> Will these fixes make it into 6.15? Or only 6.16?
>
> I think it is fine for v6.15 if Jens agrees.
We can take them for 6.15 I think, I queued up the v2.
--
Jens Axboe
end of thread, other threads:[~2025-04-25 1:53 UTC | newest]
Thread overview: 14+ messages
2025-04-23 9:24 [PATCH 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-23 9:24 ` [PATCH 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA Ming Lei
2025-04-23 14:44 ` Caleb Sander Mateos
2025-04-23 14:52 ` Caleb Sander Mateos
2025-04-24 1:53 ` Ming Lei
2025-04-23 9:24 ` [PATCH 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-23 15:08 ` Caleb Sander Mateos
2025-04-23 15:39 ` Ming Lei
2025-04-23 16:48 ` Caleb Sander Mateos
2025-04-24 1:47 ` Ming Lei
2025-04-25 0:55 ` Caleb Sander Mateos
2025-04-24 21:10 ` [PATCH 0/2] " Jared Holzman
2025-04-25 1:43 ` Ming Lei
2025-04-25 1:53 ` Jens Axboe