* [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed
@ 2026-01-12 4:12 Ming Lei
2026-01-12 4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei
2026-01-12 4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei
0 siblings, 2 replies; 7+ messages in thread
From: Ming Lei @ 2026-01-12 4:12 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei
Hi Jens,
The 1st patch cancels inflight uring_cmd in case of failing to validate start_dev pid.
The end patch fix builtin kublk utility for handling starting daemon failure.
Ming Lei (2):
ublk: cancel device on START_DEV failure
selftests/ublk: fix garbage output and cleanup on failure
drivers/block/ublk_drv.c | 9 +++++++--
tools/testing/selftests/ublk/kublk.c | 14 +++++++-------
2 files changed, 14 insertions(+), 9 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] ublk: cancel device on START_DEV failure
2026-01-12 4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei
@ 2026-01-12 4:12 ` Ming Lei
2026-01-12 16:52 ` Caleb Sander Mateos
2026-01-12 4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei
1 sibling, 1 reply; 7+ messages in thread
From: Ming Lei @ 2026-01-12 4:12 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei,
stable
When ublk_ctrl_start_dev() fails after waiting for completion, the
device needs to be properly cancelled to prevent leaving it in an
inconsistent state. Without this, pending I/O commands may remain
uncompleted and the device cannot be cleanly removed.
Add ublk_cancel_dev() call in the error path to ensure proper cleanup
when START_DEV fails.
Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
drivers/block/ublk_drv.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index f6e5a0766721..2d6250d61a7b 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -2953,8 +2953,10 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
if (wait_for_completion_interruptible(&ub->completion) != 0)
return -EINTR;
- if (ub->ublksrv_tgid != ublksrv_pid)
- return -EINVAL;
+ if (ub->ublksrv_tgid != ublksrv_pid) {
+ ret = -EINVAL;
+ goto out;
+ }
mutex_lock(&ub->mutex);
if (ub->dev_info.state == UBLK_S_DEV_LIVE ||
@@ -3017,6 +3019,9 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
put_disk(disk);
out_unlock:
mutex_unlock(&ub->mutex);
+out:
+ if (ret)
+ ublk_cancel_dev(ub);
return ret;
}
--
2.47.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure
2026-01-12 4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei
2026-01-12 4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei
@ 2026-01-12 4:12 ` Ming Lei
2026-01-12 17:36 ` Caleb Sander Mateos
1 sibling, 1 reply; 7+ messages in thread
From: Ming Lei @ 2026-01-12 4:12 UTC (permalink / raw)
To: Jens Axboe, linux-block
Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei
Fix several issues in kublk:
1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
in foreground mode. Without this, _evtfd is zero-initialized to 0
(stdin), and when ublk_send_dev_event() is called on failure, it
writes binary data to stdin which appears as garbage on the terminal.
2. Move fail label in ublk_start_daemon() to ensure pthread_join() is
called before queue deinit on the error path. This ensures proper
thread cleanup when startup fails.
3. Add async parameter to ublk_ctrl_del_dev() and use async deletion
when the daemon fails to start. This prevents potential hangs when
deleting a device that failed during startup.
Also fix a debug message format string that was missing __func__ and
had wrong escape character.
Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
tools/testing/selftests/ublk/kublk.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
index 185ba553686a..0c62a967f2cb 100644
--- a/tools/testing/selftests/ublk/kublk.c
+++ b/tools/testing/selftests/ublk/kublk.c
@@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev)
return __ublk_ctrl_cmd(dev, &data);
}
-static int ublk_ctrl_del_dev(struct ublk_dev *dev)
+static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async)
{
struct ublk_ctrl_cmd_data data = {
- .cmd_op = UBLK_U_CMD_DEL_DEV,
- .flags = 0,
+ .cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV,
};
return __ublk_ctrl_cmd(dev, &data);
@@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
else
ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id);
+ fail:
/* wait until we are terminated */
for (i = 0; i < dev->nthreads; i++)
pthread_join(tinfo[i].thread, &thread_ret);
free(tinfo);
- fail:
for (i = 0; i < dinfo->nr_hw_queues; i++)
ublk_queue_deinit(&dev->q[i]);
ublk_dev_unprep(dev);
@@ -1272,9 +1271,9 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
}
ret = ublk_start_daemon(ctx, dev);
- ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\b", ret);
+ ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\n", __func__, ret);
if (ret < 0)
- ublk_ctrl_del_dev(dev);
+ ublk_ctrl_del_dev(dev, true);
fail:
if (ret < 0)
@@ -1371,7 +1370,7 @@ static int __cmd_dev_del(struct dev_ctx *ctx)
if (ret < 0)
ublk_err("%s: stop daemon id %d dev %d, ret %d\n",
__func__, dev->dev_info.ublksrv_pid, number, ret);
- ublk_ctrl_del_dev(dev);
+ ublk_ctrl_del_dev(dev, false);
fail:
ublk_ctrl_deinit(dev);
@@ -1622,6 +1621,7 @@ int main(int argc, char *argv[])
.nr_hw_queues = 2,
.dev_id = -1,
.tgt_type = "unknown",
+ ._evtfd = -1,
};
int ret = -EINVAL, i;
int tgt_argc = 1;
--
2.47.0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] ublk: cancel device on START_DEV failure
2026-01-12 4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei
@ 2026-01-12 16:52 ` Caleb Sander Mateos
2026-01-13 1:38 ` Ming Lei
0 siblings, 1 reply; 7+ messages in thread
From: Caleb Sander Mateos @ 2026-01-12 16:52 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor, stable
On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> When ublk_ctrl_start_dev() fails after waiting for completion, the
> device needs to be properly cancelled to prevent leaving it in an
> inconsistent state. Without this, pending I/O commands may remain
> uncompleted and the device cannot be cleanly removed.
>
> Add ublk_cancel_dev() call in the error path to ensure proper cleanup
> when START_DEV fails.
It's not clear to me why the UBLK_IO_FETCH_REQ commands must be
cancelled if UBLK_CMD_START_DEV fails. Wouldn't they get cancelled
whenever the ublk device is deleted or the ublk server exits?
And this seems like a UAPI change. Previously, the ublk server could
issue UBLK_CMD_START_DEV again if it failed, but now it must also
resubmit all the UBLK_IO_FETCH_REQ commands. This also means that
issuing UBLK_CMD_START_DEV after the ublk device has already been
started behaves like UBLK_CMD_STOP_DEV, which seems highly
unintuitive.
Best,
Caleb
>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/block/ublk_drv.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index f6e5a0766721..2d6250d61a7b 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -2953,8 +2953,10 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
> if (wait_for_completion_interruptible(&ub->completion) != 0)
> return -EINTR;
>
> - if (ub->ublksrv_tgid != ublksrv_pid)
> - return -EINVAL;
> + if (ub->ublksrv_tgid != ublksrv_pid) {
> + ret = -EINVAL;
> + goto out;
> + }
>
> mutex_lock(&ub->mutex);
> if (ub->dev_info.state == UBLK_S_DEV_LIVE ||
> @@ -3017,6 +3019,9 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
> put_disk(disk);
> out_unlock:
> mutex_unlock(&ub->mutex);
> +out:
> + if (ret)
> + ublk_cancel_dev(ub);
> return ret;
> }
>
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure
2026-01-12 4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei
@ 2026-01-12 17:36 ` Caleb Sander Mateos
2026-01-13 1:46 ` Ming Lei
0 siblings, 1 reply; 7+ messages in thread
From: Caleb Sander Mateos @ 2026-01-12 17:36 UTC (permalink / raw)
To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor
On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> Fix several issues in kublk:
>
> 1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
> in foreground mode. Without this, _evtfd is zero-initialized to 0
> (stdin), and when ublk_send_dev_event() is called on failure, it
> writes binary data to stdin which appears as garbage on the terminal.
Nice, I always wondered why that happened!
>
> 2. Move fail label in ublk_start_daemon() to ensure pthread_join() is
> called before queue deinit on the error path. This ensures proper
> thread cleanup when startup fails.
>
> 3. Add async parameter to ublk_ctrl_del_dev() and use async deletion
> when the daemon fails to start. This prevents potential hangs when
> deleting a device that failed during startup.
>
> Also fix a debug message format string that was missing __func__ and
> had wrong escape character.
These all look good, but maybe they would make sense as separate commits?
>
> Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> tools/testing/selftests/ublk/kublk.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
> index 185ba553686a..0c62a967f2cb 100644
> --- a/tools/testing/selftests/ublk/kublk.c
> +++ b/tools/testing/selftests/ublk/kublk.c
> @@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev)
> return __ublk_ctrl_cmd(dev, &data);
> }
>
> -static int ublk_ctrl_del_dev(struct ublk_dev *dev)
> +static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async)
> {
> struct ublk_ctrl_cmd_data data = {
> - .cmd_op = UBLK_U_CMD_DEL_DEV,
> - .flags = 0,
> + .cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV,
> };
>
> return __ublk_ctrl_cmd(dev, &data);
> @@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
> else
> ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id);
>
> + fail:
> /* wait until we are terminated */
> for (i = 0; i < dev->nthreads; i++)
> pthread_join(tinfo[i].thread, &thread_ret);
Is it valid to call pthread_join() on a zeroed pthread_t value? If
ublk_queue_init() fails, there is a goto fail before any of the
tinfo[i].thread have been assigned. And there's no checking of the
return value from pthread_create(), so if pthread_create() fails,
tinfo[i].thread may not be assigned either.
Best,
Caleb
> free(tinfo);
> - fail:
> for (i = 0; i < dinfo->nr_hw_queues; i++)
> ublk_queue_deinit(&dev->q[i]);
> ublk_dev_unprep(dev);
> @@ -1272,9 +1271,9 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
> }
>
> ret = ublk_start_daemon(ctx, dev);
> - ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\b", ret);
> + ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\n", __func__, ret);
> if (ret < 0)
> - ublk_ctrl_del_dev(dev);
> + ublk_ctrl_del_dev(dev, true);
>
> fail:
> if (ret < 0)
> @@ -1371,7 +1370,7 @@ static int __cmd_dev_del(struct dev_ctx *ctx)
> if (ret < 0)
> ublk_err("%s: stop daemon id %d dev %d, ret %d\n",
> __func__, dev->dev_info.ublksrv_pid, number, ret);
> - ublk_ctrl_del_dev(dev);
> + ublk_ctrl_del_dev(dev, false);
> fail:
> ublk_ctrl_deinit(dev);
>
> @@ -1622,6 +1621,7 @@ int main(int argc, char *argv[])
> .nr_hw_queues = 2,
> .dev_id = -1,
> .tgt_type = "unknown",
> + ._evtfd = -1,
> };
> int ret = -EINVAL, i;
> int tgt_argc = 1;
> --
> 2.47.0
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] ublk: cancel device on START_DEV failure
2026-01-12 16:52 ` Caleb Sander Mateos
@ 2026-01-13 1:38 ` Ming Lei
0 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2026-01-13 1:38 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor, stable
On Mon, Jan 12, 2026 at 08:52:31AM -0800, Caleb Sander Mateos wrote:
> On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > When ublk_ctrl_start_dev() fails after waiting for completion, the
> > device needs to be properly cancelled to prevent leaving it in an
> > inconsistent state. Without this, pending I/O commands may remain
> > uncompleted and the device cannot be cleanly removed.
> >
> > Add ublk_cancel_dev() call in the error path to ensure proper cleanup
> > when START_DEV fails.
>
> It's not clear to me why the UBLK_IO_FETCH_REQ commands must be
> cancelled if UBLK_CMD_START_DEV fails. Wouldn't they get cancelled
> whenever the ublk device is deleted or the ublk server exits?
Good catch, DEL_DEV/STOP_DEV supposes to be capable of handling irrecoverable
START_DEV failure.
So this patch isn't needed.
Thanks,
Ming
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure
2026-01-12 17:36 ` Caleb Sander Mateos
@ 2026-01-13 1:46 ` Ming Lei
0 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2026-01-13 1:46 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor
On Mon, Jan 12, 2026 at 09:36:39AM -0800, Caleb Sander Mateos wrote:
> On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Fix several issues in kublk:
> >
> > 1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
> > in foreground mode. Without this, _evtfd is zero-initialized to 0
> > (stdin), and when ublk_send_dev_event() is called on failure, it
> > writes binary data to stdin which appears as garbage on the terminal.
>
> Nice, I always wondered why that happened!
>
> >
> > 2. Move fail label in ublk_start_daemon() to ensure pthread_join() is
> > called before queue deinit on the error path. This ensures proper
> > thread cleanup when startup fails.
> >
> > 3. Add async parameter to ublk_ctrl_del_dev() and use async deletion
> > when the daemon fails to start. This prevents potential hangs when
> > deleting a device that failed during startup.
> >
> > Also fix a debug message format string that was missing __func__ and
> > had wrong escape character.
>
> These all look good, but maybe they would make sense as separate commits?
Fine, I will split it into two patches:
- one is for fixing start_dev failure handling
- another one is for fixing misc debug message problems
>
> >
> > Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > tools/testing/selftests/ublk/kublk.c | 14 +++++++-------
> > 1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
> > index 185ba553686a..0c62a967f2cb 100644
> > --- a/tools/testing/selftests/ublk/kublk.c
> > +++ b/tools/testing/selftests/ublk/kublk.c
> > @@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev)
> > return __ublk_ctrl_cmd(dev, &data);
> > }
> >
> > -static int ublk_ctrl_del_dev(struct ublk_dev *dev)
> > +static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async)
> > {
> > struct ublk_ctrl_cmd_data data = {
> > - .cmd_op = UBLK_U_CMD_DEL_DEV,
> > - .flags = 0,
> > + .cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV,
> > };
> >
> > return __ublk_ctrl_cmd(dev, &data);
> > @@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
> > else
> > ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id);
> >
> > + fail:
> > /* wait until we are terminated */
> > for (i = 0; i < dev->nthreads; i++)
> > pthread_join(tinfo[i].thread, &thread_ret);
>
> Is it valid to call pthread_join() on a zeroed pthread_t value? If
> ublk_queue_init() fails, there is a goto fail before any of the
> tinfo[i].thread have been assigned. And there's no checking of the
> return value from pthread_create(), so if pthread_create() fails,
> tinfo[i].thread may not be assigned either.
Good catch, will add new fail_queue_init label for covering it.
Thanks,
Ming
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-01-13 1:46 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-12 4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei
2026-01-12 4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei
2026-01-12 16:52 ` Caleb Sander Mateos
2026-01-13 1:38 ` Ming Lei
2026-01-12 4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei
2026-01-12 17:36 ` Caleb Sander Mateos
2026-01-13 1:46 ` Ming Lei
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox