public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed
@ 2026-01-12  4:12 Ming Lei
  2026-01-12  4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei
  2026-01-12  4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei
  0 siblings, 2 replies; 7+ messages in thread
From: Ming Lei @ 2026-01-12  4:12 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei


Hi Jens,

The 1st patch cancels inflight uring_cmd in case of failing to validate start_dev pid.

The end patch fix builtin kublk utility for handling starting daemon failure. 


Ming Lei (2):
  ublk: cancel device on START_DEV failure
  selftests/ublk: fix garbage output and cleanup on failure

 drivers/block/ublk_drv.c             |  9 +++++++--
 tools/testing/selftests/ublk/kublk.c | 14 +++++++-------
 2 files changed, 14 insertions(+), 9 deletions(-)

-- 
2.47.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] ublk: cancel device on START_DEV failure
  2026-01-12  4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei
@ 2026-01-12  4:12 ` Ming Lei
  2026-01-12 16:52   ` Caleb Sander Mateos
  2026-01-12  4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei
  1 sibling, 1 reply; 7+ messages in thread
From: Ming Lei @ 2026-01-12  4:12 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei,
	stable

When ublk_ctrl_start_dev() fails after waiting for completion, the
device needs to be properly cancelled to prevent leaving it in an
inconsistent state. Without this, pending I/O commands may remain
uncompleted and the device cannot be cleanly removed.

Add ublk_cancel_dev() call in the error path to ensure proper cleanup
when START_DEV fails.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index f6e5a0766721..2d6250d61a7b 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -2953,8 +2953,10 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
 	if (wait_for_completion_interruptible(&ub->completion) != 0)
 		return -EINTR;
 
-	if (ub->ublksrv_tgid != ublksrv_pid)
-		return -EINVAL;
+	if (ub->ublksrv_tgid != ublksrv_pid) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	mutex_lock(&ub->mutex);
 	if (ub->dev_info.state == UBLK_S_DEV_LIVE ||
@@ -3017,6 +3019,9 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
 		put_disk(disk);
 out_unlock:
 	mutex_unlock(&ub->mutex);
+out:
+	if (ret)
+		ublk_cancel_dev(ub);
 	return ret;
 }
 
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure
  2026-01-12  4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei
  2026-01-12  4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei
@ 2026-01-12  4:12 ` Ming Lei
  2026-01-12 17:36   ` Caleb Sander Mateos
  1 sibling, 1 reply; 7+ messages in thread
From: Ming Lei @ 2026-01-12  4:12 UTC (permalink / raw)
  To: Jens Axboe, linux-block
  Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei

Fix several issues in kublk:

1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
   in foreground mode. Without this, _evtfd is zero-initialized to 0
   (stdin), and when ublk_send_dev_event() is called on failure, it
   writes binary data to stdin which appears as garbage on the terminal.

2. Move fail label in ublk_start_daemon() to ensure pthread_join() is
   called before queue deinit on the error path. This ensures proper
   thread cleanup when startup fails.

3. Add async parameter to ublk_ctrl_del_dev() and use async deletion
   when the daemon fails to start. This prevents potential hangs when
   deleting a device that failed during startup.

Also fix a debug message format string that was missing __func__ and
had wrong escape character.

Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 tools/testing/selftests/ublk/kublk.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
index 185ba553686a..0c62a967f2cb 100644
--- a/tools/testing/selftests/ublk/kublk.c
+++ b/tools/testing/selftests/ublk/kublk.c
@@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev)
 	return __ublk_ctrl_cmd(dev, &data);
 }
 
-static int ublk_ctrl_del_dev(struct ublk_dev *dev)
+static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async)
 {
 	struct ublk_ctrl_cmd_data data = {
-		.cmd_op = UBLK_U_CMD_DEL_DEV,
-		.flags = 0,
+		.cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV,
 	};
 
 	return __ublk_ctrl_cmd(dev, &data);
@@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
 	else
 		ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id);
 
+ fail:
 	/* wait until we are terminated */
 	for (i = 0; i < dev->nthreads; i++)
 		pthread_join(tinfo[i].thread, &thread_ret);
 	free(tinfo);
- fail:
 	for (i = 0; i < dinfo->nr_hw_queues; i++)
 		ublk_queue_deinit(&dev->q[i]);
 	ublk_dev_unprep(dev);
@@ -1272,9 +1271,9 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
 	}
 
 	ret = ublk_start_daemon(ctx, dev);
-	ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\b", ret);
+	ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\n", __func__, ret);
 	if (ret < 0)
-		ublk_ctrl_del_dev(dev);
+		ublk_ctrl_del_dev(dev, true);
 
 fail:
 	if (ret < 0)
@@ -1371,7 +1370,7 @@ static int __cmd_dev_del(struct dev_ctx *ctx)
 	if (ret < 0)
 		ublk_err("%s: stop daemon id %d dev %d, ret %d\n",
 				__func__, dev->dev_info.ublksrv_pid, number, ret);
-	ublk_ctrl_del_dev(dev);
+	ublk_ctrl_del_dev(dev, false);
 fail:
 	ublk_ctrl_deinit(dev);
 
@@ -1622,6 +1621,7 @@ int main(int argc, char *argv[])
 		.nr_hw_queues	=	2,
 		.dev_id		=	-1,
 		.tgt_type	=	"unknown",
+		._evtfd		=	-1,
 	};
 	int ret = -EINVAL, i;
 	int tgt_argc = 1;
-- 
2.47.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] ublk: cancel device on START_DEV failure
  2026-01-12  4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei
@ 2026-01-12 16:52   ` Caleb Sander Mateos
  2026-01-13  1:38     ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Caleb Sander Mateos @ 2026-01-12 16:52 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor, stable

On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> When ublk_ctrl_start_dev() fails after waiting for completion, the
> device needs to be properly cancelled to prevent leaving it in an
> inconsistent state. Without this, pending I/O commands may remain
> uncompleted and the device cannot be cleanly removed.
>
> Add ublk_cancel_dev() call in the error path to ensure proper cleanup
> when START_DEV fails.

It's not clear to me why the UBLK_IO_FETCH_REQ commands must be
cancelled if UBLK_CMD_START_DEV fails. Wouldn't they get cancelled
whenever the ublk device is deleted or the ublk server exits?
And this seems like a UAPI change. Previously, the ublk server could
issue UBLK_CMD_START_DEV again if it failed, but now it must also
resubmit all the UBLK_IO_FETCH_REQ commands. This also means that
issuing UBLK_CMD_START_DEV after the ublk device has already been
started behaves like UBLK_CMD_STOP_DEV, which seems highly
unintuitive.

Best,
Caleb

>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/block/ublk_drv.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index f6e5a0766721..2d6250d61a7b 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -2953,8 +2953,10 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
>         if (wait_for_completion_interruptible(&ub->completion) != 0)
>                 return -EINTR;
>
> -       if (ub->ublksrv_tgid != ublksrv_pid)
> -               return -EINVAL;
> +       if (ub->ublksrv_tgid != ublksrv_pid) {
> +               ret = -EINVAL;
> +               goto out;
> +       }
>
>         mutex_lock(&ub->mutex);
>         if (ub->dev_info.state == UBLK_S_DEV_LIVE ||
> @@ -3017,6 +3019,9 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub,
>                 put_disk(disk);
>  out_unlock:
>         mutex_unlock(&ub->mutex);
> +out:
> +       if (ret)
> +               ublk_cancel_dev(ub);
>         return ret;
>  }
>
> --
> 2.47.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure
  2026-01-12  4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei
@ 2026-01-12 17:36   ` Caleb Sander Mateos
  2026-01-13  1:46     ` Ming Lei
  0 siblings, 1 reply; 7+ messages in thread
From: Caleb Sander Mateos @ 2026-01-12 17:36 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor

On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> Fix several issues in kublk:
>
> 1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
>    in foreground mode. Without this, _evtfd is zero-initialized to 0
>    (stdin), and when ublk_send_dev_event() is called on failure, it
>    writes binary data to stdin which appears as garbage on the terminal.

Nice, I always wondered why that happened!

>
> 2. Move fail label in ublk_start_daemon() to ensure pthread_join() is
>    called before queue deinit on the error path. This ensures proper
>    thread cleanup when startup fails.
>
> 3. Add async parameter to ublk_ctrl_del_dev() and use async deletion
>    when the daemon fails to start. This prevents potential hangs when
>    deleting a device that failed during startup.
>
> Also fix a debug message format string that was missing __func__ and
> had wrong escape character.

These all look good, but maybe they would make sense as separate commits?

>
> Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  tools/testing/selftests/ublk/kublk.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
> index 185ba553686a..0c62a967f2cb 100644
> --- a/tools/testing/selftests/ublk/kublk.c
> +++ b/tools/testing/selftests/ublk/kublk.c
> @@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev)
>         return __ublk_ctrl_cmd(dev, &data);
>  }
>
> -static int ublk_ctrl_del_dev(struct ublk_dev *dev)
> +static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async)
>  {
>         struct ublk_ctrl_cmd_data data = {
> -               .cmd_op = UBLK_U_CMD_DEL_DEV,
> -               .flags = 0,
> +               .cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV,
>         };
>
>         return __ublk_ctrl_cmd(dev, &data);
> @@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
>         else
>                 ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id);
>
> + fail:
>         /* wait until we are terminated */
>         for (i = 0; i < dev->nthreads; i++)
>                 pthread_join(tinfo[i].thread, &thread_ret);

Is it valid to call pthread_join() on a zeroed pthread_t value? If
ublk_queue_init() fails, there is a goto fail before any of the
tinfo[i].thread have been assigned. And there's no checking of the
return value from pthread_create(), so if pthread_create() fails,
tinfo[i].thread may not be assigned either.

Best,
Caleb

>         free(tinfo);
> - fail:
>         for (i = 0; i < dinfo->nr_hw_queues; i++)
>                 ublk_queue_deinit(&dev->q[i]);
>         ublk_dev_unprep(dev);
> @@ -1272,9 +1271,9 @@ static int __cmd_dev_add(const struct dev_ctx *ctx)
>         }
>
>         ret = ublk_start_daemon(ctx, dev);
> -       ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\b", ret);
> +       ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\n", __func__, ret);
>         if (ret < 0)
> -               ublk_ctrl_del_dev(dev);
> +               ublk_ctrl_del_dev(dev, true);
>
>  fail:
>         if (ret < 0)
> @@ -1371,7 +1370,7 @@ static int __cmd_dev_del(struct dev_ctx *ctx)
>         if (ret < 0)
>                 ublk_err("%s: stop daemon id %d dev %d, ret %d\n",
>                                 __func__, dev->dev_info.ublksrv_pid, number, ret);
> -       ublk_ctrl_del_dev(dev);
> +       ublk_ctrl_del_dev(dev, false);
>  fail:
>         ublk_ctrl_deinit(dev);
>
> @@ -1622,6 +1621,7 @@ int main(int argc, char *argv[])
>                 .nr_hw_queues   =       2,
>                 .dev_id         =       -1,
>                 .tgt_type       =       "unknown",
> +               ._evtfd         =       -1,
>         };
>         int ret = -EINVAL, i;
>         int tgt_argc = 1;
> --
> 2.47.0
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] ublk: cancel device on START_DEV failure
  2026-01-12 16:52   ` Caleb Sander Mateos
@ 2026-01-13  1:38     ` Ming Lei
  0 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2026-01-13  1:38 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor, stable

On Mon, Jan 12, 2026 at 08:52:31AM -0800, Caleb Sander Mateos wrote:
> On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > When ublk_ctrl_start_dev() fails after waiting for completion, the
> > device needs to be properly cancelled to prevent leaving it in an
> > inconsistent state. Without this, pending I/O commands may remain
> > uncompleted and the device cannot be cleanly removed.
> >
> > Add ublk_cancel_dev() call in the error path to ensure proper cleanup
> > when START_DEV fails.
> 
> It's not clear to me why the UBLK_IO_FETCH_REQ commands must be
> cancelled if UBLK_CMD_START_DEV fails. Wouldn't they get cancelled
> whenever the ublk device is deleted or the ublk server exits?

Good catch, DEL_DEV/STOP_DEV supposes to be capable of handling irrecoverable
START_DEV failure.

So this patch isn't needed.


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure
  2026-01-12 17:36   ` Caleb Sander Mateos
@ 2026-01-13  1:46     ` Ming Lei
  0 siblings, 0 replies; 7+ messages in thread
From: Ming Lei @ 2026-01-13  1:46 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor

On Mon, Jan 12, 2026 at 09:36:39AM -0800, Caleb Sander Mateos wrote:
> On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > Fix several issues in kublk:
> >
> > 1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output
> >    in foreground mode. Without this, _evtfd is zero-initialized to 0
> >    (stdin), and when ublk_send_dev_event() is called on failure, it
> >    writes binary data to stdin which appears as garbage on the terminal.
> 
> Nice, I always wondered why that happened!
> 
> >
> > 2. Move fail label in ublk_start_daemon() to ensure pthread_join() is
> >    called before queue deinit on the error path. This ensures proper
> >    thread cleanup when startup fails.
> >
> > 3. Add async parameter to ublk_ctrl_del_dev() and use async deletion
> >    when the daemon fails to start. This prevents potential hangs when
> >    deleting a device that failed during startup.
> >
> > Also fix a debug message format string that was missing __func__ and
> > had wrong escape character.
> 
> These all look good, but maybe they would make sense as separate commits?

Fine, I will split it into two patches:

- one is for fixing start_dev failure handling

- another one is for fixing misc debug message problems

> 
> >
> > Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  tools/testing/selftests/ublk/kublk.c | 14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
> > index 185ba553686a..0c62a967f2cb 100644
> > --- a/tools/testing/selftests/ublk/kublk.c
> > +++ b/tools/testing/selftests/ublk/kublk.c
> > @@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev)
> >         return __ublk_ctrl_cmd(dev, &data);
> >  }
> >
> > -static int ublk_ctrl_del_dev(struct ublk_dev *dev)
> > +static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async)
> >  {
> >         struct ublk_ctrl_cmd_data data = {
> > -               .cmd_op = UBLK_U_CMD_DEL_DEV,
> > -               .flags = 0,
> > +               .cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV,
> >         };
> >
> >         return __ublk_ctrl_cmd(dev, &data);
> > @@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev)
> >         else
> >                 ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id);
> >
> > + fail:
> >         /* wait until we are terminated */
> >         for (i = 0; i < dev->nthreads; i++)
> >                 pthread_join(tinfo[i].thread, &thread_ret);
> 
> Is it valid to call pthread_join() on a zeroed pthread_t value? If
> ublk_queue_init() fails, there is a goto fail before any of the
> tinfo[i].thread have been assigned. And there's no checking of the
> return value from pthread_create(), so if pthread_create() fails,
> tinfo[i].thread may not be assigned either.

Good catch, will add new fail_queue_init label for covering it.
 

Thanks,
Ming


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-01-13  1:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-12  4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei
2026-01-12  4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei
2026-01-12 16:52   ` Caleb Sander Mateos
2026-01-13  1:38     ` Ming Lei
2026-01-12  4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei
2026-01-12 17:36   ` Caleb Sander Mateos
2026-01-13  1:46     ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox