* [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed @ 2026-01-12 4:12 Ming Lei 2026-01-12 4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei 2026-01-12 4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei 0 siblings, 2 replies; 7+ messages in thread From: Ming Lei @ 2026-01-12 4:12 UTC (permalink / raw) To: Jens Axboe, linux-block Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei Hi Jens, The 1st patch cancels inflight uring_cmd in case of failing to validate start_dev pid. The end patch fix builtin kublk utility for handling starting daemon failure. Ming Lei (2): ublk: cancel device on START_DEV failure selftests/ublk: fix garbage output and cleanup on failure drivers/block/ublk_drv.c | 9 +++++++-- tools/testing/selftests/ublk/kublk.c | 14 +++++++------- 2 files changed, 14 insertions(+), 9 deletions(-) -- 2.47.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] ublk: cancel device on START_DEV failure 2026-01-12 4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei @ 2026-01-12 4:12 ` Ming Lei 2026-01-12 16:52 ` Caleb Sander Mateos 2026-01-12 4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei 1 sibling, 1 reply; 7+ messages in thread From: Ming Lei @ 2026-01-12 4:12 UTC (permalink / raw) To: Jens Axboe, linux-block Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei, stable When ublk_ctrl_start_dev() fails after waiting for completion, the device needs to be properly cancelled to prevent leaving it in an inconsistent state. Without this, pending I/O commands may remain uncompleted and the device cannot be cleanly removed. Add ublk_cancel_dev() call in the error path to ensure proper cleanup when START_DEV fails. Cc: <stable@vger.kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- drivers/block/ublk_drv.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index f6e5a0766721..2d6250d61a7b 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -2953,8 +2953,10 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, if (wait_for_completion_interruptible(&ub->completion) != 0) return -EINTR; - if (ub->ublksrv_tgid != ublksrv_pid) - return -EINVAL; + if (ub->ublksrv_tgid != ublksrv_pid) { + ret = -EINVAL; + goto out; + } mutex_lock(&ub->mutex); if (ub->dev_info.state == UBLK_S_DEV_LIVE || @@ -3017,6 +3019,9 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, put_disk(disk); out_unlock: mutex_unlock(&ub->mutex); +out: + if (ret) + ublk_cancel_dev(ub); return ret; } -- 2.47.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] ublk: cancel device on START_DEV failure 2026-01-12 4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei @ 2026-01-12 16:52 ` Caleb Sander Mateos 2026-01-13 1:38 ` Ming Lei 0 siblings, 1 reply; 7+ messages in thread From: Caleb Sander Mateos @ 2026-01-12 16:52 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor, stable On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote: > > When ublk_ctrl_start_dev() fails after waiting for completion, the > device needs to be properly cancelled to prevent leaving it in an > inconsistent state. Without this, pending I/O commands may remain > uncompleted and the device cannot be cleanly removed. > > Add ublk_cancel_dev() call in the error path to ensure proper cleanup > when START_DEV fails. It's not clear to me why the UBLK_IO_FETCH_REQ commands must be cancelled if UBLK_CMD_START_DEV fails. Wouldn't they get cancelled whenever the ublk device is deleted or the ublk server exits? And this seems like a UAPI change. Previously, the ublk server could issue UBLK_CMD_START_DEV again if it failed, but now it must also resubmit all the UBLK_IO_FETCH_REQ commands. This also means that issuing UBLK_CMD_START_DEV after the ublk device has already been started behaves like UBLK_CMD_STOP_DEV, which seems highly unintuitive. Best, Caleb > > Cc: <stable@vger.kernel.org> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > drivers/block/ublk_drv.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c > index f6e5a0766721..2d6250d61a7b 100644 > --- a/drivers/block/ublk_drv.c > +++ b/drivers/block/ublk_drv.c > @@ -2953,8 +2953,10 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, > if (wait_for_completion_interruptible(&ub->completion) != 0) > return -EINTR; > > - if (ub->ublksrv_tgid != ublksrv_pid) > - return -EINVAL; > + if (ub->ublksrv_tgid != ublksrv_pid) { > + ret = -EINVAL; > + goto out; > + } > > mutex_lock(&ub->mutex); > if (ub->dev_info.state == UBLK_S_DEV_LIVE || > @@ -3017,6 +3019,9 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, > put_disk(disk); > out_unlock: > mutex_unlock(&ub->mutex); > +out: > + if (ret) > + ublk_cancel_dev(ub); > return ret; > } > > -- > 2.47.0 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] ublk: cancel device on START_DEV failure 2026-01-12 16:52 ` Caleb Sander Mateos @ 2026-01-13 1:38 ` Ming Lei 0 siblings, 0 replies; 7+ messages in thread From: Ming Lei @ 2026-01-13 1:38 UTC (permalink / raw) To: Caleb Sander Mateos Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor, stable On Mon, Jan 12, 2026 at 08:52:31AM -0800, Caleb Sander Mateos wrote: > On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > When ublk_ctrl_start_dev() fails after waiting for completion, the > > device needs to be properly cancelled to prevent leaving it in an > > inconsistent state. Without this, pending I/O commands may remain > > uncompleted and the device cannot be cleanly removed. > > > > Add ublk_cancel_dev() call in the error path to ensure proper cleanup > > when START_DEV fails. > > It's not clear to me why the UBLK_IO_FETCH_REQ commands must be > cancelled if UBLK_CMD_START_DEV fails. Wouldn't they get cancelled > whenever the ublk device is deleted or the ublk server exits? Good catch, DEL_DEV/STOP_DEV supposes to be capable of handling irrecoverable START_DEV failure. So this patch isn't needed. Thanks, Ming ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure 2026-01-12 4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei 2026-01-12 4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei @ 2026-01-12 4:12 ` Ming Lei 2026-01-12 17:36 ` Caleb Sander Mateos 1 sibling, 1 reply; 7+ messages in thread From: Ming Lei @ 2026-01-12 4:12 UTC (permalink / raw) To: Jens Axboe, linux-block Cc: Caleb Sander Mateos, Uday Shankar, Seamus Connor, Ming Lei Fix several issues in kublk: 1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output in foreground mode. Without this, _evtfd is zero-initialized to 0 (stdin), and when ublk_send_dev_event() is called on failure, it writes binary data to stdin which appears as garbage on the terminal. 2. Move fail label in ublk_start_daemon() to ensure pthread_join() is called before queue deinit on the error path. This ensures proper thread cleanup when startup fails. 3. Add async parameter to ublk_ctrl_del_dev() and use async deletion when the daemon fails to start. This prevents potential hangs when deleting a device that failed during startup. Also fix a debug message format string that was missing __func__ and had wrong escape character. Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk") Signed-off-by: Ming Lei <ming.lei@redhat.com> --- tools/testing/selftests/ublk/kublk.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c index 185ba553686a..0c62a967f2cb 100644 --- a/tools/testing/selftests/ublk/kublk.c +++ b/tools/testing/selftests/ublk/kublk.c @@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev) return __ublk_ctrl_cmd(dev, &data); } -static int ublk_ctrl_del_dev(struct ublk_dev *dev) +static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async) { struct ublk_ctrl_cmd_data data = { - .cmd_op = UBLK_U_CMD_DEL_DEV, - .flags = 0, + .cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV, }; return __ublk_ctrl_cmd(dev, &data); @@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev) else ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id); + fail: /* wait until we are terminated */ for (i = 0; i < dev->nthreads; i++) pthread_join(tinfo[i].thread, &thread_ret); free(tinfo); - fail: for (i = 0; i < dinfo->nr_hw_queues; i++) ublk_queue_deinit(&dev->q[i]); ublk_dev_unprep(dev); @@ -1272,9 +1271,9 @@ static int __cmd_dev_add(const struct dev_ctx *ctx) } ret = ublk_start_daemon(ctx, dev); - ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\b", ret); + ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\n", __func__, ret); if (ret < 0) - ublk_ctrl_del_dev(dev); + ublk_ctrl_del_dev(dev, true); fail: if (ret < 0) @@ -1371,7 +1370,7 @@ static int __cmd_dev_del(struct dev_ctx *ctx) if (ret < 0) ublk_err("%s: stop daemon id %d dev %d, ret %d\n", __func__, dev->dev_info.ublksrv_pid, number, ret); - ublk_ctrl_del_dev(dev); + ublk_ctrl_del_dev(dev, false); fail: ublk_ctrl_deinit(dev); @@ -1622,6 +1621,7 @@ int main(int argc, char *argv[]) .nr_hw_queues = 2, .dev_id = -1, .tgt_type = "unknown", + ._evtfd = -1, }; int ret = -EINVAL, i; int tgt_argc = 1; -- 2.47.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure 2026-01-12 4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei @ 2026-01-12 17:36 ` Caleb Sander Mateos 2026-01-13 1:46 ` Ming Lei 0 siblings, 1 reply; 7+ messages in thread From: Caleb Sander Mateos @ 2026-01-12 17:36 UTC (permalink / raw) To: Ming Lei; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote: > > Fix several issues in kublk: > > 1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output > in foreground mode. Without this, _evtfd is zero-initialized to 0 > (stdin), and when ublk_send_dev_event() is called on failure, it > writes binary data to stdin which appears as garbage on the terminal. Nice, I always wondered why that happened! > > 2. Move fail label in ublk_start_daemon() to ensure pthread_join() is > called before queue deinit on the error path. This ensures proper > thread cleanup when startup fails. > > 3. Add async parameter to ublk_ctrl_del_dev() and use async deletion > when the daemon fails to start. This prevents potential hangs when > deleting a device that failed during startup. > > Also fix a debug message format string that was missing __func__ and > had wrong escape character. These all look good, but maybe they would make sense as separate commits? > > Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk") > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > tools/testing/selftests/ublk/kublk.c | 14 +++++++------- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c > index 185ba553686a..0c62a967f2cb 100644 > --- a/tools/testing/selftests/ublk/kublk.c > +++ b/tools/testing/selftests/ublk/kublk.c > @@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev) > return __ublk_ctrl_cmd(dev, &data); > } > > -static int ublk_ctrl_del_dev(struct ublk_dev *dev) > +static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async) > { > struct ublk_ctrl_cmd_data data = { > - .cmd_op = UBLK_U_CMD_DEL_DEV, > - .flags = 0, > + .cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV, > }; > > return __ublk_ctrl_cmd(dev, &data); > @@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev) > else > ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id); > > + fail: > /* wait until we are terminated */ > for (i = 0; i < dev->nthreads; i++) > pthread_join(tinfo[i].thread, &thread_ret); Is it valid to call pthread_join() on a zeroed pthread_t value? If ublk_queue_init() fails, there is a goto fail before any of the tinfo[i].thread have been assigned. And there's no checking of the return value from pthread_create(), so if pthread_create() fails, tinfo[i].thread may not be assigned either. Best, Caleb > free(tinfo); > - fail: > for (i = 0; i < dinfo->nr_hw_queues; i++) > ublk_queue_deinit(&dev->q[i]); > ublk_dev_unprep(dev); > @@ -1272,9 +1271,9 @@ static int __cmd_dev_add(const struct dev_ctx *ctx) > } > > ret = ublk_start_daemon(ctx, dev); > - ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\b", ret); > + ublk_dbg(UBLK_DBG_DEV, "%s: daemon exit %d\n", __func__, ret); > if (ret < 0) > - ublk_ctrl_del_dev(dev); > + ublk_ctrl_del_dev(dev, true); > > fail: > if (ret < 0) > @@ -1371,7 +1370,7 @@ static int __cmd_dev_del(struct dev_ctx *ctx) > if (ret < 0) > ublk_err("%s: stop daemon id %d dev %d, ret %d\n", > __func__, dev->dev_info.ublksrv_pid, number, ret); > - ublk_ctrl_del_dev(dev); > + ublk_ctrl_del_dev(dev, false); > fail: > ublk_ctrl_deinit(dev); > > @@ -1622,6 +1621,7 @@ int main(int argc, char *argv[]) > .nr_hw_queues = 2, > .dev_id = -1, > .tgt_type = "unknown", > + ._evtfd = -1, > }; > int ret = -EINVAL, i; > int tgt_argc = 1; > -- > 2.47.0 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure 2026-01-12 17:36 ` Caleb Sander Mateos @ 2026-01-13 1:46 ` Ming Lei 0 siblings, 0 replies; 7+ messages in thread From: Ming Lei @ 2026-01-13 1:46 UTC (permalink / raw) To: Caleb Sander Mateos; +Cc: Jens Axboe, linux-block, Uday Shankar, Seamus Connor On Mon, Jan 12, 2026 at 09:36:39AM -0800, Caleb Sander Mateos wrote: > On Sun, Jan 11, 2026 at 8:12 PM Ming Lei <ming.lei@redhat.com> wrote: > > > > Fix several issues in kublk: > > > > 1. Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output > > in foreground mode. Without this, _evtfd is zero-initialized to 0 > > (stdin), and when ublk_send_dev_event() is called on failure, it > > writes binary data to stdin which appears as garbage on the terminal. > > Nice, I always wondered why that happened! > > > > > 2. Move fail label in ublk_start_daemon() to ensure pthread_join() is > > called before queue deinit on the error path. This ensures proper > > thread cleanup when startup fails. > > > > 3. Add async parameter to ublk_ctrl_del_dev() and use async deletion > > when the daemon fails to start. This prevents potential hangs when > > deleting a device that failed during startup. > > > > Also fix a debug message format string that was missing __func__ and > > had wrong escape character. > > These all look good, but maybe they would make sense as separate commits? Fine, I will split it into two patches: - one is for fixing start_dev failure handling - another one is for fixing misc debug message problems > > > > > Fixes: 6aecda00b7d1 ("selftests: ublk: add kernel selftests for ublk") > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > tools/testing/selftests/ublk/kublk.c | 14 +++++++------- > > 1 file changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c > > index 185ba553686a..0c62a967f2cb 100644 > > --- a/tools/testing/selftests/ublk/kublk.c > > +++ b/tools/testing/selftests/ublk/kublk.c > > @@ -153,11 +153,10 @@ static int ublk_ctrl_add_dev(struct ublk_dev *dev) > > return __ublk_ctrl_cmd(dev, &data); > > } > > > > -static int ublk_ctrl_del_dev(struct ublk_dev *dev) > > +static int ublk_ctrl_del_dev(struct ublk_dev *dev, bool async) > > { > > struct ublk_ctrl_cmd_data data = { > > - .cmd_op = UBLK_U_CMD_DEL_DEV, > > - .flags = 0, > > + .cmd_op = async ? UBLK_U_CMD_DEL_DEV_ASYNC: UBLK_U_CMD_DEL_DEV, > > }; > > > > return __ublk_ctrl_cmd(dev, &data); > > @@ -1063,11 +1062,11 @@ static int ublk_start_daemon(const struct dev_ctx *ctx, struct ublk_dev *dev) > > else > > ublk_send_dev_event(ctx, dev, dev->dev_info.dev_id); > > > > + fail: > > /* wait until we are terminated */ > > for (i = 0; i < dev->nthreads; i++) > > pthread_join(tinfo[i].thread, &thread_ret); > > Is it valid to call pthread_join() on a zeroed pthread_t value? If > ublk_queue_init() fails, there is a goto fail before any of the > tinfo[i].thread have been assigned. And there's no checking of the > return value from pthread_create(), so if pthread_create() fails, > tinfo[i].thread may not be assigned either. Good catch, will add new fail_queue_init label for covering it. Thanks, Ming ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-01-13 1:46 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-01-12 4:12 [PATCH 0/2] ublk: fix error handling if start_dev pid validation failed Ming Lei 2026-01-12 4:12 ` [PATCH 1/2] ublk: cancel device on START_DEV failure Ming Lei 2026-01-12 16:52 ` Caleb Sander Mateos 2026-01-13 1:38 ` Ming Lei 2026-01-12 4:12 ` [PATCH 2/2] selftests/ublk: fix garbage output and cleanup on failure Ming Lei 2026-01-12 17:36 ` Caleb Sander Mateos 2026-01-13 1:46 ` Ming Lei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox