public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
* [BUG] ublk: ublk server hangs in D state during STOP_DEV
@ 2026-01-16 14:15 huang-jl
  2026-01-16 14:58 ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: huang-jl @ 2026-01-16 14:15 UTC (permalink / raw)
  To: ming.lei; +Cc: linux-block

Hello,

I am reporting a bug in the ublk driver observed during production usage.
Under specific conditions during device removal, the ublk server process
enters an uninterruptible sleep (D state) and becomes unkillable,
subsequently blocking further device deletions on the system.

[1. Description]
We run a customized ublk server with the following configuration:

- ublksrv_ctrl_dev_info.flags = 0 (UBLK_F_USER_RECOVERY is not enabled).
- Environment: Frequent creation/deletion of ublk devices (400–500 active
  devices, each with a lifespan of at most 5 hours).
- Upon receiving SIGINT, our ublk server sends UBLK_U_CMD_STOP_DEV to the
  driver.
- A monitor process sends SIGINT to the ublk server when deleting a device. If
  it finds that the ublk server has not stopped within 10 seconds, the monitor
  sends SIGKILL.
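For clarity, the monitor's escalation logic is roughly equivalent to the
following sketch (a minimal illustration with a hypothetical stop_server()
helper, not our production code):

```python
import signal
import subprocess
import time

def stop_server(proc: subprocess.Popen, timeout: float = 10.0) -> str:
    """Ask the server to stop with SIGINT; escalate to SIGKILL after `timeout`."""
    proc.send_signal(signal.SIGINT)      # server reacts by sending STOP_DEV
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return "stopped"             # server exited cleanly after SIGINT
        time.sleep(0.1)
    proc.kill()                          # the escalation that triggers the hang
    proc.wait()
    return "killed"
```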


On one production node, a ublk server process (PID 348910) failed to exit and
entered D state. Simultaneously, related kworkers also entered D state:

$ ps -eo pid,stat,lstart,comm | grep -E "^ *[0-9]+ D"
 77625 D    Wed Jan 14 15:23:57 2026 kworker/303:0+events
348910 Dl   Wed Jan 14 23:00:20 2026 uvm_ublk
355239 D    Wed Jan 14 23:04:18 2026 kworker/u775:1+flush-259:11

The device number of the ublk device is exactly 259:11.

After this hang occurs, we can still create new ublk devices, but we cannot
delete them. While UBLK_U_CMD_STOP_DEV can be sent, UBLK_U_CMD_DEL_DEV never
receives a response from io_uring, and the issuing process hangs in S state.

The stack traces of the relevant processes are given below:

# The kworker/303:0+events
$ cat /proc/77625/stack 
[<0>] folio_wait_bit_common+0x136/0x330
[<0>] __folio_lock+0x17/0x30
[<0>] write_cache_pages+0x1cd/0x430
[<0>] blkdev_writepages+0x6f/0xb0
[<0>] do_writepages+0xcd/0x1f0
[<0>] filemap_fdatawrite_wbc+0x75/0xb0
[<0>] __filemap_fdatawrite_range+0x58/0x80
[<0>] filemap_write_and_wait_range+0x59/0xc0
[<0>] bdev_release+0x18e/0x240
[<0>] blkdev_release+0x15/0x30
[<0>] __fput+0xa0/0x2e0
[<0>] delayed_fput+0x23/0x40
[<0>] process_one_work+0x181/0x3a0
[<0>] worker_thread+0x306/0x440
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x44/0x70
[<0>] ret_from_fork_asm+0x1b/0x30

# The ublk server
$ cat /proc/348910/stack 
[<0>] io_wq_put_and_exit+0xa6/0x210
[<0>] io_uring_clean_tctx+0x8c/0xd0
[<0>] io_uring_cancel_generic+0x19b/0x370
[<0>] __io_uring_cancel+0x1b/0x30
[<0>] do_exit+0x17a/0x530
[<0>] do_group_exit+0x35/0x90
[<0>] get_signal+0x96e/0x9b0
[<0>] arch_do_signal_or_restart+0x39/0x120
[<0>] syscall_exit_to_user_mode+0x15f/0x1e0
[<0>] do_syscall_64+0x8c/0x180
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80

# kworker/u775:1+flush-259:11
$ cat /proc/355239/stack
[<0>] rq_qos_wait+0xcf/0x180
[<0>] wbt_wait+0xb3/0x100
[<0>] __rq_qos_throttle+0x25/0x40
[<0>] blk_mq_submit_bio+0x168/0x6b0
[<0>] __submit_bio+0xb3/0x1c0
[<0>] submit_bio_noacct_nocheck+0x13c/0x1f0
[<0>] submit_bio_noacct+0x162/0x5b0
[<0>] submit_bio+0xb2/0x110
[<0>] submit_bh_wbc+0x156/0x190
[<0>] __block_write_full_folio+0x1da/0x3d0
[<0>] block_write_full_folio+0x150/0x180
[<0>] write_cache_pages+0x15b/0x430
[<0>] blkdev_writepages+0x6f/0xb0
[<0>] do_writepages+0xcd/0x1f0
[<0>] __writeback_single_inode+0x44/0x290
[<0>] writeback_sb_inodes+0x21b/0x520
[<0>] __writeback_inodes_wb+0x54/0x100
[<0>] wb_writeback+0x2df/0x350
[<0>] wb_do_writeback+0x225/0x2a0
[<0>] wb_workfn+0x5f/0x240
[<0>] process_one_work+0x181/0x3a0
[<0>] worker_thread+0x306/0x440
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x44/0x70
[<0>] ret_from_fork_asm+0x1b/0x30

There is also an iou-wrk thread of that ublk server process:

$ cat /proc/348910/task/348911/stack 
[<0>] folio_wait_bit_common+0x136/0x330
[<0>] __folio_lock+0x17/0x30
[<0>] write_cache_pages+0x1cd/0x430
[<0>] blkdev_writepages+0x6f/0xb0
[<0>] do_writepages+0xcd/0x1f0
[<0>] filemap_fdatawrite_wbc+0x75/0xb0
[<0>] __filemap_fdatawrite_range+0x58/0x80
[<0>] filemap_write_and_wait_range+0x59/0xc0
[<0>] bdev_mark_dead+0x85/0xd0
[<0>] blk_report_disk_dead+0x87/0xf0
[<0>] del_gendisk+0x37f/0x3b0
[<0>] ublk_stop_dev+0x89/0x100 [ublk_drv]
[<0>] ublk_ctrl_uring_cmd+0x51a/0x750 [ublk_drv]
[<0>] io_uring_cmd+0x9f/0x140
[<0>] io_issue_sqe+0x193/0x410
[<0>] io_wq_submit_work+0xe2/0x380
[<0>] io_worker_handle_work+0xdf/0x340
[<0>] io_wq_worker+0xf9/0x350
[<0>] ret_from_fork+0x44/0x70
[<0>] ret_from_fork_asm+0x1b/0x30

[2. Kernel version]

Linux  6.8.0-87-generic #88-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 09:28:41 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

I am running Ubuntu 24.04:

Distributor ID: Ubuntu
Description:    Ubuntu-Server 24.04.2 2025.05.26 (Cubic 2025-05-27 05:35)
Release:        24.04
Codename:       noble

[3. Steps to reproduce]
Sorry, as this happened only to one process on one of our production servers,
I have not found an easy way to reproduce the error.

[4. Dmesg/Logs]
I can only find logs like the following. Apart from the kworker, there are
similar logs for the ublk server's iou-wrk.

Jan 15 00:53:02 kernel: INFO: task kworker/303:0:77625 blocked for more than 122 seconds.
Jan 15 00:53:02 kernel:       Tainted: G           OE      6.8.0-87-generic #88-Ubuntu
Jan 15 00:53:02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 15 00:53:02 kernel: task:kworker/303:0   state:D stack:0     pid:77625 tgid:77625 ppid:2      flags:0x00004000
Jan 15 00:53:02 kernel: Workqueue: events delayed_fput
Jan 15 00:53:02 kernel: Call Trace:
Jan 15 00:53:02 kernel:  <TASK>
Jan 15 00:53:02 kernel:  __schedule+0x27c/0x6b0
Jan 15 00:53:02 kernel:  schedule+0x33/0x110
Jan 15 00:53:02 kernel:  io_schedule+0x46/0x80
Jan 15 00:53:02 kernel:  folio_wait_bit_common+0x136/0x330
Jan 15 00:53:02 kernel:  ? __pfx_wake_page_function+0x10/0x10
Jan 15 00:53:02 kernel:  __folio_lock+0x17/0x30
Jan 15 00:53:02 kernel:  write_cache_pages+0x1cd/0x430
Jan 15 00:53:02 kernel:  ? __pfx_blkdev_get_block+0x10/0x10
Jan 15 00:53:02 kernel:  ? __pfx_block_write_full_folio+0x10/0x10
Jan 15 00:53:02 kernel:  blkdev_writepages+0x6f/0xb0
Jan 15 00:53:02 kernel:  do_writepages+0xcd/0x1f0
Jan 15 00:53:02 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 15 00:53:02 kernel:  filemap_fdatawrite_wbc+0x75/0xb0
Jan 15 00:53:02 kernel:  __filemap_fdatawrite_range+0x58/0x80
Jan 15 00:53:02 kernel:  filemap_write_and_wait_range+0x59/0xc0
Jan 15 00:53:02 kernel:  bdev_release+0x18e/0x240
Jan 15 00:53:02 kernel:  blkdev_release+0x15/0x30
Jan 15 00:53:02 kernel:  __fput+0xa0/0x2e0
Jan 15 00:53:02 kernel:  delayed_fput+0x23/0x40
Jan 15 00:53:02 kernel:  process_one_work+0x181/0x3a0
Jan 15 00:53:02 kernel:  worker_thread+0x306/0x440
Jan 15 00:53:02 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 15 00:53:02 kernel:  ? _raw_spin_lock_irqsave+0xe/0x20
Jan 15 00:53:02 kernel:  ? __pfx_worker_thread+0x10/0x10
Jan 15 00:53:02 kernel:  kthread+0xef/0x120
Jan 15 00:53:02 kernel:  ? __pfx_kthread+0x10/0x10
Jan 15 00:53:02 kernel:  ret_from_fork+0x44/0x70
Jan 15 00:53:02 kernel:  ? __pfx_kthread+0x10/0x10
Jan 15 00:53:02 kernel:  ret_from_fork_asm+0x1b/0x30
Jan 15 00:53:02 kernel:  </TASK>

[5. Technical Hypothesis]
I suspect a deadlock occurs during the following sequence (assuming ublk id
is 123):

1. User program writes to /dev/ublkb123 via cached I/O, leaving dirty pages
   in the page cache.
2. The ublk server receives SIGINT and issues UBLK_U_CMD_STOP_DEV.
3. The kernel path STOP_DEV -> del_gendisk() -> bdev_mark_dead() attempts to
   flush dirty pages.
4. This flush generates new I/O requests directed back to the ublk server.
5. The ublk server receives SIGKILL at this moment, its threads stop and can
   no longer handle the I/O requests generated by the flush in step 3.
6. The server remains stuck in del_gendisk(), waiting for I/O completion that
   will never happen.

[6. My Question]
1. Would enabling UBLK_F_USER_RECOVERY solve this bug? I found that
   UBLK_F_USER_RECOVERY allows ublk_unquiesce_dev() to be called during
   ublk_stop_dev().
2. Should userspace strictly avoid sending SIGKILL to the ublk server?
3. I tried to search for related bug fixes or patches, but did not find any.
   Are there known fixes in later kernels (6.10+) that address this specific
   interaction between del_gendisk() and server termination?

Thanks,
huang-jl

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] ublk: ublk server hangs in D state during STOP_DEV
  2026-01-16 14:15 [BUG] ublk: ublk server hangs in D state during STOP_DEV huang-jl
@ 2026-01-16 14:58 ` Ming Lei
  2026-01-17  5:18   ` huang-jl
       [not found]   ` <20260116171613.46312-1-huang-jl@deepseek.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Ming Lei @ 2026-01-16 14:58 UTC (permalink / raw)
  To: huang-jl; +Cc: linux-block

On Fri, Jan 16, 2026 at 10:15:32PM +0800, huang-jl wrote:
> Hello,
> 
> I am reporting a bug in the ublk driver observed during production usage.
> Under specific conditions during device removal, the ublk server process
> enters an uninterruptible sleep (D state) and becomes unkillable,
> subsequently blocking further device deletions on the system.
> 
> [1. Description]
> We run a customized ublk server with the following configuration:
> 
> - ublksrv_ctrl_dev_info.flags = 0 (UBLK_F_USER_RECOVERY is not enabled).
> - Environment: Frequent creation/deletion of ublk devices (400–500 active
>   devices, each with a lifespan of at most 5 hours).
> - Upon receiving SIGINT, our ublk server sends UBLK_U_CMD_STOP_DEV to the
>   driver.
> - A monitor process sends SIGINT to the ublk server when deleting a device. If
>   it finds that the ublk server has not stopped within 10 seconds, the monitor
>   sends SIGKILL.
> 
> 
> On one production node, a ublk server process (PID 348910) failed to exit and
> entered D state. Simultaneously, related kworkers also entered D state:
> 
> $ ps -eo pid,stat,lstart,comm | grep -E "^ *[0-9]+ D"
>  77625 D    Wed Jan 14 15:23:57 2026 kworker/303:0+events
> 348910 Dl   Wed Jan 14 23:00:20 2026 uvm_ublk
> 355239 D    Wed Jan 14 23:04:18 2026 kworker/u775:1+flush-259:11
> 
> The device number of the ublk device is exactly 259:11.
> 
> After this hang occurs, we can still create new ublk devices, but we cannot
> delete them. While UBLK_U_CMD_STOP_DEV can be sent, UBLK_U_CMD_DEL_DEV never
> receives a response from io_uring, and the issuing process hangs in S state.
> 
> The stack traces of the relevant processes are given below:
> 
> 
> # The ublk server
> $ cat /proc/348910/stack 
> [<0>] io_wq_put_and_exit+0xa6/0x210
> [<0>] io_uring_clean_tctx+0x8c/0xd0
> [<0>] io_uring_cancel_generic+0x19b/0x370
> [<0>] __io_uring_cancel+0x1b/0x30
> [<0>] do_exit+0x17a/0x530
> [<0>] do_group_exit+0x35/0x90
> [<0>] get_signal+0x96e/0x9b0
> [<0>] arch_do_signal_or_restart+0x39/0x120
> [<0>] syscall_exit_to_user_mode+0x15f/0x1e0
> [<0>] do_syscall_64+0x8c/0x180
> [<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80

The above trace means that io-wq workers are stuck on blocked I/O.

However, builtin uring_cmds won't be run from io-wq, so it is very
likely that your target IO handling is stuck somewhere, and some ublk io
commands can't be completed.

If your system supports drgn and the hung state is still available for
collecting logs, it should be pretty easy to figure out the reason by writing
a drgn script to dump the ublk queue / ublk io state from the driver.
 
> [2. Kernel version]
> 
> Linux  6.8.0-87-generic #88-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 09:28:41 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
> 
> I am running Ubuntu 24:
> 
> Distributor ID: Ubuntu
> Description:    Ubuntu-Server 24.04.2 2025.05.26 (Cubic 2025-05-27 05:35)
> Release:        24.04
> Codename:       noble
> 
> [3. Steps to reproduce]
> Sorry, as this happened only to one process on one of our production servers,
> I have not found an easy way to reproduce the error.
> 
> [4. Dmesg/Logs]
> I can only find logs like the following. Apart from the kworker, there are
> similar logs for the ublk server's iou-wrk.
> 
> Jan 15 00:53:02 kernel: INFO: task kworker/303:0:77625 blocked for more than 122 seconds.
> Jan 15 00:53:02 kernel:       Tainted: G           OE      6.8.0-87-generic #88-Ubuntu
> Jan 15 00:53:02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 15 00:53:02 kernel: task:kworker/303:0   state:D stack:0     pid:77625 tgid:77625 ppid:2      flags:0x00004000
> Jan 15 00:53:02 kernel: Workqueue: events delayed_fput
> Jan 15 00:53:02 kernel: Call Trace:
> Jan 15 00:53:02 kernel:  <TASK>
> Jan 15 00:53:02 kernel:  __schedule+0x27c/0x6b0
> Jan 15 00:53:02 kernel:  schedule+0x33/0x110
> Jan 15 00:53:02 kernel:  io_schedule+0x46/0x80
> Jan 15 00:53:02 kernel:  folio_wait_bit_common+0x136/0x330
> Jan 15 00:53:02 kernel:  ? __pfx_wake_page_function+0x10/0x10
> Jan 15 00:53:02 kernel:  __folio_lock+0x17/0x30
> Jan 15 00:53:02 kernel:  write_cache_pages+0x1cd/0x430
> Jan 15 00:53:02 kernel:  ? __pfx_blkdev_get_block+0x10/0x10
> Jan 15 00:53:02 kernel:  ? __pfx_block_write_full_folio+0x10/0x10
> Jan 15 00:53:02 kernel:  blkdev_writepages+0x6f/0xb0
> Jan 15 00:53:02 kernel:  do_writepages+0xcd/0x1f0
> Jan 15 00:53:02 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
> Jan 15 00:53:02 kernel:  filemap_fdatawrite_wbc+0x75/0xb0
> Jan 15 00:53:02 kernel:  __filemap_fdatawrite_range+0x58/0x80
> Jan 15 00:53:02 kernel:  filemap_write_and_wait_range+0x59/0xc0
> Jan 15 00:53:02 kernel:  bdev_release+0x18e/0x240
> Jan 15 00:53:02 kernel:  blkdev_release+0x15/0x30
> Jan 15 00:53:02 kernel:  __fput+0xa0/0x2e0
> Jan 15 00:53:02 kernel:  delayed_fput+0x23/0x40
> Jan 15 00:53:02 kernel:  process_one_work+0x181/0x3a0
> Jan 15 00:53:02 kernel:  worker_thread+0x306/0x440
> Jan 15 00:53:02 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
> Jan 15 00:53:02 kernel:  ? _raw_spin_lock_irqsave+0xe/0x20
> Jan 15 00:53:02 kernel:  ? __pfx_worker_thread+0x10/0x10
> Jan 15 00:53:02 kernel:  kthread+0xef/0x120
> Jan 15 00:53:02 kernel:  ? __pfx_kthread+0x10/0x10
> Jan 15 00:53:02 kernel:  ret_from_fork+0x44/0x70
> Jan 15 00:53:02 kernel:  ? __pfx_kthread+0x10/0x10
> Jan 15 00:53:02 kernel:  ret_from_fork_asm+0x1b/0x30
> Jan 15 00:53:02 kernel:  </TASK>
> 
> [5. Technical Hypothesis]
> I suspect a deadlock occurs during the following sequence (assuming ublk id
> is 123):
> 
> 1. User program writes to /dev/ublkb123 via cached I/O, leaving dirty pages
>    in the page cache.
> 2. The ublk server receives SIGINT and issues UBLK_U_CMD_STOP_DEV.
> 3. The kernel path STOP_DEV -> del_gendisk() -> bdev_mark_dead() attempts to
>    flush dirty pages.
> 4. This flush generates new I/O requests directed back to the ublk server.
> 5. The ublk server receives SIGKILL at this moment, its threads stop and can
>    no longer handle the I/O requests generated by the flush in step 3.
> 6. The server remains stuck in del_gendisk(), waiting for I/O completion that
>    will never happen.
> 
> [6. My Question]
> 1. Would enabling UBLK_F_USER_RECOVERY solve this bug? I found that
>    UBLK_F_USER_RECOVERY allows ublk_unquiesce_dev() to be called during
>    ublk_stop_dev().
> 2. Should userspace strictly avoid sending SIGKILL to the ublk server?

It is fine to send SIGKILL to the ublk server; it is actually used widely in
the ublk kernel selftests.

> 3. I tried to search for related bug fixes or patches, but did not find any.
>    Are there known fixes in later kernels (6.10+) that address this specific
>    interaction between del_gendisk() and server termination?

I'd first understand why the ublk server is stuck in io_wq_put_and_exit();
so far it is very likely caused by your ublk target logic...

If the cancel code path can move on, the ublk uring_cmd cancel function will
fail the inflight uring_cmds first; the ublk char device is finally released
after the ublk server really exits, then all pending ublk block requests are
aborted, and everything can move on.
 

Thanks,
Ming


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] ublk: ublk server hangs in D state during STOP_DEV
  2026-01-16 14:58 ` Ming Lei
@ 2026-01-17  5:18   ` huang-jl
       [not found]   ` <20260116171613.46312-1-huang-jl@deepseek.com>
  1 sibling, 0 replies; 8+ messages in thread
From: huang-jl @ 2026-01-17  5:18 UTC (permalink / raw)
  To: ming.lei; +Cc: huang-jl, linux-block, axboe

Hi Ming,

I've done some further investigation and would like to share my findings.

## Summary

The deadlock is caused by WBT (Writeback Throttling) holding a folio lock
while waiting for I/O completion that will never happen because the ublk
server is dead.

## Stuck Tasks Analysis

There are three tasks in D state:

- 348911 (iou-wrk): wchan=folio_wait_bit_common
  ublk server's io-wq worker, waiting for folio lock

- 77625 (kworker/303:0+events): wchan=folio_wait_bit_common
  Waiting for folio lock

- 355239 (kworker/u775:1+flush-259:11): wchan=rq_qos_wait
  **Holds folio lock**, stuck in WBT throttle

## Root Cause

The flush worker (355239) is the key. Its stack:

[<0>] rq_qos_wait
[<0>] wbt_wait
[<0>] __rq_qos_throttle
[<0>] blk_mq_submit_bio
[<0>] __submit_bio
[<0>] submit_bio_noacct_nocheck
[<0>] submit_bio_noacct
[<0>] submit_bio
[<0>] submit_bh_wbc
[<0>] __block_write_full_folio      <- folio is LOCKED here
[<0>] block_write_full_folio
[<0>] write_cache_pages
...

Looking at __block_write_full_folio() in fs/buffer.c:

    int __block_write_full_folio(...)
    {
        // ... prepare buffers ...

        folio_start_writeback(folio);

        do {
            struct buffer_head *next = bh->b_this_page;
            if (buffer_async_write(bh)) {
                submit_bh_wbc(...);    // <- 355239 stuck HERE in wbt_wait()
                nr_underway++;
            }
            bh = next;
        } while (bh != head);

        folio_unlock(folio);           // <- Never reached!
    }

The folio lock is acquired before submit_bh_wbc() and released after.
When WBT throttles inside submit_bh_wbc(), the folio remains locked.

## WBT State

From debugfs on the stuck device:

    $ cat /sys/kernel/debug/block/ublkb197/rqos/wbt/wb_normal
    48
    $ cat /sys/kernel/debug/block/ublkb197/rqos/wbt/wb_background
    24
    $ cat /sys/kernel/debug/block/ublkb197/rqos/wbt/inflight
    0: inflight 58
    1: inflight 0
    2: inflight 0

There are 58 inflight requests, exceeding the wb_normal limit of 48.
WBT is throttling new submissions, waiting for inflight I/O to complete.
But since the ublk server is dead, these I/Os will never complete, and
wbt_done() will never be called to wake up the waiter.
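A toy illustration of the throttling condition (the real admission logic in
block/blk-wbt.c is considerably more involved; the numbers are taken from the
debugfs output above):

```python
def wbt_may_queue(inflight: int, limit: int) -> bool:
    """Simplified WBT admission check: a new buffered write may be queued
    only while the inflight count is below the current limit."""
    return inflight < limit

wb_normal = 48   # limit from debugfs
inflight = 58    # observed inflight requests

# 58 >= 48, so the flush worker must sleep in wbt_wait(); with the ublk
# server dead, wbt_done() never runs and inflight never drops below 48.
assert not wbt_may_queue(inflight, wb_normal)
```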

## Deadlock Chain

1. ublk server receives SIGINT
   -> Sends STOP_DEV via io_uring

2. io-wq worker (348911) handles STOP_DEV
   -> ublk_stop_dev() -> del_gendisk() -> bdev_mark_dead()
   -> Triggers writeback of dirty pages

3. Flush worker (355239) is already doing writeback
   -> Locks folio, calls submit_bh_wbc()
   -> WBT throttles (58 inflight > 48 limit)
   -> Stuck in wbt_wait(), HOLDING folio lock

4. ublk server receives SIGKILL
   -> Cannot handle any I/O requests
   -> 58 inflight requests stuck forever
   -> wbt_done() never called
   -> 355239 never wakes up, holds folio lock forever

5. io-wq worker (348911) tries to flush pages
   -> Tries to lock the same folio
   -> Stuck in folio_wait_bit_common()

6. Main ublk server thread (348910)
   -> do_exit() -> io_wq_put_and_exit()
   -> Waiting for worker 348911 to finish
   -> DEADLOCK

This seems like a general issue that might affect any userspace block device
(ublk, tcmu, nbd, etc.) when WBT is enabled. Has this been discussed
before?
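The chain above can be summarized as a wait-for cycle. A toy model (the task
names are labels from this report, not kernel objects, and the edges simplify
the real dependencies):

```python
# "X is blocked by Y": each task waits for something only Y can provide.
blocked_by = {
    "server_main_348910": "iou_wrk_348911",       # io_wq_put_and_exit() waits for worker
    "iou_wrk_348911": "flush_worker_355239",      # waits on the folio lock it holds
    "flush_worker_355239": "server_main_348910",  # wbt_done() needs ublk IO completions,
                                                  # which need a live ublk server
}

def has_cycle(graph: dict, start: str) -> bool:
    """Walk the blocked-by edges from `start`; revisiting a node means deadlock."""
    seen = set()
    node = start
    while node in graph:
        if node in seen:
            return True
        seen.add(node)
        node = graph[node]
    return False

assert has_cycle(blocked_by, "server_main_348910")  # every task in the cycle hangs
```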

Thanks,
huang-jl

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] ublk: ublk server hangs in D state during STOP_DEV
       [not found]   ` <20260116171613.46312-1-huang-jl@deepseek.com>
@ 2026-01-17  7:44     ` Ming Lei
  2026-01-17 11:16       ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2026-01-17  7:44 UTC (permalink / raw)
  To: huang-jl; +Cc: linux-block

On Sat, Jan 17, 2026 at 01:16:13AM +0800, huang-jl wrote:
> > I'd first understand why the ublk server is stuck in io_wq_put_and_exit();
> > so far it is very likely caused by your ublk target logic...
> 
> I think the io-wq worker is stuck executing the STOP_DEV uring cmd,
> and that it is not our target I/O logic that causes the issue. Let me explain:
> 
> Looking at the iou-wrk thread (348911) stack trace, this iou-wrk is a thread
> in my D-state ublk server, its stack is as follows:
> 
> $ cat /proc/348910/task/348911/stack 
> [<0>] folio_wait_bit_common+0x136/0x330
> [<0>] __folio_lock+0x17/0x30
> [<0>] write_cache_pages+0x1cd/0x430
> [<0>] blkdev_writepages+0x6f/0xb0
> [<0>] do_writepages+0xcd/0x1f0
> [<0>] filemap_fdatawrite_wbc+0x75/0xb0
> [<0>] __filemap_fdatawrite_range+0x58/0x80
> [<0>] filemap_write_and_wait_range+0x59/0xc0
> [<0>] bdev_mark_dead+0x85/0xd0
> [<0>] blk_report_disk_dead+0x87/0xf0
> [<0>] del_gendisk+0x37f/0x3b0
> [<0>] ublk_stop_dev+0x89/0x100 [ublk_drv]
> [<0>] ublk_ctrl_uring_cmd+0x51a/0x750 [ublk_drv]
> [<0>] io_uring_cmd+0x9f/0x140
> [<0>] io_issue_sqe+0x193/0x410
> [<0>] io_wq_submit_work+0xe2/0x380
> [<0>] io_worker_handle_work+0xdf/0x340
> [<0>] io_wq_worker+0xf9/0x350
> [<0>] ret_from_fork+0x44/0x70
> [<0>] ret_from_fork_asm+0x1b/0x30
> 
> This shows:
> 
> - The STOP_DEV command is being executed by an io-wq worker thread
> - ublk_stop_dev() called del_gendisk()
> - del_gendisk() is trying to flush dirty pages via bdev_mark_dead()
> - The writeback is stuck waiting for a folio lock

> - Upon receiving SIGINT, our ublk server sends UBLK_U_CMD_STOP_DEV to the
>  driver.

Can you share how your server sends STOP_DEV when receiving SIGINT?

If it prevents normal IO command handling, ublk_stop_dev() will cause a deadlock.

For example, the following is the preferred IO handling loop in a ublk server:

prepare UBLK_IO_FETCH_REQ uring_cmds;
while (1) {
	io_uring_enter(submission & wait event);
}

If you send the STOP_DEV command from inside the above loop, you will get a
deadlock, because inflight and new IOs can't be handled any more.

So you should send the STOP_DEV command from a signal handler or another
pthread to avoid the issue.
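Structurally, that advice can be sketched as follows (hypothetical Python, with
plain threads and a queue standing in for the real io_uring FETCH/COMMIT loop):
the IO loop keeps completing requests on its own thread, so writeback triggered
by STOP_DEV/del_gendisk() can still be served while the control path runs
elsewhere.

```python
import queue
import threading

def io_loop(requests: "queue.Queue[str]", served: list) -> None:
    """Stand-in for the FETCH/COMMIT loop: keeps completing IO until told to exit."""
    while True:
        req = requests.get()
        if req == "EXIT":
            return
        served.append(req)  # complete the request

requests: "queue.Queue[str]" = queue.Queue()
served: list = []
worker = threading.Thread(target=io_loop, args=(requests, served))
worker.start()

# Control path (e.g. a signal-handling thread) issues STOP_DEV while the IO
# loop is still running, so writeback generated by the stop can still complete.
for r in ["write-1", "write-2", "flush-from-del_gendisk"]:
    requests.put(r)
requests.put("EXIT")
worker.join()
```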

> But I do not understand why it gets stuck waiting for the folio lock.

It just shows normal ublk block IOs can't be completed.

> 
> I traced the code path and understand why STOP_DEV runs in io-wq:
> 
> 1. The ublk server calls io_uring_enter() to submit the STOP_DEV uring cmd.
> 2. The kernel will call io_submit_sqes() -> io_submit_sqe() -> io_queue_sqe().
> 3. io_queue_sqe() first tries io_issue_sqe() with IO_URING_F_NONBLOCK
> 4. ublk_ctrl_uring_cmd() returns -EAGAIN when it sees IO_URING_F_NONBLOCK
> 5. io_uring then queues the work to io-wq via io_queue_iowq()
> 
> > If your system supports drgn and the hung state is still available for
> > collecting logs, it should be pretty easy to figure out the reason by writing
> > a drgn script to dump the ublk queue / ublk io state from the driver.
> 
> The D-state process is still present on the system. I can install drgn and
> collect information.
> Could you tell me what specific data would be most helpful? For example:
> 
> - ublk_device state and flags?
> - ublk_queue state for each queue (force_abort, nr_io_ready, etc.)?
> - Individual ublk_io flags for inflight I/Os?

Yes, all of the above info is helpful.

 
Thanks,
Ming


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] ublk: ublk server hangs in D state during STOP_DEV
  2026-01-17  7:44     ` Ming Lei
@ 2026-01-17 11:16       ` Ming Lei
  2026-01-17 17:03         ` huang-jl
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2026-01-17 11:16 UTC (permalink / raw)
  To: huang-jl; +Cc: linux-block

On Sat, Jan 17, 2026 at 03:44:22PM +0800, Ming Lei wrote:
> On Sat, Jan 17, 2026 at 01:16:13AM +0800, huang-jl wrote:
> > > I'd first understand why the ublk server is stuck in io_wq_put_and_exit();
> > > so far it is very likely caused by your ublk target logic...
> > 
> > I think the io-wq worker is stuck executing the STOP_DEV uring cmd,
> > and that it is not our target I/O logic that causes the issue. Let me explain:
> > 
> > Looking at the iou-wrk thread (348911) stack trace, this iou-wrk is a thread
> > in my D-state ublk server, its stack is as follows:
> > 
> > $ cat /proc/348910/task/348911/stack 
> > [<0>] folio_wait_bit_common+0x136/0x330
> > [<0>] __folio_lock+0x17/0x30
> > [<0>] write_cache_pages+0x1cd/0x430
> > [<0>] blkdev_writepages+0x6f/0xb0
> > [<0>] do_writepages+0xcd/0x1f0
> > [<0>] filemap_fdatawrite_wbc+0x75/0xb0
> > [<0>] __filemap_fdatawrite_range+0x58/0x80
> > [<0>] filemap_write_and_wait_range+0x59/0xc0
> > [<0>] bdev_mark_dead+0x85/0xd0
> > [<0>] blk_report_disk_dead+0x87/0xf0
> > [<0>] del_gendisk+0x37f/0x3b0
> > [<0>] ublk_stop_dev+0x89/0x100 [ublk_drv]
> > [<0>] ublk_ctrl_uring_cmd+0x51a/0x750 [ublk_drv]
> > [<0>] io_uring_cmd+0x9f/0x140
> > [<0>] io_issue_sqe+0x193/0x410
> > [<0>] io_wq_submit_work+0xe2/0x380
> > [<0>] io_worker_handle_work+0xdf/0x340
> > [<0>] io_wq_worker+0xf9/0x350
> > [<0>] ret_from_fork+0x44/0x70
> > [<0>] ret_from_fork_asm+0x1b/0x30
> > 
> > This shows:
> > 
> > - The STOP_DEV command is being executed by an io-wq worker thread
> > - ublk_stop_dev() called del_gendisk()
> > - del_gendisk() is trying to flush dirty pages via bdev_mark_dead()
> > - The writeback is stuck waiting for a folio lock
> 
> > - Upon receiving SIGINT, our ublk server sends UBLK_U_CMD_STOP_DEV to the
> >  driver.
> 
> Can you share how your server sends STOP_DEV when receiving SIGINT?
> 
> If it prevents normal IO command handling, ublk_stop_dev() will cause a deadlock.
> 
> For example, the following is the preferred IO handling loop in a ublk server:
> 
> prepare UBLK_IO_FETCH_REQ uring_cmds;
> while (1) {
> 	io_uring_enter(submission & wait event);
> }
> 
> If you send the STOP_DEV command from inside the above loop, you will get a
> deadlock, because inflight and new IOs can't be handled any more.

If that is the case, you may simply remove the sending of STOP_DEV in your
implementation and let the ublk server exit. Then delete the ublk device
before the ublk server exits to the kernel, or let your monitor process
remove the ublk device, which enters the DEAD state after the ublk server
exits.

Otherwise, you may have to investigate why the ublk server can't handle IO
commands after sending the STOP_DEV command.

The STOP_DEV and DEL_DEV commands may need to be documented for ublk servers
to avoid this deadlock.

Thanks,
Ming


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] ublk: ublk server hangs in D state during STOP_DEV
  2026-01-17 11:16       ` Ming Lei
@ 2026-01-17 17:03         ` huang-jl
  2026-01-18 11:50           ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: huang-jl @ 2026-01-17 17:03 UTC (permalink / raw)
  To: ming.lei; +Cc: huang-jl, linux-block

> > - Upon receiving SIGINT, our ublk server sends UBLK_U_CMD_STOP_DEV to the
> > driver.
> 
> Can you share how your server sends STOP_DEV when receiving SIGINT?
> 
> If it prevents normal IO command handling, ublk_stop_dev() will cause a deadlock.
> 
> For example, the following is the preferred IO handling loop in a ublk server:
> 
> prepare UBLK_IO_FETCH_REQ uring_cmds;
> while (1) {
> io_uring_enter(submission & wait event);
> }
> 
> If you send the STOP_DEV command from inside the above loop, you will get a
> deadlock, because inflight and new IOs can't be handled any more.

My ublk server has two threads:

- The main thread: opens /dev/ublk-control and creates an io_uring. It handles
  all control uring cmds (e.g., ADD_DEV, START_DEV, STOP_DEV).

- A worker thread: also creates an io_uring. It sends UBLK_U_IO_FETCH_REQ
  and UBLK_U_IO_COMMIT_AND_FETCH_REQ uring cmds and handles IO requests.

The main thread first ADDs and STARTs the device, then listens for SIGINT.
Upon receiving SIGINT, the main thread issues a STOP_DEV cmd.
The worker thread continues running independently and is not directly affected
by this signal.

(Implementation detail: I am using Rust and the Tokio async runtime. SIGINT is
handled in userspace code rather than directly inside a signal handler.)

> If that is the case, you may simply remove the sending of STOP_DEV in your
> implementation and let the ublk server exit. Then delete the ublk device
> before the ublk server exits to the kernel, or let your monitor process
> remove the ublk device, which enters the DEAD state after the ublk server
> exits.
> 
> Otherwise, you may have to investigate why the ublk server can't handle IO
> commands after sending the STOP_DEV command.

Based on my previous post [1], **the ublk server gets SIGKILL after sending
the STOP_DEV command**, so it cannot handle IO requests.

In detail:

1. Flush Kernel Thread: A kworker/flush thread attempts to write dirty pages
to the ublk device. It acquires a folio lock but then enters a sleep state
in wbt_wait() due to Writeback Throttling (WBT).

2. STOP_DEV Command: The ublk server issues a STOP_DEV command. This uring cmd
is processed by an io-wq worker thread.

3. Resource Conflict: When handling STOP_DEV, the io-wq worker thread also
tries to flush dirty pages. To submit the I/O, it needs to acquire the same
folio lock held by the flush kworker.

4. Termination Signal: At this point, the ublk server receives a SIGKILL.

Conclusion:
This creates a circular dependency: the flush kworker cannot be woken from WBT
because the ublk server is no longer processing I/O requests.
Meanwhile, the ublk server cannot exit because it is waiting for the
STOP_DEV command to complete, and the io-wq worker thread remains stuck
waiting for the folio lock held by the flush kworker.

Should we avoid having the ublk server issue the STOP_DEV command itself? It
seems highly prone to deadlock if a SIGKILL arrives during the shutdown
sequence.

Thanks,
huang-jl

[1]: https://lore.kernel.org/linux-block/aWtvoHe3Yt8xbmdb@fedora/T/#mdd14011f6830a7e93a7df9b8df1ea93db91ebc77


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] ublk: ublk server hangs in D state during STOP_DEV
  2026-01-17 17:03         ` huang-jl
@ 2026-01-18 11:50           ` Ming Lei
  2026-01-18 13:14             ` huang-jl
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2026-01-18 11:50 UTC (permalink / raw)
  To: huang-jl; +Cc: linux-block

On Sun, Jan 18, 2026 at 01:03:19AM +0800, huang-jl wrote:
> > > - Upon receiving SIGINT, our ublk server sends UBLK_U_CMD_STOP_DEV to the
> > > driver.
> > 
> > Can you share how your server sends STOP_DEV when receiving SIGINT?
> > 
> > If it prevents normal IO command handling, ublk_stop_dev() will cause a deadlock.
> > 
> > For example, the following is the preferred IO handling loop in a ublk server:
> > 
> > prepare UBLK_IO_FETCH_REQ uring_cmds;
> > while (1) {
> > io_uring_enter(submission & wait event);
> > }
> > 
> > If you send the STOP_DEV command from inside the above loop, you will get a
> > deadlock, because inflight and new IOs can't be handled any more.
> 
> My ublk server has two threads:
> 
> - The main thread: opens /dev/ublk-control and creates an io_uring. It handles
>   all control uring cmds (e.g., ADD_DEV, START_DEV, STOP_DEV).
> 
> - A worker thread: also creates an io_uring. It sends UBLK_U_IO_FETCH_REQ
>   and UBLK_U_IO_COMMIT_AND_FETCH_REQ uring cmds and handles IO requests.
> 
> The main thread first ADDs and STARTs the device, then listens for SIGINT.
> Upon receiving SIGINT, the main thread issues a STOP_DEV cmd.
> The worker thread continues running independently and is not directly affected
> by this signal.
> 
> (Implementation detail: I am using Rust and the Tokio async runtime. SIGINT is
> handled in userspace code rather than directly inside a signal handler.)
> 
> > If that is the case, you may simply remove the sending of STOP_DEV in your
> > implementation and let the ublk server exit. Then delete the ublk device
> > before the ublk server exits to the kernel, or let your monitor process
> > remove the ublk device, which enters the DEAD state after the ublk server
> > exits.
> > 
> > Otherwise, you may have to investigate why the ublk server can't handle IO
> > commands after sending the STOP_DEV command.
> 
> Based on my previous post [1], **the ublk server gets SIGKILLed after sending
> the STOP_DEV command**, so it cannot handle IO requests.
> 
> In detail:
> 
> 1. Flush kernel thread: A kworker/flush thread attempts to write dirty pages
> to the ublk device. It acquires a folio lock but then goes to sleep in
> wbt_wait() due to Writeback Throttling (WBT).
> 
> 2. STOP_DEV command: The ublk server issues a STOP_DEV command. This uring cmd
> is processed by an uring kernel thread.
> 
> 3. Resource conflict: While handling STOP_DEV, the uring kernel thread also
> tries to flush dirty pages. To submit the I/O, it needs to acquire the same
> folio lock held by the flush kworker.
> 
> 4. Termination signal: At this point, the ublk server receives a SIGKILL.

It is _not_ related to WBT or the write cache.

When your monitor sends SIGKILL, do_exit() is actually called, but
io_uring_cancel_generic() blocks on io_wq_put_and_exit() because STOP_DEV
can't be completed; do_exit() can't move on, so the ublk cancel function
is never called and the ublk char device is never closed.

This looks like something that might be improved in the future, but it depends
heavily on io_uring's (complicated) cancel code path, and I believe there is
no way for ublk to be notified when do_exit() happens via `kill -9`.

> 
> Conclusion:
> This creates a circular dependency: the flush kworker cannot be woken from WBT
> because the ublk server is no longer processing I/O requests.
> Simultaneously, the ublk server cannot exit because it is waiting for the
> STOP_DEV command to complete, while the uring kernel thread remains stuck
> waiting for the folio lock held by the flush kworker.
> 
> Should we avoid having the ublk server issue the STOP_DEV command by itself?
> It is highly prone to deadlock if a SIGKILL arrives during the shutdown
> sequence.

So far there are at least two ways:

1) don't install a signal handler for SIGINT, or simply call exit() in the
signal handler; in the !USER_RECOVERY case, the ublk block device is removed
automatically when the ublk server exits.

OR

2) install a SIGINT handler that stops the device, and meanwhile do not send
SIGKILL from your monitor process. This works just fine if your ublk server is
well implemented, which often requires you to handle timeouts in the server
implementation.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [BUG] ublk: ublk server hangs in D state during STOP_DEV
  2026-01-18 11:50           ` Ming Lei
@ 2026-01-18 13:14             ` huang-jl
  0 siblings, 0 replies; 8+ messages in thread
From: huang-jl @ 2026-01-18 13:14 UTC (permalink / raw)
  To: ming.lei; +Cc: huang-jl, linux-block

> > > > - Upon receiving SIGINT, our ublk server will send UBLK_U_CMD_STOP_DEV to the
> > > > driver.
> > > 
> > > Can you share how your server sends STOP_DEV when receiving SIGINT?
> > > 
> > > If it prevents normal IO command handling, ublk_stop_dev() will cause a deadlock.
> > > 
> > > For example, the following is the preferred IO handling loop in a ublk server:
> > > 
> > > prepare UBLK_IO_FETCH_REQ uring_cmds;
> > > while (1) {
> > > io_uring_enter(submission & wait event);
> > > }
> > > 
> > > If you send the STOP_DEV command inside the above loop, you will get a
> > > deadlock, because inflight and new IOs can't be handled any more.
> > 
> > My ublk server has two threads:
> > 
> > - The main thread: opens /dev/ublk-control and creates an io_uring. It handles
> >   all control uring cmds (e.g., ADD_DEV, START_DEV, STOP_DEV).
> > 
> > - A worker thread: also creates an io_uring. It sends UBLK_U_IO_FETCH_REQ
> >   and UBLK_U_IO_COMMIT_AND_FETCH_REQ uring cmds, and handles IO requests.
> > 
> > The main thread first ADDs and STARTs the device, then listens for SIGINT.
> > Upon receiving SIGINT, the main thread issues a STOP_DEV cmd.
> > The worker thread continues running independently and is not directly affected
> > by this signal.
> > 
> > (Implementation detail: I am using Rust and the Tokio async runtime. SIGINT
> > is handled in userspace rather than directly inside the signal handler.)
> > 
> > > If that is the case, you may simply remove the sending of STOP_DEV in your
> > > implementation and let the ublk server exit. Then delete the ublk device
> > > before the ublk server exits to the kernel, or let your monitor process
> > > remove the ublk device, which enters the DEAD state after the ublk server
> > > exits.
> > > 
> > > Otherwise, you may have to investigate why the ublk server can't handle IO
> > > commands after sending the STOP_DEV command.
> > 
> > Based on my previous post [1], **the ublk server gets SIGKILLed after sending
> > the STOP_DEV command**, so it cannot handle IO requests.
> > 
> > In detail:
> > 
> > 1. Flush kernel thread: A kworker/flush thread attempts to write dirty pages
> > to the ublk device. It acquires a folio lock but then goes to sleep in
> > wbt_wait() due to Writeback Throttling (WBT).
> > 
> > 2. STOP_DEV command: The ublk server issues a STOP_DEV command. This uring cmd
> > is processed by an uring kernel thread.
> > 
> > 3. Resource conflict: While handling STOP_DEV, the uring kernel thread also
> > tries to flush dirty pages. To submit the I/O, it needs to acquire the same
> > folio lock held by the flush kworker.
> > 
> > 4. Termination signal: At this point, the ublk server receives a SIGKILL.
> 
> It is _not_ related to WBT or the write cache.
> 
> When your monitor sends SIGKILL, do_exit() is actually called, but
> io_uring_cancel_generic() blocks on io_wq_put_and_exit() because STOP_DEV
> can't be completed; do_exit() can't move on, so the ublk cancel function
> is never called and the ublk char device is never closed.
> 
> This looks like something that might be improved in the future, but it depends
> heavily on io_uring's (complicated) cancel code path, and I believe there is
> no way for ublk to be notified when do_exit() happens via `kill -9`.

Ok, the reason I mentioned WBT is that we found a flush kernel thread named
kworker/u775:1+flush-259:11. Note that 259:11 is exactly the major:minor
number of our broken ublk device. The stack of this flush kworker is:

[<0>] rq_qos_wait+0xcf/0x180
[<0>] wbt_wait+0xb3/0x100
[<0>] __rq_qos_throttle+0x25/0x40
[<0>] blk_mq_submit_bio+0x168/0x6b0
[<0>] __submit_bio+0xb3/0x1c0
[<0>] submit_bio_noacct_nocheck+0x13c/0x1f0
[<0>] submit_bio_noacct+0x162/0x5b0
[<0>] submit_bio+0xb2/0x110
[<0>] submit_bh_wbc+0x156/0x190
[<0>] __block_write_full_folio+0x1da/0x3d0
[<0>] block_write_full_folio+0x150/0x180
[<0>] write_cache_pages+0x15b/0x430
[<0>] blkdev_writepages+0x6f/0xb0
[<0>] do_writepages+0xcd/0x1f0
[<0>] __writeback_single_inode+0x44/0x290
[<0>] writeback_sb_inodes+0x21b/0x520
[<0>] __writeback_inodes_wb+0x54/0x100
[<0>] wb_writeback+0x2df/0x350
[<0>] wb_do_writeback+0x225/0x2a0
[<0>] wb_workfn+0x5f/0x240
[<0>] process_one_work+0x181/0x3a0
[<0>] worker_thread+0x306/0x440
[<0>] kthread+0xef/0x120
[<0>] ret_from_fork+0x44/0x70
[<0>] ret_from_fork_asm+0x1b/0x30

It is actually stuck in wbt_wait() (writeback throttling) while holding the
folio lock of a dirty page.

But after reading more kernel code, I found that, as you said, it is not
related to WBT. Even without such a flush kworker, the ublk server would
still get stuck in io_wq_put_and_exit() after being SIGKILLed.

> > Conclusion:
> > This creates a circular dependency: the flush kworker cannot be woken from WBT
> > because the ublk server is no longer processing I/O requests.
> > Simultaneously, the ublk server cannot exit because it is waiting for the
> > STOP_DEV command to complete, while the uring kernel thread remains stuck
> > waiting for the folio lock held by the flush kworker.
> > 
> > Should we avoid having the ublk server issue the STOP_DEV command by itself?
> > It is highly prone to deadlock if a SIGKILL arrives during the shutdown
> > sequence.
> 
> So far there are at least two ways:
> 
> 1) don't install a signal handler for SIGINT, or simply call exit() in the
> signal handler; in the !USER_RECOVERY case, the ublk block device is removed
> automatically when the ublk server exits.
> 
> OR
> 
> 2) install a SIGINT handler that stops the device, and meanwhile do not send
> SIGKILL from your monitor process. This works just fine if your ublk server is
> well implemented, which often requires you to handle timeouts in the server
> implementation.

I appreciate your suggestions; I will remove my SIGINT signal handler.

Thanks,
huang-jl

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-01-18 13:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-16 14:15 [BUG] ublk: ublk server hangs in D state during STOP_DEV huang-jl
2026-01-16 14:58 ` Ming Lei
2026-01-17  5:18   ` huang-jl
     [not found]   ` <20260116171613.46312-1-huang-jl@deepseek.com>
2026-01-17  7:44     ` Ming Lei
2026-01-17 11:16       ` Ming Lei
2026-01-17 17:03         ` huang-jl
2026-01-18 11:50           ` Ming Lei
2026-01-18 13:14             ` huang-jl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox