From: Jared Holzman <jholzman@nvidia.com>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
Uday Shankar <ushankar@purestorage.com>,
Caleb Sander Mateos <csander@purestorage.com>,
Guy Eisenberg <geisenberg@nvidia.com>,
Yoav Cohen <yoav@nvidia.com>, Omri Levi <omril@nvidia.com>,
Ofer Oshri <ofer@nvidia.com>
Subject: Re: [PATCH V2 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
Date: Sun, 18 May 2025 22:12:29 +0300 [thread overview]
Message-ID: <4395080b-7628-4290-9edd-ba442f0e096b@nvidia.com> (raw)
In-Reply-To: <mruqwpf4tqenkbtgezv5oxwq7ngyq24jzeyqy4ixzvivatbbxv@4oh2wzz4e6qn>
On 14/05/2025 4:50, Shinichiro Kawasaki wrote:
> On Apr 25, 2025 / 09:37, Ming Lei wrote:
>> ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
>> we may have scheduled task work via io_uring_cmd_complete_in_task() for
>> dispatching request, then kernel crash can be triggered.
>>
>> Fix it by not trying to canceling the command if ublk block request is
>> started.
>
> I found that the blktests test case ublk/002 often hangs with the recent
> v6.15-rcX kernel tags with the INFO at iou-wrk-X [1]. The hang is recreated
> in stable manner when I repeat the test case a few times.
>
> I bisected and this patch as the commit f40139fde527 triggers the hang.
> When I reverted the commit from the kernel v6.15-rc6, the hang disappeared.
> (I repeated ublk/002 20 times, and observed no hang.)
>
> The hang was observed with test systems with Fedora 42. I do not observe
> the hang with Fedora 41, but not sure why. liburing version difference
> could be the reason (v2.6 for Fedora 41, v2.9 for Fedora 42).
>
> Actions for fix will be appreciated.
>
> [1]
>
> [ 4497.777695] [ T130863] run blktests ublk/002 at 2025-05-07 14:48:32
> [ 4499.983130] [ T67084] blk_print_req_error: 58 callbacks suppressed
> [ 4499.983136] [ T67084] I/O error, dev ublkb0, sector 106830432 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> [ 4499.985230] [ T67084] I/O error, dev ublkb0, sector 165364248 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> [ 4499.998194] [ T57173] I/O error, dev ublkb0, sector 251279584 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> [ 4671.208372] [ T46] INFO: task iou-wrk-131061:131062 blocked for more than 122 seconds.
> [ 4671.209417] [ T46] Not tainted 6.15.0-rc5 #317
> [ 4671.210182] [ T46] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4671.211194] [ T46] task:iou-wrk-131061 state:D stack:0 pid:131062 tgid:131061 ppid:131060 task_flags:0x404150 flags:0x00004000
> [ 4671.212573] [ T46] Call Trace:
> [ 4671.213364] [ T46] <TASK>
> [ 4671.214047] [ T46] __schedule+0xf8c/0x5c80
> [ 4671.214920] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.215831] [ T46] ? __mutex_lock+0x2d9/0x1b50
> [ 4671.216760] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.217718] [ T46] ? lock_release+0x244/0x300
> [ 4671.218621] [ T46] ? lock_release+0x244/0x300
> [ 4671.219521] [ T46] ? __pfx___schedule+0x10/0x10
> [ 4671.220433] [ T46] ? do_raw_spin_lock+0x124/0x260
> [ 4671.221425] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.222292] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.223121] [ T46] ? lock_release+0x244/0x300
> [ 4671.223984] [ T46] ? lock_release+0x244/0x300
> [ 4671.224844] [ T46] schedule+0xda/0x390
> [ 4671.225667] [ T46] blk_mq_freeze_queue_wait+0x121/0x170
> [ 4671.226653] [ T46] ? __pfx_blk_mq_freeze_queue_wait+0x10/0x10
> [ 4671.227656] [ T46] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4671.228656] [ T46] ? trace_hardirqs_on+0x14/0x150
> [ 4671.229564] [ T46] ? _raw_spin_unlock_irq+0x34/0x50
> [ 4671.230484] [ T46] del_gendisk+0x4f9/0xa20
> [ 4671.231320] [ T46] ? ublk_stop_dev+0x24/0x200 [ublk_drv]
> [ 4671.232379] [ T46] ? __pfx_del_gendisk+0x10/0x10
> [ 4671.233197] [ T46] ? __pfx___mutex_lock+0x10/0x10
> [ 4671.234094] [ T46] ublk_stop_dev_unlocked.part.0+0xb6/0x580 [ublk_drv]
> [ 4671.235153] [ T46] ? __pfx_ublk_stop_dev_unlocked.part.0+0x10/0x10 [ublk_drv]
> [ 4671.236303] [ T46] ? kvm_sched_clock_read+0xd/0x20
> [ 4671.237156] [ T46] ublk_stop_dev+0x62/0x200 [ublk_drv]
> [ 4671.238081] [ T46] ? security_capable+0x79/0x110
> [ 4671.238964] [ T46] ublk_ctrl_uring_cmd+0xe87/0x2a70 [ublk_drv]
> [ 4671.239967] [ T46] ? __pfx_avc_has_perm+0x10/0x10
> [ 4671.240858] [ T46] ? lock_release+0x244/0x300
> [ 4671.241759] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.242677] [ T46] ? __pfx_ublk_ctrl_uring_cmd+0x10/0x10 [ublk_drv]
> [ 4671.243751] [ T46] ? finish_task_switch.isra.0+0x202/0x850
> [ 4671.244728] [ T46] ? selinux_uring_cmd+0x1c3/0x280
> [ 4671.245623] [ T46] ? __schedule+0x2f21/0x5c80
> [ 4671.246504] [ T46] ? __pfx_selinux_uring_cmd+0x10/0x10
> [ 4671.247438] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.248228] [ T46] ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 4671.249109] [ T46] io_uring_cmd+0x221/0x3c0
> [ 4671.249952] [ T46] io_issue_sqe+0x6ab/0x1800
> [ 4671.250767] [ T46] ? trace_hardirqs_on+0x14/0x150
> [ 4671.251634] [ T46] ? __pfx_io_issue_sqe+0x10/0x10
> [ 4671.252502] [ T46] ? __timer_delete_sync+0x16c/0x270
> [ 4671.253392] [ T46] ? __pfx___timer_delete_sync+0x10/0x10
> [ 4671.254402] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.255162] [ T46] io_wq_submit_work+0x3ea/0xf70
> [ 4671.255990] [ T46] io_worker_handle_work+0x615/0x1280
> [ 4671.256854] [ T46] io_wq_worker+0x2ee/0xbd0
> [ 4671.257670] [ T46] ? __pfx_io_wq_worker+0x10/0x10
> [ 4671.258491] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.259285] [ T46] ? lock_acquire+0x2af/0x310
> [ 4671.260035] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.260800] [ T46] ? do_raw_spin_lock+0x124/0x260
> [ 4671.261604] [ T46] ? lock_acquire+0x2af/0x310
> [ 4671.262376] [ T46] ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 4671.263186] [ T46] ? lock_release+0x244/0x300
> [ 4671.263947] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4671.264718] [ T46] ? __pfx_io_wq_worker+0x10/0x10
> [ 4671.265529] [ T46] ret_from_fork+0x30/0x70
> [ 4671.266277] [ T46] ? __pfx_io_wq_worker+0x10/0x10
> [ 4671.267070] [ T46] ret_from_fork_asm+0x1a/0x30
> [ 4671.267855] [ T46] </TASK>
> [ 4671.268495] [ T46] INFO: lockdep is turned off.
> [ 4794.088765] [ T46] INFO: task iou-wrk-131061:131062 blocked for more than 245 seconds.
> [ 4794.090310] [ T46] Not tainted 6.15.0-rc5 #317
> [ 4794.091453] [ T46] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4794.093379] [ T46] task:iou-wrk-131061 state:D stack:0 pid:131062 tgid:131061 ppid:131060 task_flags:0x404150 flags:0x00004000
> [ 4794.095353] [ T46] Call Trace:
> [ 4794.096142] [ T46] <TASK>
> [ 4794.096929] [ T46] __schedule+0xf8c/0x5c80
> [ 4794.097878] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.098869] [ T46] ? __mutex_lock+0x2d9/0x1b50
> [ 4794.099866] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.100870] [ T46] ? lock_release+0x244/0x300
> [ 4794.101857] [ T46] ? lock_release+0x244/0x300
> [ 4794.102797] [ T46] ? __pfx___schedule+0x10/0x10
> [ 4794.103726] [ T46] ? do_raw_spin_lock+0x124/0x260
> [ 4794.104752] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.105744] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.106678] [ T46] ? lock_release+0x244/0x300
> [ 4794.107575] [ T46] ? lock_release+0x244/0x300
> [ 4794.108528] [ T46] schedule+0xda/0x390
> [ 4794.109391] [ T46] blk_mq_freeze_queue_wait+0x121/0x170
> [ 4794.110437] [ T46] ? __pfx_blk_mq_freeze_queue_wait+0x10/0x10
> [ 4794.111527] [ T46] ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4794.112662] [ T46] ? trace_hardirqs_on+0x14/0x150
> [ 4794.113626] [ T46] ? _raw_spin_unlock_irq+0x34/0x50
> [ 4794.114644] [ T46] del_gendisk+0x4f9/0xa20
> [ 4794.115518] [ T46] ? ublk_stop_dev+0x24/0x200 [ublk_drv]
> [ 4794.116548] [ T46] ? __pfx_del_gendisk+0x10/0x10
> [ 4794.117536] [ T46] ? __pfx___mutex_lock+0x10/0x10
> [ 4794.118521] [ T46] ublk_stop_dev_unlocked.part.0+0xb6/0x580 [ublk_drv]
> [ 4794.119749] [ T46] ? __pfx_ublk_stop_dev_unlocked.part.0+0x10/0x10 [ublk_drv]
> [ 4794.121060] [ T46] ? kvm_sched_clock_read+0xd/0x20
> [ 4794.122081] [ T46] ublk_stop_dev+0x62/0x200 [ublk_drv]
> [ 4794.123102] [ T46] ? security_capable+0x79/0x110
> [ 4794.124139] [ T46] ublk_ctrl_uring_cmd+0xe87/0x2a70 [ublk_drv]
> [ 4794.125225] [ T46] ? __pfx_avc_has_perm+0x10/0x10
> [ 4794.126054] [ T46] ? lock_release+0x244/0x300
> [ 4794.126850] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.127579] [ T46] ? __pfx_ublk_ctrl_uring_cmd+0x10/0x10 [ublk_drv]
> [ 4794.128512] [ T46] ? finish_task_switch.isra.0+0x202/0x850
> [ 4794.129379] [ T46] ? selinux_uring_cmd+0x1c3/0x280
> [ 4794.130178] [ T46] ? __schedule+0x2f21/0x5c80
> [ 4794.130926] [ T46] ? __pfx_selinux_uring_cmd+0x10/0x10
> [ 4794.131701] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.132357] [ T46] ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 4794.133159] [ T46] io_uring_cmd+0x221/0x3c0
> [ 4794.133931] [ T46] io_issue_sqe+0x6ab/0x1800
> [ 4794.134653] [ T46] ? trace_hardirqs_on+0x14/0x150
> [ 4794.135417] [ T46] ? __pfx_io_issue_sqe+0x10/0x10
> [ 4794.136191] [ T46] ? __timer_delete_sync+0x16c/0x270
> [ 4794.136984] [ T46] ? __pfx___timer_delete_sync+0x10/0x10
> [ 4794.137833] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.138527] [ T46] io_wq_submit_work+0x3ea/0xf70
> [ 4794.139268] [ T46] io_worker_handle_work+0x615/0x1280
> [ 4794.140062] [ T46] io_wq_worker+0x2ee/0xbd0
> [ 4794.140806] [ T46] ? __pfx_io_wq_worker+0x10/0x10
> [ 4794.141478] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.142189] [ T46] ? lock_acquire+0x2af/0x310
> [ 4794.142930] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.143663] [ T46] ? do_raw_spin_lock+0x124/0x260
> [ 4794.144363] [ T46] ? lock_acquire+0x2af/0x310
> [ 4794.145087] [ T46] ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 4794.145897] [ T46] ? lock_release+0x244/0x300
> [ 4794.146617] [ T46] ? rcu_is_watching+0x11/0xb0
> [ 4794.147340] [ T46] ? __pfx_io_wq_worker+0x10/0x10
> [ 4794.148070] [ T46] ret_from_fork+0x30/0x70
> [ 4794.148817] [ T46] ? __pfx_io_wq_worker+0x10/0x10
> [ 4794.149576] [ T46] ret_from_fork_asm+0x1a/0x30
> [ 4794.150307] [ T46] </TASK>
> [ 4794.150959] [ T46] INFO: lockdep is turned off.
> ...
Hi Shinichiro,
Are you referring to test_generic_02.sh in tools/testing/selftests/ublk?
I tried running it 20 times and it didn't get stuck
Kernel: 6.15.0-rc6
liburing: v2.9
Jared
next prev parent reply other threads:[~2025-05-18 19:12 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-25 1:37 [PATCH V2 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-25 1:37 ` [PATCH V2 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA Ming Lei
2025-04-25 1:37 ` [PATCH V2 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-05-14 1:50 ` Shinichiro Kawasaki
2025-05-18 19:12 ` Jared Holzman [this message]
2025-05-18 23:54 ` Shinichiro Kawasaki
2025-05-19 2:45 ` Ming Lei
2025-04-25 1:53 ` [PATCH V2 0/2] " Jens Axboe
2025-04-28 13:31 ` Jared Holzman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4395080b-7628-4290-9edd-ba442f0e096b@nvidia.com \
--to=jholzman@nvidia.com \
--cc=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=geisenberg@nvidia.com \
--cc=linux-block@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=ofer@nvidia.com \
--cc=omril@nvidia.com \
--cc=shinichiro.kawasaki@wdc.com \
--cc=ushankar@purestorage.com \
--cc=yoav@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox