public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: Jared Holzman <jholzman@nvidia.com>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>,
	Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Uday Shankar <ushankar@purestorage.com>,
	Caleb Sander Mateos <csander@purestorage.com>,
	Guy Eisenberg <geisenberg@nvidia.com>,
	Yoav Cohen <yoav@nvidia.com>, Omri Levi <omril@nvidia.com>,
	Ofer Oshri <ofer@nvidia.com>
Subject: Re: [PATCH V2 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd
Date: Sun, 18 May 2025 22:12:29 +0300	[thread overview]
Message-ID: <4395080b-7628-4290-9edd-ba442f0e096b@nvidia.com> (raw)
In-Reply-To: <mruqwpf4tqenkbtgezv5oxwq7ngyq24jzeyqy4ixzvivatbbxv@4oh2wzz4e6qn>

On 14/05/2025 4:50, Shinichiro Kawasaki wrote:
> On Apr 25, 2025 / 09:37, Ming Lei wrote:
>> ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
>> we may have scheduled task work via io_uring_cmd_complete_in_task() for
>> dispatching request, then kernel crash can be triggered.
>>
>> Fix it by not trying to canceling the command if ublk block request is
>> started.
> 
> I found that the blktests test case ublk/002 often hangs with the recent
> v6.15-rcX kernel tags with the INFO at iou-wrk-X [1]. The hang is recreated
> in stable manner when I repeat the test case a few times.
> 
> I bisected and this patch as the commit f40139fde527 triggers the hang.
> When I reverted the commit from the kernel v6.15-rc6, the hang disappeared.
> (I repeated ublk/002 20 times, and observed no hang.)
> 
> The hang was observed with test systems with Fedora 42. I do not observe
> the hang with Fedora 41, but not sure why. liburing version difference
> could be the reason (v2.6 for Fedora 41, v2.9 for Fedora 42).
> 
> Actions for fix will be appreciated.
> 
> [1]
> 
> [ 4497.777695] [ T130863] run blktests ublk/002 at 2025-05-07 14:48:32
> [ 4499.983130] [  T67084] blk_print_req_error: 58 callbacks suppressed
> [ 4499.983136] [  T67084] I/O error, dev ublkb0, sector 106830432 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> [ 4499.985230] [  T67084] I/O error, dev ublkb0, sector 165364248 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> [ 4499.998194] [  T57173] I/O error, dev ublkb0, sector 251279584 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
> [ 4671.208372] [     T46] INFO: task iou-wrk-131061:131062 blocked for more than 122 seconds.
> [ 4671.209417] [     T46]       Not tainted 6.15.0-rc5 #317
> [ 4671.210182] [     T46] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4671.211194] [     T46] task:iou-wrk-131061  state:D stack:0     pid:131062 tgid:131061 ppid:131060 task_flags:0x404150 flags:0x00004000
> [ 4671.212573] [     T46] Call Trace:
> [ 4671.213364] [     T46]  <TASK>
> [ 4671.214047] [     T46]  __schedule+0xf8c/0x5c80
> [ 4671.214920] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.215831] [     T46]  ? __mutex_lock+0x2d9/0x1b50
> [ 4671.216760] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.217718] [     T46]  ? lock_release+0x244/0x300
> [ 4671.218621] [     T46]  ? lock_release+0x244/0x300
> [ 4671.219521] [     T46]  ? __pfx___schedule+0x10/0x10
> [ 4671.220433] [     T46]  ? do_raw_spin_lock+0x124/0x260
> [ 4671.221425] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.222292] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.223121] [     T46]  ? lock_release+0x244/0x300
> [ 4671.223984] [     T46]  ? lock_release+0x244/0x300
> [ 4671.224844] [     T46]  schedule+0xda/0x390
> [ 4671.225667] [     T46]  blk_mq_freeze_queue_wait+0x121/0x170
> [ 4671.226653] [     T46]  ? __pfx_blk_mq_freeze_queue_wait+0x10/0x10
> [ 4671.227656] [     T46]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4671.228656] [     T46]  ? trace_hardirqs_on+0x14/0x150
> [ 4671.229564] [     T46]  ? _raw_spin_unlock_irq+0x34/0x50
> [ 4671.230484] [     T46]  del_gendisk+0x4f9/0xa20
> [ 4671.231320] [     T46]  ? ublk_stop_dev+0x24/0x200 [ublk_drv]
> [ 4671.232379] [     T46]  ? __pfx_del_gendisk+0x10/0x10
> [ 4671.233197] [     T46]  ? __pfx___mutex_lock+0x10/0x10
> [ 4671.234094] [     T46]  ublk_stop_dev_unlocked.part.0+0xb6/0x580 [ublk_drv]
> [ 4671.235153] [     T46]  ? __pfx_ublk_stop_dev_unlocked.part.0+0x10/0x10 [ublk_drv]
> [ 4671.236303] [     T46]  ? kvm_sched_clock_read+0xd/0x20
> [ 4671.237156] [     T46]  ublk_stop_dev+0x62/0x200 [ublk_drv]
> [ 4671.238081] [     T46]  ? security_capable+0x79/0x110
> [ 4671.238964] [     T46]  ublk_ctrl_uring_cmd+0xe87/0x2a70 [ublk_drv]
> [ 4671.239967] [     T46]  ? __pfx_avc_has_perm+0x10/0x10
> [ 4671.240858] [     T46]  ? lock_release+0x244/0x300
> [ 4671.241759] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.242677] [     T46]  ? __pfx_ublk_ctrl_uring_cmd+0x10/0x10 [ublk_drv]
> [ 4671.243751] [     T46]  ? finish_task_switch.isra.0+0x202/0x850
> [ 4671.244728] [     T46]  ? selinux_uring_cmd+0x1c3/0x280
> [ 4671.245623] [     T46]  ? __schedule+0x2f21/0x5c80
> [ 4671.246504] [     T46]  ? __pfx_selinux_uring_cmd+0x10/0x10
> [ 4671.247438] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.248228] [     T46]  ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 4671.249109] [     T46]  io_uring_cmd+0x221/0x3c0
> [ 4671.249952] [     T46]  io_issue_sqe+0x6ab/0x1800
> [ 4671.250767] [     T46]  ? trace_hardirqs_on+0x14/0x150
> [ 4671.251634] [     T46]  ? __pfx_io_issue_sqe+0x10/0x10
> [ 4671.252502] [     T46]  ? __timer_delete_sync+0x16c/0x270
> [ 4671.253392] [     T46]  ? __pfx___timer_delete_sync+0x10/0x10
> [ 4671.254402] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.255162] [     T46]  io_wq_submit_work+0x3ea/0xf70
> [ 4671.255990] [     T46]  io_worker_handle_work+0x615/0x1280
> [ 4671.256854] [     T46]  io_wq_worker+0x2ee/0xbd0
> [ 4671.257670] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
> [ 4671.258491] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.259285] [     T46]  ? lock_acquire+0x2af/0x310
> [ 4671.260035] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.260800] [     T46]  ? do_raw_spin_lock+0x124/0x260
> [ 4671.261604] [     T46]  ? lock_acquire+0x2af/0x310
> [ 4671.262376] [     T46]  ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 4671.263186] [     T46]  ? lock_release+0x244/0x300
> [ 4671.263947] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4671.264718] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
> [ 4671.265529] [     T46]  ret_from_fork+0x30/0x70
> [ 4671.266277] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
> [ 4671.267070] [     T46]  ret_from_fork_asm+0x1a/0x30
> [ 4671.267855] [     T46]  </TASK>
> [ 4671.268495] [     T46] INFO: lockdep is turned off.
> [ 4794.088765] [     T46] INFO: task iou-wrk-131061:131062 blocked for more than 245 seconds.
> [ 4794.090310] [     T46]       Not tainted 6.15.0-rc5 #317
> [ 4794.091453] [     T46] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 4794.093379] [     T46] task:iou-wrk-131061  state:D stack:0     pid:131062 tgid:131061 ppid:131060 task_flags:0x404150 flags:0x00004000
> [ 4794.095353] [     T46] Call Trace:
> [ 4794.096142] [     T46]  <TASK>
> [ 4794.096929] [     T46]  __schedule+0xf8c/0x5c80
> [ 4794.097878] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.098869] [     T46]  ? __mutex_lock+0x2d9/0x1b50
> [ 4794.099866] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.100870] [     T46]  ? lock_release+0x244/0x300
> [ 4794.101857] [     T46]  ? lock_release+0x244/0x300
> [ 4794.102797] [     T46]  ? __pfx___schedule+0x10/0x10
> [ 4794.103726] [     T46]  ? do_raw_spin_lock+0x124/0x260
> [ 4794.104752] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.105744] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.106678] [     T46]  ? lock_release+0x244/0x300
> [ 4794.107575] [     T46]  ? lock_release+0x244/0x300
> [ 4794.108528] [     T46]  schedule+0xda/0x390
> [ 4794.109391] [     T46]  blk_mq_freeze_queue_wait+0x121/0x170
> [ 4794.110437] [     T46]  ? __pfx_blk_mq_freeze_queue_wait+0x10/0x10
> [ 4794.111527] [     T46]  ? __pfx_autoremove_wake_function+0x10/0x10
> [ 4794.112662] [     T46]  ? trace_hardirqs_on+0x14/0x150
> [ 4794.113626] [     T46]  ? _raw_spin_unlock_irq+0x34/0x50
> [ 4794.114644] [     T46]  del_gendisk+0x4f9/0xa20
> [ 4794.115518] [     T46]  ? ublk_stop_dev+0x24/0x200 [ublk_drv]
> [ 4794.116548] [     T46]  ? __pfx_del_gendisk+0x10/0x10
> [ 4794.117536] [     T46]  ? __pfx___mutex_lock+0x10/0x10
> [ 4794.118521] [     T46]  ublk_stop_dev_unlocked.part.0+0xb6/0x580 [ublk_drv]
> [ 4794.119749] [     T46]  ? __pfx_ublk_stop_dev_unlocked.part.0+0x10/0x10 [ublk_drv]
> [ 4794.121060] [     T46]  ? kvm_sched_clock_read+0xd/0x20
> [ 4794.122081] [     T46]  ublk_stop_dev+0x62/0x200 [ublk_drv]
> [ 4794.123102] [     T46]  ? security_capable+0x79/0x110
> [ 4794.124139] [     T46]  ublk_ctrl_uring_cmd+0xe87/0x2a70 [ublk_drv]
> [ 4794.125225] [     T46]  ? __pfx_avc_has_perm+0x10/0x10
> [ 4794.126054] [     T46]  ? lock_release+0x244/0x300
> [ 4794.126850] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.127579] [     T46]  ? __pfx_ublk_ctrl_uring_cmd+0x10/0x10 [ublk_drv]
> [ 4794.128512] [     T46]  ? finish_task_switch.isra.0+0x202/0x850
> [ 4794.129379] [     T46]  ? selinux_uring_cmd+0x1c3/0x280
> [ 4794.130178] [     T46]  ? __schedule+0x2f21/0x5c80
> [ 4794.130926] [     T46]  ? __pfx_selinux_uring_cmd+0x10/0x10
> [ 4794.131701] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.132357] [     T46]  ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 4794.133159] [     T46]  io_uring_cmd+0x221/0x3c0
> [ 4794.133931] [     T46]  io_issue_sqe+0x6ab/0x1800
> [ 4794.134653] [     T46]  ? trace_hardirqs_on+0x14/0x150
> [ 4794.135417] [     T46]  ? __pfx_io_issue_sqe+0x10/0x10
> [ 4794.136191] [     T46]  ? __timer_delete_sync+0x16c/0x270
> [ 4794.136984] [     T46]  ? __pfx___timer_delete_sync+0x10/0x10
> [ 4794.137833] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.138527] [     T46]  io_wq_submit_work+0x3ea/0xf70
> [ 4794.139268] [     T46]  io_worker_handle_work+0x615/0x1280
> [ 4794.140062] [     T46]  io_wq_worker+0x2ee/0xbd0
> [ 4794.140806] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
> [ 4794.141478] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.142189] [     T46]  ? lock_acquire+0x2af/0x310
> [ 4794.142930] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.143663] [     T46]  ? do_raw_spin_lock+0x124/0x260
> [ 4794.144363] [     T46]  ? lock_acquire+0x2af/0x310
> [ 4794.145087] [     T46]  ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 4794.145897] [     T46]  ? lock_release+0x244/0x300
> [ 4794.146617] [     T46]  ? rcu_is_watching+0x11/0xb0
> [ 4794.147340] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
> [ 4794.148070] [     T46]  ret_from_fork+0x30/0x70
> [ 4794.148817] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
> [ 4794.149576] [     T46]  ret_from_fork_asm+0x1a/0x30
> [ 4794.150307] [     T46]  </TASK>
> [ 4794.150959] [     T46] INFO: lockdep is turned off.
> ...

Hi Shinichiro,

Are you referring to test_generic_02.sh in tools/testing/selftests/ublk?

I tried running it 20 times and it didn't get stuck

Kernel: 6.15.0-rc6
liburing: v2.9

Jared

  reply	other threads:[~2025-05-18 19:12 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-25  1:37 [PATCH V2 0/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-04-25  1:37 ` [PATCH V2 1/2] ublk: call ublk_dispatch_req() for handling UBLK_U_IO_NEED_GET_DATA Ming Lei
2025-04-25  1:37 ` [PATCH V2 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd Ming Lei
2025-05-14  1:50   ` Shinichiro Kawasaki
2025-05-18 19:12     ` Jared Holzman [this message]
2025-05-18 23:54       ` Shinichiro Kawasaki
2025-05-19  2:45       ` Ming Lei
2025-04-25  1:53 ` [PATCH V2 0/2] " Jens Axboe
2025-04-28 13:31   ` Jared Holzman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4395080b-7628-4290-9edd-ba442f0e096b@nvidia.com \
    --to=jholzman@nvidia.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=geisenberg@nvidia.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=ofer@nvidia.com \
    --cc=omril@nvidia.com \
    --cc=shinichiro.kawasaki@wdc.com \
    --cc=ushankar@purestorage.com \
    --cc=yoav@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox