From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>,
"axboe@kernel.dk" <axboe@kernel.dk>
Subject: [bug report] block/005 hangs with NVMe device and linux-block/for-next
Date: Mon, 1 Nov 2021 08:34:18 +0000
Message-ID: <20211101083417.fcttizyxpahrcgov@shindev>
I tried the latest linux-block/for-next branch tip (git hash b43fadb6631f) and
observed a process hang during a blktests block/005 run on an NVMe device.
The kernel reported "INFO: task check:1224 blocked for more than 122
seconds." together with the call trace in [1]. So far the hang is 100%
reproducible on my system. It is not observed with HDDs or null_blk devices.
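
For reference, the trigger can be approximated outside of blktests with a
small userspace loop that rewrites the scheduler sysfs attribute while I/O
(e.g. fio or dd) is running against the device. This is only a rough sketch
of what block/005 does: the device path and scheduler names below are
placeholders, and block/005 itself iterates over the schedulers the device
advertises while fio generates the I/O.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Placeholder device; block/005 uses whatever TEST_DEVS points at. */
	const char *attr = "/sys/block/nvme0n1/queue/scheduler";
	const char *scheds[] = { "none", "mq-deadline" };

	for (int i = 0; i < 100; i++) {
		const char *s = scheds[i % 2];
		int fd = open(attr, O_WRONLY);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		/*
		 * With the hang, this write never returns: the writer gets
		 * stuck in elevator_switch() -> blk_mq_freeze_queue_wait().
		 */
		if (write(fd, s, strlen(s)) < 0)
			perror("write");
		close(fd);
		usleep(100000);		/* 100 ms between switches */
	}
	return 0;
}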
I bisected and found that commit 4f5022453acd ("nvme: wire up completion batching
for the IRQ path") triggers the hang. When I revert this commit from the
for-next branch tip, the hang disappears. The block/005 test case switches the
I/O scheduler while I/O is in flight, and the completion path change made by
that commit appears to affect the scheduler switch. Comments toward a solution
would be appreciated.
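
To clarify where the writer is stuck: switching the scheduler freezes the
queue and waits for every in-flight request to drop its reference on
q->q_usage_counter, so if a completion path somehow misses putting that
reference for even one request, the freeze waits forever. Below is a
condensed sketch of the relevant paths, simplified from block/elevator.c
and block/blk-mq.c around v5.15 (not compilable on its own, just to show
the shape of the wait seen in the call trace in [1]).

/* elv_iosched_store() -> elevator_switch(): freeze before swapping. */
int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
{
	/*
	 * Starts killing q_usage_counter and then waits in
	 * blk_mq_freeze_queue_wait() until it reaches zero.  Each started
	 * request holds one reference that is only dropped when the request
	 * is freed at completion time.
	 */
	blk_mq_freeze_queue(q);

	/* ... detach the old scheduler, attach new_e ... */

	blk_mq_unfreeze_queue(q);
	return 0;
}

/* The wait shown at the top of the call trace in [1]. */
void blk_mq_freeze_queue_wait(struct request_queue *q)
{
	wait_event(q->mq_freeze_wq,
		   percpu_ref_is_zero(&q->q_usage_counter));
}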
[1]
[ 121.677521] run blktests block/005 at 2021-11-01 16:24:21
[ 371.819391] INFO: task check:1224 blocked for more than 122 seconds.
[ 371.826660] Not tainted 5.15.0-rc6+ #41
[ 371.831786] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 371.840349] task:check state:D stack: 0 pid: 1224 ppid: 1223 flags:0x00000000
[ 371.849471] Call Trace:
[ 371.852680] __schedule+0x9e2/0x2240
[ 371.857052] ? io_schedule_timeout+0x190/0x190
[ 371.862330] ? _raw_spin_unlock_irqrestore+0x37/0x40
[ 371.868070] schedule+0xdd/0x280
[ 371.872062] blk_mq_freeze_queue_wait+0xc0/0xf0
[ 371.877351] ? blk_mq_queue_inflight+0x70/0x70
[ 371.882554] ? blk_mq_run_hw_queues+0x13b/0x410
[ 371.887856] ? finish_wait+0x270/0x270
[ 371.892363] elevator_switch+0x4a/0xa0
[ 371.896857] elv_iosched_store+0x16e/0x3c0
[ 371.901698] ? elevator_init_mq+0x3f0/0x3f0
[ 371.906643] ? lock_is_held_type+0xe0/0x110
[ 371.911556] ? lock_is_held_type+0xe0/0x110
[ 371.916473] queue_attr_store+0x8b/0xd0
[ 371.921050] ? sysfs_file_ops+0x170/0x170
[ 371.925789] kernfs_fop_write_iter+0x2c7/0x460
[ 371.930965] new_sync_write+0x359/0x5e0
[ 371.935533] ? new_sync_read+0x5d0/0x5d0
[ 371.940177] ? ksys_write+0xe9/0x1b0
[ 371.944483] ? lock_release+0x690/0x690
[ 371.949046] ? __cond_resched+0x15/0x30
[ 371.953604] ? inode_security+0x56/0xf0
[ 371.958171] ? lock_is_held_type+0xe0/0x110
[ 371.963075] vfs_write+0x5e4/0x8e0
[ 371.967196] ksys_write+0xe9/0x1b0
[ 371.971321] ? __ia32_sys_read+0xa0/0xa0
[ 371.975962] ? syscall_enter_from_user_mode+0x21/0x70
[ 371.981733] do_syscall_64+0x3b/0x90
[ 371.986028] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 371.991799] RIP: 0033:0x7f24adda7387
[ 371.996085] RSP: 002b:00007ffd9c80f2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 372.004366] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f24adda7387
[ 372.012213] RDX: 0000000000000005 RSI: 0000562156826120 RDI: 0000000000000001
[ 372.020060] RBP: 0000562156826120 R08: 000000000000000a R09: 00007f24ade3d4e0
[ 372.027901] R10: 00007f24ade3d3e0 R11: 0000000000000246 R12: 0000000000000005
[ 372.035746] R13: 00007f24ade7a520 R14: 0000000000000005 R15: 00007f24ade7a700
[ 372.043629]
Showing all locks held in the system:
[ 372.051235] 1 lock held by khungtaskd/62:
[ 372.055960] #0: ffffffffa64483c0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x269
[ 372.065649] 1 lock held by systemd-journal/584:
[ 372.070902] 4 locks held by check/1224:
[ 372.075452] #0: ffff888118fc4460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xe9/0x1b0
[ 372.084177] #1: ffff888120717088 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460
[ 372.093771] #2: ffff8881480a0a00 (kn->active#131){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x239/0x460
[ 372.103707] #3: ffff888148d66858 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x5d/0xd0
[ 372.115258] =============================================
--
Best Regards,
Shin'ichiro Kawasaki