public inbox for linux-block@vger.kernel.org
From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>
Subject: [bug report] block/005 hangs with NVMe device and linux-block/for-next
Date: Mon, 1 Nov 2021 08:34:18 +0000	[thread overview]
Message-ID: <20211101083417.fcttizyxpahrcgov@shindev> (raw)

I tried the latest linux-block/for-next branch tip (git hash b43fadb6631f) and
observed a process hang during a blktests block/005 run on an NVMe device.
The kernel log reported "INFO: task check:1224 blocked for more than 122
seconds." with the call trace in [1]. So far, the hang is 100% reproducible on
my system. It is not observed with HDDs or null_blk devices.
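For reference, the reproduction setup is roughly the following (the device node /dev/nvme0n1 is an assumption; blktests reads the device list from a "config" file in its repository root):

```shell
# Hypothetical reproduction sketch, run from a blktests checkout.
# TEST_DEVS names the device(s) the test cases exercise.
cat > config <<'EOF'
TEST_DEVS=(/dev/nvme0n1)
EOF

# Run the single test case that hangs:
./check block/005
```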

I bisected and found that commit 4f5022453acd ("nvme: wire up completion batching
for the IRQ path") triggers the hang. When I revert this commit from the
for-next branch tip, the hang disappears. The block/005 test case switches the
IO scheduler during IO, and the completion path change in that commit appears
to affect the scheduler switch. Comments on a solution would be appreciated.
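To illustrate the pattern that seems to matter here: the scheduler switch goes through the queue sysfs attribute, which freezes the queue (elevator_switch -> blk_mq_freeze_queue_wait in the trace above) while IO is in flight. A rough, hypothetical sketch of that kind of load, not the actual block/005 script (device name and scheduler list are assumptions):

```shell
# Sketch: cycle the I/O scheduler via sysfs while fio keeps the device busy.
# Writing to queue/scheduler triggers elevator_switch(), which waits for the
# queue to freeze -- the point where the hang is observed.
dev=nvme0n1

fio --name=io --filename=/dev/$dev --direct=1 --rw=randread \
    --time_based --runtime=30 &

for sched in none mq-deadline kyber none; do
    echo "$sched" > /sys/block/$dev/queue/scheduler
    sleep 2
done
wait
```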

[1]

[  121.677521] run blktests block/005 at 2021-11-01 16:24:21
[  371.819391] INFO: task check:1224 blocked for more than 122 seconds.
[  371.826660]       Not tainted 5.15.0-rc6+ #41
[  371.831786] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  371.840349] task:check           state:D stack:    0 pid: 1224 ppid:  1223 flags:0x00000000
[  371.849471] Call Trace:
[  371.852680]  __schedule+0x9e2/0x2240
[  371.857052]  ? io_schedule_timeout+0x190/0x190
[  371.862330]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[  371.868070]  schedule+0xdd/0x280
[  371.872062]  blk_mq_freeze_queue_wait+0xc0/0xf0
[  371.877351]  ? blk_mq_queue_inflight+0x70/0x70
[  371.882554]  ? blk_mq_run_hw_queues+0x13b/0x410
[  371.887856]  ? finish_wait+0x270/0x270
[  371.892363]  elevator_switch+0x4a/0xa0
[  371.896857]  elv_iosched_store+0x16e/0x3c0
[  371.901698]  ? elevator_init_mq+0x3f0/0x3f0
[  371.906643]  ? lock_is_held_type+0xe0/0x110
[  371.911556]  ? lock_is_held_type+0xe0/0x110
[  371.916473]  queue_attr_store+0x8b/0xd0
[  371.921050]  ? sysfs_file_ops+0x170/0x170
[  371.925789]  kernfs_fop_write_iter+0x2c7/0x460
[  371.930965]  new_sync_write+0x359/0x5e0
[  371.935533]  ? new_sync_read+0x5d0/0x5d0
[  371.940177]  ? ksys_write+0xe9/0x1b0
[  371.944483]  ? lock_release+0x690/0x690
[  371.949046]  ? __cond_resched+0x15/0x30
[  371.953604]  ? inode_security+0x56/0xf0
[  371.958171]  ? lock_is_held_type+0xe0/0x110
[  371.963075]  vfs_write+0x5e4/0x8e0
[  371.967196]  ksys_write+0xe9/0x1b0
[  371.971321]  ? __ia32_sys_read+0xa0/0xa0
[  371.975962]  ? syscall_enter_from_user_mode+0x21/0x70
[  371.981733]  do_syscall_64+0x3b/0x90
[  371.986028]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  371.991799] RIP: 0033:0x7f24adda7387
[  371.996085] RSP: 002b:00007ffd9c80f2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  372.004366] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f24adda7387
[  372.012213] RDX: 0000000000000005 RSI: 0000562156826120 RDI: 0000000000000001
[  372.020060] RBP: 0000562156826120 R08: 000000000000000a R09: 00007f24ade3d4e0
[  372.027901] R10: 00007f24ade3d3e0 R11: 0000000000000246 R12: 0000000000000005
[  372.035746] R13: 00007f24ade7a520 R14: 0000000000000005 R15: 00007f24ade7a700
[  372.043629] 
               Showing all locks held in the system:
[  372.051235] 1 lock held by khungtaskd/62:
[  372.055960]  #0: ffffffffa64483c0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x269
[  372.065649] 1 lock held by systemd-journal/584:
[  372.070902] 4 locks held by check/1224:
[  372.075452]  #0: ffff888118fc4460 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0xe9/0x1b0
[  372.084177]  #1: ffff888120717088 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x216/0x460
[  372.093771]  #2: ffff8881480a0a00 (kn->active#131){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x239/0x460
[  372.103707]  #3: ffff888148d66858 (&q->sysfs_lock){+.+.}-{3:3}, at: queue_attr_store+0x5d/0xd0

[  372.115258] =============================================

-- 
Best Regards,
Shin'ichiro Kawasaki


Thread overview: 13+ messages
2021-11-01  8:34 Shinichiro Kawasaki [this message]
2021-11-01 12:41 ` [bug report] block/005 hangs with NVMe device and linux-block/for-next Jens Axboe
2021-11-01 23:01   ` Jens Axboe
2021-11-02  2:22     ` Shinichiro Kawasaki
2021-11-02  3:07       ` Chaitanya Kulkarni
2021-11-02  8:19         ` Shinichiro Kawasaki
2021-11-02  8:28           ` Damien Le Moal
2021-11-02  3:44       ` Ming Lei
2021-11-02  8:28         ` Shinichiro Kawasaki
2021-11-02  9:02           ` Shinichiro Kawasaki
2021-11-02 10:48             ` Ming Lei
2021-11-02 11:24               ` Shinichiro Kawasaki
2021-11-02 12:26                 ` Ming Lei
