public inbox for linux-block@vger.kernel.org
From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Damien Le Moal <Damien.LeMoal@wdc.com>
Subject: Re: [bug report] block/005 hangs with NVMe device and linux-block/for-next
Date: Tue, 2 Nov 2021 08:28:47 +0000	[thread overview]
Message-ID: <20211102082846.m632phnsaqnwtaec@shindev> (raw)
In-Reply-To: <YYC0ESdW1+B/dDTs@T590>

On Nov 02, 2021 / 11:44, Ming Lei wrote:
> On Tue, Nov 02, 2021 at 02:22:15AM +0000, Shinichiro Kawasaki wrote:
> > On Nov 01, 2021 / 17:01, Jens Axboe wrote:
> > > On 11/1/21 6:41 AM, Jens Axboe wrote:
> > > > On 11/1/21 2:34 AM, Shinichiro Kawasaki wrote:
> > > >> I tried the latest linux-block/for-next branch tip (git hash b43fadb6631f) and
> > > >> observed a process hang during a blktests block/005 run on an NVMe device.
> > > >> Kernel message reported "INFO: task check:1224 blocked for more than 122
> > > >> seconds." with call trace [1]. So far, the hang is 100% reproducible with my
> > > >> system. This hang is not observed with HDDs or null_blk devices.
> > > >>
> > > >> I bisected and found the commit 4f5022453acd ("nvme: wire up completion batching
> > > >> for the IRQ path") triggers the hang. When I revert this commit from the
> > > >> for-next branch tip, the hang disappears. The block/005 test case switches the IO
> > > >> scheduler during IO, and the completion path change in that commit looks to be
> > > >> affecting the scheduler switch. Comments toward a solution would be appreciated.
> > > > 
> > > > I'll take a look at this.
> > > 
> > > I've tried running various things most of the day, and I cannot
> > > reproduce this issue nor do I see what it could be. Even if requests are
> > > split between batched completion and one-by-one completion, it works
> > > just fine for me. No special care needs to be taken for put_many() on
> > > the queue reference, as the wake_up() happens for the ref going to zero.
> > > 
> > > Tell me more about your setup. What does the runtimes of the test look
> > > like? Do you have all schedulers enabled? What kind of NVMe device is
> > > this?
> > 
> > Thank you for spending your precious time. With a kernel without the hang, the
> > test case completes in around 20 seconds. When the hang happens, the check
> > script process stops at blk_mq_freeze_queue_wait() during the scheduler change,
> > and the fio workload processes stop at __blkdev_direct_IO_simple(). The test
> > case does not end, so I need to reboot the system for the next trial. While
> > waiting for the test case to complete, the kernel repeats the same INFO message
> > every 2 minutes.
> > 
> > Regarding the scheduler, I compiled the kernel with mq-deadline and kyber.
> > 
> > The NVMe device I use is a U.2 NVMe ZNS SSD. It has a zoned namespace and
> > a regular namespace, and the hang is observed with both namespaces. I have
> > not yet tried other NVMe devices, so I will try them.
> > 
> > > 
> > > FWIW, this is upstream now, so testing with Linus -git would be
> > > preferable.
> > 
> > I see. I have switched from the linux-block for-next branch to Linus' upstream
> > branch. At git hash 879dbe9ffebc, the hang is still observed.
> 
> Can you post the blk-mq debugfs log after the hang is triggered?
> 
> (cd /sys/kernel/debug/block/nvme0n1 && find . -type f -exec grep -aH . {} \;)

Thanks Ming. When I ran the command above, the grep command stopped when it
opened tag-related files in the debugfs tree; that grep looked to be hanging
as well. So I used the find command below instead, excluding the tag-related files.

# find . -type f -not -name '*tag*' -exec grep -aH . {} \;
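For reference, here is a sketch of a capture loop that bounds each read with timeout(1), so a single blocking debugfs file cannot stall the whole capture. The helper name and the 2-second limit are just illustrative assumptions, not something I actually ran:

```shell
# Sketch only: wrap each read in timeout(1) so a blocking debugfs file
# (e.g. the tag-related ones) cannot hang the whole capture.
capture_debugfs() {
    dir="$1"
    (cd "$dir" && find . -type f -not -name '*tag*' -print0 |
        while IFS= read -r -d '' f; do
            # kill the read if it blocks for more than 2 seconds
            timeout 2 grep -aH . "$f"
        done)
}
# usage: capture_debugfs /sys/kernel/debug/block/nvme0n1 > blk-mq.log
```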

Here I share the captured log.

./sched/queued:0 8 0
./sched/owned_by_driver:0 8 0
./sched/async_depth:192
./sched/starved:0
./sched/batching:1
./rqos/wbt/wb_background:4
./rqos/wbt/wb_normal:8
./rqos/wbt/unknown_cnt:0
./rqos/wbt/min_lat_nsec:2000000
./rqos/wbt/inflight:0: inflight 0
./rqos/wbt/inflight:1: inflight 0
./rqos/wbt/inflight:2: inflight 0
./rqos/wbt/id:0
./rqos/wbt/enabled:1
./rqos/wbt/curr_win_nsec:0
./hctx7/type:default
./hctx7/dispatch_busy:7
./hctx7/active:1024
./hctx7/run:5
./hctx7/ctx_map:00000000: 00
./hctx7/dispatch:000000003dfed3fd {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|22, .state=idle, .tag=-1, .internal_tag=192}
./hctx7/dispatch:0000000077876d9e {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|22, .state=idle, .tag=-1, .internal_tag=193}
./hctx7/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED
./hctx7/state:TAG_ACTIVE|SCHED_RESTART
./hctx6/type:default
./hctx6/dispatch_busy:5
./hctx6/active:1025
./hctx6/run:4
./hctx6/ctx_map:00000000: 00
./hctx6/dispatch:00000000c0b8e1c9 {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|22, .state=idle, .tag=-1, .internal_tag=192}
./hctx6/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED
./hctx6/state:TAG_ACTIVE|SCHED_RESTART
./hctx5/type:default
./hctx5/dispatch_busy:5
./hctx5/active:1024
./hctx5/run:4
./hctx5/ctx_map:00000000: 00
./hctx5/dispatch:00000000aaf1e364 {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|22, .state=idle, .tag=-1, .internal_tag=128}
./hctx5/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED
./hctx5/state:TAG_ACTIVE|SCHED_RESTART
./hctx4/type:default
./hctx4/dispatch_busy:0
./hctx4/active:1023
./hctx4/run:1
./hctx4/ctx_map:00000000: 00
./hctx4/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED
./hctx3/type:default
./hctx3/dispatch_busy:5
./hctx3/active:1024
./hctx3/run:4
./hctx3/ctx_map:00000000: 00
./hctx3/dispatch:000000008b07d5e1 {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|22, .state=idle, .tag=-1, .internal_tag=128}
./hctx3/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED
./hctx3/state:TAG_ACTIVE|SCHED_RESTART
./hctx2/type:default
./hctx2/dispatch_busy:5
./hctx2/active:1024
./hctx2/run:4
./hctx2/ctx_map:00000000: 00
./hctx2/dispatch:00000000c4887013 {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|22, .state=idle, .tag=-1, .internal_tag=128}
./hctx2/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED
./hctx2/state:TAG_ACTIVE|SCHED_RESTART
./hctx1/type:default
./hctx1/dispatch_busy:6
./hctx1/active:1024
./hctx1/run:5
./hctx1/ctx_map:00000000: 00
./hctx1/dispatch:00000000efe38e4e {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|22, .state=idle, .tag=-1, .internal_tag=0}
./hctx1/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED
./hctx1/state:TAG_ACTIVE|SCHED_RESTART
./hctx0/type:default
./hctx0/dispatch_busy:5
./hctx0/active:1024
./hctx0/run:4
./hctx0/ctx_map:00000000: 00
./hctx0/dispatch:0000000015147095 {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|22, .state=idle, .tag=-1, .internal_tag=192}
./hctx0/flags:alloc_policy=FIFO SHOULD_MERGE|TAG_QUEUE_SHARED
./hctx0/state:TAG_ACTIVE|SCHED_RESTART
./write_hints:hint0: 0
./write_hints:hint1: 0
./write_hints:hint2: 0
./write_hints:hint3: 0
./write_hints:hint4: 0
./state:SAME_COMP|NONROT|IO_STAT|DISCARD|INIT_DONE|STATS|REGISTERED|PCI_P2PDMA|NOWAIT
./pm_only:0
./poll_stat:read  (512 Bytes): samples=0
./poll_stat:write (512 Bytes): samples=0
./poll_stat:read  (1024 Bytes): samples=0
./poll_stat:write (1024 Bytes): samples=0
./poll_stat:read  (2048 Bytes): samples=0
./poll_stat:write (2048 Bytes): samples=0
./poll_stat:read  (4096 Bytes): samples=0
./poll_stat:write (4096 Bytes): samples=0
./poll_stat:read  (8192 Bytes): samples=0
./poll_stat:write (8192 Bytes): samples=0
./poll_stat:read  (16384 Bytes): samples=0
./poll_stat:write (16384 Bytes): samples=0
./poll_stat:read  (32768 Bytes): samples=0
./poll_stat:write (32768 Bytes): samples=0
./poll_stat:read  (65536 Bytes): samples=0
./poll_stat:write (65536 Bytes): samples=0
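To make a dump like the above easier to scan, a small post-processing sketch could count, per hctx, the requests still sitting on the dispatch list. The helper name and the saved log file name are hypothetical:

```shell
# Hypothetical helper: summarize a saved capture of the debugfs dump.
# Counts, per hctx, the requests still sitting on the dispatch list.
stuck_per_hctx() {
    # capture lines look like: ./hctx7/dispatch:<addr> {.op=READ, ...}
    awk -F'[/:]' '/\/dispatch:/ { print $2 }' "$1" | sort | uniq -c
}
# usage: stuck_per_hctx blk-mq.log
```

Against the log above, this would show hctx4 with no stuck dispatch entries and every other hctx with one or two, matching the SCHED_RESTART state lines.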


-- 
Best Regards,
Shin'ichiro Kawasaki


Thread overview: 13+ messages
2021-11-01  8:34 [bug report] block/005 hangs with NVMe device and linux-block/for-next Shinichiro Kawasaki
2021-11-01 12:41 ` Jens Axboe
2021-11-01 23:01   ` Jens Axboe
2021-11-02  2:22     ` Shinichiro Kawasaki
2021-11-02  3:07       ` Chaitanya Kulkarni
2021-11-02  8:19         ` Shinichiro Kawasaki
2021-11-02  8:28           ` Damien Le Moal
2021-11-02  3:44       ` Ming Lei
2021-11-02  8:28         ` Shinichiro Kawasaki [this message]
2021-11-02  9:02           ` Shinichiro Kawasaki
2021-11-02 10:48             ` Ming Lei
2021-11-02 11:24               ` Shinichiro Kawasaki
2021-11-02 12:26                 ` Ming Lei
