From: Ming Lei <ming.lei@redhat.com>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Jens Axboe <axboe@kernel.dk>,
    "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
    Damien Le Moal <Damien.LeMoal@wdc.com>,
    ming.lei@redhat.com
Subject: Re: [bug report] block/005 hangs with NVMe device and linux-block/for-next
Date: Tue, 2 Nov 2021 11:44:17 +0800
Message-ID: <YYC0ESdW1+B/dDTs@T590>
In-Reply-To: <20211102022214.7hetxsg4z2yqafyd@shindev>

On Tue, Nov 02, 2021 at 02:22:15AM +0000, Shinichiro Kawasaki wrote:
> On Nov 01, 2021 / 17:01, Jens Axboe wrote:
> > On 11/1/21 6:41 AM, Jens Axboe wrote:
> > > On 11/1/21 2:34 AM, Shinichiro Kawasaki wrote:
> > >> I tried the latest linux-block/for-next branch tip (git hash b43fadb6631f) and
> > >> observed a process hang during blktests block/005 run on an NVMe device.
> > >> Kernel message reported "INFO: task check:1224 blocked for more than 122
> > >> seconds." with call trace [1]. So far, the hang is 100% reproducible with my
> > >> system. This hang is not observed with HDDs or null_blk devices.
> > >>
> > >> I bisected and found the commit 4f5022453acd ("nvme: wire up completion batching
> > >> for the IRQ path") triggers the hang. When I revert this commit from the
> > >> for-next branch tip, the hang disappears. The block/005 test case switches the
> > >> IO scheduler during IO, and the completion path change in the commit appears
> > >> to affect the scheduler switch. Comments on a solution would be appreciated.
> > >
> > > I'll take a look at this.
> >
> > I've tried running various things most of the day, and I cannot
> > reproduce this issue nor do I see what it could be. Even if requests are
> > split between batched completion and one-by-one completion, it works
> > just fine for me. No special care needs to be taken for put_many() on
> > the queue reference, as the wake_up() happens for the ref going to zero.
> >
> > Tell me more about your setup. What do the runtimes of the test look
> > like? Do you have all schedulers enabled? What kind of NVMe device is
> > this?
>
> Thank you for spending your precious time. With the kernel without the hang,
> the test case completes in around 20 seconds. When the hang happens, the check
> script process stops at blk_mq_freeze_queue_wait() at scheduler change, and fio
> workload processes stop at __blkdev_direct_IO_simple(). The test case does not
> end, so I need to reboot the system for the next trial. While waiting for the
> test case to complete, the kernel repeats the same INFO message every 2 minutes.
>
> Regarding the scheduler, I compiled the kernel with mq-deadline and kyber.
>
> The NVMe device I use is a U.2 NVMe ZNS SSD. It has a zoned namespace and
> a regular namespace, and the hang is observed with both namespaces. I have
> not yet tried other NVMe devices, so I will try them.
>
> >
> > FWIW, this is upstream now, so testing with Linus -git would be
> > preferable.
>
> I see. I have switched from linux-block for-next branch to the upstream branch
> of Linus. At git hash 879dbe9ffebc, the hang is still observed.

Can you post the blk-mq debugfs log after the hang is triggered?
(cd /sys/kernel/debug/block/nvme0n1 && find . -type f -exec grep -aH . {} \;)
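
If it is easier to attach the log than to paste it, the same dump can be wrapped in a small helper and redirected to a file. This is only a sketch: the function name dump_blk_debugfs and the output file name are made up for illustration, and nvme0n1 is assumed to be the device under test.

```shell
# Hypothetical helper (not part of blktests or the kernel tree): dump every
# non-empty line of every file under a blk-mq debugfs directory, each line
# prefixed with the file it came from, exactly like the one-liner above.
dump_blk_debugfs() {
    # Default device directory is an assumption; pass another one as $1.
    dir="${1:-/sys/kernel/debug/block/nvme0n1}"
    (cd "$dir" && find . -type f -exec grep -aH . {} \;)
}
```

For example, as root and with debugfs mounted: dump_blk_debugfs /sys/kernel/debug/block/nvme0n1 > blk-mq-debugfs.log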

Thanks,
Ming