From: Keith Busch <kbusch@kernel.org>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Keith Busch <kbusch@meta.com>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [PATCH blktests] nvme: test log page offsets
Date: Mon, 5 Feb 2024 22:41:03 -0800 [thread overview]
Message-ID: <ZcHUf_MF_yDAkrMG@kbusch-mbp.mynextlight.net> (raw)
In-Reply-To: <mabar2czffh5f7ftappnbtir34vzz65ecxatz7p4xbe6u6ts2s@hwe7yb3pcajx>
On Tue, Feb 06, 2024 at 06:02:24AM +0000, Shinichiro Kawasaki wrote:
> Hi Keith, thanks for the patch.
>
> On Feb 05, 2024 / 10:52, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> >
> > I've encountered a device that fails catastrophically if the host
> > requests to an error log with a non-zero LPO. The fallout has been bad
> > enough to warrant a sanity check against this scenario.
>
> Question, which part of the kernel code does this test case cover? I'm wondering
> if this test case might be testing NVMe devices rather than the kernel code.
This is definitely a device-side focused test. This isn't really
exercising particularly interesting parts of the kernel/driver that are
not already thoroughly hit with other tests.
> Also, was there any related kernel code change or discussion? If so, I would
> like to leave links to them in the commit message.
Not a kernel change, but a tooling change. smartmontools, particularly
the minor update package from Redhat, version 7.1-3, introduced a change
to split large logs into multiple commands. The device doesn't just
break itself in this scenario: every non-posted PCIe transaction times
out after it sees an error log command with a non-zero offset that AER
handing fails to recover, taking servers offline with it. A truly
spectacular cascade of failure from a seemingly benign test case.
FWIW, while the device handling is completely wrong, this log has read
side effects, so an application reading non-zero offsets is pretty much
nonsense (and unfortunately nvme-cli does similiar things too). I've
filed a bug report on smartmontools' github. I believe someone else will
be taking this up with Redhat because the backports from 7.2 to the
minor 7.1-3 update are wrong anyway.
> I ran this test case on my test system using QEMU NVME device, and saw it failed
> with the message below.
>
> nvme/051 => nvme0n1 (Tests device support for log page offsets) [failed]
> runtime 0.104s ... 0.126s
> --- tests/nvme/051.out 2024-02-06 09:46:03.522522896 +0900
> +++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/051.out.bad 2024-02-06 14:50:57.394105192 +0900
> @@ -1,2 +1,3 @@
> Running nvme/051
> +NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x4002)
> Test complete
>
> I took a look in the latest QEMU code, and found it returns "Invalid Field in
> Command" when the specified offset is larger than error log size in QEMU. Do I
> miss anything to make this test case pass?
Ah, good catch. I don't want to fail the test if the device correctly
reports an error, so I'd need to redirect 2>&1. Success for this test is
just if the command completes at all with any status.
next prev parent reply other threads:[~2024-02-06 6:41 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-05 18:52 [PATCH blktests] nvme: test log page offsets Keith Busch
2024-02-05 22:30 ` Chaitanya Kulkarni
2024-02-06 6:02 ` Shinichiro Kawasaki
2024-02-06 6:41 ` Keith Busch [this message]
2024-02-06 16:50 ` Keith Busch
2024-02-07 4:32 ` Shinichiro Kawasaki
2024-02-07 5:19 ` Chaitanya Kulkarni
2024-02-07 5:52 ` Keith Busch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZcHUf_MF_yDAkrMG@kbusch-mbp.mynextlight.net \
--to=kbusch@kernel.org \
--cc=kbusch@meta.com \
--cc=linux-nvme@lists.infradead.org \
--cc=shinichiro.kawasaki@wdc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox