public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
From: Keith Busch <kbusch@kernel.org>
To: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: Keith Busch <kbusch@meta.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [PATCH blktests] nvme: test log page offsets
Date: Mon, 5 Feb 2024 22:41:03 -0800	[thread overview]
Message-ID: <ZcHUf_MF_yDAkrMG@kbusch-mbp.mynextlight.net> (raw)
In-Reply-To: <mabar2czffh5f7ftappnbtir34vzz65ecxatz7p4xbe6u6ts2s@hwe7yb3pcajx>

On Tue, Feb 06, 2024 at 06:02:24AM +0000, Shinichiro Kawasaki wrote:
> Hi Keith, thanks for the patch.
> 
> On Feb 05, 2024 / 10:52, Keith Busch wrote:
> > From: Keith Busch <kbusch@kernel.org>
> > 
> > I've encountered a device that fails catastrophically if the host
> > requests to an error log with a non-zero LPO. The fallout has been bad
> > enough to warrant a sanity check against this scenario.
> 
> Question, which part of the kernel code does this test case cover? I'm wondering
> if this test case might be testing NVMe devices rather than the kernel code.

This is definitely a device-side focused test. This isn't really
exercising particularly interesting parts of the kernel/driver that are
not already thoroughly hit with other tests.
 
> Also, was there any related kernel code change or discussion? If so, I would
> like to leave links to them in the commit message.

Not a kernel change, but a tooling change. smartmontools, particularly
the minor update package from Redhat, version 7.1-3, introduced a change
to split large logs into multiple commands. The device doesn't just
break itself in this scenario: every non-posted PCIe transaction times
out after it sees an error log command with a non-zero offset that AER
handing fails to recover, taking servers offline with it. A truly
spectacular cascade of failure from a seemingly benign test case.

FWIW, while the device handling is completely wrong, this log has read
side effects, so an application reading non-zero offsets is pretty much
nonsense (and unfortunately nvme-cli does similiar things too). I've
filed a bug report on smartmontools' github. I believe someone else will
be taking this up with Redhat because the backports from 7.2 to the
minor 7.1-3 update are wrong anyway.

> I ran this test case on my test system using QEMU NVME device, and saw it failed
> with the message below.
> 
> nvme/051 => nvme0n1 (Tests device support for log page offsets) [failed]
>     runtime  0.104s  ...  0.126s
>     --- tests/nvme/051.out      2024-02-06 09:46:03.522522896 +0900
>     +++ /home/shin/Blktests/blktests/results/nvme0n1/nvme/051.out.bad   2024-02-06 14:50:57.394105192 +0900
>     @@ -1,2 +1,3 @@
>      Running nvme/051
>     +NVMe status: Invalid Field in Command: A reserved coded value or an unsupported value in a defined field(0x4002)
>      Test complete
>
> I took a look in the latest QEMU code, and found it returns "Invalid Field in
> Command" when the specified offset is larger than error log size in QEMU. Do I
> miss anything to make this test case pass?

Ah, good catch. I don't want to fail the test if the device correctly
reports an error, so I'd need to redirect 2>&1. Success for this test is
just if the command completes at all with any status.


  reply	other threads:[~2024-02-06  6:41 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-05 18:52 [PATCH blktests] nvme: test log page offsets Keith Busch
2024-02-05 22:30 ` Chaitanya Kulkarni
2024-02-06  6:02 ` Shinichiro Kawasaki
2024-02-06  6:41   ` Keith Busch [this message]
2024-02-06 16:50     ` Keith Busch
2024-02-07  4:32     ` Shinichiro Kawasaki
2024-02-07  5:19       ` Chaitanya Kulkarni
2024-02-07  5:52       ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZcHUf_MF_yDAkrMG@kbusch-mbp.mynextlight.net \
    --to=kbusch@kernel.org \
    --cc=kbusch@meta.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=shinichiro.kawasaki@wdc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox