Linux-NVME Archive on lore.kernel.org
From: Keith Busch <kbusch@kernel.org>
To: Laurence Oberman <loberman@redhat.com>
Cc: "busch, keith" <keith.busch@intel.com>, linux-nvme@lists.infradead.org
Subject: Re: nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
Date: Thu, 3 Oct 2024 15:04:50 -0600
Message-ID: <Zv8G8l5cyzDRwLJA@kbusch-mbp>
In-Reply-To: <b73005ac327784e740bb6b362870c15d0c7788fa.camel@redhat.com>

On Thu, Sep 26, 2024 at 05:11:05PM -0400, Laurence Oberman wrote:
> It was reported to Red Hat, seeing issues with using a
> "nvme subsystem-reset /dev/nvme0" command to test resets.

I really dislike that command. The side effects are overkill for the pci
transport...
 
> On multiple servers I tested on two types of nvme attached devices
> These are not the rootfs devices
>
> 1. The front slot (hotplug) devices in a 2.5in format 
> reset and after some time recover (what is expected)
> 
> Example of one working
> 
> Does not trap and land up as a machine-check

<snip>

> 2. Any kernel upstream latest 6.11, RHEL8 or RHEL9 causes 
> a machine check and panics the box when its against a nvme in a 
> PCIE slot
> 
> [  263.862919] mce: [Hardware Error]: CPU 12: Machine Check Exception: 5 Bank 6: ba00000000000e0b
> [  263.862924] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8571dce4> {intel_idle+0x54/0x90}

So this wasn't failing before 6.11? As Nilay mentioned, there are some
changes to how nvme subsystem reset is handled. The main one is that
this ioctl doesn't automatically trigger an nvme reset. I expected that
delayed recovery might happen, but machine checks are not expected. If
this was working before, I can only guess right now that the previous
behavior accessed MMIO and config space sooner and triggered a different
error path. If you're successful with the PPC patch reverted, I would be
interested to hear about it.
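
For reference, the status value in the quoted log (ba00000000000e0b) can
be unpacked by hand. The following is only a rough sketch, assuming the
architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63:57,
MCA error code in bits 15:0) and the compound "bus and interconnect"
code pattern 000F 1PPT RRRR IILL. The function and table names are made
up for illustration and are not part of any kernel or tool API, and the
bank-specific meaning still depends on the platform.

```python
# Sketch: decode architectural fields of an IA32_MCi_STATUS value.
# Bit positions are assumed from the Intel SDM machine-check chapter;
# all names below are hypothetical helpers, not an existing API.

FLAG_BITS = {
    "VAL": 63,    # register holds valid information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MCi_MISC holds valid data
    "ADDRV": 58,  # MCi_ADDR holds valid data
    "PCC": 57,    # processor context corrupt
}

PARTICIPATION = {0: "originated", 1: "responded", 2: "observed", 3: "generic"}
TRANSACTION = {0: "memory", 1: "reserved", 2: "I/O", 3: "other"}


def decode_mci_status(status):
    """Split a 64-bit status value into flags and error codes."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    fields["mca_error_code"] = status & 0xFFFF
    return fields


def decode_bus_error(mca_code):
    """Decode a bus/interconnect compound code, or return None if the
    value does not match the 000F 1PPT RRRR IILL pattern."""
    # Bits 15:13 must be clear and bit 11 set; bit 12 (F) is ignored.
    if (mca_code & 0xE800) != 0x0800:
        return None
    return {
        "participation": PARTICIPATION[(mca_code >> 9) & 0x3],
        "timeout": bool((mca_code >> 8) & 0x1),
        "request": (mca_code >> 4) & 0xF,  # 0 means generic error
        "transaction": TRANSACTION[(mca_code >> 2) & 0x3],
        "level": mca_code & 0x3,           # 3 means generic
    }


if __name__ == "__main__":
    status = 0xBA00000000000E0B  # value from the quoted mce log line
    fields = decode_mci_status(status)
    print(fields)
    print(decode_bus_error(fields["mca_error_code"]))
```

Under those assumptions the value decodes as valid, uncorrected, and
processor context corrupt, with MCA error code 0x0e0b matching the
bus/interconnect pattern with an I/O transaction type and no timeout.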

