linuxppc-dev.lists.ozlabs.org archive mirror
From: David Laight <David.Laight@ACULAB.COM>
To: 'Shuai Xue' <xueshuai@linux.alibaba.com>,
	Bjorn Helgaas <helgaas@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	"mahesh@linux.ibm.com" <mahesh@linux.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"bp@alien8.de" <bp@alien8.de>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Linux PCI <linux-pci@vger.kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"lenb@kernel.org" <lenb@kernel.org>
Subject: RE: Questions: Should kernel panic when PCIe fatal error occurs?
Date: Thu, 21 Sep 2023 13:20:58 +0000
Message-ID: <2e5870e416f84e8fad8340061ec303e2@AcuMS.aculab.com>
In-Reply-To: <d84b6d17-7fe9-222a-c874-798af4d9faea@linux.alibaba.com>

...
I've got a target that lets me generate AER errors by issuing read
cycles that are inside the address range that the bridge forwards but
outside of any BAR, because there are 2 different-sized BARs.
(Pretty easy to set up.)
On the system I was using they didn't get propagated all the way
to the root bridge - but were visible in the lower bridge.
It would be nice for a driver to be able to detect/clear such
a flag if it gets an unexpected ~0u read value.
(I'm not sure an error callback helps.)
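
A minimal user-space sketch of that detect/clear idea, not kernel code.
It assumes the flag of interest is the Unsupported Request bit (bit 20)
of the bridge's AER Uncorrectable Error Status register, and that the
register is write-1-to-clear. The struct fake_bridge and the three
helper functions are hypothetical stand-ins for real config-space
accesses to the bridge.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsupported Request bit in the AER Uncorrectable Error Status
 * register (bit 20). Assumption: this is the flag the bridge latches. */
#define AER_UNC_UNSUP_REQ (UINT32_C(1) << 20)

/* Hypothetical stand-in for the bridge's AER status register. */
struct fake_bridge {
	uint32_t aer_uncor_status;
};

/* Hypothetical accessor: read the latched status bits. */
static uint32_t bridge_read_status(const struct fake_bridge *b)
{
	return b->aer_uncor_status;
}

/* Hypothetical accessor: write-1-to-clear semantics, so only the
 * bits set in 'val' are cleared. */
static void bridge_write_status(struct fake_bridge *b, uint32_t val)
{
	b->aer_uncor_status &= ~val;
}

/*
 * Returns true if 'value' is all-ones AND the bridge has latched an
 * Unsupported Request error; in that case the flag is cleared so the
 * next failure can be told apart. An all-ones value with no flag set
 * is treated as genuine data.
 */
static bool read_was_aborted(struct fake_bridge *b, uint32_t value)
{
	if (value != ~0u)
		return false;

	if (!(bridge_read_status(b) & AER_UNC_UNSUP_REQ))
		return false;

	bridge_write_status(b, AER_UNC_UNSUP_REQ);
	return true;
}
```

A driver would call something like read_was_aborted() only on the
(rare) path where a register read returns ~0u, so the normal path
costs nothing extra.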

OTOH a 'NEBS compliant' server routed any kind of PCIe link error
through to some 'system management' logic that then raised an NMI.
I'm not sure who thought an NMI was a good idea - they are pretty
impossible to handle in the kernel and too late to be of use to
the code performing the access.

In any case we were getting one after doing 'echo 1 >xxx/remove' and
then taking the PCIe link down by reprogramming the FPGA.
So the link going down was entirely expected, but there seemed
to be nothing we could do to stop the kernel crashing.

I'm sure 'NEBS compliant' ought to contain some requirements for
resilience to hardware failures!

	David

