From: Bjorn Helgaas <helgaas@kernel.org>
To: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>,
"wangkefeng.wang@huawei.com" <wangkefeng.wang@huawei.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
gregkh@linuxfoundation.org, Linux PCI <linux-pci@vger.kernel.org>,
mahesh@linux.ibm.com,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"tanxiaofei@huawei.com" <tanxiaofei@huawei.com>,
"bp@alien8.de" <bp@alien8.de>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
bhelgaas@google.com, "james.morse@arm.com" <james.morse@arm.com>,
"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"lenb@kernel.org" <lenb@kernel.org>
Subject: Re: Questions: Should kernel panic when PCIe fatal error occurs?
Date: Tue, 26 Sep 2023 18:02:47 -0500
Message-ID: <20230926230247.GA429368@bhelgaas>
In-Reply-To: <fdc7a4ee-250f-7ec8-ca15-32cbd480bd3e@linux.alibaba.com>
On Fri, Sep 22, 2023 at 10:46:36AM +0800, Shuai Xue wrote:
> ...
> Actually, this is a question from my colleague from firmware team.
> The original question is that:
>
> "Should I set CPER_SEV_FATAL for Generic Error Status Block when a
> PCIe fatal error is detected? If set, kernel will always panic.
> Otherwise, kernel will always not panic."
>
> So I pull a question about desired behavior of Linux kernel first :)
> From the perspective of the kernel, CPER_SEV_FATAL for Generic Error
> Status Block is not reasonable. The kernel will attempt to recover
> Fatal errors, although recovery may fail.

I don't know the semantics of CPER_SEV_FATAL or why it's there.

With CPER, we have *two* error severities: a "native" one defined by
the PCIe spec and another defined by the platform via CPER.

I speculate that the reason for the CPER severity could be to provide
a severity for error sources that don't have a "native" severity like
AER does, or for the vendor to force the OS to restart (for
CPER_SEV_FATAL, anyway) in cases where it might not otherwise.

In the native case, we only have the PCIe severity and don't have the
CPER severity at all, and I suspect that unless there's uncontained
data corruption, we would rather handle even the most severe PCIe
fatal error by disabling the specific device(s) instead of panicking
and restarting the whole machine.

So for PCIe errors, I'm not sure setting CPER_SEV_FATAL is beneficial
unless the platform wants to force the OS to panic, e.g., maybe the
platform knows about data corruption and/or the vendor wants the OS to
panic as part of a reliability story.

Presumably the platform has already logged the error, and I assume the
platform *could* restart without even returning to the OS, but maybe
it wants the OS to do a crashdump or shutdown in a more orderly way.

Bjorn
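To make the two severities concrete, here is a minimal userspace sketch
(a model, not kernel code). It assumes a firmware-first report carries
both a block-level CPER severity and the PCIe severity of the error
itself, and that the OS panics whenever the block-level severity is
CPER_SEV_FATAL, as the quoted question describes. The CPER numeric
values follow the UEFI encoding; enum pcie_sev, enum os_action, struct
fw_report, and model_firmware_first() are hypothetical names invented
for illustration.

```c
/* CPER severities; numeric values follow the UEFI CPER encoding. */
enum cper_sev {
	CPER_SEV_RECOVERABLE = 0,
	CPER_SEV_FATAL = 1,
	CPER_SEV_CORRECTED = 2,
	CPER_SEV_INFORMATIONAL = 3,
};

/* PCIe "native" severity classes (hypothetical model names). */
enum pcie_sev {
	PCIE_SEV_CORRECTABLE,
	PCIE_SEV_NONFATAL,
	PCIE_SEV_FATAL,
};

/* Hypothetical OS outcomes, for illustration only. */
enum os_action {
	OS_LOG_ONLY,
	OS_RECOVER_DEVICE,
	OS_PANIC,
};

/* Hypothetical model of one firmware-first error report. */
struct fw_report {
	enum cper_sev block_sev;	/* Generic Error Status Block severity */
	enum pcie_sev pcie_sev;		/* severity of the PCIe error itself */
};

/* Assumed behavior: the platform's severity is checked first. */
static enum os_action model_firmware_first(const struct fw_report *r)
{
	/* A fatal block severity forces a panic regardless of the PCIe class. */
	if (r->block_sev == CPER_SEV_FATAL)
		return OS_PANIC;

	/* Otherwise the PCIe severity selects the handling. */
	if (r->pcie_sev == PCIE_SEV_CORRECTABLE)
		return OS_LOG_ONLY;

	return OS_RECOVER_DEVICE;
}
```

Under these assumptions, the same PCIe fatal error leads to a panic
when the platform reports CPER_SEV_FATAL and to a recovery attempt when
it reports CPER_SEV_RECOVERABLE.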
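The preference described for the native case can be written down as a
small decision function. This is only a sketch of the stated
preference, not of what any kernel code does today: the
uncontained_corruption input is a hypothetical flag (nothing above says
the OS can actually determine it), and enum native_action and
model_native_fatal() are made-up names.

```c
#include <stdbool.h>

/* Hypothetical outcomes of native handling, for illustration only. */
enum native_action {
	NATIVE_RECOVERED,	/* device reset and back in service */
	NATIVE_DEVICE_DISABLED,	/* recovery failed; only this device is lost */
	NATIVE_PANIC,		/* whole machine restarts */
};

/*
 * Model of the preferred policy for a PCIe fatal error when only the
 * PCIe severity is available (no CPER severity exists in this case).
 */
static enum native_action model_native_fatal(bool uncontained_corruption,
					     bool recovery_succeeded)
{
	/* Only uncontained data corruption justifies taking the machine down. */
	if (uncontained_corruption)
		return NATIVE_PANIC;

	/* Otherwise try to recover, and fall back to disabling the device. */
	if (recovery_succeeded)
		return NATIVE_RECOVERED;

	return NATIVE_DEVICE_DISABLED;
}
```

The point of the model is that a failed recovery costs one device, not
the whole system.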
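From the firmware side, the reasoning above amounts to a three-way
choice, sketched below as a userspace model. Everything here is an
assumption made for illustration: struct platform_policy, its fields,
enum platform_response, and model_platform_choice() are hypothetical
and do not correspond to any real firmware interface.

```c
#include <stdbool.h>

/* Hypothetical platform responses to a PCIe fatal error. */
enum platform_response {
	PLATFORM_REPORT_NONFATAL,	/* let the OS attempt recovery */
	PLATFORM_REPORT_CPER_SEV_FATAL,	/* force an OS panic (crashdump possible) */
	PLATFORM_RESTART_DIRECTLY,	/* restart without returning to the OS */
};

/* Hypothetical inputs the platform might weigh. */
struct platform_policy {
	bool knows_data_corruption;	/* platform knows data was corrupted */
	bool vendor_requires_restart;	/* vendor reliability policy */
	bool wants_os_crashdump;	/* prefer an orderly OS-side panic */
};

static enum platform_response
model_platform_choice(const struct platform_policy *p)
{
	bool must_restart = p->knows_data_corruption ||
			    p->vendor_requires_restart;

	/* No reason to force a restart: leave recovery to the OS. */
	if (!must_restart)
		return PLATFORM_REPORT_NONFATAL;

	/* A restart is required; choose how orderly it should be. */
	if (p->wants_os_crashdump)
		return PLATFORM_REPORT_CPER_SEV_FATAL;

	return PLATFORM_RESTART_DIRECTLY;
}
```

In this model CPER_SEV_FATAL is reserved for the case where the
platform both needs a restart and wants the OS to perform it.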