From: Alexandru Gagniuc <mr.nuke.me@gmail.com>
To: linux-acpi@vger.kernel.org
Cc: rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com,
bp@alien8.de, tbaicar@codeaurora.org, will.deacon@arm.com,
james.morse@arm.com, shiju.jose@huawei.com,
zjzhang@codeaurora.org, gengdongjiu@huawei.com,
linux-kernel@vger.kernel.org, alex_gagniuc@dellteam.com,
austin_bolen@dell.com, shyam_iyer@dell.com,
Alexandru Gagniuc <mr.nuke.me@gmail.com>
Subject: [RFC PATCH 3/4] acpi: apei: Do not panic() in NMI because of GHES messages
Date: Tue, 3 Apr 2018 12:08:29 -0500
Message-ID: <20180403170830.29282-4-mr.nuke.me@gmail.com>
In-Reply-To: <20180403170830.29282-1-mr.nuke.me@gmail.com>

BIOSes like to send NMIs for a number of silly reasons, often flagging
the underlying errors as "fatal". For example, pin bounce during a PCIe
hotplug/unplug might cause the link to go down and retrain, with fatal
PCI errors being generated while the link is retraining.

Instead of panic()ing in NMI context, pass fatal errors down to IRQ
context to see if they can be resolved.

With these changes, PCIe errors are handled by AER. Other, far less
common errors, such as machine check exceptions, still cause a panic()
in their respective handlers.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
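For reviewers, a minimal userspace sketch of the control flow this patch
aims for, under stated assumptions: the NMI path only queues the error
record, and the IRQ-context path decides whether to panic based on the
severity returned after handling (patch 1/4 makes ghes_do_proc() return
it). Everything prefixed sk_/SK_ is hypothetical: the fixed array stands
in for ghes_estatus_llist, the "recovered" flag stands in for whatever
AER manages to recover, and the panic counter stands in for
__ghes_panic(). Locking, llist atomics and irq_work scheduling are
deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for GHES_SEV_*; only the ordering matters here. */
enum sk_sev {
	SK_SEV_NO,
	SK_SEV_CORRECTED,
	SK_SEV_RECOVERABLE,
	SK_SEV_PANIC,
};

#define SK_QUEUE_LEN 8

struct sk_record {
	enum sk_sev firmware_sev; /* severity as reported by firmware */
	bool recovered;           /* hypothetical: did handling (e.g. AER) recover it? */
};

/* Stands in for ghes_estatus_llist; no locking in this single-threaded sketch. */
static struct sk_record sk_queue[SK_QUEUE_LEN];
static size_t sk_queued;
static int sk_panics; /* stands in for calls to __ghes_panic() */

/* "NMI" half: never panics, only queues the record for IRQ context. */
static bool sk_notify_nmi(struct sk_record rec)
{
	if (sk_queued == SK_QUEUE_LEN)
		return false; /* queue full: the record is dropped in this sketch */
	sk_queue[sk_queued++] = rec;
	return true;
}

/* Stands in for ghes_do_proc() returning the severity after handling. */
static enum sk_sev sk_do_proc(const struct sk_record *rec)
{
	/* Hypothetical policy: a recovered error is reported back as corrected. */
	if (rec->firmware_sev >= SK_SEV_RECOVERABLE && rec->recovered)
		return SK_SEV_CORRECTED;
	return rec->firmware_sev;
}

/* "IRQ" half: handle each queued record, panic only if it is still fatal. */
static void sk_proc_in_irq(void)
{
	assert(sk_queued <= SK_QUEUE_LEN);

	for (size_t i = 0; i < sk_queued; i++) {
		enum sk_sev corrected_sev = sk_do_proc(&sk_queue[i]);

		if (corrected_sev >= SK_SEV_PANIC)
			sk_panics++; /* the kernel would panic here instead */
	}
	sk_queued = 0;
}
```

Queuing a panic-severity record that handling recovers, one that it
does not, and a non-recovered recoverable one, then calling
sk_proc_in_irq(), counts a single panic. That mirrors the intent above:
only errors that are still fatal after handling bring the system down.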
drivers/acpi/apei/ghes.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 2c998125b1d5..7243a99ea57e 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -428,8 +428,7 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
* GHES_SEV_RECOVERABLE -> AER_NONFATAL
* GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
* These both need to be reported and recovered from by the AER driver.
- * GHES_SEV_PANIC does not make it to this handling since the kernel must
- * panic.
+ * GHES_SEV_PANIC -> AER_FATAL
*/
static bool ghes_handle_aer(struct acpi_hest_generic_data *gdata)
{
@@ -899,6 +898,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
struct ghes_estatus_node *estatus_node;
struct acpi_hest_generic *generic;
struct acpi_hest_generic_status *estatus;
+ int corrected_sev;
u32 len, node_len;
llnode = llist_del_all(&ghes_estatus_llist);
@@ -914,7 +914,14 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
len = cper_estatus_len(estatus);
node_len = GHES_ESTATUS_NODE_LEN(len);
- ghes_do_proc(estatus_node->ghes, estatus);
+ corrected_sev = ghes_do_proc(estatus_node->ghes, estatus);
+
+ if (corrected_sev >= GHES_SEV_PANIC) {
+ oops_begin();
+ ghes_print_queued_estatus();
+ __ghes_panic(estatus_node->ghes);
+ }
+
if (!ghes_estatus_cached(estatus)) {
generic = estatus_node->generic;
if (ghes_print_estatus(NULL, generic, estatus))
@@ -955,7 +962,7 @@ static void __process_error(struct ghes *ghes)
static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
{
struct ghes *ghes;
- int sev, ret = NMI_DONE;
+ int ret = NMI_DONE;
if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
return ret;
@@ -968,13 +975,6 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
ret = NMI_HANDLED;
}
- sev = ghes_severity(ghes->estatus->error_severity);
- if (sev >= GHES_SEV_PANIC) {
- oops_begin();
- ghes_print_queued_estatus();
- __ghes_panic(ghes);
- }
-
if (!(ghes->flags & GHES_TO_CLEAR))
continue;
--
2.14.3
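The updated comment above ghes_handle_aer() describes a severity
mapping that can be sketched as a small pure function. The SKETCH_*
enums are local stand-ins, not the kernel's GHES_SEV_* or AER_*
definitions, and the "reset" argument stands in for the CPER_SEC_RESET
section flag. The corrected-error case lies outside the hunk shown and
is assumed here to map to AER_CORRECTABLE.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for GHES_SEV_*, not the kernel definitions. */
enum sketch_ghes_sev {
	SKETCH_SEV_NO,
	SKETCH_SEV_CORRECTED,
	SKETCH_SEV_RECOVERABLE,
	SKETCH_SEV_PANIC,
};

/* Local stand-ins for the AER_* severities, not the kernel definitions. */
enum sketch_aer_sev {
	SKETCH_AER_NONFATAL,
	SKETCH_AER_FATAL,
	SKETCH_AER_CORRECTABLE,
};

/*
 * Map a GHES severity to an AER severity as described in the comment above
 * ghes_handle_aer() after this patch. "reset" stands in for CPER_SEC_RESET
 * being set in the CPER section flags.
 */
static enum sketch_aer_sev sketch_map_sev(enum sketch_ghes_sev sev, bool reset)
{
	/* Assumption: records with no error severity never reach AER handling. */
	assert(sev != SKETCH_SEV_NO);

	switch (sev) {
	case SKETCH_SEV_RECOVERABLE:
		return reset ? SKETCH_AER_FATAL : SKETCH_AER_NONFATAL;
	case SKETCH_SEV_PANIC:
		/* New with this patch: handed to AER rather than panicking in NMI. */
		return SKETCH_AER_FATAL;
	case SKETCH_SEV_CORRECTED:
	default:
		/* Assumption: corrected errors map to AER_CORRECTABLE. */
		return SKETCH_AER_CORRECTABLE;
	}
}
```

Under this mapping, a GHES_SEV_PANIC record reaches the AER driver as
AER_FATAL, the same way a recoverable error with CPER_SEC_RESET does,
instead of panicking before AER ever sees it.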
Thread overview: 15+ messages
2018-04-03 17:08 [RFC PATCH 0/4] acpi: apei: Improve error handling with firmware-first Alexandru Gagniuc
2018-04-03 17:08 ` [RFC PATCH 1/4] acpi: apei: Return severity of GHES messages after handling Alexandru Gagniuc
2018-04-03 17:08 ` [RFC PATCH 2/4] acpi: apei: Swap ghes_print_queued_estatus and ghes_proc_in_irq Alexandru Gagniuc
2018-04-03 17:08 ` Alexandru Gagniuc [this message]
2018-04-04 7:18 ` [RFC PATCH 3/4] acpi: apei: Do not panic() in NMI because of GHES messages James Morse
2018-04-04 15:33 ` Alex G.
2018-04-04 16:53 ` James Morse
2018-04-04 19:49 ` Alex G.
2018-04-06 18:24 ` James Morse
2018-04-09 18:11 ` Alex G.
2018-04-13 16:38 ` James Morse
2018-04-16 21:59 ` Alex G.
2018-04-20 7:27 ` James Morse
2018-04-20 22:04 ` Alex G.
2018-04-03 17:08 ` [RFC PATCH 4/4] acpi: apei: Warn when GHES marks correctable errors as "fatal" Alexandru Gagniuc