From: Anton Blanchard <anton@samba.org>
To: benh@kernel.crashing.org, paulus@samba.org
Cc: linuxppc-dev@ozlabs.org
Subject: [PATCH 6/9] powerpc: Rework pseries machine check handler
Date: Wed, 12 Jan 2011 16:49:19 +1100
Message-ID: <20110112164919.77ac3b71@kryten>
In-Reply-To: <20110112164318.753a435b@kryten>

Rework the pseries machine check handler:

- If MSR_RI isn't set, we cannot recover, even if firmware reports the
  machine check as fully recovered.
- Rename nonfatal to recovered.
- Handle RTAS_DISP_LIMITED_RECOVERY.
- Use BUS_MCEERR_AR instead of BUS_ADRERR.
- Don't check all the RTAS error log fields when receiving a synchronous
  machine check. Recent versions of the pseries firmware do not fill them
  in during a machine check and instead send a follow-up error log with
  the detailed information. If we see a synchronous machine check and we
  came from userspace, kill the task (unless it is init).

Signed-off-by: Anton Blanchard <anton@samba.org>
---
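For reviewers, here is a minimal userspace model of the check order in the reworked recover_mce(). It is a sketch, not kernel code. The enums, struct mce_ctx and classify_mce() are hypothetical names that stand in for the kernel's RTAS_DISP_* and RTAS_SEVERITY_* constants and the regs/current state. Only the ordering of the checks is modeled; logging and signal delivery are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's RTAS_DISP_* and
 * RTAS_SEVERITY_* constants; the values themselves are not modeled.
 */
enum mce_disposition {
	DISP_FULLY_RECOVERED,
	DISP_LIMITED_RECOVERY,
	DISP_NOT_RECOVERED,
};

enum mce_severity {
	SEV_ERROR_SYNC,
	SEV_OTHER,
};

enum mce_action {
	MCE_FATAL,	/* recover_mce() returns 0 */
	MCE_RECOVERED,	/* returns 1: platform corrected the error */
	MCE_KILL_TASK,	/* returns 1: task is sent SIGBUS/BUS_MCEERR_AR */
};

/* Hypothetical snapshot of the state recover_mce() looks at. */
struct mce_ctx {
	bool msr_ri;		/* MSR_RI set in the interrupted context */
	bool user_mode;		/* machine check taken in userspace */
	bool is_init;		/* current task is the global init */
	enum mce_disposition disposition;
	enum mce_severity severity;
};

static enum mce_action classify_mce(const struct mce_ctx *ctx)
{
	assert(ctx != NULL);

	/* MSR_RI clear: not recoverable, whatever firmware reports. */
	if (!ctx->msr_ri)
		return MCE_FATAL;

	if (ctx->disposition == DISP_FULLY_RECOVERED)
		return MCE_RECOVERED;

	/* Corrected, but the system may now be running degraded. */
	if (ctx->disposition == DISP_LIMITED_RECOVERY)
		return MCE_RECOVERED;

	/*
	 * Synchronous error in a user task other than init: kill it.
	 * Target and type are not consulted, since firmware may only
	 * fill them in via a later, asynchronous error log.
	 */
	if (ctx->user_mode && !ctx->is_init &&
	    ctx->severity == SEV_ERROR_SYNC)
		return MCE_KILL_TASK;

	return MCE_FATAL;
}
```

The MSR_RI test comes first, so a cleared MSR_RI makes the error fatal even when the disposition says fully recovered. The old ECC-specific target/type checks have no counterpart here.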
Index: powerpc.git/arch/powerpc/platforms/pseries/ras.c
===================================================================
--- powerpc.git.orig/arch/powerpc/platforms/pseries/ras.c 2010-10-15 13:23:34.161268941 +1100
+++ powerpc.git/arch/powerpc/platforms/pseries/ras.c 2010-10-15 13:23:38.701320228 +1100
@@ -259,31 +259,43 @@ int pSeries_system_reset_exception(struc
  * Return 1 if corrected (or delivered a signal).
  * Return 0 if there is nothing we can do.
  */
-static int recover_mce(struct pt_regs *regs, struct rtas_error_log * err)
+static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
 {
-	int nonfatal = 0;
+	int recovered = 0;
 
-	if (err->disposition == RTAS_DISP_FULLY_RECOVERED) {
+	if (!(regs->msr & MSR_RI)) {
+		/* If MSR_RI isn't set, we cannot recover */
+		recovered = 0;
+
+	} else if (err->disposition == RTAS_DISP_FULLY_RECOVERED) {
 		/* Platform corrected itself */
-		nonfatal = 1;
-	} else if ((regs->msr & MSR_RI) &&
-		   user_mode(regs) &&
-		   err->severity == RTAS_SEVERITY_ERROR_SYNC &&
-		   err->disposition == RTAS_DISP_NOT_RECOVERED &&
-		   err->target == RTAS_TARGET_MEMORY &&
-		   err->type == RTAS_TYPE_ECC_UNCORR &&
-		   !(current->pid == 0 || is_global_init(current))) {
-		/* Kill off a user process with an ECC error */
-		printk(KERN_ERR "MCE: uncorrectable ecc error for pid %d\n",
-		       current->pid);
-		/* XXX something better for ECC error? */
-		_exception(SIGBUS, regs, BUS_ADRERR, regs->nip);
-		nonfatal = 1;
+		recovered = 1;
+
+	} else if (err->disposition == RTAS_DISP_LIMITED_RECOVERY) {
+		/* Platform corrected itself but could be degraded */
+		printk(KERN_ERR "MCE: limited recovery, system may "
+		       "be degraded\n");
+		recovered = 1;
+
+	} else if (user_mode(regs) && !is_global_init(current) &&
+		   err->severity == RTAS_SEVERITY_ERROR_SYNC) {
+
+		/*
+		 * If we received a synchronous error when in userspace
+		 * kill the task. Firmware may report details of the fail
+		 * asynchronously, so we can't rely on the target and type
+		 * fields being valid here.
+		 */
+		printk(KERN_ERR "MCE: uncorrectable error, killing task "
+		       "%s:%d\n", current->comm, current->pid);
+
+		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+		recovered = 1;
 	}
 
 	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
 
-	return nonfatal;
+	return recovered;
 }
 
 /*