All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anton Blanchard <anton@samba.org>
To: benh@kernel.crashing.org, paulus@samba.org
Cc: linuxppc-dev@ozlabs.org
Subject: [PATCH 6/9] powerpc: Rework pseries machine check handler
Date: Wed, 12 Jan 2011 16:49:19 +1100	[thread overview]
Message-ID: <20110112164919.77ac3b71@kryten> (raw)
In-Reply-To: <20110112164318.753a435b@kryten>


Rework pseries machine check handler:

- If MSR_RI isn't set, we cannot recover even if the machine check was fully
  recovered

- Rename nonfatal to recovered

- Handle RTAS_DISP_LIMITED_RECOVERY

- Use BUS_MCEERR_AR instead of BUS_ADRERR

- Don't check all the RTAS error log fields when receiving a synchronous
  machine check. Recent versions of the pseries firmware do not fill them
  in during a machine check and instead send a follow up error log with
  the detailed information. If we see a synchronous machine check, and we
  came from userspace then kill the task.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: powerpc.git/arch/powerpc/platforms/pseries/ras.c
===================================================================
--- powerpc.git.orig/arch/powerpc/platforms/pseries/ras.c	2010-10-15 13:23:34.161268941 +1100
+++ powerpc.git/arch/powerpc/platforms/pseries/ras.c	2010-10-15 13:23:38.701320228 +1100
@@ -259,31 +259,43 @@ int pSeries_system_reset_exception(struc
  * Return 1 if corrected (or delivered a signal).
  * Return 0 if there is nothing we can do.
  */
-static int recover_mce(struct pt_regs *regs, struct rtas_error_log * err)
+static int recover_mce(struct pt_regs *regs, struct rtas_error_log *err)
 {
-	int nonfatal = 0;
+	int recovered = 0;
 
-	if (err->disposition == RTAS_DISP_FULLY_RECOVERED) {
+	if (!(regs->msr & MSR_RI)) {
+		/* If MSR_RI isn't set, we cannot recover */
+		recovered = 0;
+
+	} else if (err->disposition == RTAS_DISP_FULLY_RECOVERED) {
 		/* Platform corrected itself */
-		nonfatal = 1;
-	} else if ((regs->msr & MSR_RI) &&
-		   user_mode(regs) &&
-		   err->severity == RTAS_SEVERITY_ERROR_SYNC &&
-		   err->disposition == RTAS_DISP_NOT_RECOVERED &&
-		   err->target == RTAS_TARGET_MEMORY &&
-		   err->type == RTAS_TYPE_ECC_UNCORR &&
-		   !(current->pid == 0 || is_global_init(current))) {
-		/* Kill off a user process with an ECC error */
-		printk(KERN_ERR "MCE: uncorrectable ecc error for pid %d\n",
-		       current->pid);
-		/* XXX something better for ECC error? */
-		_exception(SIGBUS, regs, BUS_ADRERR, regs->nip);
-		nonfatal = 1;
+		recovered = 1;
+
+	} else if (err->disposition == RTAS_DISP_LIMITED_RECOVERY) {
+		/* Platform corrected itself but could be degraded */
+		printk(KERN_ERR "MCE: limited recovery, system may "
+		       "be degraded\n");
+		recovered = 1;
+
+	} else if (user_mode(regs) && !is_global_init(current) &&
+		   err->severity == RTAS_SEVERITY_ERROR_SYNC) {
+
+		/*
+		 * If we received a synchronous error when in userspace
+		 * kill the task. Firmware may report details of the fail
+		 * asynchronously, so we can't rely on the target and type
+		 * fields being valid here.
+		 */
+		printk(KERN_ERR "MCE: uncorrectable error, killing task "
+		       "%s:%d\n", current->comm, current->pid);
+
+		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
+		recovered = 1;
 	}
 
 	log_error((char *)err, ERR_TYPE_RTAS_LOG, 0);
 
-	return nonfatal;
+	return recovered;
 }
 
 /*

  parent reply	other threads:[~2011-01-12  5:49 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-12  5:43 [PATCH 0/9] Machine check handling fixes Anton Blanchard
2011-01-12  5:44 ` [PATCH 1/9] powerpc: Print 32 bits of DSISR in show_regs Anton Blanchard
2011-01-12  5:45 ` [PATCH 2/9] powerpc: Don't force MSR_RI in machine_check_exception Anton Blanchard
2011-01-12  5:46 ` [PATCH 3/9] powerpc: Never halt RTAS error logging after receiving an unrecoverable machine check Anton Blanchard
2011-01-12  5:47 ` [PATCH 4/9] powerpc: Remove duplicate debugger hook in machine_check_exception Anton Blanchard
2011-01-12  5:48 ` [PATCH 5/9] powerpc: Don't silently handle machine checks from userspace Anton Blanchard
2011-01-12  5:49 ` Anton Blanchard [this message]
2011-01-12  5:50 ` [PATCH 7/9] powerpc: Fix corruption when grabbing FWNMI data Anton Blanchard
2011-01-12  5:51 ` [PATCH 8/9] powerpc: Check RTAS extended log flag before checking length Anton Blanchard
2011-01-12  5:52 ` [PATCH 9/9] powerpc: machine_check_generic is wrong on 64bit Anton Blanchard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110112164919.77ac3b71@kryten \
    --to=anton@samba.org \
    --cc=benh@kernel.crashing.org \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.