public inbox for linux-kernel@vger.kernel.org
From: Andi Kleen <andi@firstfloor.org>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, x86@kernel.org,
	Andi Kleen <ak@linux.intel.com>
Subject: Re: [PATCH] x86: MCE: Fix for mce_panic_timeout
Date: Wed, 27 May 2009 12:07:50 +0200
Message-ID: <874ov6oo3t.fsf@basil.nowhere.org>
In-Reply-To: <4A1CC21A.10301@jp.fujitsu.com> (Hidetoshi Seto's message of "Wed, 27 May 2009 13:31:22 +0900")

Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> writes:

> This fixes:

Thanks, I had already fixed it on my own.

Updated patch appended.
>
>  - In case of panic_timeout > 0 and mce_bootlog == 0.
>    System should reboot after panic, but it doesn't on mce panic because
>    current mce code overwrite panic_timeout to 0.

Nope, with bootlog==0 it should _not_ automatically reboot on panic.
Automatic rebooting mainly makes sense with boot logging; otherwise
you will likely lose the information. Or at least the kernel
cannot know whether you lose information or not, so it has to err on
the safe side.

I have now changed it to override only when panic_timeout == 0,
i.e. when the user didn't set anything;
that's probably the most sensible semantics anyway.
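
To spell out the intended semantics, here is a minimal user-space sketch
in C. It is not the kernel code: the two function names are hypothetical
and exist only for illustration. It assumes that panic_timeout == 0 means
"the user set nothing" and that mce_bootlog == 0 means boot logging is
disabled, as in the patch below.

```c
/*
 * Illustrative model only; the function names are hypothetical and
 * do not exist in the kernel.
 */

/* MCE default: 30s if boot logging is enabled, else 0 (wait forever). */
int mce_default_panic_timeout(int mce_bootlog)
{
	return mce_bootlog != 0 ? 30 : 0;
}

/* A timeout chosen by the user (non-zero) always wins over the default. */
int resolve_panic_timeout(int panic_timeout, int mce_panic_timeout)
{
	if (panic_timeout == 0)
		return mce_panic_timeout;
	return panic_timeout;
}
```

So with nothing set and boot logging enabled the result is 30, with
bootlog==0 and nothing set it stays 0, and any non-zero user value is
returned unchanged.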

-Andi

---

x86: MCE: Default to panic timeout for machine checks v3

Fatal machine checks can be logged to disk after boot, but only if
the system did a warm reboot. That's unfortunately difficult with the
default panic behaviour, which waits forever, so the admin has to
press the power button because modern systems usually lack a reset button.
This clears the machine checks in the registers and makes
it impossible to log them.

This patch changes the default for machine check panics to reboot
after 30s when boot logging is enabled and no panic timeout was set.
Then the MCE can be successfully logged after reboot.

I believe this will improve the machine check experience for any
system running the X server.

This is dependent on successful boot logging of MCEs. This currently
only works on Intel systems; on AMD there are quite a lot of systems
around which leave junk in the machine check registers after boot,
so it's disabled there. These systems will continue to default
to an endlessly waiting panic.

v2: Only force panic timeout when it's shorter (H.Seto)
v3: Only override the panic timeout when none was set, i.e. it is zero
(based on comment from H.Seto)

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---
 arch/x86/kernel/cpu/mcheck/mce.c |    7 +++++++
 1 file changed, 7 insertions(+)

Index: linux/arch/x86/kernel/cpu/mcheck/mce.c
===================================================================
--- linux.orig/arch/x86/kernel/cpu/mcheck/mce.c	2009-05-27 11:59:03.000000000 +0200
+++ linux/arch/x86/kernel/cpu/mcheck/mce.c	2009-05-27 12:01:07.000000000 +0200
@@ -82,6 +82,7 @@
 static int			rip_msr;
 static int			mce_bootlog = -1;
 static int			monarch_timeout = -1;
+static int			mce_panic_timeout;
 
 static char			trigger[128];
 static char			*trigger_argv[2] = { trigger, NULL };
@@ -203,6 +204,8 @@
 	local_irq_enable();
 	while (timeout-- > 0)
 		udelay(1);
+	if (panic_timeout == 0)
+		panic_timeout = mce_panic_timeout;
 	panic("Panicing machine check CPU died");
 }
 
@@ -240,6 +243,8 @@
 		printk(KERN_EMERG "Some CPUs didn't answer in synchronization\n");
 	if (exp)
 		printk(KERN_EMERG "Machine check: %s\n", exp);
+	if (panic_timeout == 0)
+		panic_timeout = mce_panic_timeout;
 	panic(msg);
 }
 
@@ -1100,6 +1105,8 @@
 	}
 	if (monarch_timeout < 0)
 		monarch_timeout = 0;
+	if (mce_bootlog != 0)
+		mce_panic_timeout = 30;
 }
 
 static void __cpuinit mce_ancient_init(struct cpuinfo_x86 *c)


-- 
ak@linux.intel.com -- Speaking for myself only.
