All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -mm] x86 port lockless MCE quirky bank0
@ 2005-05-17 20:18 Hugh Dickins
  2005-05-17 20:24 ` Andi Kleen
  0 siblings, 1 reply; 4+ messages in thread
From: Hugh Dickins @ 2005-05-17 20:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yu, Luming, Guo, Racing, Andi Kleen, Dave Jones, linux-kernel

I've been avoiding -mm on the Dell Latitude with PIII Mobile for several
weeks now, since APM resume from RAM locks up.  Just stole time to bisect
and it's the x86-port-lockless-mce patches, which lack some of the wisdom
of ages handed down in the earlier MCE source.

The "Don't enable bank 0 on Intel P6 cores, it goes bang quickly" fix
from the old mcheck/p6.c solves my PIII problem; but looking further,
the old mcheck/k7.c says some Athlons also have a problem with bank 0.

So try to handle them both together: but this code is an amalgam of
what was done slightly differently in the two cases, diverging from each
- I don't want to add voodoo if it's unnecessary; and untested on AMD.

Worry: has more such wisdom got lost in the translation?

Signed-off-by: Hugh Dickins <hugh@veritas.com>

--- 2.6.12-rc4-mm2/arch/i386/kernel/cpu/mcheck/mce.c	2005-05-16 12:18:35.000000000 +0100
+++ linux/arch/i386/kernel/cpu/mcheck/mce.c	2005-05-17 20:21:17.000000000 +0100
@@ -30,7 +30,7 @@ int __devinitdata mce_dont_init = 0;
 /* 0: always panic, 1: panic if deadlock possible, 2: try to avoid panic,
    3: never panic or exit (for testing only) */
 static unsigned long tolerant = 1;
-static int banks;
+static int banks, quirky_bank0;
 static unsigned long bank[NR_BANKS] = { [0 ... NR_BANKS-1] = ~0UL };
 static unsigned long console_logged;
 static int notify_user;
@@ -320,7 +320,8 @@ static void mce_init(void *dummy)
 		wrmsr(MSR_IA32_MCG_CTL, 0xffffffff, 0xffffffff);
 
 	for (i = 0; i < banks; i++) {
-		wrmsrl(MSR_IA32_MC0_CTL+4*i, bank[i]);
+		if (i >= quirky_bank0)
+			wrmsrl(MSR_IA32_MC0_CTL+4*i, bank[i]);
 		wrmsrl(MSR_IA32_MC0_STATUS+4*i, 0);
 	}
 }
@@ -334,6 +335,14 @@ static void __devinit mce_cpu_quirks(str
 		   incorrectly with the IOMMU & 3ware & Cerberus. */
 		clear_bit(10, &bank[4]);
 	}
+	if ((c->x86_vendor == X86_VENDOR_AMD ||
+	     c->x86_vendor == X86_VENDOR_INTEL) && c->x86 == 6) {
+		/*
+		 * Intel P6 cores go bang quickly when bank0 is enabled.
+	 	 * Some Athlons cause spurious MCEs when bank0 is enabled.
+		 */
+		quirky_bank0 = 1;
+	}
 }
 
 static void __devinit mce_cpu_features(struct cpuinfo_x86 *c)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH -mm] x86 port lockless MCE quirky bank0
  2005-05-17 20:18 [PATCH -mm] x86 port lockless MCE quirky bank0 Hugh Dickins
@ 2005-05-17 20:24 ` Andi Kleen
  2005-05-17 22:39   ` Andrew Morton
  0 siblings, 1 reply; 4+ messages in thread
From: Andi Kleen @ 2005-05-17 20:24 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Yu, Luming, Guo, Racing, Andi Kleen, Dave Jones,
	linux-kernel

> +	if ((c->x86_vendor == X86_VENDOR_AMD ||
> +	     c->x86_vendor == X86_VENDOR_INTEL) && c->x86 == 6) {
> +		/*
> +		 * Intel P6 cores go bang quickly when bank0 is enabled.
> +	 	 * Some Athlons cause spurious MCEs when bank0 is enabled.
> +		 */
> +		quirky_bank0 = 1;
> +	}

That's wrong on K8 AMD machines at least. You need to check c->x86
there too.

And better would be to just do bank[0] = 0 instead of
adding the new variable.

-Andi

P.S.: Also Yu Luming can you please submit an updated patch that keeps mce.c
in arch/x86_64 like we discussed earlier?


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH -mm] x86 port lockless MCE quirky bank0
  2005-05-17 20:24 ` Andi Kleen
@ 2005-05-17 22:39   ` Andrew Morton
  2005-05-18 12:34     ` Hugh Dickins
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Morton @ 2005-05-17 22:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: hugh, luming.yu, racing.guo, ak, davej, linux-kernel

Andi Kleen <ak@suse.de> wrote:
>
> > +	if ((c->x86_vendor == X86_VENDOR_AMD ||
> > +	     c->x86_vendor == X86_VENDOR_INTEL) && c->x86 == 6) {
> > +		/*
> > +		 * Intel P6 cores go bang quickly when bank0 is enabled.
> > +	 	 * Some Athlons cause spurious MCEs when bank0 is enabled.
> > +		 */
> > +		quirky_bank0 = 1;
> > +	}
> 
> That's wrong on K8 AMD machines at least. You need to check c->x86
> there too.
> 
> And better would be to just do bank[0] = 0 instead of
> adding the new variable.
> 
> -Andi
> 
> P.S.: Also Yu Luming can you please submit an updated patch that keeps mce.c
> in arch/x86_64 like we discussed earlier?

Yes.  I'll drop the following from -mm:

x86-port-lockless-mce-preparation.patch
x86-port-lockless-mce-implementation.patch
x86-port-lockless-mce-implementation-fix.patch
x86-port-lockless-mce-implementation-fix-2.patch


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH -mm] x86 port lockless MCE quirky bank0
  2005-05-17 22:39   ` Andrew Morton
@ 2005-05-18 12:34     ` Hugh Dickins
  0 siblings, 0 replies; 4+ messages in thread
From: Hugh Dickins @ 2005-05-18 12:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Andi Kleen, luming.yu, racing.guo, davej, linux-kernel

On Tue, 17 May 2005, Andrew Morton wrote:
> 
> Yes.  I'll drop the following from -mm:
> 
> x86-port-lockless-mce-preparation.patch
> x86-port-lockless-mce-implementation.patch
> x86-port-lockless-mce-implementation-fix.patch
> x86-port-lockless-mce-implementation-fix-2.patch

The right decision for now, I think - thanks.

I presume they may return later on, so I'd better confess:
I lied when I said my patch fixed the P6 bank0 issue, I was confused
between rebuildings and rebootings.  It should have fixed the issue,
but the patch which actually fixed it was one earlier, which had an
off-by-one ("> quirky_bank0") which I'd corrected before posting.

The ">= quirky_bank0" patch was not enough to fix the issue because...
mce_cpu_quirks (and mce_init and mce_cpu_features) are never called at
startup (on Intel or Centaur), because the logic in machine_check_init
(see below) to call mcheck_init is broken.

Which explains why I got the freeze at resume not at startup,
and casts doubt on how much any of it has got tested so far.

Hugh

void __devinit machine_check_init(struct cpuinfo_x86 *c)
{
	if (mce_dont_init)
		return;

	switch (c->x86_vendor) {

		case X86_VENDOR_INTEL:
			if (c->x86==5)
				intel_p5_mcheck_init(c);
			break;

		case X86_VENDOR_CENTAUR:
			if (c->x86==5)
				winchip_mcheck_init(c);
			break;

		default:
			machine_check_vector = do_machine_check;
			wmb();
			mcheck_init(c);
			break;
	}
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-05-18 12:37 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-17 20:18 [PATCH -mm] x86 port lockless MCE quirky bank0 Hugh Dickins
2005-05-17 20:24 ` Andi Kleen
2005-05-17 22:39   ` Andrew Morton
2005-05-18 12:34     ` Hugh Dickins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.