* Re: [Patch 0/3]RAS(Part II)--Intel MCA enalbing in XEN
2009-03-20 5:02 [Patch 0/3]RAS(Part II)--Intel MCA enalbing in XEN Ke, Liping
@ 2009-03-20 23:46 ` Frank van der Linden
2009-03-20 23:48 ` Frank van der Linden
1 sibling, 0 replies; 4+ messages in thread
From: Frank van der Linden @ 2009-03-20 23:46 UTC
To: Ke, Liping; +Cc: xen-devel@lists.xensource.com, Keir Fraser
Ke, Liping wrote:
> These patches enable MCA in Xen. They are based on AMD's and Sun's MCA-related work.
> We discussed them with AMD and Sun and refined them since the last posting. We also rebased them on
> Sun's latest improvements. Patches for recovery actions will follow. This is a basic framework
> for Intel MCA.
I looked the patches over a little more closely, and merged them with my
-unstable tree. I found a few minor issues:
* some compile issues with printk format strings in the case of DEBUG
and 32-bit builds
* in severity_scan, use mca_rdmsrl and mca_wrmsrl so that the code works
correctly for errors simulated through injection
* in severity_scan, if the MSR values were injected for debugging
purposes, don't panic but keep going: the injected values will be lost
at reboot, and since this is just a simulated #MC anyway, there is no
danger of losing state
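To illustrate the last two points, here is a minimal standalone sketch, not the Xen code. The scan reads each bank status through a helper that prefers an injected value, and it reports a fatal condition only when PCC is set on a value that came from real hardware. All names (sim_bank, sim_read_status, sim_scan_must_panic) are hypothetical stand-ins for the mca_rdmsrl / intpose_lookup pair; the status bit positions follow the architectural MCi_STATUS layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MCi_STATUS bits (names here are local to this sketch). */
#define STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define STATUS_UC  (1ULL << 61) /* uncorrected error */
#define STATUS_EN  (1ULL << 60) /* error reporting was enabled */
#define STATUS_PCC (1ULL << 57) /* processor context corrupt */

/* Hypothetical model of one bank: a hardware value plus an optional
 * value planted by an injection tool. */
struct sim_bank {
    uint64_t hw_status;
    uint64_t injected_status;
    bool     injected;
};

/* Read a bank status, preferring the injected value when one exists. */
static uint64_t sim_read_status(const struct sim_bank *b, bool *was_injected)
{
    if (was_injected)
        *was_injected = b->injected;
    return b->injected ? b->injected_status : b->hw_status;
}

/* Return true if the scan would have to stop the machine. */
static bool sim_scan_must_panic(const struct sim_bank *banks, size_t n)
{
    assert(banks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        bool injected;
        uint64_t status = sim_read_status(&banks[i], &injected);

        if (!(status & STATUS_VAL))
            continue;
        if (!(status & STATUS_UC))   /* only uncorrected errors matter here */
            continue;
        if (!(status & STATUS_EN))
            continue;
        /* An injected PCC is only a simulation, so keep going. */
        if ((status & STATUS_PCC) && !injected)
            return true;
    }
    return false;
}
```

With this shape, an injected PCC error lets the scan run to completion, while the same bits read from hardware still stop it.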
I'll attach a little patch to fix these issues. I haven't tested this
patch yet, although the compile fixes have been "tested".
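For reference, the kind of format-string problem behind the compile fixes can be shown in plain C: a 64-bit value passed to %lx does not match on a build where long is 32 bits, which is presumably what the 32-bit build complained about. The attached patch sidesteps this with %p and _p(). The sketch below uses the standard PRIx64 macro instead, which is a substitution for illustration and not what the patch does; format_status is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit MSR-style value portably.
 * Returns what snprintf returns: the length the full string needs. */
static int format_status(char *buf, size_t len, uint64_t status)
{
    assert(buf != NULL && len > 0);
    /* PRIx64 expands to the correct length modifier for uint64_t,
     * whether long is 32 or 64 bits wide. */
    return snprintf(buf, len, "status %" PRIx64, status);
}
```

A design note, assuming _p() casts through a pointer-sized integer: on a 32-bit build such a cast would keep only the low 32 bits of a 64-bit status, whereas PRIx64 prints all 64.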
One final question:
> 2) When an MCE# happens, all CPUs enter MCA context. The first CPU to read and clear the error MSR bank becomes the
> MCE# owner. The necessary locks/synchronization help to determine the owner and to select the most severe error.
Is it always true (at least, for Intel CPUs of family 6 and 15) that
when a #MC happens, *all* CPUs will receive a #MC trap? I couldn't find
this anywhere in the documentation.
If this is true, I'll change the MCE injection code to simulate #MC on
all CPUs in the case of an Intel system.
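For readers following the owner selection described in the quoted text, here is a minimal sketch of the idea using C11 atomics. It is an assumption-laden illustration, not the code from the patches: try_become_owner, release_owner, report_severity and worst_severity are hypothetical names, and the integer severity scale is invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag owner_lock = ATOMIC_FLAG_INIT;
static _Atomic int worst_severity; /* static storage, starts at 0 */

/* Returns true for exactly one caller until release_owner() is called:
 * the first CPU to set the flag becomes the owner. */
static bool try_become_owner(void)
{
    return !atomic_flag_test_and_set(&owner_lock);
}

static void release_owner(void)
{
    atomic_flag_clear(&owner_lock);
}

/* Every CPU reports what it saw; the shared value ends up holding the
 * maximum, i.e. the most severe error. */
static void report_severity(int sev)
{
    assert(sev >= 0);

    int cur = atomic_load(&worst_severity);
    while (sev > cur &&
           !atomic_compare_exchange_weak(&worst_severity, &cur, sev)) {
        /* A failed exchange reloads cur, so the loop re-checks it. */
    }
}
```

If every CPU really does enter the handler, each one would call report_severity() and then try_become_owner(), and only the winner would go on to clear the shared state.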
- Frank
* Re: [Patch 0/3]RAS(Part II)--Intel MCA enalbing in XEN
2009-03-20 5:02 [Patch 0/3]RAS(Part II)--Intel MCA enalbing in XEN Ke, Liping
2009-03-20 23:46 ` Frank van der Linden
@ 2009-03-20 23:48 ` Frank van der Linden
2009-03-21 5:13 ` Keir Fraser
1 sibling, 1 reply; 4+ messages in thread
From: Frank van der Linden @ 2009-03-20 23:48 UTC
To: Ke, Liping; +Cc: xen-devel@lists.xensource.com, Keir Fraser
I forgot to attach the patch with the minor fixes; here it is.
- Frank
[-- Attachment #2: intel-fixes --]
[-- Type: text/plain, Size: 3924 bytes --]
diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
@@ -256,9 +256,10 @@ static int fill_vmsr_data(int cpu, struc
d->arch.vmca_msrs.nr_injection++;
printk(KERN_DEBUG "MCE: Found error @[CPU%d BANK%d "
- "status %lx addr %lx domid %d]\n ",
+ "status %p addr %p domid %d]\n ",
entry->cpu, mc_bank->mc_bank,
- mc_bank->mc_status, mc_bank->mc_addr, mc_bank->mc_domid);
+ _p(mc_bank->mc_status), _p(mc_bank->mc_addr),
+ mc_bank->mc_domid);
}
return 0;
}
@@ -426,7 +427,7 @@ static void severity_scan(void)
* recovered, we need to RESET for avoiding DOM0 LOG missing
*/
for ( i = 0; i < nr_mce_banks; i++) {
- rdmsrl(MSR_IA32_MC0_STATUS + 4 * i , status);
+ mca_rdmsrl(MSR_IA32_MC0_STATUS + 4 * i , status);
if ( !(status & MCi_STATUS_VAL) )
continue;
/* MCE handler only handles UC error */
@@ -434,7 +435,12 @@ static void severity_scan(void)
continue;
if ( !(status & MCi_STATUS_EN) )
continue;
- if (status & MCi_STATUS_PCC)
+ /*
+ * If this was an injected error, keep going, since the
+ * interposed value will be lost at reboot.
+ */
+ if (status & MCi_STATUS_PCC && intpose_lookup(smp_processor_id(),
+ MSR_IA32_MC0_STATUS + 4 * i, NULL) == NULL)
mc_panic("pcc = 1, cpu unable to continue\n");
}
@@ -519,8 +525,8 @@ static void intel_machine_check(struct c
/* Pick one CPU to clear MCIP */
if (!test_and_set_bool(mce_process_lock)) {
- rdmsrl(MSR_IA32_MCG_STATUS, gstatus);
- wrmsrl(MSR_IA32_MCG_STATUS, gstatus & ~MCG_STATUS_MCIP);
+ mca_rdmsrl(MSR_IA32_MCG_STATUS, gstatus);
+ mca_wrmsrl(MSR_IA32_MCG_STATUS, gstatus & ~MCG_STATUS_MCIP);
if (worst >= 3) {
printk(KERN_WARNING "worst=3 should have caused RESET\n");
@@ -843,7 +849,7 @@ int intel_mce_wrmsr(u32 msr, u32 lo, u32
break;
}
d->arch.vmca_msrs.mcg_status = value;
- printk(KERN_DEBUG "MCE: wrmsr MCG_CTL %lx\n", value);
+ printk(KERN_DEBUG "MCE: wrmsr MCG_CTL %p\n", _p(value));
break;
case MSR_IA32_MC0_CTL2:
case MSR_IA32_MC1_CTL2:
@@ -905,7 +911,7 @@ int intel_mce_wrmsr(u32 msr, u32 lo, u32
}
printk(KERN_DEBUG "MCE: wmrsr mci_status in vMCE# context\n");
}
- printk(KERN_DEBUG "MCE: wrmsr mci_status val:%lx\n", value);
+ printk(KERN_DEBUG "MCE: wrmsr mci_status val:%p\n", _p(value));
break;
}
spin_unlock(&mce_locks);
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2215,8 +2215,8 @@ static int emulate_privileged_op(struct
break;
if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
if ( intel_mce_wrmsr(regs->ecx, eax, edx) != 0) {
- gdprintk(XENLOG_ERR, "MCE: vMCE MSRS(%lx) Write"
- " (%x:%x) Fails! ", regs->ecx, edx, eax);
+ gdprintk(XENLOG_ERR, "MCE: vMCE MSRS(%p) Write"
+ " (%x:%x) Fails! ", _p(regs->ecx), edx, eax);
goto fail;
}
break;
@@ -2313,7 +2313,7 @@ static int emulate_privileged_op(struct
if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) {
if ( intel_mce_rdmsr(regs->ecx, &eax, &edx) != 0)
- printk(KERN_ERR "MCE: Not MCE MSRs %lx\n", regs->ecx);
+ printk(KERN_ERR "MCE: Not MCE MSRs %p\n", _p(regs->ecx));
}
break;