* machine check report on HVM startup
@ 2008-08-13 11:48 Christoph Egger
  2008-08-13 12:27 ` Keir Fraser
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Egger @ 2008-08-13 11:48 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel


Hi,

When I launch memtest as an HVM guest, Xen sends tons of VIRQ_MCA events
to Dom0, although NO correctable machine check errors occurred.
When Dom0 tries to fetch the error telemetry, the

BUG_ON(mc_data.fetch_idx > mc_data.error_idx); in x86_mcinfo_getfetchptr()
in xen/arch/x86/cpu/mcheck/mce.c is hit. (x86_mcinfo_getfetchptr() only works
if a real error actually occurred, which is not the case here.)

It looks to me as if a non-public event channel using the same number
as VIRQ_MCA fires when launching memtest as an HVM guest.

Christoph


-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Sitz (Geschäftsanschrift):
   Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registergericht Dresden: HRA 4896
vertretungsberechtigter Komplementär:
   AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
Geschäftsführer der AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy


* Re: machine check report on HVM startup
  2008-08-13 11:48 machine check report on HVM startup Christoph Egger
@ 2008-08-13 12:27 ` Keir Fraser
  2008-08-13 12:36   ` Keir Fraser
  0 siblings, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2008-08-13 12:27 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

On 13/8/08 12:48, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

> When I launch memtest as an HVM guest, Xen sends tons of VIRQ_MCA events
> to Dom0, although NO correctable machine check errors occurred.
> When Dom0 tries to fetch the error telemetry, the
> 
> BUG_ON(mc_data.fetch_idx > mc_data.error_idx); in x86_mcinfo_getfetchptr()
> in xen/arch/x86/cpu/mcheck/mce.c is hit. (x86_mcinfo_getfetchptr() only works
> if a real error actually occurred, which is not the case here.)

Perhaps you should be more wary of hypercall inputs? Failing the hypercall,
perhaps with a warning printk, would be better than BUG_ON() I think.

> It looks to me as if a non-public event channel using the same number
> as VIRQ_MCA fires when launching memtest as an HVM guest.

I don't think this is the case. Sounds easy to repro this issue though. I'll
give it a go.

 -- Keir


* Re: Re: machine check report on HVM startup
  2008-08-13 12:27 ` Keir Fraser
@ 2008-08-13 12:36   ` Keir Fraser
  2008-08-13 12:40     ` Christoph Egger
  0 siblings, 1 reply; 8+ messages in thread
From: Keir Fraser @ 2008-08-13 12:36 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

On 13/8/08 13:27, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> It looks to me as if a non-public event channel using the same number
>> as VIRQ_MCA fires when launching memtest as an HVM guest.
> 
> I don't think this is the case. Sounds easy to repro this issue though. I'll
> give it a go.

I can boot a memtest-3.4 ISO in an HVM guest on a PAE hypervisor just fine.

 -- Keir


* Re: Re: machine check report on HVM startup
  2008-08-13 12:36   ` Keir Fraser
@ 2008-08-13 12:40     ` Christoph Egger
  2008-08-13 12:48       ` Keir Fraser
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Egger @ 2008-08-13 12:40 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser

On Wednesday 13 August 2008 14:36:21 Keir Fraser wrote:
> On 13/8/08 13:27, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> >> It looks to me as if a non-public event channel using the same
> >> number as VIRQ_MCA fires when launching memtest as an HVM guest.
> >
> > I don't think this is the case. Sounds easy to repro this issue though.
> > I'll give it a go.
>
> I can boot a memtest-3.4 ISO in an HVM guest on a PAE hypervisor just fine.

Does your Dom0 kernel register the machine check event handler?
If not, then things go fine. If it does, you should see the flood of
VIRQ_MCA events in Dom0.

Christoph



* Re: Re: machine check report on HVM startup
  2008-08-13 12:40     ` Christoph Egger
@ 2008-08-13 12:48       ` Keir Fraser
  2008-08-13 13:17         ` Christoph Egger
  2008-08-13 13:17         ` Keir Fraser
  0 siblings, 2 replies; 8+ messages in thread
From: Keir Fraser @ 2008-08-13 12:48 UTC (permalink / raw)
  To: Christoph Egger, xen-devel

On 13/8/08 13:40, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

>>>> It looks to me as if a non-public event channel using the same
>>>> number as VIRQ_MCA fires when launching memtest as an HVM guest.
>>> 
>>> I don't think this is the case. Sounds easy to repro this issue though.
>>> I'll give it a go.
>> 
>> I can boot a memtest-3.4 ISO in an HVM guest on a PAE hypervisor just fine.
> 
> Does your Dom0 kernel register the machine check event handler?
> If not, then things go fine. If it does, you should see the flood of
> VIRQ_MCA events in Dom0.

How do I make it do that?

 -- Keir


* Re: Re: machine check report on HVM startup
  2008-08-13 12:48       ` Keir Fraser
@ 2008-08-13 13:17         ` Christoph Egger
  2008-08-13 13:22           ` Keir Fraser
  2008-08-13 13:17         ` Keir Fraser
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Egger @ 2008-08-13 13:17 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser

On Wednesday 13 August 2008 14:48:04 Keir Fraser wrote:
> On 13/8/08 13:40, "Christoph Egger" <Christoph.Egger@amd.com> wrote:
> >>>> It looks to me as if a non-public event channel using the same
> >>>> number as VIRQ_MCA fires when launching memtest as an HVM guest.
> >>>
> >>> I don't think this is the case. Sounds easy to repro this issue though.
> >>> I'll give it a go.
> >>
> >> I can boot a memtest-3.4 ISO in an HVM guest on a PAE hypervisor just
> >> fine.
> >
> > Does your Dom0 kernel register the machine check event handler?
> > If not, then things go fine. If it does, you should see the flood of
> > VIRQ_MCA events in Dom0.
>
> How do I make it do that?

Assuming you use Linux as Dom0, apply the attached patch to your local tree.
With it, you should see a flood of "xen_mca: HW reported correctable error(s)"
Dom0 kernel messages.

Note, the patch is not intended to go upstream. There will be something better
in the future.

Christoph


[-- Attachment #2: linux_xenmca.diff --]
[-- Type: text/x-diff, Size: 1127 bytes --]

diff -r c110692c140f arch/i386/kernel/cpu/mcheck/non-fatal.c
--- a/arch/i386/kernel/cpu/mcheck/non-fatal.c	Wed Aug 13 10:00:09 2008 +0100
+++ b/arch/i386/kernel/cpu/mcheck/non-fatal.c	Wed Aug 13 15:10:47 2008 +0200
@@ -60,9 +60,31 @@ static void mce_work_fn(void *data)
 	schedule_delayed_work(&mce_work, MCE_RATE);
 } 
 
+/* Dom0 handler for VIRQ_MCA, raised by Xen to report machine check telemetry. */
+static irqreturn_t xenmca_event(int irq, void *dev_id,
+                                struct pt_regs *regs)
+{
+	printk("xen_mca: HW reported correctable error(s)\n");
+
+	return IRQ_HANDLED;
+}
+
+static int mca_event_irq;
+
 static int __init init_nonfatal_mce_checker(void)
 {
 	struct cpuinfo_x86 *c = &boot_cpu_data;
+
+	if (is_initial_xendomain()) {
+		mca_event_irq = bind_virq_to_irqhandler(
+			VIRQ_MCA,
+			0,		/* cpu */
+			xenmca_event,
+			0,		/* irqflags */
+			"mca0",		/* devname */
+			NULL);		/* dev_id */
+		BUG_ON(mca_event_irq < 0);
+	}
 
 	/* Check for MCE support */
 	if (!cpu_has(c, X86_FEATURE_MCE))



* Re: Re: machine check report on HVM startup
  2008-08-13 12:48       ` Keir Fraser
  2008-08-13 13:17         ` Christoph Egger
@ 2008-08-13 13:17         ` Keir Fraser
  1 sibling, 0 replies; 8+ messages in thread
From: Keir Fraser @ 2008-08-13 13:17 UTC (permalink / raw)
  To: Christoph Egger, xen-devel

On 13/8/08 13:48, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Does your Dom0 kernel register the machine check event handler?
>> If not, then things go fine. If it does, you should see the flood of
>> VIRQ_MCA events in Dom0.
> 
> How do I make it do that?

I modified the netback VIRQ_DEBUG handler to register on VIRQ_MCA instead. I
didn't get any output from it when running a memtest HVM guest.

 -- Keir


* Re: Re: machine check report on HVM startup
  2008-08-13 13:17         ` Christoph Egger
@ 2008-08-13 13:22           ` Keir Fraser
  0 siblings, 0 replies; 8+ messages in thread
From: Keir Fraser @ 2008-08-13 13:22 UTC (permalink / raw)
  To: Christoph Egger, xen-devel

On 13/8/08 14:17, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

> Assuming you use Linux as Dom0, apply the attached patch to your local tree.
> With it, you should see a flood of "xen_mca: HW reported correctable error(s)"
> Dom0 kernel messages.
> 
> Note, the patch is not intended to go upstream. There will be something better
> in the future.

The patch won't do much since CONFIG_X86_MCE depends on !XEN.

Anyhow, I tried registering some other handler on VIRQ_MCA and it never
fired for me.

 -- Keir

