xen-devel.lists.xenproject.org archive mirror
* Kernel 2.6.39+ hangs when running as HVM guest under Xen
@ 2011-08-04 12:59 Stefan Bader
  2011-08-09  2:38 ` Konrad Rzeszutek Wilk
  2011-08-17 13:15 ` Stefan Bader
  0 siblings, 2 replies; 7+ messages in thread
From: Stefan Bader @ 2011-08-04 12:59 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com
  Cc: Konrad Rzeszutek Wilk, Bug 791850, Stefano Stabellini

Since kernel 2.6.39 we have been seeing strange hangs when booting such kernels as HVM
guests under Xen (similar hangs, but in different places, with CentOS 5.4 +
Xen 3.4.3 as well as with Xen 4.1 and a 3.0-based dom0). The problem only happens
when running with more than one vcpu.

I was able to examine some dumps[1] and it always looked like a weird
situation. In one example (booting a 3.0 HVM guest under a Xen 3.4.3/2.6.18
dom0) the lockup always seemed to occur when the delayed MTRR init took place.
CPU#0 appeared to have started the rendezvous (stop_cpu) but was then
interrupted, while the other CPU (I was using vcpu=2 for simplicity) was idling
somewhere else but still had the MTRR rendezvous handler queued up (it just
never seemed to get started).
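
To illustrate the failure mode, here is a minimal sketch (not the kernel's
actual set_mtrr()/stop_machine code) of an MTRR-style rendezvous: the
initiating CPU IPIs the others and busy-waits until all of them have entered
the handler, so a single lost IPI leaves it spinning forever, which matches
the hang seen in the dumps.

/*
 * Minimal rendezvous sketch (illustrative only, not the real arch/x86 code).
 */
#include <linux/smp.h>
#include <linux/atomic.h>
#include <linux/cpumask.h>

static atomic_t cpus_entered;

static void rendezvous_handler(void *unused)
{
	atomic_inc(&cpus_entered);	/* never runs if the IPI is lost */
	/* ... reprogram MTRRs in lockstep here ... */
}

static void start_rendezvous(void)
{
	atomic_set(&cpus_entered, 1);			/* count ourselves */
	smp_call_function(rendezvous_handler, NULL, 0);	/* IPI everyone else */

	while (atomic_read(&cpus_entered) != num_online_cpus())
		cpu_relax();				/* hang point */
}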

Things seemed to indicate some IPI problem, but to be sure I bisected where the
problem started. I ended up with the following patch which, when reverted, allows
me to bring up a 3.0 HVM guest with more than one CPU without any problems.

commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date:   Thu Dec 2 17:55:10 2010 +0000

    xen: PV on HVM: support PV spinlocks and IPIs

    Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus
    (that switch to APIC mode and initialize APIC routing); on secondary
    CPUs on CPU_UP_PREPARE.

    Enable the usage of event channels to send and receive IPIs when
    running as a PV on HVM guest.

Though I have not yet really understood why exactly this happens, I thought I
would post the results so far. It feels as if an IPI signalled through the
event channel either does not come through or goes to the wrong CPU. The
failure did not always occur at exactly the same place: as mentioned, the 3.0
guest running under the CentOS dom0 was locking up early, right after all CPUs
were brought up, while during the bisect (using a kernel between 2.6.38 and
2.6.39-rc1) the lockup happened later.

Maybe someone has a clue immediately; I will dig a bit deeper into the dumps in
the meantime. Looking at the commit description, which sounds as if using event
channels was only intended for PV-on-HVM guests, it is wrong in the first place
to set the Xen IPI functions on the HVM side...
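
For reference, a rough sketch of what the bisected commit does on the HVM side
(simplified, not the exact upstream code; xen_ipi_irq is an illustrative
stand-in for the per-cpu resched/callfunc IRQs that xen_smp_intr_init() binds):
the generic smp_ops hooks are pointed at Xen variants, so an IPI becomes an
event-channel notification rather than a write to the emulated local APIC.

/*
 * Sketch of the event-channel IPI path enabled by commit 99bbb3a8
 * (simplified; names are illustrative where noted).
 */
#include <linux/init.h>
#include <linux/percpu.h>
#include <xen/events.h>
#include <asm/smp.h>

/* the real senders live in arch/x86/xen/smp.c (static there) */
extern void xen_smp_send_reschedule(int cpu);
extern void xen_smp_send_call_function_ipi(const struct cpumask *mask);
extern void xen_smp_send_call_function_single_ipi(int cpu);

static DEFINE_PER_CPU(int, xen_ipi_irq);	/* illustrative stand-in */

static void xen_send_ipi_one(unsigned int cpu)
{
	/* notify_remote_via_irq() ends up in an event-channel hypercall */
	notify_remote_via_irq(per_cpu(xen_ipi_irq, cpu));
}

void __init xen_hvm_smp_init(void)
{
	/* from here on, guest IPIs no longer go through the emulated APIC */
	smp_ops.smp_send_reschedule       = xen_smp_send_reschedule;
	smp_ops.send_call_func_ipi        = xen_smp_send_call_function_ipi;
	smp_ops.send_call_func_single_ipi = xen_smp_send_call_function_single_ipi;
}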

-Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/791850

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Kernel 2.6.39+ hangs when running as HVM guest under Xen
  2011-08-04 12:59 Kernel 2.6.39+ hangs when running as HVM guest under Xen Stefan Bader
@ 2011-08-09  2:38 ` Konrad Rzeszutek Wilk
  2011-08-09 14:54   ` Stefan Bader
  2011-08-17 13:15 ` Stefan Bader
  1 sibling, 1 reply; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-09  2:38 UTC (permalink / raw)
  To: Stefan Bader
  Cc: xen-devel@lists.xensource.com, Bug 791850, Stefano Stabellini

On Thu, Aug 04, 2011 at 02:59:05PM +0200, Stefan Bader wrote:
> Since kernel 2.6.39 we have been seeing strange hangs when booting such kernels as HVM
> guests under Xen (similar hangs, but in different places, with CentOS 5.4 +
> Xen 3.4.3 as well as with Xen 4.1 and a 3.0-based dom0). The problem only happens
> when running with more than one vcpu.
> 

Hey Stefan,

We were all at XenSummit and I think we did not get to think about this at all.
Also the merge window opened, so that ate a good chunk of time. Anyhow...

Is this related to this: http://marc.info/?i=4E4070B4.1020008@it-infrastrukturen.org ?

> I was able to examine some dumps[1] and it always looked like a weird
> situation. In one example (booting a 3.0 HVM guest under a Xen 3.4.3/2.6.18
> dom0) the lockup always seemed to occur when the delayed MTRR init took place.
> CPU#0 appeared to have started the rendezvous (stop_cpu) but was then
> interrupted, while the other CPU (I was using vcpu=2 for simplicity) was idling
> somewhere else but still had the MTRR rendezvous handler queued up (it just
> never seemed to get started).
> 
> Things seemed to indicate some IPI problem, but to be sure I bisected where the
> problem started. I ended up with the following patch which, when reverted, allows
> me to bring up a 3.0 HVM guest with more than one CPU without any problems.
> 
> commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
> Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Date:   Thu Dec 2 17:55:10 2010 +0000
> 
>     xen: PV on HVM: support PV spinlocks and IPIs
> 
>     Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus
>     (that switch to APIC mode and initialize APIC routing); on secondary
>     CPUs on CPU_UP_PREPARE.
> 
>     Enable the usage of event channels to send and receive IPIs when
>     running as a PV on HVM guest.
> 
> Though I have not yet really understood why exactly this happens, I thought I
> would post the results so far. It feels as if an IPI signalled through the
> event channel either does not come through or goes to the wrong CPU. The
> failure did not always occur at exactly the same place: as mentioned, the 3.0
> guest running under the CentOS dom0 was locking up early, right after all CPUs
> were brought up, while during the bisect (using a kernel between 2.6.38 and
> 2.6.39-rc1) the lockup happened later.
> 
> Maybe someone has a clue immediately; I will dig a bit deeper into the dumps
> in the meantime.

Anything turned up?
> Looking at the commit description, which sounds as if using event channels
> was only intended for PV-on-HVM guests, it is wrong in the first place to set
> the Xen IPI functions on the HVM side...

On true HVM - sure, but on PVonHVM it sounds right.
> 
> -Stefan
> 
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/791850
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Kernel 2.6.39+ hangs when running as HVM guest under Xen
  2011-08-09  2:38 ` Konrad Rzeszutek Wilk
@ 2011-08-09 14:54   ` Stefan Bader
  2011-08-10 14:40     ` Stefan Bader
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Bader @ 2011-08-09 14:54 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel@lists.xensource.com, Bug 791850, Stefano Stabellini

On 08.08.2011 21:38, Konrad Rzeszutek Wilk wrote:
> On Thu, Aug 04, 2011 at 02:59:05PM +0200, Stefan Bader wrote:
>> Since kernel 2.6.39 we have been seeing strange hangs when booting such kernels as HVM
>> guests under Xen (similar hangs, but in different places, with CentOS 5.4 +
>> Xen 3.4.3 as well as with Xen 4.1 and a 3.0-based dom0). The problem only happens
>> when running with more than one vcpu.
>>
> 
> Hey Stefan,
> 
> We were all at XenSummit and I think we did not get to think about this at all.
> Also the merge window opened, so that ate a good chunk of time. Anyhow...
>

Ah, right. Know the feeling. :) I am travelling this week, too.

> Is this related to this: http://marc.info/?i=4E4070B4.1020008@it-infrastrukturen.org ?
>

At a quick glance it seems to be different. What I was looking at were dom0
setups which worked for HVM guests up to kernel 2.6.38 and locked up at some
point when a later guest kernel was started in SMP mode.

>> I was able to examine some dumps[1] and it always looked like a weird
>> situation. In one example (booting a 3.0 HVM guest under a Xen 3.4.3/2.6.18
>> dom0) the lockup always seemed to occur when the delayed MTRR init took place.
>> CPU#0 appeared to have started the rendezvous (stop_cpu) but was then
>> interrupted, while the other CPU (I was using vcpu=2 for simplicity) was idling
>> somewhere else but still had the MTRR rendezvous handler queued up (it just
>> never seemed to get started).
>>
>> Things seemed to indicate some IPI problem, but to be sure I bisected where the
>> problem started. I ended up with the following patch which, when reverted, allows
>> me to bring up a 3.0 HVM guest with more than one CPU without any problems.
>>
>> commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
>> Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Date:   Thu Dec 2 17:55:10 2010 +0000
>>
>>     xen: PV on HVM: support PV spinlocks and IPIs
>>
>>     Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus
>>     (that switch to APIC mode and initialize APIC routing); on secondary
>>     CPUs on CPU_UP_PREPARE.
>>
>>     Enable the usage of event channels to send and receive IPIs when
>>     running as a PV on HVM guest.
>>
>> Though I have not yet really understood why exactly this happens, I thought I
>> would post the results so far. It feels as if an IPI signalled through the
>> event channel either does not come through or goes to the wrong CPU. The
>> failure did not always occur at exactly the same place: as mentioned, the 3.0
>> guest running under the CentOS dom0 was locking up early, right after all CPUs
>> were brought up, while during the bisect (using a kernel between 2.6.38 and
>> 2.6.39-rc1) the lockup happened later.
>>
>> Maybe someone has a clue immediately; I will dig a bit deeper into the dumps
>> in the meantime.
> 
> Anything turned up?

From the data structures everything seems to be set up correctly.

>> Looking at the commit description, which sounds as if using event channels
>> was only intended for PV-on-HVM guests, it is wrong in the first place to set
>> the Xen IPI functions on the HVM side...
> 
> On true HVM - sure, but on PVonHVM it sounds right.

Though exactly that seems to be what is happening. I am looking at a guest that
is started as an HVM guest, and the patch modifies IPI delivery to be attempted
as hypervisor calls instead of using the native APIC method.
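
For contrast, here is a simplified version of the native path that gets
replaced (based on native_smp_send_reschedule() in arch/x86/kernel/smp.c; the
offline-CPU check is omitted and the function is renamed to mark it as a
sketch): the IPI is an APIC write which, in an HVM guest, the hypervisor's
emulated local APIC turns into a vcpu interrupt.

/*
 * Simplified native reschedule IPI (sketch): an APIC write that the
 * hypervisor's APIC emulation delivers to the target vcpu.
 */
#include <asm/apic.h>
#include <linux/cpumask.h>

static void native_smp_send_reschedule_sketch(int cpu)
{
	apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
}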

>>
>> -Stefan
>>
>> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/791850
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Kernel 2.6.39+ hangs when running as HVM guest under Xen
  2011-08-09 14:54   ` Stefan Bader
@ 2011-08-10 14:40     ` Stefan Bader
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Bader @ 2011-08-10 14:40 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel@lists.xensource.com, Bug 791850, Stefano Stabellini

On 09.08.2011 09:54, Stefan Bader wrote:
> On 08.08.2011 21:38, Konrad Rzeszutek Wilk wrote:
>> On Thu, Aug 04, 2011 at 02:59:05PM +0200, Stefan Bader wrote:
>>> Since kernel 2.6.39 we have been seeing strange hangs when booting such kernels as HVM
>>> guests under Xen (similar hangs, but in different places, with CentOS 5.4 +
>>> Xen 3.4.3 as well as with Xen 4.1 and a 3.0-based dom0). The problem only happens
>>> when running with more than one vcpu.
>>>
>>
>> Hey Stefan,
>>
>> We were all at XenSummit and I think we did not get to think about this at all.
>> Also the merge window opened, so that ate a good chunk of time. Anyhow...
>>
> 
> Ah, right. Know the feeling. :) I am travelling this week, too.
> 
>> Is this related to this: http://marc.info/?i=4E4070B4.1020008@it-infrastrukturen.org ?
>>
> 
> At a quick glance it seems to be different. What I was looking at were dom0
> setups which worked for HVM guests up to kernel 2.6.38 and locked up at some
> point when a later guest kernel was started in SMP mode.
> 
>>> I was able to examine some dumps[1] and it always looked like a weird
>>> situation. In one example (booting a 3.0 HVM guest under a Xen 3.4.3/2.6.18
>>> dom0) the lockup always seemed to occur when the delayed MTRR init took place.
>>> CPU#0 appeared to have started the rendezvous (stop_cpu) but was then
>>> interrupted, while the other CPU (I was using vcpu=2 for simplicity) was idling
>>> somewhere else but still had the MTRR rendezvous handler queued up (it just
>>> never seemed to get started).
>>>
>>> Things seemed to indicate some IPI problem, but to be sure I bisected where the
>>> problem started. I ended up with the following patch which, when reverted, allows
>>> me to bring up a 3.0 HVM guest with more than one CPU without any problems.
>>>
>>> commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
>>> Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>> Date:   Thu Dec 2 17:55:10 2010 +0000
>>>
>>>     xen: PV on HVM: support PV spinlocks and IPIs
>>>
>>>     Initialize PV spinlocks on boot CPU right after native_smp_prepare_cpus
>>>     (that switch to APIC mode and initialize APIC routing); on secondary
>>>     CPUs on CPU_UP_PREPARE.
>>>
>>>     Enable the usage of event channels to send and receive IPIs when
>>>     running as a PV on HVM guest.
>>>
>>> Though I have not yet really understood why exactly this happens, I thought I
>>> would post the results so far. It feels as if an IPI signalled through the
>>> event channel either does not come through or goes to the wrong CPU. The
>>> failure did not always occur at exactly the same place: as mentioned, the 3.0
>>> guest running under the CentOS dom0 was locking up early, right after all CPUs
>>> were brought up, while during the bisect (using a kernel between 2.6.38 and
>>> 2.6.39-rc1) the lockup happened later.
>>>
>>> Maybe someone has a clue immediately; I will dig a bit deeper into the dumps
>>> in the meantime.
>>
>> Anything turned up?
> 
> From the data structures everything seems to be set up correctly.
> 
>>> Looking at the commit description, which sounds as if using event channels
>>> was only intended for PV-on-HVM guests, it is wrong in the first place to set
>>> the Xen IPI functions on the HVM side...
>>
>> On true HVM - sure, but on PVonHVM it sounds right.
> 
> Though exactly that seems to be what is happening. I am looking at a guest that
> is started as an HVM guest, and the patch modifies IPI delivery to be attempted
> as hypervisor calls instead of using the native APIC method.
>

As a bit more information: it seems that after upgrading the hypervisor to Xen
4.1.1 (with the same 3.0 dom0 kernel), the HVM guest (3.0 kernel) boots past the
initial hang but then ends up hitting ATA timeouts on the emulated IDE controller. *sigh*

So there also seems to be a dependency on the hypervisor version: 3.4.3 and 4.1.0
at least seemed to have problems, while 4.1.1 has a different one.

It sounds a bit as if some later versions of the hypervisor handle the IPI
calls while older ones do not...
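
If that is the case, the guest-side way out would be to switch to event-channel
IPIs only when the hypervisor advertises the per-vcpu callback vector. A hedged
sketch of that capability check follows (modelled on the HVM init code in
arch/x86/xen/enlighten.c; the function name here is illustrative):

/*
 * Capability-check sketch: only when Xen supports injecting event-channel
 * upcalls as a per-vcpu vector is it safe to use event channels for IPIs.
 */
#include <linux/init.h>
#include <xen/features.h>

int xen_have_vector_callback;

static void __init xen_hvm_check_callback_vector(void)	/* illustrative name */
{
	if (xen_feature(XENFEAT_hvm_callback_vector))
		xen_have_vector_callback = 1;
	/*
	 * On hypervisors without the feature (e.g. the 3.4.3/4.1.0 setups
	 * above), this stays 0 and the guest should keep the native
	 * APIC-based IPIs -- which is what the fix below guards on.
	 */
}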

>>>
>>> -Stefan
>>>
>>> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/791850
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Kernel 2.6.39+ hangs when running as HVM guest under Xen
  2011-08-04 12:59 Kernel 2.6.39+ hangs when running as HVM guest under Xen Stefan Bader
  2011-08-09  2:38 ` Konrad Rzeszutek Wilk
@ 2011-08-17 13:15 ` Stefan Bader
  2011-08-17 13:25   ` Stefan Bader
  1 sibling, 1 reply; 7+ messages in thread
From: Stefan Bader @ 2011-08-17 13:15 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com
  Cc: Stefano Stabellini, Bug 791850, Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 1831 bytes --]

So after a bit more help from Stefano, a fix for that could be this one:

>From 8e6c2f27782859b657faef508c6b56c2068af533 Mon Sep 17 00:00:00 2001
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date: Wed, 17 Aug 2011 10:10:59 +0200
Subject: [PATCH] UBUNTU: (upstream) xen: Do not enable PV IPIs when vector
callback not present

Fix regression for HVM case on older (<4.1.1) hypervisors caused by

  commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
  Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Date:   Thu Dec 2 17:55:10 2010 +0000

    xen: PV on HVM: support PV spinlocks and IPIs

This change replaced the SMP operations with event-based handlers without
taking into account that this only works when the hypervisor supports
callback vectors. This causes unexplained hangs early during boot for
HVM guests with more than one CPU.

BugLink: http://bugs.launchpad.net/bugs/791850

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

---
 arch/x86/xen/smp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index b4533a8..e79dbb9 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -521,8 +521,6 @@ static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
 	native_smp_prepare_cpus(max_cpus);
 	WARN_ON(xen_smp_intr_init(0));
 
-	if (!xen_have_vector_callback)
-		return;
 	xen_init_lock_cpu(0);
 	xen_init_spinlocks();
 }
@@ -546,6 +544,8 @@ static void xen_hvm_cpu_die(unsigned int cpu)
 
 void __init xen_hvm_smp_init(void)
 {
+	if (!xen_have_vector_callback)
+		return;
 	smp_ops.smp_prepare_cpus = xen_hvm_smp_prepare_cpus;
 	smp_ops.smp_send_reschedule = xen_smp_send_reschedule;
 	smp_ops.cpu_up = xen_hvm_cpu_up;
-- 
1.7.4.1

[-- Attachment #2: 0002-xen-Do-not-enable-PV-IPIs-when-vector-callback-not-p.patch --]
[-- Type: text/x-diff, Size: 1680 bytes --]

>From 8e6c2f27782859b657faef508c6b56c2068af533 Mon Sep 17 00:00:00 2001
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Date: Wed, 17 Aug 2011 10:10:59 +0200
Subject: [PATCH] UBUNTU: (upstream) xen: Do not enable PV IPIs when vector callback not present

Fix regression for HVM case on older (<4.1.1) hypervisors caused by

  commit 99bbb3a84a99cd04ab16b998b20f01a72cfa9f4f
  Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
  Date:   Thu Dec 2 17:55:10 2010 +0000

    xen: PV on HVM: support PV spinlocks and IPIs

This change replaced the SMP operations with event-based handlers without
taking into account that this only works when the hypervisor supports
callback vectors. This causes unexplained hangs early during boot for
HVM guests with more than one CPU.

BugLink: http://bugs.launchpad.net/bugs/791850

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 arch/x86/xen/smp.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index b4533a8..e79dbb9 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -521,8 +521,6 @@ static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus)
 	native_smp_prepare_cpus(max_cpus);
 	WARN_ON(xen_smp_intr_init(0));
 
-	if (!xen_have_vector_callback)
-		return;
 	xen_init_lock_cpu(0);
 	xen_init_spinlocks();
 }
@@ -546,6 +544,8 @@ static void xen_hvm_cpu_die(unsigned int cpu)
 
 void __init xen_hvm_smp_init(void)
 {
+	if (!xen_have_vector_callback)
+		return;
 	smp_ops.smp_prepare_cpus = xen_hvm_smp_prepare_cpus;
 	smp_ops.smp_send_reschedule = xen_smp_send_reschedule;
 	smp_ops.cpu_up = xen_hvm_cpu_up;
-- 
1.7.4.1


[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Kernel 2.6.39+ hangs when running as HVM guest under Xen
  2011-08-17 13:15 ` Stefan Bader
@ 2011-08-17 13:25   ` Stefan Bader
  2011-08-17 13:37     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Bader @ 2011-08-17 13:25 UTC (permalink / raw)
  To: xen-devel

On 17.08.2011 15:15, Stefan Bader wrote:
> So after a bit more help from Stefano, a fix for that could be this one:
> 
>>From 8e6c2f27782859b657faef508c6b56c2068af533 Mon Sep 17 00:00:00 2001
> From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Date: Wed, 17 Aug 2011 10:10:59 +0200
> Subject: [PATCH] UBUNTU: (upstream) xen: Do not enable PV IPIs when vector
> callback not present

Doh, no need for that upstream... s/UBUNTU: (upstream) //

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Kernel 2.6.39+ hangs when running as HVM guest under Xen
  2011-08-17 13:25   ` Stefan Bader
@ 2011-08-17 13:37     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 7+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-08-17 13:37 UTC (permalink / raw)
  To: Stefan Bader; +Cc: xen-devel

On Wed, Aug 17, 2011 at 03:25:54PM +0200, Stefan Bader wrote:
> On 17.08.2011 15:15, Stefan Bader wrote:
> > So after a bit more help from Stefano, a fix for that could be this one:
> > 
> >>From 8e6c2f27782859b657faef508c6b56c2068af533 Mon Sep 17 00:00:00 2001
> > From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Date: Wed, 17 Aug 2011 10:10:59 +0200
> > Subject: [PATCH] UBUNTU: (upstream) xen: Do not enable PV IPIs when vector
> > callback not present
> 
> Doh, no need for that upstream... s/UBUNTU: (upstream) //

/me nods. Queued up for 3.1 and for stable

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-08-17 13:37 UTC | newest]

Thread overview: 7+ messages
-- links below jump to the message on this page --
2011-08-04 12:59 Kernel 2.6.39+ hangs when running as HVM guest under Xen Stefan Bader
2011-08-09  2:38 ` Konrad Rzeszutek Wilk
2011-08-09 14:54   ` Stefan Bader
2011-08-10 14:40     ` Stefan Bader
2011-08-17 13:15 ` Stefan Bader
2011-08-17 13:25   ` Stefan Bader
2011-08-17 13:37     ` Konrad Rzeszutek Wilk
