xen-devel.lists.xenproject.org archive mirror
* 4.2.3 - xen: vector 0x2 is not implemented
@ 2013-10-07 23:50 Steven Haigh
  2013-10-08 12:42 ` Ian Campbell
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Haigh @ 2013-10-07 23:50 UTC (permalink / raw)
  To: xen-devel



Hi all,

I've had a report of a host starting to give this output:
Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
(t=273561 jiffies g=17919 c=17918 q=9688)
Oct 7 19:36:37 kernel: sending NMI to all CPUs:
Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented

Kernel is 3.11.2.
Xen is 4.2.3.

I've seen various random reports of this, but never a follow-up on what
the issue turned out to be or how it was fixed. The latest was on the ARM
platform, but that may be a different issue.

Has anyone come across these before and found a solution?

Xen 4.2.2 seems unaffected.

Bug Report:
http://xen.crc.id.au/bugs/view.php?id=21

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: 4.2.3 - xen: vector 0x2 is not implemented
  2013-10-07 23:50 4.2.3 - xen: vector 0x2 is not implemented Steven Haigh
@ 2013-10-08 12:42 ` Ian Campbell
  2013-10-08 14:31   ` Boris Ostrovsky
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Campbell @ 2013-10-08 12:42 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel

On Tue, 2013-10-08 at 10:50 +1100, Steven Haigh wrote:
> Hi all,
> 
> I've had a report of a host starting to give this output:
> Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
> (t=273561 jiffies g=17919 c=17918 q=9688)
> Oct 7 19:36:37 kernel: sending NMI to all CPUs:
> Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented

This bit is just a symptom triggered by the initial rcu stall which is
your real problem.

That said I thought the NMI thing was fixed recently, which might have
gotten you better debugging on the rcu problem.

It's a kernel issue rather than a Xen one BTW.

Ian.

> 
> Kernel is 3.11.2.
> Xen is 4.2.3.
> 
> I've seen various random reports of this, but never a follow up on what
> seemed to be the issue or a fix. The latest was on the ARM platform, but
> may be a different issue.
> 
> Has anyone come across these before and found a solution?
> 
> Xen 4.2.2 seems unaffected.
> 
> Bug Report:
> http://xen.crc.id.au/bugs/view.php?id=21
> 

* Re: 4.2.3 - xen: vector 0x2 is not implemented
  2013-10-08 12:42 ` Ian Campbell
@ 2013-10-08 14:31   ` Boris Ostrovsky
  2013-10-08 14:41     ` Steven Haigh
  0 siblings, 1 reply; 9+ messages in thread
From: Boris Ostrovsky @ 2013-10-08 14:31 UTC (permalink / raw)
  To: Ian Campbell, Steven Haigh; +Cc: xen-devel

On 10/08/2013 08:42 AM, Ian Campbell wrote:
> On Tue, 2013-10-08 at 10:50 +1100, Steven Haigh wrote:
>> Hi all,
>>
>> I've had a report of a host starting to give this output:
>> Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
>> (t=273561 jiffies g=17919 c=17918 q=9688)
>> Oct 7 19:36:37 kernel: sending NMI to all CPUs:
>> Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented
> This bit is just a symptom triggered by the initial rcu stall which is
> your real problem.
>
> That said I thought the NMI thing was fixed recently, which might have
> gotten you better debugging on the rcu problem.

This went into v3.12-rc1 (commit 6efa20e).

Steven, can you try your test with newer kernels that have this fix? With
it we should be able to see where the stall is happening.

-boris

>
> It's a kernel issue rather than a Xen one BTW.
>
> Ian.
>
>> Kernel is 3.11.2.
>> Xen is 4.2.3.
>>
>> I've seen various random reports of this, but never a follow up on what
>> seemed to be the issue or a fix. The latest was on the ARM platform, but
>> may be a different issue.
>>
>> Has anyone come across these before and found a solution?
>>
>> Xen 4.2.2 seems unaffected.
>>
>> Bug Report:
>> http://xen.crc.id.au/bugs/view.php?id=21
>>

* Re: 4.2.3 - xen: vector 0x2 is not implemented
  2013-10-08 14:31   ` Boris Ostrovsky
@ 2013-10-08 14:41     ` Steven Haigh
  2013-10-08 15:56       ` Boris Ostrovsky
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Haigh @ 2013-10-08 14:41 UTC (permalink / raw)
  To: xen-devel



On 10/09/2013 01:31 AM, Boris Ostrovsky wrote:
> On 10/08/2013 08:42 AM, Ian Campbell wrote:
>> On Tue, 2013-10-08 at 10:50 +1100, Steven Haigh wrote:
>>> Hi all,
>>>
>>> I've had a report of a host starting to give this output:
>>> Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
>>> (t=273561 jiffies g=17919 c=17918 q=9688)
>>> Oct 7 19:36:37 kernel: sending NMI to all CPUs:
>>> Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented
>> This bit is just a symptom triggered by the initial rcu stall which is
>> your real problem.
>>
>> That said I thought the NMI thing was fixed recently, which might have
>> gotten you better debugging on the rcu problem.
> 
> This went into v3.12-rc1 (commit 6efa20e).
> 
> Steven, can you try your test with newer kernels that have this fix? With
> it we should be able to see where the stall is happening.

Sadly I can't easily rebuild this to 3.12. Is it possible to get a
version / patch / commit that will work with 3.11.x?



* Re: 4.2.3 - xen: vector 0x2 is not implemented
  2013-10-08 14:41     ` Steven Haigh
@ 2013-10-08 15:56       ` Boris Ostrovsky
  2013-10-09  0:25         ` Steven Haigh
  0 siblings, 1 reply; 9+ messages in thread
From: Boris Ostrovsky @ 2013-10-08 15:56 UTC (permalink / raw)
  To: Steven Haigh; +Cc: xen-devel

On 10/08/2013 10:41 AM, Steven Haigh wrote:
> On 10/09/2013 01:31 AM, Boris Ostrovsky wrote:
>> On 10/08/2013 08:42 AM, Ian Campbell wrote:
>>> On Tue, 2013-10-08 at 10:50 +1100, Steven Haigh wrote:
>>>> Hi all,
>>>>
>>>> I've had a report of a host starting to give this output:
>>>> Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
>>>> (t=273561 jiffies g=17919 c=17918 q=9688)
>>>> Oct 7 19:36:37 kernel: sending NMI to all CPUs:
>>>> Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented
>>> This bit is just a symptom triggered by the initial rcu stall which is
>>> your real problem.
>>>
>>> That said I thought the NMI thing was fixed recently, which might have
>>> gotten you better debugging on the rcu problem.
>> This went into v3.12-rc1 (commit 6efa20e).
>>
>> Steven, can you try your test with newer kernels that have this fix? With
>> it we should be able to see where the stall is happening.
> Sadly I can't easily rebuild this to 3.12. Is it possible to get a
> version / patch / commit that will work with 3.11.x?

I am not sure I understand what you are asking for. A backport of the
patch to 3.11.x so that you apply it on top of your sources and build it
yourself?

-boris


* Re: 4.2.3 - xen: vector 0x2 is not implemented
  2013-10-08 15:56       ` Boris Ostrovsky
@ 2013-10-09  0:25         ` Steven Haigh
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Haigh @ 2013-10-09  0:25 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: xen-devel



On 9/10/2013 2:56 AM, Boris Ostrovsky wrote:
> On 10/08/2013 10:41 AM, Steven Haigh wrote:
>> On 10/09/2013 01:31 AM, Boris Ostrovsky wrote:
>>> On 10/08/2013 08:42 AM, Ian Campbell wrote:
>>>> On Tue, 2013-10-08 at 10:50 +1100, Steven Haigh wrote:
>>>>> Hi all,
>>>>>
>>>>> I've had a report of a host starting to give this output:
>>>>> Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
>>>>> (t=273561 jiffies g=17919 c=17918 q=9688)
>>>>> Oct 7 19:36:37 kernel: sending NMI to all CPUs:
>>>>> Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented
>>>> This bit is just a symptom triggered by the initial rcu stall which is
>>>> your real problem.
>>>>
>>>> That said I thought the NMI thing was fixed recently, which might have
>>>> gotten you better debugging on the rcu problem.
>>> This went into v3.12-rc1 (commit 6efa20e).
>>>
>>> Steven, can you try your test with newer kernels that have this fix? With
>>> it we should be able to see where the stall is happening.
>> Sadly I can't easily rebuild this to 3.12. Is it possible to get a
>> version / patch / commit that will work with 3.11.x?
> 
> I am not sure I understand what you are asking for. A backport of the
> patch to 3.11.x so that you apply it on top of your sources and build it
> yourself?

Correct. I have a buildroot set up for building 3.11.x into RPMs that
would take a lot of work to change to 3.12. As this is a problem with
3.11 as well (as I don't run or provide 3.12 anywhere), I'd like to test
the fix on 3.11.

Eventually, it will need to be fixed in the 3.11 series as well.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



* Re: 4.2.3 - xen: vector 0x2 is not implemented
@ 2013-10-09  1:53 Boris Ostrovsky
  2013-10-09  1:56 ` Steven Haigh
  0 siblings, 1 reply; 9+ messages in thread
From: Boris Ostrovsky @ 2013-10-09  1:53 UTC (permalink / raw)
  To: netwiz; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2008 bytes --]


----- netwiz@crc.id.au wrote:

> On 9/10/2013 2:56 AM, Boris Ostrovsky wrote:
> > On 10/08/2013 10:41 AM, Steven Haigh wrote:
> >> On 10/09/2013 01:31 AM, Boris Ostrovsky wrote:
> >>> On 10/08/2013 08:42 AM, Ian Campbell wrote:
> >>>> On Tue, 2013-10-08 at 10:50 +1100, Steven Haigh wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I've had a report of a host starting to give this output:
> >>>>> Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
> >>>>> (t=273561 jiffies g=17919 c=17918 q=9688)
> >>>>> Oct 7 19:36:37 kernel: sending NMI to all CPUs:
> >>>>> Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented
> >>>> This bit is just a symptom triggered by the initial rcu stall which is
> >>>> your real problem.
> >>>>
> >>>> That said I thought the NMI thing was fixed recently, which might have
> >>>> gotten you better debugging on the rcu problem.
> >>> This went into v3.12-rc1 (commit 6efa20e).
> >>>
> >>> Steven, can you try your test with newer kernels that have this fix? With
> >>> it we should be able to see where the stall is happening.
> >> Sadly I can't easily rebuild this to 3.12. Is it possible to get a
> >> version / patch / commit that will work with 3.11.x?
> > 
> > I am not sure I understand what you are asking for. A backport of the
> > patch to 3.11.x so that you apply it on top of your sources and build it
> > yourself?
> 
> Correct. I have a buildroot set up for building 3.11.x into RPMs that
> would take a lot of work to change to 3.12. As this is a problem with
> 3.11 as well (as I don't run or provide 3.12 anywhere), I'd like to test
> the fix on 3.11.

I am attaching the patch for 3.11.4 (which is exactly the same as the one
that went into 3.12 btw). I only compile-tested it to make sure that it builds.

> 
> Eventually, it will need to be fixed in the 3.11 series as well.
> 

It's unlikely to go into 3.11 since it's really a new feature and not a bug fix.


-boris

[-- Attachment #2: nmi_3.11.4.patch --]
[-- Type: text/plain, Size: 4507 bytes --]

diff --git a/arch/x86/include/asm/xen/events.h b/arch/x86/include/asm/xen/events.h
index ca842f2..608a79d 100644
--- a/arch/x86/include/asm/xen/events.h
+++ b/arch/x86/include/asm/xen/events.h
@@ -7,6 +7,7 @@ enum ipi_vector {
 	XEN_CALL_FUNCTION_SINGLE_VECTOR,
 	XEN_SPIN_UNLOCK_VECTOR,
 	XEN_IRQ_WORK_VECTOR,
+	XEN_NMI_VECTOR,
 
 	XEN_NR_IPIS,
 };
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 193097e..b5a22fa 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -427,8 +427,7 @@ static void __init xen_init_cpuid_mask(void)
 
 	if (!xen_initial_domain())
 		cpuid_leaf1_edx_mask &=
-			~((1 << X86_FEATURE_APIC) |  /* disable local APIC */
-			  (1 << X86_FEATURE_ACPI));  /* disable ACPI */
+			~((1 << X86_FEATURE_ACPI));  /* disable ACPI */
 
 	cpuid_leaf1_ecx_mask &= ~(1 << (X86_FEATURE_X2APIC % 32));
 
@@ -735,8 +734,7 @@ static int cvt_gate_to_trap(int vector, const gate_desc *val,
 		addr = (unsigned long)xen_int3;
 	else if (addr == (unsigned long)stack_segment)
 		addr = (unsigned long)xen_stack_segment;
-	else if (addr == (unsigned long)double_fault ||
-		 addr == (unsigned long)nmi) {
+	else if (addr == (unsigned long)double_fault) {
 		/* Don't need to handle these */
 		return 0;
 #ifdef CONFIG_X86_MCE
@@ -747,7 +745,12 @@ static int cvt_gate_to_trap(int vector, const gate_desc *val,
 		 */
 		;
 #endif
-	} else {
+	} else if (addr == (unsigned long)nmi)
+		/*
+		 * Use the native version as well.
+		 */
+		;
+	else {
 		/* Some other trap using IST? */
 		if (WARN_ON(val->ist != 0))
 			return 0;
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 8f3eea6..0caa7af 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -33,6 +33,9 @@
 /* These are code, but not functions.  Defined in entry.S */
 extern const char xen_hypervisor_callback[];
 extern const char xen_failsafe_callback[];
+#ifdef CONFIG_X86_64
+extern const char nmi[];
+#endif
 extern void xen_sysenter_target(void);
 extern void xen_syscall_target(void);
 extern void xen_syscall32_target(void);
@@ -547,7 +550,13 @@ void xen_enable_syscall(void)
 	}
 #endif /* CONFIG_X86_64 */
 }
-
+void __cpuinit xen_enable_nmi(void)
+{
+#ifdef CONFIG_X86_64
+	if (register_callback(CALLBACKTYPE_nmi, nmi))
+		BUG();
+#endif
+}
 void __init xen_arch_setup(void)
 {
 	xen_panic_handler_init();
@@ -565,7 +574,7 @@ void __init xen_arch_setup(void)
 
 	xen_enable_sysenter();
 	xen_enable_syscall();
-
+	xen_enable_nmi();
 #ifdef CONFIG_ACPI
 	if (!(xen_start_info->flags & SIF_INITDOMAIN)) {
 		printk(KERN_INFO "ACPI in unprivileged domain disabled\n");
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index b81c88e..1e2580a 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -572,6 +572,12 @@ static inline int xen_map_vector(int vector)
 	case IRQ_WORK_VECTOR:
 		xen_vector = XEN_IRQ_WORK_VECTOR;
 		break;
+#ifdef CONFIG_X86_64
+	case NMI_VECTOR:
+	case APIC_DM_NMI: /* Some use that instead of NMI_VECTOR */
+		xen_vector = XEN_NMI_VECTOR;
+		break;
+#endif
 	default:
 		xen_vector = -1;
 		printk(KERN_ERR "xen: vector 0x%x is not implemented\n",
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 5e8be46..387f3f4 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -56,6 +56,7 @@
 #include <xen/interface/hvm/params.h>
 #include <xen/interface/physdev.h>
 #include <xen/interface/sched.h>
+#include <xen/interface/vcpu.h>
 #include <asm/hw_irq.h>
 
 /*
@@ -1212,7 +1213,15 @@ EXPORT_SYMBOL_GPL(evtchn_put);
 
 void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector)
 {
-	int irq = per_cpu(ipi_to_irq, cpu)[vector];
+	int irq;
+
+	if (unlikely(vector == XEN_NMI_VECTOR)) {
+		int rc =  HYPERVISOR_vcpu_op(VCPUOP_send_nmi, cpu, NULL);
+		if (rc < 0)
+			printk(KERN_WARNING "Sending nmi to CPU%d failed (rc:%d)\n", cpu, rc);
+		return;
+	}
+	irq = per_cpu(ipi_to_irq, cpu)[vector];
 	BUG_ON(irq < 0);
 	notify_remote_via_irq(irq);
 }
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index 87e6f8a..b05288c 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -170,4 +170,6 @@ struct vcpu_register_vcpu_info {
 };
 DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
 
+/* Send an NMI to the specified VCPU. @extra_arg == NULL. */
+#define VCPUOP_send_nmi             11
 #endif /* __XEN_PUBLIC_VCPU_H__ */


* Re: 4.2.3 - xen: vector 0x2 is not implemented
  2013-10-09  1:53 Boris Ostrovsky
@ 2013-10-09  1:56 ` Steven Haigh
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Haigh @ 2013-10-09  1:56 UTC (permalink / raw)
  To: Boris Ostrovsky; +Cc: xen-devel



On 9/10/2013 12:53 PM, Boris Ostrovsky wrote:
> 
> ----- netwiz@crc.id.au wrote:
> 
>> On 9/10/2013 2:56 AM, Boris Ostrovsky wrote:
>>> On 10/08/2013 10:41 AM, Steven Haigh wrote:
>>>> On 10/09/2013 01:31 AM, Boris Ostrovsky wrote:
>>>>> On 10/08/2013 08:42 AM, Ian Campbell wrote:
>>>>>> On Tue, 2013-10-08 at 10:50 +1100, Steven Haigh wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've had a report of a host starting to give this output:
>>>>>>> Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
>>>>>>> (t=273561 jiffies g=17919 c=17918 q=9688)
>>>>>>> Oct 7 19:36:37 kernel: sending NMI to all CPUs:
>>>>>>> Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented
>>>>>> This bit is just a symptom triggered by the initial rcu stall which is
>>>>>> your real problem.
>>>>>>
>>>>>> That said I thought the NMI thing was fixed recently, which might have
>>>>>> gotten you better debugging on the rcu problem.
>>>>> This went into v3.12-rc1 (commit 6efa20e).
>>>>>
>>>>> Steven, can you try your test with newer kernels that have this fix? With
>>>>> it we should be able to see where the stall is happening.
>>>> Sadly I can't easily rebuild this to 3.12. Is it possible to get a
>>>> version / patch / commit that will work with 3.11.x?
>>>
>>> I am not sure I understand what you are asking for. A backport of the
>>> patch to 3.11.x so that you apply it on top of your sources and build it
>>> yourself?
>>
>> Correct. I have a buildroot set up for building 3.11.x into RPMs that
>> would take a lot of work to change to 3.12. As this is a problem with
>> 3.11 as well (as I don't run or provide 3.12 anywhere), I'd like to test
>> the fix on 3.11.
> 
> I am attaching the patch to 3.11.4 (which is exactly the same as the one
> that went into 3.12 btw). I only compile-tested it to make sure that it builds.
> 
>>
>> Eventually, it will need to be fixed in the 3.11 series as well.
>>
> 
> It's unlikely to go into 3.11 since it's really a new feature and not a bug fix.

Thanks, I'll test this and see what I can turn up. The problems are
happening in 3.11.2 at this stage: the system becomes unresponsive with
a load of 45+, and network traffic stops.

As such, it should probably be looked at as a bugfix for 3.11. I'll try
to get further results and report back when I can provide more feedback.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299



* Re: 4.2.3 - xen: vector 0x2 is not implemented
@ 2013-10-09  2:15 Boris Ostrovsky
  0 siblings, 0 replies; 9+ messages in thread
From: Boris Ostrovsky @ 2013-10-09  2:15 UTC (permalink / raw)
  To: netwiz; +Cc: xen-devel


----- netwiz@crc.id.au wrote:

> On 9/10/2013 12:53 PM, Boris Ostrovsky wrote:
> > 
> > ----- netwiz@crc.id.au wrote:
> > 
> >> On 9/10/2013 2:56 AM, Boris Ostrovsky wrote:
> >>> On 10/08/2013 10:41 AM, Steven Haigh wrote:
> >>>> On 10/09/2013 01:31 AM, Boris Ostrovsky wrote:
> >>>>> On 10/08/2013 08:42 AM, Ian Campbell wrote:
> >>>>>> On Tue, 2013-10-08 at 10:50 +1100, Steven Haigh wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I've had a report of a host starting to give this output:
> >>>>>>> Oct 7 19:36:37 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
> >>>>>>> (t=273561 jiffies g=17919 c=17918 q=9688)
> >>>>>>> Oct 7 19:36:37 kernel: sending NMI to all CPUs:
> >>>>>>> Oct 7 19:36:37 kernel: xen: vector 0x2 is not implemented
> >>>>>> This bit is just a symptom triggered by the initial rcu stall which is
> >>>>>> your real problem.
> >>>>>>
> >>>>>> That said I thought the NMI thing was fixed recently, which might have
> >>>>>> gotten you better debugging on the rcu problem.
> >>>>> This went into v3.12-rc1 (commit 6efa20e).
> >>>>>
> >>>>> Steven, can you try your test with newer kernels that have this fix? With
> >>>>> it we should be able to see where the stall is happening.
> >>>> Sadly I can't easily rebuild this to 3.12. Is it possible to get a
> >>>> version / patch / commit that will work with 3.11.x?
> >>>
> >>> I am not sure I understand what you are asking for. A backport of the
> >>> patch to 3.11.x so that you apply it on top of your sources and build it
> >>> yourself?
> >>
> >> Correct. I have a buildroot set up for building 3.11.x into RPMs that
> >> would take a lot of work to change to 3.12. As this is a problem with
> >> 3.11 as well (as I don't run or provide 3.12 anywhere), I'd like to test
> >> the fix on 3.11.
> > 
> > I am attaching the patch to 3.11.4 (which is exactly the same as the one
> > that went into 3.12 btw). I only compile-tested it to make sure that it builds.
> > 
> >>
> >> Eventually, it will need to be fixed in the 3.11 series as well.
> >>
> > 
> > It's unlikely to go into 3.11 since it's really a new feature and not a bug fix.
> 
> Thanks, I'll test this and see what I can turn up. The problems are
> happening in 3.11.2 at this stage: the system becomes unresponsive with
> a load of 45+, and network traffic stops.
> 
> As such, it should probably be looked at as a bugfix for 3.11. 

Just to be clear: this patch will not make your hang go away. It is a way
for the kernel to produce a stack trace that will (hopefully) show where
the hang is. So don't get your hopes up that it will fix your problem ;-)


-boris


> Will try to get further results and report back when I can provide more
> feedback.
