xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: correct socket_cpumask allocation for AP
@ 2015-07-08  9:36 Chao Peng
  2015-07-08 12:38 ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Peng @ 2015-07-08  9:36 UTC (permalink / raw)
  To: xen-devel; +Cc: andrew.cooper3, keir, JBeulich

For AP, phys_proc_id is still not valid in CPU_PREPARE notifier
(cpu_smpboot_alloc), so cpu_to_socket(cpu) is not valid as well.

Introduce a pre-allocated secondary_cpu_mask so that later in
smp_store_cpu_info() socket_cpumask[socket] can consume it.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
This is targeted for staging branch.
I tested on a 2-sockets machine and looks fine.
---
 xen/arch/x86/smpboot.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index c73aa1b..49b8497 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -62,6 +62,7 @@ EXPORT_SYMBOL(cpu_online_map);
 
 unsigned int __read_mostly nr_sockets;
 cpumask_var_t *__read_mostly socket_cpumask;
+static cpumask_var_t secondary_socket_cpumask;
 
 struct cpuinfo_x86 cpu_data[NR_CPUS];
 
@@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
 static void smp_store_cpu_info(int id)
 {
     struct cpuinfo_x86 *c = cpu_data + id;
+    unsigned int socket;
 
     *c = boot_cpu_data;
     if ( id != 0 )
+    {
         identify_cpu(c);
 
+        socket = cpu_to_socket(id);
+        if ( !socket_cpumask[socket] )
+        {
+            socket_cpumask[socket] = secondary_socket_cpumask;
+            secondary_socket_cpumask = NULL;
+        }
+    }
+
     /*
      * Certain Athlons might work (for various values of 'work') in SMP
      * but they are not certified as MP capable.
@@ -705,7 +716,6 @@ static int cpu_smpboot_alloc(unsigned int cpu)
     nodeid_t node = cpu_to_node(cpu);
     struct desc_struct *gdt;
     unsigned long stub_page;
-    unsigned int socket = cpu_to_socket(cpu);
 
     if ( node != NUMA_NO_NODE )
         memflags = MEMF_node(node);
@@ -748,8 +758,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
         goto oom;
     per_cpu(stubs.addr, cpu) = stub_page + STUB_BUF_CPU_OFFS(cpu);
 
-    if ( !socket_cpumask[socket] &&
-         !zalloc_cpumask_var(socket_cpumask + socket) )
+    if ( !secondary_socket_cpumask &&
+         !zalloc_cpumask_var(&secondary_socket_cpumask) )
         goto oom;
 
     if ( zalloc_cpumask_var(&per_cpu(cpu_sibling_mask, cpu)) &&
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08  9:36 [PATCH] x86: correct socket_cpumask allocation for AP Chao Peng
@ 2015-07-08 12:38 ` Jan Beulich
  2015-07-08 15:11   ` Dario Faggioli
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2015-07-08 12:38 UTC (permalink / raw)
  To: Chao Peng; +Cc: andrew.cooper3, keir, xen-devel

>>> On 08.07.15 at 11:36, <chao.p.peng@linux.intel.com> wrote:
> @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
>  static void smp_store_cpu_info(int id)
>  {
>      struct cpuinfo_x86 *c = cpu_data + id;
> +    unsigned int socket;
>  
>      *c = boot_cpu_data;
>      if ( id != 0 )
> +    {
>          identify_cpu(c);
>  
> +        socket = cpu_to_socket(id);
> +        if ( !socket_cpumask[socket] )
> +        {
> +            socket_cpumask[socket] = secondary_socket_cpumask;
> +            secondary_socket_cpumask = NULL;

I don't think this will build with small enough NR_CPUS. Which
raises the question whether the use of cpumask_var_t is suitable
here in the first place.

Jan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08 12:38 ` Jan Beulich
@ 2015-07-08 15:11   ` Dario Faggioli
  2015-07-08 15:38     ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Dario Faggioli @ 2015-07-08 15:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Chao Peng, xen-devel, keir, andrew.cooper3


[-- Attachment #1.1: Type: text/plain, Size: 5234 bytes --]

On Wed, 2015-07-08 at 13:38 +0100, Jan Beulich wrote:
> >>> On 08.07.15 at 11:36, <chao.p.peng@linux.intel.com> wrote:
> > @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
> >  static void smp_store_cpu_info(int id)
> >  {
> >      struct cpuinfo_x86 *c = cpu_data + id;
> > +    unsigned int socket;
> >  
> >      *c = boot_cpu_data;
> >      if ( id != 0 )
> > +    {
> >          identify_cpu(c);
> >  
> > +        socket = cpu_to_socket(id);
> > +        if ( !socket_cpumask[socket] )
> > +        {
> > +            socket_cpumask[socket] = secondary_socket_cpumask;
> > +            secondary_socket_cpumask = NULL;
> 
> I don't think this will build with small enough NR_CPUS.
>
And it *does* *not* fix the issue on my box.

In fact, even with this applied, I see the following splat, on CPU #0,
as you can see:

...
(XEN) I/O virtualisation enabled
(XEN) Interrupt remapping enabled
(XEN) CPU0: Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz stepping 02
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08018a52a>] set_cpu_sibling_map+0x65/0x396
(XEN) RFLAGS: 0000000000010287   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff82d080332240   rcx: 0000000000000000
(XEN) rdx: 0000000000000001   rsi: 000000000000000a   rdi: ffff82d08029b768
(XEN) rbp: ffff82d0802efe28   rsp: ffff82d0802efdd8   r8:  ffff82d0802d5c40
(XEN) r9:  0000000000000001   r10: 0000000000000002   r11: 0000000000000001
(XEN) r12: 0000000000000000   r13: ffff82d0803321e0   r14: ffff82d0803321e0
(XEN) r15: 000000000000000f   cr0: 000000008005003b   cr4: 00000000000006e0
(XEN) cr3: 00000000dba9a000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82d0802efdd8:
(XEN)    ffff82d0802efe28 0000000000000020 ffff8303206c3610 ffff82d000000000
(XEN)    ffff82d0803321e0 ffff82d080332240 ffff8303206c3610 ffff82d0803321e0
(XEN)    ffff82d0803321e0 000000000000000f ffff82d0802efe48 ffff82d0802c14b3
(XEN)    ffff82d0802e8000 ffff82d0802e8000 ffff82d0802eff08 ffff82d0802c0d5a
(XEN)    0000000000000000 ffff82d0802dd900 0000000001478000 0000000000324000
(XEN)    0080016300000000 0000000001478001 0000000000000000 0000000000000015
(XEN)    ffff830000079c80 ffff830000079ee0 0000000100000002 ffff830000079fb0
(XEN)    0000000800000000 000000010000006e 0000000000000003 00000000000002f8
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffff82d080100073
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8300dbdf4000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08018a52a>] set_cpu_sibling_map+0x65/0x396
(XEN)    [<ffff82d0802c14b3>] smp_prepare_cpus+0x12e/0x24f
(XEN)    [<ffff82d0802c0d5a>] __start_xen+0x1ff3/0x2536
(XEN)    [<ffff82d080100073>] __high_start+0x53/0x55
(XEN) 
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 0000000320734063 ffffffffffffffff
(XEN)  L3[0x000] = 0000000320733063 ffffffffffffffff
(XEN)  L2[0x000] = 0000000320732063 ffffffffffffffff 
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN) 
Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
> 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
...

The configuration of the testbox in question is as follows:

root@Zhaman:~# xl info -n
...
machine                : x86_64
nr_cpus                : 16
max_cpu_id             : 63
nr_nodes               : 2
cores_per_socket       : 4
threads_per_core       : 2
...
cpu_topology           :
cpu:    core    socket     node
  0:       0        1        0
  1:       0        1        0
  2:       1        1        0
  3:       1        1        0
  4:       9        1        0
  5:       9        1        0
  6:      10        1        0
  7:      10        1        0
  8:       0        0        1
  9:       0        0        1
 10:       1        0        1
 11:       1        0        1
 12:       9        0        1
 13:       9        0        1
 14:      10        0        1
 15:      10        0        1

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08 15:11   ` Dario Faggioli
@ 2015-07-08 15:38     ` Jan Beulich
  2015-07-08 15:45       ` Boris Ostrovsky
                         ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jan Beulich @ 2015-07-08 15:38 UTC (permalink / raw)
  To: Dario Faggioli, Chao Peng; +Cc: andrew.cooper3, keir, xen-devel

>>> On 08.07.15 at 17:11, <dario.faggioli@citrix.com> wrote:
> On Wed, 2015-07-08 at 13:38 +0100, Jan Beulich wrote:
>> >>> On 08.07.15 at 11:36, <chao.p.peng@linux.intel.com> wrote:
>> > @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
>> >  static void smp_store_cpu_info(int id)
>> >  {
>> >      struct cpuinfo_x86 *c = cpu_data + id;
>> > +    unsigned int socket;
>> >  
>> >      *c = boot_cpu_data;
>> >      if ( id != 0 )
>> > +    {
>> >          identify_cpu(c);
>> >  
>> > +        socket = cpu_to_socket(id);
>> > +        if ( !socket_cpumask[socket] )
>> > +        {
>> > +            socket_cpumask[socket] = secondary_socket_cpumask;
>> > +            secondary_socket_cpumask = NULL;
>> 
>> I don't think this will build with small enough NR_CPUS.
>>
> And it *does* *not* fix the issue on my box.

I.e. bad analysis (albeit it seemed correct to me) _and_ new code
not tested. Chao, one more try, or we'll have to revert the whole
series (closing the window for it for 4.6).

Jan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08 15:38     ` Jan Beulich
@ 2015-07-08 15:45       ` Boris Ostrovsky
  2015-07-08 16:17       ` Dario Faggioli
  2015-07-09  1:47       ` Chao Peng
  2 siblings, 0 replies; 11+ messages in thread
From: Boris Ostrovsky @ 2015-07-08 15:45 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli, Chao Peng; +Cc: andrew.cooper3, keir, xen-devel

On 07/08/2015 11:38 AM, Jan Beulich wrote:
>>>> On 08.07.15 at 17:11, <dario.faggioli@citrix.com> wrote:
>> On Wed, 2015-07-08 at 13:38 +0100, Jan Beulich wrote:
>>>>>> On 08.07.15 at 11:36, <chao.p.peng@linux.intel.com> wrote:
>>>> @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
>>>>   static void smp_store_cpu_info(int id)
>>>>   {
>>>>       struct cpuinfo_x86 *c = cpu_data + id;
>>>> +    unsigned int socket;
>>>>   
>>>>       *c = boot_cpu_data;
>>>>       if ( id != 0 )
>>>> +    {
>>>>           identify_cpu(c);
>>>>   
>>>> +        socket = cpu_to_socket(id);
>>>> +        if ( !socket_cpumask[socket] )
>>>> +        {
>>>> +            socket_cpumask[socket] = secondary_socket_cpumask;
>>>> +            secondary_socket_cpumask = NULL;
>>> I don't think this will build with small enough NR_CPUS.
>>>
>> And it *does* *not* fix the issue on my box.
> I.e. bad analysis (albeit it seemed correct to me) _and_ new code
> not tested. Chao, one more try, or we'll have to revert the whole
> series (closing the window for it for 4.6).

FWIW, this fixes it for me:

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index c73aa1b..a93c723 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -707,6 +707,8 @@ static int cpu_smpboot_alloc(unsigned int cpu)
      unsigned long stub_page;
      unsigned int socket = cpu_to_socket(cpu);

+    socket = cpu >> 4;
+
      if ( node != NUMA_NO_NODE )
          memflags = MEMF_node(node);



But I suspect this is not what you want ;-)

-boris

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08 15:38     ` Jan Beulich
  2015-07-08 15:45       ` Boris Ostrovsky
@ 2015-07-08 16:17       ` Dario Faggioli
  2015-07-08 16:32         ` Jan Beulich
  2015-07-08 18:24         ` Boris Ostrovsky
  2015-07-09  1:47       ` Chao Peng
  2 siblings, 2 replies; 11+ messages in thread
From: Dario Faggioli @ 2015-07-08 16:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Chao Peng, Boris Ostrovsky, xen-devel, keir, andrew.cooper3


[-- Attachment #1.1: Type: text/plain, Size: 2107 bytes --]

On Wed, 2015-07-08 at 16:38 +0100, Jan Beulich wrote:
> >>> On 08.07.15 at 17:11, <dario.faggioli@citrix.com> wrote:
> > On Wed, 2015-07-08 at 13:38 +0100, Jan Beulich wrote:
> >> >>> On 08.07.15 at 11:36, <chao.p.peng@linux.intel.com> wrote:
> >> > @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
> >> >  static void smp_store_cpu_info(int id)
> >> >  {
> >> >      struct cpuinfo_x86 *c = cpu_data + id;
> >> > +    unsigned int socket;
> >> >  
> >> >      *c = boot_cpu_data;
> >> >      if ( id != 0 )
> >> > +    {
> >> >          identify_cpu(c);
> >> >  
> >> > +        socket = cpu_to_socket(id);
> >> > +        if ( !socket_cpumask[socket] )
> >> > +        {
> >> > +            socket_cpumask[socket] = secondary_socket_cpumask;
> >> > +            secondary_socket_cpumask = NULL;
> >> 
> >> I don't think this will build with small enough NR_CPUS.
> >>
> > And it *does* *not* fix the issue on my box.
> 
> I.e. bad analysis (albeit it seemed correct to me)
>
Same here, and in fact I triple checked that I had the patch really
applied... and, yes, it is, and it's still crashing, with the same
(reported) dump as the one we find in Osstest's failure, as reported by
Ian.

>  _and_ new code not tested. 
>
Looking another time, both me and Osstest are probably seeing a
different issue, than the one Boris is facing. I don't see Boris' Oops,
so I can't be sure, but in my case, this is happening in
set_cpu_sibling_map(), called from smp_prepare_cpus() on the boot CPU,
not during secondary CPUs bringup.

I think it has to do with the fact that I've got CPU #0 on socket #1,
while Boris' (and perhaps Chao's too) test box have it on socket #0.

I'd be happy to test patches on my box, if that helps (although, I'm
about to leave right now, so that will be tomorrow).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08 16:17       ` Dario Faggioli
@ 2015-07-08 16:32         ` Jan Beulich
  2015-07-09  1:58           ` Chao Peng
  2015-07-08 18:24         ` Boris Ostrovsky
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2015-07-08 16:32 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: andrew.cooper3, Boris Ostrovsky, xen-devel, keir, Chao Peng

>>> On 08.07.15 at 18:17, <dario.faggioli@citrix.com> wrote:
> I think it has to do with the fact that I've got CPU #0 on socket #1,
> while Boris' (and perhaps Chao's too) test box have it on socket #0.

Ah, yes, this is indeed a case I didn't consider when validating
Chao's analysis.

Jan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08 16:17       ` Dario Faggioli
  2015-07-08 16:32         ` Jan Beulich
@ 2015-07-08 18:24         ` Boris Ostrovsky
  1 sibling, 0 replies; 11+ messages in thread
From: Boris Ostrovsky @ 2015-07-08 18:24 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich; +Cc: Chao Peng, xen-devel, keir, andrew.cooper3

On 07/08/2015 12:17 PM, Dario Faggioli wrote:
> On Wed, 2015-07-08 at 16:38 +0100, Jan Beulich wrote:
>>>>> On 08.07.15 at 17:11, <dario.faggioli@citrix.com> wrote:
>>> On Wed, 2015-07-08 at 13:38 +0100, Jan Beulich wrote:
>>>>>>> On 08.07.15 at 11:36, <chao.p.peng@linux.intel.com> wrote:
>>>>> @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
>>>>>   static void smp_store_cpu_info(int id)
>>>>>   {
>>>>>       struct cpuinfo_x86 *c = cpu_data + id;
>>>>> +    unsigned int socket;
>>>>>   
>>>>>       *c = boot_cpu_data;
>>>>>       if ( id != 0 )
>>>>> +    {
>>>>>           identify_cpu(c);
>>>>>   
>>>>> +        socket = cpu_to_socket(id);
>>>>> +        if ( !socket_cpumask[socket] )
>>>>> +        {
>>>>> +            socket_cpumask[socket] = secondary_socket_cpumask;
>>>>> +            secondary_socket_cpumask = NULL;
>>>> I don't think this will build with small enough NR_CPUS.
>>>>
>>> And it *does* *not* fix the issue on my box.
>> I.e. bad analysis (albeit it seemed correct to me)
>>
> Same here, and in fact I triple checked that I had the patch really
> applied... and, yes, it is, and it's still crashing, with the same
> (reported) dump as the one we find in Osstest's failure, as reported by
> Ian.
>
>>   _and_ new code not tested.
>>
> Looking another time, both me and Osstest are probably seeing a
> different issue, than the one Boris is facing. I don't see Boris' Oops,
> so I can't be sure, but in my case, this is happening in
> set_cpu_sibling_map(), called from smp_prepare_cpus() on the boot CPU,
> not during secondary CPUs bringup.

I see it from start_secondary():

...
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    16
(XEN) RIP:    e008:[<ffff82d080189051>] set_cpu_sibling_map+0x65/0x37e
(XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000010
(XEN) rdx: 0000000000000010   rsi: 00000033bc7c8400   rdi: 0000000000000010
(XEN) rbp: ffff83083be0fec0   rsp: ffff83083be0fe60   r8: ffff83083be0fe88
(XEN) r9:  0000000000014000   r10: ffff82cfffdfb0f0   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000009   r14: 0000000000000010
(XEN) r15: 0000000000000010   cr0: 000000008005003b   cr4: 00000000000426e0
(XEN) cr3: 00000000bd897000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff83083be0fe60:
(XEN)    000000103be0fec0 0000000000000046 0000001000000000 0000008400000000
(XEN)    00000000000426e0 000000103caed000 0000000000000009 0000000000000000
(XEN)    0000000000000000 0000000000000009 0000000000000010 0000000000000010
(XEN)    ffff83083be0ff10 ffff82d08018958a 0000000000000000 0000001000000000
(XEN)    0000000000000000 0000000000000001 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000010 ffff8300bdce0000 00000033bc7c8400 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d080189051>] set_cpu_sibling_map+0x65/0x37e
(XEN)    [<ffff82d08018958a>] start_secondary+0x220/0x277
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 000000043ffef063 ffffffffffffffff
(XEN)  L3[0x000] = 000000043ffee063 ffffffffffffffff
(XEN)  L2[0x000] = 000000043ffed063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 16:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.





>
> I think it has to do with the fact that I've got CPU #0 on socket #1,
> while Boris' (and perhaps Chao's too) test box have it on socket #0.
>
> I'd be happy to test patches on my box, if that helps (although, I'm
> about to leave right now, so that will be tomorrow).
>
> Regards,
> Dario

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08 15:38     ` Jan Beulich
  2015-07-08 15:45       ` Boris Ostrovsky
  2015-07-08 16:17       ` Dario Faggioli
@ 2015-07-09  1:47       ` Chao Peng
  2 siblings, 0 replies; 11+ messages in thread
From: Chao Peng @ 2015-07-09  1:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: andrew.cooper3, Dario Faggioli, keir, xen-devel

On Wed, Jul 08, 2015 at 04:38:52PM +0100, Jan Beulich wrote:
> >>> On 08.07.15 at 17:11, <dario.faggioli@citrix.com> wrote:
> > On Wed, 2015-07-08 at 13:38 +0100, Jan Beulich wrote:
> >> >>> On 08.07.15 at 11:36, <chao.p.peng@linux.intel.com> wrote:
> >> > @@ -84,11 +85,21 @@ void *stack_base[NR_CPUS];
> >> >  static void smp_store_cpu_info(int id)
> >> >  {
> >> >      struct cpuinfo_x86 *c = cpu_data + id;
> >> > +    unsigned int socket;
> >> >  
> >> >      *c = boot_cpu_data;
> >> >      if ( id != 0 )
> >> > +    {
> >> >          identify_cpu(c);
> >> >  
> >> > +        socket = cpu_to_socket(id);
> >> > +        if ( !socket_cpumask[socket] )
> >> > +        {
> >> > +            socket_cpumask[socket] = secondary_socket_cpumask;
> >> > +            secondary_socket_cpumask = NULL;
> >> 
> >> I don't think this will build with small enough NR_CPUS.
> >>
> > And it *does* *not* fix the issue on my box.
> 
> I.e. bad analysis (albeit it seemed correct to me) _and_ new code
> not tested.

I did test it on my 2-sockets box, perhaps not covered all the cases.

> Chao, one more try, or we'll have to revert the whole
> series (closing the window for it for 4.6).

I honor this chance and certainly will try my best.

Thanks Jan.

Chao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-08 16:32         ` Jan Beulich
@ 2015-07-09  1:58           ` Chao Peng
  2015-07-09  8:39             ` Dario Faggioli
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Peng @ 2015-07-09  1:58 UTC (permalink / raw)
  To: Jan Beulich
  Cc: andrew.cooper3, Dario Faggioli, keir, Boris Ostrovsky, xen-devel

On Wed, Jul 08, 2015 at 05:32:11PM +0100, Jan Beulich wrote:
> >>> On 08.07.15 at 18:17, <dario.faggioli@citrix.com> wrote:
> > I think it has to do with the fact that I've got CPU #0 on socket #1,
> > while Boris' (and perhaps Chao's too) test box have it on socket #0.
> 
> Ah, yes, this is indeed a case I didn't consider when validating
> Chao's analysis.

I apologize, this is what I have made the wrong assumption on
from the beginning. While looks the fix is simple, just change
'zalloc_cpumask_var(socket_cpumask)' to
'zalloc_cpumask_var(socket_cpumask + cpu_to_socket(0))'
for booting cpu in smp_prepare_cpus().

Chao

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] x86: correct socket_cpumask allocation for AP
  2015-07-09  1:58           ` Chao Peng
@ 2015-07-09  8:39             ` Dario Faggioli
  0 siblings, 0 replies; 11+ messages in thread
From: Dario Faggioli @ 2015-07-09  8:39 UTC (permalink / raw)
  To: Chao Peng; +Cc: andrew.cooper3, Boris Ostrovsky, keir, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1426 bytes --]

On Thu, 2015-07-09 at 09:58 +0800, Chao Peng wrote:
> On Wed, Jul 08, 2015 at 05:32:11PM +0100, Jan Beulich wrote:
> > >>> On 08.07.15 at 18:17, <dario.faggioli@citrix.com> wrote:
> > > I think it has to do with the fact that I've got CPU #0 on socket #1,
> > > while Boris' (and perhaps Chao's too) test box have it on socket #0.
> > 
> > Ah, yes, this is indeed a case I didn't consider when validating
> > Chao's analysis.
> 
> I apologize, this is what I have made the wrong assumption on
> from the beginning.
>
Yes, it's quite easy to get caught by this, I've been in similar
situation too (i.e., with code working in completely different ways
depending on whether CPU#0 was on socket 0 or not). :-)

> While looks the fix is simple, just change
> 'zalloc_cpumask_var(socket_cpumask)' to
> 'zalloc_cpumask_var(socket_cpumask + cpu_to_socket(0))'
> for booting cpu in smp_prepare_cpus().
> 
FWIW, I can confirm that doing this (on top of the already submitted
patch) makes the issue go away on my hardware.

I can't verify whether things are working properly, though, as I don't
have any CAT enabled system handy.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-07-09  8:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-07-08  9:36 [PATCH] x86: correct socket_cpumask allocation for AP Chao Peng
2015-07-08 12:38 ` Jan Beulich
2015-07-08 15:11   ` Dario Faggioli
2015-07-08 15:38     ` Jan Beulich
2015-07-08 15:45       ` Boris Ostrovsky
2015-07-08 16:17       ` Dario Faggioli
2015-07-08 16:32         ` Jan Beulich
2015-07-09  1:58           ` Chao Peng
2015-07-09  8:39             ` Dario Faggioli
2015-07-08 18:24         ` Boris Ostrovsky
2015-07-09  1:47       ` Chao Peng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).