xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix cpu offline bug
@ 2011-03-04 10:22 Liu, Jinsong
  2011-03-04 10:49 ` Keir Fraser
  2011-03-05 11:40 ` Keir Fraser
  0 siblings, 2 replies; 9+ messages in thread
From: Liu, Jinsong @ 2011-03-04 10:22 UTC (permalink / raw)
  To: Keir Fraser, xen-devel@lists.xensource.com
  Cc: Jiang, Yunhong, keir@xen.org, Li, Xin

[-- Attachment #1: Type: text/plain, Size: 1249 bytes --]

Fix cpu offline bug

At current xen, when cpu offline, cpu0 will wait the 1st cpu offline;
However, if offline 2nd, 3rd, ... cpu, cpu0 will not wait it.
This patch is used to fix the bug.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>

diff -r d1631540bcc4 xen/arch/x86/smpboot.c
--- a/xen/arch/x86/smpboot.c	Tue Jan 18 17:23:24 2011 +0000
+++ b/xen/arch/x86/smpboot.c	Sat Feb 12 03:48:09 2011 +0800
@@ -78,7 +78,8 @@ static enum cpu_state {
     CPU_STATE_INIT,     /* master -> slave: Early bringup phase 1 */
     CPU_STATE_CALLOUT,  /* master -> slave: Early bringup phase 2 */
     CPU_STATE_CALLIN,   /* slave -> master: Completed phase 2 */
-    CPU_STATE_ONLINE    /* master -> slave: Go fully online now. */
+    CPU_STATE_ONLINE,   /* master -> slave: Go fully online now. */
+    CPU_STATE_STAY      /* after slave dead, global cpu_state stay here */
 } cpu_state;
 #define set_cpu_state(state) do { mb(); cpu_state = (state); } while (0)
 
@@ -867,6 +868,8 @@ void __cpu_die(unsigned int cpu)
         if ( (++i % 10) == 0 )
             printk(KERN_ERR "CPU %u still not dead...\n", cpu);
     }
+
+    set_cpu_state(CPU_STATE_STAY);
 }
 
 int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)

[-- Attachment #2: cpuoffline_fix_1.bug --]
[-- Type: application/octet-stream, Size: 1251 bytes --]

Fix cpu offline bug

At current xen, when cpu offline, cpu0 will wait the 1st cpu offline;
However, if offline 2nd, 3rd, ... cpu, cpu0 will not wait it.
This patch is used to fix the bug.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>

diff -r d1631540bcc4 xen/arch/x86/smpboot.c
--- a/xen/arch/x86/smpboot.c	Tue Jan 18 17:23:24 2011 +0000
+++ b/xen/arch/x86/smpboot.c	Sat Feb 12 03:48:09 2011 +0800
@@ -78,7 +78,8 @@ static enum cpu_state {
     CPU_STATE_INIT,     /* master -> slave: Early bringup phase 1 */
     CPU_STATE_CALLOUT,  /* master -> slave: Early bringup phase 2 */
     CPU_STATE_CALLIN,   /* slave -> master: Completed phase 2 */
-    CPU_STATE_ONLINE    /* master -> slave: Go fully online now. */
+    CPU_STATE_ONLINE,   /* master -> slave: Go fully online now. */
+    CPU_STATE_STAY      /* after slave dead, global cpu_state stay here */
 } cpu_state;
 #define set_cpu_state(state) do { mb(); cpu_state = (state); } while (0)
 
@@ -867,6 +868,8 @@ void __cpu_die(unsigned int cpu)
         if ( (++i % 10) == 0 )
             printk(KERN_ERR "CPU %u still not dead...\n", cpu);
     }
+
+    set_cpu_state(CPU_STATE_STAY);
 }
 
 int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix cpu offline bug
  2011-03-04 10:22 [PATCH] Fix cpu offline bug Liu, Jinsong
@ 2011-03-04 10:49 ` Keir Fraser
  2011-03-08  8:52   ` Jan Beulich
  2011-03-05 11:40 ` Keir Fraser
  1 sibling, 1 reply; 9+ messages in thread
From: Keir Fraser @ 2011-03-04 10:49 UTC (permalink / raw)
  To: Liu, Jinsong, xen-devel@lists.xensource.com; +Cc: Jiang, Yunhong, Li, Xin

Good fix of a nasty bug! I think I will refactor this a bit and do it
slightly differently, but the (simple) principle will of course be the same.

 Thanks,
 Keir

On 04/03/2011 10:22, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:

> Fix cpu offline bug
> 
> At current xen, when cpu offline, cpu0 will wait the 1st cpu offline;
> However, if offline 2nd, 3rd, ... cpu, cpu0 will not wait it.
> This patch is used to fix the bug.
> 
> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
> 
> diff -r d1631540bcc4 xen/arch/x86/smpboot.c
> --- a/xen/arch/x86/smpboot.c Tue Jan 18 17:23:24 2011 +0000
> +++ b/xen/arch/x86/smpboot.c Sat Feb 12 03:48:09 2011 +0800
> @@ -78,7 +78,8 @@ static enum cpu_state {
>      CPU_STATE_INIT,     /* master -> slave: Early bringup phase 1 */
>      CPU_STATE_CALLOUT,  /* master -> slave: Early bringup phase 2 */
>      CPU_STATE_CALLIN,   /* slave -> master: Completed phase 2 */
> -    CPU_STATE_ONLINE    /* master -> slave: Go fully online now. */
> +    CPU_STATE_ONLINE,   /* master -> slave: Go fully online now. */
> +    CPU_STATE_STAY      /* after slave dead, global cpu_state stay here */
>  } cpu_state;
>  #define set_cpu_state(state) do { mb(); cpu_state = (state); } while (0)
>  
> @@ -867,6 +868,8 @@ void __cpu_die(unsigned int cpu)
>          if ( (++i % 10) == 0 )
>              printk(KERN_ERR "CPU %u still not dead...\n", cpu);
>      }
> +
> +    set_cpu_state(CPU_STATE_STAY);
>  }
>  
>  int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix cpu offline bug
  2011-03-04 10:22 [PATCH] Fix cpu offline bug Liu, Jinsong
  2011-03-04 10:49 ` Keir Fraser
@ 2011-03-05 11:40 ` Keir Fraser
  2011-03-05 15:10   ` Liu, Jinsong
  1 sibling, 1 reply; 9+ messages in thread
From: Keir Fraser @ 2011-03-05 11:40 UTC (permalink / raw)
  To: Liu, Jinsong, xen-devel@lists.xensource.com; +Cc: Jiang, Yunhong, Li, Xin

On 04/03/2011 10:22, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:

> Fix cpu offline bug
> 
> At current xen, when cpu offline, cpu0 will wait the 1st cpu offline;
> However, if offline 2nd, 3rd, ... cpu, cpu0 will not wait it.
> This patch is used to fix the bug.

Alternative fix applied tio unstable and 4.1-testing. The fix below doesn't
account for the fact that cpu_state can also == CPU_STATE_DEAD after an
(unsuccessful) CPU online operation.

 -- Keir

> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
> 
> diff -r d1631540bcc4 xen/arch/x86/smpboot.c
> --- a/xen/arch/x86/smpboot.c Tue Jan 18 17:23:24 2011 +0000
> +++ b/xen/arch/x86/smpboot.c Sat Feb 12 03:48:09 2011 +0800
> @@ -78,7 +78,8 @@ static enum cpu_state {
>      CPU_STATE_INIT,     /* master -> slave: Early bringup phase 1 */
>      CPU_STATE_CALLOUT,  /* master -> slave: Early bringup phase 2 */
>      CPU_STATE_CALLIN,   /* slave -> master: Completed phase 2 */
> -    CPU_STATE_ONLINE    /* master -> slave: Go fully online now. */
> +    CPU_STATE_ONLINE,   /* master -> slave: Go fully online now. */
> +    CPU_STATE_STAY      /* after slave dead, global cpu_state stay here */
>  } cpu_state;
>  #define set_cpu_state(state) do { mb(); cpu_state = (state); } while (0)
>  
> @@ -867,6 +868,8 @@ void __cpu_die(unsigned int cpu)
>          if ( (++i % 10) == 0 )
>              printk(KERN_ERR "CPU %u still not dead...\n", cpu);
>      }
> +
> +    set_cpu_state(CPU_STATE_STAY);
>  }
>  
>  int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] Fix cpu offline bug
  2011-03-05 11:40 ` Keir Fraser
@ 2011-03-05 15:10   ` Liu, Jinsong
  2011-03-05 15:57     ` Keir Fraser
  0 siblings, 1 reply; 9+ messages in thread
From: Liu, Jinsong @ 2011-03-05 15:10 UTC (permalink / raw)
  To: Keir Fraser, xen-devel@lists.xensource.com; +Cc: Jiang, Yunhong, Li, Xin

Keir,

Thanks for comments and update!
However, I think we still need update it a little :)

=============================
Fix cpu offline bug

Remove BUG_ON since it's of some risk, considering the small time window:
cpu0 read (cpu_state) --> offlining cpu write (cpu_state) --> cpu0 read (cpu_state)

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>

diff -r 1ecb840bea17 xen/arch/x86/smpboot.c
--- a/xen/arch/x86/smpboot.c    Sat Mar 05 23:30:29 2022 +0800
+++ b/xen/arch/x86/smpboot.c    Sun Mar 06 01:21:34 2022 +0800
@@ -864,7 +864,6 @@ void __cpu_die(unsigned int cpu)

     while ( cpu_state != CPU_STATE_DEAD )
     {
-        BUG_ON(cpu_state != CPU_STATE_DYING);
         mdelay(100);
         cpu_relax();
         process_pending_softirqs();
=============================

Thanks,
Jinsong



Keir Fraser wrote:
> On 04/03/2011 10:22, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> 
>> Fix cpu offline bug
>> 
>> At current xen, when cpu offline, cpu0 will wait the 1st cpu offline;
>> However, if offline 2nd, 3rd, ... cpu, cpu0 will not wait it.
>> This patch is used to fix the bug.
> 
> Alternative fix applied tio unstable and 4.1-testing. The fix below
> doesn't account for the fact that cpu_state can also ==
> CPU_STATE_DEAD after an (unsuccessful) CPU online operation.
> 
>  -- Keir
> 
>> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
>> 
>> diff -r d1631540bcc4 xen/arch/x86/smpboot.c
>> --- a/xen/arch/x86/smpboot.c Tue Jan 18 17:23:24 2011 +0000
>> +++ b/xen/arch/x86/smpboot.c Sat Feb 12 03:48:09 2011 +0800
>> @@ -78,7 +78,8 @@ static enum cpu_state {
>>      CPU_STATE_INIT,     /* master -> slave: Early bringup phase 1 */
>>      CPU_STATE_CALLOUT,  /* master -> slave: Early bringup phase 2 */
>>      CPU_STATE_CALLIN,   /* slave -> master: Completed phase 2 */
>> -    CPU_STATE_ONLINE    /* master -> slave: Go fully online now. */
>> +    CPU_STATE_ONLINE,   /* master -> slave: Go fully online now. */
>> +    CPU_STATE_STAY      /* after slave dead, global cpu_state stay
>>  here */  } cpu_state; #define set_cpu_state(state) do { mb();
>> cpu_state = (state); } while (0) 
>> 
>> @@ -867,6 +868,8 @@ void __cpu_die(unsigned int cpu)
>>          if ( (++i % 10) == 0 )
>>              printk(KERN_ERR "CPU %u still not dead...\n", cpu);    
>> } +
>> +    set_cpu_state(CPU_STATE_STAY);
>>  }
>> 
>>  int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix cpu offline bug
  2011-03-05 15:10   ` Liu, Jinsong
@ 2011-03-05 15:57     ` Keir Fraser
  0 siblings, 0 replies; 9+ messages in thread
From: Keir Fraser @ 2011-03-05 15:57 UTC (permalink / raw)
  To: Liu, Jinsong, xen-devel@lists.xensource.com; +Cc: Jiang, Yunhong, Li, Xin

Urk, good point!

 Thanks,
 Keir


On 05/03/2011 15:10, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:

> Keir,
> 
> Thanks for comments and update!
> However, I think we still need update it a little :)
> 
> =============================
> Fix cpu offline bug
> 
> Remove BUG_ON since it's of some risk, considering the small time window:
> cpu0 read (cpu_state) --> offlining cpu write (cpu_state) --> cpu0 read
> (cpu_state)
> 
> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
> 
> diff -r 1ecb840bea17 xen/arch/x86/smpboot.c
> --- a/xen/arch/x86/smpboot.c    Sat Mar 05 23:30:29 2022 +0800
> +++ b/xen/arch/x86/smpboot.c    Sun Mar 06 01:21:34 2022 +0800
> @@ -864,7 +864,6 @@ void __cpu_die(unsigned int cpu)
> 
>      while ( cpu_state != CPU_STATE_DEAD )
>      {
> -        BUG_ON(cpu_state != CPU_STATE_DYING);
>          mdelay(100);
>          cpu_relax();
>          process_pending_softirqs();
> =============================
> 
> Thanks,
> Jinsong
> 
> 
> 
> Keir Fraser wrote:
>> On 04/03/2011 10:22, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> 
>>> Fix cpu offline bug
>>> 
>>> At current xen, when cpu offline, cpu0 will wait the 1st cpu offline;
>>> However, if offline 2nd, 3rd, ... cpu, cpu0 will not wait it.
>>> This patch is used to fix the bug.
>> 
>> Alternative fix applied tio unstable and 4.1-testing. The fix below
>> doesn't account for the fact that cpu_state can also ==
>> CPU_STATE_DEAD after an (unsuccessful) CPU online operation.
>> 
>>  -- Keir
>> 
>>> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
>>> 
>>> diff -r d1631540bcc4 xen/arch/x86/smpboot.c
>>> --- a/xen/arch/x86/smpboot.c Tue Jan 18 17:23:24 2011 +0000
>>> +++ b/xen/arch/x86/smpboot.c Sat Feb 12 03:48:09 2011 +0800
>>> @@ -78,7 +78,8 @@ static enum cpu_state {
>>>      CPU_STATE_INIT,     /* master -> slave: Early bringup phase 1 */
>>>      CPU_STATE_CALLOUT,  /* master -> slave: Early bringup phase 2 */
>>>      CPU_STATE_CALLIN,   /* slave -> master: Completed phase 2 */
>>> -    CPU_STATE_ONLINE    /* master -> slave: Go fully online now. */
>>> +    CPU_STATE_ONLINE,   /* master -> slave: Go fully online now. */
>>> +    CPU_STATE_STAY      /* after slave dead, global cpu_state stay
>>>  here */  } cpu_state; #define set_cpu_state(state) do { mb();
>>> cpu_state = (state); } while (0)
>>> 
>>> @@ -867,6 +868,8 @@ void __cpu_die(unsigned int cpu)
>>>          if ( (++i % 10) == 0 )
>>>              printk(KERN_ERR "CPU %u still not dead...\n", cpu);
>>> } +
>>> +    set_cpu_state(CPU_STATE_STAY);
>>>  }
>>> 
>>>  int cpu_add(uint32_t apic_id, uint32_t acpi_id, uint32_t pxm)
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix cpu offline bug
  2011-03-04 10:49 ` Keir Fraser
@ 2011-03-08  8:52   ` Jan Beulich
  2011-03-08  9:47     ` Liu, Jinsong
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2011-03-08  8:52 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Jinsong Liu, xen-devel@lists.xensource.com, Yunhong Jiang, Xin Li

>>> On 05.03.11, Keir Fraser <keir.xen@gmail.com> wrote:
>Urk, good point!

But your fix still allows what the comment says it is supposed to avoid
- simply introducing a variable doesn't help afaict, you also need to
insert a barrier() to avoid a second read (or "cpu_state" would need
to be volatile).

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Re: [PATCH] Fix cpu offline bug
  2011-03-08  8:52   ` Jan Beulich
@ 2011-03-08  9:47     ` Liu, Jinsong
  2011-03-08 10:14       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Liu, Jinsong @ 2011-03-08  9:47 UTC (permalink / raw)
  To: Jan Beulich, Keir Fraser
  Cc: xen-devel@lists.xensource.com, Li, Xin, Jiang, Yunhong

Jan, I'm not quite clear your meaning.
Why and where need to insert barrier, or volatile cpu_state?

Thanks,
Jinsong

Jan Beulich wrote:
>>>> On 05.03.11, Keir Fraser <keir.xen@gmail.com> wrote: Urk, good
>>>> point! 
> 
> But your fix still allows what the comment says it is supposed to
> avoid - simply introducing a variable doesn't help afaict, you also
> need to insert a barrier() to avoid a second read (or "cpu_state"
> would need to be volatile).
> 
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Re: [PATCH] Fix cpu offline bug
  2011-03-08  9:47     ` Liu, Jinsong
@ 2011-03-08 10:14       ` Jan Beulich
  2011-03-08 15:36         ` Keir Fraser
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2011-03-08 10:14 UTC (permalink / raw)
  To: Jinsong Liu
  Cc: xen-devel@lists.xensource.com, Keir Fraser, Xin Li, Yunhong Jiang

>>> On 08.03.11 at 10:47, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Jan, I'm not quite clear your meaning.
> Why and where need to insert barrier, or volatile cpu_state?
 
     while ( (seen_state = cpu_state) != CPU_STATE_DEAD )
     {
+        barrier();
         BUG_ON(seen_state != CPU_STATE_DYING);
         mdelay(100);
         cpu_relax();

Without this, the compiler is free to eliminate "seen_state" in favor
of reading "cpu_state" twice (irrespective of the optimizer very
likely trying to do exactly the opposite).

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH] Fix cpu offline bug
  2011-03-08 10:14       ` Jan Beulich
@ 2011-03-08 15:36         ` Keir Fraser
  0 siblings, 0 replies; 9+ messages in thread
From: Keir Fraser @ 2011-03-08 15:36 UTC (permalink / raw)
  To: Jan Beulich, Jinsong Liu
  Cc: xen-devel@lists.xensource.com, Yunhong Jiang, Xin Li

On 08/03/2011 10:14, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> On 08.03.11 at 10:47, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> Jan, I'm not quite clear your meaning.
>> Why and where need to insert barrier, or volatile cpu_state?
>  
>      while ( (seen_state = cpu_state) != CPU_STATE_DEAD )
>      {
> +        barrier();
>          BUG_ON(seen_state != CPU_STATE_DYING);
>          mdelay(100);
>          cpu_relax();
> 
> Without this, the compiler is free to eliminate "seen_state" in favor
> of reading "cpu_state" twice (irrespective of the optimizer very
> likely trying to do exactly the opposite).

Haha, you're getting paranoid Jan! What you describe is impossible -- there
is no way that C semantics allow a local variable's value to change after it
has affected program behaviour (if it hasn't escaped local scope by having
its address taken for example). If that could happen then we'd have plenty
of other code that doesn't work properly either, I'm sure.

In this case, if a second read from cpu_state did occur then we could end up
observing seen_state==CPU_STATE_DEAD, in a code block that is predicated on
this not being true!

 -- Keir

> Jan
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-03-08 15:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-03-04 10:22 [PATCH] Fix cpu offline bug Liu, Jinsong
2011-03-04 10:49 ` Keir Fraser
2011-03-08  8:52   ` Jan Beulich
2011-03-08  9:47     ` Liu, Jinsong
2011-03-08 10:14       ` Jan Beulich
2011-03-08 15:36         ` Keir Fraser
2011-03-05 11:40 ` Keir Fraser
2011-03-05 15:10   ` Liu, Jinsong
2011-03-05 15:57     ` Keir Fraser

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).