* [PATCH 0/4] Fixes to common and x86 barriers
@ 2016-12-05 10:05 Andrew Cooper
From: Andrew Cooper @ 2016-12-05 10:05 UTC
To: Xen-devel; +Cc: Andrew Cooper
Andrew Cooper (4):
xen/common: Replace incorrect mandatory barriers with SMP barriers
xen/x86: Drop erroneous barriers
xen/x86: Replace incorrect mandatory barriers with SMP barriers
xen/x86: Correct mandatory and SMP barrier definitions
 xen/arch/x86/acpi/cpu_idle.c             | 10 ++++------
 xen/arch/x86/cpu/mcheck/amd_nonfatal.c   |  4 ++--
 xen/arch/x86/cpu/mcheck/barrier.c        | 10 +++++-----
 xen/arch/x86/cpu/mcheck/mce.c            |  3 +--
 xen/arch/x86/cpu/mcheck/mctelem.c        | 10 +++++-----
 xen/arch/x86/crash.c                     |  3 ---
 xen/arch/x86/genapic/x2apic.c            |  6 +++---
 xen/arch/x86/hpet.c                      |  6 +++---
 xen/arch/x86/hvm/ioreq.c                 |  4 ++--
 xen/arch/x86/io_apic.c                   |  4 ++--
 xen/arch/x86/irq.c                       |  4 ++--
 xen/arch/x86/mm.c                        | 10 +++++-----
 xen/arch/x86/mm/shadow/multi.c           |  2 +-
 xen/arch/x86/smpboot.c                   | 14 ++++++--------
 xen/arch/x86/time.c                      |  8 ++++----
 xen/common/grant_table.c                 |  2 +-
 xen/common/time.c                        |  4 ++--
 xen/common/vm_event.c                    |  6 +++---
 xen/drivers/passthrough/amd/iommu_init.c |  4 ++--
 xen/include/asm-x86/desc.h               |  8 ++++----
 xen/include/asm-x86/system.h             | 33 +++++++++++++++++++++++---------
 xen/include/asm-x86/x86_64/system.h      |  3 ---
 22 files changed, 81 insertions(+), 77 deletions(-)
--
2.1.4
* [PATCH 1/4] xen/common: Replace incorrect mandatory barriers with SMP barriers
From: Andrew Cooper @ 2016-12-05 10:05 UTC
To: Xen-devel; +Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Jan Beulich

Mandatory barriers are only for use with reduced-cacheability MMIO mappings.

All of these uses are just to deal with shared memory between multiple
processors, so use the smp_*() which are the correct barriers for the
purpose.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>

Restricting to just the $ARCH maintainers, as this is a project-wide sweep
---
 xen/common/grant_table.c | 2 +-
 xen/common/time.c        | 4 ++--
 xen/common/vm_event.c    | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index e2c4097..a425a9e 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -993,7 +993,7 @@ __gnttab_map_grant_ref(
     mt = &maptrack_entry(lgt, handle);
     mt->domid = op->dom;
     mt->ref = op->ref;
-    wmb();
+    smp_wmb();
     write_atomic(&mt->flags, op->flags);
 
     if ( need_iommu )
diff --git a/xen/common/time.c b/xen/common/time.c
index 721ada8..a7caea9 100644
--- a/xen/common/time.c
+++ b/xen/common/time.c
@@ -103,7 +103,7 @@ void update_domain_wallclock_time(struct domain *d)
 
     wc_version = &shared_info(d, wc_version);
     *wc_version = version_update_begin(*wc_version);
-    wmb();
+    smp_wmb();
 
     sec = wc_sec + d->time_offset_seconds;
     shared_info(d, wc_sec) = sec;
@@ -117,7 +117,7 @@ void update_domain_wallclock_time(struct domain *d)
     shared_info(d, wc_sec_hi) = sec >> 32;
 #endif
 
-    wmb();
+    smp_wmb();
     *wc_version = version_update_end(*wc_version);
 
     spin_unlock(&wc_lock);
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 907ab40..c6f7d32 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -31,9 +31,9 @@
 #include <xsm/xsm.h>
 
 /* for public/io/ring.h macros */
-#define xen_mb()   mb()
-#define xen_rmb()  rmb()
-#define xen_wmb()  wmb()
+#define xen_mb()   smp_mb()
+#define xen_rmb()  smp_rmb()
+#define xen_wmb()  smp_wmb()
 
 #define vm_event_ring_lock_init(_ved) spin_lock_init(&(_ved)->ring_lock)
 #define vm_event_ring_lock(_ved)      spin_lock(&(_ved)->ring_lock)
-- 
2.1.4
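For context on the distinction patch 1 relies on, the two barrier families
can be sketched in a few lines. These are illustrative definitions only, not
Xen's actual <asm-x86/system.h> (patch 4 of the series is what adjusts the
real ones): mandatory barriers must emit fence instructions for
weakly-ordered MMIO, while on x86's strongly-ordered memory model the
smp_*() variants can mostly collapse to compiler barriers.

    /*
     * Illustrative sketch, not Xen's real headers.  Mandatory barriers
     * must emit fences, as required for reduced-cacheability MMIO.  The
     * smp_*() forms only order ordinary-memory accesses between CPUs,
     * which x86 mostly guarantees already, so they can degrade to
     * compiler barriers; smp_mb() keeps a real fence because x86 does
     * allow a store to be reordered after a later load.
     */
    #define barrier()  asm volatile ( "" ::: "memory" )

    #define mb()       asm volatile ( "mfence" ::: "memory" )
    #define rmb()      asm volatile ( "lfence" ::: "memory" )
    #define wmb()      asm volatile ( "sfence" ::: "memory" )

    #define smp_mb()   mb()
    #define smp_rmb()  barrier()
    #define smp_wmb()  barrier()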
* Re: [PATCH 1/4] xen/common: Replace incorrect mandatory barriers with SMP barriers
From: Jan Beulich @ 2016-12-05 10:55 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
> Mandatory barriers are only for use with reduced-cacheability MMIO mappings.
>
> All of these uses are just to deal with shared memory between multiple
> processors, so use the smp_*() which are the correct barriers for the
> purpose.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH 1/4] xen/common: Replace incorrect mandatory barriers with SMP barriers
From: Stefano Stabellini @ 2016-12-05 19:07 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Jan Beulich, Xen-devel

On Mon, 5 Dec 2016, Andrew Cooper wrote:
> Mandatory barriers are only for use with reduced-cacheability MMIO mappings.
>
> All of these uses are just to deal with shared memory between multiple
> processors, so use the smp_*() which are the correct barriers for the purpose.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
>
> Restricting to just the $ARCH maintainers, as this is a project-wide sweep
> ---
>  xen/common/grant_table.c | 2 +-
>  xen/common/time.c        | 4 ++--
>  xen/common/vm_event.c    | 6 +++---
>  3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index e2c4097..a425a9e 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -993,7 +993,7 @@ __gnttab_map_grant_ref(
>      mt = &maptrack_entry(lgt, handle);
>      mt->domid = op->dom;
>      mt->ref = op->ref;
> -    wmb();
> +    smp_wmb();
>      write_atomic(&mt->flags, op->flags);
>
>      if ( need_iommu )
> diff --git a/xen/common/time.c b/xen/common/time.c
> index 721ada8..a7caea9 100644
> --- a/xen/common/time.c
> +++ b/xen/common/time.c
> @@ -103,7 +103,7 @@ void update_domain_wallclock_time(struct domain *d)
>
>      wc_version = &shared_info(d, wc_version);
>      *wc_version = version_update_begin(*wc_version);
> -    wmb();
> +    smp_wmb();
>
>      sec = wc_sec + d->time_offset_seconds;
>      shared_info(d, wc_sec) = sec;
> @@ -117,7 +117,7 @@ void update_domain_wallclock_time(struct domain *d)
>      shared_info(d, wc_sec_hi) = sec >> 32;
>  #endif
>
> -    wmb();
> +    smp_wmb();
>      *wc_version = version_update_end(*wc_version);
>
>      spin_unlock(&wc_lock);
> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
> index 907ab40..c6f7d32 100644
> --- a/xen/common/vm_event.c
> +++ b/xen/common/vm_event.c
> @@ -31,9 +31,9 @@
>  #include <xsm/xsm.h>
>
>  /* for public/io/ring.h macros */
> -#define xen_mb()   mb()
> -#define xen_rmb()  rmb()
> -#define xen_wmb()  wmb()
> +#define xen_mb()   smp_mb()
> +#define xen_rmb()  smp_rmb()
> +#define xen_wmb()  smp_wmb()
>
>  #define vm_event_ring_lock_init(_ved) spin_lock_init(&(_ved)->ring_lock)
>  #define vm_event_ring_lock(_ved)      spin_lock(&(_ved)->ring_lock)
> --
> 2.1.4
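The grant-table hunk above is the classic publish/consume pattern that
paired SMP barriers exist for. A minimal sketch of the idea, with a
simplified structure and an invented reader side (the real consumer is
whichever code later inspects the maptrack entry); write_atomic() and
read_atomic() are the primitives used in the patch:

    #include <stdint.h>
    #include <stdbool.h>

    struct maptrack_entry {
        uint16_t domid;
        uint32_t ref;
        uint16_t flags;    /* zero until the entry is published */
    };

    /* Producer: fill in the fields, then publish them via flags. */
    static void maptrack_publish(struct maptrack_entry *mt, uint16_t domid,
                                 uint32_t ref, uint16_t flags)
    {
        mt->domid = domid;
        mt->ref = ref;
        smp_wmb();                      /* fields visible before flags */
        write_atomic(&mt->flags, flags);
    }

    /* Consumer: observe flags, then the fields that flags guards. */
    static bool maptrack_consume(struct maptrack_entry *mt,
                                 uint16_t *domid, uint32_t *ref)
    {
        if ( !read_atomic(&mt->flags) )
            return false;
        smp_rmb();                      /* pairs with the smp_wmb() above */
        *domid = mt->domid;
        *ref = mt->ref;
        return true;
    }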
* [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-05 10:05 UTC
To: Xen-devel; +Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Jan Beulich

None of these barriers serve any purpose, as they are not synchronising with
anything on remote CPUs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>

Restricting to just the $ARCH maintainers, as this is a project-wide sweep.

Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
from x86, but I don't know whether further development has gained a dependence
on them.
---
 xen/arch/x86/acpi/cpu_idle.c  | 2 --
 xen/arch/x86/cpu/mcheck/mce.c | 1 -
 xen/arch/x86/crash.c          | 3 ---
 xen/arch/x86/smpboot.c        | 2 --
 4 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index f36b184..09c8848 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -1331,8 +1331,6 @@ void cpuidle_disable_deep_cstate(void)
         max_cstate = 1;
     }
 
-    mb();
-
     hpet_disable_legacy_broadcast();
 }
 
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 2695b0c..460e336 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -86,7 +86,6 @@ static x86_mce_vector_t _machine_check_vector = unexpected_machine_check;
 void x86_mce_vector_register(x86_mce_vector_t hdlr)
 {
     _machine_check_vector = hdlr;
-    wmb();
 }
 
 /* Call the installed machine check handler for this CPU setup. */
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index f28f527..4cadb5e 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
     write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
                  (unsigned long)&do_nmi_crash);
 
-    /* Ensure the new callback function is set before sending out the NMI. */
-    wmb();
-
     smp_send_nmi_allbutself();
 
     msecs = 1000; /* Wait at most a second for the other cpus to stop */
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 3a9dd3e..0aa377a 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -346,7 +346,6 @@ void start_secondary(void *unused)
     spin_debug_enable();
     set_cpu_sibling_map(cpu);
     notify_cpu_starting(cpu);
-    wmb();
 
     /*
      * We need to hold vector_lock so there the set of online cpus
@@ -364,7 +363,6 @@ void start_secondary(void *unused)
 
     microcode_resume_cpu(cpu);
 
-    wmb();
     startup_cpu_idle_loop();
 }
 
-- 
2.1.4
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Jan Beulich @ 2016-12-05 11:18 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:

As to the title - don't you rather mean "pointless" or some such? It's
not really an error to have them where they are.

> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>                   (unsigned long)&do_nmi_crash);
>
> -    /* Ensure the new callback function is set before sending out the NMI. */
> -    wmb();
> -
>      smp_send_nmi_allbutself();

I don't think I agree with this change - we certainly want to make
sure the APIC write happens only after the exception vector
adjustment became visible, namely when in x2APIC mode (where
the respective WRMSRs are not serializing).

> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>      spin_debug_enable();
>      set_cpu_sibling_map(cpu);
>      notify_cpu_starting(cpu);
> -    wmb();
>
>      /*
>       * We need to hold vector_lock so there the set of online cpus

Hmm, this one is indeed redundant with the lock_vector_lock()
following right below, but if that lock wasn't there, I think it
would be needed to order set_cpu_sibling_map() and the
setting of the bit in the online map. So I think it would better
stay (but be relaxed to smp_wmb()).

Jan
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-05 11:25 UTC
To: Jan Beulich; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

On 05/12/16 11:18, Jan Beulich wrote:
>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
> As to the title - don't you rather mean "pointless" or some such? It's
> not really an error to have them where they are.

True. I will update.

>
>> --- a/xen/arch/x86/crash.c
>> +++ b/xen/arch/x86/crash.c
>> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>>                   (unsigned long)&do_nmi_crash);
>>
>> -    /* Ensure the new callback function is set before sending out the NMI. */
>> -    wmb();
>> -
>>      smp_send_nmi_allbutself();
> I don't think I agree with this change - we certainly want to make
> sure the APIC write happens only after the exception vector
> adjustment became visible, namely when in x2APIC mode (where
> the respective WRMSRs are not serializing).

This wmb() is already only a barrier() (Fixed in the final patch)

Even if it weren't, wrmsr doesn't interact with sfence, so the barrier
would still be pointless.

>
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>      spin_debug_enable();
>>      set_cpu_sibling_map(cpu);
>>      notify_cpu_starting(cpu);
>> -    wmb();
>>
>>      /*
>>       * We need to hold vector_lock so there the set of online cpus
> Hmm, this one is indeed redundant with the lock_vector_lock()
> following right below, but if that lock wasn't there, I think it
> would be needed to order set_cpu_sibling_map() and the
> setting of the bit in the online map. So I think it would better
> stay (but be relaxed to smp_wmb()).

Why? It doesn't relate to an smp_rmb() on any other CPU, and is
therefore wrong to use.

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Jan Beulich @ 2016-12-05 12:28 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
> On 05/12/16 11:18, Jan Beulich wrote:
>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/arch/x86/crash.c
>>>> +++ b/xen/arch/x86/crash.c
>>>> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>>>>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>>>>                   (unsigned long)&do_nmi_crash);
>>>>
>>>> -    /* Ensure the new callback function is set before sending out the NMI. */
>>>> -    wmb();
>>>> -
>>>>      smp_send_nmi_allbutself();
>>> I don't think I agree with this change - we certainly want to make
>>> sure the APIC write happens only after the exception vector
>>> adjustment became visible, namely when in x2APIC mode (where
>>> the respective WRMSRs are not serializing).
>> This wmb() is already only a barrier() (Fixed in the final patch)

Good point.

>> Even if it weren't, wrmsr doesn't interact with sfence, so the barrier
>> would still be pointless.

Are you sure there's absolutely nothing in replacement for the lack
of serialization?

That said, we're still having enough of a barrier left anyway, due to
the ones in send_IPI_mask_x2apic_{phys,cluster}(), and due to the
LAPIC mapping being UC. So yes, I'm fine with dropping it here then.

>>>> --- a/xen/arch/x86/smpboot.c
>>>> +++ b/xen/arch/x86/smpboot.c
>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>      spin_debug_enable();
>>>>      set_cpu_sibling_map(cpu);
>>>>      notify_cpu_starting(cpu);
>>>> -    wmb();
>>>>
>>>>      /*
>>>>       * We need to hold vector_lock so there the set of online cpus
>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>> following right below, but if that lock wasn't there, I think it
>>> would be needed to order set_cpu_sibling_map() and the
>>> setting of the bit in the online map. So I think it would better
>>> stay (but be relaxed to smp_wmb()).
>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>> therefore wrong to use.

I think it does, just not with one that's spelled out as smp_rmb().
Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
a de-facto equivalent of smp_rmb().

Jan
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-05 13:43 UTC
To: Jan Beulich; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

On 05/12/16 12:28, Jan Beulich wrote:
>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
>> On 05/12/16 11:18, Jan Beulich wrote:
>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/arch/x86/crash.c
>>>> +++ b/xen/arch/x86/crash.c
>>>> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>>>>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>>>>                   (unsigned long)&do_nmi_crash);
>>>>
>>>> -    /* Ensure the new callback function is set before sending out the NMI. */
>>>> -    wmb();
>>>> -
>>>>      smp_send_nmi_allbutself();
>>> I don't think I agree with this change - we certainly want to make
>>> sure the APIC write happens only after the exception vector
>>> adjustment became visible, namely when in x2APIC mode (where
>>> the respective WRMSRs are not serializing).
>> This wmb() is already only a barrier() (Fixed in the final patch)
> Good point.
>
>> Even if it weren't, wrmsr doesn't interact with sfence, so the barrier
>> would still be pointless.
> Are you sure there's absolutely nothing in replacement for the lack
> of serialization?
>
> That said, we're still having enough of a barrier left anyway, due to
> the ones in send_IPI_mask_x2apic_{phys,cluster}(), and due to the
> LAPIC mapping being UC. So yes, I'm fine with dropping it here then.
>
>>>> --- a/xen/arch/x86/smpboot.c
>>>> +++ b/xen/arch/x86/smpboot.c
>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>      spin_debug_enable();
>>>>      set_cpu_sibling_map(cpu);
>>>>      notify_cpu_starting(cpu);
>>>> -    wmb();
>>>>
>>>>      /*
>>>>       * We need to hold vector_lock so there the set of online cpus
>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>> following right below, but if that lock wasn't there, I think it
>>> would be needed to order set_cpu_sibling_map() and the
>>> setting of the bit in the online map. So I think it would better
>>> stay (but be relaxed to smp_wmb()).
>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>> therefore wrong to use.
> I think it does, just not with one that's spelled out as smp_rmb().
> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
> a de-facto equivalent of smp_rmb().

__cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
between the two context hunks.

I still don't see any purpose or need for there to be any barriers
here at all, not even compiler barriers.

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Jan Beulich @ 2016-12-05 13:50 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 14:43, <andrew.cooper3@citrix.com> wrote:
> On 05/12/16 12:28, Jan Beulich wrote:
>>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
>>> On 05/12/16 11:18, Jan Beulich wrote:
>>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>>> --- a/xen/arch/x86/smpboot.c
>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>>      spin_debug_enable();
>>>>>      set_cpu_sibling_map(cpu);
>>>>>      notify_cpu_starting(cpu);
>>>>> -    wmb();
>>>>>
>>>>>      /*
>>>>>       * We need to hold vector_lock so there the set of online cpus
>>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>>> following right below, but if that lock wasn't there, I think it
>>>> would be needed to order set_cpu_sibling_map() and the
>>>> setting of the bit in the online map. So I think it would better
>>>> stay (but be relaxed to smp_wmb()).
>>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>>> therefore wrong to use.
>> I think it does, just not with one that's spelled out as smp_rmb().
>> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
>> a de-facto equivalent of smp_rmb().
> __cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
> between the two context hunks.

Exactly - so here we need the write side to that, unless (like
suggested elsewhere) we mean to make use of the barrier implied by
the LOCKed instruction which cpumask_set_cpu() resolves to. Such
"amortization", however, would better be recorded in a comment, so
that it becomes less likely for a future change to eliminate a
required barrier altogether.

Jan
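A sketch of the kind of annotation being suggested here, assuming (as
stated above) that cpumask_set_cpu() resolves to a LOCKed read-modify-write
on x86; the comment placement is hypothetical:

    set_cpu_sibling_map(cpu);
    notify_cpu_starting(cpu);

    /*
     * No explicit barrier: cpumask_set_cpu() is a LOCKed instruction
     * (lock bts) on x86, i.e. a full barrier, so the sibling map updates
     * above become visible no later than this CPU's bit in
     * cpu_online_map.  If cpumask_set_cpu() ever stops being LOCKed, an
     * smp_wmb() needs reinstating here.
     */
    cpumask_set_cpu(cpu, &cpu_online_map);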
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-05 13:59 UTC
To: Jan Beulich; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

On 05/12/16 13:50, Jan Beulich wrote:
>>>> On 05.12.16 at 14:43, <andrew.cooper3@citrix.com> wrote:
>> On 05/12/16 12:28, Jan Beulich wrote:
>>>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
>>>> On 05/12/16 11:18, Jan Beulich wrote:
>>>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>>>> --- a/xen/arch/x86/smpboot.c
>>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>>>      spin_debug_enable();
>>>>>>      set_cpu_sibling_map(cpu);
>>>>>>      notify_cpu_starting(cpu);
>>>>>> -    wmb();
>>>>>>
>>>>>>      /*
>>>>>>       * We need to hold vector_lock so there the set of online cpus
>>>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>>>> following right below, but if that lock wasn't there, I think it
>>>>> would be needed to order set_cpu_sibling_map() and the
>>>>> setting of the bit in the online map. So I think it would better
>>>>> stay (but be relaxed to smp_wmb()).
>>>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>>>> therefore wrong to use.
>>> I think it does, just not with one that's spelled out as smp_rmb().
>>> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
>>> a de-facto equivalent of smp_rmb().
>> __cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
>> between the two context hunks.
> Exactly - so here we need the write side to that

No, we don't.

cpumask_set_cpu(cpu, &cpu_online_map) is a write operation, so orders
properly on x86. C's ordering properties ensure that the adjacent
function calls happen in program order.

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Jan Beulich @ 2016-12-05 14:07 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 14:59, <andrew.cooper3@citrix.com> wrote:
> On 05/12/16 13:50, Jan Beulich wrote:
>>>>> On 05.12.16 at 14:43, <andrew.cooper3@citrix.com> wrote:
>>> On 05/12/16 12:28, Jan Beulich wrote:
>>>>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
>>>>> On 05/12/16 11:18, Jan Beulich wrote:
>>>>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>>>>> --- a/xen/arch/x86/smpboot.c
>>>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>>>>      spin_debug_enable();
>>>>>>>      set_cpu_sibling_map(cpu);
>>>>>>>      notify_cpu_starting(cpu);
>>>>>>> -    wmb();
>>>>>>>
>>>>>>>      /*
>>>>>>>       * We need to hold vector_lock so there the set of online cpus
>>>>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>>>>> following right below, but if that lock wasn't there, I think it
>>>>>> would be needed to order set_cpu_sibling_map() and the
>>>>>> setting of the bit in the online map. So I think it would better
>>>>>> stay (but be relaxed to smp_wmb()).
>>>>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>>>>> therefore wrong to use.
>>>> I think it does, just not with one that's spelled out as smp_rmb().
>>>> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
>>>> a de-facto equivalent of smp_rmb().
>>> __cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
>>> between the two context hunks.
>> Exactly - so here we need the write side to that
>
> No, we don't.
>
> cpumask_set_cpu(cpu, &cpu_online_map) is a write operation, so orders
> properly on x86. C's ordering properties ensure that the adjacent
> function calls happen in program order.

Well, that then again falls into the category of barriers which
would be needed in arch-independent code, but can be omitted
in x86-specific sources. I think we should separate both classes
when relaxing/eliminating them.

Jan
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-05 19:14 UTC
To: Jan Beulich; +Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Xen-devel

On Mon, 5 Dec 2016, Jan Beulich wrote:
> >>> On 05.12.16 at 14:59, <andrew.cooper3@citrix.com> wrote:
> > On 05/12/16 13:50, Jan Beulich wrote:
> >>>>> On 05.12.16 at 14:43, <andrew.cooper3@citrix.com> wrote:
> >>> On 05/12/16 12:28, Jan Beulich wrote:
> >>>>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
> >>>>> On 05/12/16 11:18, Jan Beulich wrote:
> >>>>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
> >>>>>>> --- a/xen/arch/x86/smpboot.c
> >>>>>>> +++ b/xen/arch/x86/smpboot.c
> >>>>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
> >>>>>>>      spin_debug_enable();
> >>>>>>>      set_cpu_sibling_map(cpu);
> >>>>>>>      notify_cpu_starting(cpu);
> >>>>>>> -    wmb();
> >>>>>>>
> >>>>>>>      /*
> >>>>>>>       * We need to hold vector_lock so there the set of online cpus
> >>>>>> Hmm, this one is indeed redundant with the lock_vector_lock()
> >>>>>> following right below, but if that lock wasn't there, I think it
> >>>>>> would be needed to order set_cpu_sibling_map() and the
> >>>>>> setting of the bit in the online map. So I think it would better
> >>>>>> stay (but be relaxed to smp_wmb()).
> >>>>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
> >>>>> therefore wrong to use.
> >>>> I think it does, just not with one that's spelled out as smp_rmb().
> >>>> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
> >>>> a de-facto equivalent of smp_rmb().
> >>> __cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
> >>> between the two context hunks.
> >> Exactly - so here we need the write side to that
> >
> > No, we don't.
> >
> > cpumask_set_cpu(cpu, &cpu_online_map) is a write operation, so orders
> > properly on x86. C's ordering properties ensure that the adjacent
> > function calls happen in program order.
> Well, that then again falls into the category of barriers which
> would be needed in arch-independent code, but can be omitted
> in x86-specific sources. I think we should separate both classes
> when relaxing/eliminating them.

Yes. It would be nice to keep a barrier, one that #defines to nothing
if it's unneeded, so that if somebody else on a different arch (*cough*
*cough*) ends up copying the code, it will know what to do.
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-05 19:17 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Jan Beulich, Xen-devel

On Mon, 5 Dec 2016, Andrew Cooper wrote:
> None of these barriers serve any purpose, as they are not synchronising with
> anything on remote CPUs.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
>
> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
>
> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
> from x86, but I don't know whether further development has gained a dependence
> on them.

We turned them into smp_wmb already (kudos to IanC).

>  xen/arch/x86/acpi/cpu_idle.c  | 2 --
>  xen/arch/x86/cpu/mcheck/mce.c | 1 -
>  xen/arch/x86/crash.c          | 3 ---
>  xen/arch/x86/smpboot.c        | 2 --
>  4 files changed, 8 deletions(-)
>
> diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
> index f36b184..09c8848 100644
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -1331,8 +1331,6 @@ void cpuidle_disable_deep_cstate(void)
>          max_cstate = 1;
>      }
>
> -    mb();
> -
>      hpet_disable_legacy_broadcast();
>  }
>
> diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
> index 2695b0c..460e336 100644
> --- a/xen/arch/x86/cpu/mcheck/mce.c
> +++ b/xen/arch/x86/cpu/mcheck/mce.c
> @@ -86,7 +86,6 @@ static x86_mce_vector_t _machine_check_vector = unexpected_machine_check;
>  void x86_mce_vector_register(x86_mce_vector_t hdlr)
>  {
>      _machine_check_vector = hdlr;
> -    wmb();
>  }
>
>  /* Call the installed machine check handler for this CPU setup. */
> diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
> index f28f527..4cadb5e 100644
> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>                   (unsigned long)&do_nmi_crash);
>
> -    /* Ensure the new callback function is set before sending out the NMI. */
> -    wmb();
> -
>      smp_send_nmi_allbutself();
>
>      msecs = 1000; /* Wait at most a second for the other cpus to stop */
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 3a9dd3e..0aa377a 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>      spin_debug_enable();
>      set_cpu_sibling_map(cpu);
>      notify_cpu_starting(cpu);
> -    wmb();
>
>      /*
>       * We need to hold vector_lock so there the set of online cpus
> @@ -364,7 +363,6 @@ void start_secondary(void *unused)
>
>      microcode_resume_cpu(cpu);
>
> -    wmb();
>      startup_cpu_idle_loop();
>  }
>
> --
> 2.1.4
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-06 0:10 UTC
To: Stefano Stabellini; +Cc: Julien Grall, Jan Beulich, Xen-devel

On 05/12/2016 19:17, Stefano Stabellini wrote:
> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>> None of these barriers serve any purpose, as they are not synchronising with
>> anything on remote CPUs.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Julien Grall <julien.grall@arm.com>
>>
>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
>>
>> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
>> from x86, but I don't know whether further development has gained a dependence
>> on them.
> We turned them into smp_wmb already (kudos to IanC).

Right, but the entire point I am trying to argue is that they are not
needed in the first place.

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-06 20:27 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Jan Beulich, Xen-devel

On Tue, 6 Dec 2016, Andrew Cooper wrote:
> On 05/12/2016 19:17, Stefano Stabellini wrote:
> > On Mon, 5 Dec 2016, Andrew Cooper wrote:
> >> None of these barriers serve any purpose, as they are not synchronising with
> >> anything on remote CPUs.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Stefano Stabellini <sstabellini@kernel.org>
> >> CC: Julien Grall <julien.grall@arm.com>
> >>
> >> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
> >>
> >> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
> >> from x86, but I don't know whether further development has gained a dependence
> >> on them.
> > We turned them into smp_wmb already (kudos to IanC).
>
> Right, but the entire point I am trying to argue is that they are not
> needed in the first place.

This is the current code:

 CPU 1                     CPU 0
 -----                     -----

 init stuff                read cpu_online_map

 write barrier

 write cpu_online_map      do more initialization

 write barrier

 init more stuff


I agree that it's wrong, because the second write barrier in
start_secondary is useless and in addition we are missing a read barrier
in __cpu_up, corresponding to the first write barrier in
start_secondary.

I think it should look like:

 CPU 1                     CPU 0
 -----                     -----

 init stuff                read cpu_online_map

 write barrier             read barrier

 write cpu_online_map      do more initialization

 init more stuff


The patch is as follows.

Julien, what do you think?

Also, do we need to change the remaining smp_wmb() in start_secondary to
wmb() to ensure execution ordering as well as memory access ordering?

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 90ad1d0..c841a15 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -311,7 +311,6 @@ void start_secondary(unsigned long boot_phys_offset,
 
     /* Now report this CPU is up */
     cpumask_set_cpu(cpuid, &cpu_online_map);
-    smp_wmb();
 
     local_irq_enable();
     local_abort_enable();
@@ -408,6 +407,7 @@ int __cpu_up(unsigned int cpu)
         cpu_relax();
         process_pending_softirqs();
     }
+    smp_rmb();
 
     /*
      * Nuke start of day info before checking one last time if the CPU
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-06 20:32 UTC
To: Stefano Stabellini; +Cc: Andrew Cooper, Julien Grall, Jan Beulich, Xen-devel

On Tue, 6 Dec 2016, Stefano Stabellini wrote:
> On Tue, 6 Dec 2016, Andrew Cooper wrote:
> > On 05/12/2016 19:17, Stefano Stabellini wrote:
> > > On Mon, 5 Dec 2016, Andrew Cooper wrote:
> > > > None of these barriers serve any purpose, as they are not
> > > > synchronising with anything on remote CPUs.
> > > >
> > > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > > > ---
> > > > CC: Jan Beulich <JBeulich@suse.com>
> > > > CC: Stefano Stabellini <sstabellini@kernel.org>
> > > > CC: Julien Grall <julien.grall@arm.com>
> > > >
> > > > Restricting to just the $ARCH maintainers, as this is a project-wide
> > > > sweep.
> > > >
> > > > Julien/Stefano: I think the ARM smpboot inherited the erroneous
> > > > barrier usage
> > > > from x86, but I don't know whether further development has gained a
> > > > dependence
> > > > on them.
> > > We turned them into smp_wmb already (kudos to IanC).
> >
> > Right, but the entire point I am trying to argue is that they are not
> > needed in the first place.

Just to be clear, on ARM the barriers are unneeded only if it is
unimportant that "init stuff" (which corresponds to all the
initialization done in start_secondary up to smp_wmb) below is completed
before "write cpu_online_map". But it looks like we do want to complete
mmu, irq, timer initializations and set the current vcpu before marking
the cpu as online, right?

> This is the current code:
>
>  CPU 1                     CPU 0
>  -----                     -----
>
>  init stuff                read cpu_online_map
>
>  write barrier
>
>  write cpu_online_map      do more initialization
>
>  write barrier
>
>  init more stuff
>
>
> I agree that it's wrong, because the second write barrier in
> start_secondary is useless and in addition we are missing a read barrier
> in __cpu_up, corresponding to the first write barrier in
> start_secondary.
>
> I think it should look like:
>
>  CPU 1                     CPU 0
>  -----                     -----
>
>  init stuff                read cpu_online_map
>
>  write barrier             read barrier
>
>  write cpu_online_map      do more initialization
>
>  init more stuff
>
>
> The patch is as follows.
>
> Julien, what do you think?
>
> Also, do we need to change the remaining smp_wmb() in start_secondary to
> wmb() to ensure execution ordering as well as memory access ordering?
>
> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
>
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 90ad1d0..c841a15 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -311,7 +311,6 @@ void start_secondary(unsigned long boot_phys_offset,
>
>      /* Now report this CPU is up */
>      cpumask_set_cpu(cpuid, &cpu_online_map);
> -    smp_wmb();
>
>      local_irq_enable();
>      local_abort_enable();
> @@ -408,6 +407,7 @@ int __cpu_up(unsigned int cpu)
>          cpu_relax();
>          process_pending_softirqs();
>      }
> +    smp_rmb();
>
>      /*
>       * Nuke start of day info before checking one last time if the CPU
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-07 1:03 UTC
To: Stefano Stabellini; +Cc: Julien Grall, Jan Beulich, Xen-devel

On 06/12/2016 20:32, Stefano Stabellini wrote:
> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>> On Tue, 6 Dec 2016, Andrew Cooper wrote:
>>> On 05/12/2016 19:17, Stefano Stabellini wrote:
>>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>>>>> None of these barriers serve any purpose, as they are not synchronising with
>>>>> anything on remote CPUs.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> ---
>>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>>>> CC: Julien Grall <julien.grall@arm.com>
>>>>>
>>>>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
>>>>>
>>>>> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
>>>>> from x86, but I don't know whether further development has gained a dependence
>>>>> on them.
>>>> We turned them into smp_wmb already (kudos to IanC).
>>> Right, but the entire point I am trying to argue is that they are not
>>> needed in the first place.
> Just to be clear, on ARM the barriers are unneeded only if it is
> unimportant that "init stuff" (which corresponds to all the
> initialization done in start_secondary up to smp_wmb) below is completed
> before "write cpu_online_map". But it looks like we do want to complete
> mmu, irq, timer initializations and set the current vcpu before marking
> the cpu as online, right?

No. I am sorry, but this question suggests that you still don't appear
to understand barriers.

>
>> This is the current code:
>>
>>  CPU 1                     CPU 0
>>  -----                     -----
>>
>>  init stuff                read cpu_online_map
>>
>>  write barrier
>>
>>  write cpu_online_map      do more initialization
>>
>>  write barrier
>>
>>  init more stuff
>>
>>
>> I agree that it's wrong, because the second write barrier in
>> start_secondary is useless and in addition we are missing a read barrier
>> in __cpu_up, corresponding to the first write barrier in
>> start_secondary.
>>
>> I think it should look like:
>>
>>  CPU 1                     CPU 0
>>  -----                     -----
>>
>>  init stuff                read cpu_online_map
>>
>>  write barrier             read barrier
>>
>>  write cpu_online_map      do more initialization
>>
>>  init more stuff

The barriers here serve no purpose, because you have missed an important
blocking while loop on CPU 0.

Recall that the read/write barrier example is:

    foo = a;
    smp_rmb();
    bar = b;

and

    a = baz;
    smp_wmb();
    b = fromble;

This is specifically relevant *only* to the shared variables a and b,
where for correctness an update to a must be observed remotely before
the update to b.

If you do not have explicitly the same a and b on either side of the
smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't
be using the barriers).


The init sequence is a different scenario.

Processor 0 spins waiting to observe an update to cpu_online_map.

Processor 1 performs its init sequence, and midway through, sets its
own bit in the cpu_online_map. It then continues further init actions.

It does not matter whether processor 0 observes the update to
cpu_online_map slightly early or late compared to the local-state
updates from the other parts of processor 1's init sequence (because
processor 0 had better not be looking at the local-state changes).

Once the bit is set in cpu_online_map, processor 1 is ready to be IPI'd
to give it work to do, etc. Even if processor 0 hasn't observed all of
the local-state updates of processor 1, once it has seen the
cpu_online_map update, it knows that processor 1 is available to be
IPI'd, and IPIing processor 1 will cause execution with expected program
order being observed (from the point of view of processor 1).

The only consideration is whether processor 1 can make an action which
will prioritise the update to cpu_online_map becoming visible to the
rest of the system. If there is such an action available, then an
argument can be made that making the update visible more quickly will
allow processor 0 to continue earlier.

However, such an action would be entirely local to processor 1 and not
related to anything happening on processor 0. I am not aware of any such
action on x86, and my gut feeling is that no other architecture would
have one either. The ability to fast forward one specific update to the
shared cache-coherency layer would only complicate an already
complicated area of silicon, and would only be useful to micro-optimise
a corner case; I can't see any system designers considering it a useful
feature to add.

~Andrew
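The handshake being described reduces to a sketch like this (hypothetical
names standing in for the smpboot code, with cpu_relax() as the only Xen
primitive used):

    /* Stands in for this CPU's bit in cpu_online_map. */
    static volatile bool ap_online;

    void ap_start_secondary(void)       /* processor 1 */
    {
        early_local_init();             /* local state only */
        ap_online = true;               /* publish: ready to take IPIs */
        late_local_init();              /* more local state */
        idle_loop();
    }

    void bsp_cpu_up(void)               /* processor 0 */
    {
        while ( !ap_online )
            cpu_relax();
        /*
         * Nothing here reads the AP's local init state, so there is no
         * second shared variable whose ordering against ap_online would
         * need a paired smp_wmb()/smp_rmb().
         */
    }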
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-07 1:20 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Jan Beulich, Xen-devel

On Wed, 7 Dec 2016, Andrew Cooper wrote:
> On 06/12/2016 20:32, Stefano Stabellini wrote:
> > On Tue, 6 Dec 2016, Stefano Stabellini wrote:
> >> On Tue, 6 Dec 2016, Andrew Cooper wrote:
> >>> On 05/12/2016 19:17, Stefano Stabellini wrote:
> >>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
> >>>>> None of these barriers serve any purpose, as they are not synchronising with
> >>>>> anything on remote CPUs.
> >>>>>
> >>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>> ---
> >>>>> CC: Jan Beulich <JBeulich@suse.com>
> >>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
> >>>>> CC: Julien Grall <julien.grall@arm.com>
> >>>>>
> >>>>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
> >>>>>
> >>>>> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
> >>>>> from x86, but I don't know whether further development has gained a dependence
> >>>>> on them.
> >>>> We turned them into smp_wmb already (kudos to IanC).
> >>> Right, but the entire point I am trying to argue is that they are not
> >>> needed in the first place.
> > Just to be clear, on ARM the barriers are unneeded only if it is
> > unimportant that "init stuff" (which corresponds to all the
> > initialization done in start_secondary up to smp_wmb) below is completed
> > before "write cpu_online_map". But it looks like we do want to complete
> > mmu, irq, timer initializations and set the current vcpu before marking
> > the cpu as online, right?
>
> No. I am sorry, but this question suggests that you still don't appear
> to understand barriers.
>
> >
> >> This is the current code:
> >>
> >>  CPU 1                     CPU 0
> >>  -----                     -----
> >>
> >>  init stuff                read cpu_online_map
> >>
> >>  write barrier
> >>
> >>  write cpu_online_map      do more initialization
> >>
> >>  write barrier
> >>
> >>  init more stuff
> >>
> >>
> >> I agree that it's wrong, because the second write barrier in
> >> start_secondary is useless and in addition we are missing a read barrier
> >> in __cpu_up, corresponding to the first write barrier in
> >> start_secondary.
> >>
> >> I think it should look like:
> >>
> >>  CPU 1                     CPU 0
> >>  -----                     -----
> >>
> >>  init stuff                read cpu_online_map
> >>
> >>  write barrier             read barrier
> >>
> >>  write cpu_online_map      do more initialization
> >>
> >>  init more stuff
>
> The barriers here serve no purpose, because you have missed an important
> blocking while loop on CPU 0.
>
> Recall that the read/write barrier example is:
>
>     foo = a;
>     smp_rmb();
>     bar = b;
>
> and
>
>     a = baz;
>     smp_wmb();
>     b = fromble;
>
> This is specifically relevant *only* to the shared variables a and b,
> where for correctness an update to a must be observed remotely before
> the update to b.
>
> If you do not have explicitly the same a and b on either side of the
> smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't
> be using the barriers).
>
>
> The init sequence is a different scenario.
>
> Processor 0 spins waiting to observe an update to cpu_online_map.
>
> Processor 1 performs its init sequence, and midway through, sets its
> own bit in the cpu_online_map. It then continues further init actions.
>
> It does not matter whether processor 0 observes the update to
> cpu_online_map slightly early or late compared to the local-state
> updates from the other parts of processor 1's init sequence (because
> processor 0 had better not be looking at the local-state changes).

In that case of course there is no need for barriers (I wrote something
similar in the other follow-up email). The case I was worried about is
exactly the one where processor 0 looks at one of the changes made by
processor 1.
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-07 1:46 UTC
To: Stefano Stabellini; +Cc: Julien Grall, Jan Beulich, Xen-devel

On 07/12/2016 01:20, Stefano Stabellini wrote:
> On Wed, 7 Dec 2016, Andrew Cooper wrote:
>> On 06/12/2016 20:32, Stefano Stabellini wrote:
>>> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>>>> On Tue, 6 Dec 2016, Andrew Cooper wrote:
>>>>> On 05/12/2016 19:17, Stefano Stabellini wrote:
>>>>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>>>>>>> None of these barriers serve any purpose, as they are not synchronising with
>>>>>>> anything on remote CPUs.
>>>>>>>
>>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>> ---
>>>>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>>>>>> CC: Julien Grall <julien.grall@arm.com>
>>>>>>>
>>>>>>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
>>>>>>>
>>>>>>> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
>>>>>>> from x86, but I don't know whether further development has gained a dependence
>>>>>>> on them.
>>>>>> We turned them into smp_wmb already (kudos to IanC).
>>>>> Right, but the entire point I am trying to argue is that they are not
>>>>> needed in the first place.
>>> Just to be clear, on ARM the barriers are unneeded only if it is
>>> unimportant that "init stuff" (which corresponds to all the
>>> initialization done in start_secondary up to smp_wmb) below is completed
>>> before "write cpu_online_map". But it looks like we do want to complete
>>> mmu, irq, timer initializations and set the current vcpu before marking
>>> the cpu as online, right?
>> No. I am sorry, but this question suggests that you still don't appear
>> to understand barriers.
>>
>>>
>>>> This is the current code:
>>>>
>>>>  CPU 1                     CPU 0
>>>>  -----                     -----
>>>>
>>>>  init stuff                read cpu_online_map
>>>>
>>>>  write barrier
>>>>
>>>>  write cpu_online_map      do more initialization
>>>>
>>>>  write barrier
>>>>
>>>>  init more stuff
>>>>
>>>>
>>>> I agree that it's wrong, because the second write barrier in
>>>> start_secondary is useless and in addition we are missing a read barrier
>>>> in __cpu_up, corresponding to the first write barrier in
>>>> start_secondary.
>>>>
>>>> I think it should look like:
>>>>
>>>>  CPU 1                     CPU 0
>>>>  -----                     -----
>>>>
>>>>  init stuff                read cpu_online_map
>>>>
>>>>  write barrier             read barrier
>>>>
>>>>  write cpu_online_map      do more initialization
>>>>
>>>>  init more stuff
>> The barriers here serve no purpose, because you have missed an important
>> blocking while loop on CPU 0.
>>
>> Recall that the read/write barrier example is:
>>
>>     foo = a;
>>     smp_rmb();
>>     bar = b;
>>
>> and
>>
>>     a = baz;
>>     smp_wmb();
>>     b = fromble;
>>
>> This is specifically relevant *only* to the shared variables a and b,
>> where for correctness an update to a must be observed remotely before
>> the update to b.
>>
>> If you do not have explicitly the same a and b on either side of the
>> smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't
>> be using the barriers).
>>
>>
>> The init sequence is a different scenario.
>>
>> Processor 0 spins waiting to observe an update to cpu_online_map.
>>
>> Processor 1 performs its init sequence, and midway through, sets its
>> own bit in the cpu_online_map. It then continues further init actions.
>>
>> It does not matter whether processor 0 observes the update to
>> cpu_online_map slightly early or late compared to the local-state
>> updates from the other parts of processor 1's init sequence (because
>> processor 0 had better not be looking at the local-state changes).
> In that case of course there is no need for barriers (I wrote something
> similar in the other follow-up email). The case I was worried about is
> exactly the one where processor 0 looks at one of the changes made by
> processor 1.

Looking at an isolated change doesn't involve any ordering. Ordering
(and therefore barriers) are only relevant when looking at exactly two
related changes which need observing in a particular order.

If you can reduce the BSP and AP boot sequence down to the two 3-line
examples above (with specifically identified variables/aggregates for a
and b on both sides), then you can make an argument for barriers being
necessary.

I should point out that I don't know whether the barriers are necessary
or unnecessary on ARM. All I am trying to do is to ensure that everyone
has the same understanding of how barriers work before conclusions are
drawn (because they really are tricky).

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erronious barriers 2016-12-06 20:32 ` Stefano Stabellini 2016-12-07 1:03 ` Andrew Cooper @ 2016-12-07 18:31 ` Julien Grall 2016-12-07 18:44 ` Stefano Stabellini 1 sibling, 1 reply; 36+ messages in thread From: Julien Grall @ 2016-12-07 18:31 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Andrew Cooper, Steve Capper, Jan Beulich, Xen-devel Hi Stefano, On 06/12/2016 20:32, Stefano Stabellini wrote: > On Tue, 6 Dec 2016, Stefano Stabellini wrote: >> On Tue, 6 Dec 2016, Andrew Cooper wrote: >>> On 05/12/2016 19:17, Stefano Stabellini wrote: >>>> On Mon, 5 Dec 2016, Andrew Cooper wrote: >>>>> None of these barriers serve any purpose, as they are not synchronising with >>>>> any anything on remote CPUs. >>>>> >>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >>>>> --- >>>>> CC: Jan Beulich <JBeulich@suse.com> >>>>> CC: Stefano Stabellini <sstabellini@kernel.org> >>>>> CC: Julien Grall <julien.grall@arm.com> >>>>> >>>>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep. >>>>> >>>>> Julien/Stefano: I think the ARM smpboot inhereted the erronious barrier usage >>>>> from x86, but I don't know whether further development has gained a dependence >>>>> on them. >>>> We turned them into smp_wmb already (kudos to IanC). >>> >>> Right, but the entire point I am trying to argue is that they are not >>> needed in the first place. > > Just to be clear, on ARM the barriers are unneeded only if it is > unimportant that "init stuff" (which correspond to all the > initialization done in start_secondary up to smp_wmb) below is completed > before "write cpu_online_map". But it looks like we do want to complete > mmu, irq, timer initializations and set the current vcpu before marking > the cpu as online, right? *mb are memory barriers. This would not prevent write to system registers and co-processor registers happening before "write cpu_online_map". Only an dsb(sy); isb() would ensure this. However, I don't think we really care about the state of the hardware for a specific CPU. This should never be accessed by another one. > >> This is the current code: >> >> CPU 1 CPU 0 >> ----- ----- >> >> init stuff read cpu_online_map >> >> write barrier >> >> write cpu_online_map do more initialization >> >> write barrier >> >> init more stuff >> >> >> I agree that it's wrong, because the second write barrier in >> start_secondary is useless and in addition we are missing a read barrier >> in __cpu_up, corresponding to the first write barrier in >> start_secondary. >> >> I think it should look like: >> >> >> CPU 1 CPU 0 >> ----- ----- >> >> init stuff read cpu_online_map >> >> write barrier read barrier >> >> write cpu_online_map do more initialization >> >> init more stuff >> >> >> The patch is as follow. >> >> Julien, what do you think? >> >> Also, do we need to change the remaming smp_wmb() in start_secondary to >> wmb() to ensure execution ordering as well as memory access ordering? I don't think so. If synchronization of hardware access was necessary it would have been taken care by the driver itself. What we should care here is if there any xen internal state that are required between CPU0 and CPU1. In this specific case, I think we should have the smp_wmb() barrier to ensure all write happening whilst calling local notifiers will be visible before the CPU mark itself as online. So IHMO, the patch looks good to me (see a small comment below). 
>> >> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> >> >> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c >> index 90ad1d0..c841a15 100644 >> --- a/xen/arch/arm/smpboot.c >> +++ b/xen/arch/arm/smpboot.c >> @@ -311,7 +311,6 @@ void start_secondary(unsigned long boot_phys_offset, >> >> /* Now report this CPU is up */ >> cpumask_set_cpu(cpuid, &cpu_online_map); >> - smp_wmb(); >> >> local_irq_enable(); >> local_abort_enable(); >> @@ -408,6 +407,7 @@ int __cpu_up(unsigned int cpu) >> cpu_relax(); >> process_pending_softirqs(); >> } >> + smp_rmb(); It would be good to start annotating the pair of barrier with "This barrier corresponds with the one in...". It would avoid "wild" barrier in the code :). >> >> /* >> * Nuke start of day info before checking one last time if the CPU >> Regards, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
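Julien's suggested annotation, applied to the hunks above, might read as follows (a sketch against the patched code; the loop condition in __cpu_up() and the comment wording are illustrative):

    /* start_secondary(): */
        /*
         * Publish this CPU's earlier initialisation before marking it
         * online.  Pairs with the smp_rmb() in __cpu_up().
         */
        smp_wmb();

        /* Now report this CPU is up */
        cpumask_set_cpu(cpuid, &cpu_online_map);

    /* __cpu_up(): */
        while ( !cpu_online(cpu) )
        {
            cpu_relax();
            process_pending_softirqs();
        }
        /* Pairs with the smp_wmb() in start_secondary(). */
        smp_rmb();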
* Re: [PATCH 2/4] xen/x86: Drop erronious barriers 2016-12-07 18:31 ` Julien Grall @ 2016-12-07 18:44 ` Stefano Stabellini 2016-12-07 18:55 ` Julien Grall 0 siblings, 1 reply; 36+ messages in thread From: Stefano Stabellini @ 2016-12-07 18:44 UTC (permalink / raw) To: Julien Grall Cc: Andrew Cooper, Stefano Stabellini, Steve Capper, Jan Beulich, Xen-devel On Wed, 7 Dec 2016, Julien Grall wrote: > Hi Stefano, > > On 06/12/2016 20:32, Stefano Stabellini wrote: > > On Tue, 6 Dec 2016, Stefano Stabellini wrote: > > > On Tue, 6 Dec 2016, Andrew Cooper wrote: > > > > On 05/12/2016 19:17, Stefano Stabellini wrote: > > > > > On Mon, 5 Dec 2016, Andrew Cooper wrote: > > > > > > None of these barriers serve any purpose, as they are not > > > > > > synchronising with > > > > > > any anything on remote CPUs. > > > > > > > > > > > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > > > > > > --- > > > > > > CC: Jan Beulich <JBeulich@suse.com> > > > > > > CC: Stefano Stabellini <sstabellini@kernel.org> > > > > > > CC: Julien Grall <julien.grall@arm.com> > > > > > > > > > > > > Restricting to just the $ARCH maintainers, as this is a project-wide > > > > > > sweep. > > > > > > > > > > > > Julien/Stefano: I think the ARM smpboot inhereted the erronious > > > > > > barrier usage > > > > > > from x86, but I don't know whether further development has gained a > > > > > > dependence > > > > > > on them. > > > > > We turned them into smp_wmb already (kudos to IanC). > > > > > > > > Right, but the entire point I am trying to argue is that they are not > > > > needed in the first place. > > > > Just to be clear, on ARM the barriers are unneeded only if it is > > unimportant that "init stuff" (which correspond to all the > > initialization done in start_secondary up to smp_wmb) below is completed > > before "write cpu_online_map". But it looks like we do want to complete > > mmu, irq, timer initializations and set the current vcpu before marking > > the cpu as online, right? > > *mb are memory barriers. This would not prevent write to system registers and > co-processor registers happening before "write cpu_online_map". Only an > dsb(sy); isb() would ensure this. > > However, I don't think we really care about the state of the hardware for a > specific CPU. This should never be accessed by another one. > > > > > > This is the current code: > > > > > > CPU 1 CPU 0 > > > ----- ----- > > > > > > init stuff read cpu_online_map > > > > > > write barrier > > > > > > write cpu_online_map do more initialization > > > > > > write barrier > > > > > > init more stuff > > > > > > > > > I agree that it's wrong, because the second write barrier in > > > start_secondary is useless and in addition we are missing a read barrier > > > in __cpu_up, corresponding to the first write barrier in > > > start_secondary. > > > > > > I think it should look like: > > > > > > > > > CPU 1 CPU 0 > > > ----- ----- > > > > > > init stuff read cpu_online_map > > > > > > write barrier read barrier > > > > > > write cpu_online_map do more initialization > > > > > > init more stuff > > > > > > > > > The patch is as follow. > > > > > > Julien, what do you think? > > > > > > Also, do we need to change the remaming smp_wmb() in start_secondary to > > > wmb() to ensure execution ordering as well as memory access ordering? > > I don't think so. If synchronization of hardware access was necessary it would > have been taken care by the driver itself. I thought the same, thanks for confirming. 
> What we should care here is if there any xen internal state that are required > between CPU0 and CPU1. > > In this specific case, I think we should have the smp_wmb() barrier to ensure > all write happening whilst calling local notifiers will be visible before the > CPU mark itself as online. So IHMO, the patch looks good to me (see a small > comment below). Andrew thinks that (on x86) there is actually nothing that CPU0 will be looking at, that has been set by CPU1. However looking at the code it is difficult to verify. There are many cpu notifiers and many things written by start_secondary. I would prefer to submit this patch, and be safe. > > > > > > Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> > > > > > > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c > > > index 90ad1d0..c841a15 100644 > > > --- a/xen/arch/arm/smpboot.c > > > +++ b/xen/arch/arm/smpboot.c > > > @@ -311,7 +311,6 @@ void start_secondary(unsigned long boot_phys_offset, > > > > > > /* Now report this CPU is up */ > > > cpumask_set_cpu(cpuid, &cpu_online_map); > > > - smp_wmb(); > > > > > > local_irq_enable(); > > > local_abort_enable(); > > > @@ -408,6 +407,7 @@ int __cpu_up(unsigned int cpu) > > > cpu_relax(); > > > process_pending_softirqs(); > > > } > > > + smp_rmb(); > > It would be good to start annotating the pair of barrier with "This barrier > corresponds with the one in...". It would avoid "wild" barrier in the code :). > > > > > > > /* > > > * Nuke start of day info before checking one last time if the CPU > > > > > Regards, > > -- > Julien Grall > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] xen/x86: Drop erronious barriers 2016-12-07 18:44 ` Stefano Stabellini @ 2016-12-07 18:55 ` Julien Grall 0 siblings, 0 replies; 36+ messages in thread From: Julien Grall @ 2016-12-07 18:55 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Andrew Cooper, Steve Capper, Jan Beulich, Xen-devel On 07/12/2016 18:44, Stefano Stabellini wrote: > On Wed, 7 Dec 2016, Julien Grall wrote: > Andrew thinks that (on x86) there is actually nothing that CPU0 will be > looking at, that has been set by CPU1. However looking at the code it is > difficult to verify. There are many cpu notifiers and many things > written by start_secondary. I would prefer to submit this patch, and be > safe. I agree on this. Better be safe than hunting a bug later on. Although, I think I just found an example for ARM. The gic_cpu_id (see gic-v2.c) is stored per-cpu and initialized by each CPU at boot. gic_cpu_id is commonly used to send a SGI to a specific target. So we need to ensure that CPU0 will see this value before sending an SGI (see gicv2_send_SGI). Otherwise the SGI may go to the wild. While you are sending a patch, can you document in the code why the barrier is present? Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
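Spelled out, the gic_cpu_id case Julien describes is a concrete instance of the pairing. A rough sketch, simplified from gic-v2.c and smpboot.c rather than quoted from them; read_gic_target_mask() is an illustrative stand-in for the actual distributor read:

    static DEFINE_PER_CPU(unsigned int, gic_cpu_id);

    /* Secondary CPU, during boot (via the GIC's CPU notifier): */
    this_cpu(gic_cpu_id) = read_gic_target_mask();  /* illustrative helper */
    smp_wmb();                     /* publish init before going online */
    cpumask_set_cpu(cpuid, &cpu_online_map);

    /* Sending CPU, in gicv2_send_SGI(), once it has seen the CPU in
     * cpu_online_map (with an smp_rmb() after that read): */
    target = per_cpu(gic_cpu_id, cpu);  /* must observe the store above */

Without the pairing, the load of gic_cpu_id could observe the pre-initialisation value even though the CPU is already visible in cpu_online_map, and the SGI would be misdirected.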
* [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 10:05 [PATCH 0/4] Fixes to common and x86 barriers Andrew Cooper 2016-12-05 10:05 ` [PATCH 1/4] xen/common: Replace incorrect mandatory barriers with SMP barriers Andrew Cooper 2016-12-05 10:05 ` [PATCH 2/4] xen/x86: Drop erronious barriers Andrew Cooper @ 2016-12-05 10:05 ` Andrew Cooper 2016-12-05 11:47 ` Jan Beulich 2016-12-05 10:05 ` [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper 3 siblings, 1 reply; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 10:05 UTC (permalink / raw) To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich Mandatory barriers are only for use with reduced-cacheability memory mappings. All of these uses are just to deal with shared memory between multiple processors, so use the smp_*() which are the correct barriers for the purpose. This is no functional change, because Xen currently assigns the smp_* meaning to non-smp_* barriers. This will be fixed in the following patch. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> Restricting to just the $ARCH maintainers, as this is a project-wide sweep --- xen/arch/x86/acpi/cpu_idle.c | 8 ++++---- xen/arch/x86/cpu/mcheck/amd_nonfatal.c | 4 ++-- xen/arch/x86/cpu/mcheck/barrier.c | 10 +++++----- xen/arch/x86/cpu/mcheck/mce.c | 2 +- xen/arch/x86/cpu/mcheck/mctelem.c | 10 +++++----- xen/arch/x86/genapic/x2apic.c | 6 +++--- xen/arch/x86/hpet.c | 6 +++--- xen/arch/x86/hvm/ioreq.c | 4 ++-- xen/arch/x86/io_apic.c | 4 ++-- xen/arch/x86/irq.c | 4 ++-- xen/arch/x86/mm.c | 10 +++++----- xen/arch/x86/mm/shadow/multi.c | 2 +- xen/arch/x86/smpboot.c | 12 ++++++------ xen/arch/x86/time.c | 8 ++++---- xen/drivers/passthrough/amd/iommu_init.c | 4 ++-- xen/include/asm-x86/desc.h | 8 ++++---- xen/include/asm-x86/system.h | 2 +- 17 files changed, 52 insertions(+), 52 deletions(-) diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c index 09c8848..b481059 100644 --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) { - mb(); + smp_mb(); clflush((void *)&mwait_wakeup(cpu)); - mb(); + smp_mb(); } __monitor((void *)&mwait_wakeup(cpu), 0, 0); @@ -756,10 +756,10 @@ void acpi_dead_idle(void) * instruction, hence memory fence is necessary to make sure all * load/store visible before flush cache line. 
*/ - mb(); + smp_mb(); clflush(mwait_ptr); __monitor(mwait_ptr, 0, 0); - mb(); + smp_mb(); __mwait(cx->address, 0); } } diff --git a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c index 8a80a9f..fb1d41d 100644 --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) /* Counter enable */ value |= (1ULL << 51); mca_wrmsr(MSR_IA32_MCx_MISC(4), value); - wmb(); + smp_wmb(); } } @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) value |= (1ULL << 51); wrmsrl(MSR_IA32_MCx_MISC(4), value); /* serialize */ - wmb(); + smp_wmb(); printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); } } diff --git a/xen/arch/x86/cpu/mcheck/barrier.c b/xen/arch/x86/cpu/mcheck/barrier.c index 5dce1fb..1a2149f 100644 --- a/xen/arch/x86/cpu/mcheck/barrier.c +++ b/xen/arch/x86/cpu/mcheck/barrier.c @@ -12,7 +12,7 @@ void mce_barrier_init(struct mce_softirq_barrier *bar) void mce_barrier_dec(struct mce_softirq_barrier *bar) { atomic_inc(&bar->outgen); - wmb(); + smp_wmb(); atomic_dec(&bar->val); } @@ -24,12 +24,12 @@ void mce_barrier_enter(struct mce_softirq_barrier *bar) return; atomic_inc(&bar->ingen); gen = atomic_read(&bar->outgen); - mb(); + smp_mb(); atomic_inc(&bar->val); while ( atomic_read(&bar->val) != num_online_cpus() && atomic_read(&bar->outgen) == gen ) { - mb(); + smp_mb(); mce_panic_check(); } } @@ -42,12 +42,12 @@ void mce_barrier_exit(struct mce_softirq_barrier *bar) return; atomic_inc(&bar->outgen); gen = atomic_read(&bar->ingen); - mb(); + smp_mb(); atomic_dec(&bar->val); while ( atomic_read(&bar->val) != 0 && atomic_read(&bar->ingen) == gen ) { - mb(); + smp_mb(); mce_panic_check(); } } diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c index 460e336..02883fc 100644 --- a/xen/arch/x86/cpu/mcheck/mce.c +++ b/xen/arch/x86/cpu/mcheck/mce.c @@ -388,7 +388,7 @@ mcheck_mca_logout(enum mca_source who, struct mca_banks *bankmask, else if ( who == MCA_MCE_SCAN && need_clear) mcabanks_set(i, clear_bank); - wmb(); + smp_wmb(); } if (mig && errcnt > 0) { diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c index 95e83c5..01077fe 100644 --- a/xen/arch/x86/cpu/mcheck/mctelem.c +++ b/xen/arch/x86/cpu/mcheck/mctelem.c @@ -414,9 +414,9 @@ static void mctelem_append_processing(mctelem_class_t which) ltep->mcte_prev = *procltp; *procltp = dangling[target]; } - wmb(); + smp_wmb(); dangling[target] = NULL; - wmb(); + smp_wmb(); } mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) } mctelem_processing_hold(tep); - wmb(); + smp_wmb(); spin_unlock(&processing_lock); return MCTE2COOKIE(tep); } @@ -444,7 +444,7 @@ void mctelem_consume_oldest_end(mctelem_cookie_t cookie) spin_lock(&processing_lock); mctelem_processing_release(tep); - wmb(); + smp_wmb(); spin_unlock(&processing_lock); } @@ -460,6 +460,6 @@ void mctelem_ack(mctelem_class_t which, mctelem_cookie_t cookie) spin_lock(&processing_lock); if (tep == mctctl.mctc_processing_head[target]) mctelem_processing_release(tep); - wmb(); + smp_wmb(); spin_unlock(&processing_lock); } diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c index d894a98..c63af0c 100644 --- a/xen/arch/x86/genapic/x2apic.c +++ b/xen/arch/x86/genapic/x2apic.c @@ -107,12 +107,12 @@ static void send_IPI_mask_x2apic_phys(const cpumask_t *cpumask, int vector) * 
CPU is seen by notified remote CPUs. The WRMSR contained within * apic_icr_write() can otherwise be executed early. * - * The reason mb() is sufficient here is subtle: the register arguments + * The reason smp_mb() is sufficient here is subtle: the register arguments * to WRMSR must depend on a memory read executed after the barrier. This * is guaranteed by cpu_physical_id(), which reads from a global array (and * so cannot be hoisted above the barrier even by a clever compiler). */ - mb(); + smp_mb(); local_irq_save(flags); @@ -136,7 +136,7 @@ static void send_IPI_mask_x2apic_cluster(const cpumask_t *cpumask, int vector) const cpumask_t *cluster_cpus; unsigned long flags; - mb(); /* See above for an explanation. */ + smp_mb(); /* See above for an explanation. */ local_irq_save(flags); diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index f78054d..c5ef534 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -190,9 +190,9 @@ static void handle_hpet_broadcast(struct hpet_event_channel *ch) { s_time_t deadline; - rmb(); + smp_rmb(); deadline = per_cpu(timer_deadline, cpu); - rmb(); + smp_rmb(); if ( !cpumask_test_cpu(cpu, ch->cpumask) ) continue; @@ -610,7 +610,7 @@ void __init hpet_broadcast_init(void) hpet_events[i].shift = 32; hpet_events[i].next_event = STIME_MAX; spin_lock_init(&hpet_events[i].lock); - wmb(); + smp_wmb(); hpet_events[i].event_handler = handle_hpet_broadcast; hpet_events[i].msi.msi_attrib.maskbit = 1; diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index 88071ab..6c00c0b 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -92,7 +92,7 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) { unsigned int state = p->state; - rmb(); + smp_rmb(); switch ( state ) { case STATE_IOREQ_NONE: @@ -1272,7 +1272,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) } /* Make the ioreq_t visible /before/ write_pointer. */ - wmb(); + smp_wmb(); pg->ptrs.write_pointer += qw ? 2 : 1; /* Canonicalize read/write pointers to prevent their overflow. */ diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c index 33e5927..185b956 100644 --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void) unsigned long t1, flags; t1 = pit0_ticks; - mb(); + smp_mb(); local_save_flags(flags); local_irq_enable(); @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void) * might have cached one ExtINT interrupt. Finally, at * least one tick may be lost due to delays. */ - mb(); + smp_mb(); if (pit0_ticks - t1 > 4) return 1; diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 8c1545a..de72b0d 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -757,9 +757,9 @@ void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask) ASSERT(spin_is_locked(&desc->lock)); desc->status &= ~IRQ_MOVE_PENDING; - wmb(); + smp_wmb(); cpumask_copy(desc->arch.pending_mask, mask); - wmb(); + smp_wmb(); desc->status |= IRQ_MOVE_PENDING; } diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 14552a1..a2672b1 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -460,7 +460,7 @@ void share_xen_page_with_guest( page->u.inuse.type_info |= PGT_validated | 1; page_set_owner(page, d); - wmb(); /* install valid domain ptr before updating refcnt. */ + smp_wmb(); /* install valid domain ptr before updating refcnt. */ ASSERT((page->count_info & ~PGC_xen_heap) == 0); /* Only add to the allocation list if the domain isn't dying. 
*/ @@ -2281,7 +2281,7 @@ static int alloc_page_type(struct page_info *page, unsigned long type, } /* No need for atomic update of type_info here: noone else updates it. */ - wmb(); + smp_wmb(); switch ( rc ) { case 0: @@ -2388,7 +2388,7 @@ static int __put_final_page_type( if ( !(shadow_mode_enabled(page_get_owner(page)) && (page->count_info & PGC_page_table)) ) page->tlbflush_timestamp = tlbflush_current_time(); - wmb(); + smp_wmb(); page->u.inuse.type_info--; } else if ( rc == -EINTR ) @@ -2398,13 +2398,13 @@ static int __put_final_page_type( if ( !(shadow_mode_enabled(page_get_owner(page)) && (page->count_info & PGC_page_table)) ) page->tlbflush_timestamp = tlbflush_current_time(); - wmb(); + smp_wmb(); page->u.inuse.type_info |= PGT_validated; } else { BUG_ON(rc != -ERESTART); - wmb(); + smp_wmb(); get_page_light(page); page->u.inuse.type_info |= PGT_partial; } diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 2696396..7268300 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -3021,7 +3021,7 @@ static int sh_page_fault(struct vcpu *v, * will make sure no inconsistent mapping being translated into * shadow page table. */ version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version); - rmb(); + smp_rmb(); rc = sh_walk_guest_tables(v, va, &gw, regs->error_code); #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c index 0aa377a..ba0fffe 100644 --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -77,7 +77,7 @@ static enum cpu_state { CPU_STATE_CALLIN, /* slave -> master: Completed phase 2 */ CPU_STATE_ONLINE /* master -> slave: Go fully online now. */ } cpu_state; -#define set_cpu_state(state) do { mb(); cpu_state = (state); } while (0) +#define set_cpu_state(state) do { smp_mb(); cpu_state = (state); } while (0) void *stack_base[NR_CPUS]; @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) for ( i = 1; i <= 5; i++ ) { tsc_value = rdtsc_ordered(); - wmb(); + smp_wmb(); atomic_inc(&tsc_count); while ( atomic_read(&tsc_count) != (i<<1) ) cpu_relax(); @@ -149,7 +149,7 @@ static void synchronize_tsc_slave(unsigned int slave) { while ( atomic_read(&tsc_count) != ((i<<1)-1) ) cpu_relax(); - rmb(); + smp_rmb(); /* * If a CPU has been physically hotplugged, we may as well write * to its TSC in spite of X86_FEATURE_TSC_RELIABLE. The platform does @@ -552,13 +552,13 @@ static int do_boot_cpu(int apicid, int cpu) } else if ( cpu_state == CPU_STATE_DEAD ) { - rmb(); + smp_rmb(); rc = cpu_error; } else { boot_error = 1; - mb(); + smp_mb(); if ( bootsym(trampoline_cpu_started) == 0xA5 ) /* trampoline started but...? */ printk("Stuck ??\n"); @@ -576,7 +576,7 @@ static int do_boot_cpu(int apicid, int cpu) /* mark "stuck" area as not stuck */ bootsym(trampoline_cpu_started) = 0; - mb(); + smp_mb(); smpboot_restore_warm_reset_vector(); diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index dda89d8..0039e23 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -977,10 +977,10 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) /* 1. Update guest kernel version. */ _u.version = u->version = version_update_begin(u->version); - wmb(); + smp_wmb(); /* 2. Update all other guest kernel fields. */ *u = _u; - wmb(); + smp_wmb(); /* 3. Update guest kernel version. 
*/ u->version = version_update_end(u->version); @@ -1006,10 +1006,10 @@ bool_t update_secondary_system_time(struct vcpu *v, smap_policy_change(v, saved_policy); return 0; } - wmb(); + smp_wmb(); /* 2. Update all other userspace fields. */ __copy_to_guest(user_u, u, 1); - wmb(); + smp_wmb(); /* 3. Update userspace version. */ u->version = version_update_end(u->version); __copy_field_to_guest(user_u, u, version); diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c index ea9f7e7..4fcf6fa 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -559,7 +559,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) return; } udelay(1); - rmb(); + smp_rmb(); code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK, IOMMU_EVENT_CODE_SHIFT); } @@ -664,7 +664,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[]) return; } udelay(1); - rmb(); + smp_rmb(); code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK, IOMMU_PPR_LOG_CODE_SHIFT); } diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h index da924bf..9956aae 100644 --- a/xen/include/asm-x86/desc.h +++ b/xen/include/asm-x86/desc.h @@ -128,10 +128,10 @@ static inline void _write_gate_lower(volatile idt_entry_t *gate, #define _set_gate(gate_addr,type,dpl,addr) \ do { \ (gate_addr)->a = 0; \ - wmb(); /* disable gate /then/ rewrite */ \ + smp_wmb(); /* disable gate /then/ rewrite */ \ (gate_addr)->b = \ ((unsigned long)(addr) >> 32); \ - wmb(); /* rewrite /then/ enable gate */ \ + smp_wmb(); /* rewrite /then/ enable gate */ \ (gate_addr)->a = \ (((unsigned long)(addr) & 0xFFFF0000UL) << 32) | \ ((unsigned long)(dpl) << 45) | \ @@ -174,11 +174,11 @@ static inline void _update_gate_addr_lower(idt_entry_t *gate, void *addr) #define _set_tssldt_desc(desc,addr,limit,type) \ do { \ (desc)[0].b = (desc)[1].b = 0; \ - wmb(); /* disable entry /then/ rewrite */ \ + smp_wmb(); /* disable entry /then/ rewrite */ \ (desc)[0].a = \ ((u32)(addr) << 16) | ((u32)(limit) & 0xFFFF); \ (desc)[1].a = (u32)(((unsigned long)(addr)) >> 32); \ - wmb(); /* rewrite /then/ enable entry */ \ + smp_wmb(); /* rewrite /then/ enable entry */ \ (desc)[0].b = \ ((u32)(addr) & 0xFF000000U) | \ ((u32)(type) << 8) | 0x8000U | \ diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h index eb498f5..9cb6fd7 100644 --- a/xen/include/asm-x86/system.h +++ b/xen/include/asm-x86/system.h @@ -183,7 +183,7 @@ static always_inline unsigned long __xadd( #define smp_wmb() wmb() #define set_mb(var, value) do { xchg(&var, value); } while (0) -#define set_wmb(var, value) do { var = value; wmb(); } while (0) +#define set_wmb(var, value) do { var = value; smp_wmb(); } while (0) #define local_irq_disable() asm volatile ( "cli" : : : "memory" ) #define local_irq_enable() asm volatile ( "sti" : : : "memory" ) -- 2.1.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 36+ messages in thread
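The __update_vcpu_system_time() and update_secondary_system_time() hunks above follow the usual version-counter protocol. The matching reader runs in the guest rather than in Xen, and pairs with the two smp_wmb()s along these lines (a sketch, assuming the common convention that an odd version means an update is in flight):

    uint32_t ver;
    struct vcpu_time_info t;

    do {
        ver = u->version;   /* 1: snapshot the version */
        smp_rmb();          /* pairs with the writer's first smp_wmb() */
        t = *u;             /* 2: copy the payload */
        smp_rmb();          /* pairs with the writer's second smp_wmb() */
    } while ( (ver & 1) || u->version != ver );  /* odd => update in progress */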
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 10:05 ` [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers Andrew Cooper @ 2016-12-05 11:47 ` Jan Beulich 2016-12-05 13:29 ` Andrew Cooper 0 siblings, 1 reply; 36+ messages in thread From: Jan Beulich @ 2016-12-05 11:47 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel >>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: > --- a/xen/arch/x86/acpi/cpu_idle.c > +++ b/xen/arch/x86/acpi/cpu_idle.c > @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) > > if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) > { > - mb(); > + smp_mb(); > clflush((void *)&mwait_wakeup(cpu)); > - mb(); > + smp_mb(); > } Both need to stay as they are afaict: In order for the clflush() to do what we want we have to order it wrt earlier as well as later writes, regardless of SMP-ness. Or wait - the SDM has changed in that respect (and a footnote describes the earlier specified behavior now). AMD, otoh, continues to require MFENCE for ordering purposes. > --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c > +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c > @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) > /* Counter enable */ > value |= (1ULL << 51); > mca_wrmsr(MSR_IA32_MCx_MISC(4), value); > - wmb(); > + smp_wmb(); > } > } > > @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) > value |= (1ULL << 51); > wrmsrl(MSR_IA32_MCx_MISC(4), value); > /* serialize */ > - wmb(); > + smp_wmb(); > printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); > } > } These will need confirming by AMD engineers. > --- a/xen/arch/x86/cpu/mcheck/barrier.c > +++ b/xen/arch/x86/cpu/mcheck/barrier.c > @@ -12,7 +12,7 @@ void mce_barrier_init(struct mce_softirq_barrier *bar) > void mce_barrier_dec(struct mce_softirq_barrier *bar) > { > atomic_inc(&bar->outgen); > - wmb(); > + smp_wmb(); > atomic_dec(&bar->val); > } This being x86-specific code - do we need a barrier here at all, between two LOCKed instructions? (Same for two more cases further down.) > @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) > } > > mctelem_processing_hold(tep); > - wmb(); > + smp_wmb(); > spin_unlock(&processing_lock); Don't we imply unlocks to be write barriers? > --- a/xen/arch/x86/io_apic.c > +++ b/xen/arch/x86/io_apic.c > @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void) > unsigned long t1, flags; > > t1 = pit0_ticks; > - mb(); > + smp_mb(); > > local_save_flags(flags); > local_irq_enable(); > @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void) > * might have cached one ExtINT interrupt. Finally, at > * least one tick may be lost due to delays. > */ > - mb(); > + smp_mb(); > if (pit0_ticks - t1 > 4) > return 1; I think these two can be barrier(). > @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) > for ( i = 1; i <= 5; i++ ) > { > tsc_value = rdtsc_ordered(); > - wmb(); > + smp_wmb(); > atomic_inc(&tsc_count); Same question as above wrt the following LOCKed instruction. 
> --- a/xen/drivers/passthrough/amd/iommu_init.c > +++ b/xen/drivers/passthrough/amd/iommu_init.c > @@ -559,7 +559,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) > return; > } > udelay(1); > - rmb(); > + smp_rmb(); > code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK, > IOMMU_EVENT_CODE_SHIFT); > } > @@ -664,7 +664,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[]) > return; > } > udelay(1); > - rmb(); > + smp_rmb(); > code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK, > IOMMU_PPR_LOG_CODE_SHIFT); > } Don't these fall into the "reduced-cacheability memory mappings" category? Or if the mappings these reads go through are UC, aren't they unneeded altogether? In any event this would better be a separate patch Cc-ed to the AMD IOMMU maintainers. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
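On the LOCKed-instruction question: Xen's atomic_inc()/atomic_dec() compile to LOCK-prefixed read-modify-writes, which the architecture makes fully ordered, so no fence instruction is needed between two of them at the hardware level. The shape of the definition, simplified from asm-x86/atomic.h rather than quoted verbatim:

    static inline void atomic_inc(atomic_t *v)
    {
        asm volatile ( "lock; incl %0" : "+m" (v->counter) );
    }

Whether a compiler-only barrier is still wanted between two such operations depends on what else the surrounding code touches, which is what the rest of this subthread turns on.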
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 11:47 ` Jan Beulich @ 2016-12-05 13:29 ` Andrew Cooper 2016-12-05 14:03 ` Jan Beulich 0 siblings, 1 reply; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 13:29 UTC (permalink / raw) To: Jan Beulich; +Cc: Xen-devel On 05/12/16 11:47, Jan Beulich wrote: >>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >> --- a/xen/arch/x86/acpi/cpu_idle.c >> +++ b/xen/arch/x86/acpi/cpu_idle.c >> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) >> >> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) >> { >> - mb(); >> + smp_mb(); >> clflush((void *)&mwait_wakeup(cpu)); >> - mb(); >> + smp_mb(); >> } > Both need to stay as they are afaict: In order for the clflush() to do > what we want we have to order it wrt earlier as well as later writes, > regardless of SMP-ness. Or wait - the SDM has changed in that > respect (and a footnote describes the earlier specified behavior now). > AMD, otoh, continues to require MFENCE for ordering purposes. mb() == smp_mb(). They are both mfence instructions. However, if AMD specifically requires mfence, we should explicitly use that rather than relying on the implementation details of smp_mb(). > >> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >> @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) >> /* Counter enable */ >> value |= (1ULL << 51); >> mca_wrmsr(MSR_IA32_MCx_MISC(4), value); >> - wmb(); >> + smp_wmb(); >> } >> } >> >> @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) >> value |= (1ULL << 51); >> wrmsrl(MSR_IA32_MCx_MISC(4), value); >> /* serialize */ >> - wmb(); >> + smp_wmb(); >> printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); >> } >> } > These will need confirming by AMD engineers. I was uncertain whether these were necessary at all, but as identified in the commit message, this is no functional change as Xen currently has rmb/wmb as plain barriers, not fence instructions. > >> --- a/xen/arch/x86/cpu/mcheck/barrier.c >> +++ b/xen/arch/x86/cpu/mcheck/barrier.c >> @@ -12,7 +12,7 @@ void mce_barrier_init(struct mce_softirq_barrier *bar) >> void mce_barrier_dec(struct mce_softirq_barrier *bar) >> { >> atomic_inc(&bar->outgen); >> - wmb(); >> + smp_wmb(); >> atomic_dec(&bar->val); >> } > This being x86-specific code - do we need a barrier here at all, > between two LOCKed instructions? (Same for two more cases further > down.) Hmm, I think not. The C semantics guarantees ordering of the atomic_inc() and atomic_dec(), and they are both writes to will be strictly ordered. > >> @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) >> } >> >> mctelem_processing_hold(tep); >> - wmb(); >> + smp_wmb(); >> spin_unlock(&processing_lock); > Don't we imply unlocks to be write barriers? They are, as an unlock is necessarily a write, combined with x86's ordering guarantees. Then again, I am not sure how this would interact with TSX, so am not sure if we should assume or rely on such behaviour. > >> --- a/xen/arch/x86/io_apic.c >> +++ b/xen/arch/x86/io_apic.c >> @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void) >> unsigned long t1, flags; >> >> t1 = pit0_ticks; >> - mb(); >> + smp_mb(); >> >> local_save_flags(flags); >> local_irq_enable(); >> @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void) >> * might have cached one ExtINT interrupt. 
Finally, at >> * least one tick may be lost due to delays. >> */ >> - mb(); >> + smp_mb(); >> if (pit0_ticks - t1 > 4) >> return 1; > I think these two can be barrier(). These were originally in the previous patch, but I wasn't so certain and moved them back here. Thinking about it further, pit0_ticks is only ever updated by the current cpu, so so-long as the compiler doesn't elide the read, it should be fine. > >> @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) >> for ( i = 1; i <= 5; i++ ) >> { >> tsc_value = rdtsc_ordered(); >> - wmb(); >> + smp_wmb(); >> atomic_inc(&tsc_count); > Same question as above wrt the following LOCKed instruction. I'm not sure the locked instruction is relevant here. C's ordering-model is sufficient to make this correct. > >> --- a/xen/drivers/passthrough/amd/iommu_init.c >> +++ b/xen/drivers/passthrough/amd/iommu_init.c >> @@ -559,7 +559,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) >> return; >> } >> udelay(1); >> - rmb(); >> + smp_rmb(); >> code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK, >> IOMMU_EVENT_CODE_SHIFT); >> } >> @@ -664,7 +664,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[]) >> return; >> } >> udelay(1); >> - rmb(); >> + smp_rmb(); >> code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK, >> IOMMU_PPR_LOG_CODE_SHIFT); >> } > Don't these fall into the "reduced-cacheability memory mappings" > category? Or if the mappings these reads go through are UC, > aren't they unneeded altogether? In any event this would better be > a separate patch Cc-ed to the AMD IOMMU maintainers. I can't find any requirements in the AMD IOMMU spec about the cacheability of mappings. We use UC mappings, although frankly for the starting memset alone, we should probably better using WB followed by an explicit cache flush, then switching to UC. With UC mappings, we don't need any barriers at all. I will submit a patch separately removing them. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
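The explicit-fence variant of the clflush/monitor sequence discussed above would be along these lines (a sketch, assuming the mfence()/sfence()/lfence() wrappers Andrew mentions reintroducing):

    if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) )
    {
        mfence();       /* order earlier stores before the flush */
        clflush((void *)&mwait_wakeup(cpu));
        mfence();       /* AMD requires MFENCE to order CLFLUSH;
                         * order the flush before MONITOR */
    }

    __monitor((void *)&mwait_wakeup(cpu), 0, 0);

This states the AMD requirement directly at the use site instead of relying on smp_mb() happening to be an mfence.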
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 13:29 ` Andrew Cooper @ 2016-12-05 14:03 ` Jan Beulich 2016-12-05 14:24 ` Andrew Cooper 0 siblings, 1 reply; 36+ messages in thread From: Jan Beulich @ 2016-12-05 14:03 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel >>> On 05.12.16 at 14:29, <andrew.cooper3@citrix.com> wrote: > On 05/12/16 11:47, Jan Beulich wrote: >>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >>> --- a/xen/arch/x86/acpi/cpu_idle.c >>> +++ b/xen/arch/x86/acpi/cpu_idle.c >>> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) >>> >>> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) >>> { >>> - mb(); >>> + smp_mb(); >>> clflush((void *)&mwait_wakeup(cpu)); >>> - mb(); >>> + smp_mb(); >>> } >> Both need to stay as they are afaict: In order for the clflush() to do >> what we want we have to order it wrt earlier as well as later writes, >> regardless of SMP-ness. Or wait - the SDM has changed in that >> respect (and a footnote describes the earlier specified behavior now). >> AMD, otoh, continues to require MFENCE for ordering purposes. > > mb() == smp_mb(). They are both mfence instructions. Of course. But still smp_mb() would be less correct from an abstract perspective, as here we care only about the local CPU. That said, ... > However, if AMD specifically requires mfence, we should explicitly use > that rather than relying on the implementation details of smp_mb(). ... I'd be fine with this. >>> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >>> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >>> @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) >>> /* Counter enable */ >>> value |= (1ULL << 51); >>> mca_wrmsr(MSR_IA32_MCx_MISC(4), value); >>> - wmb(); >>> + smp_wmb(); >>> } >>> } >>> >>> @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) >>> value |= (1ULL << 51); >>> wrmsrl(MSR_IA32_MCx_MISC(4), value); >>> /* serialize */ >>> - wmb(); >>> + smp_wmb(); >>> printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); >>> } >>> } >> These will need confirming by AMD engineers. > > I was uncertain whether these were necessary at all, but as identified > in the commit message, this is no functional change as Xen currently has > rmb/wmb as plain barriers, not fence instructions. And may hence be subtly broken, if this code was lifted from Linux? >>> @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) >>> } >>> >>> mctelem_processing_hold(tep); >>> - wmb(); >>> + smp_wmb(); >>> spin_unlock(&processing_lock); >> Don't we imply unlocks to be write barriers? > > They are, as an unlock is necessarily a write, combined with x86's > ordering guarantees. > > Then again, I am not sure how this would interact with TSX, so am not > sure if we should assume or rely on such behaviour. Isn't architectural state at the end of a transactional region indistinguishable as to whether TSX was actually used or the abort path taken (assuming the two code patch don't differ in their actions)? >>> --- a/xen/arch/x86/io_apic.c >>> +++ b/xen/arch/x86/io_apic.c >>> @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void) >>> unsigned long t1, flags; >>> >>> t1 = pit0_ticks; >>> - mb(); >>> + smp_mb(); >>> >>> local_save_flags(flags); >>> local_irq_enable(); >>> @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void) >>> * might have cached one ExtINT interrupt. 
Finally, at >>> * least one tick may be lost due to delays. >>> */ >>> - mb(); >>> + smp_mb(); >>> if (pit0_ticks - t1 > 4) >>> return 1; >> I think these two can be barrier(). > > These were originally in the previous patch, but I wasn't so certain and > moved them back here. > > Thinking about it further, pit0_ticks is only ever updated by the > current cpu, so so-long as the compiler doesn't elide the read, it > should be fine. That was exactly my thinking when giving the comment. >>> @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) >>> for ( i = 1; i <= 5; i++ ) >>> { >>> tsc_value = rdtsc_ordered(); >>> - wmb(); >>> + smp_wmb(); >>> atomic_inc(&tsc_count); >> Same question as above wrt the following LOCKed instruction. > > I'm not sure the locked instruction is relevant here. C's > ordering-model is sufficient to make this correct. I don't follow - the two involved variables are distinct, so I don't see how C's ordering model helps here at all. We need (at the machine level) the write to tsc_value to precede the increment of tsc_count, and I don't think C alone guarantees any such ordering. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 14:03 ` Jan Beulich @ 2016-12-05 14:24 ` Andrew Cooper 2016-12-05 14:33 ` Jan Beulich 0 siblings, 1 reply; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 14:24 UTC (permalink / raw) To: Jan Beulich; +Cc: Xen-devel On 05/12/16 14:03, Jan Beulich wrote: >>>> On 05.12.16 at 14:29, <andrew.cooper3@citrix.com> wrote: >> On 05/12/16 11:47, Jan Beulich wrote: >>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >>>> --- a/xen/arch/x86/acpi/cpu_idle.c >>>> +++ b/xen/arch/x86/acpi/cpu_idle.c >>>> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) >>>> >>>> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) >>>> { >>>> - mb(); >>>> + smp_mb(); >>>> clflush((void *)&mwait_wakeup(cpu)); >>>> - mb(); >>>> + smp_mb(); >>>> } >>> Both need to stay as they are afaict: In order for the clflush() to do >>> what we want we have to order it wrt earlier as well as later writes, >>> regardless of SMP-ness. Or wait - the SDM has changed in that >>> respect (and a footnote describes the earlier specified behavior now). >>> AMD, otoh, continues to require MFENCE for ordering purposes. >> mb() == smp_mb(). They are both mfence instructions. > Of course. But still smp_mb() would be less correct from an > abstract perspective ? That is entirely the purpose and intended meaning of the abstraction. smp_mb() orders operations such that (visible to other CPUs in the system), all writes will have completed before any subsequent reads begin. > , as here we care only about the local CPU. > That said, ... > >> However, if AMD specifically requires mfence, we should explicitly use >> that rather than relying on the implementation details of smp_mb(). > ... I'd be fine with this. An earlier version of the series introduced explicit {m,s,l}fence() defines. I will reintroduce these. > >>>> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >>>> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >>>> @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) >>>> /* Counter enable */ >>>> value |= (1ULL << 51); >>>> mca_wrmsr(MSR_IA32_MCx_MISC(4), value); >>>> - wmb(); >>>> + smp_wmb(); >>>> } >>>> } >>>> >>>> @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) >>>> value |= (1ULL << 51); >>>> wrmsrl(MSR_IA32_MCx_MISC(4), value); >>>> /* serialize */ >>>> - wmb(); >>>> + smp_wmb(); >>>> printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); >>>> } >>>> } >>> These will need confirming by AMD engineers. >> I was uncertain whether these were necessary at all, but as identified >> in the commit message, this is no functional change as Xen currently has >> rmb/wmb as plain barriers, not fence instructions. > And may hence be subtly broken, if this code was lifted from Linux? It doesn't resemble anything in Linux these days. I don't know if that means we have lagged, or it was developed independently. Looking at the Linux code, there are a few mandatory barriers which should all be SMP barriers instead (guarding updates of shared memory), but no barriers at all around MSR reads or writes. > >>>> @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) >>>> } >>>> >>>> mctelem_processing_hold(tep); >>>> - wmb(); >>>> + smp_wmb(); >>>> spin_unlock(&processing_lock); >>> Don't we imply unlocks to be write barriers? >> They are, as an unlock is necessarily a write, combined with x86's >> ordering guarantees. 
>> >> Then again, I am not sure how this would interact with TSX, so am not >> sure if we should assume or rely on such behaviour. > Isn't architectural state at the end of a transactional region > indistinguishable as to whether TSX was actually used or the > abort path taken (assuming the two code patch don't differ > in their actions)? I'd hope so, but I haven't had occasion to dig into TSX in detail yet. >>>> @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) >>>> for ( i = 1; i <= 5; i++ ) >>>> { >>>> tsc_value = rdtsc_ordered(); >>>> - wmb(); >>>> + smp_wmb(); >>>> atomic_inc(&tsc_count); >>> Same question as above wrt the following LOCKed instruction. >> I'm not sure the locked instruction is relevant here. C's >> ordering-model is sufficient to make this correct. > I don't follow - the two involved variables are distinct, so I don't > see how C's ordering model helps here at all. We need (at the > machine level) the write to tsc_value to precede the increment > of tsc_count, and I don't think C alone guarantees any such > ordering. Sorry - I looked at that code and thought they were both using tsc_value. Yes, we do need at least a compiler barrer here. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
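The conclusion for the TSC rendezvous, with the pairing spelled out (a sketch of the two sides, simplified from synchronize_tsc_master()/synchronize_tsc_slave()):

    /* master */
    tsc_value = rdtsc_ordered();
    smp_wmb();                       /* publish tsc_value...           */
    atomic_inc(&tsc_count);          /* ...before signalling the slave */

    /* slave */
    while ( atomic_read(&tsc_count) != ((i << 1) - 1) )
        cpu_relax();
    smp_rmb();                       /* pairs with the master's smp_wmb() */
    write_tsc(tsc_value);

On x86 both smp_wmb() and smp_rmb() are compiler-only barriers, which is exactly the "at least a compiler barrier" agreed on here.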
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 14:24 ` Andrew Cooper @ 2016-12-05 14:33 ` Jan Beulich 0 siblings, 0 replies; 36+ messages in thread From: Jan Beulich @ 2016-12-05 14:33 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel >>> On 05.12.16 at 15:24, <andrew.cooper3@citrix.com> wrote: > On 05/12/16 14:03, Jan Beulich wrote: >>>>> On 05.12.16 at 14:29, <andrew.cooper3@citrix.com> wrote: >>> On 05/12/16 11:47, Jan Beulich wrote: >>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >>>>> --- a/xen/arch/x86/acpi/cpu_idle.c >>>>> +++ b/xen/arch/x86/acpi/cpu_idle.c >>>>> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int > ecx) >>>>> >>>>> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) >>>>> { >>>>> - mb(); >>>>> + smp_mb(); >>>>> clflush((void *)&mwait_wakeup(cpu)); >>>>> - mb(); >>>>> + smp_mb(); >>>>> } >>>> Both need to stay as they are afaict: In order for the clflush() to do >>>> what we want we have to order it wrt earlier as well as later writes, >>>> regardless of SMP-ness. Or wait - the SDM has changed in that >>>> respect (and a footnote describes the earlier specified behavior now). >>>> AMD, otoh, continues to require MFENCE for ordering purposes. >>> mb() == smp_mb(). They are both mfence instructions. >> Of course. But still smp_mb() would be less correct from an >> abstract perspective > > ? That is entirely the purpose and intended meaning of the abstraction. > > smp_mb() orders operations such that (visible to other CPUs in the > system), all writes will have completed before any subsequent reads begin. Yes, but this code is not about multiple CPUs, but just the local one (we want to make sure CLFLUSH does what we want). Hence using smp_ prefixed barriers would be wrong. But anyway, with this becoming explicit mfence() invocations, the discussion is all moot now. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
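For reference, the explicit {m,s,l}fence() wrappers referred to above would presumably take this shape (a sketch; the spelling follows the mail, not a committed patch):

    #define mfence() asm volatile ( "mfence" ::: "memory" )
    #define lfence() asm volatile ( "lfence" ::: "memory" )
    #define sfence() asm volatile ( "sfence" ::: "memory" )

That is, the same instructions the mandatory barriers expand to, but requested explicitly where a specific fence is an architectural requirement (such as CLFLUSH ordering on AMD) rather than an SMP-ordering choice.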
* [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:05 [PATCH 0/4] Fixes to common and x86 barriers Andrew Cooper ` (2 preceding siblings ...) 2016-12-05 10:05 ` [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers Andrew Cooper @ 2016-12-05 10:05 ` Andrew Cooper 2016-12-05 10:11 ` Juergen Gross ` (2 more replies) 3 siblings, 3 replies; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 10:05 UTC (permalink / raw) To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich Barriers are a complicated topic, a common source of confusion in submitted code, and their incorrect use is a common cause of bugs. It *really* doesn't help when Xen's API is the same as Linux, but its ABI different. Bring the two back in line, so programmers stand a chance of actually getting their use correct. As Xen has no current need for mandatory barriers, leave them commented out to avoid accidential misue. No functional change. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> --- xen/include/asm-x86/system.h | 31 +++++++++++++++++++++++-------- xen/include/asm-x86/x86_64/system.h | 3 --- 2 files changed, 23 insertions(+), 11 deletions(-) diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h index 9cb6fd7..9cd401a 100644 --- a/xen/include/asm-x86/system.h +++ b/xen/include/asm-x86/system.h @@ -164,23 +164,38 @@ static always_inline unsigned long __xadd( ((typeof(*(ptr)))__xadd(ptr, (typeof(*(ptr)))(v), sizeof(*(ptr)))) /* + * Mandatory barriers, for the ordering of reads and writes with MMIO devices + * mapped with reduced cacheability. + * + * Xen has no such device drivers, and therefore no need for mandatory + * barriers. These these are hidden to avoid their misuse; If a future need + * is found, they can be re-introduced, but chances are very good that a + * programmer actually should be using the smp_*() barriers. + * +#define mb() asm volatile ("mfence" ::: "memory") +#define rmb() asm volatile ("lfence" ::: "memory") +#define wmb() asm volatile ("sfence" ::: "memory") + */ + +/* + * SMP barriers, for ordering of reads and writes between CPUs, most commonly + * used with shared memory. + * * Both Intel and AMD agree that, from a programmer's viewpoint: * Loads cannot be reordered relative to other loads. * Stores cannot be reordered relative to other stores. - * + * Loads may be reordered ahead of an unaliasing store. 
+ * * Intel64 Architecture Memory Ordering White Paper * <http://developer.intel.com/products/processor/manuals/318147.pdf> - * + * * AMD64 Architecture Programmer's Manual, Volume 2: System Programming * <http://www.amd.com/us-en/assets/content_type/\ * white_papers_and_tech_docs/24593.pdf> */ -#define rmb() barrier() -#define wmb() barrier() - -#define smp_mb() mb() -#define smp_rmb() rmb() -#define smp_wmb() wmb() +#define smp_mb() asm volatile ("mfence" ::: "memory") +#define smp_rmb() barrier() +#define smp_wmb() barrier() #define set_mb(var, value) do { xchg(&var, value); } while (0) #define set_wmb(var, value) do { var = value; smp_wmb(); } while (0) diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h index 7026c05..bdf45e5 100644 --- a/xen/include/asm-x86/x86_64/system.h +++ b/xen/include/asm-x86/x86_64/system.h @@ -79,7 +79,4 @@ static always_inline __uint128_t __cmpxchg16b( _rc; \ }) -#define mb() \ - asm volatile ( "mfence" : : : "memory" ) - #endif /* __X86_64_SYSTEM_H__ */ -- 2.1.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 36+ messages in thread
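For comparison, barrier(), which the new smp_rmb()/smp_wmb() expand to, is the pure compiler barrier (as defined in xen/include/xen/compiler.h):

    #define barrier() __asm__ __volatile__ ( "" : : : "memory" )

It emits no instruction at all; on x86's strongly-ordered model that is sufficient for load-load and store-store ordering, while the one reordering the hardware does perform, loads ahead of non-aliasing stores, is why smp_mb() keeps its mfence.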
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:05 ` [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper @ 2016-12-05 10:11 ` Juergen Gross 2016-12-05 13:45 ` Andrew Cooper 2016-12-05 11:51 ` Jan Beulich 2016-12-05 12:01 ` David Vrabel 2 siblings, 1 reply; 36+ messages in thread From: Juergen Gross @ 2016-12-05 10:11 UTC (permalink / raw) To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich On 05/12/16 11:05, Andrew Cooper wrote: > Barriers are a complicated topic, a common source of confusion in submitted > code, and their incorrect use is a common cause of bugs. It *really* doesn't > help when Xen's API is the same as Linux, but its ABI different. > > Bring the two back in line, so programmers stand a chance of actually getting > their use correct. > > As Xen has no current need for mandatory barriers, leave them commented out to > avoid accidential misue. > > No functional change. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > --- > xen/include/asm-x86/system.h | 31 +++++++++++++++++++++++-------- > xen/include/asm-x86/x86_64/system.h | 3 --- > 2 files changed, 23 insertions(+), 11 deletions(-) > > diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h > index 9cb6fd7..9cd401a 100644 > --- a/xen/include/asm-x86/system.h > +++ b/xen/include/asm-x86/system.h > @@ -164,23 +164,38 @@ static always_inline unsigned long __xadd( > ((typeof(*(ptr)))__xadd(ptr, (typeof(*(ptr)))(v), sizeof(*(ptr)))) > > /* > + * Mandatory barriers, for the ordering of reads and writes with MMIO devices > + * mapped with reduced cacheability. > + * > + * Xen has no such device drivers, and therefore no need for mandatory > + * barriers. These these are hidden to avoid their misuse; If a future need Duplicate "these". Juergen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:11 ` Juergen Gross @ 2016-12-05 13:45 ` Andrew Cooper 0 siblings, 0 replies; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 13:45 UTC (permalink / raw) To: Juergen Gross, Xen-devel; +Cc: Jan Beulich On 05/12/16 10:11, Juergen Gross wrote: > On 05/12/16 11:05, Andrew Cooper wrote: >> Barriers are a complicated topic, a common source of confusion in submitted >> code, and their incorrect use is a common cause of bugs. It *really* doesn't >> help when Xen's API is the same as Linux, but its ABI different. >> >> Bring the two back in line, so programmers stand a chance of actually getting >> their use correct. >> >> As Xen has no current need for mandatory barriers, leave them commented out to >> avoid accidential misue. >> >> No functional change. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> CC: Jan Beulich <JBeulich@suse.com> >> --- >> xen/include/asm-x86/system.h | 31 +++++++++++++++++++++++-------- >> xen/include/asm-x86/x86_64/system.h | 3 --- >> 2 files changed, 23 insertions(+), 11 deletions(-) >> >> diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h >> index 9cb6fd7..9cd401a 100644 >> --- a/xen/include/asm-x86/system.h >> +++ b/xen/include/asm-x86/system.h >> @@ -164,23 +164,38 @@ static always_inline unsigned long __xadd( >> ((typeof(*(ptr)))__xadd(ptr, (typeof(*(ptr)))(v), sizeof(*(ptr)))) >> >> /* >> + * Mandatory barriers, for the ordering of reads and writes with MMIO devices >> + * mapped with reduced cacheability. >> + * >> + * Xen has no such device drivers, and therefore no need for mandatory >> + * barriers. These these are hidden to avoid their misuse; If a future need > Duplicate "these". Fixed, thanks. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:05 ` [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper 2016-12-05 10:11 ` Juergen Gross @ 2016-12-05 11:51 ` Jan Beulich 2016-12-05 14:08 ` Andrew Cooper 2016-12-05 12:01 ` David Vrabel 2 siblings, 1 reply; 36+ messages in thread From: Jan Beulich @ 2016-12-05 11:51 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel >>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: > Barriers are a complicated topic, a common source of confusion in submitted > code, and their incorrect use is a common cause of bugs. It *really* doesn't > help when Xen's API is the same as Linux, but its ABI different. > > Bring the two back in line, so programmers stand a chance of actually getting > their use correct. > > As Xen has no current need for mandatory barriers, leave them commented out to > avoid accidential misue. I'm not convinced of this part - common driver code could very well need to use such barriers, and I would find it rather odd if I had to first re-introduce them when adding any such. In fact I'm surprised there's no such use anywhere. (Depending on the discussion on earlier patches, there may in fact remain a few such uses.) Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 11:51 ` Jan Beulich @ 2016-12-05 14:08 ` Andrew Cooper 0 siblings, 0 replies; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 14:08 UTC (permalink / raw) To: Jan Beulich; +Cc: Xen-devel On 05/12/16 11:51, Jan Beulich wrote: >>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >> Barriers are a complicated topic, a common source of confusion in submitted >> code, and their incorrect use is a common cause of bugs. It *really* doesn't >> help when Xen's API is the same as Linux, but its ABI different. >> >> Bring the two back in line, so programmers stand a chance of actually getting >> their use correct. >> >> As Xen has no current need for mandatory barriers, leave them commented out to >> avoid accidential misue. > I'm not convinced of this part - common driver code could very well > need to use such barriers, and I would find it rather odd if I had to > first re-introduce them when adding any such. In fact I'm surprised > there's no such use anywhere. (Depending on the discussion on > earlier patches, there may in fact remain a few such uses.) I'm not surprised that there is no common use of the mandatory barriers. Xen has no interesting device drivers using mappings other than WB or UC. As for hiding, I am betting that it is far more likely that a newly introduced barrier would be a (mis)use rather than a genuine need. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:05 ` [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper 2016-12-05 10:11 ` Juergen Gross 2016-12-05 11:51 ` Jan Beulich @ 2016-12-05 12:01 ` David Vrabel 2016-12-05 13:46 ` Andrew Cooper 2 siblings, 1 reply; 36+ messages in thread From: David Vrabel @ 2016-12-05 12:01 UTC (permalink / raw) To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich On 05/12/16 10:05, Andrew Cooper wrote: > > + * Loads may be reordered ahead of an unaliasing store. It might be useful to summarize what an "unaliasing store" is. David _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 12:01 ` David Vrabel @ 2016-12-05 13:46 ` Andrew Cooper 0 siblings, 0 replies; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 13:46 UTC (permalink / raw) To: David Vrabel, Xen-devel; +Cc: Jan Beulich On 05/12/16 12:01, David Vrabel wrote: > On 05/12/16 10:05, Andrew Cooper wrote: >> + * Loads may be reordered ahead of an unaliasing store. > It might be useful to summarize what an "unaliasing store" is. "... ahead of stores, so long as the addresses don't alias" ? ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
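The permitted reordering being summarised is the classic store-buffering shape. A minimal sketch, with x, y, r0 and r1 as illustrative variables, all initially zero:

    static int x, y, r0, r1;

    void cpu0(void) { x = 1; r0 = y; }
    void cpu1(void) { y = 1; r1 = x; }

    /*
     * Run concurrently, r0 == 0 && r1 == 0 is a permitted outcome on x86:
     * each load is to a different address than that CPU's own preceding
     * store (it does not alias it), so the load may be satisfied first.
     * A load from the *same* address would instead be forwarded the
     * stored value.  An mfence (smp_mb()) between the store and the load
     * on each side forbids the r0 == r1 == 0 outcome.
     */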