* [PATCH 0/4] Fixes to common and x86 barriers
@ 2016-12-05 10:05 Andrew Cooper
From: Andrew Cooper @ 2016-12-05 10:05 UTC
To: Xen-devel; +Cc: Andrew Cooper
Andrew Cooper (4):
xen/common: Replace incorrect mandatory barriers with SMP barriers
xen/x86: Drop erroneous barriers
xen/x86: Replace incorrect mandatory barriers with SMP barriers
xen/x86: Correct mandatory and SMP barrier definitions
 xen/arch/x86/acpi/cpu_idle.c             | 10 ++++------
 xen/arch/x86/cpu/mcheck/amd_nonfatal.c   |  4 ++--
 xen/arch/x86/cpu/mcheck/barrier.c        | 10 +++++-----
 xen/arch/x86/cpu/mcheck/mce.c            |  3 +--
 xen/arch/x86/cpu/mcheck/mctelem.c        | 10 +++++-----
 xen/arch/x86/crash.c                     |  3 ---
 xen/arch/x86/genapic/x2apic.c            |  6 +++---
 xen/arch/x86/hpet.c                      |  6 +++---
 xen/arch/x86/hvm/ioreq.c                 |  4 ++--
 xen/arch/x86/io_apic.c                   |  4 ++--
 xen/arch/x86/irq.c                       |  4 ++--
 xen/arch/x86/mm.c                        | 10 +++++-----
 xen/arch/x86/mm/shadow/multi.c           |  2 +-
 xen/arch/x86/smpboot.c                   | 14 ++++++--------
 xen/arch/x86/time.c                      |  8 ++++----
 xen/common/grant_table.c                 |  2 +-
 xen/common/time.c                        |  4 ++--
 xen/common/vm_event.c                    |  6 +++---
 xen/drivers/passthrough/amd/iommu_init.c |  4 ++--
 xen/include/asm-x86/desc.h               |  8 ++++----
 xen/include/asm-x86/system.h             | 33 +++++++++++++++++++++++---------
 xen/include/asm-x86/x86_64/system.h      |  3 ---
 22 files changed, 81 insertions(+), 77 deletions(-)
--
2.1.4
* [PATCH 1/4] xen/common: Replace incorrect mandatory barriers with SMP barriers
From: Andrew Cooper @ 2016-12-05 10:05 UTC
To: Xen-devel; +Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Jan Beulich

Mandatory barriers are only for use with reduced-cacheability MMIO mappings.

All of these uses are just to deal with shared memory between multiple
processors, so use the smp_*() which are the correct barriers for the
purpose.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>

Restricting to just the $ARCH maintainers, as this is a project-wide sweep
---
 xen/common/grant_table.c | 2 +-
 xen/common/time.c        | 4 ++--
 xen/common/vm_event.c    | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index e2c4097..a425a9e 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -993,7 +993,7 @@ __gnttab_map_grant_ref(
     mt = &maptrack_entry(lgt, handle);
     mt->domid = op->dom;
     mt->ref = op->ref;
-    wmb();
+    smp_wmb();
     write_atomic(&mt->flags, op->flags);
 
     if ( need_iommu )
diff --git a/xen/common/time.c b/xen/common/time.c
index 721ada8..a7caea9 100644
--- a/xen/common/time.c
+++ b/xen/common/time.c
@@ -103,7 +103,7 @@ void update_domain_wallclock_time(struct domain *d)
 
     wc_version = &shared_info(d, wc_version);
     *wc_version = version_update_begin(*wc_version);
-    wmb();
+    smp_wmb();
 
     sec = wc_sec + d->time_offset_seconds;
     shared_info(d, wc_sec) = sec;
@@ -117,7 +117,7 @@ void update_domain_wallclock_time(struct domain *d)
     shared_info(d, wc_sec_hi) = sec >> 32;
 #endif
 
-    wmb();
+    smp_wmb();
     *wc_version = version_update_end(*wc_version);
 
     spin_unlock(&wc_lock);
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 907ab40..c6f7d32 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -31,9 +31,9 @@
 #include <xsm/xsm.h>
 
 /* for public/io/ring.h macros */
-#define xen_mb()   mb()
-#define xen_rmb()  rmb()
-#define xen_wmb()  wmb()
+#define xen_mb()   smp_mb()
+#define xen_rmb()  smp_rmb()
+#define xen_wmb()  smp_wmb()
 
 #define vm_event_ring_lock_init(_ved) spin_lock_init(&(_ved)->ring_lock)
 #define vm_event_ring_lock(_ved)      spin_lock(&(_ved)->ring_lock)
-- 
2.1.4
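For context on the distinction patch 1 relies on, the two barrier families
can be sketched in a few lines. These are illustrative definitions only, not
Xen's actual <asm-x86/system.h> (patch 4 of the series is what adjusts the
real ones): mandatory barriers must emit fence instructions for
weakly-ordered MMIO, while on x86's strongly-ordered memory model the
smp_*() variants can mostly collapse to compiler barriers.

    /*
     * Illustrative sketch, not Xen's real headers.  Mandatory barriers
     * must emit fences, as required for reduced-cacheability MMIO.  The
     * smp_*() forms only order ordinary-memory accesses between CPUs,
     * which x86 mostly guarantees already, so they can degrade to
     * compiler barriers; smp_mb() keeps a real fence because x86 does
     * allow a store to be reordered after a later load.
     */
    #define barrier()  asm volatile ( "" ::: "memory" )

    #define mb()       asm volatile ( "mfence" ::: "memory" )
    #define rmb()      asm volatile ( "lfence" ::: "memory" )
    #define wmb()      asm volatile ( "sfence" ::: "memory" )

    #define smp_mb()   mb()
    #define smp_rmb()  barrier()
    #define smp_wmb()  barrier()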
* Re: [PATCH 1/4] xen/common: Replace incorrect mandatory barriers with SMP barriers
From: Jan Beulich @ 2016-12-05 10:55 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
> Mandatory barriers are only for use with reduced-cacheability MMIO mappings.
>
> All of these uses are just to deal with shared memory between multiple
> processors, so use the smp_*() which are the correct barriers for the
> purpose.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH 1/4] xen/common: Replace incorrect mandatory barriers with SMP barriers
From: Stefano Stabellini @ 2016-12-05 19:07 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Jan Beulich, Xen-devel

On Mon, 5 Dec 2016, Andrew Cooper wrote:
> Mandatory barriers are only for use with reduced-cacheability MMIO mappings.
>
> All of these uses are just to deal with shared memory between multiple
> processors, so use the smp_*() which are the correct barriers for the purpose.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
>
> Restricting to just the $ARCH maintainers, as this is a project-wide sweep
> ---
>  xen/common/grant_table.c | 2 +-
>  xen/common/time.c        | 4 ++--
>  xen/common/vm_event.c    | 6 +++---
>  3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index e2c4097..a425a9e 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -993,7 +993,7 @@ __gnttab_map_grant_ref(
>      mt = &maptrack_entry(lgt, handle);
>      mt->domid = op->dom;
>      mt->ref = op->ref;
> -    wmb();
> +    smp_wmb();
>      write_atomic(&mt->flags, op->flags);
>
>      if ( need_iommu )
> diff --git a/xen/common/time.c b/xen/common/time.c
> index 721ada8..a7caea9 100644
> --- a/xen/common/time.c
> +++ b/xen/common/time.c
> @@ -103,7 +103,7 @@ void update_domain_wallclock_time(struct domain *d)
>
>      wc_version = &shared_info(d, wc_version);
>      *wc_version = version_update_begin(*wc_version);
> -    wmb();
> +    smp_wmb();
>
>      sec = wc_sec + d->time_offset_seconds;
>      shared_info(d, wc_sec) = sec;
> @@ -117,7 +117,7 @@ void update_domain_wallclock_time(struct domain *d)
>      shared_info(d, wc_sec_hi) = sec >> 32;
>  #endif
>
> -    wmb();
> +    smp_wmb();
>      *wc_version = version_update_end(*wc_version);
>
>      spin_unlock(&wc_lock);
> diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
> index 907ab40..c6f7d32 100644
> --- a/xen/common/vm_event.c
> +++ b/xen/common/vm_event.c
> @@ -31,9 +31,9 @@
>  #include <xsm/xsm.h>
>
>  /* for public/io/ring.h macros */
> -#define xen_mb()   mb()
> -#define xen_rmb()  rmb()
> -#define xen_wmb()  wmb()
> +#define xen_mb()   smp_mb()
> +#define xen_rmb()  smp_rmb()
> +#define xen_wmb()  smp_wmb()
>
>  #define vm_event_ring_lock_init(_ved) spin_lock_init(&(_ved)->ring_lock)
>  #define vm_event_ring_lock(_ved)      spin_lock(&(_ved)->ring_lock)
> --
> 2.1.4
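The grant-table hunk above is the classic publish/consume pattern that
paired SMP barriers exist for. A minimal sketch of the idea, with a
simplified structure and an invented reader side (the real consumer is
whichever code later inspects the maptrack entry); write_atomic() and
read_atomic() are the primitives used in the patch:

    #include <stdint.h>
    #include <stdbool.h>

    struct maptrack_entry {
        uint16_t domid;
        uint32_t ref;
        uint16_t flags;    /* zero until the entry is published */
    };

    /* Producer: fill in the fields, then publish them via flags. */
    static void maptrack_publish(struct maptrack_entry *mt, uint16_t domid,
                                 uint32_t ref, uint16_t flags)
    {
        mt->domid = domid;
        mt->ref = ref;
        smp_wmb();                      /* fields visible before flags */
        write_atomic(&mt->flags, flags);
    }

    /* Consumer: observe flags, then the fields that flags guards. */
    static bool maptrack_consume(struct maptrack_entry *mt,
                                 uint16_t *domid, uint32_t *ref)
    {
        if ( !read_atomic(&mt->flags) )
            return false;
        smp_rmb();                      /* pairs with the smp_wmb() above */
        *domid = mt->domid;
        *ref = mt->ref;
        return true;
    }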
* [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-05 10:05 UTC
To: Xen-devel; +Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Jan Beulich

None of these barriers serve any purpose, as they are not synchronising with
anything on remote CPUs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>

Restricting to just the $ARCH maintainers, as this is a project-wide sweep.

Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
from x86, but I don't know whether further development has gained a dependence
on them.
---
 xen/arch/x86/acpi/cpu_idle.c  | 2 --
 xen/arch/x86/cpu/mcheck/mce.c | 1 -
 xen/arch/x86/crash.c          | 3 ---
 xen/arch/x86/smpboot.c        | 2 --
 4 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index f36b184..09c8848 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -1331,8 +1331,6 @@ void cpuidle_disable_deep_cstate(void)
         max_cstate = 1;
     }
 
-    mb();
-
     hpet_disable_legacy_broadcast();
 }
 
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 2695b0c..460e336 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -86,7 +86,6 @@ static x86_mce_vector_t _machine_check_vector = unexpected_machine_check;
 void x86_mce_vector_register(x86_mce_vector_t hdlr)
 {
     _machine_check_vector = hdlr;
-    wmb();
 }
 
 /* Call the installed machine check handler for this CPU setup. */
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index f28f527..4cadb5e 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
     write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
                  (unsigned long)&do_nmi_crash);
 
-    /* Ensure the new callback function is set before sending out the NMI. */
-    wmb();
-
     smp_send_nmi_allbutself();
 
     msecs = 1000; /* Wait at most a second for the other cpus to stop */
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 3a9dd3e..0aa377a 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -346,7 +346,6 @@ void start_secondary(void *unused)
     spin_debug_enable();
     set_cpu_sibling_map(cpu);
     notify_cpu_starting(cpu);
-    wmb();
 
     /*
      * We need to hold vector_lock so there the set of online cpus
@@ -364,7 +363,6 @@ void start_secondary(void *unused)
 
     microcode_resume_cpu(cpu);
 
-    wmb();
     startup_cpu_idle_loop();
 }
 
-- 
2.1.4
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Jan Beulich @ 2016-12-05 11:18 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:

As to the title - don't you rather mean "pointless" or some such? It's
not really an error to have them where they are.

> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>                   (unsigned long)&do_nmi_crash);
>
> -    /* Ensure the new callback function is set before sending out the NMI. */
> -    wmb();
> -
>      smp_send_nmi_allbutself();

I don't think I agree with this change - we certainly want to make
sure the APIC write happens only after the exception vector
adjustment became visible, namely when in x2APIC mode (where
the respective WRMSRs are not serializing).

> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>      spin_debug_enable();
>      set_cpu_sibling_map(cpu);
>      notify_cpu_starting(cpu);
> -    wmb();
>
>      /*
>       * We need to hold vector_lock so there the set of online cpus

Hmm, this one is indeed redundant with the lock_vector_lock()
following right below, but if that lock wasn't there, I think it
would be needed to order set_cpu_sibling_map() and the
setting of the bit in the online map. So I think it would better
stay (but be relaxed to smp_wmb()).

Jan
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-05 11:25 UTC
To: Jan Beulich; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

On 05/12/16 11:18, Jan Beulich wrote:
>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
> As to the title - don't you rather mean "pointless" or some such? It's
> not really an error to have them where they are.

True. I will update.

>
>> --- a/xen/arch/x86/crash.c
>> +++ b/xen/arch/x86/crash.c
>> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>>                   (unsigned long)&do_nmi_crash);
>>
>> -    /* Ensure the new callback function is set before sending out the NMI. */
>> -    wmb();
>> -
>>      smp_send_nmi_allbutself();
> I don't think I agree with this change - we certainly want to make
> sure the APIC write happens only after the exception vector
> adjustment became visible, namely when in x2APIC mode (where
> the respective WRMSRs are not serializing).

This wmb() is already only a barrier() (Fixed in the final patch)

Even if it weren't, wrmsr doesn't interact with sfence, so the barrier
would still be pointless.

>
>> --- a/xen/arch/x86/smpboot.c
>> +++ b/xen/arch/x86/smpboot.c
>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>      spin_debug_enable();
>>      set_cpu_sibling_map(cpu);
>>      notify_cpu_starting(cpu);
>> -    wmb();
>>
>>      /*
>>       * We need to hold vector_lock so there the set of online cpus
> Hmm, this one is indeed redundant with the lock_vector_lock()
> following right below, but if that lock wasn't there, I think it
> would be needed to order set_cpu_sibling_map() and the
> setting of the bit in the online map. So I think it would better
> stay (but be relaxed to smp_wmb()).

Why? It doesn't relate to an smp_rmb() on any other CPU, and is
therefore wrong to use.

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Jan Beulich @ 2016-12-05 12:28 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
> On 05/12/16 11:18, Jan Beulich wrote:
>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/arch/x86/crash.c
>>>> +++ b/xen/arch/x86/crash.c
>>>> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>>>>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>>>>                   (unsigned long)&do_nmi_crash);
>>>>
>>>> -    /* Ensure the new callback function is set before sending out the NMI. */
>>>> -    wmb();
>>>> -
>>>>      smp_send_nmi_allbutself();
>>> I don't think I agree with this change - we certainly want to make
>>> sure the APIC write happens only after the exception vector
>>> adjustment became visible, namely when in x2APIC mode (where
>>> the respective WRMSRs are not serializing).
>> This wmb() is already only a barrier() (Fixed in the final patch)

Good point.

>> Even if it weren't, wrmsr doesn't interact with sfence, so the barrier
>> would still be pointless.

Are you sure there's absolutely nothing in replacement for the lack
of serialization?

That said, we're still having enough of a barrier left anyway, due to
the ones in send_IPI_mask_x2apic_{phys,cluster}(), and due to the
LAPIC mapping being UC. So yes, I'm fine with dropping it here then.

>>>> --- a/xen/arch/x86/smpboot.c
>>>> +++ b/xen/arch/x86/smpboot.c
>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>      spin_debug_enable();
>>>>      set_cpu_sibling_map(cpu);
>>>>      notify_cpu_starting(cpu);
>>>> -    wmb();
>>>>
>>>>      /*
>>>>       * We need to hold vector_lock so there the set of online cpus
>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>> following right below, but if that lock wasn't there, I think it
>>> would be needed to order set_cpu_sibling_map() and the
>>> setting of the bit in the online map. So I think it would better
>>> stay (but be relaxed to smp_wmb()).
>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>> therefore wrong to use.

I think it does, just not with one that's spelled out as smp_rmb().
Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
a de-facto equivalent of smp_rmb().

Jan
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-05 13:43 UTC
To: Jan Beulich; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

On 05/12/16 12:28, Jan Beulich wrote:
>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
>> On 05/12/16 11:18, Jan Beulich wrote:
>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/arch/x86/crash.c
>>>> +++ b/xen/arch/x86/crash.c
>>>> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>>>>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>>>>                   (unsigned long)&do_nmi_crash);
>>>>
>>>> -    /* Ensure the new callback function is set before sending out the NMI. */
>>>> -    wmb();
>>>> -
>>>>      smp_send_nmi_allbutself();
>>> I don't think I agree with this change - we certainly want to make
>>> sure the APIC write happens only after the exception vector
>>> adjustment became visible, namely when in x2APIC mode (where
>>> the respective WRMSRs are not serializing).
>> This wmb() is already only a barrier() (Fixed in the final patch)
> Good point.
>
>> Even if it weren't, wrmsr doesn't interact with sfence, so the barrier
>> would still be pointless.
> Are you sure there's absolutely nothing in replacement for the lack
> of serialization?
>
> That said, we're still having enough of a barrier left anyway, due to
> the ones in send_IPI_mask_x2apic_{phys,cluster}(), and due to the
> LAPIC mapping being UC. So yes, I'm fine with dropping it here then.
>
>>>> --- a/xen/arch/x86/smpboot.c
>>>> +++ b/xen/arch/x86/smpboot.c
>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>      spin_debug_enable();
>>>>      set_cpu_sibling_map(cpu);
>>>>      notify_cpu_starting(cpu);
>>>> -    wmb();
>>>>
>>>>      /*
>>>>       * We need to hold vector_lock so there the set of online cpus
>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>> following right below, but if that lock wasn't there, I think it
>>> would be needed to order set_cpu_sibling_map() and the
>>> setting of the bit in the online map. So I think it would better
>>> stay (but be relaxed to smp_wmb()).
>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>> therefore wrong to use.
> I think it does, just not with one that's spelled out as smp_rmb().
> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
> a de-facto equivalent of smp_rmb().

__cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
between the two context hunks.

I still don't see any purpose or need for there to be any barriers
here at all, not even compiler barriers.

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Jan Beulich @ 2016-12-05 13:50 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 14:43, <andrew.cooper3@citrix.com> wrote:
> On 05/12/16 12:28, Jan Beulich wrote:
>>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
>>> On 05/12/16 11:18, Jan Beulich wrote:
>>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>>> --- a/xen/arch/x86/smpboot.c
>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>>      spin_debug_enable();
>>>>>      set_cpu_sibling_map(cpu);
>>>>>      notify_cpu_starting(cpu);
>>>>> -    wmb();
>>>>>
>>>>>      /*
>>>>>       * We need to hold vector_lock so there the set of online cpus
>>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>>> following right below, but if that lock wasn't there, I think it
>>>> would be needed to order set_cpu_sibling_map() and the
>>>> setting of the bit in the online map. So I think it would better
>>>> stay (but be relaxed to smp_wmb()).
>>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>>> therefore wrong to use.
>> I think it does, just not with one that's spelled out as smp_rmb().
>> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
>> a de-facto equivalent of smp_rmb().
> __cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
> between the two context hunks.

Exactly - so here we need the write side to that, unless (like
suggested elsewhere) we mean to make use of the barrier implied by
the LOCKed instruction which cpumask_set_cpu() resolves to. Such
"amortization", however, would better be recorded in a comment, so
that it becomes less likely for a future change to eliminate a
required barrier altogether.

Jan
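A sketch of the kind of annotation being suggested here, assuming (as
stated above) that cpumask_set_cpu() resolves to a LOCKed read-modify-write
on x86; the comment placement is hypothetical:

    set_cpu_sibling_map(cpu);
    notify_cpu_starting(cpu);

    /*
     * No explicit barrier: cpumask_set_cpu() is a LOCKed instruction
     * (lock bts) on x86, i.e. a full barrier, so the sibling map updates
     * above become visible no later than this CPU's bit in
     * cpu_online_map.  If cpumask_set_cpu() ever stops being LOCKed, an
     * smp_wmb() needs reinstating here.
     */
    cpumask_set_cpu(cpu, &cpu_online_map);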
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-05 13:59 UTC
To: Jan Beulich; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

On 05/12/16 13:50, Jan Beulich wrote:
>>>> On 05.12.16 at 14:43, <andrew.cooper3@citrix.com> wrote:
>> On 05/12/16 12:28, Jan Beulich wrote:
>>>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
>>>> On 05/12/16 11:18, Jan Beulich wrote:
>>>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>>>> --- a/xen/arch/x86/smpboot.c
>>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>>>      spin_debug_enable();
>>>>>>      set_cpu_sibling_map(cpu);
>>>>>>      notify_cpu_starting(cpu);
>>>>>> -    wmb();
>>>>>>
>>>>>>      /*
>>>>>>       * We need to hold vector_lock so there the set of online cpus
>>>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>>>> following right below, but if that lock wasn't there, I think it
>>>>> would be needed to order set_cpu_sibling_map() and the
>>>>> setting of the bit in the online map. So I think it would better
>>>>> stay (but be relaxed to smp_wmb()).
>>>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>>>> therefore wrong to use.
>>> I think it does, just not with one that's spelled out as smp_rmb().
>>> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
>>> a de-facto equivalent of smp_rmb().
>> __cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
>> between the two context hunks.
> Exactly - so here we need the write side to that

No, we don't.

cpumask_set_cpu(cpu, &cpu_online_map) is a write operation, so orders
properly on x86. C's ordering properties ensure that the adjacent
function calls happen in program order.

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Jan Beulich @ 2016-12-05 14:07 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Xen-devel

>>> On 05.12.16 at 14:59, <andrew.cooper3@citrix.com> wrote:
> On 05/12/16 13:50, Jan Beulich wrote:
>>>>> On 05.12.16 at 14:43, <andrew.cooper3@citrix.com> wrote:
>>> On 05/12/16 12:28, Jan Beulich wrote:
>>>>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
>>>>> On 05/12/16 11:18, Jan Beulich wrote:
>>>>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
>>>>>>> --- a/xen/arch/x86/smpboot.c
>>>>>>> +++ b/xen/arch/x86/smpboot.c
>>>>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>>>>>>>      spin_debug_enable();
>>>>>>>      set_cpu_sibling_map(cpu);
>>>>>>>      notify_cpu_starting(cpu);
>>>>>>> -    wmb();
>>>>>>>
>>>>>>>      /*
>>>>>>>       * We need to hold vector_lock so there the set of online cpus
>>>>>> Hmm, this one is indeed redundant with the lock_vector_lock()
>>>>>> following right below, but if that lock wasn't there, I think it
>>>>>> would be needed to order set_cpu_sibling_map() and the
>>>>>> setting of the bit in the online map. So I think it would better
>>>>>> stay (but be relaxed to smp_wmb()).
>>>>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
>>>>> therefore wrong to use.
>>>> I think it does, just not with one that's spelled out as smp_rmb().
>>>> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
>>>> a de-facto equivalent of smp_rmb().
>>> __cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
>>> between the two context hunks.
>> Exactly - so here we need the write side to that
>
> No, we don't.
>
> cpumask_set_cpu(cpu, &cpu_online_map) is a write operation, so orders
> properly on x86. C's ordering properties ensure that the adjacent
> function calls happen in program order.

Well, that then again falls into the category of barriers which
would be needed in arch-independent code, but can be omitted
in x86-specific sources. I think we should separate both classes
when relaxing/eliminating them.

Jan
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-05 19:14 UTC
To: Jan Beulich; +Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Xen-devel

On Mon, 5 Dec 2016, Jan Beulich wrote:
> >>> On 05.12.16 at 14:59, <andrew.cooper3@citrix.com> wrote:
> > On 05/12/16 13:50, Jan Beulich wrote:
> >>>>> On 05.12.16 at 14:43, <andrew.cooper3@citrix.com> wrote:
> >>> On 05/12/16 12:28, Jan Beulich wrote:
> >>>>>>> On 05.12.16 at 12:25, <andrew.cooper3@citrix.com> wrote:
> >>>>> On 05/12/16 11:18, Jan Beulich wrote:
> >>>>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote:
> >>>>>>> --- a/xen/arch/x86/smpboot.c
> >>>>>>> +++ b/xen/arch/x86/smpboot.c
> >>>>>>> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
> >>>>>>>      spin_debug_enable();
> >>>>>>>      set_cpu_sibling_map(cpu);
> >>>>>>>      notify_cpu_starting(cpu);
> >>>>>>> -    wmb();
> >>>>>>>
> >>>>>>>      /*
> >>>>>>>       * We need to hold vector_lock so there the set of online cpus
> >>>>>> Hmm, this one is indeed redundant with the lock_vector_lock()
> >>>>>> following right below, but if that lock wasn't there, I think it
> >>>>>> would be needed to order set_cpu_sibling_map() and the
> >>>>>> setting of the bit in the online map. So I think it would better
> >>>>>> stay (but be relaxed to smp_wmb()).
> >>>>> Why? It doesn't relate to an smp_rmb() on any other CPU, and is
> >>>>> therefore wrong to use.
> >>>> I think it does, just not with one that's spelled out as smp_rmb().
> >>>> Instead __cpu_up() spins on !cpu_online(), using cpu_relax() as
> >>>> a de-facto equivalent of smp_rmb().
> >>> __cpu_up() is ordered with the cpumask_set_cpu(cpu, &cpu_online_map)
> >>> between the two context hunks.
> >> Exactly - so here we need the write side to that
> >
> > No, we don't.
> >
> > cpumask_set_cpu(cpu, &cpu_online_map) is a write operation, so orders
> > properly on x86. C's ordering properties ensure that the adjacent
> > function calls happen in program order.
> Well, that then again falls into the category of barriers which
> would be needed in arch-independent code, but can be omitted
> in x86-specific sources. I think we should separate both classes
> when relaxing/eliminating them.

Yes. It would be nice to keep a barrier, one that #defines to nothing
if it's unneeded, so that if somebody else on a different arch (*cough*
*cough*) ends up copying the code, it will know what to do.
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-05 19:17 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Jan Beulich, Xen-devel

On Mon, 5 Dec 2016, Andrew Cooper wrote:
> None of these barriers serve any purpose, as they are not synchronising with
> anything on remote CPUs.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
>
> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
>
> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
> from x86, but I don't know whether further development has gained a dependence
> on them.

We turned them into smp_wmb already (kudos to IanC).

>  xen/arch/x86/acpi/cpu_idle.c  | 2 --
>  xen/arch/x86/cpu/mcheck/mce.c | 1 -
>  xen/arch/x86/crash.c          | 3 ---
>  xen/arch/x86/smpboot.c        | 2 --
>  4 files changed, 8 deletions(-)
>
> diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
> index f36b184..09c8848 100644
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -1331,8 +1331,6 @@ void cpuidle_disable_deep_cstate(void)
>          max_cstate = 1;
>      }
>
> -    mb();
> -
>      hpet_disable_legacy_broadcast();
>  }
>
> diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
> index 2695b0c..460e336 100644
> --- a/xen/arch/x86/cpu/mcheck/mce.c
> +++ b/xen/arch/x86/cpu/mcheck/mce.c
> @@ -86,7 +86,6 @@ static x86_mce_vector_t _machine_check_vector = unexpected_machine_check;
>  void x86_mce_vector_register(x86_mce_vector_t hdlr)
>  {
>      _machine_check_vector = hdlr;
> -    wmb();
>  }
>
>  /* Call the installed machine check handler for this CPU setup. */
> diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
> index f28f527..4cadb5e 100644
> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
>      write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
>                   (unsigned long)&do_nmi_crash);
>
> -    /* Ensure the new callback function is set before sending out the NMI. */
> -    wmb();
> -
>      smp_send_nmi_allbutself();
>
>      msecs = 1000; /* Wait at most a second for the other cpus to stop */
> diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
> index 3a9dd3e..0aa377a 100644
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -346,7 +346,6 @@ void start_secondary(void *unused)
>      spin_debug_enable();
>      set_cpu_sibling_map(cpu);
>      notify_cpu_starting(cpu);
> -    wmb();
>
>      /*
>       * We need to hold vector_lock so there the set of online cpus
> @@ -364,7 +363,6 @@ void start_secondary(void *unused)
>
>      microcode_resume_cpu(cpu);
>
> -    wmb();
>      startup_cpu_idle_loop();
>  }
>
> --
> 2.1.4
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-06 0:10 UTC
To: Stefano Stabellini; +Cc: Julien Grall, Jan Beulich, Xen-devel

On 05/12/2016 19:17, Stefano Stabellini wrote:
> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>> None of these barriers serve any purpose, as they are not synchronising with
>> anything on remote CPUs.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Stefano Stabellini <sstabellini@kernel.org>
>> CC: Julien Grall <julien.grall@arm.com>
>>
>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
>>
>> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
>> from x86, but I don't know whether further development has gained a dependence
>> on them.
> We turned them into smp_wmb already (kudos to IanC).

Right, but the entire point I am trying to argue is that they are not
needed in the first place.

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-06 20:27 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Jan Beulich, Xen-devel

On Tue, 6 Dec 2016, Andrew Cooper wrote:
> On 05/12/2016 19:17, Stefano Stabellini wrote:
> > On Mon, 5 Dec 2016, Andrew Cooper wrote:
> >> None of these barriers serve any purpose, as they are not synchronising with
> >> anything on remote CPUs.
> >>
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Jan Beulich <JBeulich@suse.com>
> >> CC: Stefano Stabellini <sstabellini@kernel.org>
> >> CC: Julien Grall <julien.grall@arm.com>
> >>
> >> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
> >>
> >> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
> >> from x86, but I don't know whether further development has gained a dependence
> >> on them.
> > We turned them into smp_wmb already (kudos to IanC).
>
> Right, but the entire point I am trying to argue is that they are not
> needed in the first place.

This is the current code:

 CPU 1                     CPU 0
 -----                     -----

 init stuff                read cpu_online_map

 write barrier

 write cpu_online_map      do more initialization

 write barrier

 init more stuff


I agree that it's wrong, because the second write barrier in
start_secondary is useless and in addition we are missing a read barrier
in __cpu_up, corresponding to the first write barrier in
start_secondary.

I think it should look like:

 CPU 1                     CPU 0
 -----                     -----

 init stuff                read cpu_online_map

 write barrier             read barrier

 write cpu_online_map      do more initialization

 init more stuff


The patch is as follows.

Julien, what do you think?

Also, do we need to change the remaining smp_wmb() in start_secondary to
wmb() to ensure execution ordering as well as memory access ordering?

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 90ad1d0..c841a15 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -311,7 +311,6 @@ void start_secondary(unsigned long boot_phys_offset,
 
     /* Now report this CPU is up */
     cpumask_set_cpu(cpuid, &cpu_online_map);
-    smp_wmb();
 
     local_irq_enable();
     local_abort_enable();
@@ -408,6 +407,7 @@ int __cpu_up(unsigned int cpu)
         cpu_relax();
         process_pending_softirqs();
     }
+    smp_rmb();
 
     /*
      * Nuke start of day info before checking one last time if the CPU
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-06 20:32 UTC
To: Stefano Stabellini; +Cc: Andrew Cooper, Julien Grall, Jan Beulich, Xen-devel

On Tue, 6 Dec 2016, Stefano Stabellini wrote:
> On Tue, 6 Dec 2016, Andrew Cooper wrote:
> > On 05/12/2016 19:17, Stefano Stabellini wrote:
> > > On Mon, 5 Dec 2016, Andrew Cooper wrote:
> > > > None of these barriers serve any purpose, as they are not
> > > > synchronising with anything on remote CPUs.
> > > >
> > > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > > > ---
> > > > CC: Jan Beulich <JBeulich@suse.com>
> > > > CC: Stefano Stabellini <sstabellini@kernel.org>
> > > > CC: Julien Grall <julien.grall@arm.com>
> > > >
> > > > Restricting to just the $ARCH maintainers, as this is a project-wide
> > > > sweep.
> > > >
> > > > Julien/Stefano: I think the ARM smpboot inherited the erroneous
> > > > barrier usage
> > > > from x86, but I don't know whether further development has gained a
> > > > dependence
> > > > on them.
> > > We turned them into smp_wmb already (kudos to IanC).
> >
> > Right, but the entire point I am trying to argue is that they are not
> > needed in the first place.

Just to be clear, on ARM the barriers are unneeded only if it is
unimportant that "init stuff" (which corresponds to all the
initialization done in start_secondary up to smp_wmb) below is completed
before "write cpu_online_map". But it looks like we do want to complete
mmu, irq, timer initializations and set the current vcpu before marking
the cpu as online, right?

> This is the current code:
>
>  CPU 1                     CPU 0
>  -----                     -----
>
>  init stuff                read cpu_online_map
>
>  write barrier
>
>  write cpu_online_map      do more initialization
>
>  write barrier
>
>  init more stuff
>
>
> I agree that it's wrong, because the second write barrier in
> start_secondary is useless and in addition we are missing a read barrier
> in __cpu_up, corresponding to the first write barrier in
> start_secondary.
>
> I think it should look like:
>
>  CPU 1                     CPU 0
>  -----                     -----
>
>  init stuff                read cpu_online_map
>
>  write barrier             read barrier
>
>  write cpu_online_map      do more initialization
>
>  init more stuff
>
>
> The patch is as follows.
>
> Julien, what do you think?
>
> Also, do we need to change the remaining smp_wmb() in start_secondary to
> wmb() to ensure execution ordering as well as memory access ordering?
>
> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
>
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 90ad1d0..c841a15 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -311,7 +311,6 @@ void start_secondary(unsigned long boot_phys_offset,
>
>      /* Now report this CPU is up */
>      cpumask_set_cpu(cpuid, &cpu_online_map);
> -    smp_wmb();
>
>      local_irq_enable();
>      local_abort_enable();
> @@ -408,6 +407,7 @@ int __cpu_up(unsigned int cpu)
>          cpu_relax();
>          process_pending_softirqs();
>      }
> +    smp_rmb();
>
>      /*
>       * Nuke start of day info before checking one last time if the CPU
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-07 1:03 UTC
To: Stefano Stabellini; +Cc: Julien Grall, Jan Beulich, Xen-devel

On 06/12/2016 20:32, Stefano Stabellini wrote:
> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>> On Tue, 6 Dec 2016, Andrew Cooper wrote:
>>> On 05/12/2016 19:17, Stefano Stabellini wrote:
>>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>>>>> None of these barriers serve any purpose, as they are not synchronising with
>>>>> anything on remote CPUs.
>>>>>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> ---
>>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>>>> CC: Julien Grall <julien.grall@arm.com>
>>>>>
>>>>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
>>>>>
>>>>> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
>>>>> from x86, but I don't know whether further development has gained a dependence
>>>>> on them.
>>>> We turned them into smp_wmb already (kudos to IanC).
>>> Right, but the entire point I am trying to argue is that they are not
>>> needed in the first place.
> Just to be clear, on ARM the barriers are unneeded only if it is
> unimportant that "init stuff" (which corresponds to all the
> initialization done in start_secondary up to smp_wmb) below is completed
> before "write cpu_online_map". But it looks like we do want to complete
> mmu, irq, timer initializations and set the current vcpu before marking
> the cpu as online, right?

No. I am sorry, but this question suggests that you still don't appear
to understand barriers.

>
>> This is the current code:
>>
>>  CPU 1                     CPU 0
>>  -----                     -----
>>
>>  init stuff                read cpu_online_map
>>
>>  write barrier
>>
>>  write cpu_online_map      do more initialization
>>
>>  write barrier
>>
>>  init more stuff
>>
>>
>> I agree that it's wrong, because the second write barrier in
>> start_secondary is useless and in addition we are missing a read barrier
>> in __cpu_up, corresponding to the first write barrier in
>> start_secondary.
>>
>> I think it should look like:
>>
>>  CPU 1                     CPU 0
>>  -----                     -----
>>
>>  init stuff                read cpu_online_map
>>
>>  write barrier             read barrier
>>
>>  write cpu_online_map      do more initialization
>>
>>  init more stuff

The barriers here serve no purpose, because you have missed an important
blocking while loop on CPU 0.

Recall that the read/write barrier example is:

    foo = a;
    smp_rmb();
    bar = b;

and

    a = baz;
    smp_wmb();
    b = fromble;

This is specifically relevant *only* to the shared variables a and b,
where for correctness an update to a must be observed remotely before
the update to b.

If you do not have explicitly the same a and b on either side of the
smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't
be using the barriers).


The init sequence is a different scenario.

Processor 0 spins waiting to observe an update to cpu_online_map.

Processor 1 performs its init sequence, and midway through, sets its
own bit in the cpu_online_map. It then continues further init actions.

It does not matter whether processor 0 observes the update to
cpu_online_map slightly early or late compared to the local-state
updates from the other parts of processor 1's init sequence (because
processor 0 had better not be looking at the local-state changes).

Once the bit is set in cpu_online_map, processor 1 is ready to be IPI'd
to give it work to do, etc. Even if processor 0 hasn't observed all of
the local-state updates of processor 1, once it has seen the
cpu_online_map update, it knows that processor 1 is available to be
IPI'd, and IPIing processor 1 will cause execution with expected program
order being observed (from the point of view of processor 1).

The only consideration is whether processor 1 can make an action which
will prioritise the update to cpu_online_map becoming visible to the
rest of the system. If there is such an action available, then an
argument can be made that making the update visible more quickly will
allow processor 0 to continue earlier.

However, such an action would be entirely local to processor 1 and not
related to anything happening on processor 0. I am not aware of any such
action on x86, and my gut feeling is that no other architecture would
have one either. The ability to fast forward one specific update to the
shared cache-coherency layer would only complicate an already
complicated area of silicon, and would only be useful to micro-optimise
a corner case; I can't see any system designers considering it a useful
feature to add.

~Andrew
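The handshake being described reduces to a sketch like this (hypothetical
names standing in for the smpboot code, with cpu_relax() as the only Xen
primitive used):

    /* Stands in for this CPU's bit in cpu_online_map. */
    static volatile bool ap_online;

    void ap_start_secondary(void)       /* processor 1 */
    {
        early_local_init();             /* local state only */
        ap_online = true;               /* publish: ready to take IPIs */
        late_local_init();              /* more local state */
        idle_loop();
    }

    void bsp_cpu_up(void)               /* processor 0 */
    {
        while ( !ap_online )
            cpu_relax();
        /*
         * Nothing here reads the AP's local init state, so there is no
         * second shared variable whose ordering against ap_online would
         * need a paired smp_wmb()/smp_rmb().
         */
    }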
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Stefano Stabellini @ 2016-12-07 1:20 UTC
To: Andrew Cooper; +Cc: Julien Grall, Stefano Stabellini, Jan Beulich, Xen-devel

On Wed, 7 Dec 2016, Andrew Cooper wrote:
> On 06/12/2016 20:32, Stefano Stabellini wrote:
> > On Tue, 6 Dec 2016, Stefano Stabellini wrote:
> >> On Tue, 6 Dec 2016, Andrew Cooper wrote:
> >>> On 05/12/2016 19:17, Stefano Stabellini wrote:
> >>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
> >>>>> None of these barriers serve any purpose, as they are not synchronising with
> >>>>> anything on remote CPUs.
> >>>>>
> >>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >>>>> ---
> >>>>> CC: Jan Beulich <JBeulich@suse.com>
> >>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
> >>>>> CC: Julien Grall <julien.grall@arm.com>
> >>>>>
> >>>>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
> >>>>>
> >>>>> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
> >>>>> from x86, but I don't know whether further development has gained a dependence
> >>>>> on them.
> >>>> We turned them into smp_wmb already (kudos to IanC).
> >>> Right, but the entire point I am trying to argue is that they are not
> >>> needed in the first place.
> > Just to be clear, on ARM the barriers are unneeded only if it is
> > unimportant that "init stuff" (which corresponds to all the
> > initialization done in start_secondary up to smp_wmb) below is completed
> > before "write cpu_online_map". But it looks like we do want to complete
> > mmu, irq, timer initializations and set the current vcpu before marking
> > the cpu as online, right?
>
> No. I am sorry, but this question suggests that you still don't appear
> to understand barriers.
>
> >
> >> This is the current code:
> >>
> >>  CPU 1                     CPU 0
> >>  -----                     -----
> >>
> >>  init stuff                read cpu_online_map
> >>
> >>  write barrier
> >>
> >>  write cpu_online_map      do more initialization
> >>
> >>  write barrier
> >>
> >>  init more stuff
> >>
> >>
> >> I agree that it's wrong, because the second write barrier in
> >> start_secondary is useless and in addition we are missing a read barrier
> >> in __cpu_up, corresponding to the first write barrier in
> >> start_secondary.
> >>
> >> I think it should look like:
> >>
> >>  CPU 1                     CPU 0
> >>  -----                     -----
> >>
> >>  init stuff                read cpu_online_map
> >>
> >>  write barrier             read barrier
> >>
> >>  write cpu_online_map      do more initialization
> >>
> >>  init more stuff
>
> The barriers here serve no purpose, because you have missed an important
> blocking while loop on CPU 0.
>
> Recall that the read/write barrier example is:
>
>     foo = a;
>     smp_rmb();
>     bar = b;
>
> and
>
>     a = baz;
>     smp_wmb();
>     b = fromble;
>
> This is specifically relevant *only* to the shared variables a and b,
> where for correctness an update to a must be observed remotely before
> the update to b.
>
> If you do not have explicitly the same a and b on either side of the
> smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't
> be using the barriers).
>
>
> The init sequence is a different scenario.
>
> Processor 0 spins waiting to observe an update to cpu_online_map.
>
> Processor 1 performs its init sequence, and midway through, sets its
> own bit in the cpu_online_map. It then continues further init actions.
>
> It does not matter whether processor 0 observes the update to
> cpu_online_map slightly early or late compared to the local-state
> updates from the other parts of processor 1's init sequence (because
> processor 0 had better not be looking at the local-state changes).

In that case of course there is no need for barriers (I wrote something
similar in the other follow-up email). The case I was worried about is
exactly the one where processor 0 looks at one of the changes made by
processor 1.
* Re: [PATCH 2/4] xen/x86: Drop erroneous barriers
From: Andrew Cooper @ 2016-12-07 1:46 UTC
To: Stefano Stabellini; +Cc: Julien Grall, Jan Beulich, Xen-devel

On 07/12/2016 01:20, Stefano Stabellini wrote:
> On Wed, 7 Dec 2016, Andrew Cooper wrote:
>> On 06/12/2016 20:32, Stefano Stabellini wrote:
>>> On Tue, 6 Dec 2016, Stefano Stabellini wrote:
>>>> On Tue, 6 Dec 2016, Andrew Cooper wrote:
>>>>> On 05/12/2016 19:17, Stefano Stabellini wrote:
>>>>>> On Mon, 5 Dec 2016, Andrew Cooper wrote:
>>>>>>> None of these barriers serve any purpose, as they are not synchronising with
>>>>>>> anything on remote CPUs.
>>>>>>>
>>>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>> ---
>>>>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>>>>> CC: Stefano Stabellini <sstabellini@kernel.org>
>>>>>>> CC: Julien Grall <julien.grall@arm.com>
>>>>>>>
>>>>>>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep.
>>>>>>>
>>>>>>> Julien/Stefano: I think the ARM smpboot inherited the erroneous barrier usage
>>>>>>> from x86, but I don't know whether further development has gained a dependence
>>>>>>> on them.
>>>>>> We turned them into smp_wmb already (kudos to IanC).
>>>>> Right, but the entire point I am trying to argue is that they are not
>>>>> needed in the first place.
>>> Just to be clear, on ARM the barriers are unneeded only if it is
>>> unimportant that "init stuff" (which corresponds to all the
>>> initialization done in start_secondary up to smp_wmb) below is completed
>>> before "write cpu_online_map". But it looks like we do want to complete
>>> mmu, irq, timer initializations and set the current vcpu before marking
>>> the cpu as online, right?
>> No. I am sorry, but this question suggests that you still don't appear
>> to understand barriers.
>>
>>>
>>>> This is the current code:
>>>>
>>>>  CPU 1                     CPU 0
>>>>  -----                     -----
>>>>
>>>>  init stuff                read cpu_online_map
>>>>
>>>>  write barrier
>>>>
>>>>  write cpu_online_map      do more initialization
>>>>
>>>>  write barrier
>>>>
>>>>  init more stuff
>>>>
>>>>
>>>> I agree that it's wrong, because the second write barrier in
>>>> start_secondary is useless and in addition we are missing a read barrier
>>>> in __cpu_up, corresponding to the first write barrier in
>>>> start_secondary.
>>>>
>>>> I think it should look like:
>>>>
>>>>  CPU 1                     CPU 0
>>>>  -----                     -----
>>>>
>>>>  init stuff                read cpu_online_map
>>>>
>>>>  write barrier             read barrier
>>>>
>>>>  write cpu_online_map      do more initialization
>>>>
>>>>  init more stuff
>> The barriers here serve no purpose, because you have missed an important
>> blocking while loop on CPU 0.
>>
>> Recall that the read/write barrier example is:
>>
>>     foo = a;
>>     smp_rmb();
>>     bar = b;
>>
>> and
>>
>>     a = baz;
>>     smp_wmb();
>>     b = fromble;
>>
>> This is specifically relevant *only* to the shared variables a and b,
>> where for correctness an update to a must be observed remotely before
>> the update to b.
>>
>> If you do not have explicitly the same a and b on either side of the
>> smp_rmb/smp_wmb, then your code is wrong (and most likely, you shouldn't
>> be using the barriers).
>>
>>
>> The init sequence is a different scenario.
>>
>> Processor 0 spins waiting to observe an update to cpu_online_map.
>>
>> Processor 1 performs its init sequence, and midway through, sets its
>> own bit in the cpu_online_map. It then continues further init actions.
>>
>> It does not matter whether processor 0 observes the update to
>> cpu_online_map slightly early or late compared to the local-state
>> updates from the other parts of processor 1's init sequence (because
>> processor 0 had better not be looking at the local-state changes).
> In that case of course there is no need for barriers (I wrote something
> similar in the other follow-up email). The case I was worried about is
> exactly the one where processor 0 looks at one of the changes made by
> processor 1.

Looking at an isolated change doesn't involve any ordering. Ordering
(and therefore barriers) are only relevant when looking at exactly two
related changes which need observing in a particular order.

If you can reduce the BSP and AP boot sequence down to the two 3-line
examples above (with specifically identified variables/aggregates for a
and b on both sides), then you can make an argument for barriers being
necessary.

I should point out that I don't know whether the barriers are necessary
or unnecessary on ARM. All I am trying to do is to ensure that everyone
has the same understanding of how barriers work before conclusions are
drawn (because they really are tricky).

~Andrew
* Re: [PATCH 2/4] xen/x86: Drop erronious barriers 2016-12-06 20:32 ` Stefano Stabellini 2016-12-07 1:03 ` Andrew Cooper @ 2016-12-07 18:31 ` Julien Grall 2016-12-07 18:44 ` Stefano Stabellini 1 sibling, 1 reply; 36+ messages in thread From: Julien Grall @ 2016-12-07 18:31 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Andrew Cooper, Steve Capper, Jan Beulich, Xen-devel Hi Stefano, On 06/12/2016 20:32, Stefano Stabellini wrote: > On Tue, 6 Dec 2016, Stefano Stabellini wrote: >> On Tue, 6 Dec 2016, Andrew Cooper wrote: >>> On 05/12/2016 19:17, Stefano Stabellini wrote: >>>> On Mon, 5 Dec 2016, Andrew Cooper wrote: >>>>> None of these barriers serve any purpose, as they are not synchronising with >>>>> any anything on remote CPUs. >>>>> >>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >>>>> --- >>>>> CC: Jan Beulich <JBeulich@suse.com> >>>>> CC: Stefano Stabellini <sstabellini@kernel.org> >>>>> CC: Julien Grall <julien.grall@arm.com> >>>>> >>>>> Restricting to just the $ARCH maintainers, as this is a project-wide sweep. >>>>> >>>>> Julien/Stefano: I think the ARM smpboot inhereted the erronious barrier usage >>>>> from x86, but I don't know whether further development has gained a dependence >>>>> on them. >>>> We turned them into smp_wmb already (kudos to IanC). >>> >>> Right, but the entire point I am trying to argue is that they are not >>> needed in the first place. > > Just to be clear, on ARM the barriers are unneeded only if it is > unimportant that "init stuff" (which correspond to all the > initialization done in start_secondary up to smp_wmb) below is completed > before "write cpu_online_map". But it looks like we do want to complete > mmu, irq, timer initializations and set the current vcpu before marking > the cpu as online, right? *mb are memory barriers. This would not prevent write to system registers and co-processor registers happening before "write cpu_online_map". Only an dsb(sy); isb() would ensure this. However, I don't think we really care about the state of the hardware for a specific CPU. This should never be accessed by another one. > >> This is the current code: >> >> CPU 1 CPU 0 >> ----- ----- >> >> init stuff read cpu_online_map >> >> write barrier >> >> write cpu_online_map do more initialization >> >> write barrier >> >> init more stuff >> >> >> I agree that it's wrong, because the second write barrier in >> start_secondary is useless and in addition we are missing a read barrier >> in __cpu_up, corresponding to the first write barrier in >> start_secondary. >> >> I think it should look like: >> >> >> CPU 1 CPU 0 >> ----- ----- >> >> init stuff read cpu_online_map >> >> write barrier read barrier >> >> write cpu_online_map do more initialization >> >> init more stuff >> >> >> The patch is as follow. >> >> Julien, what do you think? >> >> Also, do we need to change the remaming smp_wmb() in start_secondary to >> wmb() to ensure execution ordering as well as memory access ordering? I don't think so. If synchronization of hardware access was necessary it would have been taken care by the driver itself. What we should care here is if there any xen internal state that are required between CPU0 and CPU1. In this specific case, I think we should have the smp_wmb() barrier to ensure all write happening whilst calling local notifiers will be visible before the CPU mark itself as online. So IHMO, the patch looks good to me (see a small comment below). 
>> >> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> >> >> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c >> index 90ad1d0..c841a15 100644 >> --- a/xen/arch/arm/smpboot.c >> +++ b/xen/arch/arm/smpboot.c >> @@ -311,7 +311,6 @@ void start_secondary(unsigned long boot_phys_offset, >> >> /* Now report this CPU is up */ >> cpumask_set_cpu(cpuid, &cpu_online_map); >> - smp_wmb(); >> >> local_irq_enable(); >> local_abort_enable(); >> @@ -408,6 +407,7 @@ int __cpu_up(unsigned int cpu) >> cpu_relax(); >> process_pending_softirqs(); >> } >> + smp_rmb(); It would be good to start annotating the pair of barrier with "This barrier corresponds with the one in...". It would avoid "wild" barrier in the code :). >> >> /* >> * Nuke start of day info before checking one last time if the CPU >> Regards, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
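Julien's suggested annotation, applied to the hunks above, might read as follows (a sketch against the patched code; the loop condition in __cpu_up() and the comment wording are illustrative):

    /* start_secondary(): */
        /*
         * Publish this CPU's earlier initialisation before marking it
         * online.  Pairs with the smp_rmb() in __cpu_up().
         */
        smp_wmb();

        /* Now report this CPU is up */
        cpumask_set_cpu(cpuid, &cpu_online_map);

    /* __cpu_up(): */
        while ( !cpu_online(cpu) )
        {
            cpu_relax();
            process_pending_softirqs();
        }
        /* Pairs with the smp_wmb() in start_secondary(). */
        smp_rmb();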
* Re: [PATCH 2/4] xen/x86: Drop erronious barriers 2016-12-07 18:31 ` Julien Grall @ 2016-12-07 18:44 ` Stefano Stabellini 2016-12-07 18:55 ` Julien Grall 0 siblings, 1 reply; 36+ messages in thread From: Stefano Stabellini @ 2016-12-07 18:44 UTC (permalink / raw) To: Julien Grall Cc: Andrew Cooper, Stefano Stabellini, Steve Capper, Jan Beulich, Xen-devel On Wed, 7 Dec 2016, Julien Grall wrote: > Hi Stefano, > > On 06/12/2016 20:32, Stefano Stabellini wrote: > > On Tue, 6 Dec 2016, Stefano Stabellini wrote: > > > On Tue, 6 Dec 2016, Andrew Cooper wrote: > > > > On 05/12/2016 19:17, Stefano Stabellini wrote: > > > > > On Mon, 5 Dec 2016, Andrew Cooper wrote: > > > > > > None of these barriers serve any purpose, as they are not > > > > > > synchronising with > > > > > > any anything on remote CPUs. > > > > > > > > > > > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > > > > > > --- > > > > > > CC: Jan Beulich <JBeulich@suse.com> > > > > > > CC: Stefano Stabellini <sstabellini@kernel.org> > > > > > > CC: Julien Grall <julien.grall@arm.com> > > > > > > > > > > > > Restricting to just the $ARCH maintainers, as this is a project-wide > > > > > > sweep. > > > > > > > > > > > > Julien/Stefano: I think the ARM smpboot inhereted the erronious > > > > > > barrier usage > > > > > > from x86, but I don't know whether further development has gained a > > > > > > dependence > > > > > > on them. > > > > > We turned them into smp_wmb already (kudos to IanC). > > > > > > > > Right, but the entire point I am trying to argue is that they are not > > > > needed in the first place. > > > > Just to be clear, on ARM the barriers are unneeded only if it is > > unimportant that "init stuff" (which correspond to all the > > initialization done in start_secondary up to smp_wmb) below is completed > > before "write cpu_online_map". But it looks like we do want to complete > > mmu, irq, timer initializations and set the current vcpu before marking > > the cpu as online, right? > > *mb are memory barriers. This would not prevent write to system registers and > co-processor registers happening before "write cpu_online_map". Only an > dsb(sy); isb() would ensure this. > > However, I don't think we really care about the state of the hardware for a > specific CPU. This should never be accessed by another one. > > > > > > This is the current code: > > > > > > CPU 1 CPU 0 > > > ----- ----- > > > > > > init stuff read cpu_online_map > > > > > > write barrier > > > > > > write cpu_online_map do more initialization > > > > > > write barrier > > > > > > init more stuff > > > > > > > > > I agree that it's wrong, because the second write barrier in > > > start_secondary is useless and in addition we are missing a read barrier > > > in __cpu_up, corresponding to the first write barrier in > > > start_secondary. > > > > > > I think it should look like: > > > > > > > > > CPU 1 CPU 0 > > > ----- ----- > > > > > > init stuff read cpu_online_map > > > > > > write barrier read barrier > > > > > > write cpu_online_map do more initialization > > > > > > init more stuff > > > > > > > > > The patch is as follow. > > > > > > Julien, what do you think? > > > > > > Also, do we need to change the remaming smp_wmb() in start_secondary to > > > wmb() to ensure execution ordering as well as memory access ordering? > > I don't think so. If synchronization of hardware access was necessary it would > have been taken care by the driver itself. I thought the same, thanks for confirming. 
> What we should care here is if there any xen internal state that are required > between CPU0 and CPU1. > > In this specific case, I think we should have the smp_wmb() barrier to ensure > all write happening whilst calling local notifiers will be visible before the > CPU mark itself as online. So IHMO, the patch looks good to me (see a small > comment below). Andrew thinks that (on x86) there is actually nothing that CPU0 will be looking at, that has been set by CPU1. However looking at the code it is difficult to verify. There are many cpu notifiers and many things written by start_secondary. I would prefer to submit this patch, and be safe. > > > > > > Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> > > > > > > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c > > > index 90ad1d0..c841a15 100644 > > > --- a/xen/arch/arm/smpboot.c > > > +++ b/xen/arch/arm/smpboot.c > > > @@ -311,7 +311,6 @@ void start_secondary(unsigned long boot_phys_offset, > > > > > > /* Now report this CPU is up */ > > > cpumask_set_cpu(cpuid, &cpu_online_map); > > > - smp_wmb(); > > > > > > local_irq_enable(); > > > local_abort_enable(); > > > @@ -408,6 +407,7 @@ int __cpu_up(unsigned int cpu) > > > cpu_relax(); > > > process_pending_softirqs(); > > > } > > > + smp_rmb(); > > It would be good to start annotating the pair of barrier with "This barrier > corresponds with the one in...". It would avoid "wild" barrier in the code :). > > > > > > > /* > > > * Nuke start of day info before checking one last time if the CPU > > > > > Regards, > > -- > Julien Grall > _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/4] xen/x86: Drop erronious barriers 2016-12-07 18:44 ` Stefano Stabellini @ 2016-12-07 18:55 ` Julien Grall 0 siblings, 0 replies; 36+ messages in thread From: Julien Grall @ 2016-12-07 18:55 UTC (permalink / raw) To: Stefano Stabellini; +Cc: Andrew Cooper, Steve Capper, Jan Beulich, Xen-devel On 07/12/2016 18:44, Stefano Stabellini wrote: > On Wed, 7 Dec 2016, Julien Grall wrote: > Andrew thinks that (on x86) there is actually nothing that CPU0 will be > looking at, that has been set by CPU1. However looking at the code it is > difficult to verify. There are many cpu notifiers and many things > written by start_secondary. I would prefer to submit this patch, and be > safe. I agree on this. Better be safe than hunting a bug later on. Although, I think I just found an example for ARM. The gic_cpu_id (see gic-v2.c) is stored per-cpu and initialized by each CPU at boot. gic_cpu_id is commonly used to send a SGI to a specific target. So we need to ensure that CPU0 will see this value before sending an SGI (see gicv2_send_SGI). Otherwise the SGI may go to the wild. While you are sending a patch, can you document in the code why the barrier is present? Cheers, -- Julien Grall _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
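Spelled out, the gic_cpu_id case Julien describes is a concrete instance of the pairing. A rough sketch, simplified from gic-v2.c and smpboot.c rather than quoted from them; read_gic_target_mask() is an illustrative stand-in for the actual distributor read:

    static DEFINE_PER_CPU(unsigned int, gic_cpu_id);

    /* Secondary CPU, during boot (via the GIC's CPU notifier): */
    this_cpu(gic_cpu_id) = read_gic_target_mask();  /* illustrative helper */
    smp_wmb();                     /* publish init before going online */
    cpumask_set_cpu(cpuid, &cpu_online_map);

    /* Sending CPU, in gicv2_send_SGI(), once it has seen the CPU in
     * cpu_online_map (with an smp_rmb() after that read): */
    target = per_cpu(gic_cpu_id, cpu);  /* must observe the store above */

Without the pairing, the load of gic_cpu_id could observe the pre-initialisation value even though the CPU is already visible in cpu_online_map, and the SGI would be misdirected.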
* [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 10:05 [PATCH 0/4] Fixes to common and x86 barriers Andrew Cooper 2016-12-05 10:05 ` [PATCH 1/4] xen/common: Replace incorrect mandatory barriers with SMP barriers Andrew Cooper 2016-12-05 10:05 ` [PATCH 2/4] xen/x86: Drop erronious barriers Andrew Cooper @ 2016-12-05 10:05 ` Andrew Cooper 2016-12-05 11:47 ` Jan Beulich 2016-12-05 10:05 ` [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper 3 siblings, 1 reply; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 10:05 UTC (permalink / raw) To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich Mandatory barriers are only for use with reduced-cacheability memory mappings. All of these uses are just to deal with shared memory between multiple processors, so use the smp_*() which are the correct barriers for the purpose. This is no functional change, because Xen currently assigns the smp_* meaning to non-smp_* barriers. This will be fixed in the following patch. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> Restricting to just the $ARCH maintainers, as this is a project-wide sweep --- xen/arch/x86/acpi/cpu_idle.c | 8 ++++---- xen/arch/x86/cpu/mcheck/amd_nonfatal.c | 4 ++-- xen/arch/x86/cpu/mcheck/barrier.c | 10 +++++----- xen/arch/x86/cpu/mcheck/mce.c | 2 +- xen/arch/x86/cpu/mcheck/mctelem.c | 10 +++++----- xen/arch/x86/genapic/x2apic.c | 6 +++--- xen/arch/x86/hpet.c | 6 +++--- xen/arch/x86/hvm/ioreq.c | 4 ++-- xen/arch/x86/io_apic.c | 4 ++-- xen/arch/x86/irq.c | 4 ++-- xen/arch/x86/mm.c | 10 +++++----- xen/arch/x86/mm/shadow/multi.c | 2 +- xen/arch/x86/smpboot.c | 12 ++++++------ xen/arch/x86/time.c | 8 ++++---- xen/drivers/passthrough/amd/iommu_init.c | 4 ++-- xen/include/asm-x86/desc.h | 8 ++++---- xen/include/asm-x86/system.h | 2 +- 17 files changed, 52 insertions(+), 52 deletions(-) diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c index 09c8848..b481059 100644 --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) { - mb(); + smp_mb(); clflush((void *)&mwait_wakeup(cpu)); - mb(); + smp_mb(); } __monitor((void *)&mwait_wakeup(cpu), 0, 0); @@ -756,10 +756,10 @@ void acpi_dead_idle(void) * instruction, hence memory fence is necessary to make sure all * load/store visible before flush cache line. 
*/ - mb(); + smp_mb(); clflush(mwait_ptr); __monitor(mwait_ptr, 0, 0); - mb(); + smp_mb(); __mwait(cx->address, 0); } } diff --git a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c index 8a80a9f..fb1d41d 100644 --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) /* Counter enable */ value |= (1ULL << 51); mca_wrmsr(MSR_IA32_MCx_MISC(4), value); - wmb(); + smp_wmb(); } } @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) value |= (1ULL << 51); wrmsrl(MSR_IA32_MCx_MISC(4), value); /* serialize */ - wmb(); + smp_wmb(); printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); } } diff --git a/xen/arch/x86/cpu/mcheck/barrier.c b/xen/arch/x86/cpu/mcheck/barrier.c index 5dce1fb..1a2149f 100644 --- a/xen/arch/x86/cpu/mcheck/barrier.c +++ b/xen/arch/x86/cpu/mcheck/barrier.c @@ -12,7 +12,7 @@ void mce_barrier_init(struct mce_softirq_barrier *bar) void mce_barrier_dec(struct mce_softirq_barrier *bar) { atomic_inc(&bar->outgen); - wmb(); + smp_wmb(); atomic_dec(&bar->val); } @@ -24,12 +24,12 @@ void mce_barrier_enter(struct mce_softirq_barrier *bar) return; atomic_inc(&bar->ingen); gen = atomic_read(&bar->outgen); - mb(); + smp_mb(); atomic_inc(&bar->val); while ( atomic_read(&bar->val) != num_online_cpus() && atomic_read(&bar->outgen) == gen ) { - mb(); + smp_mb(); mce_panic_check(); } } @@ -42,12 +42,12 @@ void mce_barrier_exit(struct mce_softirq_barrier *bar) return; atomic_inc(&bar->outgen); gen = atomic_read(&bar->ingen); - mb(); + smp_mb(); atomic_dec(&bar->val); while ( atomic_read(&bar->val) != 0 && atomic_read(&bar->ingen) == gen ) { - mb(); + smp_mb(); mce_panic_check(); } } diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c index 460e336..02883fc 100644 --- a/xen/arch/x86/cpu/mcheck/mce.c +++ b/xen/arch/x86/cpu/mcheck/mce.c @@ -388,7 +388,7 @@ mcheck_mca_logout(enum mca_source who, struct mca_banks *bankmask, else if ( who == MCA_MCE_SCAN && need_clear) mcabanks_set(i, clear_bank); - wmb(); + smp_wmb(); } if (mig && errcnt > 0) { diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c index 95e83c5..01077fe 100644 --- a/xen/arch/x86/cpu/mcheck/mctelem.c +++ b/xen/arch/x86/cpu/mcheck/mctelem.c @@ -414,9 +414,9 @@ static void mctelem_append_processing(mctelem_class_t which) ltep->mcte_prev = *procltp; *procltp = dangling[target]; } - wmb(); + smp_wmb(); dangling[target] = NULL; - wmb(); + smp_wmb(); } mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) } mctelem_processing_hold(tep); - wmb(); + smp_wmb(); spin_unlock(&processing_lock); return MCTE2COOKIE(tep); } @@ -444,7 +444,7 @@ void mctelem_consume_oldest_end(mctelem_cookie_t cookie) spin_lock(&processing_lock); mctelem_processing_release(tep); - wmb(); + smp_wmb(); spin_unlock(&processing_lock); } @@ -460,6 +460,6 @@ void mctelem_ack(mctelem_class_t which, mctelem_cookie_t cookie) spin_lock(&processing_lock); if (tep == mctctl.mctc_processing_head[target]) mctelem_processing_release(tep); - wmb(); + smp_wmb(); spin_unlock(&processing_lock); } diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c index d894a98..c63af0c 100644 --- a/xen/arch/x86/genapic/x2apic.c +++ b/xen/arch/x86/genapic/x2apic.c @@ -107,12 +107,12 @@ static void send_IPI_mask_x2apic_phys(const cpumask_t *cpumask, int vector) * 
CPU is seen by notified remote CPUs. The WRMSR contained within * apic_icr_write() can otherwise be executed early. * - * The reason mb() is sufficient here is subtle: the register arguments + * The reason smp_mb() is sufficient here is subtle: the register arguments * to WRMSR must depend on a memory read executed after the barrier. This * is guaranteed by cpu_physical_id(), which reads from a global array (and * so cannot be hoisted above the barrier even by a clever compiler). */ - mb(); + smp_mb(); local_irq_save(flags); @@ -136,7 +136,7 @@ static void send_IPI_mask_x2apic_cluster(const cpumask_t *cpumask, int vector) const cpumask_t *cluster_cpus; unsigned long flags; - mb(); /* See above for an explanation. */ + smp_mb(); /* See above for an explanation. */ local_irq_save(flags); diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index f78054d..c5ef534 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -190,9 +190,9 @@ static void handle_hpet_broadcast(struct hpet_event_channel *ch) { s_time_t deadline; - rmb(); + smp_rmb(); deadline = per_cpu(timer_deadline, cpu); - rmb(); + smp_rmb(); if ( !cpumask_test_cpu(cpu, ch->cpumask) ) continue; @@ -610,7 +610,7 @@ void __init hpet_broadcast_init(void) hpet_events[i].shift = 32; hpet_events[i].next_event = STIME_MAX; spin_lock_init(&hpet_events[i].lock); - wmb(); + smp_wmb(); hpet_events[i].event_handler = handle_hpet_broadcast; hpet_events[i].msi.msi_attrib.maskbit = 1; diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c index 88071ab..6c00c0b 100644 --- a/xen/arch/x86/hvm/ioreq.c +++ b/xen/arch/x86/hvm/ioreq.c @@ -92,7 +92,7 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p) { unsigned int state = p->state; - rmb(); + smp_rmb(); switch ( state ) { case STATE_IOREQ_NONE: @@ -1272,7 +1272,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p) } /* Make the ioreq_t visible /before/ write_pointer. */ - wmb(); + smp_wmb(); pg->ptrs.write_pointer += qw ? 2 : 1; /* Canonicalize read/write pointers to prevent their overflow. */ diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c index 33e5927..185b956 100644 --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void) unsigned long t1, flags; t1 = pit0_ticks; - mb(); + smp_mb(); local_save_flags(flags); local_irq_enable(); @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void) * might have cached one ExtINT interrupt. Finally, at * least one tick may be lost due to delays. */ - mb(); + smp_mb(); if (pit0_ticks - t1 > 4) return 1; diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 8c1545a..de72b0d 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -757,9 +757,9 @@ void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask) ASSERT(spin_is_locked(&desc->lock)); desc->status &= ~IRQ_MOVE_PENDING; - wmb(); + smp_wmb(); cpumask_copy(desc->arch.pending_mask, mask); - wmb(); + smp_wmb(); desc->status |= IRQ_MOVE_PENDING; } diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 14552a1..a2672b1 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -460,7 +460,7 @@ void share_xen_page_with_guest( page->u.inuse.type_info |= PGT_validated | 1; page_set_owner(page, d); - wmb(); /* install valid domain ptr before updating refcnt. */ + smp_wmb(); /* install valid domain ptr before updating refcnt. */ ASSERT((page->count_info & ~PGC_xen_heap) == 0); /* Only add to the allocation list if the domain isn't dying. 
*/ @@ -2281,7 +2281,7 @@ static int alloc_page_type(struct page_info *page, unsigned long type, } /* No need for atomic update of type_info here: noone else updates it. */ - wmb(); + smp_wmb(); switch ( rc ) { case 0: @@ -2388,7 +2388,7 @@ static int __put_final_page_type( if ( !(shadow_mode_enabled(page_get_owner(page)) && (page->count_info & PGC_page_table)) ) page->tlbflush_timestamp = tlbflush_current_time(); - wmb(); + smp_wmb(); page->u.inuse.type_info--; } else if ( rc == -EINTR ) @@ -2398,13 +2398,13 @@ static int __put_final_page_type( if ( !(shadow_mode_enabled(page_get_owner(page)) && (page->count_info & PGC_page_table)) ) page->tlbflush_timestamp = tlbflush_current_time(); - wmb(); + smp_wmb(); page->u.inuse.type_info |= PGT_validated; } else { BUG_ON(rc != -ERESTART); - wmb(); + smp_wmb(); get_page_light(page); page->u.inuse.type_info |= PGT_partial; } diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 2696396..7268300 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -3021,7 +3021,7 @@ static int sh_page_fault(struct vcpu *v, * will make sure no inconsistent mapping being translated into * shadow page table. */ version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version); - rmb(); + smp_rmb(); rc = sh_walk_guest_tables(v, va, &gw, regs->error_code); #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC) diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c index 0aa377a..ba0fffe 100644 --- a/xen/arch/x86/smpboot.c +++ b/xen/arch/x86/smpboot.c @@ -77,7 +77,7 @@ static enum cpu_state { CPU_STATE_CALLIN, /* slave -> master: Completed phase 2 */ CPU_STATE_ONLINE /* master -> slave: Go fully online now. */ } cpu_state; -#define set_cpu_state(state) do { mb(); cpu_state = (state); } while (0) +#define set_cpu_state(state) do { smp_mb(); cpu_state = (state); } while (0) void *stack_base[NR_CPUS]; @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) for ( i = 1; i <= 5; i++ ) { tsc_value = rdtsc_ordered(); - wmb(); + smp_wmb(); atomic_inc(&tsc_count); while ( atomic_read(&tsc_count) != (i<<1) ) cpu_relax(); @@ -149,7 +149,7 @@ static void synchronize_tsc_slave(unsigned int slave) { while ( atomic_read(&tsc_count) != ((i<<1)-1) ) cpu_relax(); - rmb(); + smp_rmb(); /* * If a CPU has been physically hotplugged, we may as well write * to its TSC in spite of X86_FEATURE_TSC_RELIABLE. The platform does @@ -552,13 +552,13 @@ static int do_boot_cpu(int apicid, int cpu) } else if ( cpu_state == CPU_STATE_DEAD ) { - rmb(); + smp_rmb(); rc = cpu_error; } else { boot_error = 1; - mb(); + smp_mb(); if ( bootsym(trampoline_cpu_started) == 0xA5 ) /* trampoline started but...? */ printk("Stuck ??\n"); @@ -576,7 +576,7 @@ static int do_boot_cpu(int apicid, int cpu) /* mark "stuck" area as not stuck */ bootsym(trampoline_cpu_started) = 0; - mb(); + smp_mb(); smpboot_restore_warm_reset_vector(); diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c index dda89d8..0039e23 100644 --- a/xen/arch/x86/time.c +++ b/xen/arch/x86/time.c @@ -977,10 +977,10 @@ static void __update_vcpu_system_time(struct vcpu *v, int force) /* 1. Update guest kernel version. */ _u.version = u->version = version_update_begin(u->version); - wmb(); + smp_wmb(); /* 2. Update all other guest kernel fields. */ *u = _u; - wmb(); + smp_wmb(); /* 3. Update guest kernel version. 
*/ u->version = version_update_end(u->version); @@ -1006,10 +1006,10 @@ bool_t update_secondary_system_time(struct vcpu *v, smap_policy_change(v, saved_policy); return 0; } - wmb(); + smp_wmb(); /* 2. Update all other userspace fields. */ __copy_to_guest(user_u, u, 1); - wmb(); + smp_wmb(); /* 3. Update userspace version. */ u->version = version_update_end(u->version); __copy_field_to_guest(user_u, u, version); diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c index ea9f7e7..4fcf6fa 100644 --- a/xen/drivers/passthrough/amd/iommu_init.c +++ b/xen/drivers/passthrough/amd/iommu_init.c @@ -559,7 +559,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) return; } udelay(1); - rmb(); + smp_rmb(); code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK, IOMMU_EVENT_CODE_SHIFT); } @@ -664,7 +664,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[]) return; } udelay(1); - rmb(); + smp_rmb(); code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK, IOMMU_PPR_LOG_CODE_SHIFT); } diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h index da924bf..9956aae 100644 --- a/xen/include/asm-x86/desc.h +++ b/xen/include/asm-x86/desc.h @@ -128,10 +128,10 @@ static inline void _write_gate_lower(volatile idt_entry_t *gate, #define _set_gate(gate_addr,type,dpl,addr) \ do { \ (gate_addr)->a = 0; \ - wmb(); /* disable gate /then/ rewrite */ \ + smp_wmb(); /* disable gate /then/ rewrite */ \ (gate_addr)->b = \ ((unsigned long)(addr) >> 32); \ - wmb(); /* rewrite /then/ enable gate */ \ + smp_wmb(); /* rewrite /then/ enable gate */ \ (gate_addr)->a = \ (((unsigned long)(addr) & 0xFFFF0000UL) << 32) | \ ((unsigned long)(dpl) << 45) | \ @@ -174,11 +174,11 @@ static inline void _update_gate_addr_lower(idt_entry_t *gate, void *addr) #define _set_tssldt_desc(desc,addr,limit,type) \ do { \ (desc)[0].b = (desc)[1].b = 0; \ - wmb(); /* disable entry /then/ rewrite */ \ + smp_wmb(); /* disable entry /then/ rewrite */ \ (desc)[0].a = \ ((u32)(addr) << 16) | ((u32)(limit) & 0xFFFF); \ (desc)[1].a = (u32)(((unsigned long)(addr)) >> 32); \ - wmb(); /* rewrite /then/ enable entry */ \ + smp_wmb(); /* rewrite /then/ enable entry */ \ (desc)[0].b = \ ((u32)(addr) & 0xFF000000U) | \ ((u32)(type) << 8) | 0x8000U | \ diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h index eb498f5..9cb6fd7 100644 --- a/xen/include/asm-x86/system.h +++ b/xen/include/asm-x86/system.h @@ -183,7 +183,7 @@ static always_inline unsigned long __xadd( #define smp_wmb() wmb() #define set_mb(var, value) do { xchg(&var, value); } while (0) -#define set_wmb(var, value) do { var = value; wmb(); } while (0) +#define set_wmb(var, value) do { var = value; smp_wmb(); } while (0) #define local_irq_disable() asm volatile ( "cli" : : : "memory" ) #define local_irq_enable() asm volatile ( "sti" : : : "memory" ) -- 2.1.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 36+ messages in thread
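The __update_vcpu_system_time() and update_secondary_system_time() hunks above follow the usual version-counter protocol. The matching reader runs in the guest rather than in Xen, and pairs with the two smp_wmb()s along these lines (a sketch, assuming the common convention that an odd version means an update is in flight):

    uint32_t ver;
    struct vcpu_time_info t;

    do {
        ver = u->version;   /* 1: snapshot the version */
        smp_rmb();          /* pairs with the writer's first smp_wmb() */
        t = *u;             /* 2: copy the payload */
        smp_rmb();          /* pairs with the writer's second smp_wmb() */
    } while ( (ver & 1) || u->version != ver );  /* odd => update in progress */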
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 10:05 ` [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers Andrew Cooper @ 2016-12-05 11:47 ` Jan Beulich 2016-12-05 13:29 ` Andrew Cooper 0 siblings, 1 reply; 36+ messages in thread From: Jan Beulich @ 2016-12-05 11:47 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel >>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: > --- a/xen/arch/x86/acpi/cpu_idle.c > +++ b/xen/arch/x86/acpi/cpu_idle.c > @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) > > if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) > { > - mb(); > + smp_mb(); > clflush((void *)&mwait_wakeup(cpu)); > - mb(); > + smp_mb(); > } Both need to stay as they are afaict: In order for the clflush() to do what we want we have to order it wrt earlier as well as later writes, regardless of SMP-ness. Or wait - the SDM has changed in that respect (and a footnote describes the earlier specified behavior now). AMD, otoh, continues to require MFENCE for ordering purposes. > --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c > +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c > @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) > /* Counter enable */ > value |= (1ULL << 51); > mca_wrmsr(MSR_IA32_MCx_MISC(4), value); > - wmb(); > + smp_wmb(); > } > } > > @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) > value |= (1ULL << 51); > wrmsrl(MSR_IA32_MCx_MISC(4), value); > /* serialize */ > - wmb(); > + smp_wmb(); > printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); > } > } These will need confirming by AMD engineers. > --- a/xen/arch/x86/cpu/mcheck/barrier.c > +++ b/xen/arch/x86/cpu/mcheck/barrier.c > @@ -12,7 +12,7 @@ void mce_barrier_init(struct mce_softirq_barrier *bar) > void mce_barrier_dec(struct mce_softirq_barrier *bar) > { > atomic_inc(&bar->outgen); > - wmb(); > + smp_wmb(); > atomic_dec(&bar->val); > } This being x86-specific code - do we need a barrier here at all, between two LOCKed instructions? (Same for two more cases further down.) > @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) > } > > mctelem_processing_hold(tep); > - wmb(); > + smp_wmb(); > spin_unlock(&processing_lock); Don't we imply unlocks to be write barriers? > --- a/xen/arch/x86/io_apic.c > +++ b/xen/arch/x86/io_apic.c > @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void) > unsigned long t1, flags; > > t1 = pit0_ticks; > - mb(); > + smp_mb(); > > local_save_flags(flags); > local_irq_enable(); > @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void) > * might have cached one ExtINT interrupt. Finally, at > * least one tick may be lost due to delays. > */ > - mb(); > + smp_mb(); > if (pit0_ticks - t1 > 4) > return 1; I think these two can be barrier(). > @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) > for ( i = 1; i <= 5; i++ ) > { > tsc_value = rdtsc_ordered(); > - wmb(); > + smp_wmb(); > atomic_inc(&tsc_count); Same question as above wrt the following LOCKed instruction. 
> --- a/xen/drivers/passthrough/amd/iommu_init.c > +++ b/xen/drivers/passthrough/amd/iommu_init.c > @@ -559,7 +559,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) > return; > } > udelay(1); > - rmb(); > + smp_rmb(); > code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK, > IOMMU_EVENT_CODE_SHIFT); > } > @@ -664,7 +664,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[]) > return; > } > udelay(1); > - rmb(); > + smp_rmb(); > code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK, > IOMMU_PPR_LOG_CODE_SHIFT); > } Don't these fall into the "reduced-cacheability memory mappings" category? Or if the mappings these reads go through are UC, aren't they unneeded altogether? In any event this would better be a separate patch Cc-ed to the AMD IOMMU maintainers. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
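On the LOCKed-instruction question: Xen's atomic_inc()/atomic_dec() compile to LOCK-prefixed read-modify-writes, which the architecture makes fully ordered, so no fence instruction is needed between two of them at the hardware level. The shape of the definition, simplified from asm-x86/atomic.h rather than quoted verbatim:

    static inline void atomic_inc(atomic_t *v)
    {
        asm volatile ( "lock; incl %0" : "+m" (v->counter) );
    }

Whether a compiler-only barrier is still wanted between two such operations depends on what else the surrounding code touches, which is what the rest of this subthread turns on.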
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 11:47 ` Jan Beulich @ 2016-12-05 13:29 ` Andrew Cooper 2016-12-05 14:03 ` Jan Beulich 0 siblings, 1 reply; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 13:29 UTC (permalink / raw) To: Jan Beulich; +Cc: Xen-devel On 05/12/16 11:47, Jan Beulich wrote: >>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >> --- a/xen/arch/x86/acpi/cpu_idle.c >> +++ b/xen/arch/x86/acpi/cpu_idle.c >> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) >> >> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) >> { >> - mb(); >> + smp_mb(); >> clflush((void *)&mwait_wakeup(cpu)); >> - mb(); >> + smp_mb(); >> } > Both need to stay as they are afaict: In order for the clflush() to do > what we want we have to order it wrt earlier as well as later writes, > regardless of SMP-ness. Or wait - the SDM has changed in that > respect (and a footnote describes the earlier specified behavior now). > AMD, otoh, continues to require MFENCE for ordering purposes. mb() == smp_mb(). They are both mfence instructions. However, if AMD specifically requires mfence, we should explicitly use that rather than relying on the implementation details of smp_mb(). > >> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >> @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) >> /* Counter enable */ >> value |= (1ULL << 51); >> mca_wrmsr(MSR_IA32_MCx_MISC(4), value); >> - wmb(); >> + smp_wmb(); >> } >> } >> >> @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) >> value |= (1ULL << 51); >> wrmsrl(MSR_IA32_MCx_MISC(4), value); >> /* serialize */ >> - wmb(); >> + smp_wmb(); >> printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); >> } >> } > These will need confirming by AMD engineers. I was uncertain whether these were necessary at all, but as identified in the commit message, this is no functional change as Xen currently has rmb/wmb as plain barriers, not fence instructions. > >> --- a/xen/arch/x86/cpu/mcheck/barrier.c >> +++ b/xen/arch/x86/cpu/mcheck/barrier.c >> @@ -12,7 +12,7 @@ void mce_barrier_init(struct mce_softirq_barrier *bar) >> void mce_barrier_dec(struct mce_softirq_barrier *bar) >> { >> atomic_inc(&bar->outgen); >> - wmb(); >> + smp_wmb(); >> atomic_dec(&bar->val); >> } > This being x86-specific code - do we need a barrier here at all, > between two LOCKed instructions? (Same for two more cases further > down.) Hmm, I think not. The C semantics guarantees ordering of the atomic_inc() and atomic_dec(), and they are both writes to will be strictly ordered. > >> @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) >> } >> >> mctelem_processing_hold(tep); >> - wmb(); >> + smp_wmb(); >> spin_unlock(&processing_lock); > Don't we imply unlocks to be write barriers? They are, as an unlock is necessarily a write, combined with x86's ordering guarantees. Then again, I am not sure how this would interact with TSX, so am not sure if we should assume or rely on such behaviour. > >> --- a/xen/arch/x86/io_apic.c >> +++ b/xen/arch/x86/io_apic.c >> @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void) >> unsigned long t1, flags; >> >> t1 = pit0_ticks; >> - mb(); >> + smp_mb(); >> >> local_save_flags(flags); >> local_irq_enable(); >> @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void) >> * might have cached one ExtINT interrupt. 
Finally, at >> * least one tick may be lost due to delays. >> */ >> - mb(); >> + smp_mb(); >> if (pit0_ticks - t1 > 4) >> return 1; > I think these two can be barrier(). These were originally in the previous patch, but I wasn't so certain and moved them back here. Thinking about it further, pit0_ticks is only ever updated by the current cpu, so so-long as the compiler doesn't elide the read, it should be fine. > >> @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) >> for ( i = 1; i <= 5; i++ ) >> { >> tsc_value = rdtsc_ordered(); >> - wmb(); >> + smp_wmb(); >> atomic_inc(&tsc_count); > Same question as above wrt the following LOCKed instruction. I'm not sure the locked instruction is relevant here. C's ordering-model is sufficient to make this correct. > >> --- a/xen/drivers/passthrough/amd/iommu_init.c >> +++ b/xen/drivers/passthrough/amd/iommu_init.c >> @@ -559,7 +559,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[]) >> return; >> } >> udelay(1); >> - rmb(); >> + smp_rmb(); >> code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK, >> IOMMU_EVENT_CODE_SHIFT); >> } >> @@ -664,7 +664,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[]) >> return; >> } >> udelay(1); >> - rmb(); >> + smp_rmb(); >> code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK, >> IOMMU_PPR_LOG_CODE_SHIFT); >> } > Don't these fall into the "reduced-cacheability memory mappings" > category? Or if the mappings these reads go through are UC, > aren't they unneeded altogether? In any event this would better be > a separate patch Cc-ed to the AMD IOMMU maintainers. I can't find any requirements in the AMD IOMMU spec about the cacheability of mappings. We use UC mappings, although frankly for the starting memset alone, we should probably better using WB followed by an explicit cache flush, then switching to UC. With UC mappings, we don't need any barriers at all. I will submit a patch separately removing them. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
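The explicit-fence variant of the clflush/monitor sequence discussed above would be along these lines (a sketch, assuming the mfence()/sfence()/lfence() wrappers Andrew mentions reintroducing):

    if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) )
    {
        mfence();       /* order earlier stores before the flush */
        clflush((void *)&mwait_wakeup(cpu));
        mfence();       /* AMD requires MFENCE to order CLFLUSH;
                         * order the flush before MONITOR */
    }

    __monitor((void *)&mwait_wakeup(cpu), 0, 0);

This states the AMD requirement directly at the use site instead of relying on smp_mb() happening to be an mfence.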
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 13:29 ` Andrew Cooper @ 2016-12-05 14:03 ` Jan Beulich 2016-12-05 14:24 ` Andrew Cooper 0 siblings, 1 reply; 36+ messages in thread From: Jan Beulich @ 2016-12-05 14:03 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel >>> On 05.12.16 at 14:29, <andrew.cooper3@citrix.com> wrote: > On 05/12/16 11:47, Jan Beulich wrote: >>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >>> --- a/xen/arch/x86/acpi/cpu_idle.c >>> +++ b/xen/arch/x86/acpi/cpu_idle.c >>> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) >>> >>> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) >>> { >>> - mb(); >>> + smp_mb(); >>> clflush((void *)&mwait_wakeup(cpu)); >>> - mb(); >>> + smp_mb(); >>> } >> Both need to stay as they are afaict: In order for the clflush() to do >> what we want we have to order it wrt earlier as well as later writes, >> regardless of SMP-ness. Or wait - the SDM has changed in that >> respect (and a footnote describes the earlier specified behavior now). >> AMD, otoh, continues to require MFENCE for ordering purposes. > > mb() == smp_mb(). They are both mfence instructions. Of course. But still smp_mb() would be less correct from an abstract perspective, as here we care only about the local CPU. That said, ... > However, if AMD specifically requires mfence, we should explicitly use > that rather than relying on the implementation details of smp_mb(). ... I'd be fine with this. >>> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >>> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >>> @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) >>> /* Counter enable */ >>> value |= (1ULL << 51); >>> mca_wrmsr(MSR_IA32_MCx_MISC(4), value); >>> - wmb(); >>> + smp_wmb(); >>> } >>> } >>> >>> @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) >>> value |= (1ULL << 51); >>> wrmsrl(MSR_IA32_MCx_MISC(4), value); >>> /* serialize */ >>> - wmb(); >>> + smp_wmb(); >>> printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); >>> } >>> } >> These will need confirming by AMD engineers. > > I was uncertain whether these were necessary at all, but as identified > in the commit message, this is no functional change as Xen currently has > rmb/wmb as plain barriers, not fence instructions. And may hence be subtly broken, if this code was lifted from Linux? >>> @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) >>> } >>> >>> mctelem_processing_hold(tep); >>> - wmb(); >>> + smp_wmb(); >>> spin_unlock(&processing_lock); >> Don't we imply unlocks to be write barriers? > > They are, as an unlock is necessarily a write, combined with x86's > ordering guarantees. > > Then again, I am not sure how this would interact with TSX, so am not > sure if we should assume or rely on such behaviour. Isn't architectural state at the end of a transactional region indistinguishable as to whether TSX was actually used or the abort path taken (assuming the two code patch don't differ in their actions)? >>> --- a/xen/arch/x86/io_apic.c >>> +++ b/xen/arch/x86/io_apic.c >>> @@ -1486,7 +1486,7 @@ static int __init timer_irq_works(void) >>> unsigned long t1, flags; >>> >>> t1 = pit0_ticks; >>> - mb(); >>> + smp_mb(); >>> >>> local_save_flags(flags); >>> local_irq_enable(); >>> @@ -1501,7 +1501,7 @@ static int __init timer_irq_works(void) >>> * might have cached one ExtINT interrupt. 
Finally, at >>> * least one tick may be lost due to delays. >>> */ >>> - mb(); >>> + smp_mb(); >>> if (pit0_ticks - t1 > 4) >>> return 1; >> I think these two can be barrier(). > > These were originally in the previous patch, but I wasn't so certain and > moved them back here. > > Thinking about it further, pit0_ticks is only ever updated by the > current cpu, so so-long as the compiler doesn't elide the read, it > should be fine. That was exactly my thinking when giving the comment. >>> @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) >>> for ( i = 1; i <= 5; i++ ) >>> { >>> tsc_value = rdtsc_ordered(); >>> - wmb(); >>> + smp_wmb(); >>> atomic_inc(&tsc_count); >> Same question as above wrt the following LOCKed instruction. > > I'm not sure the locked instruction is relevant here. C's > ordering-model is sufficient to make this correct. I don't follow - the two involved variables are distinct, so I don't see how C's ordering model helps here at all. We need (at the machine level) the write to tsc_value to precede the increment of tsc_count, and I don't think C alone guarantees any such ordering. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 14:03 ` Jan Beulich @ 2016-12-05 14:24 ` Andrew Cooper 2016-12-05 14:33 ` Jan Beulich 0 siblings, 1 reply; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 14:24 UTC (permalink / raw) To: Jan Beulich; +Cc: Xen-devel On 05/12/16 14:03, Jan Beulich wrote: >>>> On 05.12.16 at 14:29, <andrew.cooper3@citrix.com> wrote: >> On 05/12/16 11:47, Jan Beulich wrote: >>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >>>> --- a/xen/arch/x86/acpi/cpu_idle.c >>>> +++ b/xen/arch/x86/acpi/cpu_idle.c >>>> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx) >>>> >>>> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) >>>> { >>>> - mb(); >>>> + smp_mb(); >>>> clflush((void *)&mwait_wakeup(cpu)); >>>> - mb(); >>>> + smp_mb(); >>>> } >>> Both need to stay as they are afaict: In order for the clflush() to do >>> what we want we have to order it wrt earlier as well as later writes, >>> regardless of SMP-ness. Or wait - the SDM has changed in that >>> respect (and a footnote describes the earlier specified behavior now). >>> AMD, otoh, continues to require MFENCE for ordering purposes. >> mb() == smp_mb(). They are both mfence instructions. > Of course. But still smp_mb() would be less correct from an > abstract perspective ? That is entirely the purpose and intended meaning of the abstraction. smp_mb() orders operations such that (visible to other CPUs in the system), all writes will have completed before any subsequent reads begin. > , as here we care only about the local CPU. > That said, ... > >> However, if AMD specifically requires mfence, we should explicitly use >> that rather than relying on the implementation details of smp_mb(). > ... I'd be fine with this. An earlier version of the series introduced explicit {m,s,l}fence() defines. I will reintroduce these. > >>>> --- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >>>> +++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c >>>> @@ -175,7 +175,7 @@ static void mce_amd_work_fn(void *data) >>>> /* Counter enable */ >>>> value |= (1ULL << 51); >>>> mca_wrmsr(MSR_IA32_MCx_MISC(4), value); >>>> - wmb(); >>>> + smp_wmb(); >>>> } >>>> } >>>> >>>> @@ -240,7 +240,7 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c) >>>> value |= (1ULL << 51); >>>> wrmsrl(MSR_IA32_MCx_MISC(4), value); >>>> /* serialize */ >>>> - wmb(); >>>> + smp_wmb(); >>>> printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n"); >>>> } >>>> } >>> These will need confirming by AMD engineers. >> I was uncertain whether these were necessary at all, but as identified >> in the commit message, this is no functional change as Xen currently has >> rmb/wmb as plain barriers, not fence instructions. > And may hence be subtly broken, if this code was lifted from Linux? It doesn't resemble anything in Linux these days. I don't know if that means we have lagged, or it was developed independently. Looking at the Linux code, there are a few mandatory barriers which should all be SMP barriers instead (guarding updates of shared memory), but no barriers at all around MSR reads or writes. > >>>> @@ -433,7 +433,7 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which) >>>> } >>>> >>>> mctelem_processing_hold(tep); >>>> - wmb(); >>>> + smp_wmb(); >>>> spin_unlock(&processing_lock); >>> Don't we imply unlocks to be write barriers? >> They are, as an unlock is necessarily a write, combined with x86's >> ordering guarantees. 
>> >> Then again, I am not sure how this would interact with TSX, so am not >> sure if we should assume or rely on such behaviour. > Isn't architectural state at the end of a transactional region > indistinguishable as to whether TSX was actually used or the > abort path taken (assuming the two code patch don't differ > in their actions)? I'd hope so, but I haven't had occasion to dig into TSX in detail yet. >>>> @@ -124,7 +124,7 @@ static void synchronize_tsc_master(unsigned int slave) >>>> for ( i = 1; i <= 5; i++ ) >>>> { >>>> tsc_value = rdtsc_ordered(); >>>> - wmb(); >>>> + smp_wmb(); >>>> atomic_inc(&tsc_count); >>> Same question as above wrt the following LOCKed instruction. >> I'm not sure the locked instruction is relevant here. C's >> ordering-model is sufficient to make this correct. > I don't follow - the two involved variables are distinct, so I don't > see how C's ordering model helps here at all. We need (at the > machine level) the write to tsc_value to precede the increment > of tsc_count, and I don't think C alone guarantees any such > ordering. Sorry - I looked at that code and thought they were both using tsc_value. Yes, we do need at least a compiler barrer here. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
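The conclusion for the TSC rendezvous, with the pairing spelled out (a sketch of the two sides, simplified from synchronize_tsc_master()/synchronize_tsc_slave()):

    /* master */
    tsc_value = rdtsc_ordered();
    smp_wmb();                       /* publish tsc_value...           */
    atomic_inc(&tsc_count);          /* ...before signalling the slave */

    /* slave */
    while ( atomic_read(&tsc_count) != ((i << 1) - 1) )
        cpu_relax();
    smp_rmb();                       /* pairs with the master's smp_wmb() */
    write_tsc(tsc_value);

On x86 both smp_wmb() and smp_rmb() are compiler-only barriers, which is exactly the "at least a compiler barrier" agreed on here.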
* Re: [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers 2016-12-05 14:24 ` Andrew Cooper @ 2016-12-05 14:33 ` Jan Beulich 0 siblings, 0 replies; 36+ messages in thread From: Jan Beulich @ 2016-12-05 14:33 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel >>> On 05.12.16 at 15:24, <andrew.cooper3@citrix.com> wrote: > On 05/12/16 14:03, Jan Beulich wrote: >>>>> On 05.12.16 at 14:29, <andrew.cooper3@citrix.com> wrote: >>> On 05/12/16 11:47, Jan Beulich wrote: >>>>>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >>>>> --- a/xen/arch/x86/acpi/cpu_idle.c >>>>> +++ b/xen/arch/x86/acpi/cpu_idle.c >>>>> @@ -391,9 +391,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int > ecx) >>>>> >>>>> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) ) >>>>> { >>>>> - mb(); >>>>> + smp_mb(); >>>>> clflush((void *)&mwait_wakeup(cpu)); >>>>> - mb(); >>>>> + smp_mb(); >>>>> } >>>> Both need to stay as they are afaict: In order for the clflush() to do >>>> what we want we have to order it wrt earlier as well as later writes, >>>> regardless of SMP-ness. Or wait - the SDM has changed in that >>>> respect (and a footnote describes the earlier specified behavior now). >>>> AMD, otoh, continues to require MFENCE for ordering purposes. >>> mb() == smp_mb(). They are both mfence instructions. >> Of course. But still smp_mb() would be less correct from an >> abstract perspective > > ? That is entirely the purpose and intended meaning of the abstraction. > > smp_mb() orders operations such that (visible to other CPUs in the > system), all writes will have completed before any subsequent reads begin. Yes, but this code is not about multiple CPUs, but just the local one (we want to make sure CLFLUSH does what we want). Hence using smp_ prefixed barriers would be wrong. But anyway, with this becoming explicit mfence() invocations, the discussion is all moot now. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
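For reference, the explicit {m,s,l}fence() wrappers referred to above would presumably take this shape (a sketch; the spelling follows the mail, not a committed patch):

    #define mfence() asm volatile ( "mfence" ::: "memory" )
    #define lfence() asm volatile ( "lfence" ::: "memory" )
    #define sfence() asm volatile ( "sfence" ::: "memory" )

That is, the same instructions the mandatory barriers expand to, but requested explicitly where a specific fence is an architectural requirement (such as CLFLUSH ordering on AMD) rather than an SMP-ordering choice.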
* [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:05 [PATCH 0/4] Fixes to common and x86 barriers Andrew Cooper ` (2 preceding siblings ...) 2016-12-05 10:05 ` [PATCH 3/4] xen/x86: Replace incorrect mandatory barriers with SMP barriers Andrew Cooper @ 2016-12-05 10:05 ` Andrew Cooper 2016-12-05 10:11 ` Juergen Gross ` (2 more replies) 3 siblings, 3 replies; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 10:05 UTC (permalink / raw) To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich Barriers are a complicated topic, a common source of confusion in submitted code, and their incorrect use is a common cause of bugs. It *really* doesn't help when Xen's API is the same as Linux, but its ABI different. Bring the two back in line, so programmers stand a chance of actually getting their use correct. As Xen has no current need for mandatory barriers, leave them commented out to avoid accidential misue. No functional change. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> --- xen/include/asm-x86/system.h | 31 +++++++++++++++++++++++-------- xen/include/asm-x86/x86_64/system.h | 3 --- 2 files changed, 23 insertions(+), 11 deletions(-) diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h index 9cb6fd7..9cd401a 100644 --- a/xen/include/asm-x86/system.h +++ b/xen/include/asm-x86/system.h @@ -164,23 +164,38 @@ static always_inline unsigned long __xadd( ((typeof(*(ptr)))__xadd(ptr, (typeof(*(ptr)))(v), sizeof(*(ptr)))) /* + * Mandatory barriers, for the ordering of reads and writes with MMIO devices + * mapped with reduced cacheability. + * + * Xen has no such device drivers, and therefore no need for mandatory + * barriers. These these are hidden to avoid their misuse; If a future need + * is found, they can be re-introduced, but chances are very good that a + * programmer actually should be using the smp_*() barriers. + * +#define mb() asm volatile ("mfence" ::: "memory") +#define rmb() asm volatile ("lfence" ::: "memory") +#define wmb() asm volatile ("sfence" ::: "memory") + */ + +/* + * SMP barriers, for ordering of reads and writes between CPUs, most commonly + * used with shared memory. + * * Both Intel and AMD agree that, from a programmer's viewpoint: * Loads cannot be reordered relative to other loads. * Stores cannot be reordered relative to other stores. - * + * Loads may be reordered ahead of an unaliasing store. 
+ * * Intel64 Architecture Memory Ordering White Paper * <http://developer.intel.com/products/processor/manuals/318147.pdf> - * + * * AMD64 Architecture Programmer's Manual, Volume 2: System Programming * <http://www.amd.com/us-en/assets/content_type/\ * white_papers_and_tech_docs/24593.pdf> */ -#define rmb() barrier() -#define wmb() barrier() - -#define smp_mb() mb() -#define smp_rmb() rmb() -#define smp_wmb() wmb() +#define smp_mb() asm volatile ("mfence" ::: "memory") +#define smp_rmb() barrier() +#define smp_wmb() barrier() #define set_mb(var, value) do { xchg(&var, value); } while (0) #define set_wmb(var, value) do { var = value; smp_wmb(); } while (0) diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h index 7026c05..bdf45e5 100644 --- a/xen/include/asm-x86/x86_64/system.h +++ b/xen/include/asm-x86/x86_64/system.h @@ -79,7 +79,4 @@ static always_inline __uint128_t __cmpxchg16b( _rc; \ }) -#define mb() \ - asm volatile ( "mfence" : : : "memory" ) - #endif /* __X86_64_SYSTEM_H__ */ -- 2.1.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 36+ messages in thread
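For comparison, barrier(), which the new smp_rmb()/smp_wmb() expand to, is the pure compiler barrier (as defined in xen/include/xen/compiler.h):

    #define barrier() __asm__ __volatile__ ( "" : : : "memory" )

It emits no instruction at all; on x86's strongly-ordered model that is sufficient for load-load and store-store ordering, while the one reordering the hardware does perform, loads ahead of non-aliasing stores, is why smp_mb() keeps its mfence.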
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:05 ` [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper @ 2016-12-05 10:11 ` Juergen Gross 2016-12-05 13:45 ` Andrew Cooper 2016-12-05 11:51 ` Jan Beulich 2016-12-05 12:01 ` David Vrabel 2 siblings, 1 reply; 36+ messages in thread From: Juergen Gross @ 2016-12-05 10:11 UTC (permalink / raw) To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich On 05/12/16 11:05, Andrew Cooper wrote: > Barriers are a complicated topic, a common source of confusion in submitted > code, and their incorrect use is a common cause of bugs. It *really* doesn't > help when Xen's API is the same as Linux, but its ABI different. > > Bring the two back in line, so programmers stand a chance of actually getting > their use correct. > > As Xen has no current need for mandatory barriers, leave them commented out to > avoid accidential misue. > > No functional change. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > --- > xen/include/asm-x86/system.h | 31 +++++++++++++++++++++++-------- > xen/include/asm-x86/x86_64/system.h | 3 --- > 2 files changed, 23 insertions(+), 11 deletions(-) > > diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h > index 9cb6fd7..9cd401a 100644 > --- a/xen/include/asm-x86/system.h > +++ b/xen/include/asm-x86/system.h > @@ -164,23 +164,38 @@ static always_inline unsigned long __xadd( > ((typeof(*(ptr)))__xadd(ptr, (typeof(*(ptr)))(v), sizeof(*(ptr)))) > > /* > + * Mandatory barriers, for the ordering of reads and writes with MMIO devices > + * mapped with reduced cacheability. > + * > + * Xen has no such device drivers, and therefore no need for mandatory > + * barriers. These these are hidden to avoid their misuse; If a future need Duplicate "these". Juergen _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:11 ` Juergen Gross @ 2016-12-05 13:45 ` Andrew Cooper 0 siblings, 0 replies; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 13:45 UTC (permalink / raw) To: Juergen Gross, Xen-devel; +Cc: Jan Beulich On 05/12/16 10:11, Juergen Gross wrote: > On 05/12/16 11:05, Andrew Cooper wrote: >> Barriers are a complicated topic, a common source of confusion in submitted >> code, and their incorrect use is a common cause of bugs. It *really* doesn't >> help when Xen's API is the same as Linux, but its ABI different. >> >> Bring the two back in line, so programmers stand a chance of actually getting >> their use correct. >> >> As Xen has no current need for mandatory barriers, leave them commented out to >> avoid accidential misue. >> >> No functional change. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> CC: Jan Beulich <JBeulich@suse.com> >> --- >> xen/include/asm-x86/system.h | 31 +++++++++++++++++++++++-------- >> xen/include/asm-x86/x86_64/system.h | 3 --- >> 2 files changed, 23 insertions(+), 11 deletions(-) >> >> diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h >> index 9cb6fd7..9cd401a 100644 >> --- a/xen/include/asm-x86/system.h >> +++ b/xen/include/asm-x86/system.h >> @@ -164,23 +164,38 @@ static always_inline unsigned long __xadd( >> ((typeof(*(ptr)))__xadd(ptr, (typeof(*(ptr)))(v), sizeof(*(ptr)))) >> >> /* >> + * Mandatory barriers, for the ordering of reads and writes with MMIO devices >> + * mapped with reduced cacheability. >> + * >> + * Xen has no such device drivers, and therefore no need for mandatory >> + * barriers. These these are hidden to avoid their misuse; If a future need > Duplicate "these". Fixed, thanks. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:05 ` [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper 2016-12-05 10:11 ` Juergen Gross @ 2016-12-05 11:51 ` Jan Beulich 2016-12-05 14:08 ` Andrew Cooper 2016-12-05 12:01 ` David Vrabel 2 siblings, 1 reply; 36+ messages in thread From: Jan Beulich @ 2016-12-05 11:51 UTC (permalink / raw) To: Andrew Cooper; +Cc: Xen-devel >>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: > Barriers are a complicated topic, a common source of confusion in submitted > code, and their incorrect use is a common cause of bugs. It *really* doesn't > help when Xen's API is the same as Linux, but its ABI different. > > Bring the two back in line, so programmers stand a chance of actually getting > their use correct. > > As Xen has no current need for mandatory barriers, leave them commented out to > avoid accidential misue. I'm not convinced of this part - common driver code could very well need to use such barriers, and I would find it rather odd if I had to first re-introduce them when adding any such. In fact I'm surprised there's no such use anywhere. (Depending on the discussion on earlier patches, there may in fact remain a few such uses.) Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 11:51 ` Jan Beulich @ 2016-12-05 14:08 ` Andrew Cooper 0 siblings, 0 replies; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 14:08 UTC (permalink / raw) To: Jan Beulich; +Cc: Xen-devel On 05/12/16 11:51, Jan Beulich wrote: >>>> On 05.12.16 at 11:05, <andrew.cooper3@citrix.com> wrote: >> Barriers are a complicated topic, a common source of confusion in submitted >> code, and their incorrect use is a common cause of bugs. It *really* doesn't >> help when Xen's API is the same as Linux, but its ABI different. >> >> Bring the two back in line, so programmers stand a chance of actually getting >> their use correct. >> >> As Xen has no current need for mandatory barriers, leave them commented out to >> avoid accidential misue. > I'm not convinced of this part - common driver code could very well > need to use such barriers, and I would find it rather odd if I had to > first re-introduce them when adding any such. In fact I'm surprised > there's no such use anywhere. (Depending on the discussion on > earlier patches, there may in fact remain a few such uses.) I'm not surprised that there is no common use of the mandatory barriers. Xen has no interesting device drivers using mappings other than WB or UC. As for hiding, I am betting that it is far more likely that a newly introduced barrier would be a (mis)use rather than a genuine need. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 10:05 ` [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper 2016-12-05 10:11 ` Juergen Gross 2016-12-05 11:51 ` Jan Beulich @ 2016-12-05 12:01 ` David Vrabel 2016-12-05 13:46 ` Andrew Cooper 2 siblings, 1 reply; 36+ messages in thread From: David Vrabel @ 2016-12-05 12:01 UTC (permalink / raw) To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich On 05/12/16 10:05, Andrew Cooper wrote: > > + * Loads may be reordered ahead of an unaliasing store. It might be useful to summarize what an "unaliasing store" is. David _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 4/4] xen/x86: Correct mandatory and SMP barrier definitions 2016-12-05 12:01 ` David Vrabel @ 2016-12-05 13:46 ` Andrew Cooper 0 siblings, 0 replies; 36+ messages in thread From: Andrew Cooper @ 2016-12-05 13:46 UTC (permalink / raw) To: David Vrabel, Xen-devel; +Cc: Jan Beulich On 05/12/16 12:01, David Vrabel wrote: > On 05/12/16 10:05, Andrew Cooper wrote: >> + * Loads may be reordered ahead of an unaliasing store. > It might be useful to summarize what an "unaliasing store" is. "... ahead of stores, so long as the addresses don't alias" ? ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 36+ messages in thread
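The permitted reordering being summarised is the classic store-buffering shape. A minimal sketch, with x, y, r0 and r1 as illustrative variables, all initially zero:

    static int x, y, r0, r1;

    void cpu0(void) { x = 1; r0 = y; }
    void cpu1(void) { y = 1; r1 = x; }

    /*
     * Run concurrently, r0 == 0 && r1 == 0 is a permitted outcome on x86:
     * each load is to a different address than that CPU's own preceding
     * store (it does not alias it), so the load may be satisfied first.
     * A load from the *same* address would instead be forwarded the
     * stored value.  An mfence (smp_mb()) between the store and the load
     * on each side forbids the r0 == r1 == 0 outcome.
     */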