* [PATCH v2 0/4] x86: Corrections to barrier usage
@ 2017-08-16 11:22 Andrew Cooper
2017-08-16 11:22 ` [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal Andrew Cooper
` (3 more replies)
From: Andrew Cooper @ 2017-08-16 11:22 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper
This series brings Xen's barrier ABI in line with Linux's, so developers stand
a better chance of not getting them wrong.
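
For orientation, a minimal sketch of the net x86 definition change the series
makes (patch 4 carries the authoritative diff; this summary is not a verbatim
quote of the headers):

    /* Before: the "mandatory" read/write barriers were compiler barriers. */
    #define rmb()     barrier()
    #define wmb()     barrier()

    /* After: mandatory barriers are real fences, while the SMP barriers rely
     * on x86's ordering guarantees and only constrain the compiler. */
    #define mb()      asm volatile ("mfence" ::: "memory")
    #define rmb()     asm volatile ("lfence" ::: "memory")
    #define wmb()     asm volatile ("sfence" ::: "memory")
    #define smp_mb()  mb()
    #define smp_rmb() barrier()
    #define smp_wmb() barrier()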
Andrew Cooper (4):
x86/mcheck: Minor cleanup to amd_nonfatal
xen/x86: Drop unnecessary barriers
xen/x86: Replace remaining mandatory barriers with SMP barriers
xen/x86: Correct mandatory and SMP barrier definitions
xen/arch/x86/acpi/cpu_idle.c | 10 ++++------
xen/arch/x86/cpu/mcheck/amd_nonfatal.c | 15 +++++----------
xen/arch/x86/cpu/mcheck/barrier.c | 10 +++++-----
xen/arch/x86/cpu/mcheck/mce.c | 3 ---
xen/arch/x86/cpu/mcheck/mctelem.c | 7 ++-----
xen/arch/x86/crash.c | 3 ---
xen/arch/x86/genapic/x2apic.c | 6 +++---
xen/arch/x86/hpet.c | 2 +-
xen/arch/x86/hvm/ioreq.c | 4 ++--
xen/arch/x86/irq.c | 4 ++--
xen/arch/x86/mm/shadow/multi.c | 1 -
xen/arch/x86/smpboot.c | 14 ++++++--------
xen/arch/x86/time.c | 8 ++++----
xen/drivers/passthrough/amd/iommu_init.c | 2 --
xen/include/asm-x86/desc.h | 8 ++++----
xen/include/asm-x86/system.h | 30 +++++++++++++++++-------------
xen/include/asm-x86/x86_64/system.h | 3 ---
17 files changed, 55 insertions(+), 75 deletions(-)
--
2.1.4
* [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal
2017-08-16 11:22 [PATCH v2 0/4] x86: Corrections to barrier usage Andrew Cooper
@ 2017-08-16 11:22 ` Andrew Cooper
2017-08-16 15:11 ` Jan Beulich
2017-08-18 13:19 ` Tim Deegan
2017-08-16 11:22 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Andrew Cooper
` (2 subsequent siblings)
From: Andrew Cooper @ 2017-08-16 11:22 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich
* Drop trailing whitespace.
* Move amd_nonfatal_mcheck_init() into .init.text and drop a trailing return.
* Drop unnecessary wmb()'s. Because of Xen's implementation, they are only
compiler barriers anyway, and each wrmsr() is already fully serialising.
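
As a sketch of why the dropped wmb()s were no-ops (definitions paraphrased,
not quoted verbatim; WRMSR is architecturally serialising):

    /* Pre-series definitions: wmb() emitted no instruction at all. */
    #define barrier() asm volatile ( "" ::: "memory" )
    #define wmb()     barrier()

    value |= (1ULL << 51);                 /* Counter enable */
    wrmsrl(MSR_IA32_MCx_MISC(4), value);   /* already orders everything prior */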
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
New in v2
---
xen/arch/x86/cpu/mcheck/amd_nonfatal.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
index c6a9c89..222f539 100644
--- a/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
+++ b/xen/arch/x86/cpu/mcheck/amd_nonfatal.c
@@ -106,7 +106,7 @@ static void mce_amd_checkregs(void *info)
} else {
mctelem_dismiss(mctc);
}
-
+
} else if (mctc != NULL) {
mctelem_dismiss(mctc);
}
@@ -151,7 +151,7 @@ static void mce_amd_work_fn(void *data)
/* HW does not count *all* kinds of correctable errors.
* Thus it is possible, that the polling routine finds an
- * correctable error even if the HW reports nothing. */
+ * correctable error even if the HW reports nothing. */
if (counter > 0) {
/* HW reported correctable errors,
* the polling routine did not find...
@@ -164,8 +164,8 @@ static void mce_amd_work_fn(void *data)
(counter == 1 ? "" : "s"),
(counter == 1 ? "was" : "were"));
}
- /* subtract 1 to not double count the error
- * from the polling service routine */
+ /* subtract 1 to not double count the error
+ * from the polling service routine */
adjust += (counter - 1);
/* Restart counter */
@@ -174,7 +174,6 @@ static void mce_amd_work_fn(void *data)
/* Counter enable */
value |= (1ULL << 51);
mca_wrmsr(MSR_IA32_MCx_MISC(4), value);
- wmb();
}
}
@@ -202,7 +201,7 @@ static void mce_amd_work_fn(void *data)
adjust = 0;
}
-void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
+void __init amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
{
if (c->x86_vendor != X86_VENDOR_AMD)
return;
@@ -238,14 +237,10 @@ void amd_nonfatal_mcheck_init(struct cpuinfo_x86 *c)
/* Counter enable */
value |= (1ULL << 51);
wrmsrl(MSR_IA32_MCx_MISC(4), value);
- /* serialize */
- wmb();
printk(XENLOG_INFO "MCA: Use hw thresholding to adjust polling frequency\n");
}
}
init_timer(&mce_timer, mce_amd_work_fn, NULL, 0);
set_timer(&mce_timer, NOW() + period);
-
- return;
}
--
2.1.4
* [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-16 11:22 [PATCH v2 0/4] x86: Corrections to barrier usage Andrew Cooper
2017-08-16 11:22 ` [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal Andrew Cooper
@ 2017-08-16 11:22 ` Andrew Cooper
2017-08-16 15:23 ` Jan Beulich
` (2 more replies)
2017-08-16 11:22 ` [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers Andrew Cooper
2017-08-16 11:22 ` [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper
From: Andrew Cooper @ 2017-08-16 11:22 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich
x86's current implementation of wmb() is a compiler barrier. As a result, the
only change in this patch is to remove an mfence instruction from
cpuidle_disable_deep_cstate().
None of these barriers serve any purpose. Most aren't synchronising
with any remote CPUs, whereas the mctelem barriers are redundant with
spin_unlock(), which already has full read/write barrier semantics.
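
To make the mctelem redundancy concrete, the shape in question is (sketch):

    spin_lock(&processing_lock);
    mctelem_processing_release(tep);
    wmb();                          /* dropped: the unlock below must already
                                     * order the preceding accesses */
    spin_unlock(&processing_lock);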
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
v2:
* s/erronious/unnecessary/
* Drop more unnecessary barriers
---
xen/arch/x86/acpi/cpu_idle.c | 2 --
xen/arch/x86/cpu/mcheck/mce.c | 3 ---
xen/arch/x86/cpu/mcheck/mctelem.c | 3 ---
xen/arch/x86/crash.c | 3 ---
xen/arch/x86/mm/shadow/multi.c | 1 -
xen/arch/x86/smpboot.c | 2 --
xen/drivers/passthrough/amd/iommu_init.c | 2 --
7 files changed, 16 deletions(-)
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 482b8a7..5879ad6 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -1331,8 +1331,6 @@ void cpuidle_disable_deep_cstate(void)
max_cstate = 1;
}
- mb();
-
hpet_disable_legacy_broadcast();
}
diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 30525dd..aa6e556 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -86,7 +86,6 @@ static x86_mce_vector_t _machine_check_vector = unexpected_machine_check;
void x86_mce_vector_register(x86_mce_vector_t hdlr)
{
_machine_check_vector = hdlr;
- wmb();
}
/* Call the installed machine check handler for this CPU setup. */
@@ -369,8 +368,6 @@ mcheck_mca_logout(enum mca_source who, struct mca_banks *bankmask,
mcabank_clear(i);
else if ( who == MCA_MCE_SCAN && need_clear)
mcabanks_set(i, clear_bank);
-
- wmb();
}
if (mig && errcnt > 0) {
diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
index b144a66..1731514 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -520,7 +520,6 @@ mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which)
}
mctelem_processing_hold(tep);
- wmb();
spin_unlock(&processing_lock);
return MCTE2COOKIE(tep);
}
@@ -531,7 +530,6 @@ void mctelem_consume_oldest_end(mctelem_cookie_t cookie)
spin_lock(&processing_lock);
mctelem_processing_release(tep);
- wmb();
spin_unlock(&processing_lock);
}
@@ -547,6 +545,5 @@ void mctelem_ack(mctelem_class_t which, mctelem_cookie_t cookie)
spin_lock(&processing_lock);
if (tep == mctctl.mctc_processing_head[target])
mctelem_processing_release(tep);
- wmb();
spin_unlock(&processing_lock);
}
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index 82535c4..8d74258 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -146,9 +146,6 @@ static void nmi_shootdown_cpus(void)
write_atomic((unsigned long *)__va(__pa(&exception_table[TRAP_nmi])),
(unsigned long)&do_nmi_crash);
- /* Ensure the new callback function is set before sending out the NMI. */
- wmb();
-
smp_send_nmi_allbutself();
msecs = 1000; /* Wait at most a second for the other cpus to stop */
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index c9c2252..1e3dfaf 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
* will make sure no inconsistent mapping being translated into
* shadow page table. */
version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
- rmb();
walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 8d91f6c..5b094b4 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -355,7 +355,6 @@ void start_secondary(void *unused)
spin_debug_enable();
set_cpu_sibling_map(cpu);
notify_cpu_starting(cpu);
- wmb();
/*
* We need to hold vector_lock so there the set of online cpus
@@ -371,7 +370,6 @@ void start_secondary(void *unused)
local_irq_enable();
mtrr_ap_init();
- wmb();
startup_cpu_idle_loop();
}
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index a459e99..d5b6049 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
return;
}
udelay(1);
- rmb();
code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
IOMMU_EVENT_CODE_SHIFT);
}
@@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
return;
}
udelay(1);
- rmb();
code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
IOMMU_PPR_LOG_CODE_SHIFT);
}
--
2.1.4
* [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers
2017-08-16 11:22 [PATCH v2 0/4] x86: Corrections to barrier usage Andrew Cooper
2017-08-16 11:22 ` [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal Andrew Cooper
2017-08-16 11:22 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Andrew Cooper
@ 2017-08-16 11:22 ` Andrew Cooper
2017-08-16 15:42 ` Dario Faggioli
2017-08-17 8:37 ` Jan Beulich
2017-08-16 11:22 ` [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper
From: Andrew Cooper @ 2017-08-16 11:22 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich
There is no functional change. Xen currently assigns smp_* meaning to
the non-smp_* barriers.
All of these uses are just to deal with shared memory between multiple
processors, so use the smp_*() which are the correct barriers for the purpose.
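
The common pattern at these call sites is plain inter-CPU publication through
ordinary write-back memory, which is exactly what the smp_*() forms are for.
A generic sketch (not taken verbatim from any one call site):

    /* Producer (CPU A): publish the payload before the flag. */
    shared->data = value;
    smp_wmb();
    shared->ready = 1;

    /* Consumer (CPU B): observe the flag before reading the payload. */
    while ( !shared->ready )
        cpu_relax();
    smp_rmb();
    consume(shared->data);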
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
v2:
* Drop more unnecessary barriers, rather than converting them to smp
---
xen/arch/x86/acpi/cpu_idle.c | 8 ++++----
xen/arch/x86/cpu/mcheck/barrier.c | 10 +++++-----
xen/arch/x86/cpu/mcheck/mctelem.c | 4 ++--
xen/arch/x86/genapic/x2apic.c | 6 +++---
xen/arch/x86/hpet.c | 2 +-
xen/arch/x86/hvm/ioreq.c | 4 ++--
xen/arch/x86/irq.c | 4 ++--
xen/arch/x86/smpboot.c | 12 ++++++------
xen/arch/x86/time.c | 8 ++++----
xen/include/asm-x86/desc.h | 8 ++++----
xen/include/asm-x86/system.h | 2 +-
11 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 5879ad6..dea834c 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -390,9 +390,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) )
{
- mb();
+ smp_mb();
clflush((void *)&mwait_wakeup(cpu));
- mb();
+ smp_mb();
}
__monitor((void *)&mwait_wakeup(cpu), 0, 0);
@@ -755,10 +755,10 @@ void acpi_dead_idle(void)
* instruction, hence memory fence is necessary to make sure all
* load/store visible before flush cache line.
*/
- mb();
+ smp_mb();
clflush(mwait_ptr);
__monitor(mwait_ptr, 0, 0);
- mb();
+ smp_mb();
__mwait(cx->address, 0);
}
}
diff --git a/xen/arch/x86/cpu/mcheck/barrier.c b/xen/arch/x86/cpu/mcheck/barrier.c
index 7de8e45..a7e5b19 100644
--- a/xen/arch/x86/cpu/mcheck/barrier.c
+++ b/xen/arch/x86/cpu/mcheck/barrier.c
@@ -12,7 +12,7 @@ void mce_barrier_init(struct mce_softirq_barrier *bar)
void mce_barrier_dec(struct mce_softirq_barrier *bar)
{
atomic_inc(&bar->outgen);
- wmb();
+ smp_wmb();
atomic_dec(&bar->val);
}
@@ -24,12 +24,12 @@ void mce_barrier_enter(struct mce_softirq_barrier *bar, bool wait)
return;
atomic_inc(&bar->ingen);
gen = atomic_read(&bar->outgen);
- mb();
+ smp_mb();
atomic_inc(&bar->val);
while ( atomic_read(&bar->val) != num_online_cpus() &&
atomic_read(&bar->outgen) == gen )
{
- mb();
+ smp_mb();
mce_panic_check();
}
}
@@ -42,12 +42,12 @@ void mce_barrier_exit(struct mce_softirq_barrier *bar, bool wait)
return;
atomic_inc(&bar->outgen);
gen = atomic_read(&bar->ingen);
- mb();
+ smp_mb();
atomic_dec(&bar->val);
while ( atomic_read(&bar->val) != 0 &&
atomic_read(&bar->ingen) == gen )
{
- mb();
+ smp_mb();
mce_panic_check();
}
}
diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
index 1731514..b071dc8 100644
--- a/xen/arch/x86/cpu/mcheck/mctelem.c
+++ b/xen/arch/x86/cpu/mcheck/mctelem.c
@@ -501,9 +501,9 @@ static void mctelem_append_processing(mctelem_class_t which)
ltep->mcte_prev = *procltp;
*procltp = dangling[target];
}
- wmb();
+ smp_wmb();
dangling[target] = NULL;
- wmb();
+ smp_wmb();
}
mctelem_cookie_t mctelem_consume_oldest_begin(mctelem_class_t which)
diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
index 5fffb31..4779b0d 100644
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -106,12 +106,12 @@ static void send_IPI_mask_x2apic_phys(const cpumask_t *cpumask, int vector)
* CPU is seen by notified remote CPUs. The WRMSR contained within
* apic_icr_write() can otherwise be executed early.
*
- * The reason mb() is sufficient here is subtle: the register arguments
+ * The reason smp_mb() is sufficient here is subtle: the register arguments
* to WRMSR must depend on a memory read executed after the barrier. This
* is guaranteed by cpu_physical_id(), which reads from a global array (and
* so cannot be hoisted above the barrier even by a clever compiler).
*/
- mb();
+ smp_mb();
local_irq_save(flags);
@@ -135,7 +135,7 @@ static void send_IPI_mask_x2apic_cluster(const cpumask_t *cpumask, int vector)
const cpumask_t *cluster_cpus;
unsigned long flags;
- mb(); /* See above for an explanation. */
+ smp_mb(); /* See above for an explanation. */
local_irq_save(flags);
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 8229c63..bc7a851 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -608,7 +608,7 @@ void __init hpet_broadcast_init(void)
hpet_events[i].shift = 32;
hpet_events[i].next_event = STIME_MAX;
spin_lock_init(&hpet_events[i].lock);
- wmb();
+ smp_wmb();
hpet_events[i].event_handler = handle_hpet_broadcast;
hpet_events[i].msi.msi_attrib.maskbit = 1;
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index b2a8b0e..e9851f6 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -91,7 +91,7 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
{
unsigned int state = p->state;
- rmb();
+ smp_rmb();
switch ( state )
{
case STATE_IOREQ_NONE:
@@ -1327,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
}
/* Make the ioreq_t visible /before/ write_pointer. */
- wmb();
+ smp_wmb();
pg->ptrs.write_pointer += qw ? 2 : 1;
/* Canonicalize read/write pointers to prevent their overflow. */
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 57e6c18..ee9afd8 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -759,9 +759,9 @@ void irq_set_affinity(struct irq_desc *desc, const cpumask_t *mask)
ASSERT(spin_is_locked(&desc->lock));
desc->status &= ~IRQ_MOVE_PENDING;
- wmb();
+ smp_wmb();
cpumask_copy(desc->arch.pending_mask, mask);
- wmb();
+ smp_wmb();
desc->status |= IRQ_MOVE_PENDING;
}
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 5b094b4..ee17f6d 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -79,7 +79,7 @@ static enum cpu_state {
CPU_STATE_CALLIN, /* slave -> master: Completed phase 2 */
CPU_STATE_ONLINE /* master -> slave: Go fully online now. */
} cpu_state;
-#define set_cpu_state(state) do { mb(); cpu_state = (state); } while (0)
+#define set_cpu_state(state) do { smp_mb(); cpu_state = (state); } while (0)
void *stack_base[NR_CPUS];
@@ -126,7 +126,7 @@ static void synchronize_tsc_master(unsigned int slave)
for ( i = 1; i <= 5; i++ )
{
tsc_value = rdtsc_ordered();
- wmb();
+ smp_wmb();
atomic_inc(&tsc_count);
while ( atomic_read(&tsc_count) != (i<<1) )
cpu_relax();
@@ -151,7 +151,7 @@ static void synchronize_tsc_slave(unsigned int slave)
{
while ( atomic_read(&tsc_count) != ((i<<1)-1) )
cpu_relax();
- rmb();
+ smp_rmb();
/*
* If a CPU has been physically hotplugged, we may as well write
* to its TSC in spite of X86_FEATURE_TSC_RELIABLE. The platform does
@@ -553,13 +553,13 @@ static int do_boot_cpu(int apicid, int cpu)
}
else if ( cpu_state == CPU_STATE_DEAD )
{
- rmb();
+ smp_rmb();
rc = cpu_error;
}
else
{
boot_error = 1;
- mb();
+ smp_mb();
if ( bootsym(trampoline_cpu_started) == 0xA5 )
/* trampoline started but...? */
printk("Stuck ??\n");
@@ -577,7 +577,7 @@ static int do_boot_cpu(int apicid, int cpu)
/* mark "stuck" area as not stuck */
bootsym(trampoline_cpu_started) = 0;
- mb();
+ smp_mb();
smpboot_restore_warm_reset_vector();
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index b988b94..a7d7d77 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -976,10 +976,10 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
/* 1. Update guest kernel version. */
_u.version = u->version = version_update_begin(u->version);
- wmb();
+ smp_wmb();
/* 2. Update all other guest kernel fields. */
*u = _u;
- wmb();
+ smp_wmb();
/* 3. Update guest kernel version. */
u->version = version_update_end(u->version);
@@ -1006,10 +1006,10 @@ bool update_secondary_system_time(struct vcpu *v,
update_guest_memory_policy(v, &policy);
return false;
}
- wmb();
+ smp_wmb();
/* 2. Update all other userspace fields. */
__copy_to_guest(user_u, u, 1);
- wmb();
+ smp_wmb();
/* 3. Update userspace version. */
u->version = version_update_end(u->version);
__copy_field_to_guest(user_u, u, version);
diff --git a/xen/include/asm-x86/desc.h b/xen/include/asm-x86/desc.h
index da924bf..9956aae 100644
--- a/xen/include/asm-x86/desc.h
+++ b/xen/include/asm-x86/desc.h
@@ -128,10 +128,10 @@ static inline void _write_gate_lower(volatile idt_entry_t *gate,
#define _set_gate(gate_addr,type,dpl,addr) \
do { \
(gate_addr)->a = 0; \
- wmb(); /* disable gate /then/ rewrite */ \
+ smp_wmb(); /* disable gate /then/ rewrite */ \
(gate_addr)->b = \
((unsigned long)(addr) >> 32); \
- wmb(); /* rewrite /then/ enable gate */ \
+ smp_wmb(); /* rewrite /then/ enable gate */ \
(gate_addr)->a = \
(((unsigned long)(addr) & 0xFFFF0000UL) << 32) | \
((unsigned long)(dpl) << 45) | \
@@ -174,11 +174,11 @@ static inline void _update_gate_addr_lower(idt_entry_t *gate, void *addr)
#define _set_tssldt_desc(desc,addr,limit,type) \
do { \
(desc)[0].b = (desc)[1].b = 0; \
- wmb(); /* disable entry /then/ rewrite */ \
+ smp_wmb(); /* disable entry /then/ rewrite */ \
(desc)[0].a = \
((u32)(addr) << 16) | ((u32)(limit) & 0xFFFF); \
(desc)[1].a = (u32)(((unsigned long)(addr)) >> 32); \
- wmb(); /* rewrite /then/ enable entry */ \
+ smp_wmb(); /* rewrite /then/ enable entry */ \
(desc)[0].b = \
((u32)(addr) & 0xFF000000U) | \
((u32)(type) << 8) | 0x8000U | \
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index eb498f5..9cb6fd7 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -183,7 +183,7 @@ static always_inline unsigned long __xadd(
#define smp_wmb() wmb()
#define set_mb(var, value) do { xchg(&var, value); } while (0)
-#define set_wmb(var, value) do { var = value; wmb(); } while (0)
+#define set_wmb(var, value) do { var = value; smp_wmb(); } while (0)
#define local_irq_disable() asm volatile ( "cli" : : : "memory" )
#define local_irq_enable() asm volatile ( "sti" : : : "memory" )
--
2.1.4
* [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions
2017-08-16 11:22 [PATCH v2 0/4] x86: Corrections to barrier usage Andrew Cooper
` (2 preceding siblings ...)
2017-08-16 11:22 ` [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers Andrew Cooper
@ 2017-08-16 11:22 ` Andrew Cooper
2017-08-16 15:44 ` Dario Faggioli
2017-08-17 8:41 ` Jan Beulich
From: Andrew Cooper @ 2017-08-16 11:22 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich
Barriers are a complicated topic, a source of confusion, and their incorrect
use is a common cause of bugs. It *really* doesn't help when Xen's API is the
same as Linux's, but its ABI is different.
Bring the two back in line, so programmers stand a chance of actually getting
their usage correct.
Drop the links in the comment, both of which are now stale. Instead, refer to
the vendor system manuals.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
v2:
* Keep mandatory barrier definitions
* Drop stale documentation links
---
xen/include/asm-x86/system.h | 28 ++++++++++++++++------------
xen/include/asm-x86/x86_64/system.h | 3 ---
2 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/xen/include/asm-x86/system.h b/xen/include/asm-x86/system.h
index 9cb6fd7..3d21291 100644
--- a/xen/include/asm-x86/system.h
+++ b/xen/include/asm-x86/system.h
@@ -164,23 +164,27 @@ static always_inline unsigned long __xadd(
((typeof(*(ptr)))__xadd(ptr, (typeof(*(ptr)))(v), sizeof(*(ptr))))
/*
+ * Mandatory barriers, for enforced ordering of reads and writes, e.g. for use
+ * with MMIO devices mapped with reduced cacheability.
+ */
+#define mb() asm volatile ("mfence" ::: "memory")
+#define rmb() asm volatile ("lfence" ::: "memory")
+#define wmb() asm volatile ("sfence" ::: "memory")
+
+/*
+ * SMP barriers, for ordering of reads and writes between CPUs, most commonly
+ * used with shared memory.
+ *
* Both Intel and AMD agree that, from a programmer's viewpoint:
* Loads cannot be reordered relative to other loads.
* Stores cannot be reordered relative to other stores.
- *
- * Intel64 Architecture Memory Ordering White Paper
- * <http://developer.intel.com/products/processor/manuals/318147.pdf>
- *
- * AMD64 Architecture Programmer's Manual, Volume 2: System Programming
- * <http://www.amd.com/us-en/assets/content_type/\
- * white_papers_and_tech_docs/24593.pdf>
+ * Loads may be reordered ahead of an unaliasing store.
+ *
+ * Refer to the vendor system programming manuals for further details.
*/
-#define rmb() barrier()
-#define wmb() barrier()
-
#define smp_mb() mb()
-#define smp_rmb() rmb()
-#define smp_wmb() wmb()
+#define smp_rmb() barrier()
+#define smp_wmb() barrier()
#define set_mb(var, value) do { xchg(&var, value); } while (0)
#define set_wmb(var, value) do { var = value; smp_wmb(); } while (0)
diff --git a/xen/include/asm-x86/x86_64/system.h b/xen/include/asm-x86/x86_64/system.h
index 88beae1..6b56761 100644
--- a/xen/include/asm-x86/x86_64/system.h
+++ b/xen/include/asm-x86/x86_64/system.h
@@ -80,7 +80,4 @@ static always_inline __uint128_t __cmpxchg16b(
_rc; \
})
-#define mb() \
- asm volatile ( "mfence" : : : "memory" )
-
#endif /* __X86_64_SYSTEM_H__ */
--
2.1.4
* Re: [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal
2017-08-16 11:22 ` [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal Andrew Cooper
@ 2017-08-16 15:11 ` Jan Beulich
2017-08-18 13:19 ` Tim Deegan
From: Jan Beulich @ 2017-08-16 15:11 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
> * Drop trailing whitespace.
> * Move amd_nonfatal_mcheck_init() into .init.text and drop a trailing return.
> * Drop unnecessary wmb()'s. Because of Xen's implementation, they are only
> compiler barriers anyway, and each wrmsr() is already fully serialising.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-16 11:22 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Andrew Cooper
@ 2017-08-16 15:23 ` Jan Beulich
2017-08-16 16:47 ` Andrew Cooper
2017-08-16 17:18 ` [PATCH v2 2.5/4] xen/x86: Replace mandatory barriers with compiler barriers Andrew Cooper
2017-08-18 13:55 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Tim Deegan
From: Jan Beulich @ 2017-08-16 15:23 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
> x86's current implementation of wmb() is a compiler barrier. As a result, the
> only change in this patch is to remove an mfence instruction from
> cpuidle_disable_deep_cstate().
>
> None of these barriers serve any purpose. Most aren't synchronising
> with any remote CPUs, whereas the mctelem barriers are redundant with
> spin_unlock(), which already has full read/write barrier semantics.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
For the relevant parts
Acked-by: Jan Beulich <jbeulich@suse.com>
For the parts the ack doesn't extend to, however:
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
> * will make sure no inconsistent mapping being translated into
> * shadow page table. */
> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> - rmb();
> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
Isn't this supposed to make sure version is being read first? I.e.
doesn't this at least need to be barrier()?
> index a459e99..d5b6049 100644
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
> return;
> }
> udelay(1);
> - rmb();
> code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
> IOMMU_EVENT_CODE_SHIFT);
> }
> @@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
> return;
> }
> udelay(1);
> - rmb();
> code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
> IOMMU_PPR_LOG_CODE_SHIFT);
> }
With these fully removed, what keeps the compiler from moving
the entry[1] reads out of the loop? Implementation details of
udelay() don't count...
Jan
* Re: [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers
2017-08-16 11:22 ` [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers Andrew Cooper
@ 2017-08-16 15:42 ` Dario Faggioli
2017-08-17 8:37 ` Jan Beulich
From: Dario Faggioli @ 2017-08-16 15:42 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich
On Wed, 2017-08-16 at 12:22 +0100, Andrew Cooper wrote:
> There is no functional change. Xen currently assigns smp_* meaning
> to
> the non-smp_* barriers.
>
> All of these uses are just to deal with shared memory between
> multiple
> processors, so use the smp_*() which are the correct barriers for the
> purpose.
>
FWIW, I had to deal with barriers recently, and this is much
appreciated! :-)
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions
2017-08-16 11:22 ` [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper
@ 2017-08-16 15:44 ` Dario Faggioli
2017-08-17 8:41 ` Jan Beulich
From: Dario Faggioli @ 2017-08-16 15:44 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich
On Wed, 2017-08-16 at 12:22 +0100, Andrew Cooper wrote:
> Barriers are a complicated topic, a source of confusion, and their
> incorrect
> use is a common cause of bugs. It *really* doesn't help when Xen's
> API is the
> same as Linux's, but its ABI is different.
>
> Bring the two back in line, so programmers stand a chance of actually
> getting
> their usage correct.
>
> Drop the links in the comment, both of which are now stale. Instead,
> refer to
> the vendor system manuals.
>
Does it perhaps make sense to link this:
https://www.kernel.org/doc/Documentation/memory-barriers.txt
> No functional change.
>
IAC, FWIW:
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-16 15:23 ` Jan Beulich
@ 2017-08-16 16:47 ` Andrew Cooper
2017-08-16 17:03 ` Andrew Cooper
2017-08-17 7:48 ` Jan Beulich
From: Andrew Cooper @ 2017-08-16 16:47 UTC (permalink / raw)
To: Jan Beulich; +Cc: Xen-devel
On 16/08/17 16:23, Jan Beulich wrote:
>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
>> x86's current implementation of wmb() is a compiler barrier. As a result, the
>> only change in this patch is to remove an mfence instruction from
>> cpuidle_disable_deep_cstate().
>>
>> None of these barriers serve any purpose. Most aren't synchronising
>> with any remote CPUs, whereas the mctelem barriers are redundant with
>> spin_unlock(), which already has full read/write barrier semantics.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> For the relevant parts
> Acked-by: Jan Beulich <jbeulich@suse.com>
> For the parts the ack doesn't extend to, however:
>
>> --- a/xen/arch/x86/mm/shadow/multi.c
>> +++ b/xen/arch/x86/mm/shadow/multi.c
>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>> * will make sure no inconsistent mapping being translated into
>> * shadow page table. */
>> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
>> - rmb();
>> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
> Isn't this supposed to make sure version is being read first? I.e.
> doesn't this at least need to be barrier()?
atomic_read() is not free to be reordered by the compiler. It is an asm
volatile with a volatile memory reference.
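
(For reference, the shape being described is roughly the following, a sketch
from memory rather than a verbatim quote of the header:

    static inline int atomic_read(const atomic_t *v)
    {
        int ret;
        asm volatile ( "movl %1, %0" : "=r" (ret) : "m" (v->counter) );
        return ret;
    }

The volatile qualifier stops the asm being deleted or duplicated, and keeps
it ordered with respect to other volatile accesses.)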
>
>> index a459e99..d5b6049 100644
>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>> @@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>> return;
>> }
>> udelay(1);
>> - rmb();
>> code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
>> IOMMU_EVENT_CODE_SHIFT);
>> }
>> @@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
>> return;
>> }
>> udelay(1);
>> - rmb();
>> code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
>> IOMMU_PPR_LOG_CODE_SHIFT);
>> }
> With these fully removed, what keeps the compiler from moving
> the entry[1] reads out of the loop? Implementation details of
> udelay() don't count...
It is a write to the control variable which is derived from a non-local
non-constant object. It can't be hoisted at all.
Consider this simplified version:
while ( count == 0 )
count = entry[1];
If entry were const, the compiler would be free to expect that the value
doesn't change on repeated reads, but that is not the case here.
~Andrew
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-16 16:47 ` Andrew Cooper
@ 2017-08-16 17:03 ` Andrew Cooper
2017-08-17 7:50 ` Jan Beulich
2017-08-17 7:48 ` Jan Beulich
From: Andrew Cooper @ 2017-08-16 17:03 UTC (permalink / raw)
To: Jan Beulich; +Cc: Xen-devel
On 16/08/17 17:47, Andrew Cooper wrote:
> On 16/08/17 16:23, Jan Beulich wrote:
>>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
>>> x86's current implementation of wmb() is a compiler barrier. As a result, the
>>> only change in this patch is to remove an mfence instruction from
>>> cpuidle_disable_deep_cstate().
>>>
>>> None of these barriers serve any purpose. Most aren't synchronising
>>> with any remote CPUs, whereas the mctelem barriers are redundant with
>>> spin_unlock(), which already has full read/write barrier semantics.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> For the relevant parts
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>> For the parts the ack doesn't extend to, however:
>>
>>> --- a/xen/arch/x86/mm/shadow/multi.c
>>> +++ b/xen/arch/x86/mm/shadow/multi.c
>>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>>> * will make sure no inconsistent mapping being translated into
>>> * shadow page table. */
>>> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
>>> - rmb();
>>> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
>> Isn't this supposed to make sure version is being read first? I.e.
>> doesn't this at least need to be barrier()?
> atomic_read() is not free to be reordered by the compiler. It is an asm
> volatile with a volatile memory reference.
>
>>> index a459e99..d5b6049 100644
>>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>>> @@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>>> return;
>>> }
>>> udelay(1);
>>> - rmb();
>>> code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
>>> IOMMU_EVENT_CODE_SHIFT);
>>> }
>>> @@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
>>> return;
>>> }
>>> udelay(1);
>>> - rmb();
>>> code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
>>> IOMMU_PPR_LOG_CODE_SHIFT);
>>> }
>> With these fully removed, what keeps the compiler from moving
>> the entry[1] reads out of the loop? Implementation details of
>> udelay() don't count...
> It is a write to the control variable which is derived from a non-local
> non-constant object. It can't be hoisted at all.
>
> Consider this simplified version:
>
> while ( count == 0 )
> count = entry[1];
>
> If entry were const, the compiler would be free to expect that the value
> doesn't change on repeated reads, but that is not the case here.
(And continuing my run of luck today), it turns out that GCC does
compile my example here to an infinite loop.
ffff82d08026025f: 84 c0 test %al,%al
ffff82d080260261: 75 0a jne ffff82d08026026d <parse_ppr_log_entry+0x29>
ffff82d080260263: 8b 46 04 mov 0x4(%rsi),%eax
ffff82d080260266: c1 e8 1c shr $0x1c,%eax
ffff82d080260269: 84 c0 test %al,%al
ffff82d08026026b: 74 fc je ffff82d080260269 <parse_ppr_log_entry+0x25>
I will move this to being a barrier() with a hoisting comment (to avoid
it looking like an SMP issue), and I'm going to have to re-evaluate how
sane I think the C standard is.
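
A standalone sketch of the failure mode and the barrier() fix (a hypothetical
reduction of the loop; the shift matches the shr $0x1c above):

    #include <stdint.h>

    #define barrier() asm volatile ( "" ::: "memory" )

    uint32_t wait_for_code(uint32_t entry[])
    {
        uint32_t code = entry[1] >> 28;

        while ( code == 0 )
        {
            barrier();              /* forces entry[1] to be re-read, instead
                                     * of the hoisted infinite loop above */
            code = entry[1] >> 28;
        }

        return code;
    }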
~Andrew
* [PATCH v2 2.5/4] xen/x86: Replace mandatory barriers with compiler barriers
2017-08-16 11:22 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Andrew Cooper
2017-08-16 15:23 ` Jan Beulich
@ 2017-08-16 17:18 ` Andrew Cooper
2017-08-17 8:15 ` Jan Beulich
2017-08-18 13:55 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Tim Deegan
From: Andrew Cooper @ 2017-08-16 17:18 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich
In this case, rmb() is being used for its compiler barrier property. Replace
it with an explicit barrier() and comment, to avoid it becoming an unnecessary
lfence instruction (when rmb() gets fixed) or looking like an SMP issue.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
---
xen/drivers/passthrough/amd/iommu_init.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
index a459e99..474992a 100644
--- a/xen/drivers/passthrough/amd/iommu_init.c
+++ b/xen/drivers/passthrough/amd/iommu_init.c
@@ -558,7 +558,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
return;
}
udelay(1);
- rmb();
+ barrier(); /* Prevent hoisting of the entry[] read. */
code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
IOMMU_EVENT_CODE_SHIFT);
}
@@ -663,7 +663,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
return;
}
udelay(1);
- rmb();
+ barrier(); /* Prevent hoisting of the entry[] read. */
code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
IOMMU_PPR_LOG_CODE_SHIFT);
}
--
2.1.4
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-16 16:47 ` Andrew Cooper
2017-08-16 17:03 ` Andrew Cooper
@ 2017-08-17 7:48 ` Jan Beulich
2017-08-18 14:47 ` Tim Deegan
From: Jan Beulich @ 2017-08-17 7:48 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 16.08.17 at 18:47, <andrew.cooper3@citrix.com> wrote:
> On 16/08/17 16:23, Jan Beulich wrote:
>>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/mm/shadow/multi.c
>>> +++ b/xen/arch/x86/mm/shadow/multi.c
>>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>>> * will make sure no inconsistent mapping being translated into
>>> * shadow page table. */
>>> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
>>> - rmb();
>>> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
>> Isn't this supposed to make sure version is being read first? I.e.
>> doesn't this at least need to be barrier()?
>
> atomic_read() is not free to be reordered by the compiler. It is an asm
> volatile with a volatile memory reference.
Oh, right - I did forget about the volatiles there (since generally,
like in Linux, we appear to try to avoid volatile).
Jan
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-16 17:03 ` Andrew Cooper
@ 2017-08-17 7:50 ` Jan Beulich
From: Jan Beulich @ 2017-08-17 7:50 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 16.08.17 at 19:03, <andrew.cooper3@citrix.com> wrote:
> On 16/08/17 17:47, Andrew Cooper wrote:
>> On 16/08/17 16:23, Jan Beulich wrote:
>>>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/drivers/passthrough/amd/iommu_init.c
>>>> +++ b/xen/drivers/passthrough/amd/iommu_init.c
>>>> @@ -558,7 +558,6 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
>>>> return;
>>>> }
>>>> udelay(1);
>>>> - rmb();
>>>> code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
>>>> IOMMU_EVENT_CODE_SHIFT);
>>>> }
>>>> @@ -663,7 +662,6 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
>>>> return;
>>>> }
>>>> udelay(1);
>>>> - rmb();
>>>> code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
>>>> IOMMU_PPR_LOG_CODE_SHIFT);
>>>> }
>>> With these fully removed, what keeps the compiler from moving
>>> the entry[1] reads out of the loop? Implementation details of
>>> udelay() don't count...
>> It is a write to the control variable which is derived from a non-local
>> non-constant object. It can't be hoisted at all.
>>
>> Consider this simplified version:
>>
>> while ( count == 0 )
>> count = entry[1];
>>
>> If entry were const, the compiler would be free to expect that the value
>> doesn't change on repeated reads, but that is not the case here.
>
> (And continuing my run of luck today), it turns out that GCC does
> compile my example here to an infinite loop.
>
> ffff82d08026025f: 84 c0 test %al,%al
> ffff82d080260261: 75 0a jne ffff82d08026026d <parse_ppr_log_entry+0x29>
> ffff82d080260263: 8b 46 04 mov 0x4(%rsi),%eax
> ffff82d080260266: c1 e8 1c shr $0x1c,%eax
> ffff82d080260269: 84 c0 test %al,%al
> ffff82d08026026b: 74 fc je ffff82d080260269 <parse_ppr_log_entry+0x25>
>
>
> I will move this to being a barrier() with a hoisting comment (to avoid
> it looking like an SMP issue), and I'm going to have to re-evaluate how
> sane I think the C standard is.
Well, as always, the standard assumes just a single thread (i.e.
const-ness doesn't matter at all here).
Jan
* Re: [PATCH v2 2.5/4] xen/x86: Replace mandatory barriers with compiler barriers
2017-08-16 17:18 ` [PATCH v2 2.5/4] xen/x86: Replace mandatory barriers with compiler barriers Andrew Cooper
@ 2017-08-17 8:15 ` Jan Beulich
From: Jan Beulich @ 2017-08-17 8:15 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Suravee Suthikulpanit, Xen-devel
>>> On 16.08.17 at 19:18, <andrew.cooper3@citrix.com> wrote:
> In this case, rmb() is being used for its compiler barrier property. Replace
> it with an explicit barrier() and comment, to avoid it becoming an
> unnecessary
> lfence instruction (when rmb() gets fixed) or looking like an SMP issue.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
But you forgot to Cc Suravee (now done), for him to have a chance
to ack the change.
Jan
> --- a/xen/drivers/passthrough/amd/iommu_init.c
> +++ b/xen/drivers/passthrough/amd/iommu_init.c
> @@ -558,7 +558,7 @@ static void parse_event_log_entry(struct amd_iommu *iommu, u32 entry[])
> return;
> }
> udelay(1);
> - rmb();
> + barrier(); /* Prevent hoisting of the entry[] read. */
> code = get_field_from_reg_u32(entry[1], IOMMU_EVENT_CODE_MASK,
> IOMMU_EVENT_CODE_SHIFT);
> }
> @@ -663,7 +663,7 @@ void parse_ppr_log_entry(struct amd_iommu *iommu, u32 entry[])
> return;
> }
> udelay(1);
> - rmb();
> + barrier(); /* Prevent hoisting of the entry[] read. */
> code = get_field_from_reg_u32(entry[1], IOMMU_PPR_LOG_CODE_MASK,
> IOMMU_PPR_LOG_CODE_SHIFT);
> }
> --
> 2.1.4
* Re: [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers
2017-08-16 11:22 ` [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers Andrew Cooper
2017-08-16 15:42 ` Dario Faggioli
@ 2017-08-17 8:37 ` Jan Beulich
2017-08-17 9:35 ` Andrew Cooper
From: Jan Beulich @ 2017-08-17 8:37 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
> There is no functional change. Xen currently assigns smp_* meaning to
> the non-smp_* barriers.
>
> All of these uses are just to deal with shared memory between multiple
> processors, so use the smp_*() which are the correct barriers for the
> purpose.
Taking this together with ...
> --- a/xen/arch/x86/acpi/cpu_idle.c
> +++ b/xen/arch/x86/acpi/cpu_idle.c
> @@ -390,9 +390,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>
> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) )
> {
> - mb();
> + smp_mb();
> clflush((void *)&mwait_wakeup(cpu));
> - mb();
> + smp_mb();
> }
See commit 48d32458bc ("x86, idle: add barriers to CLFLUSH
workaround") for why these better stay the way they are.
> @@ -755,10 +755,10 @@ void acpi_dead_idle(void)
> * instruction, hence memory fence is necessary to make sure all
> * load/store visible before flush cache line.
> */
> - mb();
> + smp_mb();
> clflush(mwait_ptr);
> __monitor(mwait_ptr, 0, 0);
> - mb();
> + smp_mb();
> __mwait(cx->address, 0);
... the comment the tail of which is in context here, I'm rather
surprised you convert these: They're there strictly for
correctness on a single processor (the need for prior memory
accesses to be visible isn't limited to the CPUs in the system).
In both cases, while smp_mb() and mb() are the same, I'd rather
keep the distinction at use sites with the assumption that the
smp_* ones would expand to just barrier() when !CONFIG_SMP (a
configuration we currently simply don't allow). The only alternative
I see would be to open-code the fences.
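
Concretely, the suggestion amounts to something like this (a sketch; as said,
Xen has no !CONFIG_SMP build today):

    #ifdef CONFIG_SMP
    #define smp_mb()  mb()
    #else
    #define smp_mb()  barrier()   /* no other CPUs to order against */
    #endif

i.e. a mandatory mb() at those two sites would keep its fence even in a
hypothetical uniprocessor build, where smp_mb() would legitimately degrade
to a compiler barrier.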
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -91,7 +91,7 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> {
> unsigned int state = p->state;
>
> - rmb();
> + smp_rmb();
> switch ( state )
> {
> case STATE_IOREQ_NONE:
> @@ -1327,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
> }
>
> /* Make the ioreq_t visible /before/ write_pointer. */
> - wmb();
> + smp_wmb();
> pg->ptrs.write_pointer += qw ? 2 : 1;
I agree with these changes, but it needs to be clear that their
counterparts cannot be smp_?mb().
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -976,10 +976,10 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>
> /* 1. Update guest kernel version. */
> _u.version = u->version = version_update_begin(u->version);
> - wmb();
> + smp_wmb();
> /* 2. Update all other guest kernel fields. */
> *u = _u;
> - wmb();
> + smp_wmb();
> /* 3. Update guest kernel version. */
> u->version = version_update_end(u->version);
>
> @@ -1006,10 +1006,10 @@ bool update_secondary_system_time(struct vcpu *v,
> update_guest_memory_policy(v, &policy);
> return false;
> }
> - wmb();
> + smp_wmb();
> /* 2. Update all other userspace fields. */
> __copy_to_guest(user_u, u, 1);
> - wmb();
> + smp_wmb();
> /* 3. Update userspace version. */
> u->version = version_update_end(u->version);
> __copy_field_to_guest(user_u, u, version);
Same for these.
So with the cpu_idle.c changes dropped or replaced by open-coded
fences
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan
* Re: [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions
2017-08-16 11:22 ` [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper
2017-08-16 15:44 ` Dario Faggioli
@ 2017-08-17 8:41 ` Jan Beulich
From: Jan Beulich @ 2017-08-17 8:41 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
> Barriers are a complicated topic, a source of confusion, and their incorrect
> use is a common cause of bugs. It *really* doesn't help when Xen's API is the
> same as Linux's, but its ABI is different.
>
> Bring the two back in line, so programmers stand a chance of actually getting
> their usage correct.
>
> Drop the links in the comment, both of which are now stale. Instead, refer to
> the vendor system manuals.
>
> No functional change.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one remark:
> --- a/xen/include/asm-x86/system.h
> +++ b/xen/include/asm-x86/system.h
> @@ -164,23 +164,27 @@ static always_inline unsigned long __xadd(
> ((typeof(*(ptr)))__xadd(ptr, (typeof(*(ptr)))(v), sizeof(*(ptr))))
>
> /*
> + * Mandatory barriers, for enforced ordering of reads and writes, e.g. for use
> + * with MMIO devices mapped with reduced cacheability.
> + */
> +#define mb() asm volatile ("mfence" ::: "memory")
> +#define rmb() asm volatile ("lfence" ::: "memory")
> +#define wmb() asm volatile ("sfence" ::: "memory")
> +
> +/*
> + * SMP barriers, for ordering of reads and writes between CPUs, most commonly
> + * used with shared memory.
> + *
> * Both Intel and AMD agree that, from a programmer's viewpoint:
> * Loads cannot be reordered relative to other loads.
> * Stores cannot be reordered relative to other stores.
> - *
> - * Intel64 Architecture Memory Ordering White Paper
> - * <http://developer.intel.com/products/processor/manuals/318147.pdf>
> - *
> - * AMD64 Architecture Programmer's Manual, Volume 2: System Programming
> - * <http://www.amd.com/us-en/assets/content_type/\
> - * white_papers_and_tech_docs/24593.pdf>
> + * Loads may be reordered ahead of an unaliasing store.
For consistency with the other two sentences, perhaps use plural
at the end of the sentence too?
Jan
* Re: [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers
2017-08-17 8:37 ` Jan Beulich
@ 2017-08-17 9:35 ` Andrew Cooper
2017-08-17 10:01 ` Jan Beulich
From: Andrew Cooper @ 2017-08-17 9:35 UTC (permalink / raw)
To: Jan Beulich; +Cc: Xen-devel
On 17/08/17 09:37, Jan Beulich wrote:
>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
>> There is no functional change. Xen currently assigns smp_* meaning to
>> the non-smp_* barriers.
>>
>> All of these uses are just to deal with shared memory between multiple
>> processors, so use the smp_*() which are the correct barriers for the
>> purpose.
> Taking this together with ...
>
>> --- a/xen/arch/x86/acpi/cpu_idle.c
>> +++ b/xen/arch/x86/acpi/cpu_idle.c
>> @@ -390,9 +390,9 @@ void mwait_idle_with_hints(unsigned int eax, unsigned int ecx)
>>
>> if ( boot_cpu_has(X86_FEATURE_CLFLUSH_MONITOR) )
>> {
>> - mb();
>> + smp_mb();
>> clflush((void *)&mwait_wakeup(cpu));
>> - mb();
>> + smp_mb();
>> }
> See commit 48d32458bc ("x86, idle: add barriers to CLFLUSH
> workaround") for why these better stay the way they are.
>
>> @@ -755,10 +755,10 @@ void acpi_dead_idle(void)
>> * instruction, hence memory fence is necessary to make sure all
>> * load/store visible before flush cache line.
>> */
>> - mb();
>> + smp_mb();
>> clflush(mwait_ptr);
>> __monitor(mwait_ptr, 0, 0);
>> - mb();
>> + smp_mb();
>> __mwait(cx->address, 0);
> ... the comment the tail of which is in context here, I'm rather
> surprised you convert these: They're there strictly for
> correctness on a single processor (the need for prior memory
> accesses to be visible isn't limited to the CPUs in the system).
>
> In both cases, while smp_mb() and mb() are the same, I'd rather
> keep the distinction at use sites with the assumption that the
> smp_* ones would expand to just barrier() when !CONFIG_SMP (a
> configuration we currently simply don't allow). The only alternative
> I see would be to open-code the fences.
Yeah - in hindsight they should logically stay as mb() (even as you say,
there is no change).
>
>> --- a/xen/arch/x86/hvm/ioreq.c
>> +++ b/xen/arch/x86/hvm/ioreq.c
>> @@ -91,7 +91,7 @@ static bool_t hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>> {
>> unsigned int state = p->state;
>>
>> - rmb();
>> + smp_rmb();
>> switch ( state )
>> {
>> case STATE_IOREQ_NONE:
>> @@ -1327,7 +1327,7 @@ static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
>> }
>>
>> /* Make the ioreq_t visible /before/ write_pointer. */
>> - wmb();
>> + smp_wmb();
>> pg->ptrs.write_pointer += qw ? 2 : 1;
> I agree with these changes, but it needs to be clear that their
> counterparts cannot be smp_?mb().
>
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -976,10 +976,10 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>>
>> /* 1. Update guest kernel version. */
>> _u.version = u->version = version_update_begin(u->version);
>> - wmb();
>> + smp_wmb();
>> /* 2. Update all other guest kernel fields. */
>> *u = _u;
>> - wmb();
>> + smp_wmb();
>> /* 3. Update guest kernel version. */
>> u->version = version_update_end(u->version);
>>
>> @@ -1006,10 +1006,10 @@ bool update_secondary_system_time(struct vcpu *v,
>> update_guest_memory_policy(v, &policy);
>> return false;
>> }
>> - wmb();
>> + smp_wmb();
>> /* 2. Update all other userspace fields. */
>> __copy_to_guest(user_u, u, 1);
>> - wmb();
>> + smp_wmb();
>> /* 3. Update userspace version. */
>> u->version = version_update_end(u->version);
>> __copy_field_to_guest(user_u, u, version);
> Same for these.
Why? The guest side of this protocol is just reads.
Irrespective, how do you suggest I make things more clear?
~Andrew
* Re: [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers
2017-08-17 9:35 ` Andrew Cooper
@ 2017-08-17 10:01 ` Jan Beulich
From: Jan Beulich @ 2017-08-17 10:01 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Xen-devel
>>> On 17.08.17 at 11:35, <andrew.cooper3@citrix.com> wrote:
> On 17/08/17 09:37, Jan Beulich wrote:
>>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/time.c
>>> +++ b/xen/arch/x86/time.c
>>> @@ -976,10 +976,10 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
>>>
>>> /* 1. Update guest kernel version. */
>>> _u.version = u->version = version_update_begin(u->version);
>>> - wmb();
>>> + smp_wmb();
>>> /* 2. Update all other guest kernel fields. */
>>> *u = _u;
>>> - wmb();
>>> + smp_wmb();
>>> /* 3. Update guest kernel version. */
>>> u->version = version_update_end(u->version);
>>>
>>> @@ -1006,10 +1006,10 @@ bool update_secondary_system_time(struct vcpu *v,
>>> update_guest_memory_policy(v, &policy);
>>> return false;
>>> }
>>> - wmb();
>>> + smp_wmb();
>>> /* 2. Update all other userspace fields. */
>>> __copy_to_guest(user_u, u, 1);
>>> - wmb();
>>> + smp_wmb();
>>> /* 3. Update userspace version. */
>>> u->version = version_update_end(u->version);
>>> __copy_field_to_guest(user_u, u, version);
>> Same for these.
>
> Why? The guest side of this protocol is just reads.
As always (and as you keep stressing) barriers make sense almost
exclusively when they're being used on both sides. This applies
here too. It's just that even a non-SMP consumer would need
barriers; that's along the lines of why Linux has gained virt_mb().
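The consumer loop implied here can be sketched as follows; the field
names are illustrative rather than the real vcpu_time_info layout. The
reader's barriers are what the writer's smp_wmb()s pair with, and the
reader needs them even if it runs on a single CPU:

    #include <stdint.h>

    #define rmb() asm volatile ("" ::: "memory") /* x86: compiler barrier only */

    struct time_info {
        volatile uint32_t version;  /* odd while an update is in flight */
        uint64_t payload;           /* stands in for the real fields */
    };

    static uint64_t read_consistent(const struct time_info *t)
    {
        uint32_t v;
        uint64_t p;

        do {
            v = t->version;
            rmb();                  /* read version before the payload */
            p = t->payload;
            rmb();                  /* read payload before the re-check */
        } while ( (v & 1) || v != t->version );

        return p;
    }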
> Regardless, how do you suggest I make things clearer?
Well, it was more a remark than a request for you to change
anything. I agree there's little point in adding a comment on the
hypervisor side of things.
Jan
* Re: [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal
2017-08-16 11:22 ` [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal Andrew Cooper
2017-08-16 15:11 ` Jan Beulich
@ 2017-08-18 13:19 ` Tim Deegan
1 sibling, 0 replies; 28+ messages in thread
From: Tim Deegan @ 2017-08-18 13:19 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel
At 12:22 +0100 on 16 Aug (1502886127), Andrew Cooper wrote:
> * Drop trailing whitespace.
> * Move amd_nonfatal_mcheck_init() into .init.text and drop a trailing return.
> * Drop unnecessary wmb()'s. Because of Xen's implementation, they are only
> compiler barriers anyway, and each wrmsr() is already fully serialising.
But wrmsr() is not a compiler barrier! So if the write-barriers are
needed (e.g. for the update to the global 'adjust') then you can't
remove them just because WRMSR is a CPU barrier.
If they're not needed (which is plausible) then the commit message
should explain that instead.
Nit: I think tinkering with memory barriers deserves its own commit,
not to be the third item in a list of 'minor cleanups'.
Cheers,
Tim.
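Tim's point can be illustrated with a simplified wrmsr(), assuming an
implementation without a "memory" clobber (the real Xen definition may
differ):

    #include <stdint.h>

    static inline void wrmsr_sketch(uint32_t msr, uint64_t val)
    {
        asm volatile ("wrmsr"
                      :: "c" (msr),
                         "a" ((uint32_t)val),
                         "d" ((uint32_t)(val >> 32)));
        /* "volatile" prevents the asm being deleted or reordered against
         * other volatile accesses, but without a "memory" clobber the
         * compiler may still move plain loads/stores (such as an update
         * to a global like 'adjust') across the WRMSR. */
    }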
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-16 11:22 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Andrew Cooper
2017-08-16 15:23 ` Jan Beulich
2017-08-16 17:18 ` [PATCH v2 2.5/4] xen/x86: Replace mandatory barriers with compiler barriers Andrew Cooper
@ 2017-08-18 13:55 ` Tim Deegan
2017-08-18 14:07 ` Tim Deegan
2017-08-18 14:23 ` [PATCH] xen/x86/shadow: adjust barriers around gtable_dirty_version Tim Deegan
2 siblings, 2 replies; 28+ messages in thread
From: Tim Deegan @ 2017-08-18 13:55 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel
At 12:22 +0100 on 16 Aug (1502886128), Andrew Cooper wrote:
> diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> index c9c2252..1e3dfaf 100644
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
> * will make sure no inconsistent mapping being translated into
> * shadow page table. */
> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> - rmb();
> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
Nack. We must read the version before reading the tables, or we might
accidentally use out-of-date tables.
If anything, this needs more barriers! There ought to be a read
barrier before we re-read the version in shadow_check_gwalk(), but
taking the paging lock DTRT. And there ought to be a wmb() before we
increment the version later on, which I guess I'll make a patch for.
Cheers,
Tim.
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-18 13:55 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Tim Deegan
@ 2017-08-18 14:07 ` Tim Deegan
2017-08-18 14:23 ` [PATCH] xen/x86/shadow: adjust barriers around gtable_dirty_version Tim Deegan
1 sibling, 0 replies; 28+ messages in thread
From: Tim Deegan @ 2017-08-18 14:07 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel
At 14:55 +0100 on 18 Aug (1503068128), Tim Deegan wrote:
> At 12:22 +0100 on 16 Aug (1502886128), Andrew Cooper wrote:
> > diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> > index c9c2252..1e3dfaf 100644
> > --- a/xen/arch/x86/mm/shadow/multi.c
> > +++ b/xen/arch/x86/mm/shadow/multi.c
> > @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
> > * will make sure no inconsistent mapping being translated into
> > * shadow page table. */
> > version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> > - rmb();
> > walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
>
> Nack. We must read the version before reading the tables, or we might
> accidentally use out-of-date tables.
>
> If anything, this needs more barriers! There ought to be a read
> barrier before we re-read the version in shadow_check_gwalk(), but
> taking the paging lock DTRT. And there ought to be a wmb() before we
> increment the version later on, which I guess I'll make a patch for.
These can be smp_*mb(), though, to align with the rest of the series.
Tim.
* [PATCH] xen/x86/shadow: adjust barriers around gtable_dirty_version.
2017-08-18 13:55 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Tim Deegan
2017-08-18 14:07 ` Tim Deegan
@ 2017-08-18 14:23 ` Tim Deegan
2017-08-18 14:26 ` Andrew Cooper
1 sibling, 1 reply; 28+ messages in thread
From: Tim Deegan @ 2017-08-18 14:23 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel
Use the smp_ variants, as we're only synchronizing against other CPUs.
Add a write barrier before incrementing the version.
x86's memory ordering rules and the presence of various out-of-unit
function calls mean that this code worked OK before, and the barriers
are mostly decorative.
Signed-off-by: Tim Deegan <tim@xen.org>
---
xen/arch/x86/mm/shadow/multi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index c9c2252..f8a8928 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -206,6 +206,7 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, walk_t *gw, int version)
ASSERT(paging_locked_by_me(d));
+ /* No need for smp_rmb() here; taking the paging lock was enough. */
if ( version == atomic_read(&d->arch.paging.shadow.gtable_dirty_version) )
return 1;
@@ -3112,7 +3113,7 @@ static int sh_page_fault(struct vcpu *v,
* will make sure no inconsistent mapping being translated into
* shadow page table. */
version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
- rmb();
+ smp_rmb();
walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
#if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -3188,6 +3189,7 @@ static int sh_page_fault(struct vcpu *v,
* overlapping with this one may be inconsistent
*/
perfc_incr(shadow_rm_write_flush_tlb);
+ smp_wmb();
atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);
flush_tlb_mask(d->domain_dirty_cpumask);
}
--
2.7.4
* Re: [PATCH] xen/x86/shadow: adjust barriers around gtable_dirty_version.
2017-08-18 14:23 ` [PATCH] xen/x86/shadow: adjust barriers around gtable_dirty_version Tim Deegan
@ 2017-08-18 14:26 ` Andrew Cooper
0 siblings, 0 replies; 28+ messages in thread
From: Andrew Cooper @ 2017-08-18 14:26 UTC (permalink / raw)
To: Tim Deegan; +Cc: Jan Beulich, Xen-devel
On 18/08/17 15:23, Tim Deegan wrote:
> Use the smp_ variants, as we're only synchronizing against other CPUs.
>
> Add a write barrier before incrementing the version.
>
> x86's memory ordering rules and the presence of various out-of-unit
> function calls mean that this code worked OK before, and the barriers
> are mostly decorative.
>
> Signed-off-by: Tim Deegan <tim@xen.org>
Thanks!
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> xen/arch/x86/mm/shadow/multi.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
> index c9c2252..f8a8928 100644
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -206,6 +206,7 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, walk_t *gw, int version)
>
> ASSERT(paging_locked_by_me(d));
>
> + /* No need for smp_rmb() here; taking the paging lock was enough. */
> if ( version == atomic_read(&d->arch.paging.shadow.gtable_dirty_version) )
> return 1;
>
> @@ -3112,7 +3113,7 @@ static int sh_page_fault(struct vcpu *v,
> * will make sure no inconsistent mapping being translated into
> * shadow page table. */
> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> - rmb();
> + smp_rmb();
> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
>
> #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
> @@ -3188,6 +3189,7 @@ static int sh_page_fault(struct vcpu *v,
> * overlapping with this one may be inconsistent
> */
> perfc_incr(shadow_rm_write_flush_tlb);
> + smp_wmb();
> atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);
> flush_tlb_mask(d->domain_dirty_cpumask);
> }
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-17 7:48 ` Jan Beulich
@ 2017-08-18 14:47 ` Tim Deegan
2017-08-18 15:04 ` Jan Beulich
2017-08-18 15:07 ` Tim Deegan
0 siblings, 2 replies; 28+ messages in thread
From: Tim Deegan @ 2017-08-18 14:47 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Xen-devel
At 01:48 -0600 on 17 Aug (1502934495), Jan Beulich wrote:
> >>> On 16.08.17 at 18:47, <andrew.cooper3@citrix.com> wrote:
> > On 16/08/17 16:23, Jan Beulich wrote:
> >>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
> >>> --- a/xen/arch/x86/mm/shadow/multi.c
> >>> +++ b/xen/arch/x86/mm/shadow/multi.c
> >>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
> >>> * will make sure no inconsistent mapping being translated into
> >>> * shadow page table. */
> >>> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> >>> - rmb();
> >>> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
> >> Isn't this supposed to make sure version is being read first? I.e.
> >> doesn't this at least need to be barrier()?
> >
> > atomic_read() is not free to be reordered by the compiler. It is an asm
> > volatile with a volatile memory reference.
>
> Oh, right - I did forget about the volatiles there (since generally,
> like in Linux, we appear to try to avoid volatile).
FWIW, I don't think that's quite right. The GCC docs I have say that
"volatile" will stop the compiler from omitting an asm altogether, or
hoisting it out of a loop (on the assumption that it will always
produce the same output for the same inputs). And that "the compiler
can move even volatile 'asm' instructions relative to other code,
including across jump instructions."
Cheers,
Tim.
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-18 14:47 ` Tim Deegan
@ 2017-08-18 15:04 ` Jan Beulich
2017-08-18 15:13 ` Tim Deegan
2017-08-18 15:07 ` Tim Deegan
1 sibling, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2017-08-18 15:04 UTC (permalink / raw)
To: Tim Deegan; +Cc: Andrew Cooper, Xen-devel
>>> On 18.08.17 at 16:47, <tim@xen.org> wrote:
> At 01:48 -0600 on 17 Aug (1502934495), Jan Beulich wrote:
>> >>> On 16.08.17 at 18:47, <andrew.cooper3@citrix.com> wrote:
>> > On 16/08/17 16:23, Jan Beulich wrote:
>> >>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
>> >>> --- a/xen/arch/x86/mm/shadow/multi.c
>> >>> +++ b/xen/arch/x86/mm/shadow/multi.c
>> >>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
>> >>> * will make sure no inconsistent mapping being translated into
>> >>> * shadow page table. */
>> >>> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
>> >>> - rmb();
>> >>> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
>> >> Isn't this supposed to make sure version is being read first? I.e.
>> >> doesn't this at least need to be barrier()?
>> >
>> > atomic_read() is not free to be reordered by the compiler. It is an asm
>> > volatile with a volatile memory reference.
>>
>> Oh, right - I did forget about the volatiles there (since generally,
>> like in Linux, we appear to try to avoid volatile).
>
> FWIW, I don't think that's quite right. The GCC docs I have say that
> "volatile" will stop the compiler from omitting an asm altogether, or
> hoisting it out of a loop (on the assumption that it will always
> produce the same output for the same inputs). And that "the compiler
> can move even volatile 'asm' instructions relative to other code,
> including across jump instructions."
Oh, I had talked about the volatile qualifiers, not the volatile asm-s.
Jan
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-18 14:47 ` Tim Deegan
2017-08-18 15:04 ` Jan Beulich
@ 2017-08-18 15:07 ` Tim Deegan
1 sibling, 0 replies; 28+ messages in thread
From: Tim Deegan @ 2017-08-18 15:07 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Xen-devel
At 15:47 +0100 on 18 Aug (1503071247), Tim Deegan wrote:
> At 01:48 -0600 on 17 Aug (1502934495), Jan Beulich wrote:
> > >>> On 16.08.17 at 18:47, <andrew.cooper3@citrix.com> wrote:
> > > atomic_read() is not free to be reordered by the compiler. It is an asm
> > > volatile with a volatile memory reference.
> >
> > Oh, right - I did forget about the volatiles there (since generally,
> > like in Linux, we appear to try to avoid volatile).
>
> FWIW, I don't think that's quite right. The GCC docs I have say that
> "volatile" will stop the compiler from omitting an asm altogether, or
> hoisting it out of a loop (on the assumption that it will always
> produce the same output for the same inputs). And that "the compiler
> can move even volatile 'asm' instructions relative to other code,
> including across jump instructions."
...and indeed: https://godbolt.org/g/KW19QR
Tim.
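The snippet behind the link isn't preserved in the archive; a sketch
along the same lines (actual code movement depends on compiler and
flags):

    int g;

    int observe(void)
    {
        g = 1;                          /* plain store */
        asm volatile ("nop");           /* volatile asm, no "memory" clobber */
        return g;                       /* the compiler may forward 1 here and
                                           move the store across the asm */
    }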
* Re: [PATCH v2 2/4] xen/x86: Drop unnecessary barriers
2017-08-18 15:04 ` Jan Beulich
@ 2017-08-18 15:13 ` Tim Deegan
0 siblings, 0 replies; 28+ messages in thread
From: Tim Deegan @ 2017-08-18 15:13 UTC (permalink / raw)
To: Jan Beulich; +Cc: Andrew Cooper, Xen-devel
At 09:04 -0600 on 18 Aug (1503047077), Jan Beulich wrote:
> >>> On 18.08.17 at 16:47, <tim@xen.org> wrote:
> > At 01:48 -0600 on 17 Aug (1502934495), Jan Beulich wrote:
> >> >>> On 16.08.17 at 18:47, <andrew.cooper3@citrix.com> wrote:
> >> > On 16/08/17 16:23, Jan Beulich wrote:
> >> >>>>> On 16.08.17 at 13:22, <andrew.cooper3@citrix.com> wrote:
> >> >>> --- a/xen/arch/x86/mm/shadow/multi.c
> >> >>> +++ b/xen/arch/x86/mm/shadow/multi.c
> >> >>> @@ -3112,7 +3112,6 @@ static int sh_page_fault(struct vcpu *v,
> >> >>> * will make sure no inconsistent mapping being translated into
> >> >>> * shadow page table. */
> >> >>> version = atomic_read(&d->arch.paging.shadow.gtable_dirty_version);
> >> >>> - rmb();
> >> >>> walk_ok = sh_walk_guest_tables(v, va, &gw, error_code);
> >> >> Isn't this supposed to make sure version is being read first? I.e.
> >> >> doesn't this at least need to be barrier()?
> >> >
> >> > atomic_read() is not free to be reordered by the compiler. It is an asm
> >> > volatile with a volatile memory reference.
> >>
> >> Oh, right - I did forget about the volatiles there (since generally,
> >> like in Linux, we appear to try to avoid volatile).
> >
> > FWIW, I don't think that's quite right. The GCC docs I have say that
> > "volatile" will stop the compiler from omitting an asm altogether, or
> > hoisting it out of a loop (on the assumption that it will always
> > produce the same output for the same inputs). And that "the compiler
> > can move even volatile 'asm' instructions relative to other code,
> > including across jump instructions."
>
> Oh, I had talked about the volatile qualifiers, no the volatile asm-s.
I'm not sure what other volatile you mean here, but accesses to
volatile objects are only ordered WRT other _volatile_ accesses.
So, e.g.: https://godbolt.org/g/L2qa8h
Tim.
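Again the linked snippet isn't preserved; the point might be sketched
like this:

    volatile int flag;
    int data;

    int consume(void)
    {
        data = 42;       /* plain store: not ordered against the volatile read */
        return flag;     /* volatile accesses are ordered only among themselves */
    }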
Thread overview: 28+ messages
2017-08-16 11:22 [PATCH v2 0/4] x86: Corrections to barrier usage Andrew Cooper
2017-08-16 11:22 ` [PATCH v2 1/4] x86/mcheck: Minor cleanup to amd_nonfatal Andrew Cooper
2017-08-16 15:11 ` Jan Beulich
2017-08-18 13:19 ` Tim Deegan
2017-08-16 11:22 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Andrew Cooper
2017-08-16 15:23 ` Jan Beulich
2017-08-16 16:47 ` Andrew Cooper
2017-08-16 17:03 ` Andrew Cooper
2017-08-17 7:50 ` Jan Beulich
2017-08-17 7:48 ` Jan Beulich
2017-08-18 14:47 ` Tim Deegan
2017-08-18 15:04 ` Jan Beulich
2017-08-18 15:13 ` Tim Deegan
2017-08-18 15:07 ` Tim Deegan
2017-08-16 17:18 ` [PATCH v2 2.5/4] xen/x86: Replace mandatory barriers with compiler barriers Andrew Cooper
2017-08-17 8:15 ` Jan Beulich
2017-08-18 13:55 ` [PATCH v2 2/4] xen/x86: Drop unnecessary barriers Tim Deegan
2017-08-18 14:07 ` Tim Deegan
2017-08-18 14:23 ` [PATCH] xen/x86/shadow: adjust barriers around gtable_dirty_version Tim Deegan
2017-08-18 14:26 ` Andrew Cooper
2017-08-16 11:22 ` [PATCH v2 3/4] xen/x86: Replace remaining mandatory barriers with SMP barriers Andrew Cooper
2017-08-16 15:42 ` Dario Faggioli
2017-08-17 8:37 ` Jan Beulich
2017-08-17 9:35 ` Andrew Cooper
2017-08-17 10:01 ` Jan Beulich
2017-08-16 11:22 ` [PATCH v2 4/4] xen/x86: Correct mandatory and SMP barrier definitions Andrew Cooper
2017-08-16 15:44 ` Dario Faggioli
2017-08-17 8:41 ` Jan Beulich