From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 06/50] atomic: introduce smp_mb_acquire and smp_mb_release
Date: Mon, 24 Oct 2016 15:46:51 +0200
Message-ID: <1477316855-42218-7-git-send-email-pbonzini@redhat.com>
In-Reply-To: <1477316855-42218-1-git-send-email-pbonzini@redhat.com>
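Introduce smp_mb_acquire() and smp_mb_release(), which are weaker than a
full smp_mb(): smp_mb_acquire() orders earlier loads against later loads
and stores, while smp_mb_release() orders earlier loads and stores against
later stores.  Document the new barriers in docs/atomics.txt and define
them in include/qemu/atomic.h; smp_rmb() and smp_wmb() now fall back to
the acquire and release barriers wherever an architecture does not provide
its own definition.

As a usage illustration only (not part of this patch, with purely
hypothetical variable and function names), the message-passing pattern
described in docs/atomics.txt looks like this with the new macros:

    #include "qemu/atomic.h"

    static int payload;
    static int ready;

    /* Producer: fill in the payload, then signal that it is ready. */
    static void publish(void)
    {
        payload = 42;           /* plain store                          */
        smp_mb_release();       /* order the payload store before ...   */
        atomic_set(&ready, 1);  /* ... the store that publishes it      */
    }

    /* Consumer: wait for the signal, then read the payload. */
    static int consume(void)
    {
        while (!atomic_read(&ready)) {
            /* spin until the producer has published */
        }
        smp_mb_acquire();       /* order the ready load before ...      */
        return payload;         /* ... the payload load; observes 42    */
    }

The release barrier in the writer pairs with the acquire barrier in the
reader, exactly as described in the documentation changes below.
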
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
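Note for reviewers (not part of the commit message): smp_rmb() and
smp_wmb() keep their current strength on every host.  Where an
architecture no longer defines them directly (for example the ppc
definitions of smp_rmb() removed below), they now resolve to
smp_mb_acquire()/smp_mb_release() through the new #ifndef block at the
end of qemu/atomic.h.
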
 docs/atomics.txt      | 79 ++++++++++++++++++++++++++++++---------------
 include/qemu/atomic.h | 50 +++++++++++++++++++-------------
2 files changed, 77 insertions(+), 52 deletions(-)
diff --git a/docs/atomics.txt b/docs/atomics.txt
index c95950b..c8e4cbe 100644
--- a/docs/atomics.txt
+++ b/docs/atomics.txt
@@ -15,7 +15,8 @@ Macros defined by qemu/atomic.h fall in three camps:
- compiler barriers: barrier();
- weak atomic access and manual memory barriers: atomic_read(),
- atomic_set(), smp_rmb(), smp_wmb(), smp_mb(), smp_read_barrier_depends();
+ atomic_set(), smp_rmb(), smp_wmb(), smp_mb(), smp_mb_acquire(),
+ smp_mb_release(), smp_read_barrier_depends();
- sequentially consistent atomic access: everything else.
@@ -111,8 +112,8 @@ consistent primitives.
When using this model, variables are accessed with atomic_read() and
atomic_set(), and restrictions to the ordering of accesses are enforced
-using the smp_rmb(), smp_wmb(), smp_mb() and smp_read_barrier_depends()
-memory barriers.
+using the memory barrier macros: smp_rmb(), smp_wmb(), smp_mb(),
+smp_mb_acquire(), smp_mb_release(), smp_read_barrier_depends().
atomic_read() and atomic_set() prevent the compiler from using
optimizations that might otherwise optimize accesses out of existence
@@ -124,7 +125,7 @@ other threads, and which are local to the current thread or protected
by other, more mundane means.
Memory barriers control the order of references to shared memory.
-They come in four kinds:
+They come in six kinds:
- smp_rmb() guarantees that all the LOAD operations specified before
the barrier will appear to happen before all the LOAD operations
@@ -142,6 +143,16 @@ They come in four kinds:
In other words, smp_wmb() puts a partial ordering on stores, but is not
required to have any effect on loads.
+- smp_mb_acquire() guarantees that all the LOAD operations specified before
+ the barrier will appear to happen before all the LOAD or STORE operations
+ specified after the barrier with respect to the other components of
+ the system.
+
+- smp_mb_release() guarantees that all the STORE operations specified *after*
+ the barrier will appear to happen after all the LOAD or STORE operations
+ specified *before* the barrier with respect to the other components of
+ the system.
+
- smp_mb() guarantees that all the LOAD and STORE operations specified
before the barrier will appear to happen before all the LOAD and
STORE operations specified after the barrier with respect to the other
@@ -149,8 +160,9 @@ They come in four kinds:
smp_mb() puts a partial ordering on both loads and stores. It is
stronger than both a read and a write memory barrier; it implies both
- smp_rmb() and smp_wmb(), but it also prevents STOREs coming before the
- barrier from overtaking LOADs coming after the barrier and vice versa.
+ smp_mb_acquire() and smp_mb_release(), but it also prevents STOREs
+ coming before the barrier from overtaking LOADs coming after the
+ barrier and vice versa.
- smp_read_barrier_depends() is a weaker kind of read barrier. On
most processors, whenever two loads are performed such that the
@@ -173,24 +185,21 @@ They come in four kinds:
This is the set of barriers that is required *between* two atomic_read()
and atomic_set() operations to achieve sequential consistency:
-                 |              2nd operation               |
-                 |-----------------------------------------|
-  1st operation  | (after last) | atomic_read | atomic_set |
-  ---------------+--------------+-------------+------------|
-  (before first) |              | none        | smp_wmb()  |
-  ---------------+--------------+-------------+------------|
-  atomic_read    | smp_rmb()    | smp_rmb()*  | **         |
-  ---------------+--------------+-------------+------------|
-  atomic_set     | none         | smp_mb()*** | smp_wmb()  |
-  ---------------+--------------+-------------+------------|
+                 |                   2nd operation                    |
+                 |----------------------------------------------------|
+  1st operation  | (after last)     | atomic_read  | atomic_set       |
+  ---------------+------------------+--------------+------------------|
+  (before first) |                  | none         | smp_mb_release() |
+  ---------------+------------------+--------------+------------------|
+  atomic_read    | smp_mb_acquire() | smp_rmb()*   | **               |
+  ---------------+------------------+--------------+------------------|
+  atomic_set     | none             | smp_mb()***  | smp_wmb()        |
+  ---------------+------------------+--------------+------------------|
* Or smp_read_barrier_depends().
- ** This requires a load-store barrier. How to achieve this varies
- depending on the machine, but in practice smp_rmb()+smp_wmb()
- should have the desired effect. For example, on PowerPC the
- lwsync instruction is a combined load-load, load-store and
- store-store barrier.
+ ** This requires a load-store barrier. This is achieved by
+ either smp_mb_acquire() or smp_mb_release().
*** This requires a store-load barrier. On most machines, the only
way to achieve this is a full barrier.
@@ -199,11 +208,11 @@ and atomic_set() operations to achieve sequential consistency:
You can see that the two possible definitions of atomic_mb_read()
and atomic_mb_set() are the following:
- 1) atomic_mb_read(p) = atomic_read(p); smp_rmb()
- atomic_mb_set(p, v) = smp_wmb(); atomic_set(p, v); smp_mb()
+ 1) atomic_mb_read(p) = atomic_read(p); smp_mb_acquire()
+ atomic_mb_set(p, v) = smp_mb_release(); atomic_set(p, v); smp_mb()
- 2) atomic_mb_read(p) = smp_mb() atomic_read(p); smp_rmb()
- atomic_mb_set(p, v) = smp_wmb(); atomic_set(p, v);
+ 2) atomic_mb_read(p) = smp_mb(); atomic_read(p); smp_mb_acquire()
+ atomic_mb_set(p, v) = smp_mb_release(); atomic_set(p, v);
Usually the former is used, because smp_mb() is expensive and a program
normally has more reads than writes. Therefore it makes more sense to
@@ -222,7 +231,7 @@ place barriers instead:
   thread 1                                  thread 1
   -------------------------                 ------------------------
   (other writes)
-                                            smp_wmb()
+                                            smp_mb_release()
   atomic_mb_set(&a, x)                      atomic_set(&a, x)
                                             smp_wmb()
   atomic_mb_set(&b, y)                      atomic_set(&b, y)
@@ -233,7 +242,13 @@ place barriers instead:
   y = atomic_mb_read(&b)                    y = atomic_read(&b)
                                             smp_rmb()
   x = atomic_mb_read(&a)                    x = atomic_read(&a)
-                                            smp_rmb()
+                                            smp_mb_acquire()
+
+ Note that the barrier between the stores in thread 1, and between
+ the loads in thread 2, has been optimized here to a write or a
+ read memory barrier respectively. On some architectures, notably
+ ARMv7, smp_mb_acquire and smp_mb_release are just as expensive as
+ smp_mb, but smp_rmb and/or smp_wmb are more efficient.
- sometimes, a thread is accessing many variables that are otherwise
unrelated to each other (for example because, apart from the current
@@ -246,12 +261,12 @@ place barriers instead:
   n = 0;                                    n = 0;
   for (i = 0; i < 10; i++)             =>   for (i = 0; i < 10; i++)
     n += atomic_mb_read(&a[i]);               n += atomic_read(&a[i]);
-                                            smp_rmb();
+                                            smp_mb_acquire();
Similarly, atomic_mb_set() can be transformed as follows:
smp_mb():
-                                            smp_wmb();
+                                            smp_mb_release();
   for (i = 0; i < 10; i++)             =>   for (i = 0; i < 10; i++)
     atomic_mb_set(&a[i], false);              atomic_set(&a[i], false);
                                             smp_mb();
@@ -261,7 +276,7 @@ The two tricks can be combined. In this case, splitting a loop in
two lets you hoist the barriers out of the loops _and_ eliminate the
expensive smp_mb():
-                                            smp_wmb();
+                                            smp_mb_release();
   for (i = 0; i < 10; i++) {           =>   for (i = 0; i < 10; i++)
     atomic_mb_set(&a[i], false);              atomic_set(&a[i], false);
     atomic_mb_set(&b[i], false);              smp_wmb();
@@ -312,8 +327,8 @@ access and for data dependency barriers:
smp_read_barrier_depends();
z = b[y];
-smp_wmb() also pairs with atomic_mb_read(), and smp_rmb() also pairs
-with atomic_mb_set().
+smp_wmb() also pairs with atomic_mb_read() and smp_mb_acquire(), and
+smp_rmb() also pairs with atomic_mb_set() and smp_mb_release().
COMPARISON WITH LINUX KERNEL MEMORY BARRIERS
diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index c4f6950..b108df0 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -72,16 +72,16 @@
* Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
*/
-#define smp_mb() ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); })
-#define smp_wmb() ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); })
-#define smp_rmb() ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); })
+#define smp_mb() ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); })
+#define smp_mb_release() ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); })
+#define smp_mb_acquire() ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); })
/* Most compilers currently treat consume and acquire the same, but really
* no processors except Alpha need a barrier here. Leave it in if
* using Thread Sanitizer to avoid warnings, otherwise optimize it away.
*/
#if defined(__SANITIZE_THREAD__)
-#define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); })
+#define smp_read_barrier_depends() ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); })
#elif defined(__alpha__)
#define smp_read_barrier_depends() asm volatile("mb":::"memory")
#else
@@ -149,13 +149,13 @@
QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
typeof_strip_qual(*ptr) _val; \
__atomic_load(ptr, &_val, __ATOMIC_RELAXED); \
- smp_rmb(); \
+ smp_mb_acquire(); \
_val; \
})
#define atomic_mb_set(ptr, i) do { \
QEMU_BUILD_BUG_ON(sizeof(*ptr) > sizeof(void *)); \
- smp_wmb(); \
+ smp_mb_release(); \
__atomic_store_n(ptr, i, __ATOMIC_RELAXED); \
smp_mb(); \
} while(0)
@@ -238,8 +238,8 @@
* here (a compiler barrier only). QEMU doesn't do accesses to write-combining
* qemu memory or non-temporal load/stores from C code.
*/
-#define smp_wmb() barrier()
-#define smp_rmb() barrier()
+#define smp_mb_release() barrier()
+#define smp_mb_acquire() barrier()
/*
* __sync_lock_test_and_set() is documented to be an acquire barrier only,
@@ -263,13 +263,15 @@
* smp_mb has the same problem as on x86 for not-very-new GCC
* (http://patchwork.ozlabs.org/patch/126184/, Nov 2011).
*/
-#define smp_wmb() ({ asm volatile("eieio" ::: "memory"); (void)0; })
+#define smp_wmb() ({ asm volatile("eieio" ::: "memory"); (void)0; })
#if defined(__powerpc64__)
-#define smp_rmb() ({ asm volatile("lwsync" ::: "memory"); (void)0; })
+#define smp_mb_release() ({ asm volatile("lwsync" ::: "memory"); (void)0; })
+#define smp_mb_acquire() ({ asm volatile("lwsync" ::: "memory"); (void)0; })
#else
-#define smp_rmb() ({ asm volatile("sync" ::: "memory"); (void)0; })
+#define smp_mb_release() ({ asm volatile("sync" ::: "memory"); (void)0; })
+#define smp_mb_acquire() ({ asm volatile("sync" ::: "memory"); (void)0; })
#endif
-#define smp_mb() ({ asm volatile("sync" ::: "memory"); (void)0; })
+#define smp_mb() ({ asm volatile("sync" ::: "memory"); (void)0; })
#endif /* _ARCH_PPC */
@@ -277,18 +279,18 @@
* For (host) platforms we don't have explicit barrier definitions
* for, we use the gcc __sync_synchronize() primitive to generate a
* full barrier. This should be safe on all platforms, though it may
- * be overkill for smp_wmb() and smp_rmb().
+ * be overkill for smp_mb_acquire() and smp_mb_release().
*/
#ifndef smp_mb
-#define smp_mb() __sync_synchronize()
+#define smp_mb() __sync_synchronize()
#endif
-#ifndef smp_wmb
-#define smp_wmb() __sync_synchronize()
+#ifndef smp_mb_acquire
+#define smp_mb_acquire() __sync_synchronize()
#endif
-#ifndef smp_rmb
-#define smp_rmb() __sync_synchronize()
+#ifndef smp_mb_release
+#define smp_mb_release() __sync_synchronize()
#endif
#ifndef smp_read_barrier_depends
@@ -365,13 +367,13 @@
*/
#define atomic_mb_read(ptr) ({ \
typeof(*ptr) _val = atomic_read(ptr); \
- smp_rmb(); \
+ smp_mb_acquire(); \
_val; \
})
#ifndef atomic_mb_set
#define atomic_mb_set(ptr, i) do { \
- smp_wmb(); \
+ smp_mb_release(); \
atomic_set(ptr, i); \
smp_mb(); \
} while (0)
@@ -404,4 +406,12 @@
#define atomic_or(ptr, n) ((void) __sync_fetch_and_or(ptr, n))
#endif /* __ATOMIC_RELAXED */
+
+#ifndef smp_wmb
+#define smp_wmb() smp_mb_release()
+#endif
+#ifndef smp_rmb
+#define smp_rmb() smp_mb_acquire()
+#endif
+
#endif /* QEMU_ATOMIC_H */
--
1.8.3.1