From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51E23DE2.2040908@redhat.com>
Date: Sun, 14 Jul 2013 07:57:54 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
To: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Cc: qemu-devel@nongnu.org, Richard Henderson
Subject: Re: [Qemu-devel] [PATCH v2] atomic: using memory_order_relaxed for refcnt inc/dec ops
In-Reply-To: <1373770422-5111-1-git-send-email-pingfank@linux.vnet.ibm.com>
References: <1373770422-5111-1-git-send-email-pingfank@linux.vnet.ibm.com>

On 14/07/2013 04:53, Liu Ping Fan wrote:
> The atomic inc/dec ops on a refcnt are frequent, and their idiom needs
> no seq_cst ordering.  So, to get better performance, it is worth
> adopting the _relaxed rather than the _seq_cst memory model for them.
>
> We resort to gcc builtins.  If gcc supports the C11 memory model, the
> __atomic_* builtins are used; otherwise the __sync_* builtins.
>
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

No, not at all. :(

First of all, I'd like to understand how you benchmarked this.  For
inc/dec, relaxed vs. seq_cst has no effect except on PPC and ARM.  And
if the refcount ops are frequent enough, I strongly suspect cacheline
bouncing has a bigger effect than the memory barriers.

Second, it is wrong, because you need a further read memory barrier when
you are removing the last reference: the thread that drops the last
reference and frees the object must first observe all the writes that
the other threads made to it (see the sketch at the end of this mail).

Third, it makes the API completely non-orthogonal, and "tend to be
exceptions" is not a justification.  The justification here could be,
rather than performance, not having to remember how to use
atomic_fetch_dec on the unref side.  I don't really buy that, but if you
really care, do something like

  #define atomic_ref(ptr, field) \
      __atomic_fetch_add(&((ptr)->field), 1, __ATOMIC_RELAXED)

  #define atomic_unref(ptr, field, releasefn) ( \
      __atomic_fetch_add(&((ptr)->field), -1, __ATOMIC_RELEASE) == 1 \
      ? (__atomic_thread_fence(__ATOMIC_ACQUIRE), (releasefn)(ptr), true) \
      : false)

(note the trailing ", true", so that both arms of the ?: have the same
type even when releasefn returns void), i.e. define a new interface
similar to kref_get/kref_put and, while you are at it, optimize it.
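For instance, a caller could then look like this (Buffer, buffer_free
and use_buffer are placeholders I am making up here, not existing QEMU
code):

  typedef struct Buffer {
      int refcnt;
      char *data;
  } Buffer;

  static void buffer_free(Buffer *buf)
  {
      g_free(buf->data);
      g_free(buf);
  }

  static void use_buffer(Buffer *buf)
  {
      atomic_ref(buf, refcnt);    /* taking a ref needs no ordering */

      /* ... read or write buf->data ... */

      /* The release/acquire pair hidden inside atomic_unref ensures
       * that buffer_free() sees every write the other threads made
       * to the buffer before they dropped their own references. */
      atomic_unref(buf, refcnt, buffer_free);
  }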
Paolo

> ---
> v2:
>   update atomics.txt
>   fix the dependency MACRO
> ---
>  docs/atomics.txt      |  4 ++++
>  include/qemu/atomic.h | 14 ++++++++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/docs/atomics.txt b/docs/atomics.txt
> index 6f2997b..c8ad31d 100644
> --- a/docs/atomics.txt
> +++ b/docs/atomics.txt
> @@ -45,6 +45,10 @@ consistency*, where "the result of any execution is the same as if the
>  operations of all the processors were executed in some sequential order,
>  and the operations of each individual processor appear in this sequence
>  in the order specified by its program".
> +Note that atomic_inc/_dec and atomic_fetch_inc/_dec tend to be exceptions.
> +Once gcc provides atomic_* builtins, they adopt *relaxed* memory order.
> +The reason is that these pairs are used by the refcnt's ops which is frequent
> +and has more effect on performance.
>
>  qemu/atomic.h provides the following set of atomic read-modify-write
>  operations:
>
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 0aa8913..e8353d2 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -183,8 +183,15 @@
>  #endif
>
>  /* Provide shorter names for GCC atomic builtins. */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED)
> +#define atomic_fetch_dec(ptr)  __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED)
> +#else
> +/* close to C11 memory_order_seq_cst */
>  #define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
>  #define atomic_fetch_dec(ptr)  __sync_fetch_and_add(ptr, -1)
> +#endif
>  #define atomic_fetch_add  __sync_fetch_and_add
>  #define atomic_fetch_sub  __sync_fetch_and_sub
>  #define atomic_fetch_and  __sync_fetch_and_and
> @@ -192,8 +199,15 @@
>  #define atomic_cmpxchg  __sync_val_compare_and_swap
>
>  /* And even shorter names that return void. */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_inc(ptr)  ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED))
> +#define atomic_dec(ptr)  ((void) __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED))
> +#else
> +/* close to C11 memory_order_seq_cst */
>  #define atomic_inc(ptr)  ((void) __sync_fetch_and_add(ptr, 1))
>  #define atomic_dec(ptr)  ((void) __sync_fetch_and_add(ptr, -1))
> +#endif
>  #define atomic_add(ptr, n)  ((void) __sync_fetch_and_add(ptr, n))
>  #define atomic_sub(ptr, n)  ((void) __sync_fetch_and_sub(ptr, n))
>  #define atomic_and(ptr, n)  ((void) __sync_fetch_and_and(ptr, n))
>
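P.S.: to make the "further read memory barrier" point above concrete,
here is the same release/acquire pairing written out as a standalone
function (a sketch with invented names, not code from the patch or from
QEMU):

  #include <stdlib.h>

  typedef struct Obj {
      int refcnt;
      int val;
  } Obj;

  static void obj_put(Obj *obj)
  {
      /* Release: every write this thread made to obj happens before
       * the decrement that the other threads will observe. */
      if (__atomic_fetch_add(&obj->refcnt, -1, __ATOMIC_RELEASE) == 1) {
          /* Acquire: pairs with the releases above.  Without this
           * fence, the freeing thread could reclaim obj before the
           * other threads' writes to obj->val became visible, which
           * is exactly what a fully relaxed decrement would allow. */
          __atomic_thread_fence(__ATOMIC_ACQUIRE);
          free(obj);
      }
  }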