Re: [Qemu-devel] [PATCH v2] atomic: using memory_order_relaxed for refcnt inc/dec ops

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Paolo Bonzini <pbonzini@redhat.com>
To: Liu Ping Fan <qemulist@gmail.com>
Cc: qemu-devel@nongnu.org, Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH v2] atomic: using memory_order_relaxed for refcnt inc/dec ops
Date: Sun, 14 Jul 2013 07:57:54 +0200	[thread overview]
Message-ID: <51E23DE2.2040908@redhat.com> (raw)
In-Reply-To: <1373770422-5111-1-git-send-email-pingfank@linux.vnet.ibm.com>

Il 14/07/2013 04:53, Liu Ping Fan ha scritto:
> Refcnt's atomic inc/dec ops are frequent and its idiom need no seq_cst
> order. So to get better performance, it worth to adopt _relaxed
> other than _seq_cst memory model on them.
> 
> We resort to gcc builtins. If gcc supports C11 memory model, __atomic_*
> buitlins is used, otherwise __sync_* builtins.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

No, not at all. :(

First of all, I'd like to understand how you benchmarked this.  For
inc/dec, relaxed vs. seq_cst has no effect except on PPC and ARM.  And
if the refcount ops are frequent enough, I strongly suspect cacheline
bouncing has a bigger effect than the memory barriers.

Second, it is wrong because you need a further read memory barrier when
you are removing the last reference

Third, it is making the API completely unorthogonal, and "tend to be
exceptions" is not a justification.

The justification here could be, rather than the performance, having to
remember how to use atomic_fetch_dec in the unref side.  I don't really
buy that, but if you really care, do something like

#define atomic_ref(ptr, field) \
  __atomic_fetch_add(&((ptr)->field), 1, __ATOMIC_RELAXED)
#define atomic_unref(ptr, field, releasefn) ( \
  __atomic_fetch_add(&((ptr)->field), -1, __ATOMIC_RELEASE) == 1 \
  ? (__atomic_thread_fence(__ATOMIC_ACQUIRE), (releasefn)(ptr)) : false)

i.e. define a new interface similar to kref_get/kref_put and, since you
are at it, optimize it.

Paolo

> ---
> v2:
>   update atomics.txt
>   fix the dependency MACRO
> ---
>  docs/atomics.txt      |  4 ++++
>  include/qemu/atomic.h | 14 ++++++++++++++
>  2 files changed, 18 insertions(+)
> 
> diff --git a/docs/atomics.txt b/docs/atomics.txt
> index 6f2997b..c8ad31d 100644
> --- a/docs/atomics.txt
> +++ b/docs/atomics.txt
> @@ -45,6 +45,10 @@ consistency*, where "the result of any execution is the same as if the
>  operations of all the processors were executed in some sequential order,
>  and the operations of each individual processor appear in this sequence
>  in the order specified by its program".
> +Note that atomic_inc/_dec and atomic_fetch_inc/_dec tend to be exceptions.
> +Once gcc provides atomic_* builtins, they adopt *relaxed* memory order.
> +The reason is that these pairs are used by the refcnt's ops which is frequent
> +and has more effect on performance.
>  
>  qemu/atomic.h provides the following set of atomic read-modify-write
>  operations:
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 0aa8913..e8353d2 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -183,8 +183,15 @@
>  #endif
>  
>  /* Provide shorter names for GCC atomic builtins.  */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED)
> +#define atomic_fetch_dec(ptr)  __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED)
> +#else
> +/* close to C11 memory_order_seq_cst */
>  #define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
>  #define atomic_fetch_dec(ptr)  __sync_fetch_and_add(ptr, -1)
> +#endif
>  #define atomic_fetch_add       __sync_fetch_and_add
>  #define atomic_fetch_sub       __sync_fetch_and_sub
>  #define atomic_fetch_and       __sync_fetch_and_and
> @@ -192,8 +199,15 @@
>  #define atomic_cmpxchg         __sync_val_compare_and_swap
>  
>  /* And even shorter names that return void.  */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_inc(ptr)        ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED))
> +#define atomic_dec(ptr)        ((void) __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED))
> +#else
> +/* close to C11 memory_order_seq_cst */
>  #define atomic_inc(ptr)        ((void) __sync_fetch_and_add(ptr, 1))
>  #define atomic_dec(ptr)        ((void) __sync_fetch_and_add(ptr, -1))
> +#endif
>  #define atomic_add(ptr, n)     ((void) __sync_fetch_and_add(ptr, n))
>  #define atomic_sub(ptr, n)     ((void) __sync_fetch_and_sub(ptr, n))
>  #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
>

next prev parent reply	other threads:[~2013-07-14  5:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-14  2:53 [Qemu-devel] [PATCH v2] atomic: using memory_order_relaxed for refcnt inc/dec ops Liu Ping Fan
2013-07-14  5:57 ` Paolo Bonzini [this message]
2013-07-14 10:23   ` liu ping fan
2013-07-15 10:39     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51E23DE2.2040908@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemulist@gmail.com \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).