All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Akira Yokosawa <akiyks@gmail.com>
Cc: Junchang Wang <junchangwang@gmail.com>, perfbook@vger.kernel.org
Subject: Re: [PATCH 3/4] CodeSamples: Fix definition of cmpxchg() in api-gcc.h
Date: Sat, 15 Dec 2018 16:54:37 -0800	[thread overview]
Message-ID: <20181216005437.GZ4170@linux.ibm.com> (raw)
In-Reply-To: <f93f136d-1cdd-dafd-ee45-6e1e88906ae9@gmail.com>

On Sun, Dec 16, 2018 at 08:42:56AM +0900, Akira Yokosawa wrote:
> On 2018/12/15 11:37:44 -0800, Paul E. McKenney wrote:
> > On Sun, Dec 16, 2018 at 12:10:18AM +0900, Akira Yokosawa wrote:
> >> Hi Paul,
> >>
> >> There is something I want to make sure.
> >> Please see inline comment below.
> >>
> >> On 2018/12/14 00:33:15 +0900, Akira Yokosawa wrote:
> >>> On 2018/12/13 00:01:33 +0800, Junchang Wang wrote:
> >>>> On 12/11/18 11:42 PM, Akira Yokosawa wrote:
> >>>>> From 7e7c3a20d08831cd64b77a4e8d8f693b4725ef89 Mon Sep 17 00:00:00 2001
> >>>>> From: Akira Yokosawa <akiyks@gmail.com>
> >>>>> Date: Tue, 11 Dec 2018 21:37:11 +0900
> >>>>> Subject: [PATCH 3/4] CodeSamples: Fix definition of cmpxchg() in api-gcc.h
> >>>>>
> >>>>> Do the same change as CodeSamples/formal/litmus/api.h.
> >>>>>
> >>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> >>>>> ---
> >>>>>  CodeSamples/api-pthreads/api-gcc.h | 5 +++--
> >>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/CodeSamples/api-pthreads/api-gcc.h b/CodeSamples/api-pthreads/api-gcc.h
> >>>>> index 3afe340..b66f4b9 100644
> >>>>> --- a/CodeSamples/api-pthreads/api-gcc.h
> >>>>> +++ b/CodeSamples/api-pthreads/api-gcc.h
> >>>>> @@ -168,8 +168,9 @@ struct __xchg_dummy {
> >>>>>  ({ \
> >>>>>  	typeof(*ptr) _____actual = (o); \
> >>>>>  	\
> >>>>> -	__atomic_compare_exchange_n(ptr, (void *)&_____actual, (n), 1, \
> >>>>> -			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST) ? (o) : (o)+1; \
> >>>>> +	__atomic_compare_exchange_n((ptr), (void *)&_____actual, (n), 0, \
> >>>>> +			__ATOMIC_SEQ_CST, __ATOMIC_RELAXED); \
> >>>>> +	_____actual; \
> >>>>>  })
> >>>>>  
> >>>>
> >>>> Hi Akira,
> >>>>
> >>>> Another reason that the performance of cmpxchg is catching up with cmpxchg_weak is that __ATOMIC_SEQ_CST is replaced by __ATOMIC_RELAXED in this patch. The use of __ATOMIC_RELAXED means if the CAS primitive fails, the relaxed semantic is used, rather than sequential consistent. Following are some experiment results:
> >>>>
> >>>> # If __ATOMIC_RELAXED is used for both cmpxchg and cmpxchg_weak
> >>>>
> >>>> ./count_lim_atomic 64 uperf
> >>>> ns/update: 290
> >>>>
> >>>> ./count_lim_atomic_weak 64 uperf
> >>>> ns/update: 301
> >>>>
> >>>>
> >>>> # and then if __ATOMIC_SEQ_CST is used for both cmpxchg and cmpxchg_weak
> >>>>
> >>>> ./count_lim_atomic 64 uperf
> >>>> ns/update: 316
> >>>>
> >>>> ./count_lim_atomic_weak 64 uperf
> >>>> ns/update: 302
> >>>>
> >>>> ./count_lim_atomic 120 uperf
> >>>> ns/update: 630
> >>>>
> >>>> ./count_lim_atomic_weak 120 uperf
> >>>> ns/update: 568
> >>>>
> >>>> The results show that if we want to ensure sequential consistency when the CAS primitive fails, cmpxchg_weak performs better than cmpxchg. It seems that the (type of variation, failure_memorder) pair affects performance. I know that PPC uses LL/SC to simulate CAS. But what's the relationship between a simulated CAS and the memory order. This is interesting because as far as I know, PPC and ARM are using LL/SC to simulate atomic primitives such as CAS and FAA. So FAA might have the same behavior.
> >>>>
> >>>> In actually, I'm not very clear about the meaning of different types of failure memory orders. For example, when should we use __ATOMIC_RELAXED, rather than __ATOMIC_SEQ_CST, if a CAS fails? What happen if __ATOMIC_RELAXED is used for x86? The one I'm look at is https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html . Do you know some resources about this? I can look into this tomorrow. Thanks.
> >>>
> >>> Hi Junchang,
> >>>
> >>> In Linux-kernel speak, Documentation/core-api/atomic.rst says:
> >>>
> >>> --------------------------------------------------------------------------
> >>> atomic_xchg must provide explicit memory barriers around the operation. ::
> >>>
> >>> 	int atomic_cmpxchg(atomic_t *v, int old, int new);
> >>>
> >>> This performs an atomic compare exchange operation on the atomic value v,
> >>> with the given old and new values. Like all atomic_xxx operations,
> >>> atomic_cmpxchg will only satisfy its atomicity semantics as long as all
> >>> other accesses of \*v are performed through atomic_xxx operations.
> >>>
> >>> atomic_cmpxchg must provide explicit memory barriers around the operation,
> >>> although if the comparison fails then no memory ordering guarantees are
> >>> required.
> >>>
> >>> [snip]
> >>>
> >>> The routines xchg() and cmpxchg() must provide the same exact
> >>> memory-barrier semantics as the atomic and bit operations returning
> >>> values.
> >>> --------------------------------------------------------------------------
> >>>
> >>> The 2nd __ATOMIC_RELAXED to __atomic_compare_exchange_n() matches this
> >>> lack of requirement.
> >>>
> >>> On x86_64, __atomic_compare_exchange_n() is translated to the same code
> >>> in both cases  (with the help of litmus7's cross compiling):
> >>>
> >>> #START _litmus_P1
> >>> 	xorl	%eax, %eax
> >>> 	movl	$0, 4(%rsp)
> >>> 	lock cmpxchgl	%r10d, (%rdx)
> >>> 	je	.L36
> >>> 	movl	%eax, 4(%rsp)
> >>> .L36:
> >>> 	movl	4(%rsp), %eax
> >>>
> >>> There is no difference between the code with __ATOMIC_RELAXED and
> >>> the code with __ATOMIC_SEQ_CST as the 2nd parameter. As you can see,
> >>> there is no memory barrier instruction emitted.
> >>>
> >>> On PPC, there is a difference. With __ATOMIC_RELAXED as 2nd parameter,
> >>> the code looks like:
> >>>
> >>> #START _litmus_P1
> >>>         sync
> >>> .L34:
> >>>         lwarx 7,0,9
> >>>         cmpwi 0,7,0
> >>>         bne 0,.L35
> >>>         stwcx. 5,0,9
> >>>         bne 0,.L34
> >>>         isync
> >>> .L35:
> >>>
> >>> , OTOH with __ATOMIC_SEQ_CST as 2nd argument:
> >>>
> >>> #START _litmus_P1
> >>>         sync
> >>> .L34:
> >>>         lwarx 7,0,9
> >>>         cmpwi 0,7,0
> >>>         bne 0,.L35
> >>>         stwcx. 5,0,9
> >>>         bne 0,.L34
> >>> .L35:
> >>>         isync
> >>>
> >>
> >> In this asm code snippets, the barrier instruction at the end of
> >> cmpxchg() is "isync".
> >>
> >> In arch/powerpc/include/asm/synch.h of Linux kernel source tree,
> >> both PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER are
> >> defined as '"\n" stringify_in_c(sync) "\n"', which will result
> >> in "sync".
> >>
> >> IIUC, "isync" of PowerPC is not good enough for __ATOMIC_SEQ_CST,
> >> is it?
> > 
> > By itself, no, but in combination with the "sync" instruction at
> > the beginning, combined with the ordering of other __ATOMIC_SEQ_CST
> > operations, it actually is sufficient.  For more information, please
> > see:
> > 
> > http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html
> > 
> > And this is an update that is mostly irrelevant on modern hardware:
> > 
> > http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
> > 
> > Note also that an lwsync instruction can be substituted for each isync,
> > which can sometimes provide better performance.
> 
> Now I'm wondering why PPC_ATOMIC_EXIT_BARRIER is defined as
> '"\n" stringify_in_c(sync) "\n"' rather than
> '"\n" stringify_in_c(isync) "\n"' (or '"\n" stringify_in_c(LWSYNC) "\n"')
> in arch/powerpc/include/asm/synch.h.
> 
> ???

Interactions with spin_is_locked(), if I remember correctly.  It is not
clear to me that this was the right way to go, though.

							Thanx, Paul

>         Thanks, Akira
> 
> > 
> > 							Thanx, Paul
> > 
> >>         Thanks, Akira
> >>
> >>> See the difference of position of label .L35.
> >>> (Note that we are talking about strong version of cmpxchg().)
> >>>
> >>> Does the above example make sense to you?
> >>>
> >>>         Thanks, Akira
> >>>
> >>>>
> >>>>
> >>>> --Junchang
> >>>>
> >>>>
> >>>>
> >>>>>  static __inline__ int atomic_cmpxchg(atomic_t *v, int old, int new)
> >>>>>
> >>>
> >>
> > 
> 


  reply	other threads:[~2018-12-16  0:54 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-11 15:39 [PATCH 0/4] Update definition of cmpxchg() under CodeSamples Akira Yokosawa
2018-12-11 15:40 ` [PATCH 1/4] CodeSamples: Add C-cmpxchg.litmus Akira Yokosawa
2018-12-11 15:42 ` [PATCH 2/4] CodeSamples/formal/litmus/api.h: Fix definition of cmpxchg() Akira Yokosawa
2018-12-11 15:42 ` [PATCH 3/4] CodeSamples: Fix definition of cmpxchg() in api-gcc.h Akira Yokosawa
2018-12-12 16:01   ` Junchang Wang
2018-12-13 15:33     ` Akira Yokosawa
2018-12-14 14:32       ` Junchang Wang
2018-12-15 14:58         ` Akira Yokosawa
2018-12-16  0:55           ` Junchang Wang
2018-12-15 15:10       ` Akira Yokosawa
2018-12-15 19:37         ` Paul E. McKenney
2018-12-15 23:42           ` Akira Yokosawa
2018-12-16  0:54             ` Paul E. McKenney [this message]
2018-12-11 15:44 ` [PATCH 4/4] EXP CodeSamples: Add weak variant of cmpxchg() as cmpxchg_weak() Akira Yokosawa
2018-12-12 15:48   ` Junchang Wang
2018-12-11 17:23 ` [PATCH 0/4] Update definition of cmpxchg() under CodeSamples Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181216005437.GZ4170@linux.ibm.com \
    --to=paulmck@linux.ibm.com \
    --cc=akiyks@gmail.com \
    --cc=junchangwang@gmail.com \
    --cc=perfbook@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.