From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Akira Yokosawa <akiyks@gmail.com>
Cc: Junchang Wang <junchangwang@gmail.com>, perfbook@vger.kernel.org
Subject: Re: [PATCH 3/4] CodeSamples: Fix definition of cmpxchg() in api-gcc.h
Date: Sat, 15 Dec 2018 16:54:37 -0800 [thread overview]
Message-ID: <20181216005437.GZ4170@linux.ibm.com> (raw)
In-Reply-To: <f93f136d-1cdd-dafd-ee45-6e1e88906ae9@gmail.com>
On Sun, Dec 16, 2018 at 08:42:56AM +0900, Akira Yokosawa wrote:
> On 2018/12/15 11:37:44 -0800, Paul E. McKenney wrote:
> > On Sun, Dec 16, 2018 at 12:10:18AM +0900, Akira Yokosawa wrote:
> >> Hi Paul,
> >>
> >> There is something I want to make sure.
> >> Please see inline comment below.
> >>
> >> On 2018/12/14 00:33:15 +0900, Akira Yokosawa wrote:
> >>> On 2018/12/13 00:01:33 +0800, Junchang Wang wrote:
> >>>> On 12/11/18 11:42 PM, Akira Yokosawa wrote:
> >>>>> From 7e7c3a20d08831cd64b77a4e8d8f693b4725ef89 Mon Sep 17 00:00:00 2001
> >>>>> From: Akira Yokosawa <akiyks@gmail.com>
> >>>>> Date: Tue, 11 Dec 2018 21:37:11 +0900
> >>>>> Subject: [PATCH 3/4] CodeSamples: Fix definition of cmpxchg() in api-gcc.h
> >>>>>
> >>>>> Do the same change as CodeSamples/formal/litmus/api.h.
> >>>>>
> >>>>> Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
> >>>>> ---
> >>>>> CodeSamples/api-pthreads/api-gcc.h | 5 +++--
> >>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/CodeSamples/api-pthreads/api-gcc.h b/CodeSamples/api-pthreads/api-gcc.h
> >>>>> index 3afe340..b66f4b9 100644
> >>>>> --- a/CodeSamples/api-pthreads/api-gcc.h
> >>>>> +++ b/CodeSamples/api-pthreads/api-gcc.h
> >>>>> @@ -168,8 +168,9 @@ struct __xchg_dummy {
> >>>>> ({ \
> >>>>> typeof(*ptr) _____actual = (o); \
> >>>>> \
> >>>>> - __atomic_compare_exchange_n(ptr, (void *)&_____actual, (n), 1, \
> >>>>> - __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST) ? (o) : (o)+1; \
> >>>>> + __atomic_compare_exchange_n((ptr), (void *)&_____actual, (n), 0, \
> >>>>> + __ATOMIC_SEQ_CST, __ATOMIC_RELAXED); \
> >>>>> + _____actual; \
> >>>>> })
> >>>>>
> >>>>
> >>>> Hi Akira,
> >>>>
> >>>> Another reason that the performance of cmpxchg is catching up with cmpxchg_weak is that __ATOMIC_SEQ_CST is replaced by __ATOMIC_RELAXED in this patch. The use of __ATOMIC_RELAXED means if the CAS primitive fails, the relaxed semantic is used, rather than sequential consistent. Following are some experiment results:
> >>>>
> >>>> # If __ATOMIC_RELAXED is used for both cmpxchg and cmpxchg_weak
> >>>>
> >>>> ./count_lim_atomic 64 uperf
> >>>> ns/update: 290
> >>>>
> >>>> ./count_lim_atomic_weak 64 uperf
> >>>> ns/update: 301
> >>>>
> >>>>
> >>>> # and then if __ATOMIC_SEQ_CST is used for both cmpxchg and cmpxchg_weak
> >>>>
> >>>> ./count_lim_atomic 64 uperf
> >>>> ns/update: 316
> >>>>
> >>>> ./count_lim_atomic_weak 64 uperf
> >>>> ns/update: 302
> >>>>
> >>>> ./count_lim_atomic 120 uperf
> >>>> ns/update: 630
> >>>>
> >>>> ./count_lim_atomic_weak 120 uperf
> >>>> ns/update: 568
> >>>>
> >>>> The results show that if we want to ensure sequential consistency when the CAS primitive fails, cmpxchg_weak performs better than cmpxchg. It seems that the (type of variation, failure_memorder) pair affects performance. I know that PPC uses LL/SC to simulate CAS. But what's the relationship between a simulated CAS and the memory order. This is interesting because as far as I know, PPC and ARM are using LL/SC to simulate atomic primitives such as CAS and FAA. So FAA might have the same behavior.
> >>>>
> >>>> In actually, I'm not very clear about the meaning of different types of failure memory orders. For example, when should we use __ATOMIC_RELAXED, rather than __ATOMIC_SEQ_CST, if a CAS fails? What happen if __ATOMIC_RELAXED is used for x86? The one I'm look at is https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html . Do you know some resources about this? I can look into this tomorrow. Thanks.
> >>>
> >>> Hi Junchang,
> >>>
> >>> In Linux-kernel speak, Documentation/core-api/atomic.rst says:
> >>>
> >>> --------------------------------------------------------------------------
> >>> atomic_xchg must provide explicit memory barriers around the operation. ::
> >>>
> >>> int atomic_cmpxchg(atomic_t *v, int old, int new);
> >>>
> >>> This performs an atomic compare exchange operation on the atomic value v,
> >>> with the given old and new values. Like all atomic_xxx operations,
> >>> atomic_cmpxchg will only satisfy its atomicity semantics as long as all
> >>> other accesses of \*v are performed through atomic_xxx operations.
> >>>
> >>> atomic_cmpxchg must provide explicit memory barriers around the operation,
> >>> although if the comparison fails then no memory ordering guarantees are
> >>> required.
> >>>
> >>> [snip]
> >>>
> >>> The routines xchg() and cmpxchg() must provide the same exact
> >>> memory-barrier semantics as the atomic and bit operations returning
> >>> values.
> >>> --------------------------------------------------------------------------
> >>>
> >>> The 2nd __ATOMIC_RELAXED to __atomic_compare_exchange_n() matches this
> >>> lack of requirement.
> >>>
> >>> On x86_64, __atomic_compare_exchange_n() is translated to the same code
> >>> in both cases (with the help of litmus7's cross compiling):
> >>>
> >>> #START _litmus_P1
> >>> xorl %eax, %eax
> >>> movl $0, 4(%rsp)
> >>> lock cmpxchgl %r10d, (%rdx)
> >>> je .L36
> >>> movl %eax, 4(%rsp)
> >>> .L36:
> >>> movl 4(%rsp), %eax
> >>>
> >>> There is no difference between the code with __ATOMIC_RELAXED and
> >>> the code with __ATOMIC_SEQ_CST as the 2nd parameter. As you can see,
> >>> there is no memory barrier instruction emitted.
> >>>
> >>> On PPC, there is a difference. With __ATOMIC_RELAXED as 2nd parameter,
> >>> the code looks like:
> >>>
> >>> #START _litmus_P1
> >>> sync
> >>> .L34:
> >>> lwarx 7,0,9
> >>> cmpwi 0,7,0
> >>> bne 0,.L35
> >>> stwcx. 5,0,9
> >>> bne 0,.L34
> >>> isync
> >>> .L35:
> >>>
> >>> , OTOH with __ATOMIC_SEQ_CST as 2nd argument:
> >>>
> >>> #START _litmus_P1
> >>> sync
> >>> .L34:
> >>> lwarx 7,0,9
> >>> cmpwi 0,7,0
> >>> bne 0,.L35
> >>> stwcx. 5,0,9
> >>> bne 0,.L34
> >>> .L35:
> >>> isync
> >>>
> >>
> >> In this asm code snippets, the barrier instruction at the end of
> >> cmpxchg() is "isync".
> >>
> >> In arch/powerpc/include/asm/synch.h of Linux kernel source tree,
> >> both PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER are
> >> defined as '"\n" stringify_in_c(sync) "\n"', which will result
> >> in "sync".
> >>
> >> IIUC, "isync" of PowerPC is not good enough for __ATOMIC_SEQ_CST,
> >> is it?
> >
> > By itself, no, but in combination with the "sync" instruction at
> > the beginning, combined with the ordering of other __ATOMIC_SEQ_CST
> > operations, it actually is sufficient. For more information, please
> > see:
> >
> > http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html
> >
> > And this is an update that is mostly irrelevant on modern hardware:
> >
> > http://www.rdrop.com/users/paulmck/scalability/paper/N2745r.2011.03.04a.html
> >
> > Note also that an lwsync instruction can be substituted for each isync,
> > which can sometimes provide better performance.
>
> Now I'm wondering why PPC_ATOMIC_EXIT_BARRIER is defined as
> '"\n" stringify_in_c(sync) "\n"' rather than
> '"\n" stringify_in_c(isync) "\n"' (or '"\n" stringify_in_c(LWSYNC) "\n"')
> in arch/powerpc/include/asm/synch.h.
>
> ???
Interactions with spin_is_locked(), if I remember correctly. It is not
clear to me that this was the right way to go, though.
Thanx, Paul
> Thanks, Akira
>
> >
> > Thanx, Paul
> >
> >> Thanks, Akira
> >>
> >>> See the difference of position of label .L35.
> >>> (Note that we are talking about strong version of cmpxchg().)
> >>>
> >>> Does the above example make sense to you?
> >>>
> >>> Thanks, Akira
> >>>
> >>>>
> >>>>
> >>>> --Junchang
> >>>>
> >>>>
> >>>>
> >>>>> static __inline__ int atomic_cmpxchg(atomic_t *v, int old, int new)
> >>>>>
> >>>
> >>
> >
>
next prev parent reply other threads:[~2018-12-16 0:54 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-11 15:39 [PATCH 0/4] Update definition of cmpxchg() under CodeSamples Akira Yokosawa
2018-12-11 15:40 ` [PATCH 1/4] CodeSamples: Add C-cmpxchg.litmus Akira Yokosawa
2018-12-11 15:42 ` [PATCH 2/4] CodeSamples/formal/litmus/api.h: Fix definition of cmpxchg() Akira Yokosawa
2018-12-11 15:42 ` [PATCH 3/4] CodeSamples: Fix definition of cmpxchg() in api-gcc.h Akira Yokosawa
2018-12-12 16:01 ` Junchang Wang
2018-12-13 15:33 ` Akira Yokosawa
2018-12-14 14:32 ` Junchang Wang
2018-12-15 14:58 ` Akira Yokosawa
2018-12-16 0:55 ` Junchang Wang
2018-12-15 15:10 ` Akira Yokosawa
2018-12-15 19:37 ` Paul E. McKenney
2018-12-15 23:42 ` Akira Yokosawa
2018-12-16 0:54 ` Paul E. McKenney [this message]
2018-12-11 15:44 ` [PATCH 4/4] EXP CodeSamples: Add weak variant of cmpxchg() as cmpxchg_weak() Akira Yokosawa
2018-12-12 15:48 ` Junchang Wang
2018-12-11 17:23 ` [PATCH 0/4] Update definition of cmpxchg() under CodeSamples Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181216005437.GZ4170@linux.ibm.com \
--to=paulmck@linux.ibm.com \
--cc=akiyks@gmail.com \
--cc=junchangwang@gmail.com \
--cc=perfbook@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox