From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>,
Peter Zijlstra <peterz@infradead.org>,
Michael Ellerman <mpe@ellerman.id.au>,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
Anton Blanchard <anton@samba.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation
Date: Tue, 20 Oct 2015 16:34:51 -0700
Message-ID: <20151020233451.GI5105@linux.vnet.ibm.com>
In-Reply-To: <20151019011718.GB924@fixme-laptop.cn.ibm.com>

On Mon, Oct 19, 2015 at 09:17:18AM +0800, Boqun Feng wrote:
> On Fri, Oct 09, 2015 at 10:40:39AM +0100, Will Deacon wrote:
> > On Fri, Oct 09, 2015 at 10:31:38AM +0200, Peter Zijlstra wrote:
> [snip]
> > >
> > > So lots of little confusions added up to complete fail :-{
> > >
> > > Mostly I think it was the UNLOCK x + LOCK x are fully ordered (where I
> > > forgot: but not against uninvolved CPUs) and RELEASE/ACQUIRE are
> > > transitive (where I forgot: RELEASE/ACQUIRE _chains_ are transitive, but
> > > again not against uninvolved CPUs).
> > >
> > > Which leads me to think I would like to suggest alternative rules for
> > > RELEASE/ACQUIRE (to replace those Will suggested; as I think those are
> > > partly responsible for my confusion).
> >
> > Yeah, sorry. I originally used the phrase "fully ordered" but changed it
> > to "full barrier", which has stronger transitivity (newly understood
> > definition) requirements that I didn't intend.
> >
> > RELEASE -> ACQUIRE should be used for message passing between two CPUs
> > and not have ordering effects on other observers unless they're part of
> > the RELEASE -> ACQUIRE chain.
> >
> > > - RELEASE -> ACQUIRE is fully ordered (but not a full barrier) when
> > > they operate on the same variable and the ACQUIRE reads from the
> > > RELEASE. Notably, RELEASE/ACQUIRE are RCpc and lack transitivity.
> >
> > Are we explicit about the difference between "fully ordered" and "full
> > barrier" somewhere else, because this looks like it will confuse people.
> >
>
> This is confusing me right now. ;-)
>
> Let's use a simple example for only one primitive, as I understand it,
> if we say a primitive A is "fully ordered", we actually mean:
>
> 1. The memory operations preceding(in program order) A can't be
> reordered after the memory operations following(in PO) A.
>
> and
>
> 2. The memory operation(s) in A can't be reordered before the
> memory operations preceding(in PO) A and after the memory
> operations following(in PO) A.
>
> If we say A is a "full barrier", we actually mean:
>
> 1. The memory operations preceding(in program order) A can't be
> reordered after the memory operations following(in PO) A.
>
> and
>
> 2. The memory ordering guarantee in #1 is visible globally.
>
> Is that correct? Or is "full barrier" stronger than I understand,
> i.e. is there a third property of "full barrier":
>
> 3. The memory operation(s) in A can't be reordered before the
> memory operations preceding(in PO) A and after the memory
> operations following(in PO) A.
>
> IOW, is "full barrier" a stronger version of "fully ordered" or not?

There is also the question of whether the barrier forces ordering
of unrelated stores, everything initially zero and all accesses
READ_ONCE() or WRITE_ONCE():

    P0          P1          P2                  P3
    X = 1;      Y = 1;      r1 = X;             r3 = Y;
                            some_barrier();     some_barrier();
                            r2 = Y;             r4 = X;

P2's and P3's ordering could be globally visible without requiring
P0's and P1's independent stores to be ordered, for example, if you
used smp_rmb() for some_barrier(). In contrast, if we used smp_mb()
for some_barrier(), everyone would agree on the order of P0's and
P1's stores.

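The difference between the two cases can be sketched with a toy
enumerator in plain C. This is only an illustration under stated
assumptions, not the kernel memory model and not how any real CPU is
built: each observer (P2, P3) is given its own view of X and Y, and a
store may reach the two observers at different times unless
atomic_stores is set. Setting atomic_stores loosely stands in for the
observable effect of the smp_mb() case, and clearing it for the
smp_rmb() case. All names here (struct state, run_event, search,
iriw_disagreement_reachable) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { X, Y, NVARS };       /* the two independently written variables */
enum { P2, P3, NOBS };      /* the two observing CPUs */
enum { NEVENTS = 8 };

/*
 * Events 0-3: one store becomes visible to one observer
 *             (0: X to P2, 1: X to P3, 2: Y to P2, 3: Y to P3).
 * Events 4-7: the loads r1 = X and r2 = Y on P2, r3 = Y and r4 = X on P3.
 */
struct state {
    int view[NOBS][NVARS];  /* value each observer currently sees */
    int r[4];               /* r1..r4 */
    bool done[NEVENTS];
};

static void run_event(struct state *s, int e, bool atomic_stores)
{
    assert(e >= 0 && e < NEVENTS);
    s->done[e] = true;
    if (e < 4) {
        int var = e / 2, obs = e % 2;

        s->view[obs][var] = 1;
        if (atomic_stores)          /* visible to both observers at once */
            s->view[1 - obs][var] = 1;
        return;
    }
    switch (e) {
    case 4: s->r[0] = s->view[P2][X]; break;
    case 5: s->r[1] = s->view[P2][Y]; break;
    case 6: s->r[2] = s->view[P3][Y]; break;
    case 7: s->r[3] = s->view[P3][X]; break;
    }
}

/* Depth-first search over every allowed interleaving of the events. */
static bool search(struct state s, bool atomic_stores)
{
    bool all_done = true;

    for (int e = 0; e < NEVENTS; e++) {
        if (s.done[e])
            continue;
        all_done = false;
        /* some_barrier() keeps each observer's two loads in order. */
        if ((e == 5 && !s.done[4]) || (e == 7 && !s.done[6]))
            continue;
        struct state next = s;
        run_event(&next, e, atomic_stores);
        if (search(next, atomic_stores))
            return true;
    }
    /* Disagreement: P2 saw X before Y, while P3 saw Y before X. */
    return all_done && s.r[0] == 1 && s.r[1] == 0 &&
           s.r[2] == 1 && s.r[3] == 0;
}

/* True if r1 == 1, r2 == 0, r3 == 1, r4 == 0 can occur in this toy model. */
bool iriw_disagreement_reachable(bool atomic_stores)
{
    struct state init = {0};

    return search(init, atomic_stores);
}
```

In this model, iriw_disagreement_reachable(false) returns true (the
observers can disagree on the order of the two stores), while
iriw_disagreement_reachable(true) returns false.
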
There are actually a fair number of different combinations of
aspects of memory ordering. We will need to choose wisely. ;-)

My hope is that the store-ordering gets folded into the globally
visible transitive level. Especially given that I have not (yet)
seen any algorithms used in production that relied on the ordering of
independent stores.

Thanx, Paul

> Regards,
> Boqun
>
> > > - RELEASE -> ACQUIRE can be upgraded to a full barrier (including
> > > transitivity) using smp_mb__release_acquire(), either before RELEASE
> > > or after ACQUIRE (but consistently [*]).
> >
> > Hmm, but we don't actually need this for RELEASE -> ACQUIRE, afaict. This
> > is just needed for UNLOCK -> LOCK, and is exactly what RCU is currently
> > using (for PPC only).
> >
> > Stepping back a second, I believe that there are three cases:
> >
> >
> > RELEASE X -> ACQUIRE Y (same CPU)
> > * Needs a barrier on TSO architectures for full ordering
> >
> > UNLOCK X -> LOCK Y (same CPU)
> > * Needs a barrier on PPC for full ordering
> >
> > RELEASE X -> ACQUIRE X (different CPUs)
> > UNLOCK X -> ACQUIRE X (different CPUs)
> > * Fully ordered everywhere...
> > * ... but needs a barrier on PPC to become a full barrier
> >
> >