From: "Paul E. McKenney"
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework
Date: Thu, 6 Feb 2014 20:20:51 -0800
Message-ID: <20140207042051.GL4250@linux.vnet.ibm.com>
References: <20140206134825.305510953@infradead.org>
 <21984.1391711149@warthog.procyon.org.uk> <52F3DA85.1060209@arm.com>
 <20140206185910.GE27276@mudshark.cambridge.arm.com>
 <20140206192743.GH4250@linux.vnet.ibm.com>
 <1391721423.23421.3898.camel@triegel.csb>
 <20140206221117.GJ4250@linux.vnet.ibm.com>
 <1391730288.23421.4102.camel@triegel.csb>
In-Reply-To: <1391730288.23421.4102.camel@triegel.csb>
Reply-To: paulmck@linux.vnet.ibm.com
To: Torvald Riegel
Cc: Will Deacon, Ramana Radhakrishnan, David Howells, Peter Zijlstra,
 "linux-arch@vger.kernel.org", "linux-kernel@vger.kernel.org",
 "torvalds@linux-foundation.org", "akpm@linux-foundation.org",
 "mingo@kernel.org", "gcc@gcc.gnu.org"

On Fri, Feb 07, 2014 at 12:44:48AM +0100, Torvald Riegel wrote:
> On Thu, 2014-02-06 at 14:11 -0800, Paul E. McKenney wrote:
> > On Thu, Feb 06, 2014 at 10:17:03PM +0100, Torvald Riegel wrote:
> > > On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote:
> > > > On Thu, Feb 06, 2014 at 06:59:10PM +0000, Will Deacon wrote:
> > > > > There are also so many ways to blow your head off it's untrue. For example,
> > > > > cmpxchg takes a separate memory model parameter for failure and success, but
> > > > > then there are restrictions on the sets you can use for each. It's not hard
> > > > > to find well-known memory-ordering experts shouting "Just use
> > > > > memory_model_seq_cst for everything, it's too hard otherwise". Then there's
> > > > > the fun of load-consume vs load-acquire (arm64 GCC completely ignores consume
> > > > > atm and optimises all of the data dependencies away) as well as the definition
> > > > > of "data races", which seem to be used as an excuse to miscompile a program
> > > > > at the earliest opportunity.
> > > >
> > > > Trust me, rcu_dereference() is not going to be defined in terms of
> > > > memory_order_consume until the compilers implement it both correctly and
> > > > efficiently. They are not there yet, and there is currently no shortage
> > > > of compiler writers who would prefer to ignore memory_order_consume.
> > >
> > > Do you have any input on
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448? In particular, the
> > > language standard's definition of dependencies?
> >
> > Let's see... 1.10p9 says that a dependency must be carried unless:
> >
> > - B is an invocation of any specialization of std::kill_dependency (29.3), or
> > - A is the left operand of a built-in logical AND (&&, see 5.14) or logical OR (||, see 5.15) operator,
> > or
> > - A is the left operand of a conditional (?:, see 5.16) operator, or
> > - A is the left operand of the built-in comma (,) operator (5.18);
> >
> > So the use of "flag" before the "?" is ignored. But the "flag - flag"
> > after the "?" will carry a dependency, so the code fragment in 59448
> > needs to do the ordering rather than just optimizing "flag - flag" out
> > of existence. One way to do that on both ARM and Power is to actually
> > emit code for "flag - flag", but there are a number of other ways to
> > make that work.
>
> And that's what would concern me, considering that these requirements
> seem to be able to creep out easily. Also, whereas the other atomics
> just constrain compilers wrt. reordering across atomic accesses or
> changes to the atomic accesses themselves, the dependencies are new
> requirements on pieces of otherwise non-synchronizing code. The latter
> seems far more involved to me.

Well, the wording of 1.10p9 is pretty explicit on this point.
There are only a few exceptions to the rule that dependencies from
memory_order_consume loads must be tracked. And to your point about
requirements being placed on pieces of otherwise non-synchronizing code,
we already have that with plain old load acquire and store release --
both of these put ordering constraints that affect the surrounding
non-synchronizing code.

This issue got a lot of discussion, and the compromise is that
dependencies cannot leak into or out of functions unless the relevant
parameters or return values are annotated with [[carries_dependency]].
This means that the compiler can see all the places where dependencies
must be tracked. This is described in 7.6.4. If a dependency chain
headed by a memory_order_consume load goes into or out of a function
without the aid of the [[carries_dependency]] attribute, the compiler
needs to do something else to enforce ordering, e.g., emit a memory
barrier.

From a Linux-kernel viewpoint, this is a bit ugly, as it requires
annotations and use of kill_dependency, but it was the best I could do
at the time. If things go as they usually do, there will be some other
reason why those are needed...

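For concreteness, here is a minimal sketch of where those annotations
would go in C++11. All of the names in it (Node, head, get_node,
read_payload, is_published, publish_and_read) are made up for
illustration and come from neither the kernel nor bug 59448. The driver
is single-threaded, so it only shows the placement of the attribute and
of kill_dependency, not any actual cross-thread ordering.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload type and publication pointer, for illustration only.
struct Node {
    int payload;
};

std::atomic<Node*> head{nullptr};

// Return value annotated: the dependency chain headed by the consume load
// is allowed to flow out of this function to the caller.
[[carries_dependency]] Node* get_node() {
    return head.load(std::memory_order_consume);
}

// Parameter annotated: the chain flows into this function, so the
// dereference below stays dependency-ordered after the consume load.
int read_payload(Node* p [[carries_dependency]]) {
    return p->payload;
}

// No annotation here: kill_dependency explicitly ends the chain, so
// nothing computed from q needs to be tracked by the compiler.
bool is_published(Node* p) {
    Node* q = std::kill_dependency(p);
    return q != nullptr;
}

// Single-threaded driver (an assumption made to keep the sketch small):
// publish a node with a release store, then read it back through the
// annotated functions.
int publish_and_read(int value) {
    static Node n;
    n.payload = value;
    head.store(&n, std::memory_order_release);
    Node* p = get_node();
    assert(is_published(p));
    return read_payload(p);
}
```

Calling publish_and_read(42) returns 42. Compilers that implement
memory_order_consume by promoting it to memory_order_acquire are free to
treat the attribute as a no-op, so this shows the source-level contract
rather than any particular code generation.
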
> > BTW, there is some discussion on 1.10p9's handling of && and ||, and
> > that clause is likely to change. And yes, I am behind on analyzing
> > usage in the Linux kernel to find out if Linux cares...
>
> Do you have any pointers to these discussions (e.g., LWG issues)?

Nope, just a bare email thread. I would guess that it will come up
next week.

The question is whether dependencies should be carried through && or ||
at all, and if so how. My current guess is that && and || should not
carry dependencies.

> > > > And rcu_dereference() will need per-arch overrides for some time during
> > > > any transition to memory_order_consume.
> > > >
> > > > > Trying to introduce system concepts (writes to devices, interrupts,
> > > > > non-coherent agents) into this mess is going to be an uphill battle IMHO. I'd
> > > > > just rather stick to the semantics we have and the asm volatile barriers.
> > > >
> > > > And barrier() isn't going to go away any time soon, either. And
> > > > ACCESS_ONCE() needs to keep volatile semantics until there is some
> > > > memory_order_whatever that prevents loads and stores from being coalesced.
> > >
> > > I'd be happy to discuss something like this in ISO C++ SG1 (or has this
> > > been discussed in the past already?). But it needs to have a paper I
> > > suppose.
> >
> > The current position of the usual suspects other than me is that this
> > falls into the category of forward-progress guarantees, which are
> > considered (again, by the usual suspects other than me) to be out
> > of scope.
>
> But I think we need to better describe forward progress, even though
> that might be tricky. We made at least some progress on
> http://cplusplus.github.io/LWG/lwg-active.html#2159 in Chicago, even
> though we can't constrain the OS schedulers too much, and for lock-free
> we're in this weird position that on most general-purpose schedulers and
> machines, obstruction-free algorithms are likely to work just fine like
> lock-free, most of the time, in practice...

Yep, there is a draft paper by Alistarh et al. making this point.
They could go quite a bit further. With a reasonably short set of
additional constraints, you can get bounded execution times out of
locking as well. They were not amused when I suggested this, Bjoern
Brandenburg's dissertation notwithstanding. ;-)

> We also need to discuss forward progress guarantees for any
> parallelism/concurrency abstractions, I believe:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3874.pdf
>
> Hopefully we'll get some more acceptance of this being in scope...

That would be good, just in case C11 is to be applicable to real-time
software.

> > > Will you be in Issaquah for the C++ meeting next week?
> >
> > Weather permitting, I will be there!
>
> Great, maybe we can find some time in SG1 to discuss this then. Even if
> the standard doesn't want to include it, SG1 should be a good forum to
> understand everyone's concerns around that, with the hope that this
> would help potential non-standard extensions to be still checked by the
> same folks that did the rest of the memory model.

Sounds good! Hopefully some discussion of out-of-thin-air values as well.

							Thanx, Paul