From: "Paul E. McKenney"
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework
Date: Mon, 24 Feb 2014 09:40:37 -0800
Message-ID: <20140224174037.GQ8264@linux.vnet.ibm.com>
References: <20140220181116.GT4250@linux.vnet.ibm.com>
 <20140220185608.GX4250@linux.vnet.ibm.com>
 <20140220221027.GC4250@linux.vnet.ibm.com>
 <20140221191318.GK4250@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
Sender: linux-arch-owner@vger.kernel.org
To: Michael Matz
Cc: Linus Torvalds, Torvald Riegel, Will Deacon, Peter Zijlstra,
 Ramana Radhakrishnan, David Howells, "linux-arch@vger.kernel.org",
 "linux-kernel@vger.kernel.org", "akpm@linux-foundation.org",
 "mingo@kernel.org", "gcc@gcc.gnu.org"

On Mon, Feb 24, 2014 at 02:55:07PM +0100, Michael Matz wrote:
> Hi,
>
> On Fri, 21 Feb 2014, Paul E. McKenney wrote:
>
> > > And with conservative I mean "everything is a source of a dependency, and
> > > hence can't be removed, reordered or otherwise fiddled with", and that
> > > includes code sequences where no atomic objects are anywhere in sight [1].
> > > In the light of that the only realistic way (meaning to not have to
> > > disable optimization everywhere) to implement consume as currently
> > > specified is to map it to acquire.  At which point it becomes pointless.
> >
> > No, only memory_order_consume loads and [[carries_dependency]]
> > function arguments are sources of dependency chains.
>
> I don't see [[carries_dependency]] in the C11 final draft (yeah, should
> get a real copy, I know, but let's assume it's the same language as the
> standard).  Therefore, yes, only consume loads are sources of
> dependencies.  The problem with the definition of the "carries a
> dependency" relation is not the sources, but rather where it stops.
> It's transitively closed over "value of evaluation A is used as operand in
> evaluation B", with very few exceptions as per 5.1.2.4#14.  Evaluations
> can contain function calls, so if there's _any_ chance that an operand of
> an evaluation might even indirectly use something resulting from a consume
> load then that evaluation must be compiled in a way to not break
> dependency chains.
>
> I don't see a way to generally assume that e.g. the value of a function
> argument can impossibly result from a consume load, therefore the compiler
> must assume that all function arguments _can_ result from such loads, and
> so must disable all depchain-breaking optimizations (which are many).
>
> > > [1] Simple example of what type of transformations would be disallowed:
> > >
> > > int getzero (int i) { return i - i; }
> >
> > This needs to be as follows:
> >
> > [[carries_dependency]] int getzero(int i [[carries_dependency]])
> > {
> >         return i - i;
> > }
> >
> > Otherwise dependencies won't get carried through it.
>
> So, with the above do you agree that in absence of any other magic (see
> below) the compiler is not allowed to transform my initial getzero()
> (without the carries_dependency markers) implementation into "return 0;"
> because of the C11 rules for "carries-a-dependency"?
>
> If so, do you then also agree that the specification of "carries a
> dependency" is somewhat, err, shall we say, overbroad?

From what I can see, overbroad.  The problem is that the C++11 standard
defines how carries-dependency interacts with function calls and returns
in 7.6.4, which describes the [[carries_dependency]] attribute.  For example,
7.6.4p6 says:

        Function g’s second parameter has a carries_dependency
        attribute, but its first parameter does not. Therefore, function
        h’s first call to g carries a dependency into g, but its second
        call does not. The implementation might need to insert a fence
        prior to the second call to g.

When C11 declined to take attributes, they also left out the part saying
how carries-dependency interacts with functions.  :-/

Might be fixed by now, checking up on it.

One could argue that the bit about emitting fence instructions at
function calls and returns is implied by the as-if rule even without
this wording, but...

> > > depchains don't matter, could _then_ optimize it to zero.  But that's
> > > insane, especially considering that it's hard to detect if a given context
> > > doesn't care for depchains, after all the depchain relation is constructed
> > > exactly so that it bleeds into nearly everywhere.  So we would most of
> > > the time have to assume that the ultimate context will be depchain-aware
> > > and therefore disable many transformations.
> >
> > Any function that does not contain a memory_order_consume load and that
> > doesn't have any arguments marked [[carries_dependency]] can be
> > optimized just as before.
>
> And as such marker doesn't exist we must conservatively assume that it's
> on _all_ parameters, so I'll stand by my claim.

Or that you have to emit a fence instruction when a dependency chain
enters or leaves a function in cases where not all callers/callees are
visible to the compiler.
My preference is that the ordering properties
of a carries-dependency chain are implementation defined at the point
that it enters or leaves a function without the marker, but others
strongly disagreed.  ;-)

> > > Then inlining getzero would merely add another "# j.dep = i.dep"
> > > relation, so depchains are still there but the value optimization can
> > > happen before inlining.  Having to do something like that I'd find
> > > disgusting, and rather rewrite consume into acquire :)  Or make the
> > > depchain relation somehow realistically implementable.
> >
> > I was actually OK with arithmetic cancellation breaking the dependency
> > chains.  Others on the committee felt otherwise, and I figured that (1)
> > I wouldn't be writing that kind of function anyway and (2) they knew
> > more about writing compilers than I.  I would still be OK saying that
> > things like "i-i", "i*0", "i%1", "i&0", "i|~0" and so on just break the
> > dependency chain.
>
> Exactly.  I can see the problem that people had with that, though.  There
> are very many ways to write concealed zeros (or generally neutral elements
> of the function in question).  My getzero() function is one (it could e.g.
> be an assembler implementation).  The allowance to break dependency chains
> would have to apply to such cancellation as well, and so can't simply
> itemize all cases in which cancellation is allowed.  Rather it would have
> had to argue about something like "value dependency", ala "evaluation B
> depends on A, if there exist at least two different values A1 and A2
> (results from A), for which evaluation B (with otherwise same operands)
> yields different values B1 and B2".

And that was in fact one of the arguments used against me.  ;-)

> Alas, it doesn't, except if you want to understand the term "the value of
> A is used as an operand of B" in that way.
> Even then you'd still have the
> second case of the depchain definition, via intermediate not even atomic
> memory stores and loads to make two evaluations be ordered per
> carries-a-dependency.
>
> And even that understanding of "is used" wouldn't be enough, because there
> are cases where the cancellation happens in steps, and where it interacts
> with the third clause (transitiveness): Assume this:
>
>   a = something()  // evaluation A
>   b = 1 - a        // evaluation B
>   c = a - 1 + b    // evaluation C
>
> Now, clearly B depends on A.  Also C depends on B (because with otherwise
> same operands changing just B also changes C), because of transitiveness C
> then also depends on A.  But equally clearly C was just an elaborate way to
> write "0", and so depends on nothing.  The problem was of course that A
> and B weren't independent when determining the dependencies of C.  But
> allowing cancellation to break dependency chains would have to allow for
> these cases as well.
>
> So, now, that leaves us basically with depchains forcing us to disable
> many useful transformations or finding some other magic.  One would be to
> just regard all consume loads as acquire loads and be done (and
> effectively remove the ill-advised "carries a dependency" relation from
> consideration).
>
> You say downthread that it'd also be possible to just emit barriers before
> all function calls (I say "all" because the compiler will generally
> have applied some transformation that broke depchains if they existed).
> That seems to me to be a bigger hammer than just ignoring depchains and
> emitting acquires instead of consumes (because the latter changes only exactly
> where atomics are used, the former seems to me to have unbounded effect).
Yep, converting the consume to an acquire is a valid alternative to
emitting a memory-barrier instruction prior to entering/exiting the
function in question.

> So, am I still missing something or is my understanding of the
> carries-a-dependency relation correct and my conclusions are merely too
> pessimistic?

Given the definition as it is, I believe you understand it.

                                                        Thanx, Paul
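
For readers following the consume-versus-acquire point, here is a minimal
C11 sketch of the two reader variants under discussion.  The names
struct node, shared, publish(), read_consume() and read_acquire() are
hypothetical, chosen only for illustration; they do not come from the
thread or from any kernel code.  The second reader shows what promoting
the consume load to an acquire load looks like at the source level, and
makes no claim about what a particular compiler emits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload type, for illustration only. */
struct node {
	int payload;
};

/* Hypothetical shared pointer, published by the writer. */
static _Atomic(struct node *) shared = NULL;

/* Writer: initialize the node, then publish it with release semantics. */
void publish(struct node *n, int value)
{
	assert(n != NULL);
	n->payload = value;
	atomic_store_explicit(&shared, n, memory_order_release);
}

/*
 * Reader relying on the dependency chain: the access to n->payload
 * carries a dependency from the consume load of the pointer.
 */
int read_consume(void)
{
	struct node *n = atomic_load_explicit(&shared, memory_order_consume);

	return n ? n->payload : -1;
}

/*
 * Same reader with the consume load promoted to acquire: ordering no
 * longer depends on the dependency chain being preserved.
 */
int read_acquire(void)
{
	struct node *n = atomic_load_explicit(&shared, memory_order_acquire);

	return n ? n->payload : -1;
}
```

Both readers return the same value in a correct program; they differ only
in which ordering guarantee the load asks for.  Acquire is the stronger of
the two, which is why the promotion is always safe, though it may cost
more on weakly ordered hardware.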