From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 30 Sep 2016 05:14:03 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Will Deacon, linux-kernel@vger.kernel.org, mingo@kernel.org,
	dhowells@redhat.com, stern@rowland.harvard.edu
Subject: Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160929155817.GB5016@twins.programming.kicks-ass.net>
	<20160929160307.GT13862@arm.com>
	<20160929164353.GX14933@linux.vnet.ibm.com>
	<20160929171036.GV13862@arm.com>
	<20160929172322.GZ14933@linux.vnet.ibm.com>
	<20160929180444.GA22882@linux.vnet.ibm.com>
	<20160929181015.GB22882@linux.vnet.ibm.com>
	<20160929184439.GD5016@twins.programming.kicks-ass.net>
	<20160929191858.GD14933@linux.vnet.ibm.com>
	<20160930095738.GG5016@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160930095738.GG5016@twins.programming.kicks-ass.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Message-Id: <20160930121403.GO14933@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 30, 2016 at 11:57:38AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 12:18:58PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 29, 2016 at 08:44:39PM +0200, Peter Zijlstra wrote:
>
> > > How about something like so on PPC?
> > >
> > > P0(int *x, int *y)
> > > {
> > > 	WRITE_ONCE(*x, 1);
> > > 	smp_store_release(y, 1);
> > > }
> > >
> > > P1(int *x, int *y)
> > > {
> > > 	WRITE_ONCE(x, 2);
> >
> > Need "WRITE_ONCE(*x, 2)" here.
> >
> > > 	smp_store_release(y, 2);
> > > }
> > >
> > > P2(int *x, int *y)
> > > {
> > > 	r1 = smp_load_acquire(y);
> > > 	r2 = READ_ONCE(*x);
> > > }
> > >
> > > (((x==1 && y==2) | (x==2 && y==1)) && (r1==1 || r1==2) && r2==0)
> >
> > That exists-clause is quite dazzling...  So if each of P0 and P1
> > win, but on different stores, and if P2 follows one or the other
> > of P0 or P1, can r2 get the pre-initialization value for x?
> >
> > > If you execute P0 and P1 concurrently and one store of each 'wins' the
> > > LWSYNC of either is null and void, and therefore P2 is unordered and can
> > > observe r2==0.
> >
> > That vaguely resembles the infamous Z6.3, but only vaguely.  The Linux-kernel
> > memory model says "forbidden" to this:
>
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc710.html
>
> That one, right?

That is the one!  Prohibiting the cycle requires smp_mb() on both
threads 0 and 1 on the one hand, or on both threads 0 and 2 on the
other hand.

> Hmm, I seem to remember something else..
> /me goes poke through history
> and comes up with:
>
>   https://lkml.kernel.org/r/20160115215853.GC3818@linux.vnet.ibm.com
>
> So what was that about then? I remember it being a completely
> nonsensical case, but a weird one.

That was an Alan Stern example where PowerPC prohibits the outcome,
and the ppcmem tool agrees, but where herd does not.  (ppcmem is
giving the correct answer.)  But this is specific to PowerPC.
I would not advise writing code that relies on this one.  ;-)

> > So let's try PPCMEM.  If PPCMEM allows it, then the kernel model is
> > clearly broken.
> >
> > PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
> > {
> > 0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
> > 1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
> > 2:r3=x; 2:r4=y;
> > }
> >  P0           | P1           | P2           ;
> >  stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
> >  lwsync       | lwsync       | lwsync       ;
> >  stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
> > exists
> > (((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)
>
> > Or did I incorrectly translate your litmus test?
>
> Looks about right.
>
> Still not seeing how that is prohibited though. My reasoning is as
> follows:
>
>  - P0 and P1 both store to x, one loses (say P0). Effectively only P1
>    does a store.
>
>  - P0 and P1 both store to y, one loses (say P1). Effectively only P0
>    does a store.

PowerPC does not "obscure" stores, so both stores really are there
and the lwsync really has effect on all CPUs.  From what I understand,
even CPUs that do obscure stores only do so in the case of repeated
stores by the same CPU to the same variable, and the above litmus test
doesn't have this.

So all the stores happen, and each CPU's stores are at least locally
ordered.

>  - P2 reads y, sees the value from P0.

Fair enough!

>  - P2 does lwsync, which constrains P2 to not issue the load of x
>    before this. It also forms a (local) sync-point with P0 for having
>    seen its store of y.
>
>  - P2 reads x, sees the initial value because the store from P1 hasn't
>    been propagated yet.
>
>    It will not see the store P0 did to x, since that didn't happen.

Well, it saw the store to y, so it absolutely must see one or the other
of the stores to x in this particular litmus test.  If P1's store
overwrote P0's store, then P2 has to see P1's store, for example.

> Assuming I'm wrong on that last part, is then the following possible?
>
>   (x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)
>
> Where we see a store that didn't happen?

Again, both stores really did happen.  ;-)

So with this litmus test:

PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
{
0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
2:r3=x; 2:r4=y;
}
 P0           | P1           | P2           ;
 stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
 lwsync       | lwsync       | lwsync       ;
 stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
exists
(x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)

The herd tool says:

Test PeterZijlstra+o-r+o-r+a-o-SB Allowed
States 24
2:r1=0; 2:r2=0; x=1; y=1;
2:r1=0; 2:r2=0; x=1; y=2;
2:r1=0; 2:r2=0; x=2; y=1;
2:r1=0; 2:r2=0; x=2; y=2;
2:r1=0; 2:r2=1; x=1; y=1;
2:r1=0; 2:r2=1; x=1; y=2;
2:r1=0; 2:r2=1; x=2; y=1;
2:r1=0; 2:r2=1; x=2; y=2;
2:r1=0; 2:r2=2; x=1; y=1;
2:r1=0; 2:r2=2; x=1; y=2;
2:r1=0; 2:r2=2; x=2; y=1;
2:r1=0; 2:r2=2; x=2; y=2;
2:r1=1; 2:r2=1; x=1; y=1;
2:r1=1; 2:r2=1; x=1; y=2;
2:r1=1; 2:r2=1; x=2; y=1;
2:r1=1; 2:r2=1; x=2; y=2;
2:r1=1; 2:r2=2; x=2; y=1;
2:r1=1; 2:r2=2; x=2; y=2;
2:r1=2; 2:r2=1; x=1; y=1;
2:r1=2; 2:r2=1; x=1; y=2;
2:r1=2; 2:r2=2; x=1; y=1;
2:r1=2; 2:r2=2; x=1; y=2;
2:r1=2; 2:r2=2; x=2; y=1;
2:r1=2; 2:r2=2; x=2; y=2;
Ok
Witnesses
Positive: 1 Negative: 23
Condition exists (x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)
Observation PeterZijlstra+o-r+o-r+a-o-SB Sometimes 1 23
Hash=c32afd1ac8bfee7d4b23a27e783d0998

So herd believes that it can happen.  I was also able to force the
web ppcmem tool (https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html)
to get into this state with the following sequence of choices:

[0;5;4;0;0;4;4;2;2;2;4;4;1;1;1;7;6;0;0;3;4;0;1;2;5;0;0]

So, yes, this can happen, architecturally at least.

							Thanx, Paul
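
For readers who want to check the herd listing above against both
exists clauses mechanically, here is a rough sketch in Python.  It is
not part of herd or ppcmem; the names parse_states, first_exists and
second_exists are made up for illustration.  It also assumes that herd
would list the same 24 final states for the earlier exists clause,
which seems reasonable because the program is identical and both
clauses mention the same four locations.

```python
# Final states copied verbatim from the herd output quoted in this message.
HERD_STATES = """
2:r1=0; 2:r2=0; x=1; y=1;
2:r1=0; 2:r2=0; x=1; y=2;
2:r1=0; 2:r2=0; x=2; y=1;
2:r1=0; 2:r2=0; x=2; y=2;
2:r1=0; 2:r2=1; x=1; y=1;
2:r1=0; 2:r2=1; x=1; y=2;
2:r1=0; 2:r2=1; x=2; y=1;
2:r1=0; 2:r2=1; x=2; y=2;
2:r1=0; 2:r2=2; x=1; y=1;
2:r1=0; 2:r2=2; x=1; y=2;
2:r1=0; 2:r2=2; x=2; y=1;
2:r1=0; 2:r2=2; x=2; y=2;
2:r1=1; 2:r2=1; x=1; y=1;
2:r1=1; 2:r2=1; x=1; y=2;
2:r1=1; 2:r2=1; x=2; y=1;
2:r1=1; 2:r2=1; x=2; y=2;
2:r1=1; 2:r2=2; x=2; y=1;
2:r1=1; 2:r2=2; x=2; y=2;
2:r1=2; 2:r2=1; x=1; y=1;
2:r1=2; 2:r2=1; x=1; y=2;
2:r1=2; 2:r2=2; x=1; y=1;
2:r1=2; 2:r2=2; x=1; y=2;
2:r1=2; 2:r2=2; x=2; y=1;
2:r1=2; 2:r2=2; x=2; y=2;
"""


def parse_states(text):
    """Turn lines like 'x=1; y=2;' into dicts like {'x': 1, 'y': 2}."""
    states = []
    for line in text.strip().splitlines():
        pairs = (item.strip().split("=") for item in line.split(";") if item.strip())
        states.append({name: int(value) for name, value in pairs})
    return states


def first_exists(s):
    # (((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)
    split_winners = (s["x"] == 1 and s["y"] == 2) or (s["x"] == 2 and s["y"] == 1)
    return split_winners and s["2:r1"] in (1, 2) and s["2:r2"] == 0


def second_exists(s):
    # (x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)
    return s["x"] == 2 and s["y"] == 1 and s["2:r1"] == 1 and s["2:r2"] == 1


states = parse_states(HERD_STATES)
first_hits = [s for s in states if first_exists(s)]
second_hits = [s for s in states if second_exists(s)]

# Number of states, matches for the first clause, matches for the second.
print(len(states), len(first_hits), len(second_hits))  # prints "24 0 1"
```

In the listing, every state with 2:r1 nonzero also has 2:r2 nonzero,
so no state satisfies the first clause, while exactly one state
satisfies the second.  That is consistent with the "Positive: 1
Negative: 23" line in the herd output.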