All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Ingo Molnar <mingo@kernel.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will.deacon@arm.com>,
	Waiman Long <waiman.long@hp.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	stable@vger.kernel.org
Subject: Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier
Date: Wed, 14 Oct 2015 20:07:56 -0700	[thread overview]
Message-ID: <20151015030756.GE3910@linux.vnet.ibm.com> (raw)
In-Reply-To: <20151015012226.GC14305@fixme-laptop.cn.ibm.com>

On Thu, Oct 15, 2015 at 09:22:26AM +0800, Boqun Feng wrote:
> On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> > On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> > > On Wed, Oct 14, 2015 at 11:04:19PM +0200, Peter Zijlstra wrote:
> > > > On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > > > > Suppose we have something like the following, where "a" and "x" are both
> > > > > initially zero:
> > > > > 
> > > > > 	CPU 0				CPU 1
> > > > > 	-----				-----
> > > > > 
> > > > > 	WRITE_ONCE(x, 1);		WRITE_ONCE(a, 2);
> > > > > 	r3 = xchg(&a, 1);		smp_mb();
> > > > > 					r3 = READ_ONCE(x);
> > > > > 
> > > > > If xchg() is fully ordered, we should never observe both CPUs'
> > > > > r3 values being zero, correct?
> > > > > 
> > > > > And wouldn't this be represented by the following litmus test?
> > > > > 
> > > > > 	PPC SB+lwsync-RMW2-lwsync+st-sync-leading
> > > > > 	""
> > > > > 	{
> > > > > 	0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
> > > > > 	1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
> > > > > 	}
> > > > > 	 P0                 | P1                 ;
> > > > > 	 stw r1,0(r2)       | stw r1,0(r12)      ;
> > > > > 	 lwsync             | sync               ;
> > > > > 	 lwarx  r11,r10,r12 | lwz r3,0(r2)       ;
> > > > > 	 stwcx. r1,r10,r12  | ;
> > > > > 	 bne Fail0          | ;
> > > > > 	 mr r3,r11          | ;
> > > > > 	 Fail0:             | ;
> > > > > 	exists
> > > > > 	(0:r3=0 /\ a=2 /\ 1:r3=0)
> > > > > 
> > > > > I left off P0's trailing sync because there is nothing for it to order
> > > > > against in this particular litmus test.  I tried adding it and verified
> > > > > that it has no effect.
> > > > > 
> > > > > Am I missing something here?  If not, it seems to me that you need
> > > > > the leading lwsync to instead be a sync.
> > 
> > I'm afraid more than that, the above litmus also shows that
> 
> I mean there will be more things we need to fix, perhaps even smp_wmb()
> need to be sync then..

That should not be necessary.  For smp_wmb(), lwsync should be just fine.

								Thanx, Paul

> Regards,
> Boqun
> 
> > 	CPU 0				CPU 1
> > 	-----				-----
> > 
> > 	WRITE_ONCE(x, 1);		WRITE_ONCE(a, 2);
> > 	r3 = xchg_release(&a, 1);	smp_mb();
> > 					r3 = READ_ONCE(x);
> > 
> > 	(0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibitted
> > 
> > in the implementation of this patchset, which should be disallowed by
> > the semantics of RELEASE, right?
> > 
> > And even:
> > 
> > 	CPU 0				CPU 1
> > 	-----				-----
> > 
> > 	WRITE_ONCE(x, 1);		WRITE_ONCE(a, 2);
> > 	smp_store_release(&a, 1);	smp_mb();
> > 					r3 = READ_ONCE(x);
> > 
> > 	(1:r3 == 0 && a == 2) is not prohibitted
> > 
> > shows by:
> > 
> > 	PPC weird-lwsync
> > 	""
> > 	{
> > 	0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
> > 	1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
> > 	}
> > 	 P0                 | P1                 ;
> > 	 stw r1,0(r2)       | stw r1,0(r12)      ;
> > 	 lwsync             | sync               ;
> > 	 stw  r1,0(r12)	    | lwz r3,0(r2)       ;
> > 	exists
> > 	(a=2 /\ 1:r3=0)
> > 
> > 
> > Please find something I'm (or the tool is) missing, maybe we can't use
> > (a == 2) as a indication that STORE on CPU 1 happens after STORE on CPU
> > 0?
> > 
> > And there is really something I find strange, see below.
> > 
> > > > 
> > > > So the scenario that would fail would be this one, right?
> > > > 
> > > > a = x = 0
> > > > 
> > > > 	CPU0				CPU1
> > > > 
> > > > 	r3 = load_locked (&a);
> > > > 					a = 2;
> > > > 					sync();
> > > > 					r3 = x;
> > > > 	x = 1;
> > > > 	lwsync();
> > > > 	if (!store_cond(&a, 1))
> > > > 		goto again
> > > > 
> > > > 
> > > > Where we hoist the load way up because lwsync allows this.
> > > 
> > > That scenario would end up with a==1 rather than a==2.
> > > 
> > > > I always thought this would fail because CPU1's store to @a would fail
> > > > the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> > > > load and now seeing the new value (2).
> > > 
> > > The stwcx. failure was one thing that prevented a number of other
> > > misordering cases.  The problem is that we have to let go of the notion
> > > of an implicit global clock.
> > > 
> > > To that end, the herd tool can make a diagram of what it thought
> > > happened, and I have attached it.  I used this diagram to try and force
> > > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > > and succeeded.  Here is the sequence of events:
> > > 
> > > o	Commit P0's write.  The model offers to propagate this write
> > > 	to the coherence point and to P1, but don't do so yet.
> > > 
> > > o	Commit P1's write.  Similar offers, but don't take them up yet.
> > > 
> > > o	Commit P0's lwsync.
> > > 
> > > o	Execute P0's lwarx, which reads a=0.  Then commit it.
> > > 
> > > o	Commit P0's stwcx. as successful.  This stores a=1.
> > > 
> > > o	Commit P0's branch (not taken).
> > > 
> > 
> > So at this point, P0's write to 'a' has propagated to P1, right? But
> > P0's write to 'x' hasn't, even there is a lwsync between them, right?
> > Doesn't the lwsync prevent this from happening?
> > 
> > If at this point P0's write to 'a' hasn't propagated then when?
> > 
> > Regards,
> > Boqun
> > 
> > > o	Commit P0's final register-to-register move.
> > > 
> > > o	Commit P1's sync instruction.
> > > 
> > > o	There is now nothing that can happen in either processor.
> > > 	P0 is done, and P1 is waiting for its sync.  Therefore,
> > > 	propagate P1's a=2 write to the coherence point and to
> > > 	the other thread.
> > > 
> > > o	There is still nothing that can happen in either processor.
> > > 	So pick the barrier propagate, then the acknowledge sync.
> > > 
> > > o	P1 can now execute its read from x.  Because P0's write to
> > > 	x is still waiting to propagate to P1, this still reads
> > > 	x=0.  Execute and commit, and we now have both r3 registers
> > > 	equal to zero and the final value a=2.
> > > 
> > > o	Clean up by propagating the write to x everywhere, and
> > > 	propagating the lwsync.
> > > 
> > > And the "exists" clause really does trigger: 0:r3=0; 1:r3=0; [a]=2;
> > > 
> > > I am still not 100% confident of my litmus test.  It is quite possible
> > > that I lost something in translation, but that is looking less likely.
> > > 
> > > > > Of course, if I am not missing something, then this applies also to the
> > > > > value-returning RMW atomic operations that you pulled this pattern from.
> > > > > If so, it would seem that I didn't think through all the possibilities
> > > > > back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
> > > > > that I worried about the RMW atomic operation acting as a barrier,
> > > > > but not as the load/store itself.  :-/
> > > > 
> > > > AARGH64 does something very similar; it does something like:
> > > > 
> > > > 	ll
> > > > 	...
> > > > 	sc-release
> > > > 
> > > > 	mb
> > > > 
> > > > Which I assumed worked for the same reason, any change to the variable
> > > > would fail the sc, and we go for round 2, now observing the new value.
> > > 
> > > I have to defer to Will on this one.  You are right that ARM and PowerPC
> > > do have similar memory models, but there are some differences.
> > > 
> > > 							Thanx, Paul
> > 
> > 

  reply	other threads:[~2015-10-15  3:08 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-14 15:55 [PATCH tip/locking/core v4 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics Boqun Feng
2015-10-14 15:55 ` [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier Boqun Feng
2015-10-14 20:19   ` Paul E. McKenney
2015-10-14 21:04     ` Peter Zijlstra
2015-10-14 21:44       ` Paul E. McKenney
2015-10-15  0:53         ` Boqun Feng
2015-10-15  1:22           ` Boqun Feng
2015-10-15  3:07             ` Paul E. McKenney [this message]
2015-10-15  3:07           ` Paul E. McKenney
2015-10-15  4:48             ` Boqun Feng
2015-10-15 16:30               ` Paul E. McKenney
2015-10-19  0:19                 ` Boqun Feng
2015-10-15  3:11           ` Boqun Feng
2015-10-15  3:33             ` Paul E. McKenney
2015-10-15 10:35         ` Will Deacon
2015-10-15 14:40           ` Boqun Feng
2015-10-15 14:50           ` Will Deacon
2015-10-15 16:29             ` Paul E. McKenney
2015-10-15 15:42           ` Paul E. McKenney
2015-10-15 14:49     ` Boqun Feng
2015-10-15 16:16       ` Paul E. McKenney
2015-10-20  7:15     ` Boqun Feng
2015-10-20  9:21       ` Peter Zijlstra
2015-10-20 21:28         ` Paul E. McKenney
2015-10-21  8:18           ` Peter Zijlstra
2015-10-21 19:36             ` Paul E. McKenney
2015-10-26  2:06               ` Boqun Feng
2015-10-26  2:20               ` Michael Ellerman
2015-10-26  8:55                 ` Boqun Feng
2015-10-26  3:20             ` Paul Mackerras
2015-10-26  8:58               ` Boqun Feng
2015-10-21  8:45           ` Boqun Feng
2015-10-21 19:35             ` Paul E. McKenney
2015-10-21 19:48               ` Peter Zijlstra
2015-10-22 12:07                 ` Boqun Feng
2015-10-24 10:26                   ` Peter Zijlstra
2015-10-24 11:53                     ` Boqun Feng
2015-10-25 13:14                       ` Boqun Feng
2015-10-14 15:55 ` [PATCH tip/locking/core v4 2/6] atomics: Add test for atomic operations with _relaxed variants Boqun Feng
2015-10-14 15:55 ` [PATCH tip/locking/core v4 3/6] atomics: Allow architectures to define their own __atomic_op_* helpers Boqun Feng
2015-10-14 15:55 ` [PATCH tip/locking/core v4 4/6] powerpc: atomic: Implement atomic{, 64}_*_return_* variants Boqun Feng
2015-10-14 15:55   ` [PATCH tip/locking/core v4 4/6] powerpc: atomic: Implement atomic{,64}_*_return_* variants Boqun Feng
2015-10-14 15:56 ` [PATCH tip/locking/core v4 5/6] powerpc: atomic: Implement xchg_* and atomic{, 64}_xchg_* variants Boqun Feng
2015-10-14 15:56   ` [PATCH tip/locking/core v4 5/6] powerpc: atomic: Implement xchg_* and atomic{,64}_xchg_* variants Boqun Feng
2015-10-14 15:56 ` [PATCH tip/locking/core v4 6/6] powerpc: atomic: Implement cmpxchg{, 64}_* and atomic{, 64}_cmpxchg_* variants Boqun Feng
2015-10-14 15:56   ` [PATCH tip/locking/core v4 6/6] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants Boqun Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151015030756.GE3910@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=boqun.feng@gmail.com \
    --cc=dave@stgolabs.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=waiman.long@hp.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.