From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
Ingo Molnar <mingo@kernel.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will.deacon@arm.com>,
Waiman Long <waiman.long@hp.com>,
Davidlohr Bueso <dave@stgolabs.net>,
stable@vger.kernel.org
Subject: Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier
Date: Thu, 15 Oct 2015 09:30:40 -0700
Message-ID: <20151015163040.GJ3910@linux.vnet.ibm.com>
In-Reply-To: <20151015044803.GC29432@fixme-laptop.cn.ibm.com>
On Thu, Oct 15, 2015 at 12:48:03PM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 08:07:05PM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> [snip]
> > >
> > > I'm afraid it's more than that: the above litmus test also shows that
> > >
> > > CPU 0                           CPU 1
> > > -----                           -----
> > >
> > > WRITE_ONCE(x, 1);               WRITE_ONCE(a, 2);
> > > r3 = xchg_release(&a, 1);       smp_mb();
> > >                                 r3 = READ_ONCE(x);
> > >
> > > (0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibited
> > >
> > > in the implementation of this patchset, which should be disallowed by
> > > the semantics of RELEASE, right?
> >
> > Not necessarily. If you had the read first on CPU 1, and you had a
> > similar problem, I would be more worried.
> >
>
> Sometimes I think maybe we should say that a single unpaired ACQUIRE or
> RELEASE doesn't have any order guarantee because of the above case.
>
> But it seems that's not a normal or even an existing case, my bad ;-(
>
> > > And even:
> > >
> > > CPU 0                           CPU 1
> > > -----                           -----
> > >
> > > WRITE_ONCE(x, 1);               WRITE_ONCE(a, 2);
> > > smp_store_release(&a, 1);       smp_mb();
> > >                                 r3 = READ_ONCE(x);
> > >
> > > (1:r3 == 0 && a == 2) is not prohibited
> > >
> > > as shown by:
> > >
> > > PPC weird-lwsync
> > > ""
> > > {
> > > 0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
> > > 1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
> > > }
> > > P0 | P1 ;
> > > stw r1,0(r2) | stw r1,0(r12) ;
> > > lwsync | sync ;
> > > stw r1,0(r12) | lwz r3,0(r2) ;
> > > exists
> > > (a=2 /\ 1:r3=0)
> > >
> > > Please point out anything I'm (or the tool is) missing. Maybe we can't
> > > use (a == 2) as an indication that the STORE on CPU 1 happens after the
> > > STORE on CPU 0?
> >
> > Again, if you were pairing the smp_store_release() with an smp_load_acquire()
> > or even a READ_ONCE() followed by a barrier, I would be quite concerned.
> > I am not at all worried about the above two litmus tests.
> >
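The pairing Paul describes can be illustrated with a user-space C11 sketch of the classic message-passing pattern. This is an analogue under stated assumptions, not kernel code. A memory_order_release store and a memory_order_acquire load stand in for smp_store_release() and smp_load_acquire(), and relaxed atomics stand in for WRITE_ONCE()/READ_ONCE(). The names writer(), reader() and mp_forbidden_count() are hypothetical. Because the acquire load pairs with the release store, a reader that sees flag == 1 must also see x == 1. The count of (flag == 1 && x == 0) outcomes is therefore zero under the C11 model.

```c
/* Hypothetical user-space analogue of release/acquire message passing. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

static atomic_int x, flag;
static int r_flag, r_x;   /* results, read by the main thread after join */

static void *writer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);    /* ~ WRITE_ONCE(x, 1) */
	atomic_store_explicit(&flag, 1, memory_order_release); /* ~ smp_store_release(&flag, 1) */
	return NULL;
}

static void *reader(void *arg)
{
	(void)arg;
	r_flag = atomic_load_explicit(&flag, memory_order_acquire); /* ~ smp_load_acquire(&flag) */
	r_x = atomic_load_explicit(&x, memory_order_relaxed);       /* ~ READ_ONCE(x) */
	return NULL;
}

/* Runs the test 'iters' times; returns how often flag == 1 && x == 0 was seen. */
static long mp_forbidden_count(int iters)
{
	long bad = 0;

	for (int i = 0; i < iters; i++) {
		pthread_t tw, tr;

		atomic_store(&x, 0);
		atomic_store(&flag, 0);
		if (pthread_create(&tw, NULL, writer, NULL) ||
		    pthread_create(&tr, NULL, reader, NULL)) {
			fprintf(stderr, "pthread_create failed\n");
			exit(1);
		}
		pthread_join(tw, NULL);
		pthread_join(tr, NULL);
		if (r_flag == 1 && r_x == 0)
			bad++;
	}
	return bad;
}
```

A zero count from a stress run is consistent with the guarantee but does not prove it. Tools such as herd or ppcmem, as used in this thread, enumerate all allowed executions instead.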
>
> Understood, thank you for thinking through that ;-)
>
> > > And there is really something I find strange, see below.
> > >
> > > > >
> > > > > So the scenario that would fail would be this one, right?
> > > > >
> > > > > a = x = 0
> > > > >
> > > > > CPU0                        CPU1
> > > > >
> > > > > r3 = load_locked (&a);
> > > > >                             a = 2;
> > > > >                             sync();
> > > > >                             r3 = x;
> > > > > x = 1;
> > > > > lwsync();
> > > > > if (!store_cond(&a, 1))
> > > > >     goto again
> > > > >
> > > > >
> > > > > Where we hoist the load way up because lwsync allows this.
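Peter's scenario concerns an xchg with RELEASE semantics, built as lwsync followed by an lwarx/stwcx. retry loop. A rough C11 analogue is sketched below. It assumes a release fence stands in for lwsync and atomic_compare_exchange_weak_explicit() stands in for the lwarx/stwcx. pair, which, like stwcx., may fail spuriously and must be retried. xchg_release_sketch() is a hypothetical name, not the powerpc implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical sketch: xchg with RELEASE semantics as a release fence
 * (standing in for lwsync) followed by a relaxed LL/SC-style retry loop.
 * On failure, compare_exchange_weak reloads 'old', which plays the role
 * of "goto again" re-issuing the load.
 */
static int xchg_release_sketch(atomic_int *p, int newv)
{
	int old;

	atomic_thread_fence(memory_order_release);           /* ~ lwsync */
	old = atomic_load_explicit(p, memory_order_relaxed); /* ~ lwarx */
	while (!atomic_compare_exchange_weak_explicit(p, &old, newv,
						      memory_order_relaxed,
						      memory_order_relaxed))
		;                                            /* ~ failed stwcx.: retry */
	return old;
}
```

As with lwsync, the release fence roughly orders earlier accesses before the later store. It places no constraint on the later load relative to earlier stores, which is what lets the load be satisfied early in Peter's scenario. The retry loop guarantees only that the read-modify-write on a is atomic.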
> > > >
> > > > That scenario would end up with a==1 rather than a==2.
> > > >
> > > > > I always thought this would fail because CPU1's store to @a would fail
> > > > > the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> > > > > load and now seeing the new value (2).
> > > >
> > > > The stwcx. failure was one thing that prevented a number of other
> > > > misordering cases. The problem is that we have to let go of the notion
> > > > of an implicit global clock.
> > > >
> > > > To that end, the herd tool can make a diagram of what it thought
> > > > happened, and I have attached it. I used this diagram to try and force
> > > > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > > > and succeeded. Here is the sequence of events:
> > > >
> > > > o Commit P0's write. The model offers to propagate this write
> > > > to the coherence point and to P1, but don't do so yet.
> > > >
> > > > o Commit P1's write. Similar offers, but don't take them up yet.
> > > >
> > > > o Commit P0's lwsync.
> > > >
> > > > o Execute P0's lwarx, which reads a=0. Then commit it.
> > > >
> > > > o Commit P0's stwcx. as successful. This stores a=1.
> > > >
> > > > o Commit P0's branch (not taken).
> > >
> > > So at this point, P0's write to 'a' has propagated to P1, right? But
> > > P0's write to 'x' hasn't, even though there is an lwsync between them, right?
> > > Doesn't the lwsync prevent this from happening?
> >
> > No, because lwsync is quite a bit weaker than sync aside from just
> > the store-load ordering.
> >
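Paul's point that lwsync is weaker than sync shows up most clearly in the store-buffering pattern. The sketch below assumes the commonly cited C11-to-POWER mapping: atomic_thread_fence(memory_order_seq_cst) compiles to sync (hwsync), and an acquire/release fence compiles to lwsync. The function names are hypothetical. With seq_cst fences on both sides, C11 forbids r0 == 0 && r1 == 0. With acq_rel fences it allows that outcome, mirroring lwsync's lack of store-load ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

static atomic_int x, y;
static int r0, r1;
static int use_sc_fence; /* 1: seq_cst fence (~sync), 0: acq_rel fence (~lwsync) */

static void fence(void)
{
	/* Constant orders: some compilers treat a non-constant order as seq_cst. */
	if (use_sc_fence)
		atomic_thread_fence(memory_order_seq_cst);
	else
		atomic_thread_fence(memory_order_acq_rel);
}

static void *p0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	fence();
	r0 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

static void *p1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	fence();
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Counts runs ending with r0 == 0 && r1 == 0 (the store-buffering outcome). */
static long sb_both_zero(int sc, int iters)
{
	long hits = 0;

	use_sc_fence = sc;
	for (int i = 0; i < iters; i++) {
		pthread_t t0, t1;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		if (pthread_create(&t0, NULL, p0, NULL) ||
		    pthread_create(&t1, NULL, p1, NULL)) {
			fprintf(stderr, "pthread_create failed\n");
			exit(1);
		}
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r0 == 0 && r1 == 0)
			hits++;
	}
	return hits;
}
```

Whether the acq_rel variant actually produces r0 == 0 && r1 == 0 on a given machine depends on the hardware and on timing. Only the seq_cst result is treated as a guarantee.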
>
> Understood, I've tried ppcmem; it's much clearer now ;-)
>
> > > If P0's write to 'a' hasn't propagated at this point, then when does it?
> >
> > Later. At the very end of the test, in this case. ;-)
> >
>
> Hmm... I tried exactly this sequence in ppcmem, and it seems that
> propagation of P0's write to 'a' is never an option...
>
> > Why not try creating a longer litmus test that requires P0's write to
> > "a" to propagate to P1 before both processes complete?
> >
>
> I will try to write one, but to be clear, you mean we still observe
>
> 0:r3 == 0 && a == 2 && 1:r3 == 0
>
> at the end, right? Because, as I understand it, if P1's write to 'a'
> doesn't overwrite P0's, then P0's write to 'a' will propagate.
Your choice. My question is whether you can come up with a similar
litmus test where lwsync allows the behavior here but clearly
affects some other aspect of ordering.
Thanx, Paul