From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Ingo Molnar <mingo@elte.hu>,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Waiman Long <waiman.long@hp.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Alex Shi <alex.shi@linaro.org>, Andi Kleen <andi@firstfloor.org>,
Michel Lespinasse <walken@google.com>,
Davidlohr Bueso <davidlohr.bueso@hp.com>,
Matthew R Wilcox <matthew.r.wilcox@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
Rik van Riel <riel@redhat.com>,
Peter Hurley <peter@hurleysoftware.com>,
Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
George Spelvin <linux@horizon.com>,
"H. Peter Anvin" <hpa@zytor.com>, Arnd Bergmann <arnd@arndb.de>,
Aswin Chandramouleeswaran <aswin@hp.com>,
Scott J Norton <scott.norton@hp.com>,
"Figo.zhang" <figo1802@gmail.com>
Subject: Re: [PATCH v6 4/5] MCS Lock: Barrier corrections
Date: Fri, 22 Nov 2013 10:26:32 -0800 [thread overview]
Message-ID: <20131122182632.GW4138@linux.vnet.ibm.com> (raw)
In-Reply-To: <20131122155835.GR3866@twins.programming.kicks-ass.net>
On Fri, Nov 22, 2013 at 04:58:35PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 21, 2013 at 02:18:59PM -0800, Paul E. McKenney wrote:
> > > > Let's apply the Intel manual to the earlier example:
> > > >
> > > > CPU 0 CPU 1 CPU 2
> > > > ----- ----- -----
> > > > x = 1; r1 = SLA(lock); y = 1;
> > > > SSR(lock, 1); r2 = y; smp_mb();
> > > > r3 = x;
> > > >
> > > > assert(!(r1 == 1 && r2 == 0 && r3 == 0));
> > > >
> > > > Let's try applying this to x86:
> > > >
> > > > o Stores from a given processor are ordered, so everyone
> > > > agrees that CPU 0's store to x happens before the store-release
> > > > to lock.
> > > >
> > > > o Reads from a given processor are ordered, so everyone agrees
> > > > that CPU 1's load from lock precedes its load from y.
> > > >
> > > > o Because r1 == 1, we know that CPU 0's store to lock happened
> > > > before CPU 1's load from lock.
> > > >
> > > > o Causality (AKA transitive visibility) means that everyone
> > > > agrees that CPU 0's store to x happens before CPU 1's load
> > > > from y. (This is a crucial point, so it would be good to
> > > > have confirmation/debunking from someone who understands
> > > > the x86 architecture better than I do.)
> > > >
> > > > o CPU 2's memory barrier prohibits CPU 2's store to y from
> > > > being reordered with its load from x.
> > > >
> > > > o Because r2 == 0, we know that CPU 1's load from y happened
> > > > before CPU 2's store to y.
> > > >
> > > > o At this point, it looks to me that (r1 == 1 && r2 == 0)
> > > > implies r3 == 1.
>
> Agreed, and I now fully appreciate the transitive point. I can't say if
> x86 does in fact do this, but I can agree that rules in the SDM support
> your logic.
OK, thank you for looking it over!
> > > > Sewell's model plays out as follows:
>
> I have problems with these rules, for instance:
>
> > > > o Rules 3 and 4 force CPU 0's writes to be seen in order.
>
> Nothing in those rules state the store buffer is a strict FIFO, it might
> be suggested by rule 4's use of 'oldest', but being a pendant the text
> as given doesn't disallow store reordering in the store buffer.
>
> Suppose an address a was written to two times, a store buffer might
> simply update the entry for the first write with the new value. The
> entry would still become oldest at some point and get flushed.
>
> (Note the above is ambiguous in if the entry's time stamp is updated or
> not -- also note that both cases violate TSO and therefore it doesn't
> matter.)
>
> That violates FIFO (and TSO) but not the definitions.
>
> Similarly its not at all clear from rule 2 that reads are not
> re-ordered.
>
> So I'll ignore this section for now.
>
>
> OK, so reading back a little he does describe the abstract machine and
> does say the store buffer is FIFO, but what use are rules if you first
> need more 'rules'.
>
> Rules should be self supporting.
Better than the older versions, but yes, could be better. The key point
is that if a store updates an existing store buffer, that buffer becomes
the youngest. It would be interesting to see what a continuous series
of stores to the same variable from different CPUs did to the hardware.
By this set of rules, the hardware would be within its rights to never
propagate any of the stored values to memory. ;-)
> > > I _think_ so.. but its late. I'm also struggling to find where lwsync
> > > goes bad in this story, because as you say lwsync does all except flush
> > > the store buffer, which sounds like TSO.. but clearly is not quite the
> > > same.
> > >
> > > For one TSO has multi-copy atomicity, whereas ARM/PPC do not have this.
> >
> > At least PPC lwsync does not have multi-copy atomicity.
>
> Right, which is why it lacks transitivity.
>
> > The heavier sync
> > instruction does. Last I checked, ARM did not have a direct replacment
> > for lwsync, though it does have something sort of like eieio.
> >
> > But yes, we were trying to use lwsync for both smp_load_acquire() and
> > smp_store_release(), which does not provide exactly the same guarantees
> > that TSO does. Though it does come close in many cases.
>
> Right, and this lack of transitivity is what kills it.
I am thinking that Linus's email and Ingo's follow-on says that the
powerpc version of smp_store_release() needs to be sync. Though to
be honest, it is x86 that has been causing me the most cognitive pain.
And to be even more honest, I must confess that I still don't understand
the nature of the complaint. Ah well, that is the next email to
reply to. ;-)
The real source of my cognitive pain is that here we have a sequence of
code that has neither atomic instructions or memory-barrier instructions,
but it looks like it still manages to act as a full memory barrier.
Still not quite sure I should trust it...
> > > The explanation of lwsync given in 3.3 of 'A Tutorial Introduction to
> > > the ARM and POWER Relaxed Memory Models'
> > >
> > > http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf
> > >
> > > Leaves me slightly puzzled as to the exact differences between the 2 WW
> > > variants.
> >
> > The WW at the top of the page is discussing the ARM "dmb" instruction
> > and the PPC "sync" instruction, both of which are full barriers, and
> > both of which can therefore be thought of as flushing the store buffer.
> >
> > In contrast, the WW at the bottom of the page is discussing the PPC
> > lwsync instruction, which most definitely is not guaranteed to flush
> > the write buffer.
>
> Right, my complaint was that from the two definitions given the factual
> difference in behaviour was not clear to me.
>
> I knew that upgrading the SSR's lwsync to sync would 'fix' the thing,
> and that is exactly the WW case, but given these two definitions I was
> at a loss to find the hole.
Well, it certainly shows that placing lwsync between each pair of memory
references does not bring powerpc all the way to TSO.
Thanx, Paul
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2013-11-22 18:26 UTC|newest]
Thread overview: 116+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <cover.1384885312.git.tim.c.chen@linux.intel.com>
2013-11-20 1:37 ` [PATCH v6 0/5] MCS Lock: MCS lock code cleanup and optimizations Tim Chen
2013-11-20 10:19 ` Will Deacon
2013-11-20 12:50 ` Paul E. McKenney
2013-11-20 17:00 ` Will Deacon
2013-11-20 17:14 ` Paul E. McKenney
2013-11-20 17:00 ` Tim Chen
2013-11-20 17:16 ` Paul E. McKenney
2013-11-20 1:37 ` [PATCH v6 1/5] MCS Lock: Restructure the MCS lock defines and locking code into its own file Tim Chen
2013-11-20 1:37 ` [PATCH v6 2/5] MCS Lock: optimizations and extra comments Tim Chen
2013-11-20 1:37 ` [PATCH v6 3/5] MCS Lock: Move mcs_lock/unlock function into its own file Tim Chen
2013-11-20 1:37 ` [PATCH v6 4/5] MCS Lock: Barrier corrections Tim Chen
2013-11-20 15:31 ` Paul E. McKenney
2013-11-20 15:46 ` Will Deacon
2013-11-20 17:14 ` Paul E. McKenney
2013-11-20 18:43 ` Tim Chen
2013-11-20 19:06 ` Paul E. McKenney
2013-11-20 20:36 ` Tim Chen
2013-11-20 21:44 ` Paul E. McKenney
2013-11-20 23:51 ` Tim Chen
2013-11-21 4:53 ` Paul E. McKenney
2013-11-21 10:17 ` Will Deacon
2013-11-21 13:16 ` Paul E. McKenney
2013-11-21 10:45 ` Peter Zijlstra
2013-11-21 13:18 ` Paul E. McKenney
2013-11-21 22:27 ` Linus Torvalds
2013-11-21 22:52 ` Paul E. McKenney
2013-11-22 0:09 ` Linus Torvalds
2013-11-22 4:08 ` Paul E. McKenney
2013-11-22 4:25 ` Linus Torvalds
2013-11-22 6:23 ` Paul E. McKenney
2013-11-22 15:16 ` Ingo Molnar
2013-11-22 18:49 ` Paul E. McKenney
2013-11-22 19:06 ` Linus Torvalds
2013-11-22 20:06 ` Paul E. McKenney
2013-11-22 20:09 ` Linus Torvalds
2013-11-22 20:37 ` Paul E. McKenney
2013-11-22 21:01 ` Linus Torvalds
2013-11-22 21:52 ` Paul E. McKenney
2013-11-22 22:19 ` Linus Torvalds
2013-11-23 0:25 ` Paul E. McKenney
2013-11-23 0:42 ` Linus Torvalds
2013-11-23 1:36 ` Paul E. McKenney
2013-11-23 2:11 ` Linus Torvalds
2013-11-23 4:05 ` Paul E. McKenney
2013-11-23 11:24 ` Ingo Molnar
2013-11-23 17:06 ` Paul E. McKenney
2013-11-26 12:02 ` Ingo Molnar
2013-11-26 19:28 ` Paul E. McKenney
2013-11-23 20:21 ` Linus Torvalds
2013-11-23 20:39 ` Linus Torvalds
2013-11-25 12:09 ` Peter Zijlstra
2013-11-25 17:18 ` Will Deacon
2013-11-25 17:56 ` Paul E. McKenney
2013-11-25 17:54 ` Paul E. McKenney
2013-11-23 21:29 ` Peter Zijlstra
2013-11-23 22:24 ` Linus Torvalds
2013-11-25 17:53 ` Paul E. McKenney
2013-11-25 18:21 ` Peter Zijlstra
2013-11-21 11:03 ` Peter Zijlstra
2013-11-21 12:56 ` Peter Zijlstra
2013-11-21 13:20 ` Paul E. McKenney
2013-11-21 17:25 ` Paul E. McKenney
2013-11-21 21:52 ` Peter Zijlstra
2013-11-21 22:18 ` Paul E. McKenney
2013-11-22 15:58 ` Peter Zijlstra
2013-11-22 18:26 ` Paul E. McKenney [this message]
2013-11-22 18:51 ` Peter Zijlstra
2013-11-22 18:59 ` Paul E. McKenney
2013-11-25 17:35 ` Peter Zijlstra
2013-11-25 18:02 ` Paul E. McKenney
2013-11-25 18:24 ` Peter Zijlstra
2013-11-25 18:34 ` Tim Chen
2013-11-25 18:27 ` Peter Zijlstra
2013-11-25 23:52 ` Paul E. McKenney
2013-11-26 9:59 ` Peter Zijlstra
2013-11-26 17:11 ` Paul E. McKenney
2013-11-26 17:18 ` Peter Zijlstra
2013-11-26 19:00 ` Linus Torvalds
2013-11-26 19:20 ` Paul E. McKenney
2013-11-26 19:32 ` Linus Torvalds
2013-11-26 22:51 ` Paul E. McKenney
2013-11-26 23:58 ` Linus Torvalds
2013-11-27 0:21 ` Thomas Gleixner
2013-11-27 0:39 ` Paul E. McKenney
2013-11-27 1:05 ` Linus Torvalds
2013-11-27 1:31 ` Paul E. McKenney
2013-11-27 10:16 ` Will Deacon
2013-11-27 17:11 ` Paul E. McKenney
2013-11-28 11:40 ` Will Deacon
2013-11-28 17:38 ` Paul E. McKenney
2013-11-28 18:03 ` Will Deacon
2013-11-28 18:27 ` Paul E. McKenney
2013-11-28 18:53 ` Will Deacon
2013-11-28 19:50 ` Paul E. McKenney
2013-11-29 16:17 ` Will Deacon
2013-11-29 16:44 ` Linus Torvalds
2013-11-29 18:18 ` Will Deacon
2013-11-30 17:38 ` Paul E. McKenney
2013-11-26 19:21 ` Peter Zijlstra
2013-11-27 16:58 ` Oleg Nesterov
2013-11-26 23:08 ` Benjamin Herrenschmidt
2013-11-25 23:55 ` H. Peter Anvin
2013-11-26 3:16 ` Paul E. McKenney
2013-11-27 0:46 ` H. Peter Anvin
2013-11-27 1:07 ` Linus Torvalds
2013-11-27 1:27 ` Paul E. McKenney
2013-11-27 2:59 ` H. Peter Anvin
2013-11-25 18:52 ` H. Peter Anvin
2013-11-25 22:58 ` Tim Chen
2013-11-25 23:28 ` H. Peter Anvin
2013-11-25 23:51 ` Paul E. McKenney
2013-11-25 23:36 ` Paul E. McKenney
2013-12-04 21:26 ` Andi Kleen
2013-12-04 22:07 ` Paul E. McKenney
2013-11-21 13:19 ` Paul E. McKenney
2013-11-20 1:37 ` [PATCH v6 5/5] MCS Lock: Allows for architecture specific mcs lock and unlock Tim Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131122182632.GW4138@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.shi@linaro.org \
--cc=andi@firstfloor.org \
--cc=arnd@arndb.de \
--cc=aswin@hp.com \
--cc=dave.hansen@intel.com \
--cc=davidlohr.bueso@hp.com \
--cc=figo1802@gmail.com \
--cc=hpa@zytor.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@horizon.com \
--cc=matthew.r.wilcox@intel.com \
--cc=mingo@elte.hu \
--cc=peter@hurleysoftware.com \
--cc=peterz@infradead.org \
--cc=raghavendra.kt@linux.vnet.ibm.com \
--cc=riel@redhat.com \
--cc=scott.norton@hp.com \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=waiman.long@hp.com \
--cc=walken@google.com \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).