linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Will Deacon <will.deacon@arm.com>, Ingo Molnar <mingo@elte.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	Waiman Long <waiman.long@hp.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Alex Shi <alex.shi@linaro.org>, Andi Kleen <andi@firstfloor.org>,
	Michel Lespinasse <walken@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	Matthew R Wilcox <matthew.r.wilcox@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Rik van Riel <riel@redhat.com>,
	Peter Hurley <peter@hurleysoftware.com>,
	Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>,
	George Spelvin <linux@horizon.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Arnd Bergmann <arnd@arndb.de>,
	Aswin Chandramouleeswaran <aswin@hp.com>,
	Scott J Norton <scott.norton@hp.com>,
	"Figo.zhang" <figo1802@gmail.com>
Subject: Re: [PATCH v6 4/5] MCS Lock: Barrier corrections
Date: Fri, 22 Nov 2013 17:36:54 -0800	[thread overview]
Message-ID: <20131123013654.GG4138@linux.vnet.ibm.com> (raw)
In-Reply-To: <CA+55aFy8kx1qaWszc9nrbUaqFu7GfTtDkpzPBeE2g2U6RZjYkA@mail.gmail.com>

On Fri, Nov 22, 2013 at 04:42:37PM -0800, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 4:25 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> >
> > Start with Tim Chen's most recent patches for MCS locking, the ones that
> > do the lock handoff using smp_store_release() and smp_load_acquire().
> > Add to that Peter Zijlstra's patch that uses PowerPC lwsync for both
> > smp_store_release() and smp_load_acquire().  Run the resulting lock
> > at high contention, so that all lock handoffs are done via the queue.
> > Then you will have something that acts like a lock from the viewpoint
> > of CPU holding that lock, but which does -not- guarantee that an
> > unlock+lock acts like a full memory barrier if the unlock and lock run
> > on two different CPUs, and if the observer is running on a third CPU.
> 
> Umm. If the unlock and the lock run on different CPU's, then the lock
> handoff cannot be done through the queue (I assume that what you mean
> by "queue" is the write buffer).

No, I mean by the MCS lock's queue of waiters.  Software, not hardware.

You know, this really isn't all -that- difficult.

Here is how Tim's MCS lock hands off to the next requester on the queue:

+       smp_store_release(&next->locked, 1);                            \

Given Peter's powerpc implementation, this is an lwsync followed by
a store.

Here is how Tim's MCS lock has the next requester take the handoff:

+       while (!(smp_load_acquire(&node->locked)))                      \
+               arch_mutex_cpu_relax();                                 \

Given Peter's powerpc implementation, this is a load followed by an
lwsync.

So a lock handoff looks like this, where the variable lock is initially 1
(held by CPU 0):

	CPU 0 (releasing)	CPU 1 (acquiring)
	-----			-----
	CS0			while (ACCESS_ONCE(lock) == 1)
	lwsync				continue;
	ACCESS_ONCE(lock) = 0;	lwsync
				CS1

Because lwsync orders both loads and stores before stores, CPU 0's
lwsync does the ordering required to keep CS0 from bleeding out.
Because lwsync orders loads before both loads and stores, CPU 1's lwsync
does the ordering required to keep CS1 from bleeding out.  It even works
transitively because we use the same lock variable throughout, all
from the perspective of a CPU holding "lock".

Therefore, Tim's MCS lock combined with Peter's powerpc implementations
of smp_load_acquire() and smp_store_release() really does act like a
lock from the viewpoint of whoever is holding the lock.

But this does -not- guarantee that some other non-lock-holding CPU 2 will
see CS0 and CS1 in order.  To see this, let's fill in the two critical
sections, where variables X and Y are both initially zero:

	CPU 0 (releasing)	CPU 1 (acquiring)
	-----			-----
	ACCESS_ONCE(X) = 1;	while (ACCESS_ONCE(lock) == 1)
	lwsync				continue;
	ACCESS_ONCE(lock) = 0;	lwsync
				r1 = ACCESS_ONCE(Y);

Then let's add in the observer CPU 2:

	CPU 2
	-----
	ACCESS_ONCE(Y) = 1;
	sync
	r2 = ACCESS_ONCE(X);

If unlock+lock act as a full memory barrier, it would be impossible to
end up with (r1 == 0 && r2 == 0).  After all, (r1 == 0) implies that
CPU 2's store to Y happened after CPU 1's load from Y, and (r2 == 0)
implies that CPU 0's load from X happened after CPU 2's store to X.
If CPU 0's unlock combined with CPU 1's lock really acted like a full
memory barrier, we end up with CPU 0's load happening before CPU 1's
store happening before CPU 2's store happening before CPU 2's load
happening before CPU 0's load.

However, the outcome (r1 == 0 && r2 == 0) really does happen both
in theory and on real hardware.  Therefore, although this acts as
a lock from the viewpoint of a CPU holding the lock, the unlock+lock
combination does -not- act as a full memory barrier.

So there is your example.  It really can and does happen.

Again, easy fix.  Just change powerpc's smp_store_release() from lwsync
to smp_mb().  That fixes the problem and doesn't hurt anyone but powerpc.

OK?

							Thanx, Paul

> And yes, the write buffer is why running unlock+lock on the *same* CPU
> is a special case and can generate more re-ordering than is visible
> externally (and I generally do agree that we should strive for
> serialization at that point), but even it does not actually violate
> the rules mentioned in Documentation/memory-barriers.txt wrt an
> external CPU because the write that releases the lock isn't actually
> visible at that point in the cache, and if the same CPU re-aquires it
> by doing a read that bypasses the write and hits in the write buffer
> or the unlock, the unlocked state in between won't even be seen
> outside of that CPU.
> 
> See? The local write buffer is special. It very much bypasses the
> cache, but the thing about it is that it's local to that CPU.
> 
> Now, I do have to admit that cache coherency protocols are really
> subtle, and there may be something else I'm missing, but the thing you
> brought up is not one of those things, afaik.
> 
>               Linus
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-11-23  1:37 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1384885312.git.tim.c.chen@linux.intel.com>
2013-11-20  1:37 ` [PATCH v6 0/5] MCS Lock: MCS lock code cleanup and optimizations Tim Chen
2013-11-20 10:19   ` Will Deacon
2013-11-20 12:50     ` Paul E. McKenney
2013-11-20 17:00       ` Will Deacon
2013-11-20 17:14         ` Paul E. McKenney
2013-11-20 17:00     ` Tim Chen
2013-11-20 17:16       ` Paul E. McKenney
2013-11-20  1:37 ` [PATCH v6 1/5] MCS Lock: Restructure the MCS lock defines and locking code into its own file Tim Chen
2013-11-20  1:37 ` [PATCH v6 2/5] MCS Lock: optimizations and extra comments Tim Chen
2013-11-20  1:37 ` [PATCH v6 3/5] MCS Lock: Move mcs_lock/unlock function into its own file Tim Chen
2013-11-20  1:37 ` [PATCH v6 4/5] MCS Lock: Barrier corrections Tim Chen
2013-11-20 15:31   ` Paul E. McKenney
2013-11-20 15:46     ` Will Deacon
2013-11-20 17:14       ` Paul E. McKenney
2013-11-20 18:43         ` Tim Chen
2013-11-20 19:06           ` Paul E. McKenney
2013-11-20 20:36             ` Tim Chen
2013-11-20 21:44               ` Paul E. McKenney
2013-11-20 23:51                 ` Tim Chen
2013-11-21  4:53                   ` Paul E. McKenney
2013-11-21 10:17                     ` Will Deacon
2013-11-21 13:16                       ` Paul E. McKenney
2013-11-21 10:45                     ` Peter Zijlstra
2013-11-21 13:18                       ` Paul E. McKenney
2013-11-21 22:27                     ` Linus Torvalds
2013-11-21 22:52                       ` Paul E. McKenney
2013-11-22  0:09                         ` Linus Torvalds
2013-11-22  4:08                           ` Paul E. McKenney
2013-11-22  4:25                             ` Linus Torvalds
2013-11-22  6:23                               ` Paul E. McKenney
2013-11-22 15:16                                 ` Ingo Molnar
2013-11-22 18:49                                   ` Paul E. McKenney
2013-11-22 19:06                                     ` Linus Torvalds
2013-11-22 20:06                                       ` Paul E. McKenney
2013-11-22 20:09                                         ` Linus Torvalds
2013-11-22 20:37                                           ` Paul E. McKenney
2013-11-22 21:01                                             ` Linus Torvalds
2013-11-22 21:52                                               ` Paul E. McKenney
2013-11-22 22:19                                                 ` Linus Torvalds
2013-11-23  0:25                                                   ` Paul E. McKenney
2013-11-23  0:42                                                     ` Linus Torvalds
2013-11-23  1:36                                                       ` Paul E. McKenney [this message]
2013-11-23  2:11                                                         ` Linus Torvalds
2013-11-23  4:05                                                           ` Paul E. McKenney
2013-11-23 11:24                                                             ` Ingo Molnar
2013-11-23 17:06                                                               ` Paul E. McKenney
2013-11-26 12:02                                                                 ` Ingo Molnar
2013-11-26 19:28                                                                   ` Paul E. McKenney
2013-11-23 20:21                                                         ` Linus Torvalds
2013-11-23 20:39                                                           ` Linus Torvalds
2013-11-25 12:09                                                             ` Peter Zijlstra
2013-11-25 17:18                                                               ` Will Deacon
2013-11-25 17:56                                                                 ` Paul E. McKenney
2013-11-25 17:54                                                             ` Paul E. McKenney
2013-11-23 21:29                                                           ` Peter Zijlstra
2013-11-23 22:24                                                             ` Linus Torvalds
2013-11-25 17:53                                                           ` Paul E. McKenney
2013-11-25 18:21                                                             ` Peter Zijlstra
2013-11-21 11:03         ` Peter Zijlstra
2013-11-21 12:56           ` Peter Zijlstra
2013-11-21 13:20             ` Paul E. McKenney
2013-11-21 17:25               ` Paul E. McKenney
2013-11-21 21:52                 ` Peter Zijlstra
2013-11-21 22:18                   ` Paul E. McKenney
2013-11-22 15:58                     ` Peter Zijlstra
2013-11-22 18:26                       ` Paul E. McKenney
2013-11-22 18:51                         ` Peter Zijlstra
2013-11-22 18:59                           ` Paul E. McKenney
2013-11-25 17:35                           ` Peter Zijlstra
2013-11-25 18:02                             ` Paul E. McKenney
2013-11-25 18:24                               ` Peter Zijlstra
2013-11-25 18:34                                 ` Tim Chen
2013-11-25 18:27                               ` Peter Zijlstra
2013-11-25 23:52                                 ` Paul E. McKenney
2013-11-26  9:59                                   ` Peter Zijlstra
2013-11-26 17:11                                     ` Paul E. McKenney
2013-11-26 17:18                                       ` Peter Zijlstra
2013-11-26 19:00                                     ` Linus Torvalds
2013-11-26 19:20                                       ` Paul E. McKenney
2013-11-26 19:32                                         ` Linus Torvalds
2013-11-26 22:51                                           ` Paul E. McKenney
2013-11-26 23:58                                             ` Linus Torvalds
2013-11-27  0:21                                               ` Thomas Gleixner
2013-11-27  0:39                                               ` Paul E. McKenney
2013-11-27  1:05                                                 ` Linus Torvalds
2013-11-27  1:31                                                   ` Paul E. McKenney
2013-11-27 10:16                                             ` Will Deacon
2013-11-27 17:11                                               ` Paul E. McKenney
2013-11-28 11:40                                                 ` Will Deacon
2013-11-28 17:38                                                   ` Paul E. McKenney
2013-11-28 18:03                                                     ` Will Deacon
2013-11-28 18:27                                                       ` Paul E. McKenney
2013-11-28 18:53                                                         ` Will Deacon
2013-11-28 19:50                                                           ` Paul E. McKenney
2013-11-29 16:17                                                             ` Will Deacon
2013-11-29 16:44                                                               ` Linus Torvalds
2013-11-29 18:18                                                                 ` Will Deacon
2013-11-30 17:38                                                                 ` Paul E. McKenney
2013-11-26 19:21                                       ` Peter Zijlstra
2013-11-27 16:58                                         ` Oleg Nesterov
2013-11-26 23:08                                       ` Benjamin Herrenschmidt
2013-11-25 23:55                               ` H. Peter Anvin
2013-11-26  3:16                                 ` Paul E. McKenney
2013-11-27  0:46                                   ` H. Peter Anvin
2013-11-27  1:07                                     ` Linus Torvalds
2013-11-27  1:27                                     ` Paul E. McKenney
2013-11-27  2:59                                       ` H. Peter Anvin
2013-11-25 18:52                             ` H. Peter Anvin
2013-11-25 22:58                               ` Tim Chen
2013-11-25 23:28                                 ` H. Peter Anvin
2013-11-25 23:51                                   ` Paul E. McKenney
2013-11-25 23:36                               ` Paul E. McKenney
2013-12-04 21:26                 ` Andi Kleen
2013-12-04 22:07                   ` Paul E. McKenney
2013-11-21 13:19           ` Paul E. McKenney
2013-11-20  1:37 ` [PATCH v6 5/5] MCS Lock: Allows for architecture specific mcs lock and unlock Tim Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131123013654.GG4138@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linaro.org \
    --cc=andi@firstfloor.org \
    --cc=arnd@arndb.de \
    --cc=aswin@hp.com \
    --cc=dave.hansen@intel.com \
    --cc=davidlohr.bueso@hp.com \
    --cc=figo1802@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@horizon.com \
    --cc=matthew.r.wilcox@intel.com \
    --cc=mingo@elte.hu \
    --cc=mingo@kernel.org \
    --cc=peter@hurleysoftware.com \
    --cc=raghavendra.kt@linux.vnet.ibm.com \
    --cc=riel@redhat.com \
    --cc=scott.norton@hp.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=waiman.long@hp.com \
    --cc=walken@google.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).