All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@us.ibm.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: David Howells <dhowells@redhat.com>,
	Kernel development list <linux-kernel@vger.kernel.org>
Subject: Re: Uses for memory barriers
Date: Fri, 8 Sep 2006 11:57:35 -0700	[thread overview]
Message-ID: <20060908185735.GF1314@us.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.44L0.0609081101090.7156-100000@iolanthe.rowland.org>

On Fri, Sep 08, 2006 at 11:55:46AM -0400, Alan Stern wrote:
> On Thu, 7 Sep 2006, Paul E. McKenney wrote:
> 
> > On Thu, Sep 07, 2006 at 05:25:51PM -0400, Alan Stern wrote:
> > > Paul:
> > > 
> > > Here's something I had been thinking about back in July but never got 
> > > around to discussing:  Under what circumstances would one ever want to use 
> > > "mb()" rather than "rmb()" or "wmb()"?
> > 
> > If there were reads needing to be separated from writes, for example
> > in spinlocks.  The spinlock-acquisition primitive could not use either
> > wmb() or rmb(), since reads in the critical section must remain ordered
> > with respect to the write to the spinlock itself.
> 
> Yes, okay, that's a general situation where mb() would be useful.  But the 
> actual application amounts to one of the two patterns below, as you will 
> see...

Hey, you asked!!!  ;-)

> Let's call the following pattern "write-mb-read", since that's what each
> CPU does (write a, mb, read b on CPU 0 and write b, mb, read a on CPU 1).
> 
> > > The obvious extension of the canonical example is to have CPU 0 write
> > > one location and read another, while CPU 1 reads and writes the same
> > > locations.  Example:
> > > 
> > > 	CPU 0			CPU 1
> > > 	-----			-----
> > > 	while (y==0) relax();	y = -1;
> > > 	a = 1;			b = 1;
> > > 	mb();			mb();
> > > 	y = b;			x = a;
> > > 				while (y < 0) relax();
> > > 				assert(x==1 || y==1);	//???
> 
> > In the above code, there is nothing stopping CPU 1 from executing through
> > the "x=a" before CPU 0 starts, so that x==0.
> 
> Agreed.
> 
> >  In addition, CPU 1 imposes
> > no ordering between the assignment to y and b,
> 
> Agreed.
> 
> >  so there is nothing stopping
> > CPU 0 from seeing the new value of y, but failing to see the new value of
> > b,
> 
> Disagree: CPU 0 executes mb() between reading y and b.  You have assumed
> that CPU 1 executed its write to b and its mb() before CPU 0 got started, 
> so why wouldn't CPU 0's mb() guarantee that it sees the new value of b?  
> That's really the key point.

If we are talking about some specific CPUs, you might have a point.
But Linux must tolerate least-common-demoninator semantics...

And those least-common-denominator semantics are "if a CPU sees an
assignment from another CPU that followed a memory barrier on that
other CPU, then the first CPU is guaranteed to see any stores from
the other CPU -preceding- that memory barrier.

In your example, CPU 0 does not access x, so never sees any stores
from CPU 1 following the mb(), and thus is never guaranteed to see
the assignments preceding CPU 1's mb().

> To rephrase it in terms of partial orderings of events: CPU 1's mb()  
> orders the commit for the write to b before the read from a.  The fact
> that a was read as 0 means that the read occurred before CPU 0's write to
> a was committed.  The mb() on CPU 0 orders the commit for the write to a
> before the read from b.  By transitivity, the read from b must have
> occurred after the write to b was committed, so the value read must have
> been 1.

After CPU 1 sees the new value of y, then, yes, any operation ordered
after the read of y would see the new value of a.  But (1) the "x=a"
precedes the check for y and (2) there is no mb() following the check of y
to force any ordering whatsoever.

Assume that initially, a==0 in a cache line owned by CPU 1.  Assume that
b==0 and y==0 in separate cache lines owned by CPU 0.

	CPU 0			CPU 1
	-----			-----
				y = -1;  [this goes out quickly.]
	while (y==0) relax(); [picks up the value from CPU 1]
	a = 1;	[the cacheline is on CPU 1, so this one sits in
		 CPU 0's store queue.  CPU 0 transmits an invalidation
		 request, which will take time to reach CPU 1.]
				b = 1; [the cacheline is on CPU 0, so
					this one sits in CPU 1's store
					queue, and CPU 1 transmits an
					invalidation request, which again
					will take time to rech CPU 0.]
	mb(); [CPU 0 waits for acknowledgement of reception of all previously
	       transmitted invalidation requests, and also processes any
	       invalidation requests that it has previously received (none!)
	       before processing any subsequent loads.  Yes, CPU 1 already
	       sent the invalidation request for b, but there is no
	       guarantee that it has worked its way to CPU 0 yet!]
				mb();  [Ditto.]

	At this point, CPU 0 has received the invalidation request for b
	from CPU 1, and CPU 1 has received the invalidation request for
	a from CPU 0, but there is no guarantee that either of these
	invalidation requests have been processed.

				x = a; [Could therefore still be using
					the old cached version of a, so
					that x==0.]
	y = b; [Could also be still using the old cached version of b,
		so that y==0.]
				while (y < 0) relax(); [This will spin until
					the cache coherence protocol delivers
					the new value y==0.  At this point,
					CPU 1 is guaranteed to see the new
					value of a due to CPU 0's mb(), but
					it is too late...  And CPU 1 would
					need a memory barrier following the
					while loop in any case -- otherwise,
					CPU 1 would be within its rights
					on some CPUs to execute the code
					out of order.]
				assert(x==1 || y==1);	//???
					[This assertion can therefore fail.]

By the way, this sort of scenario is why maintainers have my full
sympathies when they automatically reject any patches containing explicit
memory barriers...

> > so that y==0 (assuming the initial value of b is zero).
> > 
> > Something like the following might illustrate your point:
> > 
> > 	CPU 0			CPU 1
> > 	-----			-----
> > 				b = 1;
> > 				wmb();
> > 	while (y==0) relax();	y = -1;
> > 	a = 1;
> > 	wmb();
> > 	y = b;			while (y < 0) relax();
> > 				rmb();
> > 				x = a;
> > 				assert(x==1 || y==1);	//???
> > 
> > Except that the memory barriers have all turned into rmb()s or wmb()s...
> 
> This seems like a non-sequitur.  My point was about mb(); how could this
> example illustrate it?  In particular, CPU 0's wmb() doesn't imply any 
> ordering between its read of y and its read of b.

Excellent point, thus CPU 0's wmb() needs to instead be an mb().

> Furthermore, in this example the stronger assertion x==1 would always 
> hold (it's a corollary of assert(y==-1 || x==1) combined with the 
> knowledge that the while loop has terminated).  Did you in fact intend CPU 
> 0's wmb() to be rmb()?

I was just confused, confusion being a common symptom of working with
explicit memory barriers.  :-/

If CPU 0 did mb(), then I believe assert(x==1&&y==1) would hold.

> Let's call the following pattern "read-mb-write".
> 
> > > The opposite approach would use reads followed by writes:
> > > 
> > > 	CPU 0			CPU 1
> > > 	-----			-----
> > > 	while (x==0) relax();	x = -1;
> > > 	x = a;			y = b;
> > > 	mb();			mb();
> > > 	b = 1;			a = 1;
> > > 				while (x < 0) relax();
> > > 				assert(x==0 || y==0);	//???
> > > 
> > > Similar reasoning can be applied here.  However IIRC, you decided that
> > > neither of these assertions is actually guaranteed to hold.  If that's the
> > > case, then it looks like mb() is useless for coordinating two CPUs.
> > 
> > Yep, similar problems as with the earlier example.
> > 
> > > Am I correct?  Or are there some easily-explained situations where mb()  
> > > really should be used for inter-CPU synchronization?
> > 
> > Consider the following (lame) definitions for spinlock primitives,
> > but in an alternate universe where atomic_xchg() did not imply a
> > memory barrier, and on a weak-memory CPU:
> > 
> > 	typedef spinlock_t atomic_t;
> > 
> > 	void spin_lock(spinlock_t *l)
> > 	{
> > 		for (;;) {
> > 			if (atomic_xchg(l, 1) == 0) {
> > 				smp_mb();
> > 				return;
> > 			}
> > 			while (atomic_read(l) != 0) barrier();
> > 		}
> > 
> > 	}
> > 
> > 	void spin_unlock(spinlock_t *l)
> > 	{
> > 		smp_mb();
> > 		atomic_set(l, 0);
> > 	}
> > 
> > The spin_lock() primitive needs smp_mb() to ensure that all loads and
> > stores in the following critical section happen only -after- the lock
> > is acquired.  Similarly for the spin_unlock() primitive.
> 
> Really?  Let's take a closer look.  Let b be a pointer to a spinlock_t and 
> let a get accessed only within the critical sections:
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 	spin_lock(b);
> 	...
> 	x = a;
> 	spin_unlock(b);		spin_lock(b);
> 				a = 1;
> 				...
> 	assert(x==0); //???
> 
> Expanding out the code gives us a version of the read-mb-write pattern.  

Eh?   There is nothing guaranteeing that CPU 0 gets the lock before
CPU 1, so what is to assert?

> For the sake of argument, let's suppose that CPU 1's spin_lock succeeds
> immediately with no looping:
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 	...
> 	x = a;
> 	// spin_unlock(b):
> 	mb();
> 	atomic_set(b,0);	if (atomic_xchg(b,1) == 0)   // Succeeds
> 					mb();
> 				a = 1;
> 				...
> 	assert(x==0); //???
> 
> As you can see, this fits the pattern exactly.  CPU 0 does read a, mb, 
> write b, and CPU 1 does read b, mb, write a.

Again, you don't have anything that guarantees that CPU 0 will get the
lock first, so the assertion is bogus.  Also, the "if" shouldn't be
there -- since we are assuming CPU 1 got the lock immediately, it
would be unconditionally executing the mb(), right?

> We are supposing the read of b obtains the new value (no looping); can we
> then assert that the read of a obtained the old value?  If we can -- and
> we had better if spinlocks are to work correctly -- then doesn't this mean
> that the read-mb-write pattern will always succeed?

Since CPU 1 saw CPU 0's assignment to b (which follows CPU 0's mb()),
and since CPU 1 immediately executed a memory barrier, CPU 1's critical
section is guaranteed to follow that of CPU 0.  -Both- CPU 0's and CPU 1's
memory barriers are needed to make this work.  This does not rely on
anything fancy, simply the standard pair-wise memory-barrier guarantee.

(Which is: if CPU 1 sees an assignment by CPU 0 following CPU 0 executing
a memory barrier, then any code on CPU 1 following a later CPU 1 memory
barrier is guaranteed to see the effects of all references by CPU 0
that preceded CPU 0's memory barrier.)

David Howells's Documentation/memory-barriers.txt goes through this
sort of thing, IIRC.

							Thanx, Paul

  reply	other threads:[~2006-09-08 18:56 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-07 21:25 Uses for memory barriers Alan Stern
2006-09-07 22:10 ` linux-os (Dick Johnson)
2006-09-08 18:39   ` Alan Stern
2006-09-08  0:14 ` Paul E. McKenney
2006-09-08 15:55   ` Alan Stern
2006-09-08 18:57     ` Paul E. McKenney [this message]
2006-09-08 21:23       ` Alan Stern
2006-09-09  0:44         ` Paul E. McKenney
2006-09-11 16:05           ` Alan Stern
2006-09-08  5:52 ` David Schwartz
     [not found] <200609081929.33027.oliver@neukum.org>
2006-09-08 18:06 ` Alan Stern
2006-09-08 18:22   ` Oliver Neukum
     [not found] <200609082230.22225.oliver@neukum.org>
2006-09-08 21:26 ` Alan Stern
2006-09-08 21:46   ` Oliver Neukum
2006-09-08 22:25     ` Alan Stern
2006-09-08 22:49       ` Oliver Neukum
2006-09-09  2:25         ` Alan Stern
2006-09-11 16:21           ` Paul E. McKenney
2006-09-11 16:50             ` Alan Stern
2006-09-11 17:23               ` Segher Boessenkool
2006-09-11 19:04                 ` Paul E. McKenney
2006-09-11 19:03               ` Paul E. McKenney
2006-09-11 17:21             ` Segher Boessenkool
2006-09-11 19:48             ` Oliver Neukum
2006-09-11 20:29               ` Paul E. McKenney
2006-09-12  9:01             ` David Howells
2006-09-12 10:22               ` Oliver Neukum
2006-09-12 14:55                 ` Paul E. McKenney
2006-09-12 15:07                   ` Oliver Neukum
2006-09-12 16:12                     ` Paul E. McKenney
2006-09-12 17:50                     ` Segher Boessenkool
2006-09-12 14:42               ` Paul E. McKenney
2006-09-12  8:57           ` David Howells
     [not found] <20060911190005.GA1295@us.ibm.com>
2006-09-12 18:08 ` Alan Stern
2006-09-12 20:23   ` Paul E. McKenney
2006-09-14 14:58     ` Alan Stern
2006-09-15  5:16       ` Paul E. McKenney
2006-09-15 19:48         ` Alan Stern
2006-09-16  4:19           ` Paul E. McKenney
2006-09-16 15:28             ` Alan Stern
2006-09-18 19:13               ` Paul E. McKenney
2006-09-18 20:13                 ` Alan Stern
2006-09-19  0:47                   ` Paul E. McKenney
2006-09-19 16:04                     ` Alan Stern
2006-09-19 16:38                       ` Nick Piggin
2006-09-19 17:40                         ` Alan Stern
2006-09-19 17:51                           ` Nick Piggin
2006-09-19 18:19                             ` Paul E. McKenney
2006-09-19 18:48                               ` Nick Piggin
2006-09-19 19:36                                 ` Paul E. McKenney
2006-09-19 19:48                                   ` Nick Piggin
2006-09-19 20:01                                     ` Paul E. McKenney
2006-09-19 20:38                                       ` Alan Stern
2006-09-21  1:43                                         ` Paul E. McKenney
2006-09-19 18:16                       ` Paul E. McKenney
2006-09-20 19:39                         ` Alan Stern
2006-09-21  1:34                           ` Paul E. McKenney
2006-09-21 20:59                             ` Alan Stern
2006-09-22  5:02                               ` Paul E. McKenney
2006-09-22 20:38                                 ` Alan Stern
2006-09-27 21:06                                 ` Alan Stern
2006-09-30  1:11                                   ` Paul E. McKenney
2006-09-30 21:01                                     ` Alan Stern
2006-10-02  0:06                                       ` Paul E. McKenney
2006-10-02 15:44                                         ` Alan Stern
2006-10-04 15:35                                           ` Paul E. McKenney
2006-10-04 18:04                                             ` Alan Stern
2006-10-13 16:51                                               ` Paul E. McKenney
2006-10-13 18:30                                                 ` Alan Stern
2006-10-13 22:39                                                   ` Paul E. McKenney
2006-10-14  2:27                                                     ` Alan Stern
2006-10-17  1:24                                                       ` Paul E. McKenney
2006-10-17 15:29                                                         ` Alan Stern
2006-10-17 17:27                                                           ` Paul E. McKenney
2006-10-17 19:42                                                             ` Alan Stern
2006-10-17 20:15                                                               ` Paul E. McKenney
2006-10-17 21:21                                                                 ` Alan Stern
2006-10-17 22:58                                                                   ` Paul E. McKenney
2006-10-18 19:05                                                                     ` Alan Stern
2006-10-18 23:01                                                                       ` Paul E. McKenney
2006-10-19 16:44                                                                         ` Alan Stern
2006-10-19 19:21                                                                           ` Paul E. McKenney
2006-10-19 20:55                                                                             ` Alan Stern
2006-10-19 22:46                                                                               ` Paul E. McKenney
2006-10-20 16:54                                                                                 ` Alan Stern
2006-10-21  0:59                                                                                   ` Paul E. McKenney
2006-10-21 19:47                                                                                     ` Alan Stern
2006-10-21 22:52                                                                                       ` Paul E. McKenney
2006-10-22  2:18                                                                                         ` Alan Stern
2006-10-23  5:32                                                                                           ` Paul E. McKenney
2006-10-23 14:07                                                                                             ` Alan Stern
2006-10-24 17:52                                                                                               ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060908185735.GF1314@us.ibm.com \
    --to=paulmck@us.ibm.com \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.