linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* memory barrier question
@ 2010-09-15 14:36 Miklos Szeredi
  2010-09-15 14:36 ` Miklos Szeredi
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Miklos Szeredi @ 2010-09-15 14:36 UTC (permalink / raw)
  To: dhowells; +Cc: linux-kernel, linux-arch

Hi,

I'm trying to understand memory barriers but not quite succeeding.

Consider the following example:

Start:
	p = NULL;
	x = 0;

CPU1:
	atomic_inc(&x);
	p = &x;

CPU2:
	if (p)
		z = atomic_read(p);

Is it possible to end up with z == 0?  What if there's a lock/unlock
before setting "p"?  What if there's a write barrier before setting
"p"?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: memory barrier question
@ 2010-09-20 10:34 George Spelvin
  2010-09-20 10:34 ` George Spelvin
  0 siblings, 1 reply; 37+ messages in thread
From: George Spelvin @ 2010-09-20 10:34 UTC (permalink / raw)
  To: benh, miklos; +Cc: linux, linux-arch, linux-kernel

>>    LOAD inode = next.dentry->inode
>>    if (inode != NULL)
>>    	LOAD inode->f_op
>> 
>> What is there the compiler can optimize?

> Those two loads depend on each other, I don't think any implementation
> can re-order them. In fact, such data dependency is typically what is
> used to avoid having barriers in some cases. The second load cannot be
> issued until the value from the first one is returned.

It's quite hard for the *compiler* to do the loads out of order, but it's
vaguely possible on, say, an ia64 system which supports non-faulting loads.

That is, the compiled code could be:

"I guess the next inode will be at address X"
retry:
LOAD preloaded == *X;
LOAD inode = next.dentry->inode
if (inode == X)
	TRAP if fetch of preloaded had a page fault
	LOAD preloaded.f_op (guaranteed non blocking)
else
	update guess X = inode
	goto retry;
	
However, it's *very* easy for a processor to do exactly that!  There
days, hardware cache prefetchers look for sequential access patterns,
and if your inodes happen to get allocated to consecutive addresses,
it's possible that the prefetcher could have done the read ahead of you.

Even if reads are generated in one order, there's no reason that one can't
be stalled due to some resource conflict (say a bank conflict in the L1
cache) while the second is not stalled and ends up completing first.


So, for example, the following is not safe without barriers:

START:
	dentry->inode == a
	a->f_op == a_op
	b->f_op = INVALID
CPU 1:
	b->f_op = b_op
	dentry->inode = b
CPU 2:
	LOAD inode = dentry->inode
	LOAD op = inode->f_op
	assert(op != INVALID)

While this is a FALSE OVERSIMPLIFICATION that will lead you astray if
you take it to far, you can describe hazards in terms of a serialized stream
of "external observer" memory updates.

The above code has two hazards:
- CPU 1 might write dentry->inode before the b->f_op write is complete, or
- CPU 2 might read inode->f_op befire is reads dentry->inode.  This is
  difficult, but a hardware prefetcher might make a lucky guess.

Either way, CPU 2 ends up reading an invalid op vector.


Although the single serialized stream conceptual model works fine between
two processors, it is not guaranteed to work between three.  That is,
there is no guarantee that the following won't happen:

START:
	a = b = 0
CPU 1:
	a = 1;
	/* No barrier! */
	b = 1;
CPU 2:
	LOAD a;	(loads 1)
	-- read barrier --
	LOAD b; (loads 0)
CPU 3:
	LOAD b;	(loads 1)
	-- read barrier --
	LOAD a; (loads 0)

This is obviously inconsistent with the existence of a "true" sequential
ordering, but is entirely legal according to a less-strict memory ordering.

The lack of a write barrier on CPU 1 means that the order in which the
writes will be seen by other processor is not only undefined, it might
be different for different processors!

The Alpha processor manual had a good description of various ugly corner
cases.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2010-09-22 18:41 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-09-15 14:36 memory barrier question Miklos Szeredi
2010-09-15 14:36 ` Miklos Szeredi
2010-09-15 19:12 ` Rafael J. Wysocki
2010-09-16 11:55 ` David Howells
2010-09-16 13:42   ` Miklos Szeredi
2010-09-16 13:42     ` Miklos Szeredi
2010-09-16 14:30     ` David Howells
2010-09-16 15:03       ` Paul E. McKenney
2010-09-16 15:03         ` Paul E. McKenney
2010-09-16 16:06         ` Miklos Szeredi
2010-09-16 16:06           ` Miklos Szeredi
2010-09-16 16:37           ` Paul E. McKenney
2010-09-16 16:56             ` Miklos Szeredi
2010-09-16 17:09               ` James Bottomley
2010-09-16 17:17                 ` Miklos Szeredi
2010-09-16 17:40                   ` James Bottomley
2010-09-17 21:49                   ` Benjamin Herrenschmidt
2010-09-17 23:12                     ` Paul E. McKenney
2010-09-19  2:47                       ` Benjamin Herrenschmidt
2010-09-19 15:26                         ` Paul E. McKenney
2010-09-19 20:15                           ` Miklos Szeredi
2010-09-19 21:59                             ` Paul E. McKenney
2010-09-20  0:58                               ` James Bottomley
2010-09-20  1:29                                 ` Paul E. McKenney
2010-09-20 16:01                                   ` Miklos Szeredi
2010-09-20 18:25                                     ` Paul E. McKenney
2010-09-20 18:57                                       ` Paul E. McKenney
2010-09-20 20:26                                       ` Michael Cree
2010-09-20 20:40                                         ` Paul E. McKenney
2010-09-21 14:59                                           ` Paul E. McKenney
2010-09-22 18:41                                             ` Paul E. McKenney
2010-09-18  1:12                     ` Alan Cox
2010-09-16 16:50           ` Jamie Lokier
2010-09-16 16:18   ` Peter Zijlstra
2010-09-16 17:59     ` David Howells
  -- strict thread matches above, loose matches on Subject: below --
2010-09-20 10:34 George Spelvin
2010-09-20 10:34 ` George Spelvin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).