linux-kernel.vger.kernel.org archive mirror
* memory barrier question
@ 2010-09-15 14:36 Miklos Szeredi
  2010-09-15 19:12 ` Rafael J. Wysocki
  2010-09-16 11:55 ` David Howells
  0 siblings, 2 replies; 37+ messages in thread
From: Miklos Szeredi @ 2010-09-15 14:36 UTC (permalink / raw)
  To: dhowells; +Cc: linux-kernel, linux-arch

Hi,

I'm trying to understand memory barriers but not quite succeeding.

Consider the following example:

Start:
	p = NULL;
	x = 0;

CPU1:
	atomic_inc(&x);
	p = &x;

CPU2:
	if (p)
		z = atomic_read(p);

Is it possible to end up with z == 0?  What if there's a lock/unlock
before setting "p"?  What if there's a write barrier before setting
"p"?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: memory barrier question
@ 2010-09-20 10:34 George Spelvin
  0 siblings, 0 replies; 37+ messages in thread
From: George Spelvin @ 2010-09-20 10:34 UTC (permalink / raw)
  To: benh, miklos; +Cc: linux, linux-arch, linux-kernel

>>    LOAD inode = next.dentry->inode
>>    if (inode != NULL)
>>    	LOAD inode->f_op
>> 
>> What is there the compiler can optimize?

> Those two loads depend on each other, I don't think any implementation
> can re-order them. In fact, such data dependency is typically what is
> used to avoid having barriers in some cases. The second load cannot be
> issued until the value from the first one is returned.

It's quite hard for the *compiler* to do the loads out of order, but it's
vaguely possible on, say, an ia64 system which supports non-faulting loads.

That is, the compiled code could be:

"I guess the next inode will be at address X"
retry:
LOAD preloaded = *X;
LOAD inode = next.dentry->inode
if (inode == X)
	TRAP if fetch of preloaded had a page fault
	USE preloaded.f_op (guaranteed non-blocking)
else
	update guess X = inode
	goto retry;
	
However, it's *very* easy for a processor to do exactly that!  These
days, hardware cache prefetchers look for sequential access patterns,
and if your inodes happen to get allocated at consecutive addresses,
it's possible that the prefetcher has done the read ahead of you.

Even if reads are generated in one order, there's no reason that one can't
be stalled due to some resource conflict (say a bank conflict in the L1
cache) while the second is not stalled and ends up completing first.


So, for example, the following is not safe without barriers:

START:
	dentry->inode = a
	a->f_op = a_op
	b->f_op = INVALID
CPU 1:
	b->f_op = b_op
	dentry->inode = b
CPU 2:
	LOAD inode = dentry->inode
	LOAD op = inode->f_op
	assert(op != INVALID)

While this is a FALSE OVERSIMPLIFICATION that will lead you astray if
you take it too far, you can describe hazards in terms of a serialized
stream of "external observer" memory updates.

The above code has two hazards:
- CPU 1 might write dentry->inode before the b->f_op write is complete, or
- CPU 2 might read inode->f_op before it reads dentry->inode.  This is
  difficult, but a hardware prefetcher might make a lucky guess.

Either way, CPU 2 ends up reading an invalid op vector.


Although the single serialized stream conceptual model works fine between
two processors, it is not guaranteed to work between three.  That is,
there is no guarantee that the following won't happen:

START:
	a = b = 0
CPU 1:
	a = 1;
	/* No barrier! */
	b = 1;
CPU 2:
	LOAD a;	(loads 1)
	-- read barrier --
	LOAD b; (loads 0)
CPU 3:
	LOAD b;	(loads 1)
	-- read barrier --
	LOAD a; (loads 0)

This is obviously inconsistent with the existence of a "true" sequential
ordering, but is entirely legal according to a less-strict memory ordering.

The lack of a write barrier on CPU 1 means that the order in which the
writes are seen by other processors is not only undefined, it might
be different for different processors!

The Alpha processor manual had a good description of various ugly corner
cases.

* Memory barrier question.
@ 2010-10-29 13:23 Tetsuo Handa
  2010-10-29 15:09 ` Paul E. McKenney
  0 siblings, 1 reply; 37+ messages in thread
From: Tetsuo Handa @ 2010-10-29 13:23 UTC (permalink / raw)
  To: paulmck; +Cc: linux-kernel

Hello.

I got a question regarding memory barrier.

struct word {
	struct list_head list;
	char *buf;
};

static LIST_HEAD(wordlist);
static DEFINE_SPINLOCK(wordlist_lock);

--- On CPU 0 ---

struct word *hello = kzalloc(sizeof(*hello), GFP_KERNEL);
hello->buf = kstrdup("hello", GFP_KERNEL);
spin_lock(&wordlist_lock);
list_add_rcu(&hello->list, &wordlist);
spin_unlock(&wordlist_lock);

--- On CPU 1 ---

struct word *ptr;
rcu_read_lock();
list_for_each_entry_rcu(ptr, &wordlist, list) {
	char *str = ptr->buf;
	printk("%s\n", str);
}
rcu_read_unlock();

Use of rcu_assign_pointer() and rcu_dereference() guarantees that
CPU 1 gets &hello->list by reading wordlist.next only after the store
to hello->buf is visible, so CPU 1 can get the kstrdup()ed pointer by
reading hello->buf.  But what guarantees that CPU 1 gets "hello" when
it reads through that kstrdup()ed pointer?

Say, kstrdup("hello", GFP_KERNEL) stores

  'h' -> 0xC0000000
  'e' -> 0xC0000001
  'l' -> 0xC0000002
  'l' -> 0xC0000003
  'o' -> 0xC0000004
  '\0' -> 0xC0000005

and hello->buf = kstrdup() stores

  0xC0000000 -> hello->buf

.

If ordered by smp_wmb() by CPU 0 and smp_rmb() by CPU 1,
str = ptr->buf will load

  0xC0000000 -> str

and printk("%s\n", str) will load

  0xC0000000 -> 'h'
  0xC0000001 -> 'e'
  0xC0000002 -> 'l'
  0xC0000003 -> 'l'
  0xC0000004 -> 'o'
  0xC0000005 -> '\0'

.

Since CPU 0 issued smp_wmb() (inside list_add_rcu()) but CPU 1 did not
issue smp_rmb() (there is none inside list_for_each_entry_rcu()), I
think CPU 1 could see bogus values like

  0xC0000000 -> 'h'
  0xC0000001 -> 'a'
  0xC0000002 -> 'l'
  0xC0000003 -> '1'
  0xC0000004 -> 'o'
  0xC0000005 -> 'w'
  0xC0000006 -> 'e'
  0xC0000007 -> 'e'
  0xC0000008 -> 'n'
  0xC0000009 -> '\0'

.

It seems to me that people do not call smp_rmb() before reading memory
which was dynamically allocated/initialized. What am I missing?

Regards.


end of thread, other threads:[~2010-10-30 12:40 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-15 14:36 memory barrier question Miklos Szeredi
2010-09-15 19:12 ` Rafael J. Wysocki
2010-09-16 11:55 ` David Howells
2010-09-16 13:42   ` Miklos Szeredi
2010-09-16 14:30   ` David Howells
2010-09-16 15:03     ` Paul E. McKenney
2010-09-16 16:06       ` Miklos Szeredi
2010-09-16 16:37         ` Paul E. McKenney
2010-09-16 16:56           ` Miklos Szeredi
2010-09-16 17:09             ` James Bottomley
2010-09-16 17:17               ` Miklos Szeredi
2010-09-16 17:40                 ` James Bottomley
2010-09-17 21:49                 ` Benjamin Herrenschmidt
2010-09-17 23:12                   ` Paul E. McKenney
2010-09-19  2:47                     ` Benjamin Herrenschmidt
2010-09-19 15:26                       ` Paul E. McKenney
2010-09-19 20:15                         ` Miklos Szeredi
2010-09-19 21:59                           ` Paul E. McKenney
2010-09-20  0:58                             ` James Bottomley
2010-09-20  1:29                               ` Paul E. McKenney
2010-09-20 16:01                                 ` Miklos Szeredi
2010-09-20 18:25                                   ` Paul E. McKenney
2010-09-20 18:57                                     ` Paul E. McKenney
2010-09-20 20:26                                     ` Michael Cree
2010-09-20 20:40                                       ` Paul E. McKenney
2010-09-21 14:59                                         ` Paul E. McKenney
2010-09-22 18:41                                           ` Paul E. McKenney
2010-09-18  1:12                   ` Alan Cox
2010-09-16 16:50         ` Jamie Lokier
2010-09-16 16:18   ` Peter Zijlstra
2010-09-16 17:59   ` David Howells
  -- strict thread matches above, loose matches on Subject: below --
2010-09-20 10:34 George Spelvin
2010-10-29 13:23 Memory " Tetsuo Handa
2010-10-29 15:09 ` Paul E. McKenney
2010-10-30  5:48   ` Tetsuo Handa
2010-10-30  6:11     ` Eric Dumazet
2010-10-30 12:40       ` Tetsuo Handa
