From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
akpm@linux-foundation.org, josh@joshtriplett.org,
tglx@linutronix.de, Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
laijs@cn.fujitsu.com, dipankar@in.ibm.com
Subject: Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier
Date: Thu, 7 Jan 2010 14:34:32 -0800 [thread overview]
Message-ID: <20100107223432.GU6764@linux.vnet.ibm.com> (raw)
In-Reply-To: <1262900140.28171.3773.camel@gandalf.stny.rr.com>
On Thu, Jan 07, 2010 at 04:35:40PM -0500, Steven Rostedt wrote:
> On Thu, 2010-01-07 at 12:58 -0800, Paul E. McKenney wrote:
>
> > I believe that I am worried about a different scenario. I do not believe
> > that the scenario you lay out above can actually happen. The pair of
> > schedules on CPU 2 have to act as a full memory barrier, otherwise,
> > it would not be safe to resume a task on some other CPU.
>
> I'm not so sure about that. The update of ->curr happens inside a
> spinlock, which is a rmb() ... wmb() pair. Must be, because a spin_lock
> must be an rmb otherwise the loads could move outside the lock, and the
> spin_unlock must be a wmb() otherwise what was written could move
> outside the lock.
If a given task is running on CPU 0, then switches to CPU 1, all of the
CPU-0 activity from that task had -better- be visible when it runs on
CPU 1. But if you were saying that there are other ways to accomplish
this than a full memory barrier, I do agree.
> > If the pair
> > of schedules act as a full memory barrier, then the code in
> > synchronize_rcu() that looks at the RCU read-side state would see that
> > CPU 2 is in an RCU read-side critical section.
> >
> > The scenario that I am (perhaps wrongly) concerned about is enabled by
> > the fact that URCU's rcu_read_lock() has a load, some checks, and a store.
> > It has compiler constraints, but no hardware memory barriers. This
> > means that CPUs (even x86) can execute an rcu_dereference() before the
> > rcu_read_lock()'s store has executed.
> >
> > Hacking your example above, keeping mind that x86 can reorder subsequent
> > loads to precede prior stores:
> >
> >
> > CPU 1 CPU 2
> > ----------- -------------
> >
> > <user space> <kernel space, switching to task>
> >
> > ->curr updated
> >
> > <long code path, maybe mb?>
> >
> > <user space>
> >
> > rcu_read_lock(); [load only]
> >
> > obj = list->next
> >
> > list_del(obj)
> >
> > sys_membarrier();
> > < kernel space >
>
> Well, if we just grab the task_rq(task)->lock here, then we should be
> OK? We would guarantee that curr is either the task we want or not.
The lock that CPU 2 just grabbed to protect its ->curr update? If so,
then I believe that this would work, because the CPU would not be
permitted to re-order the "obj = list->next" to precede CPU 2's
acquisition of this lock.
> > if (task_rq(task)->curr != task)
> > < but load to obj reordered before store to ->curr >
> >
> > < user space >
> >
> > < misses that CPU 2 is in rcu section >
> >
> > [CPU 2's ->curr update now visible]
> >
> > [CPU 2's rcu_read_lock() store now visible]
> >
> > free(obj);
> >
> > use_object(obj); <=== crash!
> >
> >
> >
> > If the "long code path" happens to include a full memory barrier, or if it
> > happens to be long enough to overflow CPU 2's store buffer, then the
> > above scenario cannot happen. Until such time as someone applies some
> > unforeseen optimization to the context-switch path.
> >
> > And, yes, the context-switch path has to have a full memory barrier
> > somewhere, but that somewhere could just as easily come before the
> > update of ->curr.
>
> Hmm, since ->curr is updated before sched_mm() I'm thinking it would
> have to be after the update of curr.
If I understand what you are getting at, from a coherence viewpoint,
the only requirement is that the memory barrier (or equivalent) come
between the last user-mode instruction and the runqueue update on the
outgoing CPU, and between the runqueue read and the first user-mode
instruction on the incoming CPU.
> > The same scenario applies when using ->cpu_vm_mask instead of ->curr.
> >
> > Now, I could easily believe that the current context-switch code has
> > sufficient atomic operations, memory barriers, and instructions to
> > prevent this scenario from occurring, but it is feeling a lot like an
> > accident waiting to happen. Hence my strident complaints. ;-)
>
> I'm totally with you on this. I really want a good understanding of what
> can go wrong, and show that we have the necessary infrastructure to
> prevent it.
Sounds good to me! ;-)
Thanx, Paul
next prev parent reply other threads:[~2010-01-07 22:34 UTC|newest]
Thread overview: 107+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-07 4:40 [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier Mathieu Desnoyers
2010-01-07 5:02 ` Paul E. McKenney
2010-01-07 5:39 ` Mathieu Desnoyers
2010-01-07 8:32 ` Peter Zijlstra
2010-01-07 16:39 ` Paul E. McKenney
2010-01-07 5:28 ` Josh Triplett
2010-01-07 6:04 ` Mathieu Desnoyers
2010-01-07 6:32 ` Josh Triplett
2010-01-07 17:45 ` Mathieu Desnoyers
2010-01-07 16:46 ` Paul E. McKenney
2010-01-07 5:40 ` Steven Rostedt
2010-01-07 6:19 ` Mathieu Desnoyers
2010-01-07 6:35 ` Josh Triplett
2010-01-07 8:44 ` Peter Zijlstra
2010-01-07 13:15 ` Steven Rostedt
2010-01-07 15:07 ` Mathieu Desnoyers
2010-01-07 16:52 ` Paul E. McKenney
2010-01-07 17:18 ` Peter Zijlstra
2010-01-07 17:31 ` Paul E. McKenney
2010-01-07 17:44 ` Mathieu Desnoyers
2010-01-07 17:55 ` Paul E. McKenney
2010-01-07 17:44 ` Steven Rostedt
2010-01-07 17:56 ` Paul E. McKenney
2010-01-07 18:04 ` Steven Rostedt
2010-01-07 18:40 ` Paul E. McKenney
2010-01-07 17:36 ` Mathieu Desnoyers
2010-01-07 14:27 ` Steven Rostedt
2010-01-07 15:10 ` Mathieu Desnoyers
2010-01-07 16:49 ` Paul E. McKenney
2010-01-07 17:00 ` Steven Rostedt
2010-01-07 8:27 ` Peter Zijlstra
2010-01-07 18:30 ` Oleg Nesterov
2010-01-07 18:39 ` Paul E. McKenney
2010-01-07 18:59 ` Steven Rostedt
2010-01-07 19:16 ` Paul E. McKenney
2010-01-07 19:40 ` Steven Rostedt
2010-01-07 20:58 ` Paul E. McKenney
2010-01-07 21:35 ` Steven Rostedt
2010-01-07 22:34 ` Paul E. McKenney [this message]
2010-01-08 22:28 ` Mathieu Desnoyers
2010-01-08 23:53 ` Mathieu Desnoyers
2010-01-09 0:20 ` Paul E. McKenney
2010-01-09 1:02 ` Mathieu Desnoyers
2010-01-09 1:21 ` Paul E. McKenney
2010-01-09 1:22 ` Paul E. McKenney
2010-01-09 2:38 ` Mathieu Desnoyers
2010-01-09 5:42 ` Paul E. McKenney
2010-01-09 19:20 ` Mathieu Desnoyers
2010-01-09 23:05 ` Steven Rostedt
2010-01-09 23:16 ` Steven Rostedt
2010-01-10 0:03 ` Paul E. McKenney
2010-01-10 0:41 ` Steven Rostedt
2010-01-10 1:14 ` Mathieu Desnoyers
2010-01-10 1:44 ` Mathieu Desnoyers
2010-01-10 2:12 ` Steven Rostedt
2010-01-10 5:25 ` Paul E. McKenney
2010-01-10 11:50 ` Steven Rostedt
2010-01-10 16:03 ` Mathieu Desnoyers
2010-01-10 16:21 ` Steven Rostedt
2010-01-10 17:10 ` Mathieu Desnoyers
2010-01-10 21:02 ` Steven Rostedt
2010-01-10 21:41 ` Mathieu Desnoyers
2010-01-11 1:21 ` Paul E. McKenney
2010-01-10 17:45 ` Paul E. McKenney
2010-01-10 18:24 ` Mathieu Desnoyers
2010-01-11 1:17 ` Paul E. McKenney
2010-01-11 4:25 ` Mathieu Desnoyers
2010-01-11 4:29 ` [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v3a) Mathieu Desnoyers
2010-01-11 17:27 ` Paul E. McKenney
2010-01-11 17:35 ` Mathieu Desnoyers
2010-01-11 17:50 ` Peter Zijlstra
2010-01-11 20:52 ` Mathieu Desnoyers
2010-01-11 21:19 ` Peter Zijlstra
2010-01-11 22:04 ` Mathieu Desnoyers
2010-01-11 22:20 ` Peter Zijlstra
2010-01-11 22:48 ` Paul E. McKenney
2010-01-11 22:48 ` Mathieu Desnoyers
2010-01-11 21:19 ` Peter Zijlstra
2010-01-11 21:31 ` Peter Zijlstra
2010-01-11 4:30 ` [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v3b) Mathieu Desnoyers
2010-01-11 22:43 ` Paul E. McKenney
2010-01-12 15:38 ` Mathieu Desnoyers
2010-01-12 16:27 ` Steven Rostedt
2010-01-12 16:38 ` Mathieu Desnoyers
2010-01-12 16:54 ` Paul E. McKenney
2010-01-12 18:12 ` Paul E. McKenney
2010-01-12 18:56 ` Mathieu Desnoyers
2010-01-13 0:23 ` Paul E. McKenney
2010-01-11 16:25 ` [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier Paul E. McKenney
2010-01-11 20:21 ` Mathieu Desnoyers
2010-01-11 21:48 ` Paul E. McKenney
2010-01-14 2:56 ` Lai Jiangshan
2010-01-14 5:13 ` Paul E. McKenney
2010-01-14 5:39 ` Mathieu Desnoyers
2010-01-10 5:18 ` Paul E. McKenney
2010-01-10 1:12 ` Mathieu Desnoyers
2010-01-10 5:19 ` Paul E. McKenney
2010-01-10 1:04 ` Mathieu Desnoyers
2010-01-10 1:01 ` Mathieu Desnoyers
2010-01-09 23:59 ` Paul E. McKenney
2010-01-10 1:11 ` Mathieu Desnoyers
2010-01-07 9:50 ` Andi Kleen
2010-01-07 15:12 ` Mathieu Desnoyers
2010-01-07 16:56 ` Paul E. McKenney
2010-01-07 11:04 ` David Howells
2010-01-07 15:15 ` Mathieu Desnoyers
2010-01-07 15:47 ` David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100107223432.GU6764@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=Valdis.Kletnieks@vt.edu \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=dipankar@in.ibm.com \
--cc=josh@joshtriplett.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@polymtl.ca \
--cc=mingo@elte.hu \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.