public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	akpm@linux-foundation.org, josh@joshtriplett.org,
	tglx@linutronix.de, Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com
Subject: Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier
Date: Thu, 7 Jan 2010 12:58:30 -0800	[thread overview]
Message-ID: <20100107205830.GR6764@linux.vnet.ibm.com> (raw)
In-Reply-To: <1262893243.28171.3753.camel@gandalf.stny.rr.com>

On Thu, Jan 07, 2010 at 02:40:43PM -0500, Steven Rostedt wrote:
> On Thu, 2010-01-07 at 11:16 -0800, Paul E. McKenney wrote:
> 
> > > Note, we are not suggesting optimizations. It has nothing to do with
> > > performance of the syscall. We just can't allow one process to be DoSing
> > > another process on another cpu by it sending out millions of IPIs.
> > > Mathieu already showed that you could cause a 2x slowdown to the
> > > unrelated tasks.
> > 
> > I would have said that we are trying to optimize our way out of a DoS
> > situation, but point taken.  Whatever we choose to call it, the discussion
> > is on the suggested modifications, not strictly on the original patch.  ;-)
> 
> OK, I just want to get a better understanding of what can go wrong. A
> sys_membarrier() is used as follows, correct? (using a list example)
> 
> 	<user space>
> 
> 	list_del(obj);
> 
> 	synchronize_rcu();  -> calls sys_membarrier();
> 
> 	free(obj);
> 
> 
> And we need to protect against:
> 
> 	<user space>
> 
> 	read_rcu_lock();
> 
> 	obj = list->next;
> 
> 	use_object(obj);
> 
> 	read_rcu_unlock();

Yep!

> where we want to make sure that the synchronize_rcu() makes sure that we
> have passed the grace period of all takers of read_rcu_lock(). Now I
> have not looked at the code that implements userspace rcu, so I'm making
> a lot of assumptions here. But the problem that we need to avoid is:
> 
> 
> 	CPU 1				CPU 2
>      -----------                    -------------
> 
> 	<user space>			<user space>
> 
> 					rcu_read_lock();
> 
> 					obj = list->next
> 
> 	list_del(obj)
> 
> 					< Interrupt >
> 					< kernel space>
> 					<schedule>
> 
> 					<kernel_thread>
> 
> 					<schedule>
> 
> 					< back to original task >
> 
> 	sys_membarrier();
> 	< kernel space >
> 
> 	if (task_rq(task)->curr != task)
> 	< but still sees kernel thread >
> 
> 	< user space >
> 
> 	< misses that we are still in rcu section >
> 
> 	free(obj);
> 
> 					< user space >
> 
> 					use_object(obj); <=== crash!
> 
> 
> 
> I guess what I'm trying to do here is to understand what can go wrong,
> and then when we understand the issues, we can find a solution.

I believe that I am worried about a different scenario.  I do not believe
that the scenario you lay out above can actually happen.  The pair of
schedules on CPU 2 have to act as a full memory barrier, otherwise,
it would not be safe to resume a task on some other CPU.  If the pair
of schedules act as a full memory barrier, then the code in
synchronize_rcu() that looks at the RCU read-side state would see that
CPU 2 is in an RCU read-side critical section.

The scenario that I am (perhaps wrongly) concerned about is enabled by
the fact that URCU's rcu_read_lock() has a load, some checks, and a store.
It has compiler constraints, but no hardware memory barriers.  This
means that CPUs (even x86) can execute an rcu_dereference() before the
rcu_read_lock()'s store has executed.

Hacking your example above, keeping mind that x86 can reorder subsequent
loads to precede prior stores:


	CPU 1				CPU 2
     -----------                    -------------

	<user space>			<kernel space, switching to task>

					->curr updated

					<long code path, maybe mb?>

					<user space>

					rcu_read_lock(); [load only]

					obj = list->next

	list_del(obj)

	sys_membarrier();
	< kernel space >

	if (task_rq(task)->curr != task)
	< but load to obj reordered before store to ->curr >

	< user space >

	< misses that CPU 2 is in rcu section >

	[CPU 2's ->curr update now visible]

	[CPU 2's rcu_read_lock() store now visible]

	free(obj);

					use_object(obj); <=== crash!



If the "long code path" happens to include a full memory barrier, or if it
happens to be long enough to overflow CPU 2's store buffer, then the
above scenario cannot happen.  Until such time as someone applies some
unforeseen optimization to the context-switch path.

And, yes, the context-switch path has to have a full memory barrier
somewhere, but that somewhere could just as easily come before the
update of ->curr.

The same scenario applies when using ->cpu_vm_mask instead of ->curr.

Now, I could easily believe that the current context-switch code has
sufficient atomic operations, memory barriers, and instructions to
prevent this scenario from occurring, but it is feeling a lot like an
accident waiting to happen.  Hence my strident complaints.  ;-)

							Thanx, Paul

  reply	other threads:[~2010-01-07 20:58 UTC|newest]

Thread overview: 107+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-07  4:40 [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier Mathieu Desnoyers
2010-01-07  5:02 ` Paul E. McKenney
2010-01-07  5:39   ` Mathieu Desnoyers
2010-01-07  8:32   ` Peter Zijlstra
2010-01-07 16:39     ` Paul E. McKenney
2010-01-07  5:28 ` Josh Triplett
2010-01-07  6:04   ` Mathieu Desnoyers
2010-01-07  6:32     ` Josh Triplett
2010-01-07 17:45       ` Mathieu Desnoyers
2010-01-07 16:46     ` Paul E. McKenney
2010-01-07  5:40 ` Steven Rostedt
2010-01-07  6:19   ` Mathieu Desnoyers
2010-01-07  6:35     ` Josh Triplett
2010-01-07  8:44       ` Peter Zijlstra
2010-01-07 13:15         ` Steven Rostedt
2010-01-07 15:07         ` Mathieu Desnoyers
2010-01-07 16:52         ` Paul E. McKenney
2010-01-07 17:18           ` Peter Zijlstra
2010-01-07 17:31             ` Paul E. McKenney
2010-01-07 17:44               ` Mathieu Desnoyers
2010-01-07 17:55                 ` Paul E. McKenney
2010-01-07 17:44               ` Steven Rostedt
2010-01-07 17:56                 ` Paul E. McKenney
2010-01-07 18:04                   ` Steven Rostedt
2010-01-07 18:40                     ` Paul E. McKenney
2010-01-07 17:36             ` Mathieu Desnoyers
2010-01-07 14:27     ` Steven Rostedt
2010-01-07 15:10       ` Mathieu Desnoyers
2010-01-07 16:49   ` Paul E. McKenney
2010-01-07 17:00     ` Steven Rostedt
2010-01-07  8:27 ` Peter Zijlstra
2010-01-07 18:30   ` Oleg Nesterov
2010-01-07 18:39     ` Paul E. McKenney
2010-01-07 18:59       ` Steven Rostedt
2010-01-07 19:16         ` Paul E. McKenney
2010-01-07 19:40           ` Steven Rostedt
2010-01-07 20:58             ` Paul E. McKenney [this message]
2010-01-07 21:35               ` Steven Rostedt
2010-01-07 22:34                 ` Paul E. McKenney
2010-01-08 22:28                 ` Mathieu Desnoyers
2010-01-08 23:53                 ` Mathieu Desnoyers
2010-01-09  0:20                   ` Paul E. McKenney
2010-01-09  1:02                     ` Mathieu Desnoyers
2010-01-09  1:21                       ` Paul E. McKenney
2010-01-09  1:22                         ` Paul E. McKenney
2010-01-09  2:38                         ` Mathieu Desnoyers
2010-01-09  5:42                           ` Paul E. McKenney
2010-01-09 19:20                             ` Mathieu Desnoyers
2010-01-09 23:05                               ` Steven Rostedt
2010-01-09 23:16                                 ` Steven Rostedt
2010-01-10  0:03                                   ` Paul E. McKenney
2010-01-10  0:41                                     ` Steven Rostedt
2010-01-10  1:14                                       ` Mathieu Desnoyers
2010-01-10  1:44                                       ` Mathieu Desnoyers
2010-01-10  2:12                                         ` Steven Rostedt
2010-01-10  5:25                                           ` Paul E. McKenney
2010-01-10 11:50                                             ` Steven Rostedt
2010-01-10 16:03                                               ` Mathieu Desnoyers
2010-01-10 16:21                                                 ` Steven Rostedt
2010-01-10 17:10                                                   ` Mathieu Desnoyers
2010-01-10 21:02                                                     ` Steven Rostedt
2010-01-10 21:41                                                       ` Mathieu Desnoyers
2010-01-11  1:21                                                       ` Paul E. McKenney
2010-01-10 17:45                                               ` Paul E. McKenney
2010-01-10 18:24                                                 ` Mathieu Desnoyers
2010-01-11  1:17                                                   ` Paul E. McKenney
2010-01-11  4:25                                                     ` Mathieu Desnoyers
2010-01-11  4:29                                                       ` [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v3a) Mathieu Desnoyers
2010-01-11 17:27                                                         ` Paul E. McKenney
2010-01-11 17:35                                                           ` Mathieu Desnoyers
2010-01-11 17:50                                                         ` Peter Zijlstra
2010-01-11 20:52                                                           ` Mathieu Desnoyers
2010-01-11 21:19                                                             ` Peter Zijlstra
2010-01-11 22:04                                                               ` Mathieu Desnoyers
2010-01-11 22:20                                                                 ` Peter Zijlstra
2010-01-11 22:48                                                                   ` Paul E. McKenney
2010-01-11 22:48                                                                   ` Mathieu Desnoyers
2010-01-11 21:19                                                             ` Peter Zijlstra
2010-01-11 21:31                                                             ` Peter Zijlstra
2010-01-11  4:30                                                       ` [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v3b) Mathieu Desnoyers
2010-01-11 22:43                                                         ` Paul E. McKenney
2010-01-12 15:38                                                           ` Mathieu Desnoyers
2010-01-12 16:27                                                             ` Steven Rostedt
2010-01-12 16:38                                                               ` Mathieu Desnoyers
2010-01-12 16:54                                                               ` Paul E. McKenney
2010-01-12 18:12                                                             ` Paul E. McKenney
2010-01-12 18:56                                                               ` Mathieu Desnoyers
2010-01-13  0:23                                                                 ` Paul E. McKenney
2010-01-11 16:25                                                       ` [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier Paul E. McKenney
2010-01-11 20:21                                                         ` Mathieu Desnoyers
2010-01-11 21:48                                                           ` Paul E. McKenney
2010-01-14  2:56                                                             ` Lai Jiangshan
2010-01-14  5:13                                                               ` Paul E. McKenney
2010-01-14  5:39                                                                 ` Mathieu Desnoyers
2010-01-10  5:18                                         ` Paul E. McKenney
2010-01-10  1:12                                     ` Mathieu Desnoyers
2010-01-10  5:19                                       ` Paul E. McKenney
2010-01-10  1:04                                   ` Mathieu Desnoyers
2010-01-10  1:01                                 ` Mathieu Desnoyers
2010-01-09 23:59                               ` Paul E. McKenney
2010-01-10  1:11                                 ` Mathieu Desnoyers
2010-01-07  9:50 ` Andi Kleen
2010-01-07 15:12   ` Mathieu Desnoyers
2010-01-07 16:56   ` Paul E. McKenney
2010-01-07 11:04 ` David Howells
2010-01-07 15:15   ` Mathieu Desnoyers
2010-01-07 15:47     ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100107205830.GR6764@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox