All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC][PATCH] fix move/migrate_pages() race on task struct
Date: Thu, 23 Feb 2012 19:14:50 -0800	[thread overview]
Message-ID: <m1ehtkapn9.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <alpine.DEB.2.00.1202231536240.13554@router.home> (Christoph Lameter's message of "Thu, 23 Feb 2012 15:41:50 -0600 (CST)")

Christoph Lameter <cl@linux.com> writes:

> On Thu, 23 Feb 2012, Dave Hansen wrote:
>
>> > We may at this point be getting a reference to a task struct from another
>> > process not only from the current process (where the above procedure is
>> > valid). You rightly pointed out that the slab rcu free mechanism allows a
>> > free and a reallocation within the RCU period.
>>
>> I didn't _mean_ to point that out, but I think I realize what you're
>> talking about.  What we have before this patch is this:
>>
>>         rcu_read_lock();
>>         task = pid ? find_task_by_vpid(pid) : current;
>
> We take a refcount here on the mm ... See the code. We could simply take a
> refcount on the task as well if this is considered safe enough. If we have
> a refcount on the task then we do not need the refcount on the mm. Thats
> was your approach...
>
>>         rcu_read_unlock();
>
>> > Is that a real difference or are you just playing with words?
>>
>> I think we're talking about two different things:
>> 1. does RCU protect the pid->task lookup sufficiently?
>
> I dont know

Yes.  See below.

>> 2. Can the task simply go away in the move/migrate_pages() calls?
>
> The task may go away but we need the mm to stay for migration.
> That is why a refcount is taken on the mm.
>
> The bug in migrate_pages() is that we do a rcu_unlock and a rcu_lock. If
> we drop those then we should be safe if the use of a task pointer within a
> rcu section is safe without taking a refcount.

Yes the user of a task_struct pointer found via a userspace pid is valid
for the life of an rcu critical section, and the bug is indeed that we
drop the rcu_lock and somehow expect the task to remain valid.

The guarantee comes from release_task.  In release_task we call
__exit_signal which calls __unhash_process, and then we call
delayed_put_task to guarantee that the task lives until the end
of the rcu interval.



In migrate_pages we have a lot of task accesses outside of the
rcu critical section, and without a reference count on task.

I tell you the truth trying to figure out what that code needs to be
correct if task != current makes my head hurt.

I think we need to grab a reference on task_struct, to stop the task
from going away, and in addition we need to hold task_lock.  To keep
task->mm from changing (see exec_mmap).  But we can't do that and sleep
so I think the entire function needs to be rewritten, and the need for
task deep in the migrate_pages path needs to be removed as even with the
reference count held we can race with someone calling exec.

The only easy fix I see is to add:
if (pid)
	return -EINVAL;

Then we are working with current and only current change it's mm making
things much, much, much simpler.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: ebiederm@xmission.com (Eric W. Biederman)
To: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC][PATCH] fix move/migrate_pages() race on task struct
Date: Thu, 23 Feb 2012 19:14:50 -0800	[thread overview]
Message-ID: <m1ehtkapn9.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <alpine.DEB.2.00.1202231536240.13554@router.home> (Christoph Lameter's message of "Thu, 23 Feb 2012 15:41:50 -0600 (CST)")

Christoph Lameter <cl@linux.com> writes:

> On Thu, 23 Feb 2012, Dave Hansen wrote:
>
>> > We may at this point be getting a reference to a task struct from another
>> > process not only from the current process (where the above procedure is
>> > valid). You rightly pointed out that the slab rcu free mechanism allows a
>> > free and a reallocation within the RCU period.
>>
>> I didn't _mean_ to point that out, but I think I realize what you're
>> talking about.  What we have before this patch is this:
>>
>>         rcu_read_lock();
>>         task = pid ? find_task_by_vpid(pid) : current;
>
> We take a refcount here on the mm ... See the code. We could simply take a
> refcount on the task as well if this is considered safe enough. If we have
> a refcount on the task then we do not need the refcount on the mm. Thats
> was your approach...
>
>>         rcu_read_unlock();
>
>> > Is that a real difference or are you just playing with words?
>>
>> I think we're talking about two different things:
>> 1. does RCU protect the pid->task lookup sufficiently?
>
> I dont know

Yes.  See below.

>> 2. Can the task simply go away in the move/migrate_pages() calls?
>
> The task may go away but we need the mm to stay for migration.
> That is why a refcount is taken on the mm.
>
> The bug in migrate_pages() is that we do a rcu_unlock and a rcu_lock. If
> we drop those then we should be safe if the use of a task pointer within a
> rcu section is safe without taking a refcount.

Yes the user of a task_struct pointer found via a userspace pid is valid
for the life of an rcu critical section, and the bug is indeed that we
drop the rcu_lock and somehow expect the task to remain valid.

The guarantee comes from release_task.  In release_task we call
__exit_signal which calls __unhash_process, and then we call
delayed_put_task to guarantee that the task lives until the end
of the rcu interval.



In migrate_pages we have a lot of task accesses outside of the
rcu critical section, and without a reference count on task.

I tell you the truth trying to figure out what that code needs to be
correct if task != current makes my head hurt.

I think we need to grab a reference on task_struct, to stop the task
from going away, and in addition we need to hold task_lock.  To keep
task->mm from changing (see exec_mmap).  But we can't do that and sleep
so I think the entire function needs to be rewritten, and the need for
task deep in the migrate_pages path needs to be removed as even with the
reference count held we can race with someone calling exec.

The only easy fix I see is to add:
if (pid)
	return -EINVAL;

Then we are working with current and only current change it's mm making
things much, much, much simpler.

Eric

  reply	other threads:[~2012-02-24  3:11 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-23 18:07 [RFC][PATCH] fix move/migrate_pages() race on task struct Dave Hansen
2012-02-23 18:07 ` Dave Hansen
2012-02-23 18:45 ` Andi Kleen
2012-02-23 18:45   ` Andi Kleen
2012-02-23 18:45 ` Christoph Lameter
2012-02-23 18:45   ` Christoph Lameter
2012-02-23 19:10   ` Dave Hansen
2012-02-23 19:10     ` Dave Hansen
2012-02-23 19:40     ` Christoph Lameter
2012-02-23 19:40       ` Christoph Lameter
2012-02-23 20:04       ` Dave Hansen
2012-02-23 20:04         ` Dave Hansen
2012-02-23 21:41         ` Christoph Lameter
2012-02-23 21:41           ` Christoph Lameter
2012-02-24  3:14           ` Eric W. Biederman [this message]
2012-02-24  3:14             ` Eric W. Biederman
2012-02-24 15:20             ` Christoph Lameter
2012-02-24 15:20               ` Christoph Lameter
2012-02-24 15:41               ` Eric W. Biederman
2012-02-24 15:41                 ` Eric W. Biederman
2012-02-24 16:48               ` Dave Hansen
2012-02-24 16:48                 ` Dave Hansen
2012-02-24 16:54                 ` Christoph Lameter
2012-02-24 16:54                   ` Christoph Lameter
2012-02-24 17:04                   ` Dave Hansen
2012-02-24 17:04                     ` Dave Hansen
2012-02-24 17:08                   ` Christoph Lameter
2012-02-24 17:08                     ` Christoph Lameter
2012-02-24 17:25                     ` Dave Hansen
2012-02-24 17:25                       ` Dave Hansen
2012-02-24 17:32                       ` Christoph Lameter
2012-02-24 17:32                         ` Christoph Lameter
2012-02-24 21:37                         ` Dave Hansen
2012-02-24 21:37                           ` Dave Hansen
2012-02-24 23:12                         ` Eric W. Biederman
2012-02-24 23:12                           ` Eric W. Biederman
2012-02-27 16:43                           ` Christoph Lameter
2012-02-27 16:43                             ` Christoph Lameter
2012-02-25 12:13                         ` Eric W. Biederman
2012-02-25 12:13                           ` Eric W. Biederman
2012-02-27 19:01                           ` Christoph Lameter
2012-02-27 19:01                             ` Christoph Lameter
2012-02-27 20:15                             ` Eric W. Biederman
2012-02-27 20:15                               ` Eric W. Biederman
2012-02-27 22:39                               ` Christoph Lameter
2012-02-27 22:39                                 ` Christoph Lameter
2012-02-28 19:30                                 ` Christoph Lameter
2012-02-28 19:30                                   ` Christoph Lameter
2012-02-29 20:31                                   ` Andrew Morton
2012-02-29 20:31                                     ` Andrew Morton
2012-02-29 20:33                                     ` Christoph Lameter
2012-02-29 20:33                                       ` Christoph Lameter
2012-02-29 20:36                                     ` Dave Hansen
2012-02-29 20:36                                       ` Dave Hansen
2012-02-24 17:07               ` KOSAKI Motohiro
2012-02-24 17:07                 ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1ehtkapn9.fsf@fess.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=cl@linux.com \
    --cc=dave@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.