All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	David Rientjes <rientjes@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Hugh Dickins <hughd@google.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH 2/4] mempolicy: fix show_numa_map() vs exec() + do_set_mempolicy() race
Date: Thu, 21 Aug 2014 21:01:17 +0200	[thread overview]
Message-ID: <20140821190117.GC8703@redhat.com> (raw)
In-Reply-To: <20140821190020.GA8703@redhat.com>

9e7814404b77 "hold task->mempolicy while numa_maps scans." fixed the
race with the exiting task but this is not enough.

The current code assumes that get_vma_policy(task) should either see
task->mempolicy == NULL or it should be equal to ->task_mempolicy saved
by hold_task_mempolicy(), so we can never race with __mpol_put(). But
this can only work if we can't race with do_set_mempolicy(), and thus
we can't race with another do_set_mempolicy() or do_exit() after that.

However, do_set_mempolicy()->down_write(mmap_sem) can not prevent this
race. This task can exec, change it's ->mm, and call do_set_mempolicy()
after that; in this case they take 2 different locks.

Change hold_task_mempolicy() to use get_task_policy(), it never returns
NULL, and change show_numa_map() to use __get_vma_policy() or fall back
to proc_priv->task_mempolicy.

Note: this is the minimal fix, we will cleanup this code later. I think
hold_task_mempolicy() and release_task_mempolicy() should die, we can
move this logic into show_numa_map(). Or we can move get_task_policy()
outside of ->mmap_sem and !CONFIG_NUMA code at least.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |   33 +++++++++------------------------
 1 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 9d7a9a6..7810aba 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -87,32 +87,14 @@ unsigned long task_statm(struct mm_struct *mm,
 
 #ifdef CONFIG_NUMA
 /*
- * These functions are for numa_maps but called in generic **maps seq_file
- * ->start(), ->stop() ops.
- *
- * numa_maps scans all vmas under mmap_sem and checks their mempolicy.
- * Each mempolicy object is controlled by reference counting. The problem here
- * is how to avoid accessing dead mempolicy object.
- *
- * Because we're holding mmap_sem while reading seq_file, it's safe to access
- * each vma's mempolicy, no vma objects will never drop refs to mempolicy.
- *
- * A task's mempolicy (task->mempolicy) has different behavior. task->mempolicy
- * is set and replaced under mmap_sem but unrefed and cleared under task_lock().
- * So, without task_lock(), we cannot trust get_vma_policy() because we cannot
- * gurantee the task never exits under us. But taking task_lock() around
- * get_vma_plicy() causes lock order problem.
- *
- * To access task->mempolicy without lock, we hold a reference count of an
- * object pointed by task->mempolicy and remember it. This will guarantee
- * that task->mempolicy points to an alive object or NULL in numa_maps accesses.
+ * Save get_task_policy() for show_numa_map().
  */
 static void hold_task_mempolicy(struct proc_maps_private *priv)
 {
 	struct task_struct *task = priv->task;
 
 	task_lock(task);
-	priv->task_mempolicy = task->mempolicy;
+	priv->task_mempolicy = get_task_policy(task);
 	mpol_get(priv->task_mempolicy);
 	task_unlock(task);
 }
@@ -1423,7 +1405,6 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	struct vm_area_struct *vma = v;
 	struct numa_maps *md = &numa_priv->md;
 	struct file *file = vma->vm_file;
-	struct task_struct *task = proc_priv->task;
 	struct mm_struct *mm = vma->vm_mm;
 	struct mm_walk walk = {};
 	struct mempolicy *pol;
@@ -1443,9 +1424,13 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 	walk.private = md;
 	walk.mm = mm;
 
-	pol = get_vma_policy(task, vma, vma->vm_start);
-	mpol_to_str(buffer, sizeof(buffer), pol);
-	mpol_cond_put(pol);
+	pol = __get_vma_policy(vma, vma->vm_start);
+	if (pol) {
+		mpol_to_str(buffer, sizeof(buffer), pol);
+		mpol_cond_put(pol);
+	} else {
+		mpol_to_str(buffer, sizeof(buffer), proc_priv->task_mempolicy);
+	}
 
 	seq_printf(m, "%08lx %s", vma->vm_start, buffer);
 
-- 
1.5.5.1



  parent reply	other threads:[~2014-08-21 19:04 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
2014-08-06  9:52   ` Kirill A. Shutemov
2014-08-06 14:10     ` Oleg Nesterov
2014-08-06 19:41       ` Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
2014-08-06  9:55   ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
2014-08-06  9:57   ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
2014-08-06 10:05   ` Kirill A. Shutemov
2014-12-03 14:14   ` Kirill A. Shutemov
2014-12-03 16:59     ` Eric W. Biederman
2014-12-04 16:17       ` Kirill A. Shutemov
2014-12-03 17:34     ` Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
2014-08-06 10:12   ` Kirill A. Shutemov
2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
2014-08-06 10:17   ` Kirill A. Shutemov
2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
2014-08-06 10:18   ` Kirill A. Shutemov
2014-08-06 10:19 ` [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Kirill A. Shutemov
2014-08-06 18:48 ` Cyrill Gorcunov
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
2014-08-07 19:17   ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
2014-08-07 19:17   ` [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable Oleg Nesterov
2014-08-07 19:17   ` [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper Oleg Nesterov
2014-08-07 19:17   ` [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic Oleg Nesterov
2014-08-07 19:18   ` [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start() Oleg Nesterov
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
2014-08-11 17:00   ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
2014-08-11 17:00   ` [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
2014-08-11 17:00   ` [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm Oleg Nesterov
2014-08-12 13:57   ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
2014-08-12 17:35     ` Oleg Nesterov
2014-08-20 16:49 ` [PATCH -mm 0/2] make /proc/PID/*maps* namespace friendly Oleg Nesterov
2014-08-20 16:49   ` [PATCH -mm 1/2] proc/maps: replace proc_maps_private->pid with "struct inode *inode" Oleg Nesterov
2014-08-20 16:50   ` [PATCH -mm 2/2] proc/maps: make vm_is_stack() logic namespace-friendly Oleg Nesterov
2014-08-20 19:22 ` [PATCH 0/4] mempolicy: get_task_policy() cleanups/preparations Oleg Nesterov
2014-08-20 19:22   ` [PATCH 1/4] mempolicy: change alloc_pages_vma() to use mpol_cond_put() Oleg Nesterov
2014-08-20 19:23   ` [PATCH 2/4] mempolicy: change get_task_policy() to return default_policy rather than NULL Oleg Nesterov
2014-08-20 19:23   ` [PATCH 3/4] mempolicy: sanitize the usage of get_task_policy() Oleg Nesterov
2014-08-20 19:23   ` [PATCH 4/4] mempolicy: remove the "task" arg of vma_policy_mof() and simplify it Oleg Nesterov
2014-08-21 19:00   ` [PATCH 0/4] mempolicy: fix show_numa_map() vs exec() + do_set_mempolicy() race Oleg Nesterov
2014-08-21 19:00     ` [PATCH 1/4] mempolicy: introduce __get_vma_policy(), export get_task_policy() Oleg Nesterov
2014-08-21 19:01     ` Oleg Nesterov [this message]
2014-08-21 19:01     ` [PATCH 3/4] mempolicy: kill do_set_mempolicy()->down_write(&mm->mmap_sem) Oleg Nesterov
2014-08-21 19:01     ` [PATCH 4/4] mempolicy: unexport get_vma_policy() and remove its "task" arg Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140821190117.GC8703@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=gorcunov@openvz.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.