linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups
@ 2014-08-05 19:46 Oleg Nesterov
  2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
                   ` (10 more replies)
  0 siblings, 11 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Changes:

	1-4: Update the changelogs, join "introduce the stable
	     proc_maps_private->mm" and "change m_start() to rely on
	     priv->mm" into a single "shift "priv->task = NULL" from
	     m_start() to m_stop()".

	     Resulting code is the same. Kirill and Cyrill, you seem
	     to agree with these changes, I'll appreciate your acks.

	5-7: New. Seems to work, please review.

Todo:

	- Cleanup the tail_vma horror in m_start

	- Update task_nommu.c in the same way

	- Fix lock_trace() users

Oleg.

 fs/proc/base.c     |   36 ++++++++------
 fs/proc/internal.h |    3 +
 fs/proc/task_mmu.c |  136 ++++++++++++++++++++++++++-------------------------
 3 files changed, 93 insertions(+), 82 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
  2014-08-06  9:52   ` Kirill A. Shutemov
  2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

get_gate_vma(priv->task->mm) looks ugly and wrong, task->mm can be
NULL or it can changed by exec right after mm_access().

And in theory this race is not harmless, the task can exec and then
later exit and free the new mm_struct. In this case get_task_mm(oldmm)
can't help, get_gate_vma(task->mm) can read the freed/unmapped memory.

I think that priv->task should simply die and hold_task_mempolicy()
logic can be simplified. tail_vma logic asks for cleanups too.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee..8a5271e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -170,7 +170,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 		return mm;
 	down_read(&mm->mmap_sem);
 
-	tail_vma = get_gate_vma(priv->task->mm);
+	tail_vma = get_gate_vma(mm);
 	priv->tail_vma = tail_vma;
 	hold_task_mempolicy(priv);
 	/* Start with last addr hint */
@@ -351,12 +351,11 @@ static int show_map(struct seq_file *m, void *v, int is_pid)
 {
 	struct vm_area_struct *vma = v;
 	struct proc_maps_private *priv = m->private;
-	struct task_struct *task = priv->task;
 
 	show_map_vma(m, vma, is_pid);
 
 	if (m->count < m->size)  /* vma is copied successfully */
-		m->version = (vma != get_gate_vma(task->mm))
+		m->version = (vma != priv->tail_vma)
 			? vma->vm_start : 0;
 	return 0;
 }
@@ -584,7 +583,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
 	struct proc_maps_private *priv = m->private;
-	struct task_struct *task = priv->task;
 	struct vm_area_struct *vma = v;
 	struct mem_size_stats mss;
 	struct mm_walk smaps_walk = {
@@ -639,7 +637,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	show_smap_vma_flags(m, vma);
 
 	if (m->count < m->size)  /* vma is copied successfully */
-		m->version = (vma != get_gate_vma(task->mm))
+		m->version = (vma != priv->tail_vma)
 			? vma->vm_start : 0;
 	return 0;
 }
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open()
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
  2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
  2014-08-06  9:55   ` Kirill A. Shutemov
  2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

do_maps_open() and numa_maps_open() are overcomplicated, they could
use __seq_open_private(). Plus they do the same, just sizeof(*priv)
differs.

Change them to use a new simple helper, proc_maps_open(ops, psize).
This simplifies the code and allows us to do the next changes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |   44 ++++++++++++++++----------------------------
 1 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8a5271e..794aeb6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -231,23 +231,23 @@ static void m_stop(struct seq_file *m, void *v)
 		put_task_struct(priv->task);
 }
 
+static int proc_maps_open(struct inode *inode, struct file *file,
+			const struct seq_operations *ops, int psize)
+{
+	struct proc_maps_private *priv = __seq_open_private(file, ops, psize);
+
+	if (!priv)
+		return -ENOMEM;
+
+	priv->pid = proc_pid(inode);
+	return 0;
+}
+
 static int do_maps_open(struct inode *inode, struct file *file,
 			const struct seq_operations *ops)
 {
-	struct proc_maps_private *priv;
-	int ret = -ENOMEM;
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (priv) {
-		priv->pid = proc_pid(inode);
-		ret = seq_open(file, ops);
-		if (!ret) {
-			struct seq_file *m = file->private_data;
-			m->private = priv;
-		} else {
-			kfree(priv);
-		}
-	}
-	return ret;
+	return proc_maps_open(inode, file, ops,
+				sizeof(struct proc_maps_private));
 }
 
 static void
@@ -1502,20 +1502,8 @@ static const struct seq_operations proc_tid_numa_maps_op = {
 static int numa_maps_open(struct inode *inode, struct file *file,
 			  const struct seq_operations *ops)
 {
-	struct numa_maps_private *priv;
-	int ret = -ENOMEM;
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (priv) {
-		priv->proc_maps.pid = proc_pid(inode);
-		ret = seq_open(file, ops);
-		if (!ret) {
-			struct seq_file *m = file->private_data;
-			m->private = priv;
-		} else {
-			kfree(priv);
-		}
-	}
-	return ret;
+	return proc_maps_open(inode, file, ops,
+				sizeof(struct numa_maps_private));
 }
 
 static int pid_numa_maps_open(struct inode *inode, struct file *file)
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 3/7] proc: introduce proc_mem_open()
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
  2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
  2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
  2014-08-06  9:57   ` Kirill A. Shutemov
  2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Extract the mm_access() code from __mem_open() into the new helper,
proc_mem_open(), the next patch will add another caller.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/base.c     |   36 +++++++++++++++++++++---------------
 fs/proc/internal.h |    2 ++
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2d696b0..9aef9ac 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -667,29 +667,35 @@ static const struct file_operations proc_single_file_operations = {
 	.release	= single_release,
 };
 
-static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
+
+struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
 {
-	struct task_struct *task = get_proc_task(file_inode(file));
-	struct mm_struct *mm;
+	struct task_struct *task = get_proc_task(inode);
+	struct mm_struct *mm = ERR_PTR(-ESRCH);
 
-	if (!task)
-		return -ESRCH;
+	if (task) {
+		mm = mm_access(task, mode);
+		put_task_struct(task);
 
-	mm = mm_access(task, mode);
-	put_task_struct(task);
+		if (!IS_ERR_OR_NULL(mm)) {
+			/* ensure this mm_struct can't be freed */
+			atomic_inc(&mm->mm_count);
+			/* but do not pin its memory */
+			mmput(mm);
+		}
+	}
+
+	return mm;
+}
+
+static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
+{
+	struct mm_struct *mm = proc_mem_open(inode, mode);
 
 	if (IS_ERR(mm))
 		return PTR_ERR(mm);
 
-	if (mm) {
-		/* ensure this mm_struct can't be freed */
-		atomic_inc(&mm->mm_count);
-		/* but do not pin its memory */
-		mmput(mm);
-	}
-
 	file->private_data = mm;
-
 	return 0;
 }
 
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3ab6d14..ba0c1c1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -275,6 +275,8 @@ struct proc_maps_private {
 #endif
 };
 
+struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
+
 extern const struct file_operations proc_pid_maps_operations;
 extern const struct file_operations proc_tid_maps_operations;
 extern const struct file_operations proc_pid_numa_maps_operations;
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
                   ` (2 preceding siblings ...)
  2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
  2014-08-06 10:05   ` Kirill A. Shutemov
  2014-12-03 14:14   ` Kirill A. Shutemov
  2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

A simple test-case from Kirill Shutemov

	cat /proc/self/maps >/dev/null
	chmod +x /proc/self/net/packet
	exec /proc/self/net/packet

makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
the opposite order.

It's a false positive and probably we should not allow "chmod +x" on proc
files. Still I think that we should avoid mm_access() and cred_guard_mutex
in sys_read() paths, security checking should happen at open time. Besides,
this doesn't even look right if the task changes its ->mm between m_stop()
and m_start().

Add the new "mm_struct *mm" member into struct proc_maps_private and change
proc_maps_open() to initialize it using proc_mem_open(). Change m_start() to
use priv->mm if atomic_inc_not_zero(mm_users) succeeds or return NULL (eof)
otherwise.

The only complication is that proc_maps_open() users should additionally do
mmdrop() in fop->release(), add the new proc_map_release() helper for that.

Note: this is the user-visible change, if the task execs after open("maps")
the new ->mm won't be visible via this file. I hope this is fine, and this
matches /proc/pid/mem bahaviour.

Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/internal.h |    1 +
 fs/proc/task_mmu.c |   37 ++++++++++++++++++++++++++++---------
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index ba0c1c1..78784cd 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -267,6 +267,7 @@ extern int proc_remount(struct super_block *, int *, char *);
 struct proc_maps_private {
 	struct pid *pid;
 	struct task_struct *task;
+	struct mm_struct *mm;
 #ifdef CONFIG_MMU
 	struct vm_area_struct *tail_vma;
 #endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 794aeb6..7ec8eb5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -165,9 +165,9 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	if (!priv->task)
 		return ERR_PTR(-ESRCH);
 
-	mm = mm_access(priv->task, PTRACE_MODE_READ);
-	if (!mm || IS_ERR(mm))
-		return mm;
+	mm = priv->mm;
+	if (!mm || !atomic_inc_not_zero(&mm->mm_users))
+		return NULL;
 	down_read(&mm->mmap_sem);
 
 	tail_vma = get_gate_vma(mm);
@@ -240,9 +240,28 @@ static int proc_maps_open(struct inode *inode, struct file *file,
 		return -ENOMEM;
 
 	priv->pid = proc_pid(inode);
+	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	if (IS_ERR(priv->mm)) {
+		int err = PTR_ERR(priv->mm);
+		seq_release_private(inode, file);
+		return err;
+	}
+
 	return 0;
 }
 
+static int proc_map_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct proc_maps_private *priv = seq->private;
+
+	if (priv->mm)
+		mmdrop(priv->mm);
+
+	return seq_release_private(inode, file);
+}
+
+
 static int do_maps_open(struct inode *inode, struct file *file,
 			const struct seq_operations *ops)
 {
@@ -398,14 +417,14 @@ const struct file_operations proc_pid_maps_operations = {
 	.open		= pid_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 const struct file_operations proc_tid_maps_operations = {
 	.open		= tid_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 /*
@@ -680,14 +699,14 @@ const struct file_operations proc_pid_smaps_operations = {
 	.open		= pid_smaps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 const struct file_operations proc_tid_smaps_operations = {
 	.open		= tid_smaps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 /*
@@ -1520,13 +1539,13 @@ const struct file_operations proc_pid_numa_maps_operations = {
 	.open		= pid_numa_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 
 const struct file_operations proc_tid_numa_maps_operations = {
 	.open		= tid_numa_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= proc_map_release,
 };
 #endif /* CONFIG_NUMA */
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
                   ` (3 preceding siblings ...)
  2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
  2014-08-06 10:12   ` Kirill A. Shutemov
  2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

m_start() drops ->mmap_sem and does mmput() if it retuns vsyscall
vma. This is because in this case m_stop()->vma_stop() obviously
can't use gate_vma->vm_mm.

Now that we have proc_maps_private->mm we can simplify this logic:

  - Change m_start() to return with ->mmap_sem held unless it returns
    IS_ERR_OR_NULL().

  - Change vma_stop() to use priv->mm and avoid the ugly vma checks,
    this makes "vm_area_struct *vma" unnecessary.

  - This also allows m_start() to use vm_stop().

  - Cleanup m_next() to follow the new locking rule.

    Note: m_stop() looks very ugly, and this temporary uglifies it
    even more. Fixed by the next change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |   34 ++++++++++++++++++----------------
 1 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7ec8eb5..f1254b4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -129,14 +129,12 @@ static void release_task_mempolicy(struct proc_maps_private *priv)
 }
 #endif
 
-static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
+static void vma_stop(struct proc_maps_private *priv)
 {
-	if (vma && vma != priv->tail_vma) {
-		struct mm_struct *mm = vma->vm_mm;
-		release_task_mempolicy(priv);
-		up_read(&mm->mmap_sem);
-		mmput(mm);
-	}
+	struct mm_struct *mm = priv->mm;
+	release_task_mempolicy(priv);
+	up_read(&mm->mmap_sem);
+	mmput(mm);
 }
 
 static void *m_start(struct seq_file *m, loff_t *pos)
@@ -199,12 +197,13 @@ out:
 	if (vma)
 		return vma;
 
-	release_task_mempolicy(priv);
 	/* End of vmas has been reached */
 	m->version = (tail_vma != NULL)? 0: -1UL;
-	up_read(&mm->mmap_sem);
-	mmput(mm);
-	return tail_vma;
+	if (tail_vma)
+		return tail_vma;
+
+	vma_stop(priv);
+	return NULL;
 }
 
 static void *m_next(struct seq_file *m, void *v, loff_t *pos)
@@ -212,21 +211,24 @@ static void *m_next(struct seq_file *m, void *v, loff_t *pos)
 	struct proc_maps_private *priv = m->private;
 	struct vm_area_struct *vma = v;
 	struct vm_area_struct *tail_vma = priv->tail_vma;
+	struct vm_area_struct *next;
 
 	(*pos)++;
 	if (vma && (vma != tail_vma) && vma->vm_next)
 		return vma->vm_next;
-	vma_stop(priv, vma);
-	return (vma != tail_vma)? tail_vma: NULL;
+
+	next = (vma != tail_vma)? tail_vma: NULL;
+	if (!next)
+		vma_stop(priv);
+	return next;
 }
 
 static void m_stop(struct seq_file *m, void *v)
 {
 	struct proc_maps_private *priv = m->private;
-	struct vm_area_struct *vma = v;
 
-	if (!IS_ERR(vma))
-		vma_stop(priv, vma);
+	if (!IS_ERR_OR_NULL(v))
+		vma_stop(priv);
 	if (priv->task)
 		put_task_struct(priv->task);
 }
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next()
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
                   ` (4 preceding siblings ...)
  2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
@ 2014-08-05 19:47 ` Oleg Nesterov
  2014-08-06 10:17   ` Kirill A. Shutemov
  2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

1. Kill the first "vma != NULL" check. Firstly this is not possible,
   m_next() won't be called if ->start() or the previous ->next()
   returns NULL.

   And if it was possible the 2nd "vma != tail_vma" check is buggy,
   we should not wrongly return ->tail_vma.

2. Make this function readable. The logic is very simple, we should
   return check "vma != tail" once and return "vm_next || tail_vma".

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f1254b4..c4c8825 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -209,15 +209,13 @@ out:
 static void *m_next(struct seq_file *m, void *v, loff_t *pos)
 {
 	struct proc_maps_private *priv = m->private;
-	struct vm_area_struct *vma = v;
 	struct vm_area_struct *tail_vma = priv->tail_vma;
-	struct vm_area_struct *next;
+	struct vm_area_struct *vma = v, *next = NULL;
 
 	(*pos)++;
-	if (vma && (vma != tail_vma) && vma->vm_next)
-		return vma->vm_next;
+	if (vma != tail_vma)
+		next = vma->vm_next ?: tail_vma;
 
-	next = (vma != tail_vma)? tail_vma: NULL;
 	if (!next)
 		vma_stop(priv);
 	return next;
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop()
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
                   ` (5 preceding siblings ...)
  2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
@ 2014-08-05 19:47 ` Oleg Nesterov
  2014-08-06 10:18   ` Kirill A. Shutemov
  2014-08-06 10:19 ` [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Kirill A. Shutemov
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

1. There is no reason to reset ->tail_vma in m_start(), if we return
   IS_ERR_OR_NULL() it won't be used.

2. m_start() also clears priv->task to ensure that m_stop() won't use
   the stale pointer if we fail before get_task_struct(). But this is
   ugly and confusing, move this initialization in m_stop().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |    9 +++------
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c4c8825..5f7fb45 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -145,17 +145,12 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	struct vm_area_struct *vma, *tail_vma = NULL;
 	loff_t l = *pos;
 
-	/* Clear the per syscall fields in priv */
-	priv->task = NULL;
-	priv->tail_vma = NULL;
-
 	/*
 	 * We remember last_addr rather than next_addr to hit with
 	 * vmacache most of the time. We have zero last_addr at
 	 * the beginning and also after lseek. We will have -1 last_addr
 	 * after the end of the vmas.
 	 */
-
 	if (last_addr == -1UL)
 		return NULL;
 
@@ -227,8 +222,10 @@ static void m_stop(struct seq_file *m, void *v)
 
 	if (!IS_ERR_OR_NULL(v))
 		vma_stop(priv);
-	if (priv->task)
+	if (priv->task) {
 		put_task_struct(priv->task);
+		priv->task = NULL;
+	}
 }
 
 static int proc_maps_open(struct inode *inode, struct file *file,
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
  2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
@ 2014-08-06  9:52   ` Kirill A. Shutemov
  2014-08-06 14:10     ` Oleg Nesterov
  0 siblings, 1 reply; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06  9:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:46:44PM +0200, Oleg Nesterov wrote:
> get_gate_vma(priv->task->mm) looks ugly and wrong, task->mm can be
> NULL or it can changed by exec right after mm_access().
> 
> And in theory this race is not harmless, the task can exec and then
> later exit and free the new mm_struct. In this case get_task_mm(oldmm)
> can't help, get_gate_vma(task->mm) can read the freed/unmapped memory.
> 
> I think that priv->task should simply die and hold_task_mempolicy()
> logic can be simplified. tail_vma logic asks for cleanups too.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/proc/task_mmu.c |    8 +++-----
>  1 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index cfa63ee..8a5271e 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -170,7 +170,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
>  		return mm;
>  	down_read(&mm->mmap_sem);
>  
> -	tail_vma = get_gate_vma(priv->task->mm);
> +	tail_vma = get_gate_vma(mm);
>  	priv->tail_vma = tail_vma;
>  	hold_task_mempolicy(priv);
>  	/* Start with last addr hint */
> @@ -351,12 +351,11 @@ static int show_map(struct seq_file *m, void *v, int is_pid)
>  {
>  	struct vm_area_struct *vma = v;
>  	struct proc_maps_private *priv = m->private;
> -	struct task_struct *task = priv->task;
>  
>  	show_map_vma(m, vma, is_pid);
>  
>  	if (m->count < m->size)  /* vma is copied successfully */
> -		m->version = (vma != get_gate_vma(task->mm))
> +		m->version = (vma != priv->tail_vma)
>  			? vma->vm_start : 0;

Drop excessive parenthesis while you're there? And in the next hunk.

Otherwise:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

>  	return 0;
>  }
> @@ -584,7 +583,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
>  static int show_smap(struct seq_file *m, void *v, int is_pid)
>  {
>  	struct proc_maps_private *priv = m->private;
> -	struct task_struct *task = priv->task;
>  	struct vm_area_struct *vma = v;
>  	struct mem_size_stats mss;
>  	struct mm_walk smaps_walk = {
> @@ -639,7 +637,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  	show_smap_vma_flags(m, vma);
>  
>  	if (m->count < m->size)  /* vma is copied successfully */
> -		m->version = (vma != get_gate_vma(task->mm))
> +		m->version = (vma != priv->tail_vma)
>  			? vma->vm_start : 0;
>  	return 0;
>  }
> -- 
> 1.5.5.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open()
  2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
@ 2014-08-06  9:55   ` Kirill A. Shutemov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06  9:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:46:48PM +0200, Oleg Nesterov wrote:
> do_maps_open() and numa_maps_open() are overcomplicated, they could
> use __seq_open_private(). Plus they do the same, just sizeof(*priv)
> differs.
> 
> Change them to use a new simple helper, proc_maps_open(ops, psize).
> This simplifies the code and allows us to do the next changes.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 3/7] proc: introduce proc_mem_open()
  2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
@ 2014-08-06  9:57   ` Kirill A. Shutemov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06  9:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:46:52PM +0200, Oleg Nesterov wrote:
> Extract the mm_access() code from __mem_open() into the new helper,
> proc_mem_open(), the next patch will add another caller.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
@ 2014-08-06 10:05   ` Kirill A. Shutemov
  2014-12-03 14:14   ` Kirill A. Shutemov
  1 sibling, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:05 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> A simple test-case from Kirill Shutemov
> 
> 	cat /proc/self/maps >/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet
> 
> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> the opposite order.
> 
> It's a false positive and probably we should not allow "chmod +x" on proc
> files. Still I think that we should avoid mm_access() and cred_guard_mutex
> in sys_read() paths, security checking should happen at open time. Besides,
> this doesn't even look right if the task changes its ->mm between m_stop()
> and m_start().
> 
> Add the new "mm_struct *mm" member into struct proc_maps_private and change
> proc_maps_open() to initialize it using proc_mem_open(). Change m_start() to
> use priv->mm if atomic_inc_not_zero(mm_users) succeeds or return NULL (eof)
> otherwise.
> 
> The only complication is that proc_maps_open() users should additionally do
> mmdrop() in fop->release(), add the new proc_map_release() helper for that.
> 
> Note: this is the user-visible change, if the task execs after open("maps")
> the new ->mm won't be visible via this file. I hope this is fine, and this
> matches /proc/pid/mem bahaviour.
> 
> Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic
  2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
@ 2014-08-06 10:12   ` Kirill A. Shutemov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:46:59PM +0200, Oleg Nesterov wrote:
> m_start() drops ->mmap_sem and does mmput() if it retuns vsyscall
> vma. This is because in this case m_stop()->vma_stop() obviously
> can't use gate_vma->vm_mm.
> 
> Now that we have proc_maps_private->mm we can simplify this logic:
> 
>   - Change m_start() to return with ->mmap_sem held unless it returns
>     IS_ERR_OR_NULL().
> 
>   - Change vma_stop() to use priv->mm and avoid the ugly vma checks,
>     this makes "vm_area_struct *vma" unnecessary.
> 
>   - This also allows m_start() to use vm_stop().
> 
>   - Cleanup m_next() to follow the new locking rule.
> 
>     Note: m_stop() looks very ugly, and this temporary uglifies it
>     even more. Fixed by the next change.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next()
  2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
@ 2014-08-06 10:17   ` Kirill A. Shutemov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:47:03PM +0200, Oleg Nesterov wrote:
> 1. Kill the first "vma != NULL" check. Firstly this is not possible,
>    m_next() won't be called if ->start() or the previous ->next()
>    returns NULL.
> 
>    And if it was possible the 2nd "vma != tail_vma" check is buggy,
>    we should not wrongly return ->tail_vma.
> 
> 2. Make this function readable. The logic is very simple, we should
>    return check "vma != tail" once and return "vm_next || tail_vma".
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop()
  2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
@ 2014-08-06 10:18   ` Kirill A. Shutemov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:47:07PM +0200, Oleg Nesterov wrote:
> 1. There is no reason to reset ->tail_vma in m_start(), if we return
>    IS_ERR_OR_NULL() it won't be used.
> 
> 2. m_start() also clears priv->task to ensure that m_stop() won't use
>    the stale pointer if we fail before get_task_struct(). But this is
>    ugly and confusing, move this initialization in m_stop().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
                   ` (6 preceding siblings ...)
  2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
@ 2014-08-06 10:19 ` Kirill A. Shutemov
  2014-08-06 18:48 ` Cyrill Gorcunov
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:46:28PM +0200, Oleg Nesterov wrote:
> Changes:
> 
> 	1-4: Update the changelogs, join "introduce the stable
> 	     proc_maps_private->mm" and "change m_start() to rely on
> 	     priv->mm" into a single "shift "priv->task = NULL" from
> 	     m_start() to m_stop()".
> 
> 	     Resulting code is the same. Kirill and Cyrill, you seem
> 	     to agree with these changes, I'll appreciate your acks.
> 
> 	5-7: New. Seems to work, please review.

Thanks! Looks great!

Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
  2014-08-06  9:52   ` Kirill A. Shutemov
@ 2014-08-06 14:10     ` Oleg Nesterov
  2014-08-06 19:41       ` Oleg Nesterov
  0 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-06 14:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On 08/06, Kirill A. Shutemov wrote:
>
> On Tue, Aug 05, 2014 at 09:46:44PM +0200, Oleg Nesterov wrote:
> >
> >  	if (m->count < m->size)  /* vma is copied successfully */
> > -		m->version = (vma != get_gate_vma(task->mm))
> > +		m->version = (vma != priv->tail_vma)
> >  			? vma->vm_start : 0;
>
> Drop excessive parenthesis while you're there? And in the next hunk.

I agree. But lets do this in another patch? I forgot to mention this in
the yesterday's TODO, but we should factor out this code. Perhaps move
this into vma_stop(). Or at least add a trivial helper, this pattern
repeats 3 times, plus another one in m_start(). This problem is that I
do not really understand this logic right now, but I'll do something
today in any case.

> Otherwise:
>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
                   ` (7 preceding siblings ...)
  2014-08-06 10:19 ` [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Kirill A. Shutemov
@ 2014-08-06 18:48 ` Cyrill Gorcunov
  2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
  2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
  10 siblings, 0 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-08-06 18:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Alexander Viro, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On Tue, Aug 05, 2014 at 09:46:28PM +0200, Oleg Nesterov wrote:
> Changes:
> 
> 	1-4: Update the changelogs, join "introduce the stable
> 	     proc_maps_private->mm" and "change m_start() to rely on
> 	     priv->mm" into a single "shift "priv->task = NULL" from
> 	     m_start() to m_stop()".
> 
> 	     Resulting code is the same. Kirill and Cyrill, you seem
> 	     to agree with these changes, I'll appreciate your acks.
> 
> 	5-7: New. Seems to work, please review.
> 
> Todo:
> 
> 	- Cleanup the tail_vma horror in m_start
> 
> 	- Update task_nommu.c in the same way
> 
> 	- Fix lock_trace() users
> 
> Oleg.

(Sorry for delay) Thanks a lot, Oleg!
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
to whole series

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
  2014-08-06 14:10     ` Oleg Nesterov
@ 2014-08-06 19:41       ` Oleg Nesterov
  0 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-06 19:41 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On 08/06, Oleg Nesterov wrote:
>
> On 08/06, Kirill A. Shutemov wrote:
> >
> > On Tue, Aug 05, 2014 at 09:46:44PM +0200, Oleg Nesterov wrote:
> > >
> > >  	if (m->count < m->size)  /* vma is copied successfully */
> > > -		m->version = (vma != get_gate_vma(task->mm))
> > > +		m->version = (vma != priv->tail_vma)
> > >  			? vma->vm_start : 0;
> >
> > Drop excessive parenthesis while you're there? And in the next hunk.
>
> I agree. But lets do this in another patch? I forgot to mention this in
> the yesterday's TODO, but we should factor out this code. Perhaps move
> this into vma_stop(). Or at least add a trivial helper, this pattern
> repeats 3 times, plus another one in m_start(). This problem is that I
> do not really understand this logic right now, but I'll do something
> today in any case.

Damn, this needs even more cleanups than I thought. Even "version == -1"
logic is absolutely wrong/dead. And "last_addr rather than next_addr" is
wrong too afaics, this just means we need the extra ->vm_next step plus
vma-or-tail_vma recheck.

Hopefully I'll finish this tomorrow.

Oleg.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
                   ` (8 preceding siblings ...)
  2014-08-06 18:48 ` Cyrill Gorcunov
@ 2014-08-07 19:17 ` Oleg Nesterov
  2014-08-07 19:17   ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
                     ` (4 more replies)
  2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
  10 siblings, 5 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

On 08/05, Oleg Nesterov wrote:
>
> Todo:
>
> 	- Cleanup the tail_vma horror in m_start

On top of the previous "[PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups"
series.

It took me much more time than I expected ;) And after 3 attempts I failed
to fix this (imho) horror step-by-step. So 1/5 simply removes the last_addr
code, and then 4/5 brings it back.

Todo:

	- Update task_nommu.c in the same way

	- Fix lock_trace() users

Oleg.

 fs/proc/task_mmu.c |   98 +++++++++++++++++++++------------------------------
 1 files changed, 40 insertions(+), 58 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic
  2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
@ 2014-08-07 19:17   ` Oleg Nesterov
  2014-08-07 19:17   ` [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable Oleg Nesterov
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

m_start() carefully documents, checks, and sets "m->version = -1" if
we are going to return NULL. The only problem is that we will be never
called again if m_start() returns NULL, so this is simply pointless
and misleading.

Otoh, ->show() methods m->version = 0 if vma == tail_vma and this is
just wrong, we want -1 in this case. And in fact we also want -1 if
->vm_next == NULL and ->tail_vma == NULL.

And it is not used consistently, the "scan vmas" loop in m_start()
should update last_addr too.

Finally, imo the whole "last_addr" logic in m_start() looks horrible.
find_vma(last_addr) is called unconditionally even if we are not going
to use the result. But the main problem is that this code participates
in tail_vma-or-NULL mess, and this looks simply unfixable.

Remove this optimization. We will add it back after some cleanups.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |   35 +----------------------------------
 1 files changed, 1 insertions(+), 34 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5f7fb45..d69f31c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -140,20 +140,10 @@ static void vma_stop(struct proc_maps_private *priv)
 static void *m_start(struct seq_file *m, loff_t *pos)
 {
 	struct proc_maps_private *priv = m->private;
-	unsigned long last_addr = m->version;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma, *tail_vma = NULL;
 	loff_t l = *pos;
 
-	/*
-	 * We remember last_addr rather than next_addr to hit with
-	 * vmacache most of the time. We have zero last_addr at
-	 * the beginning and also after lseek. We will have -1 last_addr
-	 * after the end of the vmas.
-	 */
-	if (last_addr == -1UL)
-		return NULL;
-
 	priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
 	if (!priv->task)
 		return ERR_PTR(-ESRCH);
@@ -166,12 +156,6 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	tail_vma = get_gate_vma(mm);
 	priv->tail_vma = tail_vma;
 	hold_task_mempolicy(priv);
-	/* Start with last addr hint */
-	vma = find_vma(mm, last_addr);
-	if (last_addr && vma) {
-		vma = vma->vm_next;
-		goto out;
-	}
 
 	/*
 	 * Check the vma index is within the range and do
@@ -192,8 +176,6 @@ out:
 	if (vma)
 		return vma;
 
-	/* End of vmas has been reached */
-	m->version = (tail_vma != NULL)? 0: -1UL;
 	if (tail_vma)
 		return tail_vma;
 
@@ -365,14 +347,7 @@ done:
 
 static int show_map(struct seq_file *m, void *v, int is_pid)
 {
-	struct vm_area_struct *vma = v;
-	struct proc_maps_private *priv = m->private;
-
-	show_map_vma(m, vma, is_pid);
-
-	if (m->count < m->size)  /* vma is copied successfully */
-		m->version = (vma != priv->tail_vma)
-			? vma->vm_start : 0;
+	show_map_vma(m, v, is_pid);
 	return 0;
 }
 
@@ -598,7 +573,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
-	struct proc_maps_private *priv = m->private;
 	struct vm_area_struct *vma = v;
 	struct mem_size_stats mss;
 	struct mm_walk smaps_walk = {
@@ -651,10 +625,6 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 				mss.nonlinear >> 10);
 
 	show_smap_vma_flags(m, vma);
-
-	if (m->count < m->size)  /* vma is copied successfully */
-		m->version = (vma != priv->tail_vma)
-			? vma->vm_start : 0;
 	return 0;
 }
 
@@ -1485,9 +1455,6 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 			seq_printf(m, " N%d=%lu", nid, md->node[nid]);
 out:
 	seq_putc(m, '\n');
-
-	if (m->count < m->size)
-		m->version = (vma != proc_priv->tail_vma) ? vma->vm_start : 0;
 	return 0;
 }
 
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable
  2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
  2014-08-07 19:17   ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
@ 2014-08-07 19:17   ` Oleg Nesterov
  2014-08-07 19:17   ` [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper Oleg Nesterov
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Now that m->version is gone we can cleanup m_start(). In particular,

  - Remove the "unsigned long" typecast, m->index can't be negative
    or exceed ->map_count. But lets use "unsigned int pos" to make
    it clear that "pos < map_count" is safe.

  - Remove the unnecessary "vma != NULL" check in the main loop. It
    can't be NULL unless we have a vm bug.

  - This also means that "pos < map_count" case can simply return the
    valid vma and avoid "goto" and subsequent checks.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |   34 ++++++++++------------------------
 1 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index d69f31c..1349d3d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -137,12 +137,12 @@ static void vma_stop(struct proc_maps_private *priv)
 	mmput(mm);
 }
 
-static void *m_start(struct seq_file *m, loff_t *pos)
+static void *m_start(struct seq_file *m, loff_t *ppos)
 {
 	struct proc_maps_private *priv = m->private;
 	struct mm_struct *mm;
-	struct vm_area_struct *vma, *tail_vma = NULL;
-	loff_t l = *pos;
+	struct vm_area_struct *vma;
+	unsigned int pos = *ppos;
 
 	priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
 	if (!priv->task)
@@ -151,33 +151,19 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	mm = priv->mm;
 	if (!mm || !atomic_inc_not_zero(&mm->mm_users))
 		return NULL;
-	down_read(&mm->mmap_sem);
 
-	tail_vma = get_gate_vma(mm);
-	priv->tail_vma = tail_vma;
+	down_read(&mm->mmap_sem);
 	hold_task_mempolicy(priv);
+	priv->tail_vma = get_gate_vma(mm);
 
-	/*
-	 * Check the vma index is within the range and do
-	 * sequential scan until m_index.
-	 */
-	vma = NULL;
-	if ((unsigned long)l < mm->map_count) {
-		vma = mm->mmap;
-		while (l-- && vma)
+	if (pos < mm->map_count) {
+		for (vma = mm->mmap; pos; pos--)
 			vma = vma->vm_next;
-		goto out;
-	}
-
-	if (l != mm->map_count)
-		tail_vma = NULL; /* After gate vma */
-
-out:
-	if (vma)
 		return vma;
+	}
 
-	if (tail_vma)
-		return tail_vma;
+	if (pos == mm->map_count && priv->tail_vma)
+		return priv->tail_vma;
 
 	vma_stop(priv);
 	return NULL;
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper
  2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
  2014-08-07 19:17   ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
  2014-08-07 19:17   ` [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable Oleg Nesterov
@ 2014-08-07 19:17   ` Oleg Nesterov
  2014-08-07 19:17   ` [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic Oleg Nesterov
  2014-08-07 19:18   ` [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start() Oleg Nesterov
  4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Extract the tail_vma/vm_next calculation from m_next() into the new
trivial helper, m_next_vma().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1349d3d..2ba3b16 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -137,6 +137,14 @@ static void vma_stop(struct proc_maps_private *priv)
 	mmput(mm);
 }
 
+static struct vm_area_struct *
+m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
+{
+	if (vma == priv->tail_vma)
+		return NULL;
+	return vma->vm_next ?: priv->tail_vma;
+}
+
 static void *m_start(struct seq_file *m, loff_t *ppos)
 {
 	struct proc_maps_private *priv = m->private;
@@ -172,13 +180,10 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
 static void *m_next(struct seq_file *m, void *v, loff_t *pos)
 {
 	struct proc_maps_private *priv = m->private;
-	struct vm_area_struct *tail_vma = priv->tail_vma;
-	struct vm_area_struct *vma = v, *next = NULL;
+	struct vm_area_struct *next;
 
 	(*pos)++;
-	if (vma != tail_vma)
-		next = vma->vm_next ?: tail_vma;
-
+	next = m_next_vma(priv, v);
 	if (!next)
 		vma_stop(priv);
 	return next;
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic
  2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
                     ` (2 preceding siblings ...)
  2014-08-07 19:17   ` [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper Oleg Nesterov
@ 2014-08-07 19:17   ` Oleg Nesterov
  2014-08-07 19:18   ` [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start() Oleg Nesterov
  4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Add the "last_addr" optimization back. Like before, every ->show()
method checks !seq_overflow() and sets m->version = vma->vm_start.

However, it also checks that m_next_vma(vma) != NULL, otherwise it
sets m->version = -1 for the lockless "EOF" fast-path in m_start().

m_start() can simply do find_vma() + m_next_vma() if last_addr is
not zero, the code looks clear and simple and this case is clearly
separated from "scan vmas" path.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2ba3b16..1b897dc 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -145,13 +145,24 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
 	return vma->vm_next ?: priv->tail_vma;
 }
 
+static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
+{
+	if (m->count < m->size)	/* vma is copied successfully */
+		m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
+}
+
 static void *m_start(struct seq_file *m, loff_t *ppos)
 {
 	struct proc_maps_private *priv = m->private;
+	unsigned long last_addr = m->version;
 	struct mm_struct *mm;
 	struct vm_area_struct *vma;
 	unsigned int pos = *ppos;
 
+	/* See m_cache_vma(). Zero at the start or after lseek. */
+	if (last_addr == -1UL)
+		return NULL;
+
 	priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
 	if (!priv->task)
 		return ERR_PTR(-ESRCH);
@@ -164,6 +175,13 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
 	hold_task_mempolicy(priv);
 	priv->tail_vma = get_gate_vma(mm);
 
+	if (last_addr) {
+		vma = find_vma(mm, last_addr);
+		if (vma && (vma = m_next_vma(priv, vma)))
+			return vma;
+	}
+
+	m->version = 0;
 	if (pos < mm->map_count) {
 		for (vma = mm->mmap; pos; pos--)
 			vma = vma->vm_next;
@@ -339,6 +357,7 @@ done:
 static int show_map(struct seq_file *m, void *v, int is_pid)
 {
 	show_map_vma(m, v, is_pid);
+	m_cache_vma(m, v);
 	return 0;
 }
 
@@ -616,6 +635,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 				mss.nonlinear >> 10);
 
 	show_smap_vma_flags(m, vma);
+	m_cache_vma(m, vma);
 	return 0;
 }
 
@@ -1446,6 +1466,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
 			seq_printf(m, " N%d=%lu", nid, md->node[nid]);
 out:
 	seq_putc(m, '\n');
+	m_cache_vma(m, vma);
 	return 0;
 }
 
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start()
  2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
                     ` (3 preceding siblings ...)
  2014-08-07 19:17   ` [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic Oleg Nesterov
@ 2014-08-07 19:18   ` Oleg Nesterov
  4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Change the main loop in m_start() to update m->version. Mostly for
consistency, but this can help to avoid the same loop if the very
1st ->show() fails due to seq_overflow().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_mmu.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1b897dc..4ec6b72 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -183,11 +183,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
 
 	m->version = 0;
 	if (pos < mm->map_count) {
-		for (vma = mm->mmap; pos; pos--)
+		for (vma = mm->mmap; pos; pos--) {
+			m->version = vma->vm_start;
 			vma = vma->vm_next;
+		}
 		return vma;
 	}
 
+	/* we do not bother to update m->version in this case */
 	if (pos == mm->map_count && priv->tail_vma)
 		return priv->tail_vma;
 
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c
  2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
                   ` (9 preceding siblings ...)
  2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
@ 2014-08-11 17:00 ` Oleg Nesterov
  2014-08-11 17:00   ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
                     ` (3 more replies)
  10 siblings, 4 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-11 17:00 UTC (permalink / raw)
  To: Andrew Morton, Greg Ungerer
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Greg, could you review? The changes are simple, but I am not familiar
with NOMMU and I can't test this.

Depends on

	[PATCH v2 3/7] proc: introduce proc_mem_open()
	http://marc.info/?l=linux-kernel&m=140726831328943

	[PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
	http://marc.info/?l=linux-kernel&m=140726828328929
	(only because it adds proc_maps_private->mm)

I'll also send you mbox with the previous series just in case.

Oleg.

 fs/proc/task_nommu.c |   65 ++++++++++++++++++++++++++++++--------------------
 1 files changed, 39 insertions(+), 26 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private()
  2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
@ 2014-08-11 17:00   ` Oleg Nesterov
  2014-08-11 17:00   ` [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-11 17:00 UTC (permalink / raw)
  To: Andrew Morton, Greg Ungerer
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Cleanup and preparation. maps_open() can use __seq_open_private()
like proc_maps_open() does.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_nommu.c |   22 +++++++---------------
 1 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 678455d..c12eab3 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -268,21 +268,13 @@ static const struct seq_operations proc_tid_maps_ops = {
 static int maps_open(struct inode *inode, struct file *file,
 		     const struct seq_operations *ops)
 {
-	struct proc_maps_private *priv;
-	int ret = -ENOMEM;
-
-	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
-	if (priv) {
-		priv->pid = proc_pid(inode);
-		ret = seq_open(file, ops);
-		if (!ret) {
-			struct seq_file *m = file->private_data;
-			m->private = priv;
-		} else {
-			kfree(priv);
-		}
-	}
-	return ret;
+	struct proc_maps_private *priv = __seq_open_private(file, ops,
+					 sizeof(struct proc_maps_private));
+	if (!priv)
+		return -ENOMEM;
+
+	priv->pid = proc_pid(inode);
+	return 0;
 }
 
 static int pid_maps_open(struct inode *inode, struct file *file)
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
  2014-08-11 17:00   ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
@ 2014-08-11 17:00   ` Oleg Nesterov
  2014-08-11 17:00   ` [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm Oleg Nesterov
  2014-08-12 13:57   ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
  3 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-11 17:00 UTC (permalink / raw)
  To: Andrew Morton, Greg Ungerer
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Copy-and-paste the changes from "fs/proc/task_mmu.c: shift mm_access()
from m_start() to proc_maps_open()" into task_nommu.c.

Change maps_open() to initialize priv->mm using proc_mem_open(), m_start()
can rely on atomic_inc_not_zero(mm_users) like task_mmu.c does.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_nommu.c |   29 ++++++++++++++++++++++++-----
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index c12eab3..003f2be 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -216,11 +216,11 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 	if (!priv->task)
 		return ERR_PTR(-ESRCH);
 
-	mm = mm_access(priv->task, PTRACE_MODE_READ);
-	if (!mm || IS_ERR(mm)) {
+	mm = priv->mm;
+	if (!mm || !atomic_inc_not_zero(&mm->mm_users)) {
 		put_task_struct(priv->task);
 		priv->task = NULL;
-		return mm;
+		return NULL;
 	}
 	down_read(&mm->mmap_sem);
 
@@ -274,9 +274,28 @@ static int maps_open(struct inode *inode, struct file *file,
 		return -ENOMEM;
 
 	priv->pid = proc_pid(inode);
+	priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	if (IS_ERR(priv->mm)) {
+		int err = PTR_ERR(priv->mm);
+		seq_release_private(inode, file);
+		return err;
+	}
+
 	return 0;
 }
 
+
+static int map_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq = file->private_data;
+	struct proc_maps_private *priv = seq->private;
+
+	if (priv->mm)
+		mmdrop(priv->mm);
+
+	return seq_release_private(inode, file);
+}
+
 static int pid_maps_open(struct inode *inode, struct file *file)
 {
 	return maps_open(inode, file, &proc_pid_maps_ops);
@@ -291,13 +310,13 @@ const struct file_operations proc_pid_maps_operations = {
 	.open		= pid_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= map_release,
 };
 
 const struct file_operations proc_tid_maps_operations = {
 	.open		= tid_maps_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release_private,
+	.release	= map_release,
 };
 
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm
  2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
  2014-08-11 17:00   ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
  2014-08-11 17:00   ` [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
@ 2014-08-11 17:00   ` Oleg Nesterov
  2014-08-12 13:57   ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
  3 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-11 17:00 UTC (permalink / raw)
  To: Andrew Morton, Greg Ungerer
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

I do not know if CONFIG_PREEMPT/SMP is possible without CONFIG_MMU
but the usage of task->mm in m_stop(). The task can exit/exec before
we take mmap_sem, in this case m_stop() can hit NULL or unlock the
wrong rw_semaphore.

Also, this code uses priv->task != NULL to decide whether we need
up_read/mmput. This is correct, but we will probably kill priv->task.
Change m_start/m_stop to rely on IS_ERR_OR_NULL() like task_mmu.c does.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/task_nommu.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 003f2be..e0237c1 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -217,17 +217,17 @@ static void *m_start(struct seq_file *m, loff_t *pos)
 		return ERR_PTR(-ESRCH);
 
 	mm = priv->mm;
-	if (!mm || !atomic_inc_not_zero(&mm->mm_users)) {
-		put_task_struct(priv->task);
-		priv->task = NULL;
+	if (!mm || !atomic_inc_not_zero(&mm->mm_users))
 		return NULL;
-	}
-	down_read(&mm->mmap_sem);
 
+	down_read(&mm->mmap_sem);
 	/* start from the Nth VMA */
 	for (p = rb_first(&mm->mm_rb); p; p = rb_next(p))
 		if (n-- == 0)
 			return p;
+
+	up_read(&mm->mmap_sem);
+	mmput(mm);
 	return NULL;
 }
 
@@ -235,11 +235,13 @@ static void m_stop(struct seq_file *m, void *_vml)
 {
 	struct proc_maps_private *priv = m->private;
 
+	if (!IS_ERR_OR_NULL(_vml)) {
+		up_read(&priv->mm->mmap_sem);
+		mmput(priv->mm);
+	}
 	if (priv->task) {
-		struct mm_struct *mm = priv->task->mm;
-		up_read(&mm->mmap_sem);
-		mmput(mm);
 		put_task_struct(priv->task);
+		priv->task = NULL;
 	}
 }
 
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c
  2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
                     ` (2 preceding siblings ...)
  2014-08-11 17:00   ` [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm Oleg Nesterov
@ 2014-08-12 13:57   ` Greg Ungerer
  2014-08-12 17:35     ` Oleg Nesterov
  3 siblings, 1 reply; 35+ messages in thread
From: Greg Ungerer @ 2014-08-12 13:57 UTC (permalink / raw)
  To: Oleg Nesterov, Andrew Morton
  Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
	Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
	Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel

Hi Oleg,

On 12/08/14 03:00, Oleg Nesterov wrote:
> Greg, could you review? The changes are simple, but I am not familiar
> with NOMMU and I can't test this.
>
> Depends on
>
> 	[PATCH v2 3/7] proc: introduce proc_mem_open()
> 	http://marc.info/?l=linux-kernel&m=140726831328943
>
> 	[PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
> 	http://marc.info/?l=linux-kernel&m=140726828328929
> 	(only because it adds proc_maps_private->mm)
>
> I'll also send you mbox with the previous series just in case.
>
> Oleg.
>
>   fs/proc/task_nommu.c |   65 ++++++++++++++++++++++++++++++--------------------
>   1 files changed, 39 insertions(+), 26 deletions(-)

I don't see any problems. Applied cleanly for me (to a 3.16-rc7 tree), 
compiled
cleanly for a non-mmu m68k target, and ran with no problems I could see.
At least I checked /proc/1/maps and that still came out fine. Is there 
anything
else I should check?

I am happy to ack it:

Acked-by: Greg Ungerer <gerg@uclinux.org>

Regards
Greg




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c
  2014-08-12 13:57   ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
@ 2014-08-12 17:35     ` Oleg Nesterov
  0 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-12 17:35 UTC (permalink / raw)
  To: Greg Ungerer
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
	Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
	linux-kernel

On 08/12, Greg Ungerer wrote:
>
> Hi Oleg,
>
> On 12/08/14 03:00, Oleg Nesterov wrote:
>> Greg, could you review? The changes are simple, but I am not familiar
>> with NOMMU and I can't test this.
>>
>> Depends on
>>
>> 	[PATCH v2 3/7] proc: introduce proc_mem_open()
>> 	http://marc.info/?l=linux-kernel&m=140726831328943
>>
>> 	[PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
>> 	http://marc.info/?l=linux-kernel&m=140726828328929
>> 	(only because it adds proc_maps_private->mm)
>>
>> I'll also send you mbox with the previous series just in case.
>>
>> Oleg.
>>
>>   fs/proc/task_nommu.c |   65 ++++++++++++++++++++++++++++++--------------------
>>   1 files changed, 39 insertions(+), 26 deletions(-)
>
> I don't see any problems. Applied cleanly for me (to a 3.16-rc7 tree),
> compiled
> cleanly for a non-mmu m68k target, and ran with no problems I could see.
> At least I checked /proc/1/maps and that still came out fine. Is there
> anything
> else I should check?

Great, hopefully nothing else.

> I am happy to ack it:
>
> Acked-by: Greg Ungerer <gerg@uclinux.org>

Thanks a lot!

I'll send more cleanups once/if Andrew takes the pending patches.

Oleg.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
  2014-08-06 10:05   ` Kirill A. Shutemov
@ 2014-12-03 14:14   ` Kirill A. Shutemov
  2014-12-03 16:59     ` Eric W. Biederman
  2014-12-03 17:34     ` Oleg Nesterov
  1 sibling, 2 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-12-03 14:14 UTC (permalink / raw)
  To: Oleg Nesterov, David S. Miller, Linus Torvalds
  Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
	Eric W. Biederman, Kirill A. Shutemov, Peter Zijlstra,
	Sasha Levin, linux-fsdevel, linux-kernel, Alexey Dobriyan, netdev

On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> A simple test-case from Kirill Shutemov
> 
> 	cat /proc/self/maps >/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet
> 
> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> the opposite order.

Oleg, I see it again with almost the same test-case:

	cat /proc/self/stack >/dev/null
	chmod +x /proc/self/net/packet
	exec /proc/self/net/packet

Looks like bunch of proc files were converted to use seq_file by Alexey
Dobriyan around the same time you've fixed the issue for /proc/pid/maps.

More generic test-case:

	find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
	chmod +x /proc/self/net/packet
	exec /proc/self/net/packet

David, any justification for allowing chmod +x for files under
/proc/pid/net?

[    2.042212] ======================================================
[    2.042930] [ INFO: possible circular locking dependency detected ]
[    2.043648] 3.18.0-rc7-00003-g3a18ca061311-dirty #237 Not tainted
[    2.044350] -------------------------------------------------------
[    2.045054] sh/94 is trying to acquire lock:
[    2.045546]  (&p->lock){+.+.+.}, at: [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[    2.045781] 
[    2.045781] but task is already holding lock:
[    2.045781]  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
[    2.045781] 
[    2.045781] which lock already depends on the new lock.
[    2.045781] 
[    2.045781] 
[    2.045781] the existing dependency chain (in reverse order) is:
[    2.045781] 
-> #1 (&sig->cred_guard_mutex){+.+.+.}:
[    2.045781]        [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[    2.045781]        [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[    2.045781]        [<ffffffff81849da6>] mutex_lock_killable_nested+0x66/0x460
[    2.045781]        [<ffffffff81229de4>] lock_trace+0x24/0x70
[    2.045781]        [<ffffffff81229e8f>] proc_pid_stack+0x5f/0xe0
[    2.045781]        [<ffffffff81227244>] proc_single_show+0x54/0xa0
[    2.045781]        [<ffffffff811e13a0>] seq_read+0xe0/0x3e0
[    2.045781]        [<ffffffff811b9377>] vfs_read+0x97/0x180
[    2.045781]        [<ffffffff811b9f5d>] SyS_read+0x4d/0xc0
[    2.045781]        [<ffffffff8184e492>] system_call_fastpath+0x12/0x17
[    2.045781] 
-> #0 (&p->lock){+.+.+.}:
[    2.045781]        [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
[    2.045781]        [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[    2.045781]        [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[    2.045781]        [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
[    2.045781]        [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[    2.045781]        [<ffffffff81226428>] proc_reg_read+0x48/0x70
[    2.045781]        [<ffffffff811b9377>] vfs_read+0x97/0x180
[    2.045781]        [<ffffffff811bf1a8>] kernel_read+0x48/0x60
[    2.045781]        [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
[    2.045781]        [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
[    2.045781]        [<ffffffff811c1818>] do_execve+0x18/0x20
[    2.045781]        [<ffffffff811c1b05>] SyS_execve+0x25/0x30
[    2.045781]        [<ffffffff8184ea49>] stub_execve+0x69/0xa0
[    2.045781] 
[    2.045781] other info that might help us debug this:
[    2.045781] 
[    2.045781]  Possible unsafe locking scenario:
[    2.045781] 
[    2.045781]        CPU0                    CPU1
[    2.045781]        ----                    ----
[    2.045781]   lock(&sig->cred_guard_mutex);
[    2.045781]                                lock(&p->lock);
[    2.045781]                                lock(&sig->cred_guard_mutex);
[    2.045781]   lock(&p->lock);
[    2.045781] 
[    2.045781]  *** DEADLOCK ***
[    2.045781] 
[    2.045781] 1 lock held by sh/94:
[    2.045781]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
[    2.045781] 
[    2.045781] stack backtrace:
[    2.045781] CPU: 0 PID: 94 Comm: sh Not tainted 3.18.0-rc7-00003-g3a18ca061311-dirty #237
[    2.045781] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.045781]  ffffffff82a48d50 ffff88085427bad8 ffffffff81844a85 0000000000000cac
[    2.045781]  ffffffff82a654a0 ffff88085427bb28 ffffffff810a1b03 0000000000000000
[    2.045781]  ffff88085427bb68 ffff88085427bb28 ffff8808547f1500 ffff8808547f1c40
[    2.045781] Call Trace:
[    2.045781]  [<ffffffff81844a85>] dump_stack+0x4e/0x68
[    2.045781]  [<ffffffff810a1b03>] print_circular_bug+0x203/0x310
[    2.045781]  [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
[    2.045781]  [<ffffffff8108fa76>] ? local_clock+0x16/0x30
[    2.045781]  [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[    2.045781]  [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[    2.045781]  [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
[    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[    2.045781]  [<ffffffff8108f9f8>] ? sched_clock_cpu+0x98/0xc0
[    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[    2.045781]  [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
[    2.045781]  [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[    2.045781]  [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
[    2.045781]  [<ffffffff81226428>] proc_reg_read+0x48/0x70
[    2.045781]  [<ffffffff811b9377>] vfs_read+0x97/0x180
[    2.045781]  [<ffffffff811bf1a8>] kernel_read+0x48/0x60
[    2.045781]  [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
[    2.045781]  [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
[    2.092142] tsc: Refined TSC clocksource calibration: 2693.484 MHz
[    2.045781]  [<ffffffff811c0fd3>] ? do_execve_common.isra.29+0x133/0x960
[    2.045781]  [<ffffffff8184f04d>] ? retint_swapgs+0xe/0x13
[    2.045781]  [<ffffffff811c1818>] do_execve+0x18/0x20
[    2.045781]  [<ffffffff811c1b05>] SyS_execve+0x25/0x30
[    2.045781]  [<ffffffff8184ea49>] stub_execve+0x69/0xa0
-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-12-03 14:14   ` Kirill A. Shutemov
@ 2014-12-03 16:59     ` Eric W. Biederman
  2014-12-04 16:17       ` Kirill A. Shutemov
  2014-12-03 17:34     ` Oleg Nesterov
  1 sibling, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2014-12-03 16:59 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Oleg Nesterov, David S. Miller, Linus Torvalds, Andrew Morton,
	Alexander Viro, Cyrill Gorcunov, David Howells,
	Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
	linux-kernel, Alexey Dobriyan, netdev

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
>> A simple test-case from Kirill Shutemov
>> 
>> 	cat /proc/self/maps >/dev/null
>> 	chmod +x /proc/self/net/packet
>> 	exec /proc/self/net/packet
>> 
>> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
>> the opposite order.
>
> Oleg, I see it again with almost the same test-case:
>
> 	cat /proc/self/stack >/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet
>
> Looks like bunch of proc files were converted to use seq_file by Alexey
> Dobriyan around the same time you've fixed the issue for /proc/pid/maps.
>
> More generic test-case:
>
> 	find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet
>
> David, any justification for allowing chmod +x for files under
> /proc/pid/net?

I don't think there are any good reasons for allowing chmod +x for the
proc generic files.   Certainly executing any of them is nonsense.

I do recall some weird conner cases existing.  I think they resulted
in a need to preserve chmod if not chmod +x.  This is just me saying
tread carefully before you change anything.

It really should be safe to tweak proc_notify_change to not allow
messing with the executable bits of proc files.

> [    2.042212] ======================================================
> [    2.042930] [ INFO: possible circular locking dependency detected ]
> [    2.043648] 3.18.0-rc7-00003-g3a18ca061311-dirty #237 Not tainted
> [    2.044350] -------------------------------------------------------
> [    2.045054] sh/94 is trying to acquire lock:
> [    2.045546]  (&p->lock){+.+.+.}, at: [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
> [    2.045781] 
> [    2.045781] but task is already holding lock:
> [    2.045781]  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
> [    2.045781] 
> [    2.045781] which lock already depends on the new lock.
> [    2.045781] 
> [    2.045781] 
> [    2.045781] the existing dependency chain (in reverse order) is:
> [    2.045781] 
> -> #1 (&sig->cred_guard_mutex){+.+.+.}:
> [    2.045781]        [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
> [    2.045781]        [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
> [    2.045781]        [<ffffffff81849da6>] mutex_lock_killable_nested+0x66/0x460
> [    2.045781]        [<ffffffff81229de4>] lock_trace+0x24/0x70
> [    2.045781]        [<ffffffff81229e8f>] proc_pid_stack+0x5f/0xe0
> [    2.045781]        [<ffffffff81227244>] proc_single_show+0x54/0xa0
> [    2.045781]        [<ffffffff811e13a0>] seq_read+0xe0/0x3e0
> [    2.045781]        [<ffffffff811b9377>] vfs_read+0x97/0x180
> [    2.045781]        [<ffffffff811b9f5d>] SyS_read+0x4d/0xc0
> [    2.045781]        [<ffffffff8184e492>] system_call_fastpath+0x12/0x17
> [    2.045781] 
> -> #0 (&p->lock){+.+.+.}:
> [    2.045781]        [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
> [    2.045781]        [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
> [    2.045781]        [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
> [    2.045781]        [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
> [    2.045781]        [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
> [    2.045781]        [<ffffffff81226428>] proc_reg_read+0x48/0x70
> [    2.045781]        [<ffffffff811b9377>] vfs_read+0x97/0x180
> [    2.045781]        [<ffffffff811bf1a8>] kernel_read+0x48/0x60
> [    2.045781]        [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
> [    2.045781]        [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
> [    2.045781]        [<ffffffff811c1818>] do_execve+0x18/0x20
> [    2.045781]        [<ffffffff811c1b05>] SyS_execve+0x25/0x30
> [    2.045781]        [<ffffffff8184ea49>] stub_execve+0x69/0xa0
> [    2.045781] 
> [    2.045781] other info that might help us debug this:
> [    2.045781] 
> [    2.045781]  Possible unsafe locking scenario:
> [    2.045781] 
> [    2.045781]        CPU0                    CPU1
> [    2.045781]        ----                    ----
> [    2.045781]   lock(&sig->cred_guard_mutex);
> [    2.045781]                                lock(&p->lock);
> [    2.045781]                                lock(&sig->cred_guard_mutex);
> [    2.045781]   lock(&p->lock);
> [    2.045781] 
> [    2.045781]  *** DEADLOCK ***
> [    2.045781] 
> [    2.045781] 1 lock held by sh/94:
> [    2.045781]  #0:  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
> [    2.045781] 
> [    2.045781] stack backtrace:
> [    2.045781] CPU: 0 PID: 94 Comm: sh Not tainted 3.18.0-rc7-00003-g3a18ca061311-dirty #237
> [    2.045781] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [    2.045781]  ffffffff82a48d50 ffff88085427bad8 ffffffff81844a85 0000000000000cac
> [    2.045781]  ffffffff82a654a0 ffff88085427bb28 ffffffff810a1b03 0000000000000000
> [    2.045781]  ffff88085427bb68 ffff88085427bb28 ffff8808547f1500 ffff8808547f1c40
> [    2.045781] Call Trace:
> [    2.045781]  [<ffffffff81844a85>] dump_stack+0x4e/0x68
> [    2.045781]  [<ffffffff810a1b03>] print_circular_bug+0x203/0x310
> [    2.045781]  [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
> [    2.045781]  [<ffffffff8108fa76>] ? local_clock+0x16/0x30
> [    2.045781]  [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
> [    2.045781]  [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
> [    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
> [    2.045781]  [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
> [    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
> [    2.045781]  [<ffffffff8108f9f8>] ? sched_clock_cpu+0x98/0xc0
> [    2.045781]  [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
> [    2.045781]  [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
> [    2.045781]  [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
> [    2.045781]  [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
> [    2.045781]  [<ffffffff81226428>] proc_reg_read+0x48/0x70
> [    2.045781]  [<ffffffff811b9377>] vfs_read+0x97/0x180
> [    2.045781]  [<ffffffff811bf1a8>] kernel_read+0x48/0x60
> [    2.045781]  [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
> [    2.045781]  [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
> [    2.092142] tsc: Refined TSC clocksource calibration: 2693.484 MHz
> [    2.045781]  [<ffffffff811c0fd3>] ? do_execve_common.isra.29+0x133/0x960
> [    2.045781]  [<ffffffff8184f04d>] ? retint_swapgs+0xe/0x13
> [    2.045781]  [<ffffffff811c1818>] do_execve+0x18/0x20
> [    2.045781]  [<ffffffff811c1b05>] SyS_execve+0x25/0x30
> [    2.045781]  [<ffffffff8184ea49>] stub_execve+0x69/0xa0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-12-03 14:14   ` Kirill A. Shutemov
  2014-12-03 16:59     ` Eric W. Biederman
@ 2014-12-03 17:34     ` Oleg Nesterov
  1 sibling, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-12-03 17:34 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: David S. Miller, Linus Torvalds, Andrew Morton, Alexander Viro,
	Cyrill Gorcunov, David Howells, Eric W. Biederman,
	Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
	linux-kernel, Alexey Dobriyan, netdev

On 12/03, Kirill A. Shutemov wrote:
>
> On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> > A simple test-case from Kirill Shutemov
> >
> > 	cat /proc/self/maps >/dev/null
> > 	chmod +x /proc/self/net/packet
> > 	exec /proc/self/net/packet
> >
> > makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> > the opposite order.
>
> Oleg, I see it again with almost the same test-case:
>
> 	cat /proc/self/stack >/dev/null
> 	chmod +x /proc/self/net/packet
> 	exec /proc/self/net/packet

Yes, there are more lock_trace/mm_access (ab)users. Fortunately, they
are much simpler than proc/pid/maps (which also asked for other cleanups
and fixes).

I'll try to take a look, thanks for reminding.

And I agree with Eric, chmod+x probably makes no sense. Still I think
this code deserves some cleanups regardless. To the point I think that
lock_trace() should probably die.

Thanks!

Oleg.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
  2014-12-03 16:59     ` Eric W. Biederman
@ 2014-12-04 16:17       ` Kirill A. Shutemov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-12-04 16:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Oleg Nesterov, David S. Miller, Linus Torvalds, Andrew Morton,
	Alexander Viro, Cyrill Gorcunov, David Howells,
	Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
	linux-kernel, Alexey Dobriyan, netdev

On Wed, Dec 03, 2014 at 10:59:57AM -0600, Eric W. Biederman wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
> 
> > On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> >> A simple test-case from Kirill Shutemov
> >> 
> >> 	cat /proc/self/maps >/dev/null
> >> 	chmod +x /proc/self/net/packet
> >> 	exec /proc/self/net/packet
> >> 
> >> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> >> the opposite order.
> >
> > Oleg, I see it again with almost the same test-case:
> >
> > 	cat /proc/self/stack >/dev/null
> > 	chmod +x /proc/self/net/packet
> > 	exec /proc/self/net/packet
> >
> > Looks like bunch of proc files were converted to use seq_file by Alexey
> > Dobriyan around the same time you've fixed the issue for /proc/pid/maps.
> >
> > More generic test-case:
> >
> > 	find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
> > 	chmod +x /proc/self/net/packet
> > 	exec /proc/self/net/packet
> >
> > David, any justification for allowing chmod +x for files under
> > /proc/pid/net?
> 
> I don't think there are any good reasons for allowing chmod +x for the
> proc generic files.   Certainly executing any of them is nonsense.
> 
> I do recall some weird conner cases existing.  I think they resulted
> in a need to preserve chmod if not chmod +x.  This is just me saying
> tread carefully before you change anything.
> 
> It really should be safe to tweak proc_notify_change to not allow
> messing with the executable bits of proc files.

BTW, we have MS_NOSUID and MS_NOEXEC set in ->s_flags for procfs since
2006 -- see 92d032855e64.

But there's no code which would translate them into vfsmount->mnt_flags |=
MNT_NOSUID/MNT_NOEXEC and we bypast nosuid/noexec checks on exec path.

Hm?..

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-12-04 16:17 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
2014-08-06  9:52   ` Kirill A. Shutemov
2014-08-06 14:10     ` Oleg Nesterov
2014-08-06 19:41       ` Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
2014-08-06  9:55   ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
2014-08-06  9:57   ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
2014-08-06 10:05   ` Kirill A. Shutemov
2014-12-03 14:14   ` Kirill A. Shutemov
2014-12-03 16:59     ` Eric W. Biederman
2014-12-04 16:17       ` Kirill A. Shutemov
2014-12-03 17:34     ` Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
2014-08-06 10:12   ` Kirill A. Shutemov
2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
2014-08-06 10:17   ` Kirill A. Shutemov
2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
2014-08-06 10:18   ` Kirill A. Shutemov
2014-08-06 10:19 ` [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Kirill A. Shutemov
2014-08-06 18:48 ` Cyrill Gorcunov
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
2014-08-07 19:17   ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
2014-08-07 19:17   ` [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable Oleg Nesterov
2014-08-07 19:17   ` [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper Oleg Nesterov
2014-08-07 19:17   ` [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic Oleg Nesterov
2014-08-07 19:18   ` [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start() Oleg Nesterov
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
2014-08-11 17:00   ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
2014-08-11 17:00   ` [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
2014-08-11 17:00   ` [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm Oleg Nesterov
2014-08-12 13:57   ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
2014-08-12 17:35     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).