* [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups
@ 2014-08-05 19:46 Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
` (10 more replies)
0 siblings, 11 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Changes:
1-4: Update the changelogs, join "introduce the stable
proc_maps_private->mm" and "change m_start() to rely on
priv->mm" into a single "shift "priv->task = NULL" from
m_start() to m_stop()".
Resulting code is the same. Kirill and Cyrill, you seem
to agree with these changes, I'll appreciate your acks.
5-7: New. Seems to work, please review.
Todo:
- Cleanup the tail_vma horror in m_start
- Update task_nommu.c in the same way
- Fix lock_trace() users
Oleg.
fs/proc/base.c | 36 ++++++++------
fs/proc/internal.h | 3 +
fs/proc/task_mmu.c | 136 ++++++++++++++++++++++++++-------------------------
3 files changed, 93 insertions(+), 82 deletions(-)
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
2014-08-06 9:52 ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
` (9 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
get_gate_vma(priv->task->mm) looks ugly and wrong, task->mm can be
NULL or it can changed by exec right after mm_access().
And in theory this race is not harmless, the task can exec and then
later exit and free the new mm_struct. In this case get_task_mm(oldmm)
can't help, get_gate_vma(task->mm) can read the freed/unmapped memory.
I think that priv->task should simply die and hold_task_mempolicy()
logic can be simplified. tail_vma logic asks for cleanups too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee..8a5271e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -170,7 +170,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
return mm;
down_read(&mm->mmap_sem);
- tail_vma = get_gate_vma(priv->task->mm);
+ tail_vma = get_gate_vma(mm);
priv->tail_vma = tail_vma;
hold_task_mempolicy(priv);
/* Start with last addr hint */
@@ -351,12 +351,11 @@ static int show_map(struct seq_file *m, void *v, int is_pid)
{
struct vm_area_struct *vma = v;
struct proc_maps_private *priv = m->private;
- struct task_struct *task = priv->task;
show_map_vma(m, vma, is_pid);
if (m->count < m->size) /* vma is copied successfully */
- m->version = (vma != get_gate_vma(task->mm))
+ m->version = (vma != priv->tail_vma)
? vma->vm_start : 0;
return 0;
}
@@ -584,7 +583,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
static int show_smap(struct seq_file *m, void *v, int is_pid)
{
struct proc_maps_private *priv = m->private;
- struct task_struct *task = priv->task;
struct vm_area_struct *vma = v;
struct mem_size_stats mss;
struct mm_walk smaps_walk = {
@@ -639,7 +637,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
show_smap_vma_flags(m, vma);
if (m->count < m->size) /* vma is copied successfully */
- m->version = (vma != get_gate_vma(task->mm))
+ m->version = (vma != priv->tail_vma)
? vma->vm_start : 0;
return 0;
}
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open()
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
2014-08-06 9:55 ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
` (8 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
do_maps_open() and numa_maps_open() are overcomplicated, they could
use __seq_open_private(). Plus they do the same, just sizeof(*priv)
differs.
Change them to use a new simple helper, proc_maps_open(ops, psize).
This simplifies the code and allows us to do the next changes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 44 ++++++++++++++++----------------------------
1 files changed, 16 insertions(+), 28 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8a5271e..794aeb6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -231,23 +231,23 @@ static void m_stop(struct seq_file *m, void *v)
put_task_struct(priv->task);
}
+static int proc_maps_open(struct inode *inode, struct file *file,
+ const struct seq_operations *ops, int psize)
+{
+ struct proc_maps_private *priv = __seq_open_private(file, ops, psize);
+
+ if (!priv)
+ return -ENOMEM;
+
+ priv->pid = proc_pid(inode);
+ return 0;
+}
+
static int do_maps_open(struct inode *inode, struct file *file,
const struct seq_operations *ops)
{
- struct proc_maps_private *priv;
- int ret = -ENOMEM;
- priv = kzalloc(sizeof(*priv), GFP_KERNEL);
- if (priv) {
- priv->pid = proc_pid(inode);
- ret = seq_open(file, ops);
- if (!ret) {
- struct seq_file *m = file->private_data;
- m->private = priv;
- } else {
- kfree(priv);
- }
- }
- return ret;
+ return proc_maps_open(inode, file, ops,
+ sizeof(struct proc_maps_private));
}
static void
@@ -1502,20 +1502,8 @@ static const struct seq_operations proc_tid_numa_maps_op = {
static int numa_maps_open(struct inode *inode, struct file *file,
const struct seq_operations *ops)
{
- struct numa_maps_private *priv;
- int ret = -ENOMEM;
- priv = kzalloc(sizeof(*priv), GFP_KERNEL);
- if (priv) {
- priv->proc_maps.pid = proc_pid(inode);
- ret = seq_open(file, ops);
- if (!ret) {
- struct seq_file *m = file->private_data;
- m->private = priv;
- } else {
- kfree(priv);
- }
- }
- return ret;
+ return proc_maps_open(inode, file, ops,
+ sizeof(struct numa_maps_private));
}
static int pid_numa_maps_open(struct inode *inode, struct file *file)
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 3/7] proc: introduce proc_mem_open()
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
2014-08-06 9:57 ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
` (7 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Extract the mm_access() code from __mem_open() into the new helper,
proc_mem_open(), the next patch will add another caller.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/base.c | 36 +++++++++++++++++++++---------------
fs/proc/internal.h | 2 ++
2 files changed, 23 insertions(+), 15 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2d696b0..9aef9ac 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -667,29 +667,35 @@ static const struct file_operations proc_single_file_operations = {
.release = single_release,
};
-static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
+
+struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
{
- struct task_struct *task = get_proc_task(file_inode(file));
- struct mm_struct *mm;
+ struct task_struct *task = get_proc_task(inode);
+ struct mm_struct *mm = ERR_PTR(-ESRCH);
- if (!task)
- return -ESRCH;
+ if (task) {
+ mm = mm_access(task, mode);
+ put_task_struct(task);
- mm = mm_access(task, mode);
- put_task_struct(task);
+ if (!IS_ERR_OR_NULL(mm)) {
+ /* ensure this mm_struct can't be freed */
+ atomic_inc(&mm->mm_count);
+ /* but do not pin its memory */
+ mmput(mm);
+ }
+ }
+
+ return mm;
+}
+
+static int __mem_open(struct inode *inode, struct file *file, unsigned int mode)
+{
+ struct mm_struct *mm = proc_mem_open(inode, mode);
if (IS_ERR(mm))
return PTR_ERR(mm);
- if (mm) {
- /* ensure this mm_struct can't be freed */
- atomic_inc(&mm->mm_count);
- /* but do not pin its memory */
- mmput(mm);
- }
-
file->private_data = mm;
-
return 0;
}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3ab6d14..ba0c1c1 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -275,6 +275,8 @@ struct proc_maps_private {
#endif
};
+struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode);
+
extern const struct file_operations proc_pid_maps_operations;
extern const struct file_operations proc_tid_maps_operations;
extern const struct file_operations proc_pid_numa_maps_operations;
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
` (2 preceding siblings ...)
2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
2014-08-06 10:05 ` Kirill A. Shutemov
2014-12-03 14:14 ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
` (6 subsequent siblings)
10 siblings, 2 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
A simple test-case from Kirill Shutemov
cat /proc/self/maps >/dev/null
chmod +x /proc/self/net/packet
exec /proc/self/net/packet
makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
the opposite order.
It's a false positive and probably we should not allow "chmod +x" on proc
files. Still I think that we should avoid mm_access() and cred_guard_mutex
in sys_read() paths, security checking should happen at open time. Besides,
this doesn't even look right if the task changes its ->mm between m_stop()
and m_start().
Add the new "mm_struct *mm" member into struct proc_maps_private and change
proc_maps_open() to initialize it using proc_mem_open(). Change m_start() to
use priv->mm if atomic_inc_not_zero(mm_users) succeeds or return NULL (eof)
otherwise.
The only complication is that proc_maps_open() users should additionally do
mmdrop() in fop->release(), add the new proc_map_release() helper for that.
Note: this is the user-visible change, if the task execs after open("maps")
the new ->mm won't be visible via this file. I hope this is fine, and this
matches /proc/pid/mem bahaviour.
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/internal.h | 1 +
fs/proc/task_mmu.c | 37 ++++++++++++++++++++++++++++---------
2 files changed, 29 insertions(+), 9 deletions(-)
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index ba0c1c1..78784cd 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -267,6 +267,7 @@ extern int proc_remount(struct super_block *, int *, char *);
struct proc_maps_private {
struct pid *pid;
struct task_struct *task;
+ struct mm_struct *mm;
#ifdef CONFIG_MMU
struct vm_area_struct *tail_vma;
#endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 794aeb6..7ec8eb5 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -165,9 +165,9 @@ static void *m_start(struct seq_file *m, loff_t *pos)
if (!priv->task)
return ERR_PTR(-ESRCH);
- mm = mm_access(priv->task, PTRACE_MODE_READ);
- if (!mm || IS_ERR(mm))
- return mm;
+ mm = priv->mm;
+ if (!mm || !atomic_inc_not_zero(&mm->mm_users))
+ return NULL;
down_read(&mm->mmap_sem);
tail_vma = get_gate_vma(mm);
@@ -240,9 +240,28 @@ static int proc_maps_open(struct inode *inode, struct file *file,
return -ENOMEM;
priv->pid = proc_pid(inode);
+ priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+ if (IS_ERR(priv->mm)) {
+ int err = PTR_ERR(priv->mm);
+ seq_release_private(inode, file);
+ return err;
+ }
+
return 0;
}
+static int proc_map_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *seq = file->private_data;
+ struct proc_maps_private *priv = seq->private;
+
+ if (priv->mm)
+ mmdrop(priv->mm);
+
+ return seq_release_private(inode, file);
+}
+
+
static int do_maps_open(struct inode *inode, struct file *file,
const struct seq_operations *ops)
{
@@ -398,14 +417,14 @@ const struct file_operations proc_pid_maps_operations = {
.open = pid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = proc_map_release,
};
const struct file_operations proc_tid_maps_operations = {
.open = tid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = proc_map_release,
};
/*
@@ -680,14 +699,14 @@ const struct file_operations proc_pid_smaps_operations = {
.open = pid_smaps_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = proc_map_release,
};
const struct file_operations proc_tid_smaps_operations = {
.open = tid_smaps_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = proc_map_release,
};
/*
@@ -1520,13 +1539,13 @@ const struct file_operations proc_pid_numa_maps_operations = {
.open = pid_numa_maps_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = proc_map_release,
};
const struct file_operations proc_tid_numa_maps_operations = {
.open = tid_numa_maps_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = proc_map_release,
};
#endif /* CONFIG_NUMA */
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
` (3 preceding siblings ...)
2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
@ 2014-08-05 19:46 ` Oleg Nesterov
2014-08-06 10:12 ` Kirill A. Shutemov
2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
` (5 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:46 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
m_start() drops ->mmap_sem and does mmput() if it retuns vsyscall
vma. This is because in this case m_stop()->vma_stop() obviously
can't use gate_vma->vm_mm.
Now that we have proc_maps_private->mm we can simplify this logic:
- Change m_start() to return with ->mmap_sem held unless it returns
IS_ERR_OR_NULL().
- Change vma_stop() to use priv->mm and avoid the ugly vma checks,
this makes "vm_area_struct *vma" unnecessary.
- This also allows m_start() to use vm_stop().
- Cleanup m_next() to follow the new locking rule.
Note: m_stop() looks very ugly, and this temporary uglifies it
even more. Fixed by the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 34 ++++++++++++++++++----------------
1 files changed, 18 insertions(+), 16 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 7ec8eb5..f1254b4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -129,14 +129,12 @@ static void release_task_mempolicy(struct proc_maps_private *priv)
}
#endif
-static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
+static void vma_stop(struct proc_maps_private *priv)
{
- if (vma && vma != priv->tail_vma) {
- struct mm_struct *mm = vma->vm_mm;
- release_task_mempolicy(priv);
- up_read(&mm->mmap_sem);
- mmput(mm);
- }
+ struct mm_struct *mm = priv->mm;
+ release_task_mempolicy(priv);
+ up_read(&mm->mmap_sem);
+ mmput(mm);
}
static void *m_start(struct seq_file *m, loff_t *pos)
@@ -199,12 +197,13 @@ out:
if (vma)
return vma;
- release_task_mempolicy(priv);
/* End of vmas has been reached */
m->version = (tail_vma != NULL)? 0: -1UL;
- up_read(&mm->mmap_sem);
- mmput(mm);
- return tail_vma;
+ if (tail_vma)
+ return tail_vma;
+
+ vma_stop(priv);
+ return NULL;
}
static void *m_next(struct seq_file *m, void *v, loff_t *pos)
@@ -212,21 +211,24 @@ static void *m_next(struct seq_file *m, void *v, loff_t *pos)
struct proc_maps_private *priv = m->private;
struct vm_area_struct *vma = v;
struct vm_area_struct *tail_vma = priv->tail_vma;
+ struct vm_area_struct *next;
(*pos)++;
if (vma && (vma != tail_vma) && vma->vm_next)
return vma->vm_next;
- vma_stop(priv, vma);
- return (vma != tail_vma)? tail_vma: NULL;
+
+ next = (vma != tail_vma)? tail_vma: NULL;
+ if (!next)
+ vma_stop(priv);
+ return next;
}
static void m_stop(struct seq_file *m, void *v)
{
struct proc_maps_private *priv = m->private;
- struct vm_area_struct *vma = v;
- if (!IS_ERR(vma))
- vma_stop(priv, vma);
+ if (!IS_ERR_OR_NULL(v))
+ vma_stop(priv);
if (priv->task)
put_task_struct(priv->task);
}
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next()
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
` (4 preceding siblings ...)
2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
@ 2014-08-05 19:47 ` Oleg Nesterov
2014-08-06 10:17 ` Kirill A. Shutemov
2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
` (4 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:47 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
1. Kill the first "vma != NULL" check. Firstly this is not possible,
m_next() won't be called if ->start() or the previous ->next()
returns NULL.
And if it was possible the 2nd "vma != tail_vma" check is buggy,
we should not wrongly return ->tail_vma.
2. Make this function readable. The logic is very simple, we should
return check "vma != tail" once and return "vm_next || tail_vma".
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 8 +++-----
1 files changed, 3 insertions(+), 5 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f1254b4..c4c8825 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -209,15 +209,13 @@ out:
static void *m_next(struct seq_file *m, void *v, loff_t *pos)
{
struct proc_maps_private *priv = m->private;
- struct vm_area_struct *vma = v;
struct vm_area_struct *tail_vma = priv->tail_vma;
- struct vm_area_struct *next;
+ struct vm_area_struct *vma = v, *next = NULL;
(*pos)++;
- if (vma && (vma != tail_vma) && vma->vm_next)
- return vma->vm_next;
+ if (vma != tail_vma)
+ next = vma->vm_next ?: tail_vma;
- next = (vma != tail_vma)? tail_vma: NULL;
if (!next)
vma_stop(priv);
return next;
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop()
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
` (5 preceding siblings ...)
2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
@ 2014-08-05 19:47 ` Oleg Nesterov
2014-08-06 10:18 ` Kirill A. Shutemov
2014-08-06 10:19 ` [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Kirill A. Shutemov
` (3 subsequent siblings)
10 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-05 19:47 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
1. There is no reason to reset ->tail_vma in m_start(), if we return
IS_ERR_OR_NULL() it won't be used.
2. m_start() also clears priv->task to ensure that m_stop() won't use
the stale pointer if we fail before get_task_struct(). But this is
ugly and confusing, move this initialization in m_stop().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index c4c8825..5f7fb45 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -145,17 +145,12 @@ static void *m_start(struct seq_file *m, loff_t *pos)
struct vm_area_struct *vma, *tail_vma = NULL;
loff_t l = *pos;
- /* Clear the per syscall fields in priv */
- priv->task = NULL;
- priv->tail_vma = NULL;
-
/*
* We remember last_addr rather than next_addr to hit with
* vmacache most of the time. We have zero last_addr at
* the beginning and also after lseek. We will have -1 last_addr
* after the end of the vmas.
*/
-
if (last_addr == -1UL)
return NULL;
@@ -227,8 +222,10 @@ static void m_stop(struct seq_file *m, void *v)
if (!IS_ERR_OR_NULL(v))
vma_stop(priv);
- if (priv->task)
+ if (priv->task) {
put_task_struct(priv->task);
+ priv->task = NULL;
+ }
}
static int proc_maps_open(struct inode *inode, struct file *file,
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
@ 2014-08-06 9:52 ` Kirill A. Shutemov
2014-08-06 14:10 ` Oleg Nesterov
0 siblings, 1 reply; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 9:52 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:46:44PM +0200, Oleg Nesterov wrote:
> get_gate_vma(priv->task->mm) looks ugly and wrong, task->mm can be
> NULL or it can changed by exec right after mm_access().
>
> And in theory this race is not harmless, the task can exec and then
> later exit and free the new mm_struct. In this case get_task_mm(oldmm)
> can't help, get_gate_vma(task->mm) can read the freed/unmapped memory.
>
> I think that priv->task should simply die and hold_task_mempolicy()
> logic can be simplified. tail_vma logic asks for cleanups too.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> fs/proc/task_mmu.c | 8 +++-----
> 1 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index cfa63ee..8a5271e 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -170,7 +170,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
> return mm;
> down_read(&mm->mmap_sem);
>
> - tail_vma = get_gate_vma(priv->task->mm);
> + tail_vma = get_gate_vma(mm);
> priv->tail_vma = tail_vma;
> hold_task_mempolicy(priv);
> /* Start with last addr hint */
> @@ -351,12 +351,11 @@ static int show_map(struct seq_file *m, void *v, int is_pid)
> {
> struct vm_area_struct *vma = v;
> struct proc_maps_private *priv = m->private;
> - struct task_struct *task = priv->task;
>
> show_map_vma(m, vma, is_pid);
>
> if (m->count < m->size) /* vma is copied successfully */
> - m->version = (vma != get_gate_vma(task->mm))
> + m->version = (vma != priv->tail_vma)
> ? vma->vm_start : 0;
Drop excessive parenthesis while you're there? And in the next hunk.
Otherwise:
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> return 0;
> }
> @@ -584,7 +583,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
> static int show_smap(struct seq_file *m, void *v, int is_pid)
> {
> struct proc_maps_private *priv = m->private;
> - struct task_struct *task = priv->task;
> struct vm_area_struct *vma = v;
> struct mem_size_stats mss;
> struct mm_walk smaps_walk = {
> @@ -639,7 +637,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
> show_smap_vma_flags(m, vma);
>
> if (m->count < m->size) /* vma is copied successfully */
> - m->version = (vma != get_gate_vma(task->mm))
> + m->version = (vma != priv->tail_vma)
> ? vma->vm_start : 0;
> return 0;
> }
> --
> 1.5.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open()
2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
@ 2014-08-06 9:55 ` Kirill A. Shutemov
0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 9:55 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:46:48PM +0200, Oleg Nesterov wrote:
> do_maps_open() and numa_maps_open() are overcomplicated, they could
> use __seq_open_private(). Plus they do the same, just sizeof(*priv)
> differs.
>
> Change them to use a new simple helper, proc_maps_open(ops, psize).
> This simplifies the code and allows us to do the next changes.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 3/7] proc: introduce proc_mem_open()
2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
@ 2014-08-06 9:57 ` Kirill A. Shutemov
0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 9:57 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:46:52PM +0200, Oleg Nesterov wrote:
> Extract the mm_access() code from __mem_open() into the new helper,
> proc_mem_open(), the next patch will add another caller.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
@ 2014-08-06 10:05 ` Kirill A. Shutemov
2014-12-03 14:14 ` Kirill A. Shutemov
1 sibling, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:05 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> A simple test-case from Kirill Shutemov
>
> cat /proc/self/maps >/dev/null
> chmod +x /proc/self/net/packet
> exec /proc/self/net/packet
>
> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> the opposite order.
>
> It's a false positive and probably we should not allow "chmod +x" on proc
> files. Still I think that we should avoid mm_access() and cred_guard_mutex
> in sys_read() paths, security checking should happen at open time. Besides,
> this doesn't even look right if the task changes its ->mm between m_stop()
> and m_start().
>
> Add the new "mm_struct *mm" member into struct proc_maps_private and change
> proc_maps_open() to initialize it using proc_mem_open(). Change m_start() to
> use priv->mm if atomic_inc_not_zero(mm_users) succeeds or return NULL (eof)
> otherwise.
>
> The only complication is that proc_maps_open() users should additionally do
> mmdrop() in fop->release(), add the new proc_map_release() helper for that.
>
> Note: this is the user-visible change, if the task execs after open("maps")
> the new ->mm won't be visible via this file. I hope this is fine, and this
> matches /proc/pid/mem bahaviour.
>
> Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic
2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
@ 2014-08-06 10:12 ` Kirill A. Shutemov
0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:12 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:46:59PM +0200, Oleg Nesterov wrote:
> m_start() drops ->mmap_sem and does mmput() if it retuns vsyscall
> vma. This is because in this case m_stop()->vma_stop() obviously
> can't use gate_vma->vm_mm.
>
> Now that we have proc_maps_private->mm we can simplify this logic:
>
> - Change m_start() to return with ->mmap_sem held unless it returns
> IS_ERR_OR_NULL().
>
> - Change vma_stop() to use priv->mm and avoid the ugly vma checks,
> this makes "vm_area_struct *vma" unnecessary.
>
> - This also allows m_start() to use vm_stop().
>
> - Cleanup m_next() to follow the new locking rule.
>
> Note: m_stop() looks very ugly, and this temporary uglifies it
> even more. Fixed by the next change.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next()
2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
@ 2014-08-06 10:17 ` Kirill A. Shutemov
0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:17 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:47:03PM +0200, Oleg Nesterov wrote:
> 1. Kill the first "vma != NULL" check. Firstly this is not possible,
> m_next() won't be called if ->start() or the previous ->next()
> returns NULL.
>
> And if it was possible the 2nd "vma != tail_vma" check is buggy,
> we should not wrongly return ->tail_vma.
>
> 2. Make this function readable. The logic is very simple, we should
> return check "vma != tail" once and return "vm_next || tail_vma".
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop()
2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
@ 2014-08-06 10:18 ` Kirill A. Shutemov
0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:18 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:47:07PM +0200, Oleg Nesterov wrote:
> 1. There is no reason to reset ->tail_vma in m_start(), if we return
> IS_ERR_OR_NULL() it won't be used.
>
> 2. m_start() also clears priv->task to ensure that m_stop() won't use
> the stale pointer if we fail before get_task_struct(). But this is
> ugly and confusing, move this initialization in m_stop().
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
` (6 preceding siblings ...)
2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
@ 2014-08-06 10:19 ` Kirill A. Shutemov
2014-08-06 18:48 ` Cyrill Gorcunov
` (2 subsequent siblings)
10 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-08-06 10:19 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:46:28PM +0200, Oleg Nesterov wrote:
> Changes:
>
> 1-4: Update the changelogs, join "introduce the stable
> proc_maps_private->mm" and "change m_start() to rely on
> priv->mm" into a single "shift "priv->task = NULL" from
> m_start() to m_stop()".
>
> Resulting code is the same. Kirill and Cyrill, you seem
> to agree with these changes, I'll appreciate your acks.
>
> 5-7: New. Seems to work, please review.
Thanks! Looks great!
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
2014-08-06 9:52 ` Kirill A. Shutemov
@ 2014-08-06 14:10 ` Oleg Nesterov
2014-08-06 19:41 ` Oleg Nesterov
0 siblings, 1 reply; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-06 14:10 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On 08/06, Kirill A. Shutemov wrote:
>
> On Tue, Aug 05, 2014 at 09:46:44PM +0200, Oleg Nesterov wrote:
> >
> > if (m->count < m->size) /* vma is copied successfully */
> > - m->version = (vma != get_gate_vma(task->mm))
> > + m->version = (vma != priv->tail_vma)
> > ? vma->vm_start : 0;
>
> Drop excessive parenthesis while you're there? And in the next hunk.
I agree. But lets do this in another patch? I forgot to mention this in
the yesterday's TODO, but we should factor out this code. Perhaps move
this into vma_stop(). Or at least add a trivial helper, this pattern
repeats 3 times, plus another one in m_start(). This problem is that I
do not really understand this logic right now, but I'll do something
today in any case.
> Otherwise:
>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Thanks!
Oleg.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
` (7 preceding siblings ...)
2014-08-06 10:19 ` [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Kirill A. Shutemov
@ 2014-08-06 18:48 ` Cyrill Gorcunov
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
10 siblings, 0 replies; 35+ messages in thread
From: Cyrill Gorcunov @ 2014-08-06 18:48 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Alexander Viro, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On Tue, Aug 05, 2014 at 09:46:28PM +0200, Oleg Nesterov wrote:
> Changes:
>
> 1-4: Update the changelogs, join "introduce the stable
> proc_maps_private->mm" and "change m_start() to rely on
> priv->mm" into a single "shift "priv->task = NULL" from
> m_start() to m_stop()".
>
> Resulting code is the same. Kirill and Cyrill, you seem
> to agree with these changes, I'll appreciate your acks.
>
> 5-7: New. Seems to work, please review.
>
> Todo:
>
> - Cleanup the tail_vma horror in m_start
>
> - Update task_nommu.c in the same way
>
> - Fix lock_trace() users
>
> Oleg.
(Sorry for delay) Thanks a lot, Oleg!
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
to whole series
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()
2014-08-06 14:10 ` Oleg Nesterov
@ 2014-08-06 19:41 ` Oleg Nesterov
0 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-06 19:41 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On 08/06, Oleg Nesterov wrote:
>
> On 08/06, Kirill A. Shutemov wrote:
> >
> > On Tue, Aug 05, 2014 at 09:46:44PM +0200, Oleg Nesterov wrote:
> > >
> > > if (m->count < m->size) /* vma is copied successfully */
> > > - m->version = (vma != get_gate_vma(task->mm))
> > > + m->version = (vma != priv->tail_vma)
> > > ? vma->vm_start : 0;
> >
> > Drop excessive parenthesis while you're there? And in the next hunk.
>
> I agree. But lets do this in another patch? I forgot to mention this in
> the yesterday's TODO, but we should factor out this code. Perhaps move
> this into vma_stop(). Or at least add a trivial helper, this pattern
> repeats 3 times, plus another one in m_start(). This problem is that I
> do not really understand this logic right now, but I'll do something
> today in any case.
Damn, this needs even more cleanups than I thought. Even "version == -1"
logic is absolutely wrong/dead. And "last_addr rather than next_addr" is
wrong too afaics, this just means we need the extra ->vm_next step plus
vma-or-tail_vma recheck.
Hopefully I'll finish this tomorrow.
Oleg.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
` (8 preceding siblings ...)
2014-08-06 18:48 ` Cyrill Gorcunov
@ 2014-08-07 19:17 ` Oleg Nesterov
2014-08-07 19:17 ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
` (4 more replies)
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
10 siblings, 5 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
On 08/05, Oleg Nesterov wrote:
>
> Todo:
>
> - Cleanup the tail_vma horror in m_start
On top of the previous "[PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups"
series.
It took me much more time than I expected ;) And after 3 attempts I failed
to fix this (imho) horror step-by-step. So 1/5 simply removes the last_addr
code, and then 4/5 brings it back.
Todo:
- Update task_nommu.c in the same way
- Fix lock_trace() users
Oleg.
fs/proc/task_mmu.c | 98 +++++++++++++++++++++------------------------------
1 files changed, 40 insertions(+), 58 deletions(-)
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
@ 2014-08-07 19:17 ` Oleg Nesterov
2014-08-07 19:17 ` [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable Oleg Nesterov
` (3 subsequent siblings)
4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
m_start() carefully documents, checks, and sets "m->version = -1" if
we are going to return NULL. The only problem is that we will be never
called again if m_start() returns NULL, so this is simply pointless
and misleading.
Otoh, ->show() methods m->version = 0 if vma == tail_vma and this is
just wrong, we want -1 in this case. And in fact we also want -1 if
->vm_next == NULL and ->tail_vma == NULL.
And it is not used consistently, the "scan vmas" loop in m_start()
should update last_addr too.
Finally, imo the whole "last_addr" logic in m_start() looks horrible.
find_vma(last_addr) is called unconditionally even if we are not going
to use the result. But the main problem is that this code participates
in tail_vma-or-NULL mess, and this looks simply unfixable.
Remove this optimization. We will add it back after some cleanups.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 35 +----------------------------------
1 files changed, 1 insertions(+), 34 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5f7fb45..d69f31c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -140,20 +140,10 @@ static void vma_stop(struct proc_maps_private *priv)
static void *m_start(struct seq_file *m, loff_t *pos)
{
struct proc_maps_private *priv = m->private;
- unsigned long last_addr = m->version;
struct mm_struct *mm;
struct vm_area_struct *vma, *tail_vma = NULL;
loff_t l = *pos;
- /*
- * We remember last_addr rather than next_addr to hit with
- * vmacache most of the time. We have zero last_addr at
- * the beginning and also after lseek. We will have -1 last_addr
- * after the end of the vmas.
- */
- if (last_addr == -1UL)
- return NULL;
-
priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
if (!priv->task)
return ERR_PTR(-ESRCH);
@@ -166,12 +156,6 @@ static void *m_start(struct seq_file *m, loff_t *pos)
tail_vma = get_gate_vma(mm);
priv->tail_vma = tail_vma;
hold_task_mempolicy(priv);
- /* Start with last addr hint */
- vma = find_vma(mm, last_addr);
- if (last_addr && vma) {
- vma = vma->vm_next;
- goto out;
- }
/*
* Check the vma index is within the range and do
@@ -192,8 +176,6 @@ out:
if (vma)
return vma;
- /* End of vmas has been reached */
- m->version = (tail_vma != NULL)? 0: -1UL;
if (tail_vma)
return tail_vma;
@@ -365,14 +347,7 @@ done:
static int show_map(struct seq_file *m, void *v, int is_pid)
{
- struct vm_area_struct *vma = v;
- struct proc_maps_private *priv = m->private;
-
- show_map_vma(m, vma, is_pid);
-
- if (m->count < m->size) /* vma is copied successfully */
- m->version = (vma != priv->tail_vma)
- ? vma->vm_start : 0;
+ show_map_vma(m, v, is_pid);
return 0;
}
@@ -598,7 +573,6 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
static int show_smap(struct seq_file *m, void *v, int is_pid)
{
- struct proc_maps_private *priv = m->private;
struct vm_area_struct *vma = v;
struct mem_size_stats mss;
struct mm_walk smaps_walk = {
@@ -651,10 +625,6 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
mss.nonlinear >> 10);
show_smap_vma_flags(m, vma);
-
- if (m->count < m->size) /* vma is copied successfully */
- m->version = (vma != priv->tail_vma)
- ? vma->vm_start : 0;
return 0;
}
@@ -1485,9 +1455,6 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
seq_printf(m, " N%d=%lu", nid, md->node[nid]);
out:
seq_putc(m, '\n');
-
- if (m->count < m->size)
- m->version = (vma != proc_priv->tail_vma) ? vma->vm_start : 0;
return 0;
}
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
2014-08-07 19:17 ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
@ 2014-08-07 19:17 ` Oleg Nesterov
2014-08-07 19:17 ` [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper Oleg Nesterov
` (2 subsequent siblings)
4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Now that m->version is gone we can cleanup m_start(). In particular,
- Remove the "unsigned long" typecast, m->index can't be negative
or exceed ->map_count. But lets use "unsigned int pos" to make
it clear that "pos < map_count" is safe.
- Remove the unnecessary "vma != NULL" check in the main loop. It
can't be NULL unless we have a vm bug.
- This also means that "pos < map_count" case can simply return the
valid vma and avoid "goto" and subsequent checks.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 34 ++++++++++------------------------
1 files changed, 10 insertions(+), 24 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index d69f31c..1349d3d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -137,12 +137,12 @@ static void vma_stop(struct proc_maps_private *priv)
mmput(mm);
}
-static void *m_start(struct seq_file *m, loff_t *pos)
+static void *m_start(struct seq_file *m, loff_t *ppos)
{
struct proc_maps_private *priv = m->private;
struct mm_struct *mm;
- struct vm_area_struct *vma, *tail_vma = NULL;
- loff_t l = *pos;
+ struct vm_area_struct *vma;
+ unsigned int pos = *ppos;
priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
if (!priv->task)
@@ -151,33 +151,19 @@ static void *m_start(struct seq_file *m, loff_t *pos)
mm = priv->mm;
if (!mm || !atomic_inc_not_zero(&mm->mm_users))
return NULL;
- down_read(&mm->mmap_sem);
- tail_vma = get_gate_vma(mm);
- priv->tail_vma = tail_vma;
+ down_read(&mm->mmap_sem);
hold_task_mempolicy(priv);
+ priv->tail_vma = get_gate_vma(mm);
- /*
- * Check the vma index is within the range and do
- * sequential scan until m_index.
- */
- vma = NULL;
- if ((unsigned long)l < mm->map_count) {
- vma = mm->mmap;
- while (l-- && vma)
+ if (pos < mm->map_count) {
+ for (vma = mm->mmap; pos; pos--)
vma = vma->vm_next;
- goto out;
- }
-
- if (l != mm->map_count)
- tail_vma = NULL; /* After gate vma */
-
-out:
- if (vma)
return vma;
+ }
- if (tail_vma)
- return tail_vma;
+ if (pos == mm->map_count && priv->tail_vma)
+ return priv->tail_vma;
vma_stop(priv);
return NULL;
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
2014-08-07 19:17 ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
2014-08-07 19:17 ` [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable Oleg Nesterov
@ 2014-08-07 19:17 ` Oleg Nesterov
2014-08-07 19:17 ` [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic Oleg Nesterov
2014-08-07 19:18 ` [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start() Oleg Nesterov
4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Extract the tail_vma/vm_next calculation from m_next() into the new
trivial helper, m_next_vma().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1349d3d..2ba3b16 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -137,6 +137,14 @@ static void vma_stop(struct proc_maps_private *priv)
mmput(mm);
}
+static struct vm_area_struct *
+m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
+{
+ if (vma == priv->tail_vma)
+ return NULL;
+ return vma->vm_next ?: priv->tail_vma;
+}
+
static void *m_start(struct seq_file *m, loff_t *ppos)
{
struct proc_maps_private *priv = m->private;
@@ -172,13 +180,10 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
static void *m_next(struct seq_file *m, void *v, loff_t *pos)
{
struct proc_maps_private *priv = m->private;
- struct vm_area_struct *tail_vma = priv->tail_vma;
- struct vm_area_struct *vma = v, *next = NULL;
+ struct vm_area_struct *next;
(*pos)++;
- if (vma != tail_vma)
- next = vma->vm_next ?: tail_vma;
-
+ next = m_next_vma(priv, v);
if (!next)
vma_stop(priv);
return next;
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
` (2 preceding siblings ...)
2014-08-07 19:17 ` [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper Oleg Nesterov
@ 2014-08-07 19:17 ` Oleg Nesterov
2014-08-07 19:18 ` [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start() Oleg Nesterov
4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:17 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Add the "last_addr" optimization back. Like before, every ->show()
method checks !seq_overflow() and sets m->version = vma->vm_start.
However, it also checks that m_next_vma(vma) != NULL, otherwise it
sets m->version = -1 for the lockless "EOF" fast-path in m_start().
m_start() can simply do find_vma() + m_next_vma() if last_addr is
not zero, the code looks clear and simple and this case is clearly
separated from "scan vmas" path.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2ba3b16..1b897dc 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -145,13 +145,24 @@ m_next_vma(struct proc_maps_private *priv, struct vm_area_struct *vma)
return vma->vm_next ?: priv->tail_vma;
}
+static void m_cache_vma(struct seq_file *m, struct vm_area_struct *vma)
+{
+ if (m->count < m->size) /* vma is copied successfully */
+ m->version = m_next_vma(m->private, vma) ? vma->vm_start : -1UL;
+}
+
static void *m_start(struct seq_file *m, loff_t *ppos)
{
struct proc_maps_private *priv = m->private;
+ unsigned long last_addr = m->version;
struct mm_struct *mm;
struct vm_area_struct *vma;
unsigned int pos = *ppos;
+ /* See m_cache_vma(). Zero at the start or after lseek. */
+ if (last_addr == -1UL)
+ return NULL;
+
priv->task = get_pid_task(priv->pid, PIDTYPE_PID);
if (!priv->task)
return ERR_PTR(-ESRCH);
@@ -164,6 +175,13 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
hold_task_mempolicy(priv);
priv->tail_vma = get_gate_vma(mm);
+ if (last_addr) {
+ vma = find_vma(mm, last_addr);
+ if (vma && (vma = m_next_vma(priv, vma)))
+ return vma;
+ }
+
+ m->version = 0;
if (pos < mm->map_count) {
for (vma = mm->mmap; pos; pos--)
vma = vma->vm_next;
@@ -339,6 +357,7 @@ done:
static int show_map(struct seq_file *m, void *v, int is_pid)
{
show_map_vma(m, v, is_pid);
+ m_cache_vma(m, v);
return 0;
}
@@ -616,6 +635,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
mss.nonlinear >> 10);
show_smap_vma_flags(m, vma);
+ m_cache_vma(m, vma);
return 0;
}
@@ -1446,6 +1466,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
seq_printf(m, " N%d=%lu", nid, md->node[nid]);
out:
seq_putc(m, '\n');
+ m_cache_vma(m, vma);
return 0;
}
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start()
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
` (3 preceding siblings ...)
2014-08-07 19:17 ` [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic Oleg Nesterov
@ 2014-08-07 19:18 ` Oleg Nesterov
4 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-07 19:18 UTC (permalink / raw)
To: Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Change the main loop in m_start() to update m->version. Mostly for
consistency, but this can help to avoid the same loop if the very
1st ->show() fails due to seq_overflow().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_mmu.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1b897dc..4ec6b72 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -183,11 +183,14 @@ static void *m_start(struct seq_file *m, loff_t *ppos)
m->version = 0;
if (pos < mm->map_count) {
- for (vma = mm->mmap; pos; pos--)
+ for (vma = mm->mmap; pos; pos--) {
+ m->version = vma->vm_start;
vma = vma->vm_next;
+ }
return vma;
}
+ /* we do not bother to update m->version in this case */
if (pos == mm->map_count && priv->tail_vma)
return priv->tail_vma;
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
` (9 preceding siblings ...)
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
@ 2014-08-11 17:00 ` Oleg Nesterov
2014-08-11 17:00 ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
` (3 more replies)
10 siblings, 4 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-11 17:00 UTC (permalink / raw)
To: Andrew Morton, Greg Ungerer
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Greg, could you review? The changes are simple, but I am not familiar
with NOMMU and I can't test this.
Depends on
[PATCH v2 3/7] proc: introduce proc_mem_open()
http://marc.info/?l=linux-kernel&m=140726831328943
[PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
http://marc.info/?l=linux-kernel&m=140726828328929
(only because it adds proc_maps_private->mm)
I'll also send you mbox with the previous series just in case.
Oleg.
fs/proc/task_nommu.c | 65 ++++++++++++++++++++++++++++++--------------------
1 files changed, 39 insertions(+), 26 deletions(-)
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private()
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
@ 2014-08-11 17:00 ` Oleg Nesterov
2014-08-11 17:00 ` [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
` (2 subsequent siblings)
3 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-11 17:00 UTC (permalink / raw)
To: Andrew Morton, Greg Ungerer
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Cleanup and preparation. maps_open() can use __seq_open_private()
like proc_maps_open() does.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_nommu.c | 22 +++++++---------------
1 files changed, 7 insertions(+), 15 deletions(-)
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 678455d..c12eab3 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -268,21 +268,13 @@ static const struct seq_operations proc_tid_maps_ops = {
static int maps_open(struct inode *inode, struct file *file,
const struct seq_operations *ops)
{
- struct proc_maps_private *priv;
- int ret = -ENOMEM;
-
- priv = kzalloc(sizeof(*priv), GFP_KERNEL);
- if (priv) {
- priv->pid = proc_pid(inode);
- ret = seq_open(file, ops);
- if (!ret) {
- struct seq_file *m = file->private_data;
- m->private = priv;
- } else {
- kfree(priv);
- }
- }
- return ret;
+ struct proc_maps_private *priv = __seq_open_private(file, ops,
+ sizeof(struct proc_maps_private));
+ if (!priv)
+ return -ENOMEM;
+
+ priv->pid = proc_pid(inode);
+ return 0;
}
static int pid_maps_open(struct inode *inode, struct file *file)
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open()
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
2014-08-11 17:00 ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
@ 2014-08-11 17:00 ` Oleg Nesterov
2014-08-11 17:00 ` [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm Oleg Nesterov
2014-08-12 13:57 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
3 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-11 17:00 UTC (permalink / raw)
To: Andrew Morton, Greg Ungerer
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Copy-and-paste the changes from "fs/proc/task_mmu.c: shift mm_access()
from m_start() to proc_maps_open()" into task_nommu.c.
Change maps_open() to initialize priv->mm using proc_mem_open(), m_start()
can rely on atomic_inc_not_zero(mm_users) like task_mmu.c does.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_nommu.c | 29 ++++++++++++++++++++++++-----
1 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index c12eab3..003f2be 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -216,11 +216,11 @@ static void *m_start(struct seq_file *m, loff_t *pos)
if (!priv->task)
return ERR_PTR(-ESRCH);
- mm = mm_access(priv->task, PTRACE_MODE_READ);
- if (!mm || IS_ERR(mm)) {
+ mm = priv->mm;
+ if (!mm || !atomic_inc_not_zero(&mm->mm_users)) {
put_task_struct(priv->task);
priv->task = NULL;
- return mm;
+ return NULL;
}
down_read(&mm->mmap_sem);
@@ -274,9 +274,28 @@ static int maps_open(struct inode *inode, struct file *file,
return -ENOMEM;
priv->pid = proc_pid(inode);
+ priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
+ if (IS_ERR(priv->mm)) {
+ int err = PTR_ERR(priv->mm);
+ seq_release_private(inode, file);
+ return err;
+ }
+
return 0;
}
+
+static int map_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *seq = file->private_data;
+ struct proc_maps_private *priv = seq->private;
+
+ if (priv->mm)
+ mmdrop(priv->mm);
+
+ return seq_release_private(inode, file);
+}
+
static int pid_maps_open(struct inode *inode, struct file *file)
{
return maps_open(inode, file, &proc_pid_maps_ops);
@@ -291,13 +310,13 @@ const struct file_operations proc_pid_maps_operations = {
.open = pid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = map_release,
};
const struct file_operations proc_tid_maps_operations = {
.open = tid_maps_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release_private,
+ .release = map_release,
};
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
2014-08-11 17:00 ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
2014-08-11 17:00 ` [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
@ 2014-08-11 17:00 ` Oleg Nesterov
2014-08-12 13:57 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
3 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-11 17:00 UTC (permalink / raw)
To: Andrew Morton, Greg Ungerer
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
I do not know if CONFIG_PREEMPT/SMP is possible without CONFIG_MMU
but the usage of task->mm in m_stop(). The task can exit/exec before
we take mmap_sem, in this case m_stop() can hit NULL or unlock the
wrong rw_semaphore.
Also, this code uses priv->task != NULL to decide whether we need
up_read/mmput. This is correct, but we will probably kill priv->task.
Change m_start/m_stop to rely on IS_ERR_OR_NULL() like task_mmu.c does.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/proc/task_nommu.c | 18 ++++++++++--------
1 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 003f2be..e0237c1 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -217,17 +217,17 @@ static void *m_start(struct seq_file *m, loff_t *pos)
return ERR_PTR(-ESRCH);
mm = priv->mm;
- if (!mm || !atomic_inc_not_zero(&mm->mm_users)) {
- put_task_struct(priv->task);
- priv->task = NULL;
+ if (!mm || !atomic_inc_not_zero(&mm->mm_users))
return NULL;
- }
- down_read(&mm->mmap_sem);
+ down_read(&mm->mmap_sem);
/* start from the Nth VMA */
for (p = rb_first(&mm->mm_rb); p; p = rb_next(p))
if (n-- == 0)
return p;
+
+ up_read(&mm->mmap_sem);
+ mmput(mm);
return NULL;
}
@@ -235,11 +235,13 @@ static void m_stop(struct seq_file *m, void *_vml)
{
struct proc_maps_private *priv = m->private;
+ if (!IS_ERR_OR_NULL(_vml)) {
+ up_read(&priv->mm->mmap_sem);
+ mmput(priv->mm);
+ }
if (priv->task) {
- struct mm_struct *mm = priv->task->mm;
- up_read(&mm->mmap_sem);
- mmput(mm);
put_task_struct(priv->task);
+ priv->task = NULL;
}
}
--
1.5.5.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
` (2 preceding siblings ...)
2014-08-11 17:00 ` [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm Oleg Nesterov
@ 2014-08-12 13:57 ` Greg Ungerer
2014-08-12 17:35 ` Oleg Nesterov
3 siblings, 1 reply; 35+ messages in thread
From: Greg Ungerer @ 2014-08-12 13:57 UTC (permalink / raw)
To: Oleg Nesterov, Andrew Morton
Cc: Alexander Viro, Cyrill Gorcunov, David Howells, David S. Miller,
Eric W. Biederman, Kirill A. Shutemov, Kirill A. Shutemov,
Peter Zijlstra, Sasha Levin, linux-fsdevel, linux-kernel
Hi Oleg,
On 12/08/14 03:00, Oleg Nesterov wrote:
> Greg, could you review? The changes are simple, but I am not familiar
> with NOMMU and I can't test this.
>
> Depends on
>
> [PATCH v2 3/7] proc: introduce proc_mem_open()
> http://marc.info/?l=linux-kernel&m=140726831328943
>
> [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
> http://marc.info/?l=linux-kernel&m=140726828328929
> (only because it adds proc_maps_private->mm)
>
> I'll also send you mbox with the previous series just in case.
>
> Oleg.
>
> fs/proc/task_nommu.c | 65 ++++++++++++++++++++++++++++++--------------------
> 1 files changed, 39 insertions(+), 26 deletions(-)
I don't see any problems. Applied cleanly for me (to a 3.16-rc7 tree),
compiled
cleanly for a non-mmu m68k target, and ran with no problems I could see.
At least I checked /proc/1/maps and that still came out fine. Is there
anything
else I should check?
I am happy to ack it:
Acked-by: Greg Ungerer <gerg@uclinux.org>
Regards
Greg
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c
2014-08-12 13:57 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
@ 2014-08-12 17:35 ` Oleg Nesterov
0 siblings, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-08-12 17:35 UTC (permalink / raw)
To: Greg Ungerer
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
David S. Miller, Eric W. Biederman, Kirill A. Shutemov,
Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
linux-kernel
On 08/12, Greg Ungerer wrote:
>
> Hi Oleg,
>
> On 12/08/14 03:00, Oleg Nesterov wrote:
>> Greg, could you review? The changes are simple, but I am not familiar
>> with NOMMU and I can't test this.
>>
>> Depends on
>>
>> [PATCH v2 3/7] proc: introduce proc_mem_open()
>> http://marc.info/?l=linux-kernel&m=140726831328943
>>
>> [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
>> http://marc.info/?l=linux-kernel&m=140726828328929
>> (only because it adds proc_maps_private->mm)
>>
>> I'll also send you mbox with the previous series just in case.
>>
>> Oleg.
>>
>> fs/proc/task_nommu.c | 65 ++++++++++++++++++++++++++++++--------------------
>> 1 files changed, 39 insertions(+), 26 deletions(-)
>
> I don't see any problems. Applied cleanly for me (to a 3.16-rc7 tree),
> compiled
> cleanly for a non-mmu m68k target, and ran with no problems I could see.
> At least I checked /proc/1/maps and that still came out fine. Is there
> anything
> else I should check?
Great, hopefully nothing else.
> I am happy to ack it:
>
> Acked-by: Greg Ungerer <gerg@uclinux.org>
Thanks a lot!
I'll send more cleanups once/if Andrew takes the pending patches.
Oleg.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
2014-08-06 10:05 ` Kirill A. Shutemov
@ 2014-12-03 14:14 ` Kirill A. Shutemov
2014-12-03 16:59 ` Eric W. Biederman
2014-12-03 17:34 ` Oleg Nesterov
1 sibling, 2 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-12-03 14:14 UTC (permalink / raw)
To: Oleg Nesterov, David S. Miller, Linus Torvalds
Cc: Andrew Morton, Alexander Viro, Cyrill Gorcunov, David Howells,
Eric W. Biederman, Kirill A. Shutemov, Peter Zijlstra,
Sasha Levin, linux-fsdevel, linux-kernel, Alexey Dobriyan, netdev
On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> A simple test-case from Kirill Shutemov
>
> cat /proc/self/maps >/dev/null
> chmod +x /proc/self/net/packet
> exec /proc/self/net/packet
>
> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> the opposite order.
Oleg, I see it again with almost the same test-case:
cat /proc/self/stack >/dev/null
chmod +x /proc/self/net/packet
exec /proc/self/net/packet
Looks like bunch of proc files were converted to use seq_file by Alexey
Dobriyan around the same time you've fixed the issue for /proc/pid/maps.
More generic test-case:
find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
chmod +x /proc/self/net/packet
exec /proc/self/net/packet
David, any justification for allowing chmod +x for files under
/proc/pid/net?
[ 2.042212] ======================================================
[ 2.042930] [ INFO: possible circular locking dependency detected ]
[ 2.043648] 3.18.0-rc7-00003-g3a18ca061311-dirty #237 Not tainted
[ 2.044350] -------------------------------------------------------
[ 2.045054] sh/94 is trying to acquire lock:
[ 2.045546] (&p->lock){+.+.+.}, at: [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[ 2.045781]
[ 2.045781] but task is already holding lock:
[ 2.045781] (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
[ 2.045781]
[ 2.045781] which lock already depends on the new lock.
[ 2.045781]
[ 2.045781]
[ 2.045781] the existing dependency chain (in reverse order) is:
[ 2.045781]
-> #1 (&sig->cred_guard_mutex){+.+.+.}:
[ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[ 2.045781] [<ffffffff81849da6>] mutex_lock_killable_nested+0x66/0x460
[ 2.045781] [<ffffffff81229de4>] lock_trace+0x24/0x70
[ 2.045781] [<ffffffff81229e8f>] proc_pid_stack+0x5f/0xe0
[ 2.045781] [<ffffffff81227244>] proc_single_show+0x54/0xa0
[ 2.045781] [<ffffffff811e13a0>] seq_read+0xe0/0x3e0
[ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
[ 2.045781] [<ffffffff811b9f5d>] SyS_read+0x4d/0xc0
[ 2.045781] [<ffffffff8184e492>] system_call_fastpath+0x12/0x17
[ 2.045781]
-> #0 (&p->lock){+.+.+.}:
[ 2.045781] [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
[ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[ 2.045781] [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
[ 2.045781] [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff81226428>] proc_reg_read+0x48/0x70
[ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
[ 2.045781] [<ffffffff811bf1a8>] kernel_read+0x48/0x60
[ 2.045781] [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
[ 2.045781] [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
[ 2.045781] [<ffffffff811c1818>] do_execve+0x18/0x20
[ 2.045781] [<ffffffff811c1b05>] SyS_execve+0x25/0x30
[ 2.045781] [<ffffffff8184ea49>] stub_execve+0x69/0xa0
[ 2.045781]
[ 2.045781] other info that might help us debug this:
[ 2.045781]
[ 2.045781] Possible unsafe locking scenario:
[ 2.045781]
[ 2.045781] CPU0 CPU1
[ 2.045781] ---- ----
[ 2.045781] lock(&sig->cred_guard_mutex);
[ 2.045781] lock(&p->lock);
[ 2.045781] lock(&sig->cred_guard_mutex);
[ 2.045781] lock(&p->lock);
[ 2.045781]
[ 2.045781] *** DEADLOCK ***
[ 2.045781]
[ 2.045781] 1 lock held by sh/94:
[ 2.045781] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
[ 2.045781]
[ 2.045781] stack backtrace:
[ 2.045781] CPU: 0 PID: 94 Comm: sh Not tainted 3.18.0-rc7-00003-g3a18ca061311-dirty #237
[ 2.045781] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2.045781] ffffffff82a48d50 ffff88085427bad8 ffffffff81844a85 0000000000000cac
[ 2.045781] ffffffff82a654a0 ffff88085427bb28 ffffffff810a1b03 0000000000000000
[ 2.045781] ffff88085427bb68 ffff88085427bb28 ffff8808547f1500 ffff8808547f1c40
[ 2.045781] Call Trace:
[ 2.045781] [<ffffffff81844a85>] dump_stack+0x4e/0x68
[ 2.045781] [<ffffffff810a1b03>] print_circular_bug+0x203/0x310
[ 2.045781] [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
[ 2.045781] [<ffffffff8108fa76>] ? local_clock+0x16/0x30
[ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
[ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
[ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
[ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff8108f9f8>] ? sched_clock_cpu+0x98/0xc0
[ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
[ 2.045781] [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
[ 2.045781] [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
[ 2.045781] [<ffffffff81226428>] proc_reg_read+0x48/0x70
[ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
[ 2.045781] [<ffffffff811bf1a8>] kernel_read+0x48/0x60
[ 2.045781] [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
[ 2.045781] [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
[ 2.092142] tsc: Refined TSC clocksource calibration: 2693.484 MHz
[ 2.045781] [<ffffffff811c0fd3>] ? do_execve_common.isra.29+0x133/0x960
[ 2.045781] [<ffffffff8184f04d>] ? retint_swapgs+0xe/0x13
[ 2.045781] [<ffffffff811c1818>] do_execve+0x18/0x20
[ 2.045781] [<ffffffff811c1b05>] SyS_execve+0x25/0x30
[ 2.045781] [<ffffffff8184ea49>] stub_execve+0x69/0xa0
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
2014-12-03 14:14 ` Kirill A. Shutemov
@ 2014-12-03 16:59 ` Eric W. Biederman
2014-12-04 16:17 ` Kirill A. Shutemov
2014-12-03 17:34 ` Oleg Nesterov
1 sibling, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2014-12-03 16:59 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Oleg Nesterov, David S. Miller, Linus Torvalds, Andrew Morton,
Alexander Viro, Cyrill Gorcunov, David Howells,
Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
linux-kernel, Alexey Dobriyan, netdev
"Kirill A. Shutemov" <kirill@shutemov.name> writes:
> On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
>> A simple test-case from Kirill Shutemov
>>
>> cat /proc/self/maps >/dev/null
>> chmod +x /proc/self/net/packet
>> exec /proc/self/net/packet
>>
>> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
>> the opposite order.
>
> Oleg, I see it again with almost the same test-case:
>
> cat /proc/self/stack >/dev/null
> chmod +x /proc/self/net/packet
> exec /proc/self/net/packet
>
> Looks like bunch of proc files were converted to use seq_file by Alexey
> Dobriyan around the same time you've fixed the issue for /proc/pid/maps.
>
> More generic test-case:
>
> find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
> chmod +x /proc/self/net/packet
> exec /proc/self/net/packet
>
> David, any justification for allowing chmod +x for files under
> /proc/pid/net?
I don't think there are any good reasons for allowing chmod +x for the
proc generic files. Certainly executing any of them is nonsense.
I do recall some weird conner cases existing. I think they resulted
in a need to preserve chmod if not chmod +x. This is just me saying
tread carefully before you change anything.
It really should be safe to tweak proc_notify_change to not allow
messing with the executable bits of proc files.
> [ 2.042212] ======================================================
> [ 2.042930] [ INFO: possible circular locking dependency detected ]
> [ 2.043648] 3.18.0-rc7-00003-g3a18ca061311-dirty #237 Not tainted
> [ 2.044350] -------------------------------------------------------
> [ 2.045054] sh/94 is trying to acquire lock:
> [ 2.045546] (&p->lock){+.+.+.}, at: [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
> [ 2.045781]
> [ 2.045781] but task is already holding lock:
> [ 2.045781] (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
> [ 2.045781]
> [ 2.045781] which lock already depends on the new lock.
> [ 2.045781]
> [ 2.045781]
> [ 2.045781] the existing dependency chain (in reverse order) is:
> [ 2.045781]
> -> #1 (&sig->cred_guard_mutex){+.+.+.}:
> [ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
> [ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
> [ 2.045781] [<ffffffff81849da6>] mutex_lock_killable_nested+0x66/0x460
> [ 2.045781] [<ffffffff81229de4>] lock_trace+0x24/0x70
> [ 2.045781] [<ffffffff81229e8f>] proc_pid_stack+0x5f/0xe0
> [ 2.045781] [<ffffffff81227244>] proc_single_show+0x54/0xa0
> [ 2.045781] [<ffffffff811e13a0>] seq_read+0xe0/0x3e0
> [ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
> [ 2.045781] [<ffffffff811b9f5d>] SyS_read+0x4d/0xc0
> [ 2.045781] [<ffffffff8184e492>] system_call_fastpath+0x12/0x17
> [ 2.045781]
> -> #0 (&p->lock){+.+.+.}:
> [ 2.045781] [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
> [ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
> [ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
> [ 2.045781] [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
> [ 2.045781] [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
> [ 2.045781] [<ffffffff81226428>] proc_reg_read+0x48/0x70
> [ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
> [ 2.045781] [<ffffffff811bf1a8>] kernel_read+0x48/0x60
> [ 2.045781] [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
> [ 2.045781] [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
> [ 2.045781] [<ffffffff811c1818>] do_execve+0x18/0x20
> [ 2.045781] [<ffffffff811c1b05>] SyS_execve+0x25/0x30
> [ 2.045781] [<ffffffff8184ea49>] stub_execve+0x69/0xa0
> [ 2.045781]
> [ 2.045781] other info that might help us debug this:
> [ 2.045781]
> [ 2.045781] Possible unsafe locking scenario:
> [ 2.045781]
> [ 2.045781] CPU0 CPU1
> [ 2.045781] ---- ----
> [ 2.045781] lock(&sig->cred_guard_mutex);
> [ 2.045781] lock(&p->lock);
> [ 2.045781] lock(&sig->cred_guard_mutex);
> [ 2.045781] lock(&p->lock);
> [ 2.045781]
> [ 2.045781] *** DEADLOCK ***
> [ 2.045781]
> [ 2.045781] 1 lock held by sh/94:
> [ 2.045781] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811c0e3d>] prepare_bprm_creds+0x2d/0x90
> [ 2.045781]
> [ 2.045781] stack backtrace:
> [ 2.045781] CPU: 0 PID: 94 Comm: sh Not tainted 3.18.0-rc7-00003-g3a18ca061311-dirty #237
> [ 2.045781] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
> [ 2.045781] ffffffff82a48d50 ffff88085427bad8 ffffffff81844a85 0000000000000cac
> [ 2.045781] ffffffff82a654a0 ffff88085427bb28 ffffffff810a1b03 0000000000000000
> [ 2.045781] ffff88085427bb68 ffff88085427bb28 ffff8808547f1500 ffff8808547f1c40
> [ 2.045781] Call Trace:
> [ 2.045781] [<ffffffff81844a85>] dump_stack+0x4e/0x68
> [ 2.045781] [<ffffffff810a1b03>] print_circular_bug+0x203/0x310
> [ 2.045781] [<ffffffff810a389f>] validate_chain.isra.36+0xfff/0x1400
> [ 2.045781] [<ffffffff8108fa76>] ? local_clock+0x16/0x30
> [ 2.045781] [<ffffffff810a6e99>] __lock_acquire+0x4d9/0xd40
> [ 2.045781] [<ffffffff810a7ff2>] lock_acquire+0xd2/0x2a0
> [ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
> [ 2.045781] [<ffffffff81849629>] mutex_lock_nested+0x69/0x3c0
> [ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
> [ 2.045781] [<ffffffff8108f9f8>] ? sched_clock_cpu+0x98/0xc0
> [ 2.045781] [<ffffffff811e12fd>] ? seq_read+0x3d/0x3e0
> [ 2.045781] [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
> [ 2.045781] [<ffffffff811e12fd>] seq_read+0x3d/0x3e0
> [ 2.045781] [<ffffffff814050b9>] ? lockref_put_or_lock+0x29/0x40
> [ 2.045781] [<ffffffff81226428>] proc_reg_read+0x48/0x70
> [ 2.045781] [<ffffffff811b9377>] vfs_read+0x97/0x180
> [ 2.045781] [<ffffffff811bf1a8>] kernel_read+0x48/0x60
> [ 2.045781] [<ffffffff811bfb2c>] prepare_binprm+0xdc/0x180
> [ 2.045781] [<ffffffff811c139a>] do_execve_common.isra.29+0x4fa/0x960
> [ 2.092142] tsc: Refined TSC clocksource calibration: 2693.484 MHz
> [ 2.045781] [<ffffffff811c0fd3>] ? do_execve_common.isra.29+0x133/0x960
> [ 2.045781] [<ffffffff8184f04d>] ? retint_swapgs+0xe/0x13
> [ 2.045781] [<ffffffff811c1818>] do_execve+0x18/0x20
> [ 2.045781] [<ffffffff811c1b05>] SyS_execve+0x25/0x30
> [ 2.045781] [<ffffffff8184ea49>] stub_execve+0x69/0xa0
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
2014-12-03 14:14 ` Kirill A. Shutemov
2014-12-03 16:59 ` Eric W. Biederman
@ 2014-12-03 17:34 ` Oleg Nesterov
1 sibling, 0 replies; 35+ messages in thread
From: Oleg Nesterov @ 2014-12-03 17:34 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: David S. Miller, Linus Torvalds, Andrew Morton, Alexander Viro,
Cyrill Gorcunov, David Howells, Eric W. Biederman,
Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
linux-kernel, Alexey Dobriyan, netdev
On 12/03, Kirill A. Shutemov wrote:
>
> On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> > A simple test-case from Kirill Shutemov
> >
> > cat /proc/self/maps >/dev/null
> > chmod +x /proc/self/net/packet
> > exec /proc/self/net/packet
> >
> > makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> > the opposite order.
>
> Oleg, I see it again with almost the same test-case:
>
> cat /proc/self/stack >/dev/null
> chmod +x /proc/self/net/packet
> exec /proc/self/net/packet
Yes, there are more lock_trace/mm_access (ab)users. Fortunately, they
are much simpler than proc/pid/maps (which also asked for other cleanups
and fixes).
I'll try to take a look, thanks for reminding.
And I agree with Eric, chmod+x probably makes no sense. Still I think
this code deserves some cleanups regardless. To the point I think that
lock_trace() should probably die.
Thanks!
Oleg.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()
2014-12-03 16:59 ` Eric W. Biederman
@ 2014-12-04 16:17 ` Kirill A. Shutemov
0 siblings, 0 replies; 35+ messages in thread
From: Kirill A. Shutemov @ 2014-12-04 16:17 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Oleg Nesterov, David S. Miller, Linus Torvalds, Andrew Morton,
Alexander Viro, Cyrill Gorcunov, David Howells,
Kirill A. Shutemov, Peter Zijlstra, Sasha Levin, linux-fsdevel,
linux-kernel, Alexey Dobriyan, netdev
On Wed, Dec 03, 2014 at 10:59:57AM -0600, Eric W. Biederman wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
>
> > On Tue, Aug 05, 2014 at 09:46:55PM +0200, Oleg Nesterov wrote:
> >> A simple test-case from Kirill Shutemov
> >>
> >> cat /proc/self/maps >/dev/null
> >> chmod +x /proc/self/net/packet
> >> exec /proc/self/net/packet
> >>
> >> makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
> >> the opposite order.
> >
> > Oleg, I see it again with almost the same test-case:
> >
> > cat /proc/self/stack >/dev/null
> > chmod +x /proc/self/net/packet
> > exec /proc/self/net/packet
> >
> > Looks like bunch of proc files were converted to use seq_file by Alexey
> > Dobriyan around the same time you've fixed the issue for /proc/pid/maps.
> >
> > More generic test-case:
> >
> > find /proc/self/ -type f -exec dd if='{}' of=/dev/null bs=1 count=1 ';' 2>/dev/null
> > chmod +x /proc/self/net/packet
> > exec /proc/self/net/packet
> >
> > David, any justification for allowing chmod +x for files under
> > /proc/pid/net?
>
> I don't think there are any good reasons for allowing chmod +x for the
> proc generic files. Certainly executing any of them is nonsense.
>
> I do recall some weird conner cases existing. I think they resulted
> in a need to preserve chmod if not chmod +x. This is just me saying
> tread carefully before you change anything.
>
> It really should be safe to tweak proc_notify_change to not allow
> messing with the executable bits of proc files.
BTW, we have MS_NOSUID and MS_NOEXEC set in ->s_flags for procfs since
2006 -- see 92d032855e64.
But there's no code which would translate them into vfsmount->mnt_flags |=
MNT_NOSUID/MNT_NOEXEC and we bypast nosuid/noexec checks on exec path.
Hm?..
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2014-12-04 16:17 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-05 19:46 [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 1/7] fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map() Oleg Nesterov
2014-08-06 9:52 ` Kirill A. Shutemov
2014-08-06 14:10 ` Oleg Nesterov
2014-08-06 19:41 ` Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 2/7] fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open() Oleg Nesterov
2014-08-06 9:55 ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 3/7] proc: introduce proc_mem_open() Oleg Nesterov
2014-08-06 9:57 ` Kirill A. Shutemov
2014-08-05 19:46 ` [PATCH v2 4/7] fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
2014-08-06 10:05 ` Kirill A. Shutemov
2014-12-03 14:14 ` Kirill A. Shutemov
2014-12-03 16:59 ` Eric W. Biederman
2014-12-04 16:17 ` Kirill A. Shutemov
2014-12-03 17:34 ` Oleg Nesterov
2014-08-05 19:46 ` [PATCH v2 5/7] fs/proc/task_mmu.c: simplify the vma_stop() logic Oleg Nesterov
2014-08-06 10:12 ` Kirill A. Shutemov
2014-08-05 19:47 ` [PATCH v2 6/7] fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next() Oleg Nesterov
2014-08-06 10:17 ` Kirill A. Shutemov
2014-08-05 19:47 ` [PATCH v2 7/7] fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop() Oleg Nesterov
2014-08-06 10:18 ` Kirill A. Shutemov
2014-08-06 10:19 ` [PATCH v2 0/7] /proc/PID/*maps* fixes/cleanups Kirill A. Shutemov
2014-08-06 18:48 ` Cyrill Gorcunov
2014-08-07 19:17 ` [PATCH 0/5] fs/proc/task_mmu.c: cleanup tail_vma/last_addr mess Oleg Nesterov
2014-08-07 19:17 ` [PATCH 1/5] fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logic Oleg Nesterov
2014-08-07 19:17 ` [PATCH 2/5] fs/proc/task_mmu.c: simplify m_start() to make it readable Oleg Nesterov
2014-08-07 19:17 ` [PATCH 3/5] fs/proc/task_mmu.c: introduce m_next_vma() helper Oleg Nesterov
2014-08-07 19:17 ` [PATCH 4/5] fs/proc/task_mmu.c: reintroduce m->version logic Oleg Nesterov
2014-08-07 19:18 ` [PATCH 5/5] fs/proc/task_mmu.c: update m->version in the main loop in m_start() Oleg Nesterov
2014-08-11 17:00 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Oleg Nesterov
2014-08-11 17:00 ` [PATCH 1/3] fs/proc/task_nommu.c: change maps_open() to use __seq_open_private() Oleg Nesterov
2014-08-11 17:00 ` [PATCH 2/3] fs/proc/task_nommu.c: shift mm_access() from m_start() to proc_maps_open() Oleg Nesterov
2014-08-11 17:00 ` [PATCH 3/3] fs/proc/task_nommu.c: don't use priv->task->mm Oleg Nesterov
2014-08-12 13:57 ` [PATCH 0/3] fs/proc/task_nommu.c: copy-and-paste the changes from task_mmu.c Greg Ungerer
2014-08-12 17:35 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).