* [PATCH 01/12] x86: add context tag to mark mm when running a task in 32-bit compatibility mode
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 02/12] x86: mark associated mm when running a task in 32 bit " Stephen Wilson
` (10 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
This tag is intended to mirror the thread info TIF_IA32 flag. Will be used to
identify mm's which support 32 bit tasks running in compatibility mode without
requiring a reference to the task itself.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/include/asm/mmu.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 80a1dee..aeff3e8 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -13,6 +13,12 @@ typedef struct {
int size;
struct mutex lock;
void *vdso;
+
+#ifdef CONFIG_X86_64
+ /* True if mm supports a task running in 32 bit compatibility mode. */
+ unsigned short ia32_compat;
+#endif
+
} mm_context_t;
#ifdef CONFIG_SMP
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 02/12] x86: mark associated mm when running a task in 32 bit compatibility mode
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
2011-03-13 19:49 ` [PATCH 01/12] x86: add context tag to mark mm when running a task in 32-bit compatibility mode Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 03/12] mm: arch: make get_gate_vma take an mm_struct instead of a task_struct Stephen Wilson
` (9 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
This patch simply follows the same practice as for setting the TIF_IA32 flag.
In particular, an mm is marked as holding 32-bit tasks when a 32-bit binary is
exec'ed. Both ELF and a.out formats are updated.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/x86/ia32/ia32_aout.c | 1 +
arch/x86/kernel/process_64.c | 8 ++++++++
2 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 2d93bdb..fd84387 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -298,6 +298,7 @@ static int load_aout_binary(struct linux_binprm *bprm, struct pt_regs *regs)
/* OK, This is the point of no return */
set_personality(PER_LINUX);
set_thread_flag(TIF_IA32);
+ current->mm->context.ia32_compat = 1;
setup_new_exec(bprm);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index bd387e8..6c9dd92 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -501,6 +501,10 @@ void set_personality_64bit(void)
/* Make sure to be in 64bit mode */
clear_thread_flag(TIF_IA32);
+ /* Ensure the corresponding mm is not marked. */
+ if (current->mm)
+ current->mm->context.ia32_compat = 0;
+
/* TBD: overwrites user setup. Should have two bits.
But 64bit processes have always behaved this way,
so it's not too bad. The main problem is just that
@@ -516,6 +520,10 @@ void set_personality_ia32(void)
set_thread_flag(TIF_IA32);
current->personality |= force_personality32;
+ /* Mark the associated mm as containing 32-bit tasks. */
+ if (current->mm)
+ current->mm->context.ia32_compat = 1;
+
/* Prepare the first "return" to user space */
current_thread_info()->status |= TS_COMPAT;
}
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 03/12] mm: arch: make get_gate_vma take an mm_struct instead of a task_struct
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
2011-03-13 19:49 ` [PATCH 01/12] x86: add context tag to mark mm when running a task in 32-bit compatibility mode Stephen Wilson
2011-03-13 19:49 ` [PATCH 02/12] x86: mark associated mm when running a task in 32 bit " Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 04/12] mm: arch: make in_gate_area " Stephen Wilson
` (8 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
Morally, the presence of a gate vma is more an attribute of a particular mm than
a particular task. Moreover, dropping the dependency on task_struct will help
make both existing and future operations on mm's more flexible and convenient.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/powerpc/kernel/vdso.c | 2 +-
arch/s390/kernel/vdso.c | 2 +-
arch/sh/kernel/vsyscall/vsyscall.c | 2 +-
arch/x86/mm/init_64.c | 6 +++---
arch/x86/vdso/vdso32-setup.c | 11 ++++++-----
fs/binfmt_elf.c | 2 +-
fs/proc/task_mmu.c | 8 +++++---
include/linux/mm.h | 2 +-
mm/memory.c | 4 ++--
mm/mlock.c | 4 ++--
10 files changed, 23 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index fd87287..6169f17 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -830,7 +830,7 @@ int in_gate_area(struct task_struct *task, unsigned long addr)
return 0;
}
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
return NULL;
}
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index f438d74..d19f305 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -347,7 +347,7 @@ int in_gate_area(struct task_struct *task, unsigned long addr)
return 0;
}
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
return NULL;
}
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index 242117c..3f9b6f4 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -94,7 +94,7 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return NULL;
}
-struct vm_area_struct *get_gate_vma(struct task_struct *task)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
return NULL;
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..711b690 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -870,10 +870,10 @@ static struct vm_area_struct gate_vma = {
.vm_flags = VM_READ | VM_EXEC
};
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
#ifdef CONFIG_IA32_EMULATION
- if (test_tsk_thread_flag(tsk, TIF_IA32))
+ if (!mm || mm->context.ia32_compat)
return NULL;
#endif
return &gate_vma;
@@ -881,7 +881,7 @@ struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
int in_gate_area(struct task_struct *task, unsigned long addr)
{
- struct vm_area_struct *vma = get_gate_vma(task);
+ struct vm_area_struct *vma = get_gate_vma(task->mm);
if (!vma)
return 0;
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index 36df991..1f651f6 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -417,11 +417,12 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return NULL;
}
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
- struct mm_struct *mm = tsk->mm;
-
- /* Check to see if this task was created in compat vdso mode */
+ /*
+ * Check to see if the corresponding task was created in compat vdso
+ * mode.
+ */
if (mm && mm->context.vdso == (void *)VDSO_HIGH_BASE)
return &gate_vma;
return NULL;
@@ -429,7 +430,7 @@ struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
int in_gate_area(struct task_struct *task, unsigned long addr)
{
- const struct vm_area_struct *vma = get_gate_vma(task);
+ const struct vm_area_struct *vma = get_gate_vma(task->mm);
return vma && addr >= vma->vm_start && addr < vma->vm_end;
}
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d5b640b..bbabdcc 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1906,7 +1906,7 @@ static int elf_core_dump(struct coredump_params *cprm)
segs = current->mm->map_count;
segs += elf_core_extra_phdrs();
- gate_vma = get_gate_vma(current);
+ gate_vma = get_gate_vma(current->mm);
if (gate_vma != NULL)
segs++;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 60b9148..bb548d4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -126,7 +126,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
return NULL;
down_read(&mm->mmap_sem);
- tail_vma = get_gate_vma(priv->task);
+ tail_vma = get_gate_vma(priv->task->mm);
priv->tail_vma = tail_vma;
/* Start with last addr hint */
@@ -277,7 +277,8 @@ static int show_map(struct seq_file *m, void *v)
show_map_vma(m, vma);
if (m->count < m->size) /* vma is copied successfully */
- m->version = (vma != get_gate_vma(task))? vma->vm_start: 0;
+ m->version = (vma != get_gate_vma(task->mm))
+ ? vma->vm_start : 0;
return 0;
}
@@ -436,7 +437,8 @@ static int show_smap(struct seq_file *m, void *v)
(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
if (m->count < m->size) /* vma is copied successfully */
- m->version = (vma != get_gate_vma(task)) ? vma->vm_start : 0;
+ m->version = (vma != get_gate_vma(task->mm))
+ ? vma->vm_start : 0;
return 0;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f6385fc..b571921 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1568,7 +1568,7 @@ static inline bool kernel_page_present(struct page *page) { return true; }
#endif /* CONFIG_HIBERNATION */
#endif
-extern struct vm_area_struct *get_gate_vma(struct task_struct *tsk);
+extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm);
#ifdef __HAVE_ARCH_GATE_AREA
int in_gate_area_no_task(unsigned long addr);
int in_gate_area(struct task_struct *task, unsigned long addr);
diff --git a/mm/memory.c b/mm/memory.c
index 5823698..aec7cbd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1439,7 +1439,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
vma = find_extend_vma(mm, start);
if (!vma && in_gate_area(tsk, start)) {
unsigned long pg = start & PAGE_MASK;
- struct vm_area_struct *gate_vma = get_gate_vma(tsk);
+ struct vm_area_struct *gate_vma = get_gate_vma(tsk->mm);
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
@@ -3439,7 +3439,7 @@ static int __init gate_vma_init(void)
__initcall(gate_vma_init);
#endif
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
#ifdef AT_SYSINFO_EHDR
return &gate_vma;
diff --git a/mm/mlock.c b/mm/mlock.c
index c3924c7f..2689a08c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -237,7 +237,7 @@ long mlock_vma_pages_range(struct vm_area_struct *vma,
if (!((vma->vm_flags & (VM_DONTEXPAND | VM_RESERVED)) ||
is_vm_hugetlb_page(vma) ||
- vma == get_gate_vma(current))) {
+ vma == get_gate_vma(current->mm))) {
__mlock_vma_pages_range(vma, start, end, NULL);
@@ -332,7 +332,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
int lock = newflags & VM_LOCKED;
if (newflags == vma->vm_flags || (vma->vm_flags & VM_SPECIAL) ||
- is_vm_hugetlb_page(vma) || vma == get_gate_vma(current))
+ is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm))
goto out; /* don't set VM_LOCKED, don't count */
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 04/12] mm: arch: make in_gate_area take an mm_struct instead of a task_struct
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (2 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 03/12] mm: arch: make get_gate_vma take an mm_struct instead of a task_struct Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 05/12] mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm Stephen Wilson
` (7 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
Morally, the question of whether an address lies in a gate vma should be asked
with respect to an mm, not a particular task. Moreover, dropping the dependency
on task_struct will help make existing and future operations on mm's more
flexible and convenient.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/powerpc/kernel/vdso.c | 2 +-
arch/s390/kernel/vdso.c | 2 +-
arch/sh/kernel/vsyscall/vsyscall.c | 2 +-
arch/x86/mm/init_64.c | 4 ++--
arch/x86/vdso/vdso32-setup.c | 4 ++--
include/linux/mm.h | 4 ++--
mm/memory.c | 2 +-
7 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 6169f17..467aa9e 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -825,7 +825,7 @@ int in_gate_area_no_task(unsigned long addr)
return 0;
}
-int in_gate_area(struct task_struct *task, unsigned long addr)
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
return 0;
}
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index d19f305..9006e96 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -342,7 +342,7 @@ int in_gate_area_no_task(unsigned long addr)
return 0;
}
-int in_gate_area(struct task_struct *task, unsigned long addr)
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
return 0;
}
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index 3f9b6f4..62c36a8 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -99,7 +99,7 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
return NULL;
}
-int in_gate_area(struct task_struct *task, unsigned long address)
+int in_gate_area(struct mm_struct *mm, unsigned long address)
{
return 0;
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 711b690..69fd853 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -879,9 +879,9 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
return &gate_vma;
}
-int in_gate_area(struct task_struct *task, unsigned long addr)
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
- struct vm_area_struct *vma = get_gate_vma(task->mm);
+ struct vm_area_struct *vma = get_gate_vma(mm);
if (!vma)
return 0;
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index 1f651f6..f849bb2 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -428,9 +428,9 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
return NULL;
}
-int in_gate_area(struct task_struct *task, unsigned long addr)
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
- const struct vm_area_struct *vma = get_gate_vma(task->mm);
+ const struct vm_area_struct *vma = get_gate_vma(mm);
return vma && addr >= vma->vm_start && addr < vma->vm_end;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b571921..c700bbb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1571,10 +1571,10 @@ static inline bool kernel_page_present(struct page *page) { return true; }
extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm);
#ifdef __HAVE_ARCH_GATE_AREA
int in_gate_area_no_task(unsigned long addr);
-int in_gate_area(struct task_struct *task, unsigned long addr);
+int in_gate_area(struct mm_struct *mm, unsigned long addr);
#else
int in_gate_area_no_task(unsigned long addr);
-#define in_gate_area(task, addr) ({(void)task; in_gate_area_no_task(addr);})
+#define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_task(addr);})
#endif /* __HAVE_ARCH_GATE_AREA */
int drop_caches_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/memory.c b/mm/memory.c
index aec7cbd..d1347ac 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1437,7 +1437,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct *vma;
vma = find_extend_vma(mm, start);
- if (!vma && in_gate_area(tsk, start)) {
+ if (!vma && in_gate_area(tsk->mm, start)) {
unsigned long pg = start & PAGE_MASK;
struct vm_area_struct *gate_vma = get_gate_vma(tsk->mm);
pgd_t *pgd;
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 05/12] mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (3 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 04/12] mm: arch: make in_gate_area " Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 06/12] mm: use mm_struct to resolve gate vma's in __get_user_pages Stephen Wilson
` (6 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
Now that gate vma's are referenced with respect to a particular mm and not a
particular task it only makes sense to propagate the change to this predicate as
well.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
arch/powerpc/kernel/vdso.c | 2 +-
arch/s390/kernel/vdso.c | 2 +-
arch/sh/kernel/vsyscall/vsyscall.c | 2 +-
arch/x86/mm/init_64.c | 8 ++++----
arch/x86/vdso/vdso32-setup.c | 2 +-
include/linux/mm.h | 6 +++---
kernel/kallsyms.c | 4 ++--
mm/memory.c | 2 +-
mm/nommu.c | 2 +-
9 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 467aa9e..142ab10 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -820,7 +820,7 @@ static int __init vdso_init(void)
}
arch_initcall(vdso_init);
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return 0;
}
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index 9006e96..d73630b 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -337,7 +337,7 @@ static int __init vdso_init(void)
}
arch_initcall(vdso_init);
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return 0;
}
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index 62c36a8..1d6d51a 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -104,7 +104,7 @@ int in_gate_area(struct mm_struct *mm, unsigned long address)
return 0;
}
-int in_gate_area_no_task(unsigned long address)
+int in_gate_area_no_mm(unsigned long address)
{
return 0;
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 69fd853..8a64fdf 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -890,11 +890,11 @@ int in_gate_area(struct mm_struct *mm, unsigned long addr)
}
/*
- * Use this when you have no reliable task/vma, typically from interrupt
- * context. It is less reliable than using the task's vma and may give
- * false positives:
+ * Use this when you have no reliable mm, typically from interrupt
+ * context. It is less reliable than using a task's mm and may give
+ * false positives.
*/
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
}
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index f849bb2..468d591 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -435,7 +435,7 @@ int in_gate_area(struct mm_struct *mm, unsigned long addr)
return vma && addr >= vma->vm_start && addr < vma->vm_end;
}
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return 0;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c700bbb..694512d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1570,11 +1570,11 @@ static inline bool kernel_page_present(struct page *page) { return true; }
extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm);
#ifdef __HAVE_ARCH_GATE_AREA
-int in_gate_area_no_task(unsigned long addr);
+int in_gate_area_no_mm(unsigned long addr);
int in_gate_area(struct mm_struct *mm, unsigned long addr);
#else
-int in_gate_area_no_task(unsigned long addr);
-#define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_task(addr);})
+int in_gate_area_no_mm(unsigned long addr);
+#define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_mm(addr);})
#endif /* __HAVE_ARCH_GATE_AREA */
int drop_caches_sysctl_handler(struct ctl_table *, int,
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 6f6d091..b9d0fd1 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -64,14 +64,14 @@ static inline int is_kernel_text(unsigned long addr)
if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
arch_is_kernel_text(addr))
return 1;
- return in_gate_area_no_task(addr);
+ return in_gate_area_no_mm(addr);
}
static inline int is_kernel(unsigned long addr)
{
if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
return 1;
- return in_gate_area_no_task(addr);
+ return in_gate_area_no_mm(addr);
}
static int is_ksym_addr(unsigned long addr)
diff --git a/mm/memory.c b/mm/memory.c
index d1347ac..3863e86 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3448,7 +3448,7 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
#endif
}
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
#ifdef AT_SYSINFO_EHDR
if ((addr >= FIXADDR_USER_START) && (addr < FIXADDR_USER_END))
diff --git a/mm/nommu.c b/mm/nommu.c
index f59e142..e629143 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1963,7 +1963,7 @@ error:
return -ENOMEM;
}
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return 0;
}
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 06/12] mm: use mm_struct to resolve gate vma's in __get_user_pages
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (4 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 05/12] mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 07/12] mm: factor out main logic of access_process_vm Stephen Wilson
` (5 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
We now check if a requested user page overlaps a gate vma using the supplied mm
instead of the supplied task. The given task is now used solely for accounting
purposes and may be NULL.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
---
mm/memory.c | 18 +++++++++++-------
1 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 3863e86..36445e3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1437,9 +1437,9 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct *vma;
vma = find_extend_vma(mm, start);
- if (!vma && in_gate_area(tsk->mm, start)) {
+ if (!vma && in_gate_area(mm, start)) {
unsigned long pg = start & PAGE_MASK;
- struct vm_area_struct *gate_vma = get_gate_vma(tsk->mm);
+ struct vm_area_struct *gate_vma = get_gate_vma(mm);
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
@@ -1533,10 +1533,13 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
return i ? i : -EFAULT;
BUG();
}
- if (ret & VM_FAULT_MAJOR)
- tsk->maj_flt++;
- else
- tsk->min_flt++;
+
+ if (tsk) {
+ if (ret & VM_FAULT_MAJOR)
+ tsk->maj_flt++;
+ else
+ tsk->min_flt++;
+ }
if (ret & VM_FAULT_RETRY) {
*nonblocking = 0;
@@ -1581,7 +1584,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
/**
* get_user_pages() - pin user pages in memory
- * @tsk: task_struct of target task
+ * @tsk: the task_struct to use for page fault accounting, or
+ * NULL if faults are not to be recorded.
* @mm: mm_struct of target mm
* @start: starting user address
* @nr_pages: number of pages from start to pin
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 07/12] mm: factor out main logic of access_process_vm
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (5 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 06/12] mm: use mm_struct to resolve gate vma's in __get_user_pages Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 08/12] mm: implement access_remote_vm Stephen Wilson
` (4 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
Introduce an internal helper __access_remote_vm and base access_process_vm on
top of it. This new method may be called with a NULL task_struct if page fault
accounting is not desired. This code will be shared with a new address space
accessor that is independent of task_struct.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
---
mm/memory.c | 35 +++++++++++++++++++++++++----------
1 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 36445e3..68eec4f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3593,20 +3593,15 @@ int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
#endif
/*
- * Access another process' address space.
- * Source/target buffer must be kernel space,
- * Do not walk the page table directly, use get_user_pages
+ * Access another process' address space as given in mm. If non-NULL, use the
+ * given task for page fault accounting.
*/
-int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write)
+static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
+ unsigned long addr, void *buf, int len, int write)
{
- struct mm_struct *mm;
struct vm_area_struct *vma;
void *old_buf = buf;
- mm = get_task_mm(tsk);
- if (!mm)
- return 0;
-
down_read(&mm->mmap_sem);
/* ignore errors, just check how much was successfully transferred */
while (len) {
@@ -3655,12 +3650,32 @@ int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, in
addr += bytes;
}
up_read(&mm->mmap_sem);
- mmput(mm);
return buf - old_buf;
}
/*
+ * Access another process' address space.
+ * Source/target buffer must be kernel space,
+ * Do not walk the page table directly, use get_user_pages
+ */
+int access_process_vm(struct task_struct *tsk, unsigned long addr,
+ void *buf, int len, int write)
+{
+ struct mm_struct *mm;
+ int ret;
+
+ mm = get_task_mm(tsk);
+ if (!mm)
+ return 0;
+
+ ret = __access_remote_vm(tsk, mm, addr, buf, len, write);
+ mmput(mm);
+
+ return ret;
+}
+
+/*
* Print the name of a VMA.
*/
void print_vma_addr(char *prefix, unsigned long ip)
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 08/12] mm: implement access_remote_vm
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (6 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 07/12] mm: factor out main logic of access_process_vm Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 09/12] proc: disable mem_write after exec Stephen Wilson
` (3 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
Provide an alternative to access_process_vm that allows the caller to obtain a
reference to the supplied mm_struct.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
---
include/linux/mm.h | 2 ++
mm/memory.c | 16 ++++++++++++++++
2 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 694512d..e5fde8a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -964,6 +964,8 @@ static inline int handle_mm_fault(struct mm_struct *mm,
extern int make_pages_present(unsigned long addr, unsigned long end);
extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
+extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
+ void *buf, int len, int write);
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int nr_pages, int write, int force,
diff --git a/mm/memory.c b/mm/memory.c
index 68eec4f..c26e4f9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3654,6 +3654,22 @@ static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
return buf - old_buf;
}
+/**
+ * @access_remote_vm - access another process' address space
+ * @mm: the mm_struct of the target address space
+ * @addr: start address to access
+ * @buf: source or destination buffer
+ * @len: number of bytes to transfer
+ * @write: whether the access is a write
+ *
+ * The caller must hold a reference on @mm.
+ */
+int access_remote_vm(struct mm_struct *mm, unsigned long addr,
+ void *buf, int len, int write)
+{
+ return __access_remote_vm(NULL, mm, addr, buf, len, write);
+}
+
/*
* Access another process' address space.
* Source/target buffer must be kernel space,
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 09/12] proc: disable mem_write after exec
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (7 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 08/12] mm: implement access_remote_vm Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 10/12] proc: hold cred_guard_mutex in check_mem_permission() Stephen Wilson
` (2 subsequent siblings)
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
This change makes mem_write() observe the same constraints as mem_read(). This
is particularly important for mem_write as an accidental leak of the fd across
an exec could result in arbitrary modification of the target process' memory.
IOW, /proc/pid/mem is implicitly close-on-exec.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
---
fs/proc/base.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9d096e8..e52702d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -848,6 +848,10 @@ static ssize_t mem_write(struct file * file, const char __user *buf,
if (check_mem_permission(task))
goto out;
+ copied = -EIO;
+ if (file->private_data != (void *)((long)current->self_exec_id))
+ goto out;
+
copied = -ENOMEM;
page = (char *)__get_free_page(GFP_TEMPORARY);
if (!page)
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 10/12] proc: hold cred_guard_mutex in check_mem_permission()
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (8 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 09/12] proc: disable mem_write after exec Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-13 19:49 ` [PATCH 11/12] proc: make check_mem_permission() return an mm_struct on success Stephen Wilson
2011-03-13 19:49 ` [PATCH 12/12] proc: enable writing to /proc/pid/mem Stephen Wilson
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
Avoid a potential race when task exec's and we get a new ->mm but check against
the old credentials in ptrace_may_access().
Holding of the mutex is implemented by factoring out the body of the code into a
helper function __check_mem_permission(). Performing this factorization now
simplifies upcoming changes and minimizes churn in the diff's.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
---
fs/proc/base.c | 26 ++++++++++++++++++++++----
1 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index e52702d..f6b644f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -191,10 +191,7 @@ static int proc_root_link(struct inode *inode, struct path *path)
return result;
}
-/*
- * Return zero if current may access user memory in @task, -error if not.
- */
-static int check_mem_permission(struct task_struct *task)
+static int __check_mem_permission(struct task_struct *task)
{
/*
* A task can always look at itself, in case it chooses
@@ -222,6 +219,27 @@ static int check_mem_permission(struct task_struct *task)
return -EPERM;
}
+/*
+ * Return zero if current may access user memory in @task, -error if not.
+ */
+static int check_mem_permission(struct task_struct *task)
+{
+ int err;
+
+ /*
+ * Avoid racing if task exec's as we might get a new mm but validate
+ * against old credentials.
+ */
+ err = mutex_lock_killable(&task->signal->cred_guard_mutex);
+ if (err)
+ return err;
+
+ err = __check_mem_permission(task);
+ mutex_unlock(&task->signal->cred_guard_mutex);
+
+ return err;
+}
+
struct mm_struct *mm_for_maps(struct task_struct *task)
{
struct mm_struct *mm;
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 11/12] proc: make check_mem_permission() return an mm_struct on success
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (9 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 10/12] proc: hold cred_guard_mutex in check_mem_permission() Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
2011-03-14 0:08 ` Kees Cook
2011-03-13 19:49 ` [PATCH 12/12] proc: enable writing to /proc/pid/mem Stephen Wilson
11 siblings, 1 reply; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
This change allows us to take advantage of access_remote_vm(), which in turn
eliminates a security issue with the mem_write() implementation.
The previous implementation of mem_write() was insecure since the target task
could exec a setuid-root binary between the permission check and the actual
write. Holding a reference to the target mm_struct eliminates this
vulnerability.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
---
fs/proc/base.c | 58 ++++++++++++++++++++++++++++++++-----------------------
1 files changed, 34 insertions(+), 24 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index f6b644f..2af83bd 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -191,14 +191,20 @@ static int proc_root_link(struct inode *inode, struct path *path)
return result;
}
-static int __check_mem_permission(struct task_struct *task)
+static struct mm_struct *__check_mem_permission(struct task_struct *task)
{
+ struct mm_struct *mm;
+
+ mm = get_task_mm(task);
+ if (!mm)
+ return ERR_PTR(-EINVAL);
+
/*
* A task can always look at itself, in case it chooses
* to use system calls instead of load instructions.
*/
if (task == current)
- return 0;
+ return mm;
/*
* If current is actively ptrace'ing, and would also be
@@ -210,20 +216,23 @@ static int __check_mem_permission(struct task_struct *task)
match = (tracehook_tracer_task(task) == current);
rcu_read_unlock();
if (match && ptrace_may_access(task, PTRACE_MODE_ATTACH))
- return 0;
+ return mm;
}
/*
* Noone else is allowed.
*/
- return -EPERM;
+ mmput(mm);
+ return ERR_PTR(-EPERM);
}
/*
- * Return zero if current may access user memory in @task, -error if not.
+ * If current may access user memory in @task return a reference to the
+ * corresponding mm, otherwise ERR_PTR.
*/
-static int check_mem_permission(struct task_struct *task)
+static struct mm_struct *check_mem_permission(struct task_struct *task)
{
+ struct mm_struct *mm;
int err;
/*
@@ -232,12 +241,12 @@ static int check_mem_permission(struct task_struct *task)
*/
err = mutex_lock_killable(&task->signal->cred_guard_mutex);
if (err)
- return err;
+ return ERR_PTR(err);
- err = __check_mem_permission(task);
+ mm = __check_mem_permission(task);
mutex_unlock(&task->signal->cred_guard_mutex);
- return err;
+ return mm;
}
struct mm_struct *mm_for_maps(struct task_struct *task)
@@ -793,18 +802,14 @@ static ssize_t mem_read(struct file * file, char __user * buf,
if (!task)
goto out_no_task;
- if (check_mem_permission(task))
- goto out;
-
ret = -ENOMEM;
page = (char *)__get_free_page(GFP_TEMPORARY);
if (!page)
goto out;
- ret = 0;
-
- mm = get_task_mm(task);
- if (!mm)
+ mm = check_mem_permission(task);
+ ret = PTR_ERR(mm);
+ if (IS_ERR(mm))
goto out_free;
ret = -EIO;
@@ -818,8 +823,8 @@ static ssize_t mem_read(struct file * file, char __user * buf,
int this_len, retval;
this_len = (count > PAGE_SIZE) ? PAGE_SIZE : count;
- retval = access_process_vm(task, src, page, this_len, 0);
- if (!retval || check_mem_permission(task)) {
+ retval = access_remote_vm(mm, src, page, this_len, 0);
+ if (!retval) {
if (!ret)
ret = -EIO;
break;
@@ -858,22 +863,25 @@ static ssize_t mem_write(struct file * file, const char __user *buf,
char *page;
struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
unsigned long dst = *ppos;
+ struct mm_struct *mm;
copied = -ESRCH;
if (!task)
goto out_no_task;
- if (check_mem_permission(task))
- goto out;
+ mm = check_mem_permission(task);
+ copied = PTR_ERR(mm);
+ if (IS_ERR(mm))
+ goto out_task;
copied = -EIO;
if (file->private_data != (void *)((long)current->self_exec_id))
- goto out;
+ goto out_mm;
copied = -ENOMEM;
page = (char *)__get_free_page(GFP_TEMPORARY);
if (!page)
- goto out;
+ goto out_mm;
copied = 0;
while (count > 0) {
@@ -884,7 +892,7 @@ static ssize_t mem_write(struct file * file, const char __user *buf,
copied = -EFAULT;
break;
}
- retval = access_process_vm(task, dst, page, this_len, 1);
+ retval = access_remote_vm(mm, dst, page, this_len, 1);
if (!retval) {
if (!copied)
copied = -EIO;
@@ -897,7 +905,9 @@ static ssize_t mem_write(struct file * file, const char __user *buf,
}
*ppos = dst;
free_page((unsigned long) page);
-out:
+out_mm:
+ mmput(mm);
+out_task:
put_task_struct(task);
out_no_task:
return copied;
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 11/12] proc: make check_mem_permission() return an mm_struct on success
2011-03-13 19:49 ` [PATCH 11/12] proc: make check_mem_permission() return an mm_struct on success Stephen Wilson
@ 2011-03-14 0:08 ` Kees Cook
2011-03-14 0:59 ` Stephen Wilson
0 siblings, 1 reply; 16+ messages in thread
From: Kees Cook @ 2011-03-14 0:08 UTC (permalink / raw)
To: Stephen Wilson
Cc: Andrew Morton, Alexander Viro, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michel Lespinasse, Andi Kleen, Rik van Riel,
KOSAKI Motohiro, Matt Mackall, David Rientjes, Nick Piggin,
Andrea Arcangeli, Mel Gorman, Hugh Dickins, x86, linux-mm,
linux-kernel
Hi Stephen,
On Sun, Mar 13, 2011 at 03:49:23PM -0400, Stephen Wilson wrote:
> This change allows us to take advantage of access_remote_vm(), which in turn
> eliminates a security issue with the mem_write() implementation.
>
> The previous implementation of mem_write() was insecure since the target task
> could exec a setuid-root binary between the permission check and the actual
> write. Holding a reference to the target mm_struct eliminates this
> vulnerability.
>
> Signed-off-by: Stephen Wilson <wilsons@start.ca>
> ---
> fs/proc/base.c | 58 ++++++++++++++++++++++++++++++++-----------------------
> 1 files changed, 34 insertions(+), 24 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index f6b644f..2af83bd 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -858,22 +863,25 @@ static ssize_t mem_write(struct file * file, const char __user *buf,
> char *page;
> struct task_struct *task = get_proc_task(file->f_path.dentry->d_inode);
> unsigned long dst = *ppos;
> + struct mm_struct *mm;
>
> copied = -ESRCH;
> if (!task)
> goto out_no_task;
>
> - if (check_mem_permission(task))
> - goto out;
> + mm = check_mem_permission(task);
> + copied = PTR_ERR(mm);
> + if (IS_ERR(mm))
> + goto out_task;
>
> copied = -EIO;
> if (file->private_data != (void *)((long)current->self_exec_id))
> - goto out;
> + goto out_mm;
The file->private_data test seems wrong to me. Is there a case were the mm
returned from check_mem_permission(task) can refer to something that is no
longer attached to task?
For example:
- pid 100 ptraces pid 200
- pid 100 opens /proc/200/mem
- pid 200 execs into something else
A read of that mem fd could, IIUC, read from the new pid 200 mm, but
only after passing check_mem_permission(task) again. This is stopped
by the private_data test. But should it, since check_mem_permission()
passed?
Even if it does mean to block it, it's insufficient since pid 200
could just exec u32 many times and align with the original private_data
value. What is that test trying to do? And I'm curious for both mem_write
as well as the existing mem_read use of the test, since I'd like to see
a general solution to the "invalidate /proc fds across exec" so we can
close CVE-2011-1020 for everything[1].
Associated with this, the drop of check_mem_permission(task) during the
mem_read loop implies that the mm is locked during that loop and seems to
reflect what you're saying ("Holding a reference to the target mm_struct
eliminates this vulnerability."), meaning there's no reason to recheck
permissions. Is that accurate?
Thanks,
-Kees
[1] https://lkml.org/lkml/2011/2/7/368
--
Kees Cook
Ubuntu Security Team
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 11/12] proc: make check_mem_permission() return an mm_struct on success
2011-03-14 0:08 ` Kees Cook
@ 2011-03-14 0:59 ` Stephen Wilson
2011-03-14 15:13 ` Kees Cook
0 siblings, 1 reply; 16+ messages in thread
From: Stephen Wilson @ 2011-03-14 0:59 UTC (permalink / raw)
To: Kees Cook
Cc: Andrew Morton, Alexander Viro, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michel Lespinasse, Andi Kleen, Rik van Riel,
KOSAKI Motohiro, Matt Mackall, David Rientjes, Nick Piggin,
Andrea Arcangeli, Mel Gorman, Hugh Dickins, x86, linux-mm,
linux-kernel
On Sun, Mar 13, 2011 at 05:08:59PM -0700, Kees Cook wrote:
> On Sun, Mar 13, 2011 at 03:49:23PM -0400, Stephen Wilson wrote:
> > copied = -EIO;
> > if (file->private_data != (void *)((long)current->self_exec_id))
> > - goto out;
> > + goto out_mm;
>
> The file->private_data test seems wrong to me. Is there a case were the mm
> returned from check_mem_permission(task) can refer to something that is no
> longer attached to task?
>
> For example:
> - pid 100 ptraces pid 200
> - pid 100 opens /proc/200/mem
> - pid 200 execs into something else
If the _target_ task (pid 200) execs then we are OK -- we hold a
reference to the *old* mm and it is that to which we read and write via
access_remote_vm().
In the case of the file->private_data test we are looking at the
*ptracer* (pid 100). Here we are guarding against the case where the
tracer exec's and accidentally leaks the fd (hence the test wrt
current). IOW, /proc/pid/mem is implicitly close on exec. This is just
a minor feature to protect against buggy user space reading/writing
mistakenly into the targets address space.
> only after passing check_mem_permission(task) again. This is stopped
> by the private_data test. But should it, since check_mem_permission()
> passed?
No. I hope the above clears that up.
> Even if it does mean to block it, it's insufficient since pid 200
> could just exec u32 many times and align with the original private_data
> value.
Just for clarity, in your example it would be pid 100 that would need to
exec many times. And yes, I think it would be possible for pid 100 to
exec() N times before the next call to mem_read/mem_write and thus
subvert this check.
Perhaps we can improve things (I would need to look into how O_CLOEXEC
is implemented), however please note that the primary rationale here is
to protect against bugs: the tracer already has the needed privilege,
and it would be silly for it to exec N times just to pass the fd out
across an exec().
> What is that test trying to do? And I'm curious for both mem_write
> as well as the existing mem_read use of the test, since I'd like to see
> a general solution to the "invalidate /proc fds across exec" so we can
> close CVE-2011-1020 for everything[1].
These patches certainly do not add to the problem -- but they do not try
to address the general issue either.
> Associated with this, the drop of check_mem_permission(task) during the
> mem_read loop implies that the mm is locked during that loop and seems to
> reflect what you're saying ("Holding a reference to the target mm_struct
> eliminates this vulnerability."), meaning there's no reason to recheck
> permissions. Is that accurate?
Yes, precisely. Once we have a reference to the mm we do not need to
worry about things changing underneath our feet, so the second check in
mem_read() is redundant and can be dropped.
Take care,
>
> Thanks,
>
> -Kees
>
> [1] https://lkml.org/lkml/2011/2/7/368
>
> --
> Kees Cook
> Ubuntu Security Team
--
steve
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 11/12] proc: make check_mem_permission() return an mm_struct on success
2011-03-14 0:59 ` Stephen Wilson
@ 2011-03-14 15:13 ` Kees Cook
0 siblings, 0 replies; 16+ messages in thread
From: Kees Cook @ 2011-03-14 15:13 UTC (permalink / raw)
To: Stephen Wilson
Cc: Andrew Morton, Alexander Viro, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Michel Lespinasse, Andi Kleen, Rik van Riel,
KOSAKI Motohiro, Matt Mackall, David Rientjes, Nick Piggin,
Andrea Arcangeli, Mel Gorman, Hugh Dickins, x86, linux-mm,
linux-kernel
On Sun, Mar 13, 2011 at 08:59:48PM -0400, Stephen Wilson wrote:
> On Sun, Mar 13, 2011 at 05:08:59PM -0700, Kees Cook wrote:
> > On Sun, Mar 13, 2011 at 03:49:23PM -0400, Stephen Wilson wrote:
> > > copied = -EIO;
> > > if (file->private_data != (void *)((long)current->self_exec_id))
> > > - goto out;
> > > + goto out_mm;
> >
> > The file->private_data test seems wrong to me. Is there a case were the mm
> > returned from check_mem_permission(task) can refer to something that is no
> > longer attached to task?
> >
> > For example:
> > - pid 100 ptraces pid 200
> > - pid 100 opens /proc/200/mem
> > - pid 200 execs into something else
>
> If the _target_ task (pid 200) execs then we are OK -- we hold a
> reference to the *old* mm and it is that to which we read and write via
> access_remote_vm().
Right, the old mm is held during read_mem(). But isn't the mm fetched
from check_mem_permission(task) each time pid 100 reads from the
/proc/200/mem fd? (And if so, that's still okay, it still passes through
check_mem_permission() so the access will be validated.)
> In the case of the file->private_data test we are looking at the
> *ptracer* (pid 100). Here we are guarding against the case where the
> tracer exec's and accidentally leaks the fd (hence the test wrt
> current). IOW, /proc/pid/mem is implicitly close on exec. This is just
> a minor feature to protect against buggy user space reading/writing
> mistakenly into the targets address space.
Ah! Right, thanks, that clears that up.
> > What is that test trying to do? And I'm curious for both mem_write
> > as well as the existing mem_read use of the test, since I'd like to see
> > a general solution to the "invalidate /proc fds across exec" so we can
> > close CVE-2011-1020 for everything[1].
>
> These patches certainly do not add to the problem -- but they do not try
> to address the general issue either.
The use of check_mem_permission() already protects /proc/pid/mem, but
that test is much stricter than the may_ptrace() checks of things like
/proc/pid/maps. Regardless, yeah, there's no problem here that I can see.
> > Associated with this, the drop of check_mem_permission(task) during the
> > mem_read loop implies that the mm is locked during that loop and seems to
> > reflect what you're saying ("Holding a reference to the target mm_struct
> > eliminates this vulnerability."), meaning there's no reason to recheck
> > permissions. Is that accurate?
>
> Yes, precisely. Once we have a reference to the mm we do not need to
> worry about things changing underneath our feet, so the second check in
> mem_read() is redundant and can be dropped.
Excellent. :)
-Kees
--
Kees Cook
Ubuntu Security Team
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 12/12] proc: enable writing to /proc/pid/mem
2011-03-13 19:49 [PATCH v2 0/12] enable writing to /proc/pid/mem Stephen Wilson
` (10 preceding siblings ...)
2011-03-13 19:49 ` [PATCH 11/12] proc: make check_mem_permission() return an mm_struct on success Stephen Wilson
@ 2011-03-13 19:49 ` Stephen Wilson
11 siblings, 0 replies; 16+ messages in thread
From: Stephen Wilson @ 2011-03-13 19:49 UTC (permalink / raw)
To: Andrew Morton, Alexander Viro
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Michel Lespinasse,
Andi Kleen, Rik van Riel, KOSAKI Motohiro, Matt Mackall,
David Rientjes, Nick Piggin, Andrea Arcangeli, Mel Gorman,
Hugh Dickins, x86, linux-mm, linux-kernel, Stephen Wilson
With recent changes there is no longer a security hazard with writing to
/proc/pid/mem. Remove the #ifdef.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
---
fs/proc/base.c | 5 -----
1 files changed, 0 insertions(+), 5 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 2af83bd..964bc5d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -852,10 +852,6 @@ out_no_task:
return ret;
}
-#define mem_write NULL
-
-#ifndef mem_write
-/* This is a security hazard */
static ssize_t mem_write(struct file * file, const char __user *buf,
size_t count, loff_t *ppos)
{
@@ -912,7 +908,6 @@ out_task:
out_no_task:
return copied;
}
-#endif
loff_t mem_lseek(struct file *file, loff_t offset, int orig)
{
--
1.7.3.5
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 16+ messages in thread