From: Andrew Morton <akpm@linux-foundation.org>
To: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Kees Cook <keescook@chromium.org>,
linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Andrew Vagin <avagin@openvz.org>,
Serge Hallyn <serge.hallyn@canonical.com>,
Pavel Emelyanov <xemul@parallels.com>,
Vasiliy Kulikov <segoon@openwall.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Michael Kerrisk <mtk.manpages@gmail.com>
Subject: Re: [rfc 3/3] prctl: Add PR_SET_MM codes to tune up mm_struct entires
Date: Wed, 7 Dec 2011 14:43:55 -0800 [thread overview]
Message-ID: <20111207144355.c889e22d.akpm@linux-foundation.org> (raw)
In-Reply-To: <20111207122718.GF21678@moon>
On Wed, 7 Dec 2011 16:27:18 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> At process of task restoration we need a way to tune up
> a few members of mm_struct structure such as start_code,
> end_code, start_data, end_data, start_stack, start_brk, brk.
I don't really know what "tune up" means in this context. Can we
please be more specific and detailed here? It appears that the patch
permits userspace to directly modify these fields.
> While most of them have a statistical nature (their values
> are involved into calculation of /proc/<pid>/statm output)
> the start_brk and brk values are used to compute an allowed
> size of program data segment expansion. Which means an arbitrary
> changes of this value might be a bit dangerous operation.
>
> To restrict access to this facility the following requirements
> applied to prctl users:
>
> - The process has to have CAP_SYS_ADMIN capability granted.
> - For all opcodes except start_brk/brk members an appropriate
> VMA area must be existing and should fit certain VMA flags,
> such as:
> - code segment must be executable but not writable;
> - data segment must not be executable.
>
> start_brk/brk values must not intersect with data segment
> and must not exceed RLIMIT_DATA resource limit.
>
> Still the main guard is CAP_SYS_ADMIN capability check.
>
> ...
>
> include/linux/prctl.h | 12 +++++
> kernel/sys.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++
The prctl(2) manpage will need to be updated. Please Cc Michael on all
such changes.
>
> Index: linux-2.6.git/include/linux/prctl.h
> ===================================================================
> --- linux-2.6.git.orig/include/linux/prctl.h
> +++ linux-2.6.git/include/linux/prctl.h
> @@ -102,4 +102,16 @@
>
> #define PR_MCE_KILL_GET 34
>
> +/*
> + * Tune up process memory map specifics.
> + */
> +#define PR_SET_MM 35
> +# define PR_SET_MM_START_CODE 1
> +# define PR_SET_MM_END_CODE 2
> +# define PR_SET_MM_START_DATA 3
> +# define PR_SET_MM_END_DATA 4
> +# define PR_SET_MM_START_STACK 5
> +# define PR_SET_MM_START_BRK 6
> +# define PR_SET_MM_BRK 7
> +
> #endif /* _LINUX_PRCTL_H */
> Index: linux-2.6.git/kernel/sys.c
> ===================================================================
> --- linux-2.6.git.orig/kernel/sys.c
> +++ linux-2.6.git/kernel/sys.c
> @@ -1692,6 +1692,118 @@ SYSCALL_DEFINE1(umask, int, mask)
> return mask;
> }
>
> +static int prctl_set_mm(int opt, unsigned long addr)
> +{
> + unsigned long rlim = rlimit(RLIMIT_DATA);
> + unsigned long vm_req_flags;
> + unsigned long vm_bad_flags;
> + struct vm_area_struct *vma;
> + struct mm_struct *mm;
> + int error = 0;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + if (addr >= TASK_SIZE)
> + return -EINVAL;
> +
> + mm = get_task_mm(current);
Is it necessaary to run the expensive get_task_mm() for `current'?
`current' is known to be running and you have control of it here -
nobody will be taking our mm away. Simply use current->mm? The
function actually uses current->mm later on in several places.
> + if (!mm)
> + return -ENOENT;
> +
> + down_read(&mm->mmap_sem);
> + vma = find_vma(mm, addr);
> +
> + if (opt != PR_SET_MM_START_BRK &&
> + opt != PR_SET_MM_BRK) {
80 columns, not 40 :)
> + /* It must be existing VMA */
> + if (!vma || vma->vm_start > addr)
> + goto out;
> + }
> +
> + error = -EINVAL;
> + switch (opt) {
> + case PR_SET_MM_START_CODE:
> + case PR_SET_MM_END_CODE:
> +
You're adding unneeded and unconventional newlines after the `case'
statements.
> + vm_req_flags = VM_READ | VM_EXEC;
> + vm_bad_flags = VM_WRITE | VM_MAYSHARE;
> +
> + if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
> + (vma->vm_flags & vm_bad_flags))
> + goto out;
> +
> + if (opt == PR_SET_MM_START_CODE)
> + current->mm->start_code = addr;
> + else
> + current->mm->end_code = addr;
> + break;
> +
> + case PR_SET_MM_START_DATA:
> + case PR_SET_MM_END_DATA:
> +
> + vm_req_flags = VM_READ | VM_WRITE;
> + vm_bad_flags = VM_EXEC | VM_MAYSHARE;
> +
> + if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
> + (vma->vm_flags & vm_bad_flags))
> + goto out;
> +
> + if (opt == PR_SET_MM_START_DATA)
> + current->mm->start_data = addr;
> + else
> + current->mm->end_data = addr;
> + break;
> +
> + case PR_SET_MM_START_STACK:
> +
> +#ifdef CONFIG_STACK_GROWSUP
> + vm_req_flags = VM_READ | VM_WRITE | VM_GROWSUP;
> +#else
> + vm_req_flags = VM_READ | VM_WRITE | VM_GROWSDOWN;
> +#endif
> + if ((vma->vm_flags & vm_req_flags) != vm_req_flags)
> + goto out;
> +
> + current->mm->start_stack = addr;
> + break;
> +
> + case PR_SET_MM_START_BRK:
> + if (addr <= mm->end_data)
> + goto out;
> +
> + if (rlim < RLIM_INFINITY &&
> + (mm->brk - addr) + (mm->end_data - mm->start_data) > rlim)
> + goto out;
> +
> + current->mm->start_brk = addr;
> + break;
> +
> + case PR_SET_MM_BRK:
> + if (addr <= mm->end_data)
> + goto out;
> +
> + if (rlim < RLIM_INFINITY &&
> + (addr - mm->start_brk) + (mm->end_data - mm->start_data) > rlim)
> + goto out;
> +
> + current->mm->brk = addr;
> + break;
> +
> + default:
> + error = -EINVAL;
> + goto out;
> + }
> +
> + error = 0;
> +
> +out:
> + up_read(&mm->mmap_sem);
> + mmput(mm);
> +
> + return error;
> +}
This is starting to add a non-trivial amount of code. Perhaps we need
to introduce a Kconfig variable to control such things as this, to
prevent bloating up kernels which aren't require to support c/r?
>
> ...
>
next prev parent reply other threads:[~2011-12-07 22:43 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-29 19:12 [rfc 0/3] A small bundle in a sake of checkpoint/restore Cyrill Gorcunov
2011-11-29 19:12 ` [rfc 1/3] fs, proc: Add start_data, end_data, start_brk members to /proc/$pid/stat Cyrill Gorcunov
2011-11-29 20:06 ` Kees Cook
2011-12-02 0:24 ` Alexey Dobriyan
2011-12-02 7:28 ` Cyrill Gorcunov
2011-12-02 19:23 ` Kees Cook
2011-12-02 19:28 ` Cyrill Gorcunov
2011-11-29 20:32 ` Serge Hallyn
2011-11-30 5:04 ` KAMEZAWA Hiroyuki
2011-11-29 19:12 ` [rfc 2/3] fs, proc: Introduce the Children: line in /proc/<pid>/status Cyrill Gorcunov
2011-11-30 5:00 ` KAMEZAWA Hiroyuki
2011-11-30 6:05 ` Cyrill Gorcunov
2011-12-01 9:54 ` Cyrill Gorcunov
2011-12-01 15:43 ` Tejun Heo
2011-12-01 15:53 ` Cyrill Gorcunov
2011-12-01 16:07 ` Tejun Heo
2011-12-01 21:29 ` Andrew Morton
2011-12-01 21:38 ` Cyrill Gorcunov
2011-12-02 0:40 ` KAMEZAWA Hiroyuki
2011-12-02 12:41 ` Pedro Alves
2011-12-02 12:43 ` Pavel Emelyanov
2011-12-02 12:45 ` Cyrill Gorcunov
2011-12-02 13:10 ` Pedro Alves
2011-12-02 13:40 ` Pedro Alves
2011-12-02 12:58 ` Pedro Alves
2011-12-02 13:16 ` Pavel Emelyanov
2011-12-02 13:44 ` Pedro Alves
2011-12-02 13:52 ` Pavel Emelyanov
2011-12-02 14:00 ` Pedro Alves
2011-12-02 14:17 ` Pavel Emelyanov
2011-12-02 14:25 ` Pedro Alves
2011-12-02 14:37 ` Pavel Emelyanov
2011-12-02 14:45 ` Pedro Alves
2011-11-29 19:12 ` [rfc 3/3] prctl: Add PR_SET_MM codes to tune up mm_struct entires Cyrill Gorcunov
2011-11-29 20:19 ` Kees Cook
2011-11-29 20:29 ` Cyrill Gorcunov
2011-11-29 20:37 ` Cyrill Gorcunov
2011-11-29 20:40 ` Kees Cook
2011-11-29 20:47 ` Cyrill Gorcunov
2011-11-30 17:37 ` Cyrill Gorcunov
2011-11-30 18:10 ` Kees Cook
2011-11-30 18:23 ` Cyrill Gorcunov
2011-11-30 21:06 ` Cyrill Gorcunov
2011-12-07 12:27 ` Cyrill Gorcunov
2011-12-07 22:43 ` Andrew Morton [this message]
2011-12-08 7:07 ` Cyrill Gorcunov
2011-12-08 7:15 ` Andrew Morton
2011-12-08 7:30 ` Cyrill Gorcunov
2011-11-29 20:37 ` Kees Cook
2011-11-29 20:49 ` Serge Hallyn
2011-11-29 20:55 ` Cyrill Gorcunov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111207144355.c889e22d.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=avagin@openvz.org \
--cc=gorcunov@gmail.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=keescook@chromium.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mtk.manpages@gmail.com \
--cc=segoon@openwall.com \
--cc=serge.hallyn@canonical.com \
--cc=tj@kernel.org \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.