* [patch 0/3] Patches in a sake of checkpoint/restore, procfs and prctls
@ 2011-12-12 20:06 Cyrill Gorcunov
2011-12-12 20:06 ` [patch 1/3] Kconfig: Introduce CHECKPOINT_RESTORE symbol Cyrill Gorcunov
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Cyrill Gorcunov @ 2011-12-12 20:06 UTC (permalink / raw)
To: LKML
Cc: Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, Kees Cook, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman
Hi,
while the /proc/pid/children patch is still under review/rework, the other
patches have been fixed and are, I hope, in good shape. The first one introduces
the CONFIG_CHECKPOINT_RESTORE Kconfig symbol, which should be a lever to
turn on/off all c/r related features for those who do not need them. In
particular, the new prctl codes are covered by it.
Eric pointed out that the /proc/<pid>/stat enhancement might be dangerous
from the user-space point of view, so while I've successfully tested
it, that doesn't mean I've covered every single user-space utility which
might use this file, and if it's still considered a pretty harmful
change -- I'll happily move it to /proc/pid/statm or whatever. The only
reason it was introduced in "stat" -- we already have mm->start_code
and so on there, so I wanted to have them in one place, not sprinkled
over several files.
Any complaints are welcome as usual.
Cyrill
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 1/3] Kconfig: Introduce CHECKPOINT_RESTORE symbol
2011-12-12 20:06 [patch 0/3] Patches in a sake of checkpoint/restore, procfs and prctls Cyrill Gorcunov
@ 2011-12-12 20:06 ` Cyrill Gorcunov
2011-12-12 20:40 ` Kees Cook
2011-12-12 20:06 ` [patch 2/3] [PATCH] fs, proc: Add start_data, end_data, start_brk members to /proc/$pid/stat v4 Cyrill Gorcunov
2011-12-12 20:06 ` [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3 Cyrill Gorcunov
2 siblings, 1 reply; 13+ messages in thread
From: Cyrill Gorcunov @ 2011-12-12 20:06 UTC (permalink / raw)
To: LKML
Cc: Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, Kees Cook, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Cyrill Gorcunov
[-- Attachment #1: kconfig-add-checkpoint-restore --]
[-- Type: text/plain, Size: 1149 bytes --]
For the sake of checkpoint/restore we need auxiliary
features compiled into the kernel, such as additional
prctl codes, /proc/<pid>/map_files and so on.
At the same time these features are not mandatory for a
regular kernel, so the CHECKPOINT_RESTORE config symbol should
provide a way to disable them all at once if one wishes to get
rid of the additional functionality.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
init/Kconfig | 11 +++++++++++
1 file changed, 11 insertions(+)
Index: linux-2.6.git/init/Kconfig
===================================================================
--- linux-2.6.git.orig/init/Kconfig
+++ linux-2.6.git/init/Kconfig
@@ -773,6 +773,17 @@ config DEBUG_BLK_CGROUP
endif # CGROUPS
+config CHECKPOINT_RESTORE
+ bool "Checkpoint/restore support" if EXPERT
+ default n
+ help
+ Enables additional kernel features for the sake of checkpoint/restore.
+ In particular it adds auxiliary prctl codes to set up process text,
+ data and heap segment sizes, and a few additional /proc filesystem
+ entries.
+
+ If unsure, say N here.
+
menuconfig NAMESPACES
bool "Namespaces support" if EXPERT
default !EXPERT
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 2/3] [PATCH] fs, proc: Add start_data, end_data, start_brk members to /proc/$pid/stat v4
2011-12-12 20:06 [patch 0/3] Patches in a sake of checkpoint/restore, procfs and prctls Cyrill Gorcunov
2011-12-12 20:06 ` [patch 1/3] Kconfig: Introduce CHECKPOINT_RESTORE symbol Cyrill Gorcunov
@ 2011-12-12 20:06 ` Cyrill Gorcunov
2011-12-12 20:06 ` [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3 Cyrill Gorcunov
2 siblings, 0 replies; 13+ messages in thread
From: Cyrill Gorcunov @ 2011-12-12 20:06 UTC (permalink / raw)
To: LKML
Cc: Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, Kees Cook, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Cyrill Gorcunov
[-- Attachment #1: fs-proc-Add-start_data-end_data-start_brk-members-2 --]
[-- Type: text/plain, Size: 2948 bytes --]
The mm->start_code/end_code, mm->start_data/end_data and
mm->start_brk members are involved in the calculation of program
text/data segment sizes (which can be seen in
/proc/<pid>/statm) and of the final address for the brk() call.
For the sake of restore we need to know all these values.
While mm->start_code/end_code are already present in
/proc/$pid/stat, the remaining members are not, so this
patch brings them in.
The restore procedure for these members is addressed
in another patch and uses the prctl facility.
v2:
- Kees and Alexey pointed out that the "1" hack is unnecessary,
so use the plain (mm && permitted) ? mm->member : 0
form.
v3:
- Andrew pointed out that the documentation update was missing.
Changelog is updated as well.
v4:
- Update proc.txt to address concerns from Kees and Kame.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Alexey Dobriyan <adobriyan@gmail.com>
---
Documentation/filesystems/proc.txt | 3 +++
fs/proc/array.c | 7 +++++--
2 files changed, 8 insertions(+), 2 deletions(-)
Index: linux-2.6.git/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.git.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.git/Documentation/filesystems/proc.txt
@@ -305,6 +305,9 @@ Table 1-4: Contents of the stat files (a
blkio_ticks time spent waiting for block IO
gtime guest time of the task in jiffies
cgtime guest time of the task children in jiffies
+ start_data address above which program data+bss is placed
+ end_data address below which program data+bss is placed
+ start_brk address above which program heap can be expanded with brk() call
..............................................................................
The /proc/PID/maps file containing the currently mapped memory regions and
Index: linux-2.6.git/fs/proc/array.c
===================================================================
--- linux-2.6.git.orig/fs/proc/array.c
+++ linux-2.6.git/fs/proc/array.c
@@ -464,7 +464,7 @@ static int do_task_stat(struct seq_file
seq_printf(m, "%d (%s) %c %d %d %d %d %d %u %lu \
%lu %lu %lu %lu %lu %ld %ld %ld %ld %d 0 %llu %lu %ld %lu %lu %lu %lu %lu \
-%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld\n",
+%lu %lu %lu %lu %lu %lu %lu %lu %d %d %u %u %llu %lu %ld %lu %lu %lu\n",
pid_nr_ns(pid, ns),
tcomm,
state,
@@ -511,7 +511,10 @@ static int do_task_stat(struct seq_file
task->policy,
(unsigned long long)delayacct_blkio_ticks(task),
cputime_to_clock_t(gtime),
- cputime_to_clock_t(cgtime));
+ cputime_to_clock_t(cgtime),
+ (mm && permitted) ? mm->start_data : 0,
+ (mm && permitted) ? mm->end_data : 0,
+ (mm && permitted) ? mm->start_brk : 0);
if (mm)
mmput(mm);
return 0;
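For readers who want to consume the new fields from userspace, the following is a minimal sketch, not part of the patch. It assumes the three new values land at 1-based positions 45, 46 and 47 of /proc/<pid>/stat (counted from the format string above), and stat_field_ul() is a hypothetical helper name. It resumes parsing after the last ')' because the comm field may itself contain spaces and parentheses.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 1-based field numbers; assumed positions of the fields added by this patch. */
enum { STAT_START_DATA = 45, STAT_END_DATA = 46, STAT_START_BRK = 47 };

/*
 * Hypothetical helper: read numeric field number `field` (>= 3) from one
 * line of /proc/<pid>/stat. Returns 0 on success, -1 if the field is
 * missing or is not a number.
 */
static int stat_field_ul(const char *line, int field, unsigned long *out)
{
	/* comm (field 2) may contain spaces and ')', so resume after the last ')' */
	const char *p = strrchr(line, ')');
	int cur = 2;

	if (!p || field < 3)
		return -1;
	p++;

	while (*p) {
		while (*p == ' ')
			p++;
		if (!*p || *p == '\n')
			break;
		cur++;
		if (cur == field) {
			char *end;
			unsigned long v = strtoul(p, &end, 10);

			if (end == p)
				return -1;
			*out = v;
			return 0;
		}
		/* skip the rest of the current token */
		while (*p && *p != ' ' && *p != '\n')
			p++;
	}
	return -1;
}
```

A caller would read one line from /proc/self/stat with fgets() and pass it in together with STAT_START_DATA, STAT_END_DATA or STAT_START_BRK. On a kernel without this patch the helper simply returns -1 for those fields.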
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 20:06 [patch 0/3] Patches in a sake of checkpoint/restore, procfs and prctls Cyrill Gorcunov
2011-12-12 20:06 ` [patch 1/3] Kconfig: Introduce CHECKPOINT_RESTORE symbol Cyrill Gorcunov
2011-12-12 20:06 ` [patch 2/3] [PATCH] fs, proc: Add start_data, end_data, start_brk members to /proc/$pid/stat v4 Cyrill Gorcunov
@ 2011-12-12 20:06 ` Cyrill Gorcunov
2011-12-12 20:38 ` Kees Cook
2011-12-12 21:49 ` KOSAKI Motohiro
2 siblings, 2 replies; 13+ messages in thread
From: Cyrill Gorcunov @ 2011-12-12 20:06 UTC (permalink / raw)
To: LKML
Cc: Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, Kees Cook, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Cyrill Gorcunov, Pavel Emelyanov,
Michael Kerrisk
[-- Attachment #1: prctl-tune-up-mm_struct-members-4 --]
[-- Type: text/plain, Size: 5792 bytes --]
When we restore a task we need to set up text, data and
heap sizes from userspace to the values the task had at
checkpoint time. This patch adds auxiliary prctl codes for that.
While most of them have a statistical nature (their values
are involved in the calculation of /proc/<pid>/statm output),
the start_brk and brk values are used to compute the allowed
size of program data segment expansion, which means that arbitrary
changes to these values might be a dangerous operation. So to restrict
access, the following requirements are applied to the prctl calls:
- The process has to have the CAP_SYS_ADMIN capability granted.
- For all opcodes except the start_brk/brk members, an appropriate
VMA area must exist and should fit certain VMA flags,
such as:
- code segment must be executable but not writable;
- data segment must not be executable.
start_brk/brk values must not intersect with the data segment
and must not exceed the RLIMIT_DATA resource limit.
Still, the main guard is the CAP_SYS_ADMIN capability check.
Note that the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE
support, otherwise these prctl calls will return -EINVAL.
v2:
- Add a check for the vma start address; testing the vma ending
address is not enough. From Kees Cook.
- Add some sanity tests for assigned addresses.
v3:
- Make the code CONFIG_CHECKPOINT_RESTORE dependent.
- Drop get_task_mm call since "current" is known
to be running and we have control over it (from
Andrew Morton).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
---
include/linux/prctl.h | 12 +++++
kernel/sys.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 132 insertions(+)
Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -102,4 +102,16 @@
#define PR_MCE_KILL_GET 34
+/*
+ * Tune up process memory map specifics.
+ */
+#define PR_SET_MM 35
+# define PR_SET_MM_START_CODE 1
+# define PR_SET_MM_END_CODE 2
+# define PR_SET_MM_START_DATA 3
+# define PR_SET_MM_END_DATA 4
+# define PR_SET_MM_START_STACK 5
+# define PR_SET_MM_START_BRK 6
+# define PR_SET_MM_BRK 7
+
#endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1692,6 +1692,122 @@ SYSCALL_DEFINE1(umask, int, mask)
return mask;
}
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static int prctl_set_mm(int opt, unsigned long addr,
+ unsigned long arg4, unsigned long arg5)
+{
+ unsigned long rlim = rlimit(RLIMIT_DATA);
+ unsigned long vm_req_flags;
+ unsigned long vm_bad_flags;
+ struct vm_area_struct *vma;
+ int error = 0;
+
+ if (arg4 | arg5)
+ return -EINVAL;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (addr >= TASK_SIZE)
+ return -EINVAL;
+
+ down_read(&current->mm->mmap_sem);
+ vma = find_vma(current->mm, addr);
+
+ if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
+ /* It must be existing VMA */
+ if (!vma || vma->vm_start > addr)
+ goto out;
+ }
+
+ error = -EINVAL;
+ switch (opt) {
+ case PR_SET_MM_START_CODE:
+ case PR_SET_MM_END_CODE:
+ vm_req_flags = VM_READ | VM_EXEC;
+ vm_bad_flags = VM_WRITE | VM_MAYSHARE;
+
+ if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
+ (vma->vm_flags & vm_bad_flags))
+ goto out;
+
+ if (opt == PR_SET_MM_START_CODE)
+ current->mm->start_code = addr;
+ else
+ current->mm->end_code = addr;
+ break;
+
+ case PR_SET_MM_START_DATA:
+ case PR_SET_MM_END_DATA:
+ vm_req_flags = VM_READ | VM_WRITE;
+ vm_bad_flags = VM_EXEC | VM_MAYSHARE;
+
+ if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
+ (vma->vm_flags & vm_bad_flags))
+ goto out;
+
+ if (opt == PR_SET_MM_START_DATA)
+ current->mm->start_data = addr;
+ else
+ current->mm->end_data = addr;
+ break;
+
+ case PR_SET_MM_START_STACK:
+
+#ifdef CONFIG_STACK_GROWSUP
+ vm_req_flags = VM_READ | VM_WRITE | VM_GROWSUP;
+#else
+ vm_req_flags = VM_READ | VM_WRITE | VM_GROWSDOWN;
+#endif
+ if ((vma->vm_flags & vm_req_flags) != vm_req_flags)
+ goto out;
+
+ current->mm->start_stack = addr;
+ break;
+
+ case PR_SET_MM_START_BRK:
+ if (addr <= current->mm->end_data)
+ goto out;
+
+ if (rlim < RLIM_INFINITY &&
+ (current->mm->brk - addr) +
+ (current->mm->end_data - current->mm->start_data) > rlim)
+ goto out;
+
+ current->mm->start_brk = addr;
+ break;
+
+ case PR_SET_MM_BRK:
+ if (addr <= current->mm->end_data)
+ goto out;
+
+ if (rlim < RLIM_INFINITY &&
+ (addr - current->mm->start_brk) +
+ (current->mm->end_data - current->mm->start_data) > rlim)
+ goto out;
+
+ current->mm->brk = addr;
+ break;
+
+ default:
+ error = -EINVAL;
+ goto out;
+ }
+
+ error = 0;
+
+out:
+ up_read(&current->mm->mmap_sem);
+
+ return error;
+}
+#else /* CONFIG_CHECKPOINT_RESTORE */
+static int prctl_set_mm(int opt, unsigned long addr)
+{
+ return -EINVAL;
+}
+#endif
+
SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
unsigned long, arg4, unsigned long, arg5)
{
@@ -1841,6 +1957,10 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
else
error = PR_MCE_KILL_DEFAULT;
break;
+ case PR_SET_MM: {
+ error = prctl_set_mm(arg2, arg3, arg4, arg5);
+ break;
+ }
default:
error = -EINVAL;
break;
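The two kinds of sanity checks in prctl_set_mm() can be modelled outside the kernel. The sketch below is only a userspace illustration of the logic in the patch: the DEMO_VM_* values are made-up stand-ins for the kernel's VM_* bits, and vma_flags_ok() and brk_within_rlimit() are hypothetical names that do not exist in the kernel.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's VM_* bits; the values are made up. */
#define DEMO_VM_READ     0x1UL
#define DEMO_VM_WRITE    0x2UL
#define DEMO_VM_EXEC     0x4UL
#define DEMO_VM_MAYSHARE 0x8UL

/* True when every required flag is set and no forbidden flag is set. */
static bool vma_flags_ok(unsigned long flags, unsigned long req,
			 unsigned long bad)
{
	return (flags & req) == req && !(flags & bad);
}

/*
 * Mirrors the PR_SET_MM_BRK test: the heap size plus the data segment size
 * must not exceed the RLIMIT_DATA value, unless the limit is infinite.
 */
static bool brk_within_rlimit(unsigned long start_brk, unsigned long brk,
			      unsigned long start_data, unsigned long end_data,
			      unsigned long rlim, unsigned long rlim_infinity)
{
	if (rlim >= rlim_infinity)
		return true;
	return (brk - start_brk) + (end_data - start_data) <= rlim;
}
```

For a code segment the patch requires READ and EXEC and forbids WRITE and MAYSHARE, which in this model is vma_flags_ok(flags, DEMO_VM_READ | DEMO_VM_EXEC, DEMO_VM_WRITE | DEMO_VM_MAYSHARE).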
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 20:06 ` [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3 Cyrill Gorcunov
@ 2011-12-12 20:38 ` Kees Cook
2011-12-12 20:51 ` Cyrill Gorcunov
2011-12-12 21:49 ` KOSAKI Motohiro
1 sibling, 1 reply; 13+ messages in thread
From: Kees Cook @ 2011-12-12 20:38 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: LKML, Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Pavel Emelyanov, Michael Kerrisk
On Mon, Dec 12, 2011 at 12:06 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> When we restore a task we need to set up text, data and
> heap sizes from userspace to the values the task had at
> checkpoint time. This patch adds auxiliary prctl codes for that.
> ...
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Looks good; I like having this wrapped in CONFIG_CHECKPOINT_RESTORE.
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +static int prctl_set_mm(int opt, unsigned long addr,
> + unsigned long arg4, unsigned long arg5)
> ...
> +#else /* CONFIG_CHECKPOINT_RESTORE */
> +static int prctl_set_mm(int opt, unsigned long addr)
These need to have matching argument lists.
Reviewed-by: Kees Cook <keescook@chromium.org>
-Kees
--
Kees Cook
ChromeOS Security
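The point about matching argument lists can be shown with a small standalone sketch. DEMO_FEATURE is a hypothetical stand-in for CONFIG_CHECKPOINT_RESTORE, and demo_set_mm() and demo_dispatch() are made-up names. The single call site compiles in both configurations only because the real function and the stub share one signature.

```c
#include <errno.h>

/* DEMO_FEATURE is a hypothetical stand-in for CONFIG_CHECKPOINT_RESTORE. */
#ifdef DEMO_FEATURE
static int demo_set_mm(int opt, unsigned long addr,
		       unsigned long arg4, unsigned long arg5)
{
	(void)opt;
	(void)addr;
	if (arg4 | arg5)
		return -EINVAL;
	return 0;
}
#else /* DEMO_FEATURE */
/* The stub takes the same four arguments as the real implementation. */
static int demo_set_mm(int opt, unsigned long addr,
		       unsigned long arg4, unsigned long arg5)
{
	(void)opt;
	(void)addr;
	(void)arg4;
	(void)arg5;
	return -EINVAL;
}
#endif

/* One call site, valid whether or not DEMO_FEATURE is defined. */
static int demo_dispatch(unsigned long arg2, unsigned long arg3,
			 unsigned long arg4, unsigned long arg5)
{
	return demo_set_mm((int)arg2, arg3, arg4, arg5);
}
```

With a two-argument stub, as in the v3 posting, the four-argument call in demo_dispatch() would fail to compile whenever the feature is disabled.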
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 1/3] Kconfig: Introduce CHECKPOINT_RESTORE symbol
2011-12-12 20:06 ` [patch 1/3] Kconfig: Introduce CHECKPOINT_RESTORE symbol Cyrill Gorcunov
@ 2011-12-12 20:40 ` Kees Cook
0 siblings, 0 replies; 13+ messages in thread
From: Kees Cook @ 2011-12-12 20:40 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: LKML, Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman
On Mon, Dec 12, 2011 at 12:06 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> For the sake of checkpoint/restore we need auxiliary
> features compiled into the kernel, such as additional
> prctl codes, /proc/<pid>/map_files and so on.
> At the same time these features are not mandatory for a
> regular kernel, so the CHECKPOINT_RESTORE config symbol should
> provide a way to disable them all at once if one wishes to get
> rid of the additional functionality.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
ChromeOS Security
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 20:38 ` Kees Cook
@ 2011-12-12 20:51 ` Cyrill Gorcunov
2011-12-12 21:53 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: Cyrill Gorcunov @ 2011-12-12 20:51 UTC (permalink / raw)
To: Kees Cook
Cc: LKML, Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Pavel Emelyanov, Michael Kerrisk
On Mon, Dec 12, 2011 at 12:38:58PM -0800, Kees Cook wrote:
> On Mon, Dec 12, 2011 at 12:06 PM, Cyrill Gorcunov <gorcunov@openvz.org> wrote:
> > When we restore a task we need to set up text, data and
> > heap sizes from userspace to the values the task had at
> > checkpoint time. This patch adds auxiliary prctl codes for that.
> > ...
> > Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
>
> Looks good; I like having this wrapped in CONFIG_CHECKPOINT_RESTORE.
>
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > +static int prctl_set_mm(int opt, unsigned long addr,
> > + unsigned long arg4, unsigned long arg5)
> > ...
> > +#else /* CONFIG_CHECKPOINT_RESTORE */
> > +static int prctl_set_mm(int opt, unsigned long addr)
>
> These need to have matching argument lists.
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
Oops! Thanks Kees! Will update in a minute.
Cyrill
---
From: Cyrill Gorcunov <gorcunov@openvz.org>
Subject: [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v4
When we restore a task we need to set up text, data and
heap sizes from userspace to the values the task had at
checkpoint time. This patch adds auxiliary prctl codes for that.
While most of them have a statistical nature (their values
are involved in the calculation of /proc/<pid>/statm output),
the start_brk and brk values are used to compute the allowed
size of program data segment expansion, which means that arbitrary
changes to these values might be a dangerous operation. So to restrict
access, the following requirements are applied to the prctl calls:
- The process has to have the CAP_SYS_ADMIN capability granted.
- For all opcodes except the start_brk/brk members, an appropriate
VMA area must exist and should fit certain VMA flags,
such as:
- code segment must be executable but not writable;
- data segment must not be executable.
start_brk/brk values must not intersect with the data segment
and must not exceed the RLIMIT_DATA resource limit.
Still, the main guard is the CAP_SYS_ADMIN capability check.
Note that the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE
support, otherwise these prctl calls will return -EINVAL.
v2:
- Add a check for the vma start address; testing the vma ending
address is not enough. From Kees Cook.
- Add some sanity tests for assigned addresses.
v3:
- Make the code CONFIG_CHECKPOINT_RESTORE dependent.
- Drop get_task_mm call since "current" is known
to be running and we have control over it (from
Andrew Morton).
v4:
- Make the argument lists of both prctl_set_mm definitions
match, from Kees Cook.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
---
include/linux/prctl.h | 12 +++++
kernel/sys.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 132 insertions(+)
Index: linux-2.6.git/include/linux/prctl.h
===================================================================
--- linux-2.6.git.orig/include/linux/prctl.h
+++ linux-2.6.git/include/linux/prctl.h
@@ -102,4 +102,16 @@
#define PR_MCE_KILL_GET 34
+/*
+ * Tune up process memory map specifics.
+ */
+#define PR_SET_MM 35
+# define PR_SET_MM_START_CODE 1
+# define PR_SET_MM_END_CODE 2
+# define PR_SET_MM_START_DATA 3
+# define PR_SET_MM_END_DATA 4
+# define PR_SET_MM_START_STACK 5
+# define PR_SET_MM_START_BRK 6
+# define PR_SET_MM_BRK 7
+
#endif /* _LINUX_PRCTL_H */
Index: linux-2.6.git/kernel/sys.c
===================================================================
--- linux-2.6.git.orig/kernel/sys.c
+++ linux-2.6.git/kernel/sys.c
@@ -1692,6 +1692,123 @@ SYSCALL_DEFINE1(umask, int, mask)
return mask;
}
+#ifdef CONFIG_CHECKPOINT_RESTORE
+static int prctl_set_mm(int opt, unsigned long addr,
+ unsigned long arg4, unsigned long arg5)
+{
+ unsigned long rlim = rlimit(RLIMIT_DATA);
+ unsigned long vm_req_flags;
+ unsigned long vm_bad_flags;
+ struct vm_area_struct *vma;
+ int error = 0;
+
+ if (arg4 | arg5)
+ return -EINVAL;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (addr >= TASK_SIZE)
+ return -EINVAL;
+
+ down_read(&current->mm->mmap_sem);
+ vma = find_vma(current->mm, addr);
+
+ if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
+ /* It must be existing VMA */
+ if (!vma || vma->vm_start > addr)
+ goto out;
+ }
+
+ error = -EINVAL;
+ switch (opt) {
+ case PR_SET_MM_START_CODE:
+ case PR_SET_MM_END_CODE:
+ vm_req_flags = VM_READ | VM_EXEC;
+ vm_bad_flags = VM_WRITE | VM_MAYSHARE;
+
+ if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
+ (vma->vm_flags & vm_bad_flags))
+ goto out;
+
+ if (opt == PR_SET_MM_START_CODE)
+ current->mm->start_code = addr;
+ else
+ current->mm->end_code = addr;
+ break;
+
+ case PR_SET_MM_START_DATA:
+ case PR_SET_MM_END_DATA:
+ vm_req_flags = VM_READ | VM_WRITE;
+ vm_bad_flags = VM_EXEC | VM_MAYSHARE;
+
+ if ((vma->vm_flags & vm_req_flags) != vm_req_flags ||
+ (vma->vm_flags & vm_bad_flags))
+ goto out;
+
+ if (opt == PR_SET_MM_START_DATA)
+ current->mm->start_data = addr;
+ else
+ current->mm->end_data = addr;
+ break;
+
+ case PR_SET_MM_START_STACK:
+
+#ifdef CONFIG_STACK_GROWSUP
+ vm_req_flags = VM_READ | VM_WRITE | VM_GROWSUP;
+#else
+ vm_req_flags = VM_READ | VM_WRITE | VM_GROWSDOWN;
+#endif
+ if ((vma->vm_flags & vm_req_flags) != vm_req_flags)
+ goto out;
+
+ current->mm->start_stack = addr;
+ break;
+
+ case PR_SET_MM_START_BRK:
+ if (addr <= current->mm->end_data)
+ goto out;
+
+ if (rlim < RLIM_INFINITY &&
+ (current->mm->brk - addr) +
+ (current->mm->end_data - current->mm->start_data) > rlim)
+ goto out;
+
+ current->mm->start_brk = addr;
+ break;
+
+ case PR_SET_MM_BRK:
+ if (addr <= current->mm->end_data)
+ goto out;
+
+ if (rlim < RLIM_INFINITY &&
+ (addr - current->mm->start_brk) +
+ (current->mm->end_data - current->mm->start_data) > rlim)
+ goto out;
+
+ current->mm->brk = addr;
+ break;
+
+ default:
+ error = -EINVAL;
+ goto out;
+ }
+
+ error = 0;
+
+out:
+ up_read(&current->mm->mmap_sem);
+
+ return error;
+}
+#else /* CONFIG_CHECKPOINT_RESTORE */
+static int prctl_set_mm(int opt, unsigned long addr,
+ unsigned long arg4, unsigned long arg5)
+{
+ return -EINVAL;
+}
+#endif
+
SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
unsigned long, arg4, unsigned long, arg5)
{
@@ -1841,6 +1958,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
else
error = PR_MCE_KILL_DEFAULT;
break;
+ case PR_SET_MM:
+ error = prctl_set_mm(arg2, arg3, arg4, arg5);
+ break;
default:
error = -EINVAL;
break;
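From the restorer's side, a call into the proposed interface might look roughly like the sketch below. It is an illustration only: the fallback option numbers are copied from this patch for headers that do not define them, and set_mm_field() is a hypothetical wrapper name. As posted, the call needs CAP_SYS_ADMIN and a kernel built with CONFIG_CHECKPOINT_RESTORE, so an error return is the expected outcome for an ordinary process.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallback values taken from this patch, used only if the headers lack them. */
#ifndef PR_SET_MM
# define PR_SET_MM		35
# define PR_SET_MM_START_CODE	1
# define PR_SET_MM_END_CODE	2
# define PR_SET_MM_START_DATA	3
# define PR_SET_MM_END_DATA	4
# define PR_SET_MM_START_STACK	5
# define PR_SET_MM_START_BRK	6
# define PR_SET_MM_BRK		7
#endif

/*
 * Hypothetical wrapper: set one mm_struct member of the calling process.
 * Returns 0 on success or a negative errno value on failure.
 * arg4 and arg5 must be zero, as the patch rejects anything else.
 */
static int set_mm_field(int opt, unsigned long addr)
{
	if (prctl(PR_SET_MM, (unsigned long)opt, addr, 0UL, 0UL) < 0)
		return -errno;
	return 0;
}
```

A restorer would call, for example, set_mm_field(PR_SET_MM_START_BRK, saved_start_brk), where saved_start_brk is whatever value it recorded at checkpoint time.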
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 20:06 ` [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3 Cyrill Gorcunov
2011-12-12 20:38 ` Kees Cook
@ 2011-12-12 21:49 ` KOSAKI Motohiro
2011-12-12 21:58 ` Cyrill Gorcunov
1 sibling, 1 reply; 13+ messages in thread
From: KOSAKI Motohiro @ 2011-12-12 21:49 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: LKML, Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, Kees Cook, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Pavel Emelyanov, Michael Kerrisk
Hi
> When we restore a task we need to set up text, data and
> heap sizes from userspace to the values the task had at
> checkpoint time. This patch adds auxiliary prctl codes for that.
>
> While most of them have a statistical nature (their values
> are involved in the calculation of /proc/<pid>/statm output),
> the start_brk and brk values are used to compute the allowed
> size of program data segment expansion, which means that arbitrary
> changes to these values might be a dangerous operation. So to restrict
> access, the following requirements are applied to the prctl calls:
>
> - The process has to have the CAP_SYS_ADMIN capability granted.
This is a very dangerous feature and useless for regular admins.
Moreover, CAP_SYS_ADMIN is already heavily overloaded and
we can't disable it in practice. So, I have a question. Why
don't you make a new capability for checkpoint?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 20:51 ` Cyrill Gorcunov
@ 2011-12-12 21:53 ` Andrew Morton
2011-12-12 22:01 ` Cyrill Gorcunov
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2011-12-12 21:53 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Kees Cook, LKML, Tejun Heo, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Pavel Emelyanov, Michael Kerrisk
On Tue, 13 Dec 2011 00:51:14 +0400
Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
> When we restore a task we need to set up text, data and
> heap sizes from userspace to the values the task had at
> checkpoint time. This patch adds auxiliary prctl codes for that.
>
> While most of them have a statistical nature (their values
> are involved in the calculation of /proc/<pid>/statm output),
> the start_brk and brk values are used to compute the allowed
> size of program data segment expansion, which means that arbitrary
> changes to these values might be a dangerous operation. So to restrict
> access, the following requirements are applied to the prctl calls:
>
> - The process has to have the CAP_SYS_ADMIN capability granted.
> - For all opcodes except the start_brk/brk members, an appropriate
> VMA area must exist and should fit certain VMA flags,
> such as:
> - code segment must be executable but not writable;
> - data segment must not be executable.
>
> start_brk/brk values must not intersect with the data segment
> and must not exceed the RLIMIT_DATA resource limit.
>
> Still, the main guard is the CAP_SYS_ADMIN capability check.
>
> Note that the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE
> support, otherwise these prctl calls will return -EINVAL.
>
> ...
>
> +#ifdef CONFIG_CHECKPOINT_RESTORE
> +static int prctl_set_mm(int opt, unsigned long addr,
> + unsigned long arg4, unsigned long arg5)
> +{
> + unsigned long rlim = rlimit(RLIMIT_DATA);
> + unsigned long vm_req_flags;
> + unsigned long vm_bad_flags;
> + struct vm_area_struct *vma;
> + int error = 0;
> +
> + if (arg4 | arg5)
> + return -EINVAL;
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EPERM;
> +
> + if (addr >= TASK_SIZE)
> + return -EINVAL;
> +
> + down_read(&current->mm->mmap_sem);
This may not be true of all compiler versions, but when I cache
current->mm in a local, the code size is reduced rather a lot:
akpm:/usr/src/25> size kernel/sys.o
text data bss dec hex filename
22685 14376 7616 44677 ae85 kernel/sys.o
22489 14376 7616 44481 adc1 kernel/sys.o
diff -puN kernel/sys.c~c-r-prctl-add-pr_set_mm-codes-to-set-up-mm_struct-entries-fix kernel/sys.c
--- a/kernel/sys.c~c-r-prctl-add-pr_set_mm-codes-to-set-up-mm_struct-entries-fix
+++ a/kernel/sys.c
@@ -1701,6 +1701,7 @@ static int prctl_set_mm(int opt, unsigne
unsigned long vm_bad_flags;
struct vm_area_struct *vma;
int error = 0;
+ struct mm_struct *mm = current->mm;
if (arg4 | arg5)
return -EINVAL;
@@ -1711,8 +1712,8 @@ static int prctl_set_mm(int opt, unsigne
if (addr >= TASK_SIZE)
return -EINVAL;
- down_read(&current->mm->mmap_sem);
- vma = find_vma(current->mm, addr);
+ down_read(&mm->mmap_sem);
+ vma = find_vma(mm, addr);
if (opt != PR_SET_MM_START_BRK && opt != PR_SET_MM_BRK) {
/* It must be existing VMA */
@@ -1732,9 +1733,9 @@ static int prctl_set_mm(int opt, unsigne
goto out;
if (opt == PR_SET_MM_START_CODE)
- current->mm->start_code = addr;
+ mm->start_code = addr;
else
- current->mm->end_code = addr;
+ mm->end_code = addr;
break;
case PR_SET_MM_START_DATA:
@@ -1747,9 +1748,9 @@ static int prctl_set_mm(int opt, unsigne
goto out;
if (opt == PR_SET_MM_START_DATA)
- current->mm->start_data = addr;
+ mm->start_data = addr;
else
- current->mm->end_data = addr;
+ mm->end_data = addr;
break;
case PR_SET_MM_START_STACK:
@@ -1762,31 +1763,31 @@ static int prctl_set_mm(int opt, unsigne
if ((vma->vm_flags & vm_req_flags) != vm_req_flags)
goto out;
- current->mm->start_stack = addr;
+ mm->start_stack = addr;
break;
case PR_SET_MM_START_BRK:
- if (addr <= current->mm->end_data)
+ if (addr <= mm->end_data)
goto out;
if (rlim < RLIM_INFINITY &&
- (current->mm->brk - addr) +
- (current->mm->end_data - current->mm->start_data) > rlim)
+ (mm->brk - addr) +
+ (mm->end_data - mm->start_data) > rlim)
goto out;
- current->mm->start_brk = addr;
+ mm->start_brk = addr;
break;
case PR_SET_MM_BRK:
- if (addr <= current->mm->end_data)
+ if (addr <= mm->end_data)
goto out;
if (rlim < RLIM_INFINITY &&
- (addr - current->mm->start_brk) +
- (current->mm->end_data - current->mm->start_data) > rlim)
+ (addr - mm->start_brk) +
+ (mm->end_data - mm->start_data) > rlim)
goto out;
- current->mm->brk = addr;
+ mm->brk = addr;
break;
default:
@@ -1797,7 +1798,7 @@ static int prctl_set_mm(int opt, unsigne
error = 0;
out:
- up_read(&current->mm->mmap_sem);
+ up_read(&mm->mmap_sem);
return error;
}
_
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 21:49 ` KOSAKI Motohiro
@ 2011-12-12 21:58 ` Cyrill Gorcunov
2011-12-12 22:24 ` KOSAKI Motohiro
0 siblings, 1 reply; 13+ messages in thread
From: Cyrill Gorcunov @ 2011-12-12 21:58 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: LKML, Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, Kees Cook, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Pavel Emelyanov, Michael Kerrisk
On Mon, Dec 12, 2011 at 04:49:38PM -0500, KOSAKI Motohiro wrote:
> Hi
>
> > When we restore a task we need to set up text, data and
> > heap sizes from userspace to the values the task had at
> > checkpoint time. This patch adds auxiliary prctl codes for that.
> >
> > While most of them have a statistical nature (their values
> > are involved in the calculation of /proc/<pid>/statm output),
> > the start_brk and brk values are used to compute the allowed
> > size of program data segment expansion, which means that arbitrary
> > changes to these values might be a dangerous operation. So to restrict
> > access, the following requirements are applied to the prctl calls:
> >
> > - The process has to have the CAP_SYS_ADMIN capability granted.
>
> This is a very dangerous feature and useless for regular admins.
Except for the brk() call I don't see where it might be extremely
dangerous at the moment, but indeed it might become very dangerous
once the code grows. Still, if an evil-minded person got CAP_SYS_ADMIN,
these prctls are the least thing one should care about.
> Moreover, CAP_SYS_ADMIN is already heavily overloaded and
> we can't disable it in practice. So, I have a question. Why
> don't you make a new capability for checkpoint?
>
It's not a problem to introduce CAP_CHECKPOINT_RESTORE, but
would it be accepted? I mean, are we fine with introducing a new
capability? If yes -- I'll add a new one and rebase the patch.
Cyrill
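Since the capability check is the main guard under discussion, a userspace tool may want to know up front whether it holds CAP_SYS_ADMIN. The sketch below is one possible way, assuming the text of /proc/<pid>/status has already been read into a buffer. status_has_cap() is a hypothetical helper, and 21 is the value of CAP_SYS_ADMIN in linux/capability.h.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_CAP_SYS_ADMIN 21	/* CAP_SYS_ADMIN in linux/capability.h */

/*
 * Hypothetical helper: look at the "CapEff:" line of /proc/<pid>/status text.
 * Returns 1 if capability bit `cap` is set in the effective set, 0 if it is
 * clear, and -1 if the line is missing or `cap` is out of range.
 */
static int status_has_cap(const char *status, int cap)
{
	const char *p = strstr(status, "CapEff:");
	unsigned long long mask;

	if (cap < 0 || cap >= 64)
		return -1;
	/* The space in the format also matches the tab the kernel prints. */
	if (!p || sscanf(p, "CapEff: %llx", &mask) != 1)
		return -1;
	return ((mask >> cap) & 1ULL) ? 1 : 0;
}
```

This only reports what the process holds. Whether CAP_SYS_ADMIN or a dedicated capability should guard these prctl codes is exactly the open question in this subthread.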
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 21:53 ` Andrew Morton
@ 2011-12-12 22:01 ` Cyrill Gorcunov
2011-12-12 22:05 ` Cyrill Gorcunov
0 siblings, 1 reply; 13+ messages in thread
From: Cyrill Gorcunov @ 2011-12-12 22:01 UTC (permalink / raw)
To: Andrew Morton
Cc: Kees Cook, LKML, Tejun Heo, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Pavel Emelyanov, Michael Kerrisk
On Mon, Dec 12, 2011 at 01:53:23PM -0800, Andrew Morton wrote:
...
> >
> > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > +static int prctl_set_mm(int opt, unsigned long addr,
> > +			unsigned long arg4, unsigned long arg5)
> > +{
> > +	unsigned long rlim = rlimit(RLIMIT_DATA);
> > +	unsigned long vm_req_flags;
> > +	unsigned long vm_bad_flags;
> > +	struct vm_area_struct *vma;
> > +	int error = 0;
> > +
> > +	if (arg4 | arg5)
> > +		return -EINVAL;
> > +
> > +	if (!capable(CAP_SYS_ADMIN))
> > +		return -EPERM;
> > +
> > +	if (addr >= TASK_SIZE)
> > +		return -EINVAL;
> > +
> > +	down_read(&current->mm->mmap_sem);
>
> This may not be true of all compiler versions, but when I cache
> current->mm in a local, the code size is reduced rather a lot:
>
> akpm:/usr/src/25> size kernel/sys.o
>    text    data     bss     dec     hex filename
>   22685   14376    7616   44677    ae85 kernel/sys.o
>   22489   14376    7616   44481    adc1 kernel/sys.o
>
Hmm, this is great and weird. Let me try...
Cyrill
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 22:01 ` Cyrill Gorcunov
@ 2011-12-12 22:05 ` Cyrill Gorcunov
0 siblings, 0 replies; 13+ messages in thread
From: Cyrill Gorcunov @ 2011-12-12 22:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Kees Cook, LKML, Tejun Heo, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Pavel Emelyanov, Michael Kerrisk
On Tue, Dec 13, 2011 at 02:01:32AM +0400, Cyrill Gorcunov wrote:
...
> >
> > This may not be true of all compiler versions, but when I cache
> > current->mm in a local, the code size is reduced rather a lot:
> >
> > akpm:/usr/src/25> size kernel/sys.o
> >    text    data     bss     dec     hex filename
> >   22685   14376    7616   44677    ae85 kernel/sys.o
> >   22489   14376    7616   44481    adc1 kernel/sys.o
> >
>
> Hmm, this is great and weird. Let me try...
>
Same here (gcc version 4.6.2 20111027)
   text    data     bss     dec     hex filename
  20829   14376    5736   40941    9fed kernel/sys.o
  20682   14376    5736   40794    9f5a kernel/sys.o
Thanks, Andrew!
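
For reference, the change being measured is just reading the pointer once
into a local. Below is a minimal userspace sketch of that pattern. It is
not the kernel code: get_current_sketch() is a hypothetical stand-in for
`current`, and the noinline attribute (a GCC/Clang extension) is only there
so that each use stays a real call. Whether this is why the compiler emits
less code for sys.o is an assumption; the thread only reports the sizes.

```c
#include <assert.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm_sketch { unsigned long start_brk, brk; };
struct task_sketch { struct mm_sketch *mm; };

static struct mm_sketch the_mm;
static struct task_sketch the_task = { &the_mm };

/* Stand-in for the kernel's `current`; noinline keeps every use a call. */
__attribute__((noinline)) struct task_sketch *get_current_sketch(void)
{
	return &the_task;
}

/* Re-evaluates the accessor for every field access. */
void set_brk_uncached(unsigned long start, unsigned long end)
{
	assert(start <= end);
	get_current_sketch()->mm->start_brk = start;
	get_current_sketch()->mm->brk = end;
}

/* Reads the mm pointer once and reuses the local copy. */
void set_brk_cached(unsigned long start, unsigned long end)
{
	struct mm_sketch *mm = get_current_sketch()->mm;

	assert(start <= end);
	mm->start_brk = start;
	mm->brk = end;
}

/* Helper to observe the result of either variant. */
unsigned long heap_size(void)
{
	return the_mm.brk - the_mm.start_brk;
}
```

Both variants store the same values. The cached one differs only in how
often the accessor is evaluated, which is where any size difference
would come from.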
Cyrill
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/3] [PATCH] prctl: Add PR_SET_MM codes to set up mm_struct entires v3
2011-12-12 21:58 ` Cyrill Gorcunov
@ 2011-12-12 22:24 ` KOSAKI Motohiro
0 siblings, 0 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2011-12-12 22:24 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: LKML, Tejun Heo, Andrew Morton, Andrew Vagin, Serge Hallyn,
Vasiliy Kulikov, Kees Cook, KAMEZAWA Hiroyuki, Alexey Dobriyan,
Eric W. Biederman, Pavel Emelyanov, Michael Kerrisk
(12/12/11 4:58 PM), Cyrill Gorcunov wrote:
> On Mon, Dec 12, 2011 at 04:49:38PM -0500, KOSAKI Motohiro wrote:
>> Hi
>>
>>> When we restore a task we need to set up text, data and data
>>> heap sizes from userspace to the values a task had at
>>> checkpoint time. This patch adds auxiliary prctl codes for that.
>>>
>>> While most of them have a statistical nature (their values
>>> are involved in the calculation of /proc/<pid>/statm output),
>>> the start_brk and brk values are used to compute the allowed
>>> size of program data segment expansion. This means that arbitrary
>>> changes to these values might be a dangerous operation. So to restrict
>>> access, the following requirements apply to the prctl calls:
>>>
>>> - The process has to have CAP_SYS_ADMIN capability granted.
>>
>> This is a very dangerous feature and useless for regular admins.
>
> Except for the brk() call I don't see where it might be extremely
> dangerous at the moment, but indeed it might become very dangerous
> once the code grows. Still, if an evil-minded person got CAP_SYS_ADMIN,
> these prctls are the least thing one should care about.
I'm sorry, I misunderstood your code. Your code only allows a process
to change its own attributes, so it's harmless enough. Please ignore
my last mail.
>> Moreover, CAP_SYS_ADMIN carries a pretty overloaded meaning and
>> we can't disable it in practice. So, I have a question. Why
>> don't you make a new capability for checkpoint?
>>
>
> It's not a problem to introduce CAP_CHECKPOINT_RESTORE, but
> would it be accepted? I mean, are we fine with introducing a
> new capability? If yes -- I'll add a new one and rebase the patch.
^ permalink raw reply [flat|nested] 13+ messages in thread