From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
Cc: Containers
<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
"David C. Hansen"
<haveblue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Subject: [RFC][PATCH 7/7][v2] Define clone_with_pids syscall
Date: Wed, 27 May 2009 21:39:45 -0700
Message-ID: <20090528043945.GG16522@us.ibm.com>
In-Reply-To: <20090528043748.GA16522-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Date: Mon, 4 May 2009 01:17:45 -0700
Subject: [RFC][PATCH 7/7][v2] Define clone_with_pids syscall
clone_with_pids() is the same as clone(), except that it takes a 'target_pid_set'
parameter which lets the caller choose a specific pid number for the child process
in each of the child process's pid namespaces. This system call would be needed
to implement Checkpoint/Restart (i.e. after a checkpoint, restart a process with
its original pids).
Call clone_with_pids as follows:
pid_t pids[] = { 0, 77, 99 };
struct target_pid_set pid_set;
pid_set.num_pids = sizeof(pids) / sizeof(pids[0]);
pid_set.target_pids = pids;
syscall(__NR_clone_with_pids, flags, stack, NULL, NULL, NULL, &pid_set);
If a target-pid is 0, the kernel continues to assign a pid for the process in
that namespace. In the above example, pids[0] is 0, meaning the kernel will
assign the next available pid to the process in init_pid_ns. But the kernel will
assign pid 77 in the child pid namespace 1 and pid 99 in pid namespace 2. If
either 77 or 99 is taken, the system call fails with -EBUSY.
If 'pid_set.num_pids' exceeds the current nesting level of pid namespaces,
the system call fails with -EINVAL.
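The rules above can be modelled in user space. What follows is only a
sketch, not the kernel code: 'struct model_pid_ns', model_alloc_pid() and
model_assign_pids() are hypothetical names, the pid space is tiny and does
not wrap around, and the model assumes that entry i of the array applies to
namespace level i (outermost first) and that missing trailing entries are
treated as 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MODEL_MAX_PID 128	/* tiny pid space, for illustration only */

/* Hypothetical stand-in for one pid namespace: which pids are in use. */
struct model_pid_ns {
	bool in_use[MODEL_MAX_PID];
	pid_t last_pid;
};

/* Pick a pid in one namespace: 0 means "next free", else exact pid or -EBUSY. */
static pid_t model_alloc_pid(struct model_pid_ns *ns, pid_t target)
{
	if (target < 0 || target >= MODEL_MAX_PID)
		return -EINVAL;

	if (target) {
		if (ns->in_use[target])
			return -EBUSY;
		ns->in_use[target] = true;
		return target;
	}

	/* Simplification: linear scan, no wrap-around. */
	for (pid_t p = ns->last_pid + 1; p < MODEL_MAX_PID; p++) {
		if (!ns->in_use[p]) {
			ns->in_use[p] = true;
			ns->last_pid = p;
			return p;
		}
	}
	return -EAGAIN;
}

/*
 * Assign one pid per namespace level ('levels' is nesting depth + 1).
 * More entries than levels is -EINVAL; num_pids == 0 behaves like a
 * plain clone(). On failure, pids taken at earlier levels are released.
 */
static int model_assign_pids(struct model_pid_ns *ns, int levels,
			     const pid_t *targets, int num_pids, pid_t *out)
{
	if (num_pids < 0 || num_pids > levels)
		return -EINVAL;

	for (int i = 0; i < levels; i++) {
		pid_t want = (i < num_pids) ? targets[i] : 0;
		pid_t got = model_alloc_pid(&ns[i], want);

		if (got < 0) {
			while (--i >= 0)
				ns[i].in_use[out[i]] = false;
			return (int)got;
		}
		out[i] = got;
	}
	return 0;
}
```

With three zero-initialized namespaces and targets { 0, 77, 99 }, the first
call in this model yields pid 1 at level 0 and pids 77 and 99 at levels 1
and 2. Repeating the call fails with -EBUSY because 77 is already taken.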
It's mostly an exploratory patch seeking feedback on the interface.
NOTE:
1. clone_with_pids(), at least for now, needs CAP_SYS_ADMIN to prevent
misuse of the interface.
2. Compared to clone(), clone_with_pids() needs to pass in two more
pieces of information:
- number of pids in the set
- user buffer containing the list of pids.
But since clone() already takes 5 parameters, pass both in a single
'struct target_pid_set'.
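For reference, a sketch of what the structure might look like. The real
definition is not part of this patch, so the layout below is an assumption
inferred from how copy_target_pids() reads the 'num_pids' and 'target_pids'
fields. init_target_pid_set() is a hypothetical user-space helper, not part
of the proposed interface.

```c
#include <stddef.h>
#include <sys/types.h>

/* Assumed layout, inferred from the fields the patch accesses. */
struct target_pid_set {
	int num_pids;		/* number of entries in target_pids */
	pid_t *target_pids;	/* one requested pid per namespace level */
};

/* Hypothetical helper: point the set at a caller-owned pid array. */
static void init_target_pid_set(struct target_pid_set *set,
				pid_t *pids, size_t count)
{
	set->num_pids = (int)count;
	set->target_pids = pids;
}
```

A caller would pass the array itself and its element count, for example
init_target_pid_set(&pid_set, pids, sizeof(pids) / sizeof(pids[0])), and
then hand &pid_set to the system call as the last argument.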
TODO:
- Only lightly tested so far.
- May need additional sanity checks in do_fork_with_pids().
- Allow CLONE_NEWPID with clone_with_pids() (ensure target-pid in
the new namespace is either 1 or 0).
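One possible shape for the CLONE_NEWPID check mentioned in the last TODO
item, written as a user-space sketch. model_check_newpid_target() is a
hypothetical name, and the sketch assumes that the last array entry is the
one that applies to the newly created namespace. The patch as posted simply
rejects CLONE_NEWPID with -EINVAL.

```c
#include <errno.h>
#include <sys/types.h>

/* Same value as CLONE_NEWPID in <linux/sched.h>, kept local to the sketch. */
#define MODEL_CLONE_NEWPID 0x20000000UL

/*
 * The first process in a new pid namespace always gets pid 1 there, so
 * the only acceptable requests for that level are 0 (let the kernel
 * choose) or 1. Assumption: the last entry maps to the new namespace.
 */
static int model_check_newpid_target(unsigned long clone_flags,
				     const pid_t *target_pids, int num_pids)
{
	pid_t innermost;

	if (!(clone_flags & MODEL_CLONE_NEWPID) || num_pids <= 0)
		return 0;

	innermost = target_pids[num_pids - 1];
	return (innermost == 0 || innermost == 1) ? 0 : -EINVAL;
}
```

Under these assumptions { 0, 77, 1 } would be accepted together with
CLONE_NEWPID, while { 0, 77, 99 } would fail with -EINVAL.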
Changelog[v2]:
- (Serge Hallyn) Mention CAP_SYS_ADMIN restriction in patch description.
- (Oren Laadan) Add checks for 'num_pids < 0' (return -EINVAL) and
'num_pids == 0' (fall back to normal clone()).
- Move arch-independent code (sanity checks and copy-in of target-pids)
into kernel/fork.c and simplify sys_clone_with_pids()
Changelog[v1]:
- Fixed some compile errors (had fixed these errors earlier in my
git tree but had not refreshed patches before emailing them)
Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
arch/x86/include/asm/syscalls.h | 1 +
arch/x86/include/asm/unistd_32.h | 1 +
arch/x86/kernel/entry_32.S | 1 +
arch/x86/kernel/process_32.c | 21 +++++++++
arch/x86/kernel/syscall_table_32.S | 1 +
kernel/fork.c | 81 +++++++++++++++++++++++++++++++++++-
6 files changed, 105 insertions(+), 1 deletions(-)
diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h
index 7043408..1fdc149 100644
--- a/arch/x86/include/asm/syscalls.h
+++ b/arch/x86/include/asm/syscalls.h
@@ -31,6 +31,7 @@ asmlinkage int sys_get_thread_area(struct user_desc __user *);
/* kernel/process_32.c */
int sys_fork(struct pt_regs *);
int sys_clone(struct pt_regs *);
+int sys_clone_with_pids(struct pt_regs *);
int sys_vfork(struct pt_regs *);
int sys_execve(struct pt_regs *);
diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
index 6e72d74..90f906f 100644
--- a/arch/x86/include/asm/unistd_32.h
+++ b/arch/x86/include/asm/unistd_32.h
@@ -340,6 +340,7 @@
#define __NR_inotify_init1 332
#define __NR_preadv 333
#define __NR_pwritev 334
+#define __NR_clone_with_pids 335
#ifdef __KERNEL__
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index c929add..ee92b0d 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -707,6 +707,7 @@ ptregs_##name: \
PTREGSCALL(iopl)
PTREGSCALL(fork)
PTREGSCALL(clone)
+PTREGSCALL(clone_with_pids)
PTREGSCALL(vfork)
PTREGSCALL(execve)
PTREGSCALL(sigaltstack)
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 76f8f84..1efc3de 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -445,6 +445,27 @@ int sys_clone(struct pt_regs *regs)
return do_fork(clone_flags, newsp, regs, 0, parent_tidptr, child_tidptr);
}
+int sys_clone_with_pids(struct pt_regs *regs)
+{
+ unsigned long clone_flags;
+ unsigned long newsp;
+ int __user *parent_tidptr;
+ int __user *child_tidptr;
+ void __user *upid_setp;
+
+ clone_flags = regs->bx;
+ newsp = regs->cx;
+ parent_tidptr = (int __user *)regs->dx;
+ child_tidptr = (int __user *)regs->di;
+ upid_setp = (void __user *)regs->bp;
+
+ if (!newsp)
+ newsp = regs->sp;
+
+ return do_fork_with_pids(clone_flags, newsp, regs, 0, parent_tidptr,
+ child_tidptr, upid_setp);
+}
+
/*
* sys_execve() executes a new program.
*/
diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
index ff5c873..94c1a58 100644
--- a/arch/x86/kernel/syscall_table_32.S
+++ b/arch/x86/kernel/syscall_table_32.S
@@ -334,3 +334,4 @@ ENTRY(sys_call_table)
.long sys_inotify_init1
.long sys_preadv
.long sys_pwritev
+ .long ptregs_clone_with_pids /* 335 */
diff --git a/kernel/fork.c b/kernel/fork.c
index a16ef7b..f265a18 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1335,6 +1335,58 @@ struct task_struct * __cpuinit fork_idle(int cpu)
}
/*
+ * If user specified any 'target-pids' in @upid_setp, copy them from
+ * user and return a pointer to the list of target pids.
+ *
+ * If user did not specify any target pids, return NULL (caller should
+ * treat this like normal clone).
+ *
+ * On any errors, return the error code
+ */
+static pid_t *copy_target_pids(void __user *upid_setp)
+{
+ int rc;
+ int size;
+ int num_pids;
+ pid_t __user *utarget_pids;
+ pid_t *target_pids;
+ struct target_pid_set pid_set;
+
+ if (!upid_setp)
+ return NULL;
+
+ if (copy_from_user(&pid_set, upid_setp, sizeof(pid_set)))
+ return ERR_PTR(-EFAULT);
+
+ num_pids = pid_set.num_pids;
+ utarget_pids = pid_set.target_pids;
+ size = num_pids * sizeof(pid_t);
+
+ if (!num_pids)
+ return NULL;
+
+ if (num_pids < 0 || num_pids > task_pid(current)->level + 1)
+ return ERR_PTR(-EINVAL);
+
+ target_pids = kzalloc(size, GFP_KERNEL);
+ if (!target_pids)
+ return ERR_PTR(-ENOMEM);
+
+ rc = -EFAULT;
+ if (copy_from_user(target_pids, pid_set.target_pids, size))
+ goto out_free;
+
+ printk(KERN_ERR "clone_with_pids() num_pids %d, first pid %d\n",
+ num_pids, target_pids[0]);
+
+ return target_pids;
+
+out_free:
+ kfree(target_pids);
+ return ERR_PTR(rc);
+}
+
+/*
* Ok, this is the main fork-routine.
*
* It copies the process, and if successful kick-starts
@@ -1351,7 +1403,7 @@ long do_fork_with_pids(unsigned long clone_flags,
struct task_struct *p;
int trace = 0;
long nr;
- pid_t *target_pids = NULL;
+ pid_t *target_pids;
/*
* Do some preliminary argument and permissions checking before we
@@ -1385,6 +1437,29 @@ long do_fork_with_pids(unsigned long clone_flags,
}
}
+ target_pids = copy_target_pids(pid_setp);
+
+ if (target_pids) {
+ if (IS_ERR(target_pids))
+ return PTR_ERR(target_pids);
+
+ nr = -EPERM;
+ if (!capable(CAP_SYS_ADMIN))
+ goto out_free;
+
+ /*
+ * CLONE_NEWPID implies pid == 1
+ *
+ * TODO: Should this be more fine-grained ? (i.e would we want
+ * to have a container-init have a specific pid in an
+ * ancestor namespace ?) Maybe needed to checkpoint/
+ * restart an application that has a nested container.
+ */
+ nr = -EINVAL;
+ if (clone_flags & CLONE_NEWPID)
+ goto out_free;
+ }
+
/*
* When called from kernel_thread, don't do user tracing stuff.
*/
@@ -1446,6 +1521,10 @@ long do_fork_with_pids(unsigned long clone_flags,
} else {
nr = PTR_ERR(p);
}
+
+out_free:
+ kfree(target_pids);
+
return nr;
}
--
1.5.2.5