From: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: Andrew Morton <akpm-3NddpPZAyC0@public.gmane.org>
Cc: Oren Laadan <orenl-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>,
	serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org,
	"Eric W. Biederman"
	<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
	Alexey Dobriyan
	<adobriyan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	mikew-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	mingo-X9Un+BFzKDI@public.gmane.org,
	hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org,
	Nathan Lynch <nathanl-V7BBcbaFuwjMbYB6QlFGEg@public.gmane.org>,
	matthltc-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org,
	arnd-r2nGTMty4D4@public.gmane.org,
	peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org,
	Louis.Rilling-aw0BnHfMbSpBDgjK7y7TUQ@public.gmane.org,
	roland-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org,
	randy.dunlap-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org,
	pavel-+ZI9xUNit7I@public.gmane.org,
	linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org
Subject: [v9][PATCH 9/9] Document clone3() syscall
Date: Sat, 24 Oct 2009 20:40:50 -0700
Message-ID: <20091025034050.GJ20327@us.ibm.com>
In-Reply-To: <20091025033508.GA20327-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>


This gives a brief overview of the clone3() system call.  We should
eventually describe more details in the existing clone(2) man page or
in a new man page.

Changelog[v9]:
	- [Pavel Machek]: Fix an inconsistency and rename new file to
	  Documentation/clone3.
	- [Roland McGrath, H. Peter Anvin] Updates to description and
	  example to reflect new prototype of clone3() and the updated/
	  renamed 'struct clone_args'.

Changelog[v8]:
	- clone2() is already in use in IA64. Rename syscall to clone3()
	- Add notes to say that we return -EINVAL if invalid clone flags
	  are specified or if the reserved fields are not 0.

Changelog[v7]:
	- Rename clone_with_pids() to clone2()
	- Changes to reflect new prototype of clone2() (using clone_struct).

Signed-off-by: Sukadev Bhattiprolu <sukadev-8jLBTbqmX/OZamtmwQBW5tBPR1lH4CV8@public.gmane.org>
---
 Documentation/clone3 |  191 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 191 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/clone3

diff --git a/Documentation/clone3 b/Documentation/clone3
new file mode 100644
index 0000000..466fac2
--- /dev/null
+++ b/Documentation/clone3
@@ -0,0 +1,191 @@
+
+struct clone_args {
+	u64 clone_flags_high;
+	u64 child_stack_base;
+	u64 child_stack_size;
+	u64 parent_tid_ptr;
+	u64 child_tid_ptr;
+	u32 nr_pids;
+	u32 clone_args_size;
+	u64 reserved1;
+};
+
+
+clone3(u32 flags_low, struct clone_args * __user cargs, pid_t * __user pids)
+
+	In addition to doing everything that the clone() system call does,
+	the clone3() system call:
+
+		- allows additional clone flags (31 of 32 bits in the flags
+		  parameter to clone() are in use)
+
+		- allows the user to specify a pid for the child process in
+		  its active and ancestor pid name spaces.
+
+	This system call is meant to be used when restarting an application
+	from a checkpoint.  Such restart requires that the processes in the
+	application have the same pids they had when the application was
+	checkpointed. When containers are nested, the processes within the
+	containers exist in multiple pid namespaces and hence have multiple
+	pids to specify during restart.
+
+	The @flags_low parameter is identical to the 'clone_flags' parameter
+	in the existing clone() system call.
+
+	The fields in 'struct clone_args' are meant to be used as follows:
+
+	u64 clone_flags_high:
+
+		When clone3() supports more than 32 clone flags, the higher
+		bits in the clone_flags should be specified in this field.
+		This field is currently unused and must be set to 0.
+
+	u64 child_stack_base;
+	u64 child_stack_size;
+
+		These two fields correspond to the 'child_stack' fields
+		in clone() and clone2() system calls (on IA64).
+
+	u64 parent_tid_ptr;
+	u64 child_tid_ptr;
+
+		These two fields correspond to the 'parent_tid_ptr' and
+		'child_tid_ptr' fields in the clone() system call.
+
+	u32 nr_pids;
+
+		nr_pids specifies the number of pids in the @pids array
+		parameter to clone3() (see below). nr_pids should not exceed
+		the current nesting level of the calling process (i.e., if the
+		process is in init_pid_ns, nr_pids must be 1; if the process is
+		in a pid namespace that is a child of init_pid_ns, nr_pids
+		cannot exceed 2, and so on).
+
+	u32 clone_args_size;
+
+		clone_args_size specifies the sizeof(struct clone_args) and is
+		intended to enable extending this structure in the future,
+		while preserving backward compatibility.  For now, this field
+		must be set to the sizeof(struct clone_args) and this size must
+		match the kernel's view of the structure.
+
+	u64 reserved1;
+
+		reserved1 is intended to enable extending the functionality
+		of the clone3() system call in the future, while preserving
+		backward compatibility. It must currently be set to 0.
+
+
+	The @pids parameter defines the set of pids that should be assigned to
+	the child process in its active and ancestor pid name spaces. The
+	descendant pid namespaces do not matter since a process does not have a
+	pid in descendant namespaces, unless the process is in a new pid
+	namespace in which case the process is a container-init (and must have
+	the pid 1 in that namespace).
+
+	See the CLONE_NEWPID section of the clone(2) man page for details about
+	pid namespaces.
+
+	The order of pids in @pids corresponds to the nesting order of pid
+	namespaces, with @pids[0] corresponding to the init_pid_ns.
+
+	If a pid in the @pids list is 0, the kernel will assign the next
+	available pid in that pid namespace to the process.
+
+	If a pid in the @pids list is non-zero, the kernel tries to assign
+	the specified pid in that namespace.  If that pid is already in use
+	by another process, the system call fails (see EBUSY below).
+
+	On success, the system call returns the pid of the child process in
+	the parent's active pid namespace.
+
+	On failure, clone3() returns -1 and sets 'errno' to one of the
+	following values (the child process is not created):
+
+	EPERM	Caller does not have the CAP_SYS_ADMIN capability needed to
+		execute this call.
+
+	EINVAL	The number of pids specified in 'clone_args.nr_pids' exceeds
+		the current nesting level of the parent process.
+
+	EINVAL	Not all specified clone-flags are valid.
+
+	EINVAL	The reserved fields in the clone_args argument are not 0.
+
+	EBUSY	A requested pid is in use by another process in that name space.
+
+---
+/* Example usage of clone3() on i386 */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <errno.h>
+#include <unistd.h>
+#include <sys/wait.h>
+
+#define __NR_clone3	337
+#define TEST_PID	399
+#define STACKSIZE	8192
+
+typedef unsigned long long u64;
+typedef unsigned int u32;
+
+struct clone_args {
+	u64 clone_flags_high;
+	u64 child_stack_base;
+	u64 child_stack_size;
+	u64 parent_tid_ptr;
+	u64 child_tid_ptr;
+	u32 nr_pids;
+	u32 clone_args_size;
+	u64 reserved1;
+};
+
+int do_child(void *arg)
+{
+	printf("Child, pid %d, arg %s\n", getpid(), (char *)arg);
+
+	if (getpid() != TEST_PID)
+		printf("Expected pid %d, actual %d\n", TEST_PID, getpid());
+
+	_Exit(0);
+}
+
+int main(void)
+{
+	int rc;
+	void **stack;
+	struct clone_args cargs;
+	u32 flags_low 	= SIGCHLD;
+	char *arg_str 	= "Args for child: abcdefg";
+	pid_t pids[] 	= { 377, TEST_PID };
+
+	/* Point at the (aligned) top of the buffer; the stack grows down */
+	stack = (void **)((char *)malloc(STACKSIZE) + STACKSIZE);
+
+	/* Set up stack for child */
+	*--stack = arg_str;
+	*--stack = NULL;
+	*--stack = (void *)do_child;
+
+	cargs.clone_flags_high = (u64)0;
+	cargs.child_stack_base = (u64)(unsigned long)stack;
+	cargs.child_stack_size = (u64)0;
+
+	cargs.nr_pids = 2;              /* assumes we are in a child pid ns */
+	cargs.parent_tid_ptr = (u64)0;
+	cargs.child_tid_ptr = (u64)0;
+
+	cargs.clone_args_size = sizeof(cargs);
+	cargs.reserved1 = (u64)0;
+
+	rc = syscall(__NR_clone3, flags_low, &cargs, pids);
+
+	if (rc != TEST_PID)
+		printf("Parent: expected rc %d, actual %d, errno %d\n",
+				 TEST_PID, rc, errno);
+	else
+		printf("Parent: clone3() returns %d, errno %d\n", rc, errno);
+
+	waitpid(-1, NULL, 0);
+}
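
The rules the document gives for 'struct clone_args' (reserved fields
must be 0, clone_args_size must equal sizeof(struct clone_args), nr_pids
must not exceed the caller's nesting level) can be sketched in userspace
as below.  This is only an illustration and not part of the patch: the
helper names clone_args_init() and clone_args_check() and the
nesting_level parameter are hypothetical, the check of flags_low against
the set of valid clone flags is left out, and the kernel's own checks may
differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;
typedef uint32_t u32;

struct clone_args {
	u64 clone_flags_high;
	u64 child_stack_base;
	u64 child_stack_size;
	u64 parent_tid_ptr;
	u64 child_tid_ptr;
	u32 nr_pids;
	u32 clone_args_size;
	u64 reserved1;
};

/* Layout check: six u64 fields and two u32 fields add up to 56 bytes. */
static_assert(sizeof(struct clone_args) == 56, "unexpected clone_args size");

/* Hypothetical helper: zero everything, then set what a caller provides. */
static void clone_args_init(struct clone_args *ca, void *stack_base,
			    u64 stack_size, u32 nr_pids)
{
	/* Leaves clone_flags_high, the tid pointers and reserved1 at 0. */
	memset(ca, 0, sizeof(*ca));
	/* Go through uintptr_t so a 32-bit pointer is zero-extended. */
	ca->child_stack_base = (u64)(uintptr_t)stack_base;
	ca->child_stack_size = stack_size;
	ca->nr_pids = nr_pids;
	ca->clone_args_size = sizeof(*ca);
}

/*
 * Hypothetical userspace mirror of the documented EINVAL cases.
 * nesting_level is 1 for init_pid_ns, 2 for a child of it, and so on.
 * Returns 0 if the arguments look acceptable, -EINVAL otherwise.
 */
static int clone_args_check(const struct clone_args *ca, u32 nesting_level)
{
	if (ca->clone_flags_high != 0 || ca->reserved1 != 0)
		return -EINVAL;
	if (ca->clone_args_size != sizeof(*ca))
		return -EINVAL;
	if (ca->nr_pids > nesting_level)
		return -EINVAL;
	return 0;
}
```

A caller would run clone_args_init() on its 'cargs' and then pass the
result through clone_args_check() with its own nesting level before
invoking the system call.  With nr_pids = 2, a nesting level of 1 is
rejected and a nesting level of 2 is accepted.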
-- 
