From: Daniel Lezcano <daniel.lezcano@free.fr>
To: Oren Laadan <orenl@librato.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>,
randy.dunlap@oracle.com, arnd@arndb.de,
linux-api@vger.kernel.org,
Containers <containers@lists.linux-foundation.org>,
Nathan Lynch <nathanl@austin.ibm.com>,
linux-kernel@vger.kernel.org, Louis.Rilling@kerlabs.com,
"Eric W. Biederman" <ebiederm@xmission.com>,
kosaki.motohiro@jp.fujitsu.com, hpa@zytor.com, mingo@elte.hu,
torvalds@linux-foundation.org,
Alexey Dobriyan <adobriyan@gmail.com>,
roland@redhat.com, Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [RFC][v8][PATCH 0/10] Implement clone3() system call
Date: Thu, 22 Oct 2009 13:22:49 +0200
Message-ID: <4AE04089.9020907@free.fr>
In-Reply-To: <4ADF56D4.8030405@librato.com>
Oren Laadan wrote:
>
> Daniel Lezcano wrote:
>> Oren Laadan wrote:
>>> Daniel Lezcano wrote:
>> [ ... ]
>>
>>>> I forgot to mention a constraint with the specified pid: P2 has to
>>>> be a child of P1.
>>>> In other words, you cannot specify a pid to cloneat which is not your
>>>> descendant (including yourself).
>>>> With this constraint I think there are no security issues.
>>> Sounds dangerous. What if your descendant executed a setuid program ?
>> That does not happen because you inherit the context of the caller.
>>
>>>> Concerning of forking on behalf of another process, we can consider
>>>> it is up to the caller / programmer to know what it does. If a
>>>> process in
>>> Before the user can program with this syscall, _you_ need to define
>>> the semantics of this syscall.
>> Yes, you are right. Here it is the proposition of the semantics.
>>
>> Function prototype is:
>>
>> pid_t cloneat(pid_t pid, pid_t hint, struct clone_args *args);
>>
>> Structure types are:
>>
>> typedef int clone_flag_t;
>>
>> struct clone_args {
>> clone_flag_t *flags;
>> int flags_size;
>> u32 reserved1;
>> u32 reserved2;
>> u64 child_stack_base;
>> u64 child_stack_size;
>> u64 parent_tid_ptr;
>> u64 child_tid_ptr;
>> u64 reserved3;
>> };
>>
>> With the helper macros:
>>
>> void CLONE_SET(int flag, clone_flag_t *flags);
>> void CLONE_CLR(int flag, clone_flag_t *flags);
>> bool CLONE_ISSET(int flag, clone_flag_t *flags);
>> void CLONE_ZERO(clone_flag_t *flags);
>>
>> And:
>>
>> #define CLONEXT_VM 0x20 /* CLONE_VM>>3 */
>> #define CLONEXT_FS 0x21
>> #define CLONEXT_FILES 0x22
>> ...
>>
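
For illustration, the helper macros quoted above read like FD_SET-style
operations on a bitmap. A minimal sketch under that assumption: the
CLONEXT_* values are treated as flag numbers (bit positions), the array has
a made-up fixed capacity (CLONE_MAX_FLAGS), and the helpers are written as
lower-case inline functions rather than macros. None of these details come
from the posted proposal.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* From the proposal: flags are stored in an array of clone_flag_t words. */
typedef int clone_flag_t;

/* Assumption: each array word carries this many flag bits. */
#define CLONE_BITS_PER_WORD ((int)(sizeof(clone_flag_t) * CHAR_BIT))

/* Assumption: room for 128 flag numbers, chosen only for this example. */
#define CLONE_MAX_FLAGS 128
#define CLONE_NWORDS (CLONE_MAX_FLAGS / CLONE_BITS_PER_WORD)

/* Clear every flag in the set. */
static inline void clone_zero(clone_flag_t *flags)
{
	memset(flags, 0, CLONE_NWORDS * sizeof(clone_flag_t));
}

/* Turn on flag number `flag`. */
static inline void clone_set(int flag, clone_flag_t *flags)
{
	assert(flag >= 0 && flag < CLONE_MAX_FLAGS);
	unsigned int w = (unsigned int)flags[flag / CLONE_BITS_PER_WORD];

	w |= 1u << (flag % CLONE_BITS_PER_WORD);
	flags[flag / CLONE_BITS_PER_WORD] = (clone_flag_t)w;
}

/* Turn off flag number `flag`. */
static inline void clone_clr(int flag, clone_flag_t *flags)
{
	assert(flag >= 0 && flag < CLONE_MAX_FLAGS);
	unsigned int w = (unsigned int)flags[flag / CLONE_BITS_PER_WORD];

	w &= ~(1u << (flag % CLONE_BITS_PER_WORD));
	flags[flag / CLONE_BITS_PER_WORD] = (clone_flag_t)w;
}

/* Report whether flag number `flag` is on. */
static inline bool clone_isset(int flag, const clone_flag_t *flags)
{
	assert(flag >= 0 && flag < CLONE_MAX_FLAGS);
	unsigned int w = (unsigned int)flags[flag / CLONE_BITS_PER_WORD];

	return (w >> (flag % CLONE_BITS_PER_WORD)) & 1u;
}
```

With this layout a caller would declare `clone_flag_t flags[CLONE_NWORDS]`,
call clone_zero() on it, and then set flag numbers such as 0x20; new flags
would only need a larger array, not a new syscall argument.
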
>
> The main motivation for your new syscall is to make it possible to
> inject a process into a namespace. IOW, what you are proposing is
> a new incarnation of sys_hijack().
>
> This is _orthogonal_ to the current discussion, which is about an
> extension for clone to allow (a) choosing target pid(s), (b) more
> flags, and (c) future extensions.
>
> (Your suggested syscall may, too, allow requesting a specific set
> of pids for the child process, and reuse the current code for that).
>
> I suggest that you start a new thread about your RFC. This will
> reduce distractions on the current thread, and bring more focus to
> your proposal. I surely will post some comments there :)

I can argue exactly the same thing: the main motivation for your new
syscall is to make it possible to restart a process tree for
checkpoint / restart, and this is orthogonal to adding extended clone
flags :)

But my main motivation is to have the possibility to a) choose a target
pid __and__ b) clone the process relative to another one. These two
features allow us to do what *we* need, that is, recreate a process tree,
and the bonus with this approach is the ability to inject a process into
a namespace, something asked for by several people, e.g. to debug with
gdb an application running in another pid namespace (not supported
today).

I am sorry for coming late to the discussion and for distracting.

> [...]
>
>> The cloneat syscall can be used for the following use cases:
>>
>> * checkpoint / restart:
>>
>> The restart can be done with a clone(.., CLONE_NEWPID|...);
>> Then the new pid (aka pid 1) retrieves the proctree from the statefile
>> and creates the different tasks with the process hierarchy with the
>> cloneat syscall.
>
> s/cloneat/$CLONE3/
> (hint: this is how it's done now)

Of course, what is described is what you do with 'clone3'!
Do you think I would come proposing a variant of 'clone3' not doing what
you need ? :)

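
Whichever syscall ends up creating the tasks, the restart flow discussed
above (the new pid 1 reads the process tree from the statefile and
recreates each task) needs every parent to exist before its children. A
minimal sketch of that ordering only, assuming a hypothetical in-memory
`struct task_rec` table in place of a real statefile; it calls neither
cloneat nor clone3, since neither interface is settled in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one checkpointed task; not a real statefile format. */
struct task_rec {
	int pid;	/* pid the task had at checkpoint time */
	int ppid;	/* pid of its parent; 0 marks the root of the tree */
};

/*
 * Fill order[] with indices into tasks[] so that every parent appears
 * before its children (breadth-first from the root). Returns the number
 * of tasks placed; a result smaller than n means some task had no
 * reachable parent in the table.
 */
static size_t restart_order(const struct task_rec *tasks, size_t n,
			    size_t *order)
{
	size_t head = 0, tail = 0;

	assert(tasks != NULL && order != NULL);

	/* Seed the queue with the root task(s). */
	for (size_t i = 0; i < n; i++)
		if (tasks[i].ppid == 0 && tail < n)
			order[tail++] = i;

	/* Each dequeued task now "exists", so its children can be queued. */
	while (head < tail) {
		int parent = tasks[order[head++]].pid;

		for (size_t i = 0; i < n; i++)
			if (tasks[i].ppid == parent && tail < n)
				order[tail++] = i;
	}
	return tail;
}
```

A restarter would walk order[] and issue one task-creation call per entry,
passing the recorded pid as the requested pid.
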
>> The proctree creation can be done from outside of the pid namespace or
>> from inside.
>
> Ew .. why would you do that ?

And why not? Are there semantics specifying how a process tree should be
recreated?

>> Concerning nested pid namespaces, IMHO I would not try to checkpoint /
>> restart them. The checkpoint of a nested pid namespace should be
>> forbidden except for the leaf of a pid namespaces tree. That should
>
> Others (me included) *will* try and may get upset if forbidden...
> Seriously, there is no technical reason to restrict this.
Ok.
>>> Can you define more precisely what you mean by "enter" the container ?
>>> If you simply want to create a new process in the container, you can
>>> achieve the same thing with a daemon, or a smart init process (in
>>> there), or even ptrace tricks.
>> Yes, you can launch a daemon inside the container, that works for a
>> system container because the container is killed by killing the first
>> process of the container or by a shutdown inside the container (not
>> fully implemented in the kernel).
>> But this is unreliable for application containers. I won't go into the
>> details, but the container exits when the application exits; with a
>> daemon inside the container, this is no longer the case because you
>> cannot detect the application death as the daemon is always there.
>>
>> With cloneat you restrict the life cycle of the command you launched,
>> that is the container exits as soon as all the processes exited the
>> container, including the spawned command itself.
>
> Then start a daemon _in addition_ to the application, or write a
> daemon that will launch the application and monitor it... And also
> there is ptrace -
Already tried :)
http://lxc.git.sourceforge.net/git/gitweb.cgi?p=lxc/lxc;a=blob;f=src/lxc/lxc_cinit.c;h=8f235483c1a9d9c9e0cc1ba69f1c33f1bc98b8aa;hb=57ff723f6a174a2a01c58c6ac367d118ef12b91c
> But, please let's take this off to a new thread about how to
> add a process into a namespace from the outside. FYI, I do think
> such an interface may be useful and nicer than the two alternatives
> I suggested above.
>
>>> Also, there is a reason why sys_hijack() was hijacked away ... And
>>> I honestly think that a syscall to force another process to clone
>>> would be shot down by the kernel guys.
>> Maybe, maybe not. CLONE_PARENT exists and looks similar to cloneat.
>
> Actually, I misread previously; I mean not forcing another process
> to clone, but instead forcing another process to become a parent (and
> I shall ignore the ethical issues :)
>
> I still suspect it won't be welcome. Several people would have liked
> to see CLONE_PARENT go away, too, if that was possible without breaking
> userspace applications. Yet another reason to take it to a discussion
> of its own.

At this point, I am hesitant to create a new thread for this
discussion, because there will be:

* clone
* clone2
* clone3

and we will discuss yet again a new clone syscall with a different API :(

I will not continue arguing on this thread unless someone is in favor
of cloneat. Otherwise, I will spawn a new thread later.

Thanks
-- Daniel