* CLONE_PARENT after setns(CLONE_NEWPID) @ 2013-11-06 18:02 Serge Hallyn 2013-11-06 19:33 ` Oleg Nesterov 0 siblings, 1 reply; 15+ messages in thread From: Serge Hallyn @ 2013-11-06 18:02 UTC (permalink / raw) To: Oleg Nesterov, Christian Seiler Cc: lkml, Andy Whitcroft, Eric W. Biederman, Lxc development list Hi Oleg, commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" breaks lxc-attach in 3.12. That code forks a child which does setns() and then does a clone(CLONE_PARENT). That way the grandchild can be in the right namespaces (which the child was not) and be a child of the original task, which is the monitor. lxc-attach in 3.11 was working fine with no side effects that I could see. Is there a real danger in allowing CLONE_PARENT when current->nsproxy->pidns_for_children is not our pidns, or was this done out of an "over-abundance of caution"? Can we safely revert that new extra check? thanks, -serge ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 18:02 CLONE_PARENT after setns(CLONE_NEWPID) Serge Hallyn @ 2013-11-06 19:33 ` Oleg Nesterov 2013-11-06 19:50 ` Andy Lutomirski ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Oleg Nesterov @ 2013-11-06 19:33 UTC (permalink / raw) To: Serge Hallyn, Andy Lutomirski, Brad Spengler Cc: Christian Seiler, lkml, Andy Whitcroft, Eric W. Biederman, Lxc development list Hi Serge, On 11/06, Serge Hallyn wrote: > > Hi Oleg, > > commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : > "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" > breaks lxc-attach in 3.12. That code forks a child which does > setns() and then does a clone(CLONE_PARENT). That way the > grandchild can be in the right namespaces (which the child was > not) and be a child of the original task, which is the monitor. Thanks... Yes, this is what 40a0d32d1ea explicitly tries to disallow. > Is there a real danger in allowing CLONE_PARENT > when current->nsproxy->pidns_for_children is not our pidns, > or was this done out of an "over-abundance of caution"? I am not sure... This all was based on the long discussion, and it was decided that the CLONE_PARENT check should be consistent wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns(). > Can we > safely revert that new extra check? Well, usually we do not break user-space, but I am not sure about this case... Eric, Andy, what do you think? And if we allow CLONE_PARENT when ->pidns_for_children was changed, should we also allow, say, CLONE_NEWPID && CLONE_PARENT ? Oleg. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 19:33 ` Oleg Nesterov @ 2013-11-06 19:50 ` Andy Lutomirski 2013-11-06 20:06 ` Oleg Nesterov 2013-11-06 22:50 ` Eric W. Biederman 2013-11-06 22:53 ` Serge Hallyn 2 siblings, 1 reply; 15+ messages in thread From: Andy Lutomirski @ 2013-11-06 19:50 UTC (permalink / raw) To: Oleg Nesterov Cc: Serge Hallyn, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Eric W. Biederman, Lxc development list On Wed, Nov 6, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote: > Hi Serge, > > On 11/06, Serge Hallyn wrote: >> >> Hi Oleg, >> >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" >> breaks lxc-attach in 3.12. That code forks a child which does >> setns() and then does a clone(CLONE_PARENT). That way the >> grandchild can be in the right namespaces (which the child was >> not) and be a child of the original task, which is the monitor. > > Thanks... > > Yes, this is what 40a0d32d1ea explicitly tries to disallow. > >> Is there a real danger in allowing CLONE_PARENT >> when current->nsproxy->pidns_for_children is not our pidns, >> or was this done out of an "over-abundance of caution"? > > I am not sure... This all was based on the long discussion, and > it was decided that the CLONE_PARENT check should be consistent > wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns(). > >> Can we >> safely revert that new extra check? > > Well, usually we do not break user-space, but I am not sure about > this case... Presumably if we allow this, then we should also allow clone(CLONE_NEWPID | CLONE_PARENT). This seems a little odd, but off the top of my head it doesn't seem obviously dangerous. (Why were we worried about this in the first place? The comment says that we don't want signal handlers or thread groups to span namespaces, but CLONE_PARENT has nothing to do with that.) I feel like I'm rehashing something ancient, but shouldn't that code just be: if (clone_flags & CLONE_VM) { // check for unsharing namespaces with an update to the comment that CLONE_THREAD and CLONE_SIGHAND both require CLONE_VM. --Andy ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 19:50 ` Andy Lutomirski @ 2013-11-06 20:06 ` Oleg Nesterov 2013-11-06 20:21 ` Andy Lutomirski 0 siblings, 1 reply; 15+ messages in thread From: Oleg Nesterov @ 2013-11-06 20:06 UTC (permalink / raw) To: Andy Lutomirski Cc: Serge Hallyn, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Eric W. Biederman, Lxc development list On 11/06, Andy Lutomirski wrote: > > On Wed, Nov 6, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote: > > Hi Serge, > > > > On 11/06, Serge Hallyn wrote: > >> > >> Hi Oleg, > >> > >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : > >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" > >> breaks lxc-attach in 3.12. That code forks a child which does > >> setns() and then does a clone(CLONE_PARENT). That way the > >> grandchild can be in the right namespaces (which the child was > >> not) and be a child of the original task, which is the monitor. > > > > Thanks... > > > > Yes, this is what 40a0d32d1ea explicitly tries to disallow. > > > >> Is there a real danger in allowing CLONE_PARENT > >> when current->nsproxy->pidns_for_children is not our pidns, > >> or was this done out of an "over-abundance of caution"? > > > > I am not sure... This all was based on the long discussion, and > > it was decided that the CLONE_PARENT check should be consistent > > wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns(). > > > >> Can we > >> safely revert that new extra check? > > > > Well, usually we do not break user-space, but I am not sure about > > this case... > > Presumably if we allow this, then we should also allow > clone(CLONE_NEWPID | CLONE_PARENT). Yes, agreed. but this means another change, this was forbidden even before this commit. > This seems a little odd, but off > the top of my head it doesn't seem obviously dangerous. I do not see any "strong" reason too. At least right now... But I would say that it would be better to not allow to abuse ->real_parent, it doesn't event know about the new child (if CLONE_PARENT). > (Why were we worried about this in the first place? The comment says > that we don't want signal handlers or thread groups to span > namespaces, but CLONE_PARENT has nothing to do with that.) it also says "or parent" ;) > I feel like I'm rehashing something ancient, but shouldn't that code just be: > > if (clone_flags & CLONE_VM) { > // check for unsharing namespaces No, this will break vfork(). And note that CLONE_SIGHAND was disallowed "just in case" and because do_fork() had a similar check. Sharing the signal handlers is fine afaics. >From e79f525e: We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it would be safer to not do this. The current check denies CLONE_SIGHAND implicitely and there is no reason to change this. And I disagree with Eric said "CLONE_SIGHAND is fine. CLONE_THREAD would be even better. Having shared signal handling between two different pid namespaces is the case that we are fundamentally guarding against." added during the merging ;) Or perhaps I misunderstood the text above. But this all is off-topic. Oleg. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 20:06 ` Oleg Nesterov @ 2013-11-06 20:21 ` Andy Lutomirski 0 siblings, 0 replies; 15+ messages in thread From: Andy Lutomirski @ 2013-11-06 20:21 UTC (permalink / raw) To: Oleg Nesterov Cc: Serge Hallyn, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Eric W. Biederman, Lxc development list On Wed, Nov 6, 2013 at 12:06 PM, Oleg Nesterov <oleg@redhat.com> wrote: > On 11/06, Andy Lutomirski wrote: >> >> On Wed, Nov 6, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote: >> > Hi Serge, >> > >> > On 11/06, Serge Hallyn wrote: >> >> >> >> Hi Oleg, >> >> >> >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : >> >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" >> >> breaks lxc-attach in 3.12. That code forks a child which does >> >> setns() and then does a clone(CLONE_PARENT). That way the >> >> grandchild can be in the right namespaces (which the child was >> >> not) and be a child of the original task, which is the monitor. >> > >> > Thanks... >> > >> > Yes, this is what 40a0d32d1ea explicitly tries to disallow. >> > >> >> Is there a real danger in allowing CLONE_PARENT >> >> when current->nsproxy->pidns_for_children is not our pidns, >> >> or was this done out of an "over-abundance of caution"? >> > >> > I am not sure... This all was based on the long discussion, and >> > it was decided that the CLONE_PARENT check should be consistent >> > wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns(). >> > >> >> Can we >> >> safely revert that new extra check? >> > >> > Well, usually we do not break user-space, but I am not sure about >> > this case... >> >> Presumably if we allow this, then we should also allow >> clone(CLONE_NEWPID | CLONE_PARENT). > > Yes, agreed. but this means another change, this was forbidden even > before this commit. I'm really not a fan of allowing things by one path but disallowing them by another. That way lurk hidden bugs and incomprehensibilities... > >> This seems a little odd, but off >> the top of my head it doesn't seem obviously dangerous. > > I do not see any "strong" reason too. At least right now... But I would > say that it would be better to not allow to abuse ->real_parent, it > doesn't event know about the new child (if CLONE_PARENT). I'm not sure what namespaces have to do with this, though. The grandparent may be surprised to acquire a new child, but I don't see why it would care about the security context of that child. I admit that this stuff is complicated, but I don't see the problem that any of this is trying to prevent. > >> (Why were we worried about this in the first place? The comment says >> that we don't want signal handlers or thread groups to span >> namespaces, but CLONE_PARENT has nothing to do with that.) > > it also says "or parent" ;) > >> I feel like I'm rehashing something ancient, but shouldn't that code just be: >> >> if (clone_flags & CLONE_VM) { >> // check for unsharing namespaces > > No, this will break vfork(). > > And note that CLONE_SIGHAND was disallowed "just in case" and because > do_fork() had a similar check. Sharing the signal handlers is fine afaics. > > From e79f525e: > > We could probably even drop CLONE_SIGHAND and use CLONE_THREAD, but it > would be safer to not do this. The current check denies CLONE_SIGHAND > implicitely and there is no reason to change this. > > And I disagree with > > Eric said "CLONE_SIGHAND is fine. CLONE_THREAD would be even better. > Having shared signal handling between two different pid namespaces is > the case that we are fundamentally guarding against." > > added during the merging ;) Or perhaps I misunderstood the text above. But this > all is off-topic. > > Oleg. > -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 19:33 ` Oleg Nesterov 2013-11-06 19:50 ` Andy Lutomirski @ 2013-11-06 22:50 ` Eric W. Biederman 2013-11-06 22:56 ` Andy Lutomirski ` (4 more replies) 2013-11-06 22:53 ` Serge Hallyn 2 siblings, 5 replies; 15+ messages in thread From: Eric W. Biederman @ 2013-11-06 22:50 UTC (permalink / raw) To: Oleg Nesterov Cc: Serge Hallyn, Andy Lutomirski, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Lxc development list Oleg Nesterov <oleg@redhat.com> writes: > Hi Serge, > > On 11/06, Serge Hallyn wrote: >> >> Hi Oleg, >> >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" >> breaks lxc-attach in 3.12. That code forks a child which does >> setns() and then does a clone(CLONE_PARENT). That way the >> grandchild can be in the right namespaces (which the child was >> not) and be a child of the original task, which is the monitor. Serge that is a clever trick to get around the limitation that we can not change the pid namespace of our current process. Given the challenging relaying of signals etc I can see why you would use this. At the same time it makes me a little sad to see new users of CLONE_PARENT. With CLONE_THREAD in existence the original reasons for CLONE_PARENT are gone now. Having used bash as an init process I know it can handle unexpeted children. However using CLONE_PARENT in this way still seems a little dodgy. Or am I misunderstanding why you are using CLONE_PARENT? That trick sounds like it might be worth adding to nsenter in util-linux just to simplify the code. > Thanks... > > Yes, this is what 40a0d32d1ea explicitly tries to disallow. > >> Is there a real danger in allowing CLONE_PARENT >> when current->nsproxy->pidns_for_children is not our pidns, >> or was this done out of an "over-abundance of caution"? > > I am not sure... This all was based on the long discussion, and > it was decided that the CLONE_PARENT check should be consistent > wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns(). > >> Can we >> safely revert that new extra check? > > Well, usually we do not break user-space, but I am not sure about > this case... > > Eric, Andy, what do you think? > > And if we allow CLONE_PARENT when ->pidns_for_children was changed, > should we also allow, say, CLONE_NEWPID && CLONE_PARENT ? The two fundamental things I know we can not allow are: - A shared signal queue aka CLONE_THREAD. Because we compute the pid and uid of the signal when we place it in the queue. - Changing the pid and by extention pid_namespace of an existing process. >From a parents perspective there is nothing special about the pid namespace, to deny CLONE_PARENT, because the parent simply won't know or care. >From the childs perspective all that is special really are shared signal queues. User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks in different pid namespaces is almost certainly going to break because it is complicated. But shared signal handlers can look at per thread information to know which pid namespace a process is in, so I don't know of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads at the kernel level. It would be absolutely stupid to implement but that is a different thing. So hmm. Because it can do no harm, and because it is a regression let's remove the CLONE_PARENT check and send it stable. diff --git a/kernel/fork.c b/kernel/fork.c index 086fe73..c447fbc 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, * do not allow it to share a thread group or signal handlers or * parent with the forking task. */ - if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) { + if (clone_flags & (CLONE_SIGHAND)) { if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) || (task_active_pid_ns(current) != current->nsproxy->pid_ns_for_children)) I don't know if there are shells that CLONE_PARENT can confuse but if there are lxcattach and nsenter using this functionality should be enough to slowly get that confusion fixed. Changing the CLONE_SIGHAND into CLONE_THREAD will need to happen in a separate patch. It isn't stable material, and so far there is no compelling use case for it. Eric ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 22:50 ` Eric W. Biederman @ 2013-11-06 22:56 ` Andy Lutomirski 2013-11-06 23:17 ` Serge Hallyn 2013-11-06 23:12 ` Serge Hallyn ` (3 subsequent siblings) 4 siblings, 1 reply; 15+ messages in thread From: Andy Lutomirski @ 2013-11-06 22:56 UTC (permalink / raw) To: Eric W. Biederman Cc: Oleg Nesterov, Serge Hallyn, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Lxc development list On Wed, Nov 6, 2013 at 2:50 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > Oleg Nesterov <oleg@redhat.com> writes: > >> Hi Serge, >> >> On 11/06, Serge Hallyn wrote: >>> >>> Hi Oleg, >>> >>> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : >>> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" >>> breaks lxc-attach in 3.12. That code forks a child which does >>> setns() and then does a clone(CLONE_PARENT). That way the >>> grandchild can be in the right namespaces (which the child was >>> not) and be a child of the original task, which is the monitor. > > Serge that is a clever trick to get around the limitation that we can > not change the pid namespace of our current process. Given the > challenging relaying of signals etc I can see why you would use this. > > At the same time it makes me a little sad to see new users of > CLONE_PARENT. With CLONE_THREAD in existence the original reasons for > CLONE_PARENT are gone now. > > Having used bash as an init process I know it can handle unexpeted > children. However using CLONE_PARENT in this way still seems a little > dodgy. Or am I misunderstanding why you are using CLONE_PARENT? > > That trick sounds like it might be worth adding to nsenter in util-linux > just to simplify the code. > >> Thanks... >> >> Yes, this is what 40a0d32d1ea explicitly tries to disallow. >> >>> Is there a real danger in allowing CLONE_PARENT >>> when current->nsproxy->pidns_for_children is not our pidns, >>> or was this done out of an "over-abundance of caution"? >> >> I am not sure... This all was based on the long discussion, and >> it was decided that the CLONE_PARENT check should be consistent >> wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns(). >> >>> Can we >>> safely revert that new extra check? >> >> Well, usually we do not break user-space, but I am not sure about >> this case... >> >> Eric, Andy, what do you think? >> >> And if we allow CLONE_PARENT when ->pidns_for_children was changed, >> should we also allow, say, CLONE_NEWPID && CLONE_PARENT ? > > The two fundamental things I know we can not allow are: > - A shared signal queue aka CLONE_THREAD. Because we compute the pid > and uid of the signal when we place it in the queue. > > - Changing the pid and by extention pid_namespace of an existing > process. > > From a parents perspective there is nothing special about the pid > namespace, to deny CLONE_PARENT, because the parent simply won't know or > care. > > From the childs perspective all that is special really are shared signal > queues. > > User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks > in different pid namespaces is almost certainly going to break because > it is complicated. But shared signal handlers can look at per thread > information to know which pid namespace a process is in, so I don't know > of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads > at the kernel level. It would be absolutely stupid to implement but > that is a different thing. > > So hmm. > > Because it can do no harm, and because it is a regression let's remove > the CLONE_PARENT check and send it stable. > > diff --git a/kernel/fork.c b/kernel/fork.c > index 086fe73..c447fbc 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, > * do not allow it to share a thread group or signal handlers or > * parent with the forking task. > */ > - if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) { > + if (clone_flags & (CLONE_SIGHAND)) { > if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) || > (task_active_pid_ns(current) != > current->nsproxy->pid_ns_for_children)) > Acked-by: Andy Lutomirski <luto@amacapital.net> > > I don't know if there are shells that CLONE_PARENT can confuse but if > there are lxcattach and nsenter using this functionality should be > enough to slowly get that confusion fixed. > > Changing the CLONE_SIGHAND into CLONE_THREAD will need to happen in a > separate patch. It isn't stable material, and so far there is no > compelling use case for it. > > Eric -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 22:56 ` Andy Lutomirski @ 2013-11-06 23:17 ` Serge Hallyn 0 siblings, 0 replies; 15+ messages in thread From: Serge Hallyn @ 2013-11-06 23:17 UTC (permalink / raw) To: Andy Lutomirski Cc: Eric W. Biederman, Oleg Nesterov, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Lxc development list Quoting Andy Lutomirski (luto@amacapital.net): > On Wed, Nov 6, 2013 at 2:50 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Oleg Nesterov <oleg@redhat.com> writes: > > > >> Hi Serge, > >> > >> On 11/06, Serge Hallyn wrote: > >>> > >>> Hi Oleg, > >>> > >>> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : > >>> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" > >>> breaks lxc-attach in 3.12. That code forks a child which does > >>> setns() and then does a clone(CLONE_PARENT). That way the > >>> grandchild can be in the right namespaces (which the child was > >>> not) and be a child of the original task, which is the monitor. > > > > Serge that is a clever trick to get around the limitation that we can > > not change the pid namespace of our current process. Given the > > challenging relaying of signals etc I can see why you would use this. > > > > At the same time it makes me a little sad to see new users of > > CLONE_PARENT. With CLONE_THREAD in existence the original reasons for > > CLONE_PARENT are gone now. > > > > Having used bash as an init process I know it can handle unexpeted > > children. However using CLONE_PARENT in this way still seems a little > > dodgy. Or am I misunderstanding why you are using CLONE_PARENT? > > > > That trick sounds like it might be worth adding to nsenter in util-linux > > just to simplify the code. > > > >> Thanks... > >> > >> Yes, this is what 40a0d32d1ea explicitly tries to disallow. > >> > >>> Is there a real danger in allowing CLONE_PARENT > >>> when current->nsproxy->pidns_for_children is not our pidns, > >>> or was this done out of an "over-abundance of caution"? > >> > >> I am not sure... This all was based on the long discussion, and > >> it was decided that the CLONE_PARENT check should be consistent > >> wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns(). > >> > >>> Can we > >>> safely revert that new extra check? > >> > >> Well, usually we do not break user-space, but I am not sure about > >> this case... > >> > >> Eric, Andy, what do you think? > >> > >> And if we allow CLONE_PARENT when ->pidns_for_children was changed, > >> should we also allow, say, CLONE_NEWPID && CLONE_PARENT ? > > > > The two fundamental things I know we can not allow are: > > - A shared signal queue aka CLONE_THREAD. Because we compute the pid > > and uid of the signal when we place it in the queue. > > > > - Changing the pid and by extention pid_namespace of an existing > > process. > > > > From a parents perspective there is nothing special about the pid > > namespace, to deny CLONE_PARENT, because the parent simply won't know or > > care. > > > > From the childs perspective all that is special really are shared signal > > queues. > > > > User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks > > in different pid namespaces is almost certainly going to break because > > it is complicated. But shared signal handlers can look at per thread > > information to know which pid namespace a process is in, so I don't know > > of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads > > at the kernel level. It would be absolutely stupid to implement but > > that is a different thing. > > > > So hmm. > > > > Because it can do no harm, and because it is a regression let's remove > > the CLONE_PARENT check and send it stable. > > > > diff --git a/kernel/fork.c b/kernel/fork.c > > index 086fe73..c447fbc 100644 > > --- a/kernel/fork.c > > +++ b/kernel/fork.c > > @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, > > * do not allow it to share a thread group or signal handlers or > > * parent with the forking task. > > */ > > - if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) { > > + if (clone_flags & (CLONE_SIGHAND)) { > > if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) || > > (task_active_pid_ns(current) != > > current->nsproxy->pid_ns_for_children)) > > > > Acked-by: Andy Lutomirski <luto@amacapital.net> Also (obviously) Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 22:50 ` Eric W. Biederman 2013-11-06 22:56 ` Andy Lutomirski @ 2013-11-06 23:12 ` Serge Hallyn 2013-11-06 23:31 ` Christian Seiler ` (2 subsequent siblings) 4 siblings, 0 replies; 15+ messages in thread From: Serge Hallyn @ 2013-11-06 23:12 UTC (permalink / raw) To: Eric W. Biederman Cc: Oleg Nesterov, Andy Lutomirski, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Lxc development list Quoting Eric W. Biederman (ebiederm@xmission.com): > Oleg Nesterov <oleg@redhat.com> writes: > > > Hi Serge, > > > > On 11/06, Serge Hallyn wrote: > >> > >> Hi Oleg, > >> > >> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : > >> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" > >> breaks lxc-attach in 3.12. That code forks a child which does > >> setns() and then does a clone(CLONE_PARENT). That way the > >> grandchild can be in the right namespaces (which the child was > >> not) and be a child of the original task, which is the monitor. > > Serge that is a clever trick to get around the limitation that we can > not change the pid namespace of our current process. Given the > challenging relaying of signals etc I can see why you would use this. > > At the same time it makes me a little sad to see new users of > CLONE_PARENT. With CLONE_THREAD in existence the original reasons for > CLONE_PARENT are gone now. > > Having used bash as an init process I know it can handle unexpeted > children. However using CLONE_PARENT in this way still seems a little > dodgy. Or am I misunderstanding why you are using CLONE_PARENT? FWIW Christian (cc:d from the start) was the author of that code, so he can correct me if i mis-speak, but IIUC the design is: 1. pid X is the first process running lxc-attach. It will be a monitor for the process which is entered into the container 2. pid X forks pid Y, which does setns(). Now if it is setns()ing into a pidns, it won't itself be in the new pidns, which is not satisfactory. So 3. pid Y clones pid Z with CLONE_PARENT. Y exists. Z continues, as a full member of the container, and a child of the monitor process. So yes, as you said it's exactly to work around the fact that pid Y can't change its own pidns. -serge ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 22:50 ` Eric W. Biederman 2013-11-06 22:56 ` Andy Lutomirski 2013-11-06 23:12 ` Serge Hallyn @ 2013-11-06 23:31 ` Christian Seiler 2013-11-08 17:22 ` Oleg Nesterov 2014-01-15 21:11 ` Christian Seiler 4 siblings, 0 replies; 15+ messages in thread From: Christian Seiler @ 2013-11-06 23:31 UTC (permalink / raw) To: Eric W. Biederman, Oleg Nesterov Cc: Serge Hallyn, Andy Lutomirski, Brad Spengler, lkml, Andy Whitcroft, Lxc development list Hi there, > Having used bash as an init process I know it can handle unexpeted > children. However using CLONE_PARENT in this way still seems a little > dodgy. Or am I misunderstanding why you are using CLONE_PARENT? Since I (re)wrote that part of LXC, I should perhaps clarify how that is used: In case of LXC, the grandparent is lxc-attach itself. The logic goes as follows: - user calls lxc-attach -n $container -- /bin/command/to/execute - lxc-attach does a fork() - child process does setns() - child process does clone(CLONE_PARENT) - child process exits - new process is now in all of the correct namespaces - new process does some IPC (socketpair() from before fork/clone) to tell original lxc-attach process to finish initialization (mainly: add new process to the proper cgroups) - new process exec()s to /bin/command/to/execute - original lxc-attach process waitpid()s for the attached process to exit So the only process that needs to handle a new child is going to be lxc-attach itself, but that is designed in such a way that it expects the new child. (The initial fork is necessary because once setns(userns, mntns) has occurred, the cgroup tree may not be writable anymore (depening on further circumstances), so it would be impossible to just do setns() and then fork() if one then wants to add the new process to the proper cgroups.) > That trick sounds like it might be worth adding to nsenter in util-linux > just to simplify the code. I think nsenter currently only does setns() and then fork(), which is simpler than lxc-attach - mainly because there's no need to attach the process to cgroups etc. lxc-attach's approach does not eliminate the need for the original process wait()ing on the attached process, the CLONE_PARENT is really just used internally to simplify the process hierarchy and also the IPC required. Also, re: general point in this thread: I don't see how CLONE_PARENT could be harmful in any way when used after setns() moreso than it might already be harmful without setns(). I could always write a program that just does clone(CLONE_PARENT) (and nothing else) and then the calling process would also get an unexpected child - I don't see how the pid namespace status of that child would change anything here. So I'd definitely be in favor of allowing CLONE_PARENT after setns(). Christian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 22:50 ` Eric W. Biederman ` (2 preceding siblings ...) 2013-11-06 23:31 ` Christian Seiler @ 2013-11-08 17:22 ` Oleg Nesterov 2014-01-15 21:11 ` Christian Seiler 4 siblings, 0 replies; 15+ messages in thread From: Oleg Nesterov @ 2013-11-08 17:22 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge Hallyn, Andy Lutomirski, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Lxc development list On 11/06, Eric W. Biederman wrote: > > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, > * do not allow it to share a thread group or signal handlers or > * parent with the forking task. > */ > - if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) { > + if (clone_flags & (CLONE_SIGHAND)) { > if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) || > (task_active_pid_ns(current) != > current->nsproxy->pid_ns_for_children)) OK, agreed. I failed to find any problem with CLONE_PARENT with CLONE_NEWUSER or after setns. And the main point of 40a0d32d1eaf was "make them consistent", not "tighten up". Besides, this doesn't differ too much from setns + fork() && exit(), the grandchild will have the new namespace and reparented. Acked-by: Oleg Nesterov <oleg@redhat.com> > Changing the CLONE_SIGHAND into CLONE_THREAD will need to happen in a > separate patch. It isn't stable material, and so far there is no > compelling use case for it. Yes. Again, 40a0d32d1eaf chose CLONE_SIGHAND to unify CLONE_NEWUSER/setns cases, copy_process() used this check. And in fact I voted for CLONE_THREAD from the very beginning, it was you who suggested to use CLONE_SIGHAND instead ;) OTOH, it was probably right to not relax the restrictions we already had. Oleg. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 22:50 ` Eric W. Biederman ` (3 preceding siblings ...) 2013-11-08 17:22 ` Oleg Nesterov @ 2014-01-15 21:11 ` Christian Seiler 2014-01-16 4:46 ` Serge Hallyn 4 siblings, 1 reply; 15+ messages in thread From: Christian Seiler @ 2014-01-15 21:11 UTC (permalink / raw) To: Eric W. Biederman, Oleg Nesterov Cc: Serge Hallyn, Andy Lutomirski, Brad Spengler, lkml, Andy Whitcroft, Lxc development list Eric W. Biederman writes: > So hmm. > > Because it can do no harm, and because it is a regression let's remove > the CLONE_PARENT check and send it stable. > > diff --git a/kernel/fork.c b/kernel/fork.c > index 086fe73..c447fbc 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, > * do not allow it to share a thread group or signal handlers or > * parent with the forking task. > */ > - if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) { > + if (clone_flags & (CLONE_SIGHAND)) { > if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) || > (task_active_pid_ns(current) != > current->nsproxy->pid_ns_for_children)) Just a short question, what happened to this patch? As far as I can tell, 3.13rc8 doesn't include it, neither does the current 3.12.7. This means that lxc-attach currently still doesn't work on 3.12 and probably won't work on 3.13 either... (3.11 is fine, see the previous mails in this thread.) -- Christian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2014-01-15 21:11 ` Christian Seiler @ 2014-01-16 4:46 ` Serge Hallyn 0 siblings, 0 replies; 15+ messages in thread From: Serge Hallyn @ 2014-01-16 4:46 UTC (permalink / raw) To: Christian Seiler Cc: Eric W. Biederman, Oleg Nesterov, Andy Lutomirski, Brad Spengler, lkml, Andy Whitcroft, Lxc development list Quoting Christian Seiler (christian@iwakd.de): > Eric W. Biederman writes: > >So hmm. > > > >Because it can do no harm, and because it is a regression let's remove > >the CLONE_PARENT check and send it stable. > > > >diff --git a/kernel/fork.c b/kernel/fork.c > >index 086fe73..c447fbc 100644 > >--- a/kernel/fork.c > >+++ b/kernel/fork.c > >@@ -1174,7 +1174,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, > > * do not allow it to share a thread group or signal handlers or > > * parent with the forking task. > > */ > >- if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) { > >+ if (clone_flags & (CLONE_SIGHAND)) { > > if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) || > > (task_active_pid_ns(current) != > > current->nsproxy->pid_ns_for_children)) > > Just a short question, what happened to this patch? As far as I can > tell, 3.13rc8 doesn't include it, neither does the current 3.12.7. This > means that lxc-attach currently still doesn't work on 3.12 and probably > won't work on 3.13 either... (3.11 is fine, see the previous mails in > this thread.) So, hm. I didn't realize it hadn't hit upstream, because it's in the ubuntu kernel (unfortunately wrongly attributed). However it is in linux-next since Nov 27. -serge ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 19:33 ` Oleg Nesterov 2013-11-06 19:50 ` Andy Lutomirski 2013-11-06 22:50 ` Eric W. Biederman @ 2013-11-06 22:53 ` Serge Hallyn 2013-11-06 22:53 ` Eric W. Biederman 2 siblings, 1 reply; 15+ messages in thread From: Serge Hallyn @ 2013-11-06 22:53 UTC (permalink / raw) To: Oleg Nesterov Cc: Andy Lutomirski, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Eric W. Biederman, Lxc development list Quoting Oleg Nesterov (oleg@redhat.com): > Hi Serge, > > On 11/06, Serge Hallyn wrote: > > > > Hi Oleg, > > > > commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e : > > "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks" > > breaks lxc-attach in 3.12. That code forks a child which does > > setns() and then does a clone(CLONE_PARENT). That way the > > grandchild can be in the right namespaces (which the child was > > not) and be a child of the original task, which is the monitor. > > Thanks... > > Yes, this is what 40a0d32d1ea explicitly tries to disallow. > > > Is there a real danger in allowing CLONE_PARENT > > when current->nsproxy->pidns_for_children is not our pidns, > > or was this done out of an "over-abundance of caution"? > > I am not sure... This all was based on the long discussion, and > it was decided that the CLONE_PARENT check should be consistent > wrt CLONE_NEWPID and pidns_for_children != task_active_pid_ns(). So apart from peers seeing the new task as having pid 0, and sigchild going to the grandparent, are there any other side effects? Is ptrace an issue? (I took a quick look but it doesn't seem like it) If not, then I very much think we should continue to allow this. -serge ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: CLONE_PARENT after setns(CLONE_NEWPID) 2013-11-06 22:53 ` Serge Hallyn @ 2013-11-06 22:53 ` Eric W. Biederman 0 siblings, 0 replies; 15+ messages in thread From: Eric W. Biederman @ 2013-11-06 22:53 UTC (permalink / raw) To: Serge Hallyn Cc: Oleg Nesterov, Andy Lutomirski, Brad Spengler, Christian Seiler, lkml, Andy Whitcroft, Lxc development list Serge Hallyn <serge.hallyn@ubuntu.com> writes: > So apart from peers seeing the new task as having pid 0, and > sigchild going to the grandparent, are there any other side > effects? Is ptrace an issue? (I took a quick look but it > doesn't seem like it) There is nothing new the pid namespace adds to the pid namespace case. Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-01-16 4:46 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-11-06 18:02 CLONE_PARENT after setns(CLONE_NEWPID) Serge Hallyn 2013-11-06 19:33 ` Oleg Nesterov 2013-11-06 19:50 ` Andy Lutomirski 2013-11-06 20:06 ` Oleg Nesterov 2013-11-06 20:21 ` Andy Lutomirski 2013-11-06 22:50 ` Eric W. Biederman 2013-11-06 22:56 ` Andy Lutomirski 2013-11-06 23:17 ` Serge Hallyn 2013-11-06 23:12 ` Serge Hallyn 2013-11-06 23:31 ` Christian Seiler 2013-11-08 17:22 ` Oleg Nesterov 2014-01-15 21:11 ` Christian Seiler 2014-01-16 4:46 ` Serge Hallyn 2013-11-06 22:53 ` Serge Hallyn 2013-11-06 22:53 ` Eric W. Biederman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox