* [RFC][PATCH 0/2] CLONE_PARENT and pid namespaces
@ 2009-06-18  2:47 Sukadev Bhattiprolu
       [not found] ` <20090618024743.GA31515-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2009-06-18  2:47 UTC (permalink / raw)
  To: serue-r/Jw6+rmf7HQT0dZR+AlfA, Eric W. Biederman, Oleg Nesterov,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov, Alexey
  Cc: Containers, sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	David C. Hansen


It's not clear what the semantics should be when CLONE_PARENT is combined
with pid namespaces. Patches in this set prevent CLONE_PARENT with pid
namespaces, at least until we have a better understanding of what the
semantics should be.

The patches seem to compile/boot, but could use more testing...


* [RFC][PATCH 1/2] Deny CLONE_PARENT|CLONE_NEWPID combination
       [not found] ` <20090618024743.GA31515-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-18  2:49   ` Sukadev Bhattiprolu
       [not found]     ` <20090618024934.GA31672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-18  2:51   ` [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT Sukadev Bhattiprolu
  1 sibling, 1 reply; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2009-06-18  2:49 UTC (permalink / raw)
  To: serue-r/Jw6+rmf7HQT0dZR+AlfA, Eric W. Biederman, Oleg Nesterov,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov, Alexey
  Cc: Containers, David C. Hansen


Deny CLONE_PARENT|CLONE_NEWPID combination.

CLONE_PARENT was probably used to implement an older threading model.
If so, for consistency with CLONE_THREAD, the CLONE_PARENT|CLONE_NEWPID
combination should also fail with -EINVAL.

Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 kernel/pid_namespace.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-mmotm/kernel/pid_namespace.c
===================================================================
--- linux-mmotm.orig/kernel/pid_namespace.c	2009-06-17 18:19:42.000000000 -0700
+++ linux-mmotm/kernel/pid_namespace.c	2009-06-17 18:19:58.000000000 -0700
@@ -118,7 +118,7 @@ struct pid_namespace *copy_pid_ns(unsign
 {
 	if (!(flags & CLONE_NEWPID))
 		return get_pid_ns(old_ns);
-	if (flags & CLONE_THREAD)
+	if (flags & (CLONE_THREAD|CLONE_PARENT))
 		return ERR_PTR(-EINVAL);
 	return create_pid_namespace(old_ns);
 }
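
For illustration only, the effect of the one-line change above can be modeled in user space. This is a sketch, not kernel code: the EX_-prefixed names and the EX_SHARE_NS/EX_NEW_NS result codes are hypothetical, and the flag values are the ones defined in the Linux uapi header <linux/sched.h>.

```c
#include <errno.h>

/* Flag values as in <linux/sched.h>; the EX_ prefix is an assumption of
 * this sketch, used to avoid clashing with the real <sched.h> macros. */
#define EX_CLONE_PARENT 0x00008000UL
#define EX_CLONE_THREAD 0x00010000UL
#define EX_CLONE_NEWPID 0x20000000UL

/* Hypothetical result codes for the model (not kernel names). */
enum { EX_SHARE_NS = 0, EX_NEW_NS = 1 };

/* Model of copy_pid_ns() before the patch: only CLONE_THREAD is rejected. */
static int ex_copy_pid_ns_old(unsigned long flags)
{
	if (!(flags & EX_CLONE_NEWPID))
		return EX_SHARE_NS;
	if (flags & EX_CLONE_THREAD)
		return -EINVAL;
	return EX_NEW_NS;
}

/* Model after the patch: CLONE_PARENT is rejected as well. */
static int ex_copy_pid_ns_new(unsigned long flags)
{
	if (!(flags & EX_CLONE_NEWPID))
		return EX_SHARE_NS;
	if (flags & (EX_CLONE_THREAD | EX_CLONE_PARENT))
		return -EINVAL;
	return EX_NEW_NS;
}
```

In this model, EX_CLONE_PARENT|EX_CLONE_NEWPID yields EX_NEW_NS from the old check and -EINVAL from the new one, while CLONE_PARENT without CLONE_NEWPID is unaffected.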


* [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
       [not found] ` <20090618024743.GA31515-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-18  2:49   ` [RFC][PATCH 1/2] Deny CLONE_PARENT|CLONE_NEWPID combination Sukadev Bhattiprolu
@ 2009-06-18  2:51   ` Sukadev Bhattiprolu
       [not found]     ` <20090618025103.GB31672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2009-06-18  2:51 UTC (permalink / raw)
  To: serue-r/Jw6+rmf7HQT0dZR+AlfA, Eric W. Biederman, Oleg Nesterov,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov, Alexey
  Cc: Containers, David C. Hansen


Prevent container-inits from using CLONE_PARENT

If a container-init creates a sibling (using CLONE_PARENT), pid namespace
semantics become complicated:

	- the "active pid namespace" of the sibling will be the descendant
	  container, but it's not obvious if that is correct.

	- if container-init exits, it will terminate the sibling, but again
	  it's not clear if that is the correct behavior.

	- the sibling exists in both parent and child containers while current
	  pid namespace semantics assume that only container-init can exist
	  in both parent/child containers.

	- the parent of the sibling is not a descendant of container-init
	  (while pid namespaces assume that all processes in the container
	  are descendants of the container-init)

	- When the sibling dies, the SIGCHLD is sent to its parent (if
	  alive), i.e., the signal escapes the container to a parent container.
	  (If the parent of the sibling exits, the container-init then becomes
	  the reaper of the sibling.)

To keep pid namespace semantics simple, prevent container-inits from using
CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
and pid-namespace interactions.

Untested, RFC patch :-)

Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 kernel/fork.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux-mmotm/kernel/fork.c
===================================================================
--- linux-mmotm.orig/kernel/fork.c	2009-06-17 18:23:23.000000000 -0700
+++ linux-mmotm/kernel/fork.c	2009-06-17 19:17:54.000000000 -0700
@@ -974,6 +974,14 @@ static struct task_struct *copy_process(
 	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
 		return ERR_PTR(-EINVAL);
 
+	/*
+	 * To keep pid namespace semantics simple, prevent container-inits
+	 * from creating siblings.
+	 */
+	if ((clone_flags & CLONE_PARENT) &&
+			is_container_init(current) && !is_global_init(current))
+		return ERR_PTR(-EINVAL);
+
 	retval = security_task_create(clone_flags);
 	if (retval)
 		goto fork_out;


* Re: [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
       [not found]     ` <20090618025103.GB31672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-18  3:20       ` Eric W. Biederman
       [not found]         ` <m18wjqkz2i.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  2009-06-18  3:28       ` Roland McGrath
  2009-06-18 15:35       ` Oleg Nesterov
  2 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2009-06-18  3:20 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Containers, David C. Hansen, Oleg Nesterov, Alexey Dobriyan,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov

Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> Prevent container-inits from using CLONE_PARENT
>
> If a container-init creates a sibling (using CLONE_PARENT), pid namespace
> semantics become complicated:
>
> 	- the "active pid namespace" of the sibling will be the descendant
> 	  container, but its not obvious if that is correct.

It is correct: the sibling must not change pid namespaces.  You are not
allowed to escape out of a pid namespace.

> 	- if container-init exits, it will terminate the sibling, but again
> 	  its not clear if that is the correct behavior.

Again correct, because the container-init is the child reaper for the pid
namespace. No reaper, no namespace.

> 	- the sibling exists in both parent and child containers while current
> 	  pid namespace semantics assume that only container-init can exist
> 	  in both parent/child containers.

All tasks in the container also exist in the parent container.
What assumption are you talking about?

> 	- the parent of the sibling is not a descendant of container-init
> 	  (while pid namespaces assume that all processes in the container
> 	  are descendants of the container-init)

User space certainly assumes that. What part of the pid namespace
code makes such an assumption?

> 	- When the sibling dies, the SIGCHLD is sent to its parent (if
> 	  alive), i.e the signal escapes the container to a parent container.
> 	  (if the parent of the sibling exits, the container-init then becomes
> 	  the reaper of the sibling).

Yes.

> To keep pid namespace semantics simple, prevent container-inits from using
> CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
> and pid-namespace interactions.

The only argument that I can see that carries any weight is that unix
semantics fundamentally assume a process tree.  Allowing init to use
CLONE_PARENT creates a multi-rooted process tree.

At which point the is_global_init check is foolish.
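
As a rough illustration of the multi-rooted tree mentioned above, here is a small user-space model. It is only a sketch: struct ex_proc and ex_count_roots() are hypothetical names, and a "root" is taken to mean a process in a namespace whose parent is not in that namespace.

```c
#include <stddef.h>

/* Hypothetical, simplified process record for this sketch. */
struct ex_proc {
	int ns;      /* id of the pid namespace the process lives in */
	int parent;  /* index of the parent in the table, or -1 for none */
};

/* Count processes in namespace `ns` whose parent lies outside `ns`. */
static size_t ex_count_roots(const struct ex_proc *tab, size_t n, int ns)
{
	size_t roots = 0;

	for (size_t i = 0; i < n; i++) {
		if (tab[i].ns != ns)
			continue;
		int p = tab[i].parent;
		if (p < 0 || tab[p].ns != ns)
			roots++;
	}
	return roots;
}
```

With a table holding a process in namespace 0, a container-init in namespace 1 whose parent is that process, and an ordinary child of init, the count for namespace 1 is one. Adding a sibling of init (namespace 1, same outside parent, as CLONE_PARENT would produce) raises the count to two.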

Eric


> Untested, RFC patch :-)
>
> Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/fork.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> Index: linux-mmotm/kernel/fork.c
> ===================================================================
> --- linux-mmotm.orig/kernel/fork.c	2009-06-17 18:23:23.000000000 -0700
> +++ linux-mmotm/kernel/fork.c	2009-06-17 19:17:54.000000000 -0700
> @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
>  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
>  		return ERR_PTR(-EINVAL);
>  
> +	/*
> +	 * To keep pid namespace semantics simple, prevent container-inits
> +	 * from creating siblings.
> +	 */
> +	if ((clone_flags & CLONE_PARENT) &&
> +			is_container_init(current) && !is_global_init(current))
> +		return ERR_PTR(-EINVAL);
> +
>  	retval = security_task_create(clone_flags);
>  	if (retval)
>  		goto fork_out;


* Re: [RFC][PATCH 1/2] Deny CLONE_PARENT|CLONE_NEWPID combination
       [not found]     ` <20090618024934.GA31672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2009-06-18  3:26       ` Roland McGrath
  2009-06-18  3:30       ` Eric W. Biederman
  1 sibling, 0 replies; 13+ messages in thread
From: Roland McGrath @ 2009-06-18  3:26 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Containers, David C. Hansen, Oleg Nesterov, Eric W. Biederman,
	Alexey Dobriyan, Pavel Emelyanov

NPTL has never used CLONE_PARENT, so this is fine.

Acked-by: Roland McGrath <roland-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>


Thanks,
Roland


* Re: [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
       [not found]     ` <20090618025103.GB31672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-18  3:20       ` Eric W. Biederman
@ 2009-06-18  3:28       ` Roland McGrath
  2009-06-18 15:35       ` Oleg Nesterov
  2 siblings, 0 replies; 13+ messages in thread
From: Roland McGrath @ 2009-06-18  3:28 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Containers, David C. Hansen, Oleg Nesterov, Eric W. Biederman,
	Alexey Dobriyan, Pavel Emelyanov

Looks sensible to me.

Acked-by: Roland McGrath <roland-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>


Thanks,
Roland


* Re: [RFC][PATCH 1/2] Deny CLONE_PARENT|CLONE_NEWPID combination
       [not found]     ` <20090618024934.GA31672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-18  3:26       ` Roland McGrath
@ 2009-06-18  3:30       ` Eric W. Biederman
       [not found]         ` <m1d492jk0k.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
  1 sibling, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2009-06-18  3:30 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Containers, David C. Hansen, Oleg Nesterov, Alexey Dobriyan,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov

Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:

> Deny CLONE_PARENT|CLONE_NEWPID combination.
>
> CLONE_PARENT was probably used to implement an older threading model.

Yes it was.

> If so, for consistency with CLONE_THREAD, the CLONE_PARENT|CLONE_NEWPID
> combination should also fail with -EINVAL.

CLONE_THREAD cannot work with CLONE_NEWPID because the processes share
a signal queue.

I can see a similar argument for CLONE_SIGHAND, even though there is not
as much sharing there.  I don't see how CLONE_PARENT could cause any harm
without CLONE_SIGHAND.

Eric

> Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/pid_namespace.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-mmotm/kernel/pid_namespace.c
> ===================================================================
> --- linux-mmotm.orig/kernel/pid_namespace.c	2009-06-17 18:19:42.000000000 -0700
> +++ linux-mmotm/kernel/pid_namespace.c	2009-06-17 18:19:58.000000000 -0700
> @@ -118,7 +118,7 @@ struct pid_namespace *copy_pid_ns(unsign
>  {
>  	if (!(flags & CLONE_NEWPID))
>  		return get_pid_ns(old_ns);
> -	if (flags & CLONE_THREAD)
> +	if (flags & (CLONE_THREAD|CLONE_PARENT))
>  		return ERR_PTR(-EINVAL);
>  	return create_pid_namespace(old_ns);
>  }


* Re: [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
       [not found]     ` <20090618025103.GB31672-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2009-06-18  3:20       ` Eric W. Biederman
  2009-06-18  3:28       ` Roland McGrath
@ 2009-06-18 15:35       ` Oleg Nesterov
       [not found]         ` <20090618153501.GA6404-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2009-06-18 15:35 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Containers, David C. Hansen, Eric W. Biederman, Alexey Dobriyan,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov

On 06/17, Sukadev Bhattiprolu wrote:
>
> Prevent container-inits from using CLONE_PARENT
>
> If a container-init creates a sibling (using CLONE_PARENT), pid namespace
> semantics become complicated:
>
> 	- the "active pid namespace" of the sibling will be the descendant
> 	  container, but its not obvious if that is correct.
>
> 	- if container-init exits, it will terminate the sibling, but again
> 	  its not clear if that is the correct behavior.
>
> 	- the sibling exists in both parent and child containers while current
> 	  pid namespace semantics assume that only container-init can exist
> 	  in both parent/child containers.
>
> 	- the parent of the sibling is not a descendant of container-init
> 	  (while pid namespaces assume that all processes in the container
> 	  are descendants of the container-init)

I agree, this is all a bit strange and perhaps should be fixed. But afaics,
nothing bad can happen? I mean, if the sub-namespace does stupid things,
it can't do any harm to the parent namespace? Or have I missed something?

> 	- When the sibling dies, the SIGCHLD is sent to its parent (if
> 	  alive), i.e the signal escapes the container to a parent container.

The same happens if container-init exits: we send SIGCHLD up. But yes, I agree,
this is a bit strange.

> 	  (if the parent of the sibling exits, the container-init then becomes
> 	  the reaper of the sibling).

Again, strange but harmless.

> To keep pid namespace semantics simple, prevent container-inits from using
> CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
> and pid-namespace interactions.

Yes, perhaps that makes sense.

> --- linux-mmotm.orig/kernel/fork.c	2009-06-17 18:23:23.000000000 -0700
> +++ linux-mmotm/kernel/fork.c	2009-06-17 19:17:54.000000000 -0700
> @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
>  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
>  		return ERR_PTR(-EINVAL);
>
> +	/*
> +	 * To keep pid namespace semantics simple, prevent container-inits
> +	 * from creating siblings.
> +	 */
> +	if ((clone_flags & CLONE_PARENT) &&
> +			is_container_init(current) && !is_global_init(current))

Both is_ checks are not right afaics. They are per-thread. This means
that container-init can do clone(CLONE_THREAD), and then this thread
does CLONE_PARENT and fools copy_process().
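
The per-thread problem can be sketched in user space as follows. This is a simplified model under stated assumptions, not kernel code: struct ex_task and both helpers are hypothetical, and "init" is taken to mean pid 1 inside the task's own pid namespace.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a task. */
struct ex_task {
	int pid;   /* per-thread id inside the task's own pid namespace */
	int tgid;  /* thread-group id: pid of the group leader there */
};

/* Per-thread check, in the spirit of the posted patch:
 * only the thread whose own pid is 1 matches. */
static bool ex_is_init_thread(const struct ex_task *t)
{
	return t->pid == 1;
}

/* Thread-group check: any thread of the init process matches. */
static bool ex_is_init_group(const struct ex_task *t)
{
	return t->tgid == 1;
}
```

A thread created by container-init with CLONE_THREAD would have, say, pid 2 and tgid 1 in this model. The per-thread check lets it through, while a check on the tgid still recognizes it as part of init.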

As for !is_global_init(): I never understood what we should do if the
global init does CLONE_PARENT. This attaches another process to swapper,
which is not good.

Oleg.


* Re: [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
  2009-06-18 15:35       ` Oleg Nesterov
@ 2009-06-18 15:42             ` Oleg Nesterov
  0 siblings, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2009-06-18 15:42 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Containers, David C. Hansen, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	Eric W. Biederman, Alexey Dobriyan, roland-H+wXaHxf7aLQT0dZR+AlfA,
	Pavel Emelyanov

On 06/18, Oleg Nesterov wrote:
>
> On 06/17, Sukadev Bhattiprolu wrote:
> >
> > @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
> >  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
> >  		return ERR_PTR(-EINVAL);
> >
> > +	/*
> > +	 * To keep pid namespace semantics simple, prevent container-inits
> > +	 * from creating siblings.
> > +	 */
> > +	if ((clone_flags & CLONE_PARENT) &&
> > +			is_container_init(current) && !is_global_init(current))
>
> Both is_ checks are not right afaics. There are per-thread. This means
> that container-init can do clone(CLONE_THREAD), and then this thread
> does CLONE_PARENT and fools copy_process().
>
> As for !is_global_init(). I never understood what should we do if the
> global init does CLONE_PARENT, this attaches another process to swapper,
> not good.

Hmm. And idle threads run with ->action[SIGCHLD] == SIG_DFL, so this is
really wrong. Fortunately, we can trust the global init.

Oleg.


* Re: [RFC][PATCH 1/2] Deny CLONE_PARENT|CLONE_NEWPID combination
       [not found]         ` <m1d492jk0k.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-06-18 22:28           ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2009-06-18 22:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Containers, David C. Hansen, Oleg Nesterov, Alexey Dobriyan,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov

Eric W. Biederman [ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org] wrote:
| Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
| 
| > Deny CLONE_PARENT|CLONE_NEWPID combination.
| >
| > CLONE_PARENT was probably used to implement an older threading model.
| 
| Yes it was.
| 
| > If so, for consistency with CLONE_THREAD, the CLONE_PARENT|CLONE_NEWPID
| > combination should also fail with -EINVAL.
| 
| CLONE_THREAD can not work with CLONE_NEWPID because the processes share
| a signal queue.
| 
| I can see a similar argument going for CLONE_SIGHAND even though there is not
| as much sharing there.  I don't see how CLONE_PARENT could cause any harm.
| Without CLONE_SIGHAND.

It does not cause any harm. The only reasons to disable CLONE_PARENT, at least
for now, are the confusing semantics (from the user's point of view), the
process-tree model, and its doubtful usefulness (if CLONE_PARENT is only used
in the old threading model, it is not clear why such an application would need
to act as container-init).

Should we disable CLONE_SIGHAND in addition to CLONE_PARENT, or just
CLONE_SIGHAND?


* Re: [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
       [not found]         ` <m18wjqkz2i.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
@ 2009-06-18 22:40           ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2009-06-18 22:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Containers, David C. Hansen, Oleg Nesterov, Alexey Dobriyan,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov

Eric W. Biederman [ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org] wrote:
| Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> writes:
| 
| > Prevent container-inits from using CLONE_PARENT
| >
| > If a container-init creates a sibling (using CLONE_PARENT), pid namespace
| > semantics become complicated:
| >
| > 	- the "active pid namespace" of the sibling will be the descendant
| > 	  container, but its not obvious if that is correct.
| 
| It is correct the sibling must not change pid namespaces.  You are not
| allowed to escape out of a pid namespace.
| 
| > 	- if container-init exits, it will terminate the sibling, but again
| > 	  its not clear if that is the correct behavior.
| 
| Again correct because the container-init is the child reaper for the pid namespace.
| No reaper no namespace.
| 
| > 	- the sibling exists in both parent and child containers while current
| > 	  pid namespace semantics assume that only container-init can exist
| > 	  in both parent/child containers.
| 
| All tasks in the container also exist in the parent container.
| What assumption are you talking about?

You are right, that's not really different for CLONE_PARENT.

| 
| > 	- the parent of the sibling is not a descendant of container-init
| > 	  (while pid namespaces assume that all processes in the container
| > 	  are descendants of the container-init)
| 
| User space assumes that certainly.    What part of the pid namespace
| code makes such an assumption?

I was referring only to the user-space view.

| 
| > 	- When the sibling dies, the SIGCHLD is sent to its parent (if
| > 	  alive), i.e the signal escapes the container to a parent container.
| > 	  (if the parent of the sibling exits, the container-init then becomes
| > 	  the reaper of the sibling).
| 
| Yes.
| 
| > To keep pid namespace semantics simple, prevent container-inits from using
| > CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
| > and pid-namespace interactions.
| 
| The only argument that I can see that carries any weight is that unix
| semantics fundamentally assume a process tree.  Allowing init to use
| CLONE_PARENT creates a multi-rooted process tree.

Right.

| 
| At which point the is_global_init check is foolish.

Well, I was trying to disable CLONE_PARENT only in combination with pid
namespaces. Disabling CLONE_PARENT for global init seemed independent of
namespaces, and there was recent talk of potential users of CLONE_PARENT, so
I am not sure whether there is an init that uses the old threading model!

I don't have a convincing reason besides "let's enable it when the
uses/semantics of CLONE_PARENT with pid namespaces are clear".




| 
| Eric
| 
| 
| > Untested, RFC patch :-)
| >
| > Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
| > ---
| >  kernel/fork.c |    8 ++++++++
| >  1 file changed, 8 insertions(+)
| >
| > Index: linux-mmotm/kernel/fork.c
| > ===================================================================
| > --- linux-mmotm.orig/kernel/fork.c	2009-06-17 18:23:23.000000000 -0700
| > +++ linux-mmotm/kernel/fork.c	2009-06-17 19:17:54.000000000 -0700
| > @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
| >  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
| >  		return ERR_PTR(-EINVAL);
| >  
| > +	/*
| > +	 * To keep pid namespace semantics simple, prevent container-inits
| > +	 * from creating siblings.
| > +	 */
| > +	if ((clone_flags & CLONE_PARENT) &&
| > +			is_container_init(current) && !is_global_init(current))
| > +		return ERR_PTR(-EINVAL);
| > +
| >  	retval = security_task_create(clone_flags);
| >  	if (retval)
| >  		goto fork_out;


* Re: [RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT
       [not found]         ` <20090618153501.GA6404-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2009-06-18 15:42             ` Oleg Nesterov
@ 2009-06-18 22:52           ` Sukadev Bhattiprolu
  1 sibling, 0 replies; 13+ messages in thread
From: Sukadev Bhattiprolu @ 2009-06-18 22:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Containers, David C. Hansen, Eric W. Biederman, Alexey Dobriyan,
	roland-H+wXaHxf7aLQT0dZR+AlfA, Pavel Emelyanov

| Again, strange but harmless.

Right, nothing is broken; I am just trying to disable the strangeness until
the usage becomes clear.

| 
| > To keep pid namespace semantics simple, prevent container-inits from using
| > CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
| > and pid-namespace interactions.
| 
| Yes, perhaps makes sense.
| 
| > --- linux-mmotm.orig/kernel/fork.c	2009-06-17 18:23:23.000000000 -0700
| > +++ linux-mmotm/kernel/fork.c	2009-06-17 19:17:54.000000000 -0700
| > @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
| >  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
| >  		return ERR_PTR(-EINVAL);
| >
| > +	/*
| > +	 * To keep pid namespace semantics simple, prevent container-inits
| > +	 * from creating siblings.
| > +	 */
| > +	if ((clone_flags & CLONE_PARENT) &&
| > +			is_container_init(current) && !is_global_init(current))
| 
| Both is_ checks are not right afaics. There are per-thread. This means
| that container-init can do clone(CLONE_THREAD), and then this thread
| does CLONE_PARENT and fools copy_process().

Good point. The check should use the tgid.
| 
| As for !is_global_init(). I never understood what should we do if the
| global init does CLONE_PARENT, this attaches another process to swapper,
| not good.

Agreed. As I replied to Eric, I just was not sure whether there were any
existing users of CLONE_PARENT :-(

I could add a separate patch to this set that just disables CLONE_PARENT for
global init.
