All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Allow signalling container-init
@ 2007-08-08 23:47 sukadev-r/Jw6+rmf7HQT0dZR+AlfA
       [not found] ` <20070808234737.GA18334-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-08-08 23:47 UTC (permalink / raw)
  To: Pavel Emelianov; +Cc: Containers, Oleg Nesterov

Pavel,

Should we include this in the patchset ?

Sukadev
---

From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Subject: [PATCH] Allow signalling container-init

Only the global-init process must be special - any other container-init
process must be killable to prevent run-away processes in the system.

TODO: 	Ideally we should allow killing the container-init only from
	ancestor containers and prevent it being killed from that or
	descendant containers.  But that is a more complex change and
	will be addressed by a follow-on patch. For now allow the
	container-init to be terminated by any process with sufficient
	privileges.

Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 kernel/signal.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Index: lx26-23-rc1-mm1/kernel/signal.c
===================================================================
--- lx26-23-rc1-mm1.orig/kernel/signal.c	2007-08-07 13:52:12.000000000 -0700
+++ lx26-23-rc1-mm1/kernel/signal.c	2007-08-08 15:09:27.000000000 -0700
@@ -1861,11 +1861,9 @@ relock:
 			continue;
 
 		/*
-		 * Init of a pid space gets no signals it doesn't want from
-		 * within that pid space. It can of course get signals from
-		 * its parent pid space.
+		 * Global init gets no signals it doesn't want.
 		 */
-		if (current == task_child_reaper(current))
+		if (is_global_init(current->group_leader))
 			continue;
 
 		if (sig_kernel_stop(signr)) {

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow signalling container-init
       [not found] ` <20070808234737.GA18334-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2007-08-09  0:02   ` Oleg Nesterov
       [not found]     ` <20070809000234.GA967-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
  2007-08-09  0:46   ` [Devel] " Daniel Pittman
  1 sibling, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2007-08-09  0:02 UTC (permalink / raw)
  To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Containers, Pavel Emelianov

On 08/08, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> 
> From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> Subject: [PATCH] Allow signalling container-init
> 
> Only the global-init process must be special - any other container-init
> process must be killable to prevent run-away processes in the system.

I think you are right, but....

> --- lx26-23-rc1-mm1.orig/kernel/signal.c	2007-08-07 13:52:12.000000000 -0700
> +++ lx26-23-rc1-mm1/kernel/signal.c	2007-08-08 15:09:27.000000000 -0700
> @@ -1861,11 +1861,9 @@ relock:
>  			continue;
>  
>  		/*
> -		 * Init of a pid space gets no signals it doesn't want from
> -		 * within that pid space. It can of course get signals from
> -		 * its parent pid space.
> +		 * Global init gets no signals it doesn't want.
>  		 */
> -		if (current == task_child_reaper(current))
> +		if (is_global_init(current->group_leader))
>  			continue;

...this breaks exec() from /sbin/init. Note that de_thread() kills other
sub-threads with SIGKILL. With this patch de_thread() will hang waiting
for other threads to die.

I think it is better to not change the current behaviour which is not
perfect (buggy), until we actually protect /sbin/init from unwanted
signals.

(That said, I am not sure what behaviour is better (worse :), with or
 without this patch)

Oleg.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Devel] [PATCH] Allow signalling container-init
       [not found] ` <20070808234737.GA18334-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2007-08-09  0:02   ` Oleg Nesterov
@ 2007-08-09  0:46   ` Daniel Pittman
       [not found]     ` <87vebph6vq.fsf-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org>
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Pittman @ 2007-08-09  0:46 UTC (permalink / raw)
  To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Containers, Oleg Nesterov, Pavel Emelianov

sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org writes:

> Should we include this in the patchset ?

[...]

> Only the global-init process must be special - any other
> container-init process must be killable to prevent run-away processes
> in the system.

One problem I hit while using OpenVZ is that some init processes --
notable upstart used by Ubuntu -- depend on the special signal processing
extended to init by the kernel.

The problem here was that a signal the kernel would otherwise have
blocked was sent to upstart, the default handler was invoked and init
was terminated.

> TODO:	Ideally we should allow killing the container-init only from
> 	ancestor containers and prevent it being killed from that or
> 	descendant containers.  But that is a more complex change and
> 	will be addressed by a follow-on patch. For now allow the
> 	container-init to be terminated by any process with sufficient
> 	privileges.

This will break, as far as I can see, by allowing the container root to
send signals to init that it doesn't expect.

Regards,
        Daniel
-- 
Digital Infrastructure Solutions -- making IT simple, stable and secure
Phone: 0401 155 707        email: contact-gyMb1R/nBgM33TBCqt261WVqPpYm49HuKQEueVp/e6I@public.gmane.org
                 http://digital-infrastructure.com.au/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Devel] [PATCH] Allow signalling container-init
       [not found]     ` <87vebph6vq.fsf-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org>
@ 2007-08-09  1:21       ` Serge E. Hallyn
       [not found]         ` <20070809012128.GA16391-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
  2007-08-09  8:16       ` Kirill Korotaev
  1 sibling, 1 reply; 12+ messages in thread
From: Serge E. Hallyn @ 2007-08-09  1:21 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: Containers, Oleg Nesterov, Pavel Emelianov

Quoting Daniel Pittman (daniel-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org):
> sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org writes:
> 
> > Should we include this in the patchset ?
> 
> [...]
> 
> > Only the global-init process must be special - any other
> > container-init process must be killable to prevent run-away processes
> > in the system.
> 
> One problem I hit while using OpenVZ is that some init processes --
> notable upstart used by Ubuntu -- depend on the special signal processing
> extended to init by the kernel.
> 
> The problem here was that a signal the kernel would otherwise have
> blocked was sent to upstart, the default handler was invoked and init
> was terminated.
> 
> > TODO:	Ideally we should allow killing the container-init only from
> > 	ancestor containers and prevent it being killed from that or
> > 	descendant containers.  But that is a more complex change and
> > 	will be addressed by a follow-on patch. For now allow the
> > 	container-init to be terminated by any process with sufficient
> > 	privileges.
> 
> This will break, as far as I can see, by allowing the container root to
> send signals to init that it doesn't expect.

Yes, in the end what we want is for a container init to receive

	1. all signals from a (authorized) process in a parent
	   pid namespace.
	2. for signals sent from inside it's pid namespace, only
	   exactly those signals for which it has installed a
	   custom signal handler, no others.

In other words to a process in an ancestor pid namespace, the init of a
container is like any other process.  To a process inside the namespace
for which it is init, it is as /sbin/init is to the system now.

Actually achieving that without affecting performance for all signalers
is nontrivial.  The current patchset is complex enough that I'd like to
see us settle on non-optimal semantics for now, and once these patches
have settled implement the ideal signaling.

-serge

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Devel] [PATCH] Allow signalling container-init
       [not found]         ` <20070809012128.GA16391-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
@ 2007-08-09  1:29           ` Daniel Pittman
       [not found]             ` <87myx1h4wt.fsf-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Pittman @ 2007-08-09  1:29 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Containers, Oleg Nesterov, Pavel Emelianov

"Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
> Quoting Daniel Pittman (daniel-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org):
>> sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org writes:

[...]

>> > TODO:	Ideally we should allow killing the container-init only from
>> > 	ancestor containers and prevent it being killed from that or
>> > 	descendant containers.  But that is a more complex change and
>> > 	will be addressed by a follow-on patch. For now allow the
>> > 	container-init to be terminated by any process with sufficient
>> > 	privileges.
>> 
>> This will break, as far as I can see, by allowing the container root to
>> send signals to init that it doesn't expect.
>
> Yes, in the end what we want is for a container init to receive
>
> 	1. all signals from a (authorized) process in a parent
> 	   pid namespace.
> 	2. for signals sent from inside it's pid namespace, only
> 	   exactly those signals for which it has installed a
> 	   custom signal handler, no others.
>
> In other words to a process in an ancestor pid namespace, the init of a
> container is like any other process.  To a process inside the namespace
> for which it is init, it is as /sbin/init is to the system now.

That makes sense.

> Actually achieving that without affecting performance for all
> signalers is nontrivial.  The current patchset is complex enough that
> I'd like to see us settle on non-optimal semantics for now, and once
> these patches have settled implement the ideal signaling.

I appreciate that.  I figured to make you aware that this will make it
impossible to run upstart and, probably, other versions of init in your
container as expected.

Since this was a somewhat subtle bug to track down it is, I think, work
documenting so that people trying to use this code are aware of the
limitation.

Regards,
        Daniel
-- 
Digital Infrastructure Solutions -- making IT simple, stable and secure
Phone: 0401 155 707        email: contact-gyMb1R/nBgM33TBCqt261WVqPpYm49HuKQEueVp/e6I@public.gmane.org
                 http://digital-infrastructure.com.au/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow signalling container-init
       [not found]     ` <20070809000234.GA967-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
@ 2007-08-09  7:29       ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
       [not found]         ` <20070809072933.GD23175-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-08-09  7:29 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Containers, Pavel Emelianov

Oleg Nesterov [oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org] wrote:
| On 08/08, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
| > 
| > From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
| > Subject: [PATCH] Allow signalling container-init
| > 
| > Only the global-init process must be special - any other container-init
| > process must be killable to prevent run-away processes in the system.
| 
| I think you are right, but....
| 
| > --- lx26-23-rc1-mm1.orig/kernel/signal.c	2007-08-07 13:52:12.000000000 -0700
| > +++ lx26-23-rc1-mm1/kernel/signal.c	2007-08-08 15:09:27.000000000 -0700
| > @@ -1861,11 +1861,9 @@ relock:
| >  			continue;
| >  
| >  		/*
| > -		 * Init of a pid space gets no signals it doesn't want from
| > -		 * within that pid space. It can of course get signals from
| > -		 * its parent pid space.
| > +		 * Global init gets no signals it doesn't want.
| >  		 */
| > -		if (current == task_child_reaper(current))
| > +		if (is_global_init(current->group_leader))
| >  			continue;
| 
| ...this breaks exec() from /sbin/init. Note that de_thread() kills other
| sub-threads with SIGKILL. With this patch de_thread() will hang waiting
| for other threads to die.

Again for threaded-init I guess :-(

Well, we discussed last week about allowing non-root users to clone their
pid namespace. The user can then create a container-init and this
process would become immune to signal even by a root user ?

| 
| I think it is better to not change the current behaviour which is not
| perfect (buggy), until we actually protect /sbin/init from unwanted
| signals.

Can we preserve the existing behavior by checking only the main thread
of global init (i.e pass in 'current' rather than 'current->group_leader'
to is_global_init()) ?

| 
| (That said, I am not sure what behaviour is better (worse :), with or
|  without this patch)
| 
| Oleg.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow signalling container-init
       [not found]         ` <20070809072933.GD23175-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2007-08-09  7:55           ` Oleg Nesterov
       [not found]             ` <20070809075535.GA115-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2007-08-09  7:55 UTC (permalink / raw)
  To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Containers, Pavel Emelianov

On 08/09, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
>
> Oleg Nesterov [oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org] wrote:
> | On 08/08, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
> | > 
> | > From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> | > Subject: [PATCH] Allow signalling container-init
> | > 
> | > Only the global-init process must be special - any other container-init
> | > process must be killable to prevent run-away processes in the system.
> | 
> | I think you are right, but....
> | 
> | > --- lx26-23-rc1-mm1.orig/kernel/signal.c	2007-08-07 13:52:12.000000000 -0700
> | > +++ lx26-23-rc1-mm1/kernel/signal.c	2007-08-08 15:09:27.000000000 -0700
> | > @@ -1861,11 +1861,9 @@ relock:
> | >  			continue;
> | >  
> | >  		/*
> | > -		 * Init of a pid space gets no signals it doesn't want from
> | > -		 * within that pid space. It can of course get signals from
> | > -		 * its parent pid space.
> | > +		 * Global init gets no signals it doesn't want.
> | >  		 */
> | > -		if (current == task_child_reaper(current))
> | > +		if (is_global_init(current->group_leader))
> | >  			continue;
> | 
> | ...this breaks exec() from /sbin/init. Note that de_thread() kills other
> | sub-threads with SIGKILL. With this patch de_thread() will hang waiting
> | for other threads to die.
> 
> Again for threaded-init I guess :-(
> 
> Well, we discussed last week about allowing non-root users to clone their
> pid namespace. The user can then create a container-init and this
> process would become immune to signal even by a root user ?

please see below,

> | 
> | I think it is better to not change the current behaviour which is not
> | perfect (buggy), until we actually protect /sbin/init from unwanted
> | signals.
> 
> Can we preserve the existing behavior by checking only the main thread
> of global init (i.e pass in 'current' rather than 'current->group_leader'
> to is_global_init()) ?

Yes, this is what I meant, this is what we have in Linus's tree.
This way a container-init could be killed. This all is not correct,
but we shouldn't replace one bug with another.

Oleg.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Devel] [PATCH] Allow signalling container-init
       [not found]     ` <87vebph6vq.fsf-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org>
  2007-08-09  1:21       ` Serge E. Hallyn
@ 2007-08-09  8:16       ` Kirill Korotaev
  1 sibling, 0 replies; 12+ messages in thread
From: Kirill Korotaev @ 2007-08-09  8:16 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: Containers, Oleg Nesterov

Daniel Pittman wrote:
> sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org writes:
> 
> 
>>Should we include this in the patchset ?
> 
> 
> [...]
> 
> 
>>Only the global-init process must be special - any other
>>container-init process must be killable to prevent run-away processes
>>in the system.
> 
> 
> One problem I hit while using OpenVZ is that some init processes --
> notable upstart used by Ubuntu -- depend on the special signal processing
> extended to init by the kernel.
> 
> The problem here was that a signal the kernel would otherwise have
> blocked was sent to upstart, the default handler was invoked and init
> was terminated.
> 
> 
>>TODO:	Ideally we should allow killing the container-init only from
>>	ancestor containers and prevent it being killed from that or
>>	descendant containers.  But that is a more complex change and
>>	will be addressed by a follow-on patch. For now allow the
>>	container-init to be terminated by any process with sufficient
>>	privileges.
> 
> 
> This will break, as far as I can see, by allowing the container root to
> send signals to init that it doesn't expect.

This was fixed in OpenVZ in recent kernels and Pavel tried to address this in
pid namespace patches as well, but since no beatiful solution was found
it was decided to postpone this issue.

NOTE: parent can still send signals to child container' init. This
   is convinient since you can terminate the whole container quickly from
   the host node.

Thanks,
Kirill

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow signalling container-init
       [not found]             ` <20070809075535.GA115-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
@ 2007-08-09 10:47               ` Pavel Emelyanov
       [not found]                 ` <46BAF0CB.2070202-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Emelyanov @ 2007-08-09 10:47 UTC (permalink / raw)
  To: Oleg Nesterov, sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Containers

Oleg Nesterov wrote:
> On 08/09, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
>> Oleg Nesterov [oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org] wrote:
>> | On 08/08, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
>> | > 
>> | > From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>> | > Subject: [PATCH] Allow signalling container-init
>> | > 
>> | > Only the global-init process must be special - any other container-init
>> | > process must be killable to prevent run-away processes in the system.
>> | 
>> | I think you are right, but....
>> | 
>> | > --- lx26-23-rc1-mm1.orig/kernel/signal.c	2007-08-07 13:52:12.000000000 -0700
>> | > +++ lx26-23-rc1-mm1/kernel/signal.c	2007-08-08 15:09:27.000000000 -0700
>> | > @@ -1861,11 +1861,9 @@ relock:
>> | >  			continue;
>> | >  
>> | >  		/*
>> | > -		 * Init of a pid space gets no signals it doesn't want from
>> | > -		 * within that pid space. It can of course get signals from
>> | > -		 * its parent pid space.
>> | > +		 * Global init gets no signals it doesn't want.
>> | >  		 */
>> | > -		if (current == task_child_reaper(current))
>> | > +		if (is_global_init(current->group_leader))
>> | >  			continue;
>> | 
>> | ...this breaks exec() from /sbin/init. Note that de_thread() kills other
>> | sub-threads with SIGKILL. With this patch de_thread() will hang waiting
>> | for other threads to die.
>>
>> Again for threaded-init I guess :-(
>>
>> Well, we discussed last week about allowing non-root users to clone their
>> pid namespace. The user can then create a container-init and this
>> process would become immune to signal even by a root user ?
> 
> please see below,
> 
>> | 
>> | I think it is better to not change the current behaviour which is not
>> | perfect (buggy), until we actually protect /sbin/init from unwanted
>> | signals.
>>
>> Can we preserve the existing behavior by checking only the main thread
>> of global init (i.e pass in 'current' rather than 'current->group_leader'
>> to is_global_init()) ?
> 
> Yes, this is what I meant, this is what we have in Linus's tree.
> This way a container-init could be killed. This all is not correct,
> but we shouldn't replace one bug with another.

Well, I agree with Oleg. I think that we should keep the patches 
without the signal handling until this set is in (at least) -mm. 
init pid namespace will work without it as used to do, and we'll 
develop a better signal handling and fix existing BUGs.

I know that this creates a hole for creating unkillable process,
but since this is for root user only (CAP_SYS_ADMIN) this is OK.

> Oleg.

Thanks,
Pavel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Devel] [PATCH] Allow signalling container-init
       [not found]             ` <87myx1h4wt.fsf-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org>
@ 2007-08-09 14:42               ` Serge E. Hallyn
  0 siblings, 0 replies; 12+ messages in thread
From: Serge E. Hallyn @ 2007-08-09 14:42 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: Containers, Oleg Nesterov, Pavel Emelianov

Quoting Daniel Pittman (daniel-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org):
> "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> writes:
> > Quoting Daniel Pittman (daniel-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org):
> >> sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org writes:
> 
> [...]
> 
> >> > TODO:	Ideally we should allow killing the container-init only from
> >> > 	ancestor containers and prevent it being killed from that or
> >> > 	descendant containers.  But that is a more complex change and
> >> > 	will be addressed by a follow-on patch. For now allow the
> >> > 	container-init to be terminated by any process with sufficient
> >> > 	privileges.
> >> 
> >> This will break, as far as I can see, by allowing the container root to
> >> send signals to init that it doesn't expect.
> >
> > Yes, in the end what we want is for a container init to receive
> >
> > 	1. all signals from a (authorized) process in a parent
> > 	   pid namespace.
> > 	2. for signals sent from inside it's pid namespace, only
> > 	   exactly those signals for which it has installed a
> > 	   custom signal handler, no others.
> >
> > In other words to a process in an ancestor pid namespace, the init of a
> > container is like any other process.  To a process inside the namespace
> > for which it is init, it is as /sbin/init is to the system now.
> 
> That makes sense.
> 
> > Actually achieving that without affecting performance for all
> > signalers is nontrivial.  The current patchset is complex enough that
> > I'd like to see us settle on non-optimal semantics for now, and once
> > these patches have settled implement the ideal signaling.
> 
> I appreciate that.  I figured to make you aware that this will make it
> impossible to run upstart and, probably, other versions of init in your
> container as expected.
> 
> Since this was a somewhat subtle bug to track down it is, I think, work
> documenting so that people trying to use this code are aware of the
> limitation.

Agreed.  I do think it is documented in the code and in changelogs.
Maybe it's worth adding a Documentation/ file describing how to use the
pid namespaces, ideal semantics, and current shortcomings, for people
who want to use+test the feature rather than work with the kernel code.

-serge

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow signalling container-init
       [not found]                 ` <46BAF0CB.2070202-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
@ 2007-08-10  0:48                   ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
       [not found]                     ` <20070810004812.GB2850-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 12+ messages in thread
From: sukadev-r/Jw6+rmf7HQT0dZR+AlfA @ 2007-08-10  0:48 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: Containers, Oleg Nesterov

Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote:
| Oleg Nesterov wrote:
| >On 08/09, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
| >>Oleg Nesterov [oleg-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org] wrote:
| >>| On 08/08, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
| >>| > 
| >>| > From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
| >>| > Subject: [PATCH] Allow signalling container-init
| >>| > 
| >>| > Only the global-init process must be special - any other 
| >>container-init
| >>| > process must be killable to prevent run-away processes in the system.
| >>| 
| >>| I think you are right, but....
| >>| 
| >>| > --- lx26-23-rc1-mm1.orig/kernel/signal.c	2007-08-07 
| >>13:52:12.000000000 -0700
| >>| > +++ lx26-23-rc1-mm1/kernel/signal.c	2007-08-08 
| >>15:09:27.000000000 -0700
| >>| > @@ -1861,11 +1861,9 @@ relock:
| >>| >  			continue;
| >>| >  
| >>| >  		/*
| >>| > -		 * Init of a pid space gets no signals it doesn't 
| >>want from
| >>| > -		 * within that pid space. It can of course get 
| >>signals from
| >>| > -		 * its parent pid space.
| >>| > +		 * Global init gets no signals it doesn't want.
| >>| >  		 */
| >>| > -		if (current == task_child_reaper(current))
| >>| > +		if (is_global_init(current->group_leader))
| >>| >  			continue;
| >>| 
| >>| ...this breaks exec() from /sbin/init. Note that de_thread() kills other
| >>| sub-threads with SIGKILL. With this patch de_thread() will hang waiting
| >>| for other threads to die.
| >>
| >>Again for threaded-init I guess :-(
| >>
| >>Well, we discussed last week about allowing non-root users to clone their
| >>pid namespace. The user can then create a container-init and this
| >>process would become immune to signal even by a root user ?
| >
| >please see below,
| >
| >>| 
| >>| I think it is better to not change the current behaviour which is not
| >>| perfect (buggy), until we actually protect /sbin/init from unwanted
| >>| signals.
| >>
| >>Can we preserve the existing behavior by checking only the main thread
| >>of global init (i.e pass in 'current' rather than 'current->group_leader'
| >>to is_global_init()) ?
| >
| >Yes, this is what I meant, this is what we have in Linus's tree.
| >This way a container-init could be killed. This all is not correct,
| >but we shouldn't replace one bug with another.
| 
| Well, I agree with Oleg.  I think that we should keep the patches 
| without the signal handling until this set is in (at least) -mm. 
| init pid namespace will work without it as used to do, and we'll 
| develop a better signal handling and fix existing BUGs.
| 
| I know that this creates a hole for creating unkillable process,
| but since this is for root user only (CAP_SYS_ADMIN) this is OK.

But I think there is a difference bw what you are saying and what
Oleg is saying.

Oleg pls correct me if I am wrong, but from what I understand, we
just need modify my earlier fix so we can still terminate the
container from a parent namespace but preserve the existing behavior
w.r.t threaded-inits.

Here is the modified patch for this.

Suka
---
From: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Subject: [PATCH] Allow signalling container-init

Only the global-init process must be special - any other container-init
process must be killable to prevent run-away processes in the system.

TODO: 	Ideally we should allow killing the container-init only from parent
	container and prevent it being killed from within the container.
	But that is a more complex change and will be addressed by a follow-on
	patch. For now allow the container-init to be terminated by any process
	with sufficient privileges.

Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
---
 kernel/signal.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Index: lx26-23-rc1-mm1/kernel/signal.c
===================================================================
--- lx26-23-rc1-mm1.orig/kernel/signal.c	2007-08-07 13:52:12.000000000 -0700
+++ lx26-23-rc1-mm1/kernel/signal.c	2007-08-09 17:22:19.000000000 -0700
@@ -1861,11 +1861,9 @@ relock:
 			continue;
 
 		/*
-		 * Init of a pid space gets no signals it doesn't want from
-		 * within that pid space. It can of course get signals from
-		 * its parent pid space.
+		 * Global init gets no signals it doesn't want.
 		 */
-		if (current == task_child_reaper(current))
+		if (is_global_init(current))
 			continue;
 
 		if (sig_kernel_stop(signr)) {

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Allow signalling container-init
       [not found]                     ` <20070810004812.GB2850-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2007-08-10 10:53                       ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2007-08-10 10:53 UTC (permalink / raw)
  To: sukadev-r/Jw6+rmf7HQT0dZR+AlfA; +Cc: Containers, Pavel Emelyanov

On 08/09, sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org wrote:
>
> Pavel Emelianov [xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org] wrote:
> | Oleg Nesterov wrote:
> | >>| 
> | >>| I think it is better to not change the current behaviour which is not
> | >>| perfect (buggy), until we actually protect /sbin/init from unwanted
> | >>| signals.
> | >>
> | >>Can we preserve the existing behavior by checking only the main thread
> | >>of global init (i.e pass in 'current' rather than 'current->group_leader'
> | >>to is_global_init()) ?
> | >
> | >Yes, this is what I meant, this is what we have in Linus's tree.
> | >This way a container-init could be killed. This all is not correct,
> | >but we shouldn't replace one bug with another.
> | 
> | Well, I agree with Oleg.  I think that we should keep the patches 
> | without the signal handling until this set is in (at least) -mm. 
> | init pid namespace will work without it as used to do, and we'll 
> | develop a better signal handling and fix existing BUGs.
> | 
> | I know that this creates a hole for creating unkillable process,
> | but since this is for root user only (CAP_SYS_ADMIN) this is OK.
> 
> But I think there is a difference bw what you are saying and what
> Oleg is saying.

Hm. It seems to me there is no difference, but perhaps I misunderstood
Pavel?

> Only the global-init process must be special - any other container-init
> process must be killable to prevent run-away processes in the system.
> 
> TODO: 	Ideally we should allow killing the container-init only from parent
> 	container and prevent it being killed from within the container.
> 	But that is a more complex change and will be addressed by a follow-on
> 	patch. For now allow the container-init to be terminated by any process
> 	with sufficient privileges.
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  kernel/signal.c |    6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> Index: lx26-23-rc1-mm1/kernel/signal.c
> ===================================================================
> --- lx26-23-rc1-mm1.orig/kernel/signal.c	2007-08-07 13:52:12.000000000 -0700
> +++ lx26-23-rc1-mm1/kernel/signal.c	2007-08-09 17:22:19.000000000 -0700
> @@ -1861,11 +1861,9 @@ relock:
>  			continue;
>  
>  		/*
> -		 * Init of a pid space gets no signals it doesn't want from
> -		 * within that pid space. It can of course get signals from
> -		 * its parent pid space.
> +		 * Global init gets no signals it doesn't want.
>  		 */
> -		if (current == task_child_reaper(current))
> +		if (is_global_init(current))
>  			continue;

Imho, this is a right change for now. The Linus's tree does the same:

		if (current == child_reaper(current))
			continue;

Because child_reaper() === init_pid_ns.child_reaper.

Oleg.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2007-08-10 10:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-08-08 23:47 [PATCH] Allow signalling container-init sukadev-r/Jw6+rmf7HQT0dZR+AlfA
     [not found] ` <20070808234737.GA18334-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-08-09  0:02   ` Oleg Nesterov
     [not found]     ` <20070809000234.GA967-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
2007-08-09  7:29       ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
     [not found]         ` <20070809072933.GD23175-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-08-09  7:55           ` Oleg Nesterov
     [not found]             ` <20070809075535.GA115-6lXkIZvqkOAvJsYlp49lxw@public.gmane.org>
2007-08-09 10:47               ` Pavel Emelyanov
     [not found]                 ` <46BAF0CB.2070202-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-08-10  0:48                   ` sukadev-r/Jw6+rmf7HQT0dZR+AlfA
     [not found]                     ` <20070810004812.GB2850-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2007-08-10 10:53                       ` Oleg Nesterov
2007-08-09  0:46   ` [Devel] " Daniel Pittman
     [not found]     ` <87vebph6vq.fsf-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org>
2007-08-09  1:21       ` Serge E. Hallyn
     [not found]         ` <20070809012128.GA16391-6s5zFf/epYLPQpwDFJZrxKsjOiXwFzmk@public.gmane.org>
2007-08-09  1:29           ` Daniel Pittman
     [not found]             ` <87myx1h4wt.fsf-zvVxMF7wGoXk1uMJSBkQmQ@public.gmane.org>
2007-08-09 14:42               ` Serge E. Hallyn
2007-08-09  8:16       ` Kirill Korotaev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.