* [RFC][PATCH] fork: Don't special case CLONE_NEWPID for process or sessions
@ 2007-10-27 1:49 Eric W. Biederman
[not found] ` <m11wbhuy0z.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2007-10-27 1:49 UTC (permalink / raw)
To: Linux Containers; +Cc: Oleg Nesterov, Pavel Emelyanov
Given that the kernel supports sys_setsid we don't need a special case
in fork if we want to set: session == pgrp == pid.
The historical (although not 2.6) linux behavior has been to start the
init with session == pgrp == 0 which is effectively what removing this
special case will do.
Is there any reason why we want/need this special case in fork? Or
can we remove it and save some code, make copy_process easier to read
easier to maintain, and possibly a little faster?
I know it is a little weird belong to a process groups that isn't
visible in your pid namespace, but it there are no good reasons
why it shouldn't work.
I think making this change makes the interface more flexible,
and general.
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
kernel/fork.c | 18 +++++-------------
1 files changed, 5 insertions(+), 13 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index ddafdfa..b0de799 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1292,20 +1292,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
if (thread_group_leader(p)) {
if (clone_flags & CLONE_NEWPID) {
p->nsproxy->pid_ns->child_reaper = p;
- p->signal->tty = NULL;
- set_task_pgrp(p, p->pid);
- set_task_session(p, p->pid);
- attach_pid(p, PIDTYPE_PGID, pid);
- attach_pid(p, PIDTYPE_SID, pid);
- } else {
- p->signal->tty = current->signal->tty;
- set_task_pgrp(p, task_pgrp_nr(current));
- set_task_session(p, task_session_nr(current));
- attach_pid(p, PIDTYPE_PGID,
- task_pgrp(current));
- attach_pid(p, PIDTYPE_SID,
- task_session(current));
}
+ p->signal->tty = current->signal->tty;
+ set_task_pgrp(p, task_pgrp_nr(current));
+ set_task_session(p, task_session_nr(current));
+ attach_pid(p, PIDTYPE_PGID, task_pgrp(current));
+ attach_pid(p, PIDTYPE_SID, task_session(current));
list_add_tail_rcu(&p->tasks, &init_task.tasks);
__get_cpu_var(process_counts)++;
--
1.5.3.rc6.17.g1911
^ permalink raw reply related [flat|nested] 5+ messages in thread[parent not found: <m11wbhuy0z.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>]
* Re: [RFC][PATCH] fork: Don't special case CLONE_NEWPID for process or sessions [not found] ` <m11wbhuy0z.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org> @ 2007-11-01 10:28 ` Pavel Emelyanov [not found] ` <4729AA30.6080301-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 0 siblings, 1 reply; 5+ messages in thread From: Pavel Emelyanov @ 2007-11-01 10:28 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linux Containers, Oleg Nesterov Eric W. Biederman wrote: Sorry for the late answer, I have just noticed that I forgot to answer on this patch. > Given that the kernel supports sys_setsid we don't need a special case > in fork if we want to set: session == pgrp == pid. > > The historical (although not 2.6) linux behavior has been to start the > init with session == pgrp == 0 which is effectively what removing this > special case will do. Hm... I overlooked this fact. Looks like the namespace's init will have them set to 1. > Is there any reason why we want/need this special case in fork? Or Mainly to address the issue I describe below. > can we remove it and save some code, make copy_process easier to read > easier to maintain, and possibly a little faster? > > I know it is a little weird belong to a process groups that isn't > visible in your pid namespace, but it there are no good reasons > why it shouldn't work. This is not good to have such a situation as the init will have the ability to kill the tasks from the namespace he can't see, e.g. his parent and the processes in that group. > I think making this change makes the interface more flexible, > and general. > > Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> > --- > kernel/fork.c | 18 +++++------------- > 1 files changed, 5 insertions(+), 13 deletions(-) > > diff --git a/kernel/fork.c b/kernel/fork.c > index ddafdfa..b0de799 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1292,20 +1292,12 @@ static struct task_struct *copy_process(unsigned long clone_flags, > if (thread_group_leader(p)) { > if (clone_flags & CLONE_NEWPID) { > p->nsproxy->pid_ns->child_reaper = p; > - p->signal->tty = NULL; > - set_task_pgrp(p, p->pid); > - set_task_session(p, p->pid); > - attach_pid(p, PIDTYPE_PGID, pid); > - attach_pid(p, PIDTYPE_SID, pid); > - } else { > - p->signal->tty = current->signal->tty; > - set_task_pgrp(p, task_pgrp_nr(current)); > - set_task_session(p, task_session_nr(current)); > - attach_pid(p, PIDTYPE_PGID, > - task_pgrp(current)); > - attach_pid(p, PIDTYPE_SID, > - task_session(current)); > } > + p->signal->tty = current->signal->tty; > + set_task_pgrp(p, task_pgrp_nr(current)); > + set_task_session(p, task_session_nr(current)); > + attach_pid(p, PIDTYPE_PGID, task_pgrp(current)); > + attach_pid(p, PIDTYPE_SID, task_session(current)); > > list_add_tail_rcu(&p->tasks, &init_task.tasks); > __get_cpu_var(process_counts)++; ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <4729AA30.6080301-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [RFC][PATCH] fork: Don't special case CLONE_NEWPID for process or sessions [not found] ` <4729AA30.6080301-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> @ 2007-11-01 15:14 ` Eric W. Biederman [not found] ` <m1mytyf16m.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org> 0 siblings, 1 reply; 5+ messages in thread From: Eric W. Biederman @ 2007-11-01 15:14 UTC (permalink / raw) To: Pavel Emelyanov; +Cc: Linux Containers, Oleg Nesterov Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes: > Eric W. Biederman wrote: > > Sorry for the late answer, I have just noticed that I forgot to > answer on this patch. Thanks for answering. >> Given that the kernel supports sys_setsid we don't need a special case >> in fork if we want to set: session == pgrp == pid. >> >> The historical (although not 2.6) linux behavior has been to start the >> init with session == pgrp == 0 which is effectively what removing this >> special case will do. > > Hm... I overlooked this fact. Looks like the namespace's init will > have them set to 1. Yes. It is not a big difference as init can handle being exec'd by something else, thus is expected to be able to handle the case where setsid has already been called. So we are good but your current code makes it impossible to set tsk->signal->leader and become a proper session leader which is painful. >> can we remove it and save some code, make copy_process easier to read >> easier to maintain, and possibly a little faster? >> >> I know it is a little weird belong to a process groups that isn't >> visible in your pid namespace, but it there are no good reasons >> why it shouldn't work. > > This is not good to have such a situation as the init will have > the ability to kill the tasks from the namespace he can't see, > e.g. his parent and the processes in that group. Yes. sys_kill(0, SIGXXX) will allow this. As this is the main reason for this I don't see any reason to keep the current clone behavior. Sending signals to our process group and our parent is an ability that we allow even the most untrusted processes normally, and it is an ability we can easily remove simply by calling setsid. Not doing magic with the session and the process group allows init to properly become a session leader when setsid is called. Starting with a shared session and process group makes it more likely kernel implementors will look closely to ensure they handle strange cases like this properly and that developers using CLONE_NEWPID will look closely to ensure there are not other pid gotchas the need to deal with. Sharing the process group, session and controlling tty of our parent can be an advantage in small scenarios where using an existing controlling tty is an advantage. Think of a chroot build root or a chroot rpm install. Not letting processes escape and become deaemons is an advantage, but it really doesn't matter if they send signals to their parent. When isolation is important we do not want the ability to send signals to outside of the pid namespace. Currently except for the child death signal of init it appears that simply calling setsid is enough. So short of any other objections I think I will brush up this patch and send it along to Andrew. Eric ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <m1mytyf16m.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>]
* Re: [RFC][PATCH] fork: Don't special case CLONE_NEWPID for process or sessions [not found] ` <m1mytyf16m.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org> @ 2007-11-01 15:37 ` Pavel Emelyanov [not found] ` <4729F2CF.2080101-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> 0 siblings, 1 reply; 5+ messages in thread From: Pavel Emelyanov @ 2007-11-01 15:37 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Linux Containers, Oleg Nesterov Eric W. Biederman wrote: > Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes: > >> Eric W. Biederman wrote: >> >> Sorry for the late answer, I have just noticed that I forgot to >> answer on this patch. > > Thanks for answering. > >>> Given that the kernel supports sys_setsid we don't need a special case >>> in fork if we want to set: session == pgrp == pid. >>> >>> The historical (although not 2.6) linux behavior has been to start the >>> init with session == pgrp == 0 which is effectively what removing this >>> special case will do. >> Hm... I overlooked this fact. Looks like the namespace's init will >> have them set to 1. > > Yes. It is not a big difference as init can handle being exec'd by > something else, thus is expected to be able to handle the case where > setsid has already been called. > > So we are good but your current code makes it impossible to set > tsk->signal->leader and become a proper session leader which is > painful. > >>> can we remove it and save some code, make copy_process easier to read >>> easier to maintain, and possibly a little faster? >>> >>> I know it is a little weird belong to a process groups that isn't >>> visible in your pid namespace, but it there are no good reasons >>> why it shouldn't work. >> This is not good to have such a situation as the init will have >> the ability to kill the tasks from the namespace he can't see, >> e.g. his parent and the processes in that group. > > Yes. sys_kill(0, SIGXXX) will allow this. > > As this is the main reason for this I don't see any reason to keep > the current clone behavior. Are you talking about keeping the ability to kill the outer processes? > Sending signals to our process group and our parent is an ability that > we allow even the most untrusted processes normally, and it is an > ability we can easily remove simply by calling setsid. You mix two things together - letting tasks send signals to their groups is good, but letting tasks send signals outside the namespace is bad. > Not doing magic with the session and the process group allows init > to properly become a session leader when setsid is called. > > Starting with a shared session and process group makes it more likely > kernel implementors will look closely to ensure they handle strange > cases like this properly and that developers using CLONE_NEWPID will > look closely to ensure there are not other pid gotchas the need to > deal with. > > Sharing the process group, session and controlling tty of our parent > can be an advantage in small scenarios where using an existing > controlling tty is an advantage. Think of a chroot build root or a > chroot rpm install. Not letting processes escape and become deaemons > is an advantage, but it really doesn't matter if they send signals to > their parent. Well, we allow a tiny possibility to have shared pids, but do we really want to support this possibility in the rest of the code? > When isolation is important we do not want the ability to send signals > to outside of the pid namespace. Currently except for the child death > signal of init it appears that simply calling setsid is enough. > > So short of any other objections I think I will brush up this patch and > send it along to Andrew. Hm... Could you please send it for pre-rfc before then? > Eric > ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <4729F2CF.2080101-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>]
* Re: [RFC][PATCH] fork: Don't special case CLONE_NEWPID for process or sessions [not found] ` <4729F2CF.2080101-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> @ 2007-11-01 17:03 ` Eric W. Biederman 0 siblings, 0 replies; 5+ messages in thread From: Eric W. Biederman @ 2007-11-01 17:03 UTC (permalink / raw) To: Pavel Emelyanov; +Cc: Linux Containers, Oleg Nesterov Pavel Emelyanov <xemul-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes: >> As this is the main reason for this I don't see any reason to keep >> the current clone behavior. > > Are you talking about keeping the ability to kill the outer processes? I am talking about keeping the session and pgrp from the outer pid namespace. Signals going out is less important. >> Sending signals to our process group and our parent is an ability that >> we allow even the most untrusted processes normally, and it is an >> ability we can easily remove simply by calling setsid. > > You mix two things together - letting tasks send signals to their > groups is good, but letting tasks send signals outside the namespace > is bad. If we have a case where we can send signals to the parent namespace that makes some setting si_pid more difficult. So on those grounds it is technically worth avoiding if we can. I don't see any problem sharing a session and a process group optionally with processes outside the pid namespace, and in fact I find that desirable. So we don't always have to run pid namespace leaders as daemons. To not be a daemon we need the session, pgrp, and tty from the outside. As for sending signals outside since setsid nicely closes that hole in the cases we want to avoid, I am ambivalent about whether we should look for a more robust solution to handling si_pid in which case technical objections go away or if we should disallow the sending altogether. If the code can readily handle the full general case Since si_pid is always task_pid(current) fixing the technical side may not be much of a problem. The way namespaces are defined sending signals outside the namespace is almost impossible (because the pid lookup fails). If we get past that sending signals is well defined and I don't have a problem with it. To support migration and strong virtual servers it is necessary that we close all of the holes where we can get access to objects outside of our namespace. In general that doesn't mean we should remove the possibility for other users. > Well, we allow a tiny possibility to have shared pids, but do we > really want to support this possibility in the rest of the code? There is essentially no cost if things are implemented properly. Just use struct pid and it works. So I don't see a reason to avoid it. Eric ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2007-11-01 17:03 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-27 1:49 [RFC][PATCH] fork: Don't special case CLONE_NEWPID for process or sessions Eric W. Biederman
[not found] ` <m11wbhuy0z.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-11-01 10:28 ` Pavel Emelyanov
[not found] ` <4729AA30.6080301-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-01 15:14 ` Eric W. Biederman
[not found] ` <m1mytyf16m.fsf-T1Yj925okcoyDheHMi7gv2pdwda3JcWeAL8bYrjMMd8@public.gmane.org>
2007-11-01 15:37 ` Pavel Emelyanov
[not found] ` <4729F2CF.2080101-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
2007-11-01 17:03 ` Eric W. Biederman
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.