From: ebiederm@xmission.com (Eric W. Biederman)
To: Pavel Emelyanov <xemul@parallels.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>,
Daniel Lezcano <daniel.lezcano@free.fr>,
Serge Hallyn <serue@us.ibm.com>,
Linux Netdev List <netdev@vger.kernel.org>,
containers@lists.linux-foundation.org,
Netfilter Development Mailinglist
<netfilter-devel@vger.kernel.org>,
Ben Greear <greearb@candelatech.com>
Subject: Re: [RFC][PATCH] ns: Syscalls for better namespace sharing control.
Date: Fri, 05 Mar 2010 12:26:30 -0800 [thread overview]
Message-ID: <m1vdda1pmx.fsf@fess.ebiederm.org> (raw)
In-Reply-To: <4B9158F5.5040205@parallels.com> (Pavel Emelyanov's message of "Fri\, 05 Mar 2010 22\:18\:13 +0300")
Pavel Emelyanov <xemul@parallels.com> writes:
>> 2 parallel enters? I meant you have pid 0 in the entered pid namespace.
>> You have pid 0 because your pid simply does not map.
>
> Oh, I see.
>
>> There is nothing that makes to parallel enters impossible in that.
>> Even today we have one thread per cpu that has task->pid == &init_struct_pid
>> which is pid 0.
>
> How about the forked processes then? Who will be their parent?
The normal rules of parentage apply. So the child will see simply
see it's parent as ppid == 0. If that child daemonizes it will become
a child of the pid namespaces init.
This is a lot like something that gets started from call_usermodehelper. It's
parent process is not a descendant of init either.
The implementation of the join is to simply change current->nsproxy->pid_ns.
Then to use it you simply fork to get a child in the target pid namespace.
>> For the case of unshare where we are designed to be used with PAM I don't
>> think my proposed semantics work. For a join needed an extra fork before
>> you are really in the pid namespace should be minor.
>
> Hm... One more proposal - can we adopt the planned new fork_with_pids system
> call to fork the process right into a new pid namespace?
In a lot of ways I like this idea of sys_hijack/sys_cloneat, and I
don't think anything I am doing fundamentally undermines it. The use
case of doing things in fork is that there is automatic inheritance of
everything. All of the namespaces and all of the control groups, and
possibly also the parent process. It does have the high cost that the
process we are copying from must be stopped because there are no locks
that let us take everything. I haven't looked at the recent proposals
to see if anyone has solved that problem cleanly.
If we can do a sys_hijack/sys_cloneat style of join, that means we can
afford a fork. At which point the my proposed pid namespace semantics
should be fine.
aka:
setns(NSTYPE_PID);
pid = fork();
if (pid == 0) {
getpid() == 2; /* Or whatever the first free pid is joined pid namespace */
getppid() == 0;
} else {
pid == 6400; /* Or whatever the first free pid is in the original pid namespace */
waitpid(pid);
}
>> That doesn't handle the case of cached struct pids. A good example is
>> waitpid, where it waits for a specific struct pid. Which means that
>> allocating a new struct pid and changing task->pid will cause
>> waitpid(pid) to wait forever...
>
> OK. Good example. Thanks.
>
>> To change struct pid would require the refcount on struct pid to show
>> no references from anywhere except the task_struct.
>
> I think this is OK to return -EBUSY for this. And fix the waitpid
> respectively not to block this common case. All the others I think
> can be stayed as is.
That would probably work. setsid() and setpgrp() have similar sorts
of restrictions. That is both more challenging and more limiting than
the semantics that come out of my unshare(CLONE_NEWPID) patch. So I
would prefer to keep this sort of thing as a last resort.
>> At the cost of a little memory we can solve that problem for unshare
>> if we have a an extra upid in struct pid, how we verify there is space
>> in struct pid I'm not certain.
>>
>> I do think that at least until someone calls exec the namespace pids are
>> reported to the process itself should not change. That is kill and
>
> Wait a second - in that case the wait will be blocked too! No?
If all we do is populate an unused struct upid in struct pid there
isn't a chance of a problem.
>> waitpid etc. Which suggests an implementation the opposite of what
>> I proposed. With ns_of_pid(task_pid(current)) being used as the
>> pid namespace of children, and current->nsproxy->pid_ns not changing
>> in the case of unshare.
>>
>> Shrug.
>>
>> Or perhaps this is a case where we use we can implement join with
>> an extra process but we can't implement unshare, because the effect
>> cannot be immediate.
>
> Well, I'm talking only about the join now.
Overall it sounds like the semantics I have proposed with
unshare(CLONE_NEWPID) are workable, and simple to implement. The
extra fork is a bit surprising but it certainly does not
look like a show stopper for implementing a pid namespace join.
Eric
next prev parent reply other threads:[~2010-03-05 20:26 UTC|newest]
Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-14 14:05 RFC: netfilter: nf_conntrack: add support for "conntrack zones" Patrick McHardy
2010-01-14 15:05 ` jamal
2010-01-14 15:37 ` Patrick McHardy
2010-01-14 17:33 ` jamal
2010-01-15 10:15 ` Patrick McHardy
2010-01-15 15:19 ` jamal
2010-02-22 20:46 ` Eric W. Biederman
2010-02-22 21:55 ` jamal
2010-02-22 23:17 ` Eric W. Biederman
[not found] ` <m1wry46es9.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-02-23 13:27 ` jamal
2010-02-23 14:07 ` Eric W. Biederman
2010-02-23 14:20 ` jamal
2010-02-23 20:00 ` Eric W. Biederman
2010-02-23 23:09 ` jamal
2010-02-24 1:43 ` Eric W. Biederman
2010-02-25 20:57 ` [RFC][PATCH] ns: Syscalls for better namespace sharing control Eric W. Biederman
2010-02-25 21:31 ` Daniel Lezcano
2010-02-25 21:49 ` Eric W. Biederman
2010-02-25 22:13 ` Daniel Lezcano
2010-02-25 22:31 ` Eric W. Biederman
2010-02-26 20:35 ` Eric W. Biederman
2010-02-25 21:46 ` Matt Helsley
2010-02-25 21:54 ` Eric W. Biederman
2010-02-26 0:53 ` Eric W. Biederman
2010-02-26 1:09 ` Matt Helsley
2010-02-26 1:26 ` Eric W. Biederman
2010-02-26 3:15 ` [RFC][PATCH] ns: Syscalls for better namespace sharing control. v2 Eric W. Biederman
[not found] ` <m18wagy9f3.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-03-03 20:29 ` Jonathan Corbet
2010-03-03 20:50 ` Eric W. Biederman
[not found] ` <m1pr3t2fvl.fsf_-_-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-02-26 21:13 ` [RFC][PATCH] ns: Syscalls for better namespace sharing control Pavel Emelyanov
2010-02-26 21:24 ` Eric W. Biederman
2010-02-26 21:34 ` Pavel Emelyanov
2010-02-26 21:42 ` Eric W. Biederman
2010-02-26 21:58 ` Oren Laadan
2010-02-26 22:16 ` Eric W. Biederman
2010-02-26 22:52 ` Oren Laadan
2010-02-26 23:13 ` Eric W. Biederman
2010-02-27 8:30 ` Pavel Emelyanov
2010-02-27 9:04 ` Eric W. Biederman
[not found] ` <m1mxyvrqvk.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-02-27 9:21 ` Pavel Emelyanov
2010-02-27 9:42 ` Eric W. Biederman
2010-02-27 16:16 ` Pavel Emelyanov
2010-02-27 19:08 ` Eric W. Biederman
2010-02-27 19:29 ` Pavel Emelyanov
2010-02-27 19:44 ` Eric W. Biederman
2010-02-28 22:05 ` Daniel Lezcano
2010-03-01 19:24 ` Eric W. Biederman
2010-03-01 21:42 ` Eric W. Biederman
2010-03-02 13:10 ` Cedric Le Goater
2010-03-02 15:03 ` Pavel Emelyanov
2010-03-02 15:14 ` Jan Engelhardt
2010-03-02 21:45 ` Eric W. Biederman
2010-03-02 21:19 ` Sukadev Bhattiprolu
2010-03-02 22:13 ` Eric W. Biederman
2010-03-03 0:07 ` Sukadev Bhattiprolu
2010-03-03 0:46 ` Eric W. Biederman
2010-03-03 15:38 ` Serge E. Hallyn
2010-03-03 19:47 ` Eric W. Biederman
2010-03-04 21:45 ` Eric W. Biederman
2010-03-04 22:55 ` Jan Engelhardt
2010-03-03 16:50 ` Pavel Emelyanov
2010-03-03 20:16 ` Eric W. Biederman
2010-03-05 19:18 ` Pavel Emelyanov
2010-03-05 20:26 ` Eric W. Biederman [this message]
2010-03-06 14:47 ` Daniel Lezcano
[not found] ` <4B926B1B.5070207-GANU6spQydw@public.gmane.org>
2010-03-06 20:48 ` Eric W. Biederman
[not found] ` <m1aaulyy5c.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-03-06 21:26 ` Daniel Lezcano
2010-03-08 8:32 ` Eric W. Biederman
2010-03-08 16:54 ` Daniel Lezcano
2010-03-08 17:29 ` Eric W. Biederman
2010-03-08 19:57 ` Daniel Lezcano
2010-03-08 20:24 ` Eric W. Biederman
2010-03-08 20:42 ` Daniel Lezcano
2010-03-08 20:47 ` Eric W. Biederman
2010-03-08 21:12 ` Daniel Lezcano
2010-03-08 21:25 ` Eric W. Biederman
2010-03-08 21:49 ` Serge E. Hallyn
2010-03-08 22:24 ` Eric W. Biederman
2010-03-09 10:03 ` Daniel Lezcano
2010-03-09 10:13 ` Eric W. Biederman
2010-03-09 10:26 ` Daniel Lezcano
2010-03-10 21:16 ` Daniel Lezcano
2010-03-08 17:07 ` Serge E. Hallyn
2010-03-08 17:35 ` Eric W. Biederman
2010-03-08 17:47 ` Serge E. Hallyn
2010-03-03 20:59 ` Oren Laadan
2010-03-03 21:05 ` Eric W. Biederman
[not found] ` <m18wa9glpo.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-05-27 12:06 ` [Devel] " Enrico Weigelt
[not found] ` <m1bpfbwuze.fsf-+imSwln9KH6u2/kzUuoCbdi2O/JbrIOy@public.gmane.org>
2010-02-26 21:35 ` Pavel Emelyanov
2010-02-26 21:49 ` Eric W. Biederman
2010-02-23 23:49 ` RFC: netfilter: nf_conntrack: add support for "conntrack zones" Matt Helsley
2010-02-24 1:32 ` Eric W. Biederman
2010-02-24 1:39 ` Serge E. Hallyn
2010-01-14 18:32 ` Ben Greear
2010-01-15 15:03 ` jamal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=m1vdda1pmx.fsf@fess.ebiederm.org \
--to=ebiederm@xmission.com \
--cc=containers@lists.linux-foundation.org \
--cc=daniel.lezcano@free.fr \
--cc=greearb@candelatech.com \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=serue@us.ibm.com \
--cc=sukadev@linux.vnet.ibm.com \
--cc=xemul@parallels.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).