From: Hubertus Franke <frankeh@watson.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>, Sam Vilain <sam@vilain.net>,
Rik van Riel <riel@redhat.com>, Kirill Korotaev <dev@openvz.org>,
Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, clg@fr.ibm.com,
haveblue@us.ibm.com, greg@kroah.com, alan@lxorguk.ukuu.org.uk,
arjan@infradead.org, kuznet@ms2.inr.ac.ru, saw@sawoct.com,
devel@openvz.org, Dmitry Mishin <dim@sw.ru>,
Andi Kleen <ak@suse.de>, Herbert Poetzl <herbert@13thfloor.at>
Subject: Re: The issues for agreeing on a virtualization/namespaces implementation.
Date: Wed, 08 Feb 2006 09:40:22 -0500
Message-ID: <43EA02D6.30208@watson.ibm.com>
In-Reply-To: <m1ek2ea0fw.fsf@ebiederm.dsl.xmission.com>
Eric W. Biederman wrote:
> Hubertus Franke <frankeh@watson.ibm.com> writes:
>
>
>>Eric W. Biederman wrote:
>>
>>>2) What is the syscall interface to create these namespaces?
>>> - Do we add clone flags? (Plan 9 style)
>>
>>Like that approach .. flexible .. particularly when one has well specified
>>namespaces.
>>
>>
>>> - Do we add a syscall (similar to setsid) per namespace?
>>> (Traditional unix style)?
>>
>>Where does that approach end .. what's wrong with doing it at clone() time ?
>>Mainly the naming issue. Just providing a flag does not give me a name.
>
>
> It really is a fairly even toss up. The usual argument for doing it
> this way is that you will get an endless stream of arguments added to
> fork+exec otherwise. Look at posix_spawn or the windows version if
> you want an example. Bits to clone are skirting the edge of a slippery
> slope.
>
So it seems clone( flags ) is a reasonable approach to creating new
namespaces. The question is: what is the initial state of each namespace?
In pidspace we know we should be creating an empty pidmap!
In network, someone suggested creating a loopback device.
In uts, create "localhost".
Are there examples where we would rather inherit? Filesystem?
Can we go through each subsystem and state what people think the right
assumption is?
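To make the question concrete, here is a rough userspace sketch (not kernel
code) of what such a per-subsystem table could look like. The flag names,
the ns_policy structure and the helper functions are all hypothetical; the
rows only restate the proposals above, and the filesystem row is still the
open question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, one per namespace type; NOT real clone() flags. */
enum {
    NS_PID = 1 << 0,
    NS_NET = 1 << 1,
    NS_UTS = 1 << 2,
    NS_FS  = 1 << 3,
};

/* How a freshly created namespace gets its first contents. */
enum ns_init { NS_INIT_EMPTY, NS_INIT_DEFAULT, NS_INIT_INHERIT };

struct ns_policy {
    long         flag;    /* which namespace this row describes */
    const char  *name;
    enum ns_init init;
    const char  *initial; /* human-readable initial state */
};

/* One row per subsystem, restating the proposals from this thread. */
static const struct ns_policy ns_policies[] = {
    { NS_PID, "pid", NS_INIT_EMPTY,   "empty pidmap" },
    { NS_NET, "net", NS_INIT_DEFAULT, "loopback device only" },
    { NS_UTS, "uts", NS_INIT_DEFAULT, "nodename \"localhost\"" },
    { NS_FS,  "fs",  NS_INIT_INHERIT, "parent's view (open question)" },
};

#define NR_POLICIES (sizeof(ns_policies) / sizeof(ns_policies[0]))

/* Return the row for exactly one flag bit, or NULL if the bit is unknown. */
static const struct ns_policy *ns_policy_for(long flag)
{
    /* Precondition: exactly one bit set. */
    assert(flag != 0 && (flag & (flag - 1)) == 0);
    for (size_t i = 0; i < NR_POLICIES; i++)
        if (ns_policies[i].flag == flag)
            return &ns_policies[i];
    return NULL;
}

/* Count how many namespaces in `flags` would start from the parent's state. */
static int ns_count_inherited(long flags)
{
    int n = 0;
    for (size_t i = 0; i < NR_POLICIES; i++)
        if ((flags & ns_policies[i].flag) &&
            ns_policies[i].init == NS_INIT_INHERIT)
            n++;
    return n;
}
```

With a table like this, the discussion reduces to agreeing on the `init`
column for each subsystem; a clone() with several flags set would just walk
the rows that match.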
IMHO, there is only a need to refer to a namespace from the global context,
since one will be moving into a new container, but getting out of one
could be prohibitive (e.g. after migration).
It therefore does not make sense to know the name of a namespace in
a different container.
The example you used below, using the pid, comes naturally because
that already limits visibility.
I am still struggling with why we need new syscalls.
Syscalls already exist for changing certain system parameters (e.g. utsname),
so to me it boils down to identifying a proper initial state when the
namespace is created.
>
>>>3) How do we refer to namespaces and containers when we are not members?
>>> - Do we refer to them indirectly by processes or other objects that
>>> we can see and are members?
>>> - Do we assign some kind of unique id to the containers?
>>
>>In containers I simply created an explicit name, which of course collides
>>with the clone() approach ..
>>One possibility is to allow associating a name with a namespace.
>>For instance
>>/* flags: the ones we are using in clone */
>>int set_namespace_name( long flags, const char *name )
>>{
>>    if (!flags)
>>        set name of container associated with current.
>>    if (flags)
>>        set the name if only one container is associated with the
>>        namespace(s) identified .. or some similar rule
>>}
>>
>
>
> What I have done which seems easier than creating new names is to refer
> to the process which has the namespace I want to manipulate.
Is the idea then to allow only the container's init to manipulate
namespaces, or is there a need to allow other privileged processes to
perform namespace manipulation?
Also, after thinking about it .. why is there a need to have an external name
for a namespace?
>
>
>>>6) How do we do all of this efficiently without a noticeable impact on
>>> performance?
>>> - I have already heard concerns that I might be introducing cache
>>> line bounces and thus increasing tasklist_lock hold time.
>>> Which on big way systems can be a problem.
>>
>>Possible to split the lock up now.. one for each pidspace ?
>
>
> At the moment it is worth thinking about. If the problem isn't
> so bad that people aren't actively working on it we don't have to
> solve the problem for a little while, just be aware of it.
>
Agreed, we just need to be sure we can split it up. But you already keep
a task list per pid-namespace, so there should be no problem IMHO.
If so, let's do it now and take it off the table, if it's as simple as
task_list_lock ::= pspace->task_list_lock
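As an illustration of that substitution, here is a small userspace model.
It is only a sketch: struct pspace, its fields and the helper names are
made up for this example, and a pthread mutex stands in for the kernel's
tasklist_lock (which is a different kind of lock in the real kernel).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a pid namespace ("pspace"). */
struct pspace {
    pthread_mutex_t task_list_lock; /* replaces the single global lock */
    int             nr_tasks;       /* stand-in for the per-pspace task list */
};

static void pspace_init(struct pspace *ps)
{
    assert(ps != NULL);
    pthread_mutex_init(&ps->task_list_lock, NULL);
    ps->nr_tasks = 0;
}

/* Adding a task takes only the lock of the pspace the task belongs to. */
static void pspace_add_task(struct pspace *ps)
{
    pthread_mutex_lock(&ps->task_list_lock);
    ps->nr_tasks++;
    pthread_mutex_unlock(&ps->task_list_lock);
}

/* Walking one pspace's list never touches another pspace's lock. */
static int pspace_count_tasks(struct pspace *ps)
{
    int n;

    pthread_mutex_lock(&ps->task_list_lock);
    n = ps->nr_tasks;
    pthread_mutex_unlock(&ps->task_list_lock);
    return n;
}
```

The point of the model is only that every list operation names the pspace
it works on, so contention is limited to tasks sharing a pspace. Anything
that must walk all tasks in the system would still need a rule for taking
the per-pspace locks in a fixed order.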
>
>>>7) How do we allow a process inside a container to create containers
>>> for its children?
>>> - In general this is trivial but there are a few ugly issues
>>> here.
>>
>>Speaking of pids only here ...
>>Does it matter? You just hang all those containers off init.
>>Whatever hierarchy they form is external ...
>
>
> In general it is simple. For resource accounting, and for naming so
> you can migrate a container with a nested container it is a question
> you need to be slightly careful with.
Absolutely, that's why it is useful to have an "external" idea of how
containers are constructed from basic namespaces==subsystems.
Then it "simply" becomes a policy. E.g. one cannot migrate a container
that has shared subsystems.
On resource accounting I agree; that might require active aggregation
at request time.
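A rough userspace sketch of both points, under loud assumptions: struct
subsys, struct container, the `users` count and both helpers are invented
for illustration, and the migration rule is simplified to "no subsystem of
this container is used by anyone else".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUBSYS   4
#define MAX_CHILDREN 4

/* Hypothetical subsystem instance; `users` counts containers using it. */
struct subsys {
    int users;
};

/* Hypothetical container: a set of subsystems plus nested child containers. */
struct container {
    struct subsys    *subsys[MAX_SUBSYS];
    size_t            nr_subsys;
    struct container *child[MAX_CHILDREN];
    size_t            nr_children;
    long              usage; /* resources charged directly to this container */
};

/* Policy: a container may migrate only if none of its subsystems is shared. */
static bool container_can_migrate(const struct container *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < c->nr_subsys; i++)
        if (c->subsys[i]->users > 1)
            return false;
    return true;
}

/* Aggregate usage at request time by walking the nested containers. */
static long container_total_usage(const struct container *c)
{
    long total;

    assert(c != NULL);
    total = c->usage;
    for (size_t i = 0; i < c->nr_children; i++)
        total += container_total_usage(c->child[i]);
    return total;
}
```

Nothing is kept up to date on the fast path here; the cost of the walk is
paid only when somebody asks for the total. A stricter policy could also
refuse migration when a nested child shares a subsystem with something
outside the tree being moved.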
-- Hubertus