Re: The issues for agreeing on a virtualization/namespaces implementation.

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Hubertus Franke <frankeh@watson.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>, Sam Vilain <sam@vilain.net>,
	Rik van Riel <riel@redhat.com>, Kirill Korotaev <dev@openvz.org>,
	Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, clg@fr.ibm.com,
	haveblue@us.ibm.com, greg@kroah.com, alan@lxorguk.ukuu.org.uk,
	arjan@infradead.org, kuznet@ms2.inr.ac.ru, saw@sawoct.com,
	devel@openvz.org, Dmitry Mishin <dim@sw.ru>,
	Andi Kleen <ak@suse.de>, Herbert Poetzl <herbert@13thfloor.at>
Subject: Re: The issues for agreeing on a virtualization/namespaces implementation.
Date: Wed, 08 Feb 2006 09:40:22 -0500	[thread overview]
Message-ID: <43EA02D6.30208@watson.ibm.com> (raw)
In-Reply-To: <m1ek2ea0fw.fsf@ebiederm.dsl.xmission.com>

Eric W. Biederman wrote:
> Hubertus Franke <frankeh@watson.ibm.com> writes:
> 
> 
>>Eric W. Biederman wrote:
>>

>>>2) What is the syscall interface to create these namespaces?
>>>   - Do we add clone flags?       (Plan 9 style)
>>
>>Like that approach .. flexible .. particular when one has well specified
>>namespaces.
>>
>>
>>>   - Do we add a syscall (similar to setsid) per namespace?
>>>     (Traditional unix style)?
>>
>>Where does that approach end .. what's wrong with doing it at clone() time ?
>>Mainly the naming issue. Just providing a flag does not give me name.
> 
> 
> It really is a fairly even toss up.  The usual argument for doing it
> this way is that you will get a endless stream of arguments added to
> fork+exec other wise.  Look of posix_spawn or the windows version if
> you want an example.  Bits to clone are skirting the edge of a slippery 
> slope.
> 

So it seems the clone( flags ) is a reasonable approach to create new
namespaces. Question is what is the initial state of each namespace?
In pidspace we know we should be creating an empty pidmap !
In network, someone suggested creating a loopback device
In uts, create "localhost"
Are there examples where we rather inherit ?  Filesystem ?
Can we iterate the assumption for each subsystem what people thing is right?

IMHO, there is only a need to refer to a namespace from the global context.
Since one will be moving into a new container, but getting out of one
could be prohibitive (e.g. after migration)
It does not make sense therefore to know the name of a namespace in
a different container.

The example you used below by using the pid comes natural, because
that already limits visibility.

I am still struggling with why we need new sys_calls.
sys_calls already exist for changing certain system parameters (e.g. utsname )
so to me it boils down to identifying a proper initial state when the
namespace is created.

> 
>>>3) How do we refer to namespaces and containers when we are not members?
>>>   - Do we refer to them indirectly by processes or other objects that
>>>     we can see and are members?
>>>   - Do we assign some kind of unique id to the containers?
>>
>>In containers I simply created an explicite name, which ofcourse colides with
>>the
>>clone() approach ..
>>One possibility is to allow associating a name with a namespace.
>>For instance
>>int set_namespace_name( long flags, const char *name ) /* the once we are using
>>in clone */
>>{
>>	if (!flag)
>>		set name of container associated with current.
>>	if (flag())
>>		set the name if only one container is associated with the
>>namespace(s)
>>		identified .. or some similar rule
>>}
>>
> 
> 
> What I have done which seems easier than creating new names is to refer
> to the process which has the namespace I want to manipulate.

Is then the idea to only allow the container->init to manipulate
or is there need to allow other priviliged processes to perform namespace
manipulation?
Also after thinking about it.. why is there a need to have an external name
for a namespace ?

> 
> 
>>>6) How do we do all of this efficiently without a noticeable impact on
>>>   performance?
>>>   - I have already heard concerns that I might be introducing cache
>>>     line bounces and thus increasing tasklist_lock hold time.
>>>     Which on big way systems can be a problem.
>>
>>Possible to split the lock up now.. one for each pidspace ?
> 
> 
> At the moment it is worth thinking about.  If the problem isn't
> so bad that people aren't actively working on it we don't have to
> solve the problem for a little while, just be aware of it.
> 

Agree, just need to be sure we can split it up. But you already keep
a task list per pid-namespace, so there should be no problem IMHO.
If so let's do it now and take it of the table it its as simple as

task_list_lock ::= pspace->task_list_lock

> 
>>>7) How do we allow a process inside a container to create containers
>>>   for it's children?
>>>   - In general this is trivial but there are a few ugly issues
>>>     here.
>>
>>Speaking of pids only here ...
>>Does it matter, you just hang all those containers hang of init.
>>What ever hierarchy they form is external ...
> 
> 
> In general it is simple.  For resource accounting, and for naming so
> you can migrate a container with a nested container it is a question
> you need to be slightly careful with.

Absolutely, that's why it is useful to have an "external" idea of how
containers are constructed of basic namespaces==subsystems.
The it "simply" becomes a policy. E.g. one can not migrate a container
that has shared subsystems.
Resource accounting I agree, that might required active aggregation
at request time.

-- Hubertus

next prev parent reply	other threads:[~2006-02-08 14:40 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-06 21:57 [PATCH 1/4] Virtualization/containers: introduction Kirill Korotaev
2006-02-06 22:12 ` [PATCH 2/4] Virtualization/containers: CONFIG_CONTAINER Kirill Korotaev
2006-02-06 22:17 ` [PATCH 3/4] Virtualization/containers: UID hash Kirill Korotaev
2006-02-06 22:22 ` [PATCH 4/4] Virtualization/containers: uts name Kirill Korotaev
2006-02-06 23:00 ` [PATCH 1/4] Virtualization/containers: introduction Dave Hansen
2006-02-07 12:24   ` Kirill Korotaev
2006-02-07  3:34 ` Eric W. Biederman
2006-02-07  3:40   ` Rik van Riel
2006-02-07  6:30     ` Sam Vilain
2006-02-07 11:51       ` Kirill Korotaev
2006-02-07 14:31         ` Eric W. Biederman
2006-02-07 15:42       ` Eric W. Biederman
2006-02-07 16:18         ` Kirill Korotaev
2006-02-07 17:20           ` Eric W. Biederman
2006-02-07 22:43         ` Sam Vilain
2006-02-07 16:57       ` Hubertus Franke
2006-02-07 20:19         ` Serge E. Hallyn
2006-02-07 20:46           ` Hubertus Franke
2006-02-07 22:00             ` Eric W. Biederman
2006-02-07 22:19               ` Hubertus Franke
2006-02-07 22:06             ` The issues for agreeing on a virtualization/namespaces implementation Eric W. Biederman
2006-02-07 23:35               ` Hubertus Franke
2006-02-08  0:43                 ` Alexey Kuznetsov
2006-02-08  2:49                   ` Eric W. Biederman
2006-02-08  3:36                     ` Serge E. Hallyn
2006-02-08  3:52                       ` Eric W. Biederman
2006-02-08  4:37                         ` Herbert Poetzl
2006-02-08  4:46                           ` Eric W. Biederman
2006-02-08 19:24                         ` Stephen Hemminger
2006-02-08  5:23                 ` Eric W. Biederman
2006-02-08 14:40                   ` Hubertus Franke [this message]
2006-02-08 15:17                     ` Serge E. Hallyn
2006-02-08 15:35                       ` Kirill Korotaev
2006-02-08 15:57                         ` Hubertus Franke
2006-02-08 19:02                           ` Herbert Poetzl
2006-02-08 16:48                         ` Eric W. Biederman
2006-02-08 17:46                     ` Eric W. Biederman
2006-02-08 18:03                     ` Serge E. Hallyn
2006-02-08 18:31                       ` Hubertus Franke
2006-02-08 20:21                       ` Dave Hansen
2006-02-08 21:22                         ` Serge E. Hallyn
2006-02-08 22:28                     ` Eric W. Biederman
2006-02-20 12:11                 ` Kirill Korotaev
2006-02-20 12:41                   ` Herbert Poetzl
2006-02-20 14:26                     ` Kirill Korotaev
2006-02-20 15:16                       ` Herbert Poetzl
2006-02-08  4:56               ` Herbert Poetzl
2006-02-08 14:38                 ` Serge E. Hallyn
2006-02-08 14:51                   ` Hubertus Franke
2006-02-09  4:45               ` Kyle Moffett
2006-02-09  5:41                 ` Eric W. Biederman
2006-02-09 22:25               ` Eric W. Biederman
2006-02-07 22:58         ` [PATCH 1/4] Virtualization/containers: introduction Sam Vilain
2006-02-07 23:18           ` Hubertus Franke
2006-02-08  5:03             ` Eric W. Biederman
2006-02-08 14:13               ` Hubertus Franke
2006-02-08 15:44                 ` Kirill Korotaev
2006-02-08 16:39                   ` Eric W. Biederman
2006-02-08  2:08           ` Kevin Fox
2006-02-08  1:16             ` Sam Vilain
2006-02-08  4:21               ` Paul Jackson
2006-02-08 15:36         ` Kirill Korotaev
2006-02-08 17:16           ` Eric W. Biederman
2006-02-08 20:43           ` Dave Hansen
2006-02-08 21:04             ` Eric W. Biederman
2006-02-07 12:14   ` Kirill Korotaev
2006-02-07 14:06     ` Eric W. Biederman
2006-02-07 14:52       ` Rik van Riel
2006-02-07 15:13         ` Eric W. Biederman
2006-02-09  0:24 ` Eric W. Biederman
2006-02-09  2:18   ` Jeff Dike
2006-02-09  3:16     ` Eric W. Biederman
2006-02-09 14:28     ` Kirill Korotaev
2006-02-09 15:40       ` Jeff Dike
2006-02-09 15:49         ` Kirill Korotaev
2006-02-09 17:50           ` Jeff Dike
2006-02-09 16:38     ` Hubertus Franke
2006-02-09 17:48       ` Jeff Dike
2006-02-09 22:09         ` Sam Vilain
2006-02-09 21:56   ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43EA02D6.30208@watson.ibm.com \
    --to=frankeh@watson.ibm.com \
    --cc=ak@suse.de \
    --cc=akpm@osdl.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=arjan@infradead.org \
    --cc=clg@fr.ibm.com \
    --cc=dev@openvz.org \
    --cc=devel@openvz.org \
    --cc=dim@sw.ru \
    --cc=ebiederm@xmission.com \
    --cc=greg@kroah.com \
    --cc=haveblue@us.ibm.com \
    --cc=herbert@13thfloor.at \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=riel@redhat.com \
    --cc=sam@vilain.net \
    --cc=saw@sawoct.com \
    --cc=serue@us.ibm.com \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox