Re: The issues for agreeing on a virtualization/namespaces implementation.

From: Hubertus Franke <frankeh@watson.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>, Sam Vilain <sam@vilain.net>,
	Rik van Riel <riel@redhat.com>, Kirill Korotaev <dev@openvz.org>,
	Linus Torvalds <torvalds@osdl.org>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, clg@fr.ibm.com,
	haveblue@us.ibm.com, greg@kroah.com, alan@lxorguk.ukuu.org.uk,
	arjan@infradead.org, kuznet@ms2.inr.ac.ru, saw@sawoct.com,
	devel@openvz.org, Dmitry Mishin <dim@sw.ru>,
	Andi Kleen <ak@suse.de>, Herbert Poetzl <herbert@13thfloor.at>
Subject: Re: The issues for agreeing on a virtualization/namespaces implementation.
Date: Wed, 08 Feb 2006 09:40:22 -0500
Message-ID: <43EA02D6.30208@watson.ibm.com>
In-Reply-To: <m1ek2ea0fw.fsf@ebiederm.dsl.xmission.com>

Eric W. Biederman wrote:
> Hubertus Franke <frankeh@watson.ibm.com> writes:
> 
> 
>>Eric W. Biederman wrote:
>>

>>>2) What is the syscall interface to create these namespaces?
>>>   - Do we add clone flags?       (Plan 9 style)
>>
>>Like that approach .. flexible .. particularly when one has well specified
>>namespaces.
>>
>>
>>>   - Do we add a syscall (similar to setsid) per namespace?
>>>     (Traditional unix style)?
>>
>>Where does that approach end .. what's wrong with doing it at clone() time ?
>>Mainly the naming issue. Just providing a flag does not give me a name.
> 
> 
> It really is a fairly even toss up.  The usual argument for doing it
> this way is that you will get an endless stream of arguments added to
> fork+exec otherwise.  Look at posix_spawn or the windows version if
> you want an example.  Bits to clone are skirting the edge of a slippery
> slope.
> 

So it seems clone(flags) is a reasonable approach to creating new
namespaces. The question is: what is the initial state of each namespace?
In pidspace, we know we should be creating an empty pidmap!
In network, someone suggested creating a loopback device.
In uts, create "localhost".
Are there examples where we would rather inherit? Filesystem?
Can we go through each subsystem and state what people think the right
initial state is?
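
To make the question concrete, here is a rough user-space sketch (not kernel
code) of "fresh initial state vs. inherit" per subsystem. The NS_* flags,
struct ns_state and ns_create() are hypothetical names made up for
illustration; they only stand in for whatever bits clone() would take.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-subsystem flags, standing in for clone() bits. */
enum {
	NS_PID = 1 << 0,
	NS_NET = 1 << 1,
	NS_UTS = 1 << 2,
	NS_FS  = 1 << 3,
};

/* Toy model of what a task sees in each subsystem. */
struct ns_state {
	int  nr_pids;		/* entries in the pidmap */
	int  nr_netdevs;	/* visible network devices */
	char nodename[65];	/* uts node name */
	char root[64];		/* filesystem root */
};

/*
 * Build the child's view: a subsystem whose flag is set starts from the
 * initial state suggested in this thread, everything else is inherited.
 */
static struct ns_state ns_create(const struct ns_state *parent,
				 unsigned long flags)
{
	struct ns_state child;

	assert(parent != NULL);
	child = *parent;			/* default: inherit */

	if (flags & NS_PID)
		child.nr_pids = 0;		/* empty pidmap */
	if (flags & NS_NET)
		child.nr_netdevs = 1;		/* loopback only */
	if (flags & NS_UTS)
		strcpy(child.nodename, "localhost");
	/*
	 * NS_FS: assumed to start as a copy of the parent's view, so in
	 * this toy model there is nothing to reset.
	 */
	return child;
}
```

With this model, ns_create(&parent, NS_PID | NS_UTS) yields an empty pidmap
and "localhost", while the network and filesystem views stay the parent's.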

IMHO, there is only a need to refer to a namespace from the global context,
since one will be moving into a new container, but getting out of one
could be prohibitive (e.g. after migration).
It therefore does not make sense to know the name of a namespace in
a different container.

The example you give below, which uses the pid, comes naturally, because
the pid already limits visibility.

I am still struggling with why we need new syscalls.
Syscalls already exist for changing certain system parameters (e.g. utsname),
so to me it boils down to identifying a proper initial state when the
namespace is created.

> 
>>>3) How do we refer to namespaces and containers when we are not members?
>>>   - Do we refer to them indirectly by processes or other objects that
>>>     we can see and are members?
>>>   - Do we assign some kind of unique id to the containers?
>>
>>In containers I simply created an explicit name, which of course collides
>>with the clone() approach ..
>>One possibility is to allow associating a name with a namespace.
>>For instance
>>/* flags: the ones we are using in clone */
>>int set_namespace_name(long flags, const char *name)
>>{
>>	if (!flags)
>>		set name of container associated with current.
>>	if (flags)
>>		set the name if only one container is associated with the
>>		namespace(s) identified .. or some similar rule
>>}
>>
> 
> 
> What I have done which seems easier than creating new names is to refer
> to the process which has the namespace I want to manipulate.

Is the idea then to allow only the container->init to manipulate the
namespace, or is there a need to allow other privileged processes to
perform namespace manipulation?
Also, after thinking about it .. why is there a need to have an external name
for a namespace?

> 
> 
>>>6) How do we do all of this efficiently without a noticeable impact on
>>>   performance?
>>>   - I have already heard concerns that I might be introducing cache
>>>     line bounces and thus increasing tasklist_lock hold time.
>>>     Which on big way systems can be a problem.
>>
>>Possible to split the lock up now.. one for each pidspace ?
> 
> 
> At the moment it is worth thinking about.  If the problem isn't
> so bad that people aren't actively working on it we don't have to
> solve the problem for a little while, just be aware of it.
> 

Agreed, we just need to be sure we can split it up. But you already keep
a task list per pid-namespace, so there should be no problem IMHO.
If so, let's do it now and take it off the table, if it's as simple as

task_list_lock ::= pspace->task_list_lock

> 
>>>7) How do we allow a process inside a container to create containers
>>>   for its children?
>>>   - In general this is trivial but there are a few ugly issues
>>>     here.
>>
>>Speaking of pids only here ...
>>Does it matter? You just hang all those containers off init.
>>Whatever hierarchy they form is external ...
> 
> 
> In general it is simple.  For resource accounting, and for naming so
> you can migrate a container with a nested container it is a question
> you need to be slightly careful with.

Absolutely, that's why it is useful to have an "external" idea of how
containers are constructed out of basic namespaces==subsystems.
Then it "simply" becomes a policy. E.g. one cannot migrate a container
that has shared subsystems.
On resource accounting I agree; that might require active aggregation
at request time.
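
As a sketch of both points (user-space model only, all names hypothetical):
a container is a set of namespaces, the migration policy just checks whether
any of them is shared, and accounting is summed over nested containers only
when somebody asks.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBSYS 3	/* e.g. pid, net, uts; the count is arbitrary here */

/* A namespace (subsystem instance) and how many containers use it. */
struct ns {
	int users;
};

/* A container: one namespace per subsystem, plus nested containers. */
struct container {
	struct ns *ns[NR_SUBSYS];
	long usage;			/* this container's own usage */
	struct container *child;	/* first nested container */
	struct container *sibling;	/* next container at the same level */
};

/* Policy: a container that shares any subsystem cannot be migrated. */
static int container_can_migrate(const struct container *c)
{
	assert(c != NULL);
	for (size_t i = 0; i < NR_SUBSYS; i++)
		if (c->ns[i]->users > 1)
			return 0;
	return 1;
}

/* Accounting: aggregate over nested containers at request time. */
static long container_total_usage(const struct container *c)
{
	long total;

	assert(c != NULL);
	total = c->usage;
	for (const struct container *k = c->child; k; k = k->sibling)
		total += container_total_usage(k);
	return total;
}
```

Nothing is cached here: the cost of walking the hierarchy is paid by whoever
requests the total, not on every charge.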

-- Hubertus
