public inbox for linux-kernel@vger.kernel.org
From: Sam Vilain <sam@vilain.net>
To: Kirill Korotaev <dev@sw.ru>
Cc: Herbert Poetzl <herbert@13thfloor.at>,
	Linus Torvalds <torvalds@osdl.org>,
	Rik van Riel <riel@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	devel@openvz.org, "Eric W. Biederman" <ebiederm@xmission.com>,
	Andrey Savochkin <saw@sawoct.com>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Stanislav Protassov <st@sw.ru>,
	serue@us.ibm.com, frankeh@watson.ibm.com, clg@fr.ibm.com,
	haveblue@us.ibm.com, mrmacman_g4@mac.com,
	alan@lxorguk.ukuu.org.uk, Andrew Morton <akpm@osdl.org>
Subject: Re: Which of the virtualization approaches is more suitable for kernel?
Date: Wed, 22 Feb 2006 09:33:41 +1300	[thread overview]
Message-ID: <43FB7925.5060609@vilain.net> (raw)
In-Reply-To: <43FB3937.408@sw.ru>

Kirill Korotaev wrote:
>>>- fine grained namespaces are actually an obfuscation, since kernel
>>> subsystems are tightly interconnected. e.g. network -> sysctl -> proc,
>>> mqueues -> netlink, ipc -> fs and most often can be used only as a
>>> whole container.
>>I think a lot of _strange_ interconnects there could
>>use some cleanup, and after that the interconnections
>>would be very small
> Why do you think they are strange!? Is it strange that networking 
> exports its sysctls and statistics via proc?
> Is it strange for you that IPC uses fs?
> It is by _design_.

Great, and this kind of simple design also worked well for the first few 
iterations of Linux-VServer.  However, some people need more flexibility, 
as we are seeing from the wide range of virtualisation schemes being 
proposed.  In the 2.1.x VServer patch the network and (process & IPC) 
isolation and virtualisation have been kept separate, and can be managed 
with separate utilities.  There is also a syscall and utility to manage 
the existing kernel filesystem namespaces.

Eric's pspace work keeps the PID aspect separate too, which I never 
envisioned possible.

I think that if we can keep as much separation between systems as 
possible, then we will have a cleaner design.  It will also make life 
easier for the core team as we can more easily divide up the patches for 
consideration by the relevant subsystem maintainer.

> - you need to track dependencies between namespaces (e.g. NAT requires 
> conntracks, IPC requires FS, etc.). This should be handled; otherwise a 
> container that can create nested containers will be able to cause an oops.

This is just normal refcounting.  Yes, IPC requires filesystem code, but 
it doesn't care about the VFS, which is what filesystem namespaces abstract.
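The dependency tracking Kirill asks for is plain reference counting: a namespace takes a reference on each namespace (or subsystem) it depends on, so the dependency cannot be torn down underneath it. A minimal userspace sketch of the idea — the structure and function names here are invented for illustration, not the actual kernel API:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative model of namespace refcounting; all names are invented. */
struct ns {
	int refcount;
	struct ns *depends_on;	/* e.g. an IPC namespace pinning the FS code */
};

static struct ns *ns_get(struct ns *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void ns_put(struct ns *n)
{
	if (n && --n->refcount == 0) {
		ns_put(n->depends_on);	/* drop our pin on the dependency */
		free(n);
	}
}

static struct ns *ns_create(struct ns *dep)
{
	struct ns *n = malloc(sizeof(*n));
	n->refcount = 1;
	n->depends_on = ns_get(dep);	/* the dependency now outlives us */
	return n;
}
```

With this shape, a nested container can never orphan a namespace its parent still needs: the last `ns_put()` on the dependent namespace is what releases the dependency.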

> do you have support for it in tools?
> i.e. do you support namespaces somehow? Can you create a
> half-virtualized container?

See the util-vserver package; it comes with chbind and vnamespace, which 
allow creation of 'half-virtualized' containers, though most of the rest 
of the functionality, such as per-vserver ulimits, disklimits, etc., has 
been shoehorned into the general vx_info structure.  As we merge into 
the mainstream we can review each of these decisions and decide whether 
it is an inherently per-process decision, or whether more XX_info 
structures are warranted.

>>this doesn't look very cool to me, as IRQs should
>>be handled in the host context and TCP/IP in the
>>proper network space ...
> This is exactly what it does.
> On IRQ, context is switched to the host.
> In TCP/IP, to the context of the socket or network device.

That sounds like an interesting innovation, and we can compare our 
patches in this space once we have some common terms of reference and 
starting points.
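The switching Kirill describes — run in the host context for interrupt handling, then adopt the owning container's context once a packet is matched to a socket or device — can be modelled with a single "current context" pointer. This is a toy userspace model; the real OpenVZ identifiers and mechanism differ:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of per-CPU execution-context switching.
 * Every name here is invented for illustration. */
struct exec_ctx { int id; };

static struct exec_ctx host_ctx = { .id = 0 };
static struct exec_ctx *cur_ctx = &host_ctx;

struct sock_model { struct exec_ctx *owner; };

/* On interrupt entry, run in the host context... */
static struct exec_ctx *irq_enter_ctx(void)
{
	struct exec_ctx *prev = cur_ctx;
	cur_ctx = &host_ctx;
	return prev;
}

/* ...until the packet is matched to a socket, then adopt its owner. */
static void set_sock_ctx(struct sock_model *sk)
{
	cur_ctx = sk->owner;
}

/* On interrupt exit, restore whatever was running before. */
static void irq_exit_ctx(struct exec_ctx *prev)
{
	cur_ctx = prev;
}
```

The interesting design point is that the IRQ path never trusts the interrupted task's context: it starts from the host and only narrows to a container once the packet itself identifies one.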

>>the question here is, do we really want to turn it 
>>off at all? IMHO the design and implementation 
>>should be sufficiently good so that it does neither
>>impose unnecessary overhead nor change the default
>>behaviour ...
> This is the question I want answered by Linus/Andrew.
> I don't believe in low overhead. It starts with virtualization, then 
> comes resource management, etc.
> These features _definitely_ introduce overhead and increase resource 
> consumption. Not big, but why not make it configurable?

Obviously, our projects have different goals; Linux-VServer has very 
little performance overhead.  Special provisions are made to achieve 
scalability on SMP and to avoid unnecessary cacheline contention.  Once 
that is sorted out, it is very hard to measure any performance overhead 
at all, especially when the task_struct->vx_info pointer is NULL.
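The cheap case mentioned above comes down to an early NULL test: a task outside any vserver pays one pointer compare and nothing else. Schematically — the structures below are illustrative stand-ins, not the real VServer ones:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the "NULL pointer means no overhead" fast path.
 * Structure and field names are illustrative, not VServer's. */
struct vx_info_model { int limit; };

struct task_model {
	struct vx_info_model *vx_info;	/* NULL for unvirtualized tasks */
};

static int check_limit(struct task_model *t, int amount)
{
	/* Fast path: tasks outside any container pay one compare. */
	if (t->vx_info == NULL)
		return 0;			/* unlimited */
	return amount > t->vx_info->limit ? -1 : 0;
}
```

Because the hook degenerates to a predictable branch on a field that is already in cache, the host workload sees essentially no cost.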

However, I see nothing wrong with making all of the code disappear when 
the kernel config option is not enabled.  I expect that as time goes on, 
you would no sooner disable it than you would disable the open() system 
call.  I think that's what Herbert was getting at with his comment.
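The usual kernel idiom for making code "disappear" under a config option is a static inline stub: when the option is off, the hook compiles to a no-op that the compiler eliminates entirely. A generic sketch — the CONFIG_ name and the hook are invented for illustration:

```c
#include <assert.h>

/* When the (hypothetical) option is off, the hook compiles to nothing. */
#ifdef CONFIG_CONTAINER_DEMO
static inline int container_charge(int *usage, int amount)
{
	*usage += amount;
	return *usage;
}
#else
static inline int container_charge(int *usage, int amount)
{
	(void)usage;
	(void)amount;
	return 0;	/* no-op stub; callers see zero cost */
}
#endif
```

Callers stay unconditional — no `#ifdef` at every call site — yet a kernel built without the option carries neither the code nor the overhead.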

> Seems, you are just trying to move from the topic. Great.

I always did want to be a Lumberjack!

Sam.


Thread overview: 27+ messages
2006-02-20 15:45 Which of the virtualization approaches is more suitable for kernel? Kirill Korotaev
2006-02-20 16:12 ` Herbert Poetzl
2006-02-21 16:00   ` Kirill Korotaev
2006-02-21 20:33     ` Sam Vilain [this message]
2006-02-21 23:50     ` Herbert Poetzl
2006-02-22 10:09       ` [Devel] " Kir Kolyshkin
2006-02-22 15:26         ` Eric W. Biederman
2006-02-23 12:02           ` Kir Kolyshkin
2006-02-23 13:25             ` Eric W. Biederman
2006-02-23 14:00               ` Kir Kolyshkin
2006-02-24 21:44 ` Eric W. Biederman
2006-02-24 23:01   ` Herbert Poetzl
2006-02-27 17:42   ` Dave Hansen
2006-02-27 21:14     ` Eric W. Biederman
2006-02-27 21:35       ` Dave Hansen
2006-02-27 21:56         ` Eric W. Biederman
2006-03-04  3:17       ` sysctls inside containers Dave Hansen
2006-03-04 10:27         ` Eric W. Biederman
2006-03-06 16:27           ` Dave Hansen
2006-03-06 17:08             ` Herbert Poetzl
2006-03-06 17:18               ` Dave Hansen
2006-03-06 18:56             ` Eric W. Biederman
2006-03-10 10:17         ` Kirill Korotaev
2006-03-10 13:22           ` Eric W. Biederman
2006-03-10 10:19         ` Kirill Korotaev
2006-03-10 11:55           ` Eric W. Biederman
2006-03-10 18:58           ` Dave Hansen
