From: Daniel Lezcano <dlezcano@fr.ibm.com>
To: James Morris <jmorris@namei.org>
Cc: hadi@cyberus.ca, Dmitry Mishin <dim@openvz.org>,
Stephen Hemminger <shemminger@osdl.org>,
netdev@vger.kernel.org
Subject: Re: Network virtualization/isolation
Date: Wed, 15 Nov 2006 10:56:55 +0100
Message-ID: <455AE467.2060308@fr.ibm.com>
In-Reply-To: <XMMS.LNX.4.64.0611141304330.22471@d.namei>
[-- Attachment #1: Type: text/plain, Size: 5002 bytes --]
James Morris wrote:
> On Tue, 14 Nov 2006, Daniel Lezcano wrote:
>
>> the attached document describes the network isolation at the layer 2 and at
>> the layer 3, it presents the pros and cons of the different approaches, their
>> common points and the impacted network code.
>> I hope it will be helpful :)
>
> What about other network subsystems: xfrm, netfilter, iptables, netlink,
> etc. ?
They are not addressed for the moment. Dmitry Mishin is looking at
netfilter isolation.
Netlink has 2 aspects:
* the communication between processes: because the message format
uses a pid as the destination, netlink will be addressed by the pid
virtualization
* the IP management: at layer 2, there is nothing to do because
data accesses are relative to the namespace. At layer 3, each case
has to be handled to check the IPs. Some work is already done for
ifaddr isolation.
Separately, Jong Choi has done some work on iptables isolation (see below).
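To illustrate the first aspect, here is a minimal sketch of where the pid appears in a netlink destination address. It uses the standard `struct sockaddr_nl` from `<linux/netlink.h>`; the helper name `netlink_dest()` is hypothetical and only serves the illustration. Strictly speaking `nl_pid` is a port id, which user-space programs commonly set to their process id, and that is why pid virtualization matters here.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: build a unicast netlink destination address.
 * The destination is identified by nl_pid, so two containers that
 * reuse the same pid values need the pid space to be virtualized. */
static struct sockaddr_nl netlink_dest(unsigned int pid)
{
    struct sockaddr_nl addr;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = pid;     /* 0 would address the kernel itself */
    addr.nl_groups = 0;    /* unicast: no multicast groups */
    return addr;
}
```

A sender would pass the returned address to sendto() or sendmsg() on an AF_NETLINK socket.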
Cheers.
-- Daniel
-------------------------------------------------------------------------------
Hi Rusty,
> > I'm currently looking at a container-based lightweight virtualization
> > technology in Linux which has recently been actively discussed on LKML
> > by IBMers (Hubertus Franke, Dave Hansen, Serge Hallyn, and Cedric Le
> > Goater) and by developers like Herbert Poetzl, Eric Biederman, and
> > Kirill Korotaev. One of the main issues is on the netfilter
> > virtualization (please refer to
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=114322107510852&w=2).
>
> Yes. Virtualization of iptables is fairly silly: you can crash the
> machine with careful insertion of bad rules. We do simple sanity
> checks, but they're not complete.
>
> Currently, that's OK, you have to be root to do these operations anyway.
> But it illustrates one problem with iptables virtualization which the
> OpenVZ people don't seem to have a grasp on 8(
As a first step, we started looking at a mechanism which can provide
a scalable way of implementing per-container tables and rules.
The safety issue you pointed out would be a next step to discuss,
should there be a need for more than the root-based scheme.
> Do you have a pointer to the alternative implementation?
We are currently working on a patch against a 2.6.16 kernel and should
be able to point to it later this week. Instead of pointing to an
incomplete patch now, let me describe the proposed mechanism in further
detail.
The OpenVZ patch can be considered intrusive for the following reasons:
1) It requires API changes to iptables modules such as filter, mangle and
nat, and also to existing out-of-tree or future iptables modules.
OpenVZ basically replaces the existing table data instance, such as
packet_filter of iptable_filter, so that each VPS has its own table data
instance in its "ve_struct" data structure. This might not be acceptable
because it changes the API / ABI.
2) It implements per-VPS ipt_tables, ipt_target, and ipt_match linked
lists in its "ve_struct" data structure. Rules and tables are purely
local in the sense that each VPS has its own set of chains, tables,
matches and targets, and even the HW node cannot observe another VPS'
iptables. It does not seem desirable that the HW node cannot be given
system-wide visibility and management functionality, because of the
manageability issue and, more importantly, the safety issue you pointed
out.
On the other hand, the Vserver approach of not providing special isolation
would cause a scalability issue when iptables rules need to be set up for
hundreds of vservers.
The following diagram illustrates the data structure of the proposed
virtualization scheme:
**** Look at the attached file ****
This scheme can be considered as adding an additional indirection layer at
the private field of struct ipt_table instead of having a per-container
nf_hooks array, and has the following potential advantages:
1) The "private" pointer seems a very natural place to implement this
indirection, and the changes needed for it are very small compared to the
OpenVZ approach.
2) The indirection is implemented in the "ip_tables" module, not in the
individual modules for tables, matches, and targets themselves. Because
the changes are confined to the ip_tables module, it won't change the
API / ABI and hence will keep the existing iptables module base.
3) The indirection at the private field of the "ipt_table" struct will
provide isolation for both paths, starting from nf_hooks[][] and from xt,
at the same time. The lengths of both of these paths are O(1), and the
length of each chain in a table is also O(1) wrt the number of containers.
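The indirection could be sketched roughly as follows. This is a simplified user-space illustration under stated assumptions, not the patch itself: `struct table`, `struct table_private`, `struct table_info`, `MAX_CONTAINERS` and `table_info_for()` are all hypothetical names, and `struct table` stands in for the real struct ipt_table with only a name and a private pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONTAINERS 4   /* hypothetical fixed limit, illustration only */

/* Hypothetical stand-in for one container's rules in a given table. */
struct table_info {
    unsigned int num_rules;
};

/* Hypothetical indirection layer: one slot per container. */
struct table_private {
    struct table_info *per_container[MAX_CONTAINERS];
};

/* Simplified stand-in for struct ipt_table. */
struct table {
    const char *name;
    void *priv;            /* models the "private" field */
};

/* Select the rules of one container through the private pointer.
 * The lookup is a single array index, so its cost does not grow
 * with the number of containers. */
static struct table_info *table_info_for(const struct table *t,
                                         unsigned int container_id)
{
    const struct table_private *p = t->priv;

    assert(p != NULL);
    if (container_id >= MAX_CONTAINERS)
        return NULL;
    return p->per_container[container_id];
}
```

In this sketch the table modules keep registering a single table object, and only the code that dereferences the private pointer needs to know about containers, which is the property point 2) relies on.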
Assuming root-based operation, I wonder whether it would be a workable
approach to add a command line option that lets root specify which
container to work on.
> Thanks!
> Rusty.
Thanks for your comments. I will send you the pointer to the patch as
soon as possible.
- JH
[-- Attachment #2: 2006-04 --]
[-- Type: image/gif, Size: 46409 bytes --]