From: Daniel Lezcano <dlezcano@fr.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: hadi@cyberus.ca, Dmitry Mishin <dim@openvz.org>,
Stephen Hemminger <shemminger@osdl.org>,
netdev@vger.kernel.org,
Linux Containers <containers@lists.osdl.org>
Subject: Re: Network virtualization/isolation
Date: Tue, 28 Nov 2006 15:15:26 +0100
Message-ID: <456C447E.5090703@fr.ibm.com>
In-Reply-To: <m1y7pzoptr.fsf@ebiederm.dsl.xmission.com>
Eric W. Biederman wrote:
[ snip ]
>>
>> The packets arrive at the real device and go through the routing
>> engine. From this point, the route used is enough to know which
>> container the traffic can go to and which subset of sockets is
>> assigned to that container.
>
> Note this has potentially the highest overhead of them all because
> this is the only approach in which it is mandatory to inspect the
> network packets to see which container they are in.
If the container is in the route information, when you use the route,
you have the container destination with it. I don't see the overhead here.
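
To illustrate the point, here is a minimal sketch in Python, not the actual
patch. It assumes each route entry carries a container tag; the names Route,
lookup and ROUTES, and the addresses, are hypothetical. The lookup that has to
happen anyway returns the container along with the route, so no second
inspection of the packet is needed.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    container: str  # assumption: the route entry stores its container tag


def lookup(routes: List[Route], dst: str) -> Optional[Route]:
    """Longest-prefix match; the winning route already names the container."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen, default=None)


# Hypothetical table: one /32 route per container plus a default for the host.
ROUTES = [
    Route(ipaddress.ip_network("10.0.0.1/32"), "container-a"),
    Route(ipaddress.ip_network("10.0.0.2/32"), "container-b"),
    Route(ipaddress.ip_network("0.0.0.0/0"), "host"),
]

if __name__ == "__main__":
    route = lookup(ROUTES, "10.0.0.2")
    print(route.container if route else "no route")
```

A real route cache is of course not a linear scan; the sketch only shows that
the container falls out of the route that was selected.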
>
> My real problem with this approach, besides seriously complicating
> the administration by not delegating it, is that you lose enormous
> amounts of power.
I don't understand why you say administration is more complicated.
unshare -> ifconfig
1 container = 1 IP
[ snip ]
> So you have two columns that you rate these things that I disagree
> with, and you left out what the implications are for code maintenance.
>
> 1) Network setup.
> Past a certain point both bind filtering and Daniel's L3 use a new
> paradigm for managing the network code and become nearly impossible for
> system administrators to understand. The classic one is routing packets
> between machines over the loopback interface by accident. Huh?
What is this new paradigm you are talking about?
>
> The L2. Network setup is simply the cost of setting up a multiple
> machine network. This is more complicated but it is well understood
> and well documented today. Plus for the common cases it is easy to
> get a tool to automate this for you. When you get a complicated
> network this wins hands down because the existing tools work and
> you don't have to retrain your sysadmins to understand what is
> happening.
unshare -> (guest) add mac address
(host) add mac address
(guest) set ip address
(host) set ip address
(host) setup bridge
1 container = 2 net devices (root + guest), 2 IPs, 2 mac addresses, 1
bridge.
100 containers = 200 net devices, 200 IPs, 200 mac addresses, 1 bridge.
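
The arithmetic above can be written down as a small Python sketch. The
function and class names are hypothetical. The L2 figures are the ones listed
above; the L3 figures assume that "1 container = 1 IP" means no extra device,
MAC address or bridge per container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    net_devices: int
    ips: int
    macs: int
    bridges: int


def l2_footprint(containers: int) -> Footprint:
    """Per container: a host-side and a guest-side device, each with its
    own IP and MAC address; all containers share a single bridge."""
    n = 2 * containers
    return Footprint(net_devices=n, ips=n, macs=n,
                     bridges=1 if containers else 0)


def l3_footprint(containers: int) -> Footprint:
    """Assumption: one IP per container and nothing else to configure."""
    return Footprint(net_devices=0, ips=containers, macs=0, bridges=0)


if __name__ == "__main__":
    for count in (1, 100):
        print(count, l2_footprint(count), l3_footprint(count))
```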
>
> 2) Runtime Overhead.
>
> Your analysis is confused. Bind/Accept filter is much cheaper than
> doing a per packet evaluation in the route cache of which container
> it belongs to. Among other things Bind/Accept filtering allows all
> of the global variables in the network stack to remain global and
> only touches a slow path. So it is both very simple and very cheap.
>
> Next in line comes L2 using real network devices, and Daniel's
> L3 thing. Because there are multiple instances of the networking data
> structures we have an extra pointer indirection.
There is no extra networking data structure instantiation in
Daniel's L3.
>
> Finally we get L2 with an extra network stack traversal, because
> we either need the full power of netfilter and traffic shaping
> gating access to what a node is doing or we simply don't have
> enough real network interfaces. I assert that we can optimize
> the lack of network interfaces away by optimizing the drivers
> once this becomes an interesting case.
>
> 3) Long Term Code Maintenance Overhead.
>
> - A pure L2 implementation. There is a big one time cost of
> changing all of the variable accesses. Once that transition
> is complete things just work. All code is shared so there
> is no real overhead.
>
> - Bind/Connect/Accept filtering. There are so few places in
> the code this is easy to maintain without sharing code with
> everyone else.
For isolation too? Can we build network migration on top of that?
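
For reference, this is roughly what I understand by bind filtering, as a
Python sketch rather than anyone's actual code. The table ALLOWED, the
function may_bind and the addresses are hypothetical. The check runs once,
when a socket asks for a local address, and that is all it does: it restricts
which addresses a container may bind to and carries no per-container state
beyond the address set.

```python
import ipaddress
from typing import Dict, FrozenSet

# Hypothetical table: container name -> local addresses it may bind to.
ALLOWED: Dict[str, FrozenSet[ipaddress.IPv4Address]] = {
    "container-a": frozenset({ipaddress.ip_address("10.0.0.1")}),
    "container-b": frozenset({ipaddress.ip_address("10.0.0.2")}),
}


def may_bind(container: str, addr: str) -> bool:
    """Slow-path check at bind() time; unknown containers get nothing."""
    return ipaddress.ip_address(addr) in ALLOWED.get(container, frozenset())


if __name__ == "__main__":
    print(may_bind("container-a", "10.0.0.1"))
    print(may_bind("container-a", "10.0.0.2"))
```

The sketch ignores the wildcard address and everything outside bind(), which
is part of why I ask the question.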
>
> - Daniel's L3. A big mass of special purpose code with peculiar
> semantics that no one else in the network stack cares about
> but is right in the middle of the code.
Thanks Eric for all your comments.
-- Daniel