netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Lezcano <dlezcano@fr.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: hadi@cyberus.ca, Dmitry Mishin <dim@openvz.org>,
	Stephen Hemminger <shemminger@osdl.org>,
	netdev@vger.kernel.org,
	Linux Containers <containers@lists.osdl.org>
Subject: Re: Network virtualization/isolation
Date: Tue, 28 Nov 2006 15:15:26 +0100	[thread overview]
Message-ID: <456C447E.5090703@fr.ibm.com> (raw)
In-Reply-To: <m1y7pzoptr.fsf@ebiederm.dsl.xmission.com>

Eric W. Biederman wrote:

[ snip ]

>>
>> The packets arrive to the real device and go through the routes
>> engine. From this point, the used route is enough to know to which
>> container the traffic can go and the sockets subset assigned to the
>> container.
> 
> Note this has potentially the highest overhead of them all because
> this is the only approach in which it is mandatory to inspect the
> network packets to see which container they are in.

If the container is in the route information, when you use the route, 
you have the container destination with it. I don't see the overhead here.

> 
> My real problem with this approach besides seriously complicating
> the administration by not delegating it is that you loose enormous
> amounts of power.

I don't understand why you say administration is more complicated.
unshare -> ifconfig

1 container = 1 IP

[ snip ]

> So you have two columns that you rate these things that I disagree
> with, and you left out what the implications are for code maintenance.
> 
> 1) Network setup.
>    Past a certainly point both bind filtering and Daniel's L3 use a new
>    paradigm for managing the network code and become nearly impossible for
>    system administrators to understand.  The classic one is routing packets
>    between machines over the loopback interface by accident. Huh?

What is this new paradigm you are talking about ?

> 
> The L2. Network setup iss simply the cost of setting up a multiple
> machine network.  This is more complicated but it is well understood
> and well documented today.  Plus for the common cases it is easy to
> get a tool to automate this for you.  When you get a complicated
> network this wins hands down because the existing tools work and
> you don't have to retrain your sysadmins to understand what is
> happening.

unshare -> (guest) add mac address
            (host) add mac address
	   (guest) set ip address
            (host) set ip address
            (host) setup bridge

1 container = 2 net devices (root + guest), 2 IPs, 2 mac addresses, 1 
bridge.
100 containers = 200 net devices, 200 IPs, 200 mac addresses, 1 bridge.

> 
> 2) Runtime Overhead.
> 
> Your analysis is confused. Bind/Accept filter is much cheaper than
> doing a per packet evaluation in the route cache of which container
> it belongs to.  Among other things Bind/Accept filtering allows all
> of the global variables in the network stack to remain global and
> only touches a slow path.  So it is both very simple and very cheap.
> 
> Next in line comes L2 using real network devices, and Daniel's
> L3 thing.  Because there are multiple instances of the networking data
> structures we have an extra pointer indirection.

There is not extra networking data structure instantiation in the 
Daniel's L3.
> 
> Finally we get L2 with an extra network stack traversal, because
> we either need the full power of netfilter and traffic shaping
> gating access to what a node is doing or we simply don't have
> enough real network interfaces.  I assert that we can optimize
> the lack of network interfaces away by optimizing the drivers
> once this becomes an interesting case.
> 
> 3) Long Term Code Maintenance Overhead.
> 
> - A pure L2 implementation.  There is a big one time cost of
>   changing all of the variable accesses.  Once that transition
>   is complete things just work.  All code is shared so there
>   is no real overhead.
> 
> - Bind/Connect/Accept filtering.  There are so few places in
>   the code this is easy to maintain without sharing code with
>   everyone else.

For isolation too ? Can we build network migration on top of that ?

> 
> - Daniel's L3.  A big mass of special purpose code with peculiar
>   semantics that no one else in the network stack cares about
>   but is right in the middle of the code.

Thanks Eric for all your comments.

   -- Daniel

  reply	other threads:[~2006-11-28 14:15 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-25 15:51 Network virtualization/isolation Daniel Lezcano
2006-10-23 20:01 ` Stephen Hemminger
2006-10-26  9:44   ` Daniel Lezcano
2006-10-26 15:56     ` Stephen Hemminger
2006-10-26 22:16       ` Daniel Lezcano
2006-10-27  7:34       ` Dmitry Mishin
2006-10-27  9:10         ` Daniel Lezcano
2006-11-01 14:35           ` jamal
2006-11-01 16:13             ` Daniel Lezcano
2006-11-14 15:17             ` Daniel Lezcano
2006-11-14 18:12               ` James Morris
2006-11-15  9:56                 ` Daniel Lezcano
2006-11-22 12:00               ` Daniel Lezcano
2006-11-25  9:09               ` Eric W. Biederman
2006-11-28 14:15                 ` Daniel Lezcano [this message]
2006-11-28 16:51                   ` Eric W. Biederman
2006-11-28 17:37                     ` Herbert Poetzl
2006-11-28 20:26                     ` Daniel Lezcano
2006-11-28 21:50                       ` Eric W. Biederman
2006-11-29  5:54                         ` Herbert Poetzl
2006-11-29 20:21                         ` Brian Haley
2006-11-29 22:10                           ` [Devel] " Daniel Lezcano
2006-11-30 16:15                             ` Vlad Yasevich
2006-11-30 16:38                               ` Daniel Lezcano
2006-11-30 17:24                                 ` Herbert Poetzl
2006-12-03 12:26                             ` jamal
2006-12-03 14:13                               ` jamal
2006-12-03 16:00                                 ` Eric W. Biederman
2006-12-04 15:19                                   ` Dmitry Mishin
2006-12-04 15:45                                     ` Eric W. Biederman
2006-12-04 16:43                                     ` Herbert Poetzl
2006-12-04 16:58                                       ` Eric W. Biederman
2006-12-04 17:02                                       ` Dmitry Mishin
2006-12-04 17:19                                         ` Herbert Poetzl
2006-12-04 17:41                                         ` Daniel Lezcano
2006-12-04 12:15                                 ` Eric W. Biederman
2006-12-04 13:44                                   ` jamal
2006-12-04 15:35                                     ` Eric W. Biederman
2006-12-04 16:00                                       ` Dmitry Mishin
2006-12-04 16:52                                         ` Eric W. Biederman
2006-12-06 11:54                                           ` [Devel] " Kirill Korotaev
2006-12-06 18:30                                             ` Herbert Poetzl
2006-12-08 19:57                                               ` Eric W. Biederman
2006-12-09  3:50                                                 ` Herbert Poetzl
2006-12-09  6:13                                                   ` Andrew Morton
2006-12-09  6:35                                                     ` Herbert Poetzl
2006-12-09 21:18                                                       ` Dmitry Mishin
2006-12-09 22:34                                                       ` Kir Kolyshkin
2006-12-10  2:21                                                         ` Herbert Poetzl
2006-12-09  8:07                                                   ` Eric W. Biederman
2006-12-09 11:27                                                   ` Tomasz Torcz
2006-12-09 19:04                                                     ` Herbert Poetzl
2006-12-03 16:37                               ` Herbert Poetzl
2006-12-03 16:58                                 ` jamal
2006-12-04 10:18                               ` Daniel Lezcano
2006-12-04 13:22                                 ` jamal
2006-12-02 11:29                         ` Kari Hurtta
2006-12-02 11:49                           ` Kari Hurtta
2006-11-29  5:58                       ` Herbert Poetzl
2006-11-25  8:21             ` Eric W. Biederman
2006-11-26 18:34               ` Herbert Poetzl
2006-11-26 19:41                 ` Ben Greear
2006-11-26 20:52                 ` Eric W. Biederman
2006-11-25  8:27       ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2006-11-25 16:35 Leonid Grossman
2006-11-25 19:26 ` Eric W. Biederman
2006-11-25 22:17 Leonid Grossman
2006-11-25 23:16 ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=456C447E.5090703@fr.ibm.com \
    --to=dlezcano@fr.ibm.com \
    --cc=containers@lists.osdl.org \
    --cc=dim@openvz.org \
    --cc=ebiederm@xmission.com \
    --cc=hadi@cyberus.ca \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).