Network virtualization/isolation

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: jamal <hadi@cyberus.ca>
To: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: devel@openvz.org, "Eric W. Biederman" <ebiederm@xmission.com>,
	Linux Containers <containers@lists.osdl.org>,
	Stephen Hemminger <shemminger@osdl.org>,
	netdev@vger.kernel.org
Subject: Network virtualization/isolation
Date: Sun, 03 Dec 2006 09:13:18 -0500	[thread overview]
Message-ID: <1165155198.3517.86.camel@localhost> (raw)
In-Reply-To: <1165148762.3517.58.camel@localhost>

I have removed the Re: just to add some freshness to the discussion

So i read quickly the rest of the discussions. I was almost suprised to
find that i agree with Eric on a lot of opinions (we also agree that
vindaloo is good for you i guess);->
The two issues that stood out for me (in addition to what i already said
below):

1) the solution must ease the migration of containers; i didnt see
anything about migrating them to another host across a network, but i
assume that this is a given.
2) the socket level bind/accept filtering with multiple IPs. From
reading what Herbert has, it seems they have figured a clever way to
optimize this path albeit some challenges (speacial casing for raw
filters) etc.

I am wondering if one was to use the two level muxing of the socket
layer, how much more performance improvement the above scheme provides
for #2? 
Consider the case of L2 where by the time the packet hits the socket
layer on incoming, the VE is already known; in such a case, the lookup
would be very cheap. The advantage being you get rid of the speacial
casing altogether. I dont see any issues with binds per multiple IPs etc
using such a technique.

For the case of #1 above, wouldnt it be also easier if the tables for
netdevices, PIDs etc were per VE (using the 2 level mux)?


In any case, folks, i hope i am not treading on anyones toes; i know
each one of you has implemented and has users and i am trying to be as 
neutral as i can (but clearly biased;->).

cheers,
jamal



On Sun, 2006-03-12 at 07:26 -0500, jamal wrote:
> On Wed, 2006-14-11 at 16:17 +0100, Daniel Lezcano wrote:
> > The attached document describes the network isolation at the layer 2
> > and at the layer 3 ..
> 
> Daniel,
> 
> I apologize for taking this long to get back to you. The document (I
> hope) made it clear to me at least the difference between the two
> approaches. So thanks for taking the time to put it together.
> 
> So here are my thoughts ...
> I havent read the rest of the thread so i may be repeating some of the
> discussion; i have time today, I will try to catchup with the
> discussion.
> 
> * i think the L2 approach is the more complete of the two approaches:
> 
> It caters to more applications: eg i can have network elements such as
> virtual bridges and routers. It doesnt seem like i can do that with the
> L3 approach. I think this in itself is a powerful enough reason to
> disqualify the L3 approach.
> 
> Leading from the above, I dont have to make _a single line of code
> change_ to any of the network element management tools inside the
> container. i.e i can just run quagga and OSPF and BGP will work as is or
> the bridge daemon and STP will work as is or tc to control "real"
> devices or ip to control "real" ip addresses. Virtual routers and
> bridges are real world applications (if you want more info ask me or ask
> google, she knows).
> 
> **** This wasnt clear to me from the doc on the L3 side of things, so
> please correct me: 
> because of the pid virtualization in the L2 approach(openvz?) I can run
> all applications as is. They just dont know they are running on a
> virtual environment. To use an extreme example: if i picked apache as a
> binary compiled 10 years ago, it will run on the L2 approach but not on
> the L3 approach. Is this understanding correct?  I find it hard to
> believe that the L3 approach wouldnt work this way - it may be just my
> reading into the doc.
> 
> So lets say the approach taken is that of L2 (I am biased towards this
> because i want to be able to do virtual bridges and routers). The
> disadvantage of the L2 approach (or is it just the openvz?) approach is:
> 
> - it is clear theres a lot more code needed to allow for the two level
> multiplexing every where. i.e first you mux to select the namespace then
> you do other things like find a pid, netdevice, ip address etc. I am
> also not sure how complete that code is; you clearly get everything
> attached to netdevices for free (eg networkc scheduler block) - which is
> nice in itself; but you may have to do the muxing code for other blocks.
> If my understanding is correct everything in the net subsystem has this
> mux levels already in place (at least with openvz)? I think each
> subsystem may have its own merit discussed (eg the L3 tables with the
> recent changes from Patrick allow up to 2^32 -1 tables, so a muxing
> layer at L3 maybe unnecessary)
> ---> To me this 2 level muxing looks like a clean design in that there
> is consistency (i.e no hack thats specific to just one sub-subsystem but
> not others). With this approach one could imagine hardware support that
> does the first level of muxing (selecting a namespace for you). This is
> clearly already happening with NICs supporting several unicast MAC
> addresses. 
> I think the litmus test for this approach is the answer to the question:
> If i compiled in the containers in and do not use the namespaces, how
> much more overhead is there for the host path? I would hope that it is
> as close to 0 as possible. It should certainly be 0 if i dont compile in
> containers.
> 
> - The desire for many MAC addresses. I dont think this is a killer
> issue. NICs are begining to show up which capabilities for many unicast
> MACs; many current have multicast hardware tables that can be used for
> stashing unicast MAC addresses; it has also been shown you can use
> multicast MAC addresses and get away with it if there is no conflict
> (protocols such as VRRP, CARP etc do this).
> 
> - Manageability from the host side. It seems to be more complex with the
> L2 than with L3. But so what? These tools are written from scratch and
> there is no "backward compatibility" baggage.
> 
> Ok, I am out of coffee for the last 10 minutes;-> But above sit my views
> worth about $0.02 Canadian (which is about $0.02 US these days).
> 
> I will try later to catch up with the discussion that started from
> Daniels original posting.
> 
> cheers,
> jamal

next prev parent reply	other threads:[~2006-12-03 14:13 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-25 15:51 Network virtualization/isolation Daniel Lezcano
2006-10-23 20:01 ` Stephen Hemminger
2006-10-26  9:44   ` Daniel Lezcano
2006-10-26 15:56     ` Stephen Hemminger
2006-10-26 22:16       ` Daniel Lezcano
2006-10-27  7:34       ` Dmitry Mishin
2006-10-27  9:10         ` Daniel Lezcano
2006-11-01 14:35           ` jamal
2006-11-01 16:13             ` Daniel Lezcano
2006-11-14 15:17             ` Daniel Lezcano
2006-11-14 18:12               ` James Morris
2006-11-15  9:56                 ` Daniel Lezcano
2006-11-22 12:00               ` Daniel Lezcano
2006-11-25  9:09               ` Eric W. Biederman
2006-11-28 14:15                 ` Daniel Lezcano
2006-11-28 16:51                   ` Eric W. Biederman
2006-11-28 17:37                     ` Herbert Poetzl
2006-11-28 20:26                     ` Daniel Lezcano
2006-11-28 21:50                       ` Eric W. Biederman
2006-11-29  5:54                         ` Herbert Poetzl
2006-11-29 20:21                         ` Brian Haley
2006-11-29 22:10                           ` [Devel] " Daniel Lezcano
2006-11-30 16:15                             ` Vlad Yasevich
2006-11-30 16:38                               ` Daniel Lezcano
2006-11-30 17:24                                 ` Herbert Poetzl
2006-12-03 12:26                             ` jamal
2006-12-03 14:13                               ` jamal [this message]
2006-12-03 16:00                                 ` Eric W. Biederman
2006-12-04 15:19                                   ` Dmitry Mishin
2006-12-04 15:45                                     ` Eric W. Biederman
2006-12-04 16:43                                     ` Herbert Poetzl
2006-12-04 16:58                                       ` Eric W. Biederman
2006-12-04 17:02                                       ` Dmitry Mishin
2006-12-04 17:19                                         ` Herbert Poetzl
2006-12-04 17:41                                         ` Daniel Lezcano
2006-12-04 12:15                                 ` Eric W. Biederman
2006-12-04 13:44                                   ` jamal
2006-12-04 15:35                                     ` Eric W. Biederman
2006-12-04 16:00                                       ` Dmitry Mishin
2006-12-04 16:52                                         ` Eric W. Biederman
2006-12-06 11:54                                           ` [Devel] " Kirill Korotaev
2006-12-06 18:30                                             ` Herbert Poetzl
2006-12-08 19:57                                               ` Eric W. Biederman
2006-12-09  3:50                                                 ` Herbert Poetzl
2006-12-09  6:13                                                   ` Andrew Morton
2006-12-09  6:35                                                     ` Herbert Poetzl
2006-12-09 21:18                                                       ` Dmitry Mishin
2006-12-09 22:34                                                       ` Kir Kolyshkin
2006-12-10  2:21                                                         ` Herbert Poetzl
2006-12-09  8:07                                                   ` Eric W. Biederman
2006-12-09 11:27                                                   ` Tomasz Torcz
2006-12-09 19:04                                                     ` Herbert Poetzl
2006-12-03 16:37                               ` Herbert Poetzl
2006-12-03 16:58                                 ` jamal
2006-12-04 10:18                               ` Daniel Lezcano
2006-12-04 13:22                                 ` jamal
2006-12-02 11:29                         ` Kari Hurtta
2006-12-02 11:49                           ` Kari Hurtta
2006-11-29  5:58                       ` Herbert Poetzl
2006-11-25  8:21             ` Eric W. Biederman
2006-11-26 18:34               ` Herbert Poetzl
2006-11-26 19:41                 ` Ben Greear
2006-11-26 20:52                 ` Eric W. Biederman
2006-11-25  8:27       ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2006-11-25 16:35 Leonid Grossman
2006-11-25 19:26 ` Eric W. Biederman
2006-11-25 22:17 Leonid Grossman
2006-11-25 23:16 ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1165155198.3517.86.camel@localhost \
    --to=hadi@cyberus.ca \
    --cc=containers@lists.osdl.org \
    --cc=devel@openvz.org \
    --cc=dlezcano@fr.ibm.com \
    --cc=ebiederm@xmission.com \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).