From: ebiederm@xmission.com (Eric W. Biederman)
To: "Caitlin Bestler" <caitlinb@broadcom.com>
Cc: "Kir Kolyshkin" <kir@openvz.org>,
devel@openvz.org, "Andrey Savochkin" <saw@sw.ru>,
alexey@sw.ru, "Linux Containers" <containers@lists.osdl.org>,
netdev@vger.kernel.org, sam@vilain.net
Subject: Re: [RFC] network namespaces
Date: Wed, 06 Sep 2006 17:25:50 -0600
Message-ID: <m1irk0tw5d.fsf@ebiederm.dsl.xmission.com>
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1963B59@NT-SJCA-0751.brcm.ad.broadcom.com> (Caitlin Bestler's message of "Wed, 6 Sep 2006 16:06:16 -0700")
"Caitlin Bestler" <caitlinb@broadcom.com> writes:
> ebiederm@xmission.com wrote:
>
>>
>>> Finally, as I understand both network isolation and network
>>> virtualization (both level2 and level3) can happily co-exist. We do
>>> have several filesystems in kernel. Let's have several network
>>> virtualization approaches, and let a user choose. Is that makes
>>> sense?
>>
>> If there are not compelling arguments for using both ways of
>> doing it is silly to merge both, as it is more maintenance overhead.
>>
>
> My reading is that full virtualization (Xen, etc.) calls for
> implementing L2 switching between the partitions and the physical
> NIC(s).
>
> The tradeoffs between L2 and L3 switching are indeed complex, but
> there are two implications of doing L2 switching between partitions:
>
> 1) Do we really want to ask device drivers to support L2 switching for
> partitions and something *different* for containers?
No.
> 2) Do we really want any single packet to traverse an L2 switch (for
> the partition-style virtualization layer) and then an L3 switch
> (for the container-style layer)?
In general, what has been done with layer 3 is simply to filter which
processes can use which IP addresses, and it all happens at socket
creation time. So it is very cheap, and it can be done purely
in the network layer without any driver intervention.
Basically think of what is happening at layer 3 as an extremely light-weight
version of traffic filtering.
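
To make the layer 3 idea concrete, here is a minimal user-space sketch
in Python. It is not kernel code and not taken from any of the posted
patches; the names Container and may_bind are hypothetical, and
wildcard addresses are deliberately left out. It only illustrates that
the check is a set lookup done once when a socket is set up, with no
per-packet work:

```python
import ipaddress


class Container:
    """Hypothetical container: a name plus the IP addresses it may use."""

    def __init__(self, name, allowed_ips):
        self.name = name
        # Normalize to address objects so textual variants compare equal.
        self.allowed = {ipaddress.ip_address(ip) for ip in allowed_ips}


def may_bind(container, addr):
    """Return True if the container may use addr as a local address.

    The check runs once, at socket setup; packets are never inspected.
    """
    return ipaddress.ip_address(addr) in container.allowed


ct1 = Container("ct1", ["10.0.0.2", "2001:db8::2"])
print(may_bind(ct1, "10.0.0.2"))   # address assigned to ct1
print(may_bind(ct1, "10.0.0.3"))   # address not assigned to ct1
```

Because the decision is made before any traffic flows, the device
driver never needs to know that containers exist.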
> The full virtualization solution calls for virtual NICs with distinct
> MAC addresses. Is there any reason why this same solution cannot work
> for containers (just creating more than one VNIC for the partition,
> and then assigning each VNIC to a container?)
The VNIC approach is the fundamental idea with the layer two networking,
and if we can push the work down into the device driver so that different
destination MACs show up in different packet queues, it should be
as fast as a normal networking stack.
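
As a rough illustration of that demultiplexing step, here is a small
user-space model in Python. It is a sketch under stated assumptions,
not driver code: VnicDemux, add_vnic and receive are hypothetical
names, and a real driver or NIC would do this lookup in hardware or in
its receive path. The only fact it relies on is that the destination
MAC is the first six bytes of an Ethernet frame:

```python
from collections import deque


def dst_mac(frame):
    """Return the destination MAC (first 6 bytes) as aa:bb:cc:dd:ee:ff."""
    return ":".join("%02x" % b for b in frame[:6])


class VnicDemux:
    """Hypothetical demultiplexer: one receive queue per VNIC MAC."""

    def __init__(self):
        self.queues = {}  # MAC string -> deque of frames

    def add_vnic(self, mac):
        self.queues[mac.lower()] = deque()

    def receive(self, frame):
        """Queue the frame on the matching VNIC; return False if dropped."""
        queue = self.queues.get(dst_mac(frame))
        if queue is None:
            return False  # no VNIC owns this destination MAC
        queue.append(frame)
        return True


demux = VnicDemux()
demux.add_vnic("02:00:00:00:00:01")  # e.g. the VNIC given to container 1
demux.add_vnic("02:00:00:00:00:02")  # e.g. the VNIC given to container 2

# Destination MAC, source MAC, EtherType (IPv4), then an opaque payload.
frame = (bytes.fromhex("020000000001") + bytes.fromhex("0200000000ff")
         + b"\x08\x00" + b"payload")
demux.receive(frame)
```

Each container then only ever looks at its own queue, which is why the
per-packet cost can stay close to that of a normal stack.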
Implementing VNICs so far is the only piece of containers that has
come close to device drivers, and we can likely do it without device
driver support (but with more cost). Basically this optimization
is a subset of the Grand Unified Lookup idea.
I think we can do a mergeable implementation without noticeable cost
when not using containers, and without having to resort to a grand
unified lookup, but I may be wrong.
Eric