Re: [patch 2/6] [Network namespace] Network device sharing by view

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: ebiederm@xmission.com (Eric W. Biederman)
To: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>, Daniel Lezcano <dlezcano@fr.ibm.com>,
	Andrey Savochkin <saw@swsoft.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	serue@us.ibm.com, haveblue@us.ibm.com, clg@fr.ibm.com,
	Andrew Morton <akpm@osdl.org>,
	devel@openvz.org, sam@vilain.net, viro@ftp.linux.org.uk,
	Alexey Kuznetsov <alexey@sw.ru>
Subject: Re: [patch 2/6] [Network namespace] Network device sharing by view
Date: Wed, 28 Jun 2006 10:10:57 -0600	[thread overview]
Message-ID: <m1wtb1fege.fsf@ebiederm.dsl.xmission.com> (raw)
In-Reply-To: <20060628141148.GA5572@MAIL.13thfloor.at> (Herbert Poetzl's message of "Wed, 28 Jun 2006 16:11:48 +0200")

Herbert Poetzl <herbert@13thfloor.at> writes:

>> Have a few more network interfaces for a layer 2 solution
>> is fundamental.  Believing without proof and after arguments
>> to the contrary that you have not contradicted that a layer 2
>> solution is inherently slower is non-productive.  
>
> assuming that it will not be slower, although it
> will now pass two network stacks and the bridging
> code is non-productive too, let's see how it goes
> but do not ignore the overhead just because it
> might simplify the implementation ...

Sure.  Mostly I have set it aside because the overhead
is not horrible and it is a very specific case that
can be heavily optimized if the core infrastructure is
solid.

>> Arguing that a layer 2 only solution most prove itself on 
>> guest to guest communication is also non-productive.
>> 
>> So just to sink one additional nail in the coffin of the silly
>> guest to guest communication issue.  For any two guests where
>> fast communication between them is really important I can run
>> an additional interface pair that requires no routing or bridging.
>> Given that the implementation of the tunnel device is essentially
>> the same as the loopback interface and that I make only one
>> trip through the network stack there will be no performance overhead.
>
> that is a good argument and I think I'm perfectly
> fine with this, given that the implementation 
> allows that (i.e. the network stack can handle
> two interfaces with the same IP assigned and will
> choose the local interface over the remote one
> when the traffic will be between guests)

Yep.  That exists today.  The network stack prefers routes
as specific as possible.

>> Similarly for any critical guest communication to the outside world
>> I can give the guest a real network adapter.
>
> with a single MAC assigned, that is, I presume?

Yes.

>
> guess that's what this discussion is about,
> finding out the various aspects how isolation
> and/or vitrtualization can be accomplished and
> what features we consider common/useful enough
> for mainline ... for me that is still in the
> brainstorming phase, although several 'working
> prototypes' already exist. IMHO the next step
> is to collect a set of representative use cases
> and test them with each implementation, regarding
> performance, usability and practicability

I am fairly strongly convinced a layer 2 solution will
do fine.  So for me it is a matter of proving that
and ensuring a good implementation.

> not necessarily, but I _know_ that the overhead
> added at layer 3 is unmeasureable, and it still
> needs to be proven that this is true for a layer
> 2 solution (which I'd actually prefer, because
> it solves the protocol _and_ setup issues)

That is a good perspective.  Layer 3 is free, is layer 2 also free?
Unless the cache miss penalty is a killer layer 2 should come very
close.  Of course VJ recently gave some evidence that packet processing
is dominated by cache misses.

>> >From what I have seen of layer 3 solutions it is a 
>> bloody maintenance nightmare, and an inflexible mess.
>
> that is your opinion, I really doubt that you
> will have less maintenance when you apply policy
> to the guests ...

Yes and mostly of the layer 3 things that I implemented.
At a moderately fundamental level I see layer 3 implementations
being a special case that is a tangent from the rest of the
networking code.  So I don't see a real synthesis with what
the rest of the networking stack is doing.  Plus all of the
limitations that come with a layer 3 implementation.

> example here (just to clarify):
>
>  - let's assume we have eth0 on the host and in
>    guest A and B, with the following setup:
>
>    eth0(H) 192.168.0.1/24
>    eth0(A) 10.0.1.1/16 10.0.1.2/16
>    eth0(B) 10.0.2.1/16
>
>  - now what keeps guest B from jsut assigning
>    10.0.2.2/16 to eth0? you need some kind of
>    mechanism to prevent that, and/or to block
>    the packets using inappropriate IPs
>
>  * in the first case, i.e. you prevent assigning
>    certain IPs inside a guest, you get a semantic
>    change in the behaviour compared to a normal
>    system, but there is no additional overhead
>    on the communication
>
>  * in the second case, you have to maintain the
>    policy mechanism and keep it in sync with the
>    guest configuration (somehow), and of course
>    you have to verify every communication
>
>  - OTOH, if you do not care about collisions
>    basically assuming the point "that's like
>    a hub on a network, if there are two guests
>    with the same ip, it will be trouble, but
>    that's okay" then this becomes a real issue
>    for providers with potentially 'evil' customers

So linux when serving as a router has strong filter capabilities.

So we can either use the strong network filtering linux already has
making work for the host administrator who has poorly behaved customers.
Or we can simply not give those poorly behaved guests CAP_NET_ADMIN,
and assign the IP address at guest startup before dropping the
capability.  At which point the guest cannot misbehave.

Eric

next prev parent reply	other threads:[~2006-06-28 16:12 UTC|newest]

Thread overview: 113+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-09 21:02 [RFC] [patch 0/6] [Network namespace] introduction dlezcano
2006-06-09 21:02 ` [RFC] [patch 1/6] [Network namespace] Network namespace structure dlezcano
2006-06-09 21:02 ` [RFC] [patch 2/6] [Network namespace] Network device sharing by view dlezcano
2006-06-11 10:18   ` Andrew Morton
2006-06-18 18:53   ` Al Viro
2006-06-26  9:47   ` Andrey Savochkin
2006-06-26 13:02     ` Herbert Poetzl
2006-06-26 14:05       ` Eric W. Biederman
2006-06-26 14:08       ` Andrey Savochkin
2006-06-26 18:28         ` Herbert Poetzl
2006-06-26 18:59           ` Eric W. Biederman
2006-06-26 14:56     ` Daniel Lezcano
2006-06-26 15:21       ` Eric W. Biederman
2006-06-26 15:27       ` Andrey Savochkin
2006-06-26 15:49         ` Daniel Lezcano
2006-06-26 16:40           ` Eric W. Biederman
2006-06-26 18:36             ` Herbert Poetzl
2006-06-26 19:35               ` Eric W. Biederman
2006-06-26 20:02                 ` Herbert Poetzl
2006-06-26 20:37                   ` Eric W. Biederman
2006-06-26 21:26                     ` Herbert Poetzl
2006-06-26 21:59                       ` Ben Greear
2006-06-26 22:11                       ` Eric W. Biederman
2006-06-27  9:09                   ` Andrey Savochkin
2006-06-27 15:48                     ` Herbert Poetzl
2006-06-27 16:19                       ` Andrey Savochkin
2006-06-27 16:40                       ` Eric W. Biederman
2006-06-26 22:13                 ` Ben Greear
2006-06-26 22:54                   ` Herbert Poetzl
2006-06-26 23:08                     ` Ben Greear
2006-06-27 16:07                       ` Ben Greear
2006-06-27 22:48                         ` Herbert Poetzl
2006-06-27  9:11           ` Andrey Savochkin
2006-06-27  9:34             ` Daniel Lezcano
2006-06-27  9:38               ` Andrey Savochkin
2006-06-27 11:21                 ` Daniel Lezcano
2006-06-27 11:52                   ` Eric W. Biederman
2006-06-27 16:02                     ` Herbert Poetzl
2006-06-27 16:47                       ` Eric W. Biederman
2006-06-27 17:19                         ` Ben Greear
2006-06-27 22:52                           ` Herbert Poetzl
2006-06-27 23:12                             ` Dave Hansen
2006-06-27 23:42                               ` Alexey Kuznetsov
2006-06-28  3:38                                 ` Eric W. Biederman
2006-06-28 13:36                                   ` Herbert Poetzl
2006-06-28 13:53                                     ` jamal
2006-06-28 14:19                                       ` Andrey Savochkin
2006-06-28 16:17                                         ` jamal
2006-06-28 16:58                                           ` Andrey Savochkin
2006-06-28 17:17                                           ` Eric W. Biederman
2006-06-28 17:04                                         ` Herbert Poetzl
2006-06-28 14:39                                       ` Eric W. Biederman
2006-06-30  1:41                                         ` Sam Vilain
2006-06-29 21:07                                       ` Sam Vilain
2006-06-29 22:14                                         ` strict isolation of net interfaces Cedric Le Goater
2006-06-30  2:39                                           ` Serge E. Hallyn
2006-06-30  2:49                                             ` Sam Vilain
2006-07-03 14:53                                               ` Andrey Savochkin
2006-07-04  3:00                                                 ` Sam Vilain
2006-07-04 12:29                                                 ` Daniel Lezcano
2006-07-04 13:13                                                   ` Sam Vilain
2006-07-04 13:19                                                     ` Daniel Lezcano
2006-06-30  8:56                                             ` Cedric Le Goater
2006-07-03 13:36                                               ` Herbert Poetzl
2006-06-30 12:23                                             ` Daniel Lezcano
2006-06-30 14:20                                               ` Eric W. Biederman
2006-06-30 15:22                                                 ` Daniel Lezcano
2006-06-30 17:58                                                   ` Eric W. Biederman
2006-06-30 16:14                                                 ` Serge E. Hallyn
2006-06-30 17:41                                                   ` Eric W. Biederman
2006-06-30 18:09                                               ` Eric W. Biederman
2006-06-30  0:15                                         ` [patch 2/6] [Network namespace] Network device sharing by view jamal
2006-06-30  3:35                                           ` Herbert Poetzl
2006-06-30  7:45                                           ` Andrey Savochkin
2006-06-30 13:50                                             ` jamal
2006-06-30 15:01                                               ` Andrey Savochkin
2006-06-30 18:22                                               ` Eric W. Biederman
2006-06-30 21:51                                                 ` jamal
2006-07-01  0:50                                                   ` Eric W. Biederman
2006-06-28 14:21                                     ` Eric W. Biederman
2006-06-28 14:51                               ` Eric W. Biederman
2006-06-27 16:49                       ` Alexey Kuznetsov
2006-06-27 11:55                   ` Andrey Savochkin
2006-06-27  9:54               ` Kirill Korotaev
2006-06-27 16:09                 ` Herbert Poetzl
2006-06-27 16:29                   ` Eric W. Biederman
2006-06-27 23:07                     ` Herbert Poetzl
2006-06-28  4:07                       ` Eric W. Biederman
2006-06-28  6:31                         ` Sam Vilain
2006-06-28 14:15                           ` Herbert Poetzl
2006-06-28 15:36                             ` Eric W. Biederman
2006-06-28 17:18                               ` Herbert Poetzl
2006-06-28 10:14                         ` Cedric Le Goater
2006-06-28 14:11                         ` Herbert Poetzl
2006-06-28 16:10                           ` Eric W. Biederman [this message]
2006-07-06  9:45               ` Routing tables (Re: [patch 2/6] [Network namespace] Network device sharing by view) Kari Hurtta
2006-06-09 21:02 ` [RFC] [patch 3/6] [Network namespace] Network devices isolation dlezcano
2006-06-18 18:57   ` Al Viro
2006-06-09 21:02 ` [RFC] [patch 4/6] [Network namespace] Network inet " dlezcano
2006-06-09 21:02 ` [RFC] [patch 5/6] [Network namespace] ipv4 isolation dlezcano
2006-06-10  0:23   ` James Morris
2006-06-10  0:27     ` Rick Jones
2006-06-10  0:47       ` James Morris
2006-06-09 21:02 ` [RFC] [patch 6/6] [Network namespace] Network namespace debugfs dlezcano
2006-06-10  7:16 ` [RFC] [patch 0/6] [Network namespace] introduction Kari Hurtta
2006-06-16  4:23 ` Eric W. Biederman
2006-06-16  9:06   ` Daniel Lezcano
2006-06-16  9:22     ` Eric W. Biederman
2006-06-18 18:47 ` Al Viro
2006-06-20 21:21   ` Daniel Lezcano
2006-06-20 21:25     ` Al Viro
2006-06-20 22:45       ` Daniel Lezcano
2006-06-26 23:38 ` Patrick McHardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1wtb1fege.fsf@ebiederm.dsl.xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@osdl.org \
    --cc=alexey@sw.ru \
    --cc=clg@fr.ibm.com \
    --cc=dev@sw.ru \
    --cc=devel@openvz.org \
    --cc=dlezcano@fr.ibm.com \
    --cc=haveblue@us.ibm.com \
    --cc=herbert@13thfloor.at \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sam@vilain.net \
    --cc=saw@swsoft.com \
    --cc=serue@us.ibm.com \
    --cc=viro@ftp.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).