netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Andrey Savochkin <saw@sw.ru>
Cc: dlezcano@fr.ibm.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, serue@us.ibm.com, haveblue@us.ibm.com,
	clg@fr.ibm.com, Andrew Morton <akpm@osdl.org>,
	dev@sw.ru, devel@openvz.org, sam@vilain.net,
	viro@ftp.linux.org.uk
Subject: Re: [patch 2/6] [Network namespace] Network device sharing by view
Date: Mon, 26 Jun 2006 08:05:24 -0600	[thread overview]
Message-ID: <m1ac80vwpn.fsf@ebiederm.dsl.xmission.com> (raw)
In-Reply-To: <20060626130203.GA504@MAIL.13thfloor.at> (Herbert Poetzl's message of "Mon, 26 Jun 2006 15:02:03 +0200")

Herbert Poetzl <herbert@13thfloor.at> writes:

> On Mon, Jun 26, 2006 at 01:47:11PM +0400, Andrey Savochkin wrote:
>> Hi Daniel,
>> 
>> It's good that you kicked off network namespace discussion Although I.
>> wish you'd Cc'ed someone at OpenVZ so I could notice it earlier :)   .
>
>> Indeed, the first point to agree in this discussion is device list. 
>> In your patch, you essentially introduce a data structure parallel
>> to the main device list, creating a "view" of this list. 
>
>> I see a fundamental problem with this approach. When a device presents
>> an skb to the protocol layer, it needs to know to which namespace this
>> skb belongs.
>
>> Otherwise you would never get rid of problems with bind: what to do if
>> device eth1 is visible in namespace1, namespace2, and root namespace,
>> and each namespace has a socket bound to 0.0.0.0:80?
>
> this is something which isn't a fundamental problem at
> all, and IMHO there are at least three options here
> (probably more)

I agree that there are other implementations that can be used for
containers.  However when you think namespaces this is what you need.

For several reasons.
1) So you can use AF_PACKET safely.
   This allows a network namespace to use DHCP and all of the other
   usual network autoconfiguration tools.  0.0.0.0:80 is just
   a special subset of that.

2) It means the existing network stack can be used without
   logic changes.  All that is needed is a lookup of the appropriate
   context.  This is very straight forward to audit.

3) Since all of the network stack is trivially available all of
   the advanced network stack features like iptables are easily
   available.

4) There is no retraining or other rules for user to learn.
   Because people understand what is going on it is more likely
   a setup will be secure.  Most of the other implementations
   don't quite act like a normal network setup and the special
   rules can be hard to learn.

>  - check at 'bind' time if the binding would overlap
>    and give the 'proper' error (as it happens right
>    now on the host)
>    (this is how Linux-VServer currently handles the
>    network isolation, and yes, it works quite fine :)

It works yes but it limits you to a subset of the network
stack.   And has serious problems with concepts like INADDR_ANY.
PF_PACKET is not an option.

>  - allow arbitrary binds and 'tag' the packets according
>    to some 'host' policy (e.g. iptables or tc)
>    (this is how the Linux-VServer ngnet was designed)

A little more general but very weird.

>  - deliver packets to _all_ bound sockets/destinations
>    (this is probably a more unusable but quite thinkable
>    solution)
>
>> We have to conclude that each device should be visible only in one
>> namespace. 
>
> I disagree here, especially some supervisor context or
> the host context should be able to 'see' and probably
> manipulate _all_ of the devices

This part really is necessary.  This does not preclude managing
a network namespace from outside of the namespace.

>> In this case, instead of introducing net_ns_dev and net_ns_dev_list
>> structures, we can simply have a separate dev_base list head in each
>> namespace. Moreover, separate device list in each namespace will be in
>> line with making namespace isolation complete. 
>
>> Complete isolation will allow each namespace to set up own tun/tap
>> devices, have own routes, netfilter tables, and so on.
>
> tun/tap devices are quite possible with this approach
> too, I see no problem here ...
>
> for iptables and routes, I'm worried about the required
> 'policy' to make them secure, i.e. how do you ensure
> that the packets 'leaving' guest X do not contain
> 'evil' packets and/or disrupt your host system?

In the traditional ways.  When you control the router and/or the switch
someone is directly connected to.

We don't need to reinvent the wheel if we do this properly.

>> This patchset introduces namespaces for device list and IPv4
>> FIB/routing. Two technical issues are omitted to make the patch idea
>> clearer: device moving between namespaces, and selective routing cache
>> flush + garbage collection.
>>
>> If this patchset is agreeable, the next patchset will finalize
>> integration with nsproxy, add namespaces to socket lookup code and
>> neighbour cache, and introduce a simple device to pass traffic between
>> namespaces.
>
> passing traffic 'between' namespaces should happen via
> lo, no? what kind of 'device' is required there, and
> what overhead does it add to the networking?

Definitely not.  lo is a local loopback interface.

What is needed is a two headed device that is the cousin of lo.
But with one network interface in each network namespace.

Note even connecting network namespaces is optional.

Eric

  reply	other threads:[~2006-06-26 14:06 UTC|newest]

Thread overview: 113+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-06-09 21:02 [RFC] [patch 0/6] [Network namespace] introduction dlezcano
2006-06-09 21:02 ` [RFC] [patch 1/6] [Network namespace] Network namespace structure dlezcano
2006-06-09 21:02 ` [RFC] [patch 2/6] [Network namespace] Network device sharing by view dlezcano
2006-06-11 10:18   ` Andrew Morton
2006-06-18 18:53   ` Al Viro
2006-06-26  9:47   ` Andrey Savochkin
2006-06-26 13:02     ` Herbert Poetzl
2006-06-26 14:05       ` Eric W. Biederman [this message]
2006-06-26 14:08       ` Andrey Savochkin
2006-06-26 18:28         ` Herbert Poetzl
2006-06-26 18:59           ` Eric W. Biederman
2006-06-26 14:56     ` Daniel Lezcano
2006-06-26 15:21       ` Eric W. Biederman
2006-06-26 15:27       ` Andrey Savochkin
2006-06-26 15:49         ` Daniel Lezcano
2006-06-26 16:40           ` Eric W. Biederman
2006-06-26 18:36             ` Herbert Poetzl
2006-06-26 19:35               ` Eric W. Biederman
2006-06-26 20:02                 ` Herbert Poetzl
2006-06-26 20:37                   ` Eric W. Biederman
2006-06-26 21:26                     ` Herbert Poetzl
2006-06-26 21:59                       ` Ben Greear
2006-06-26 22:11                       ` Eric W. Biederman
2006-06-27  9:09                   ` Andrey Savochkin
2006-06-27 15:48                     ` Herbert Poetzl
2006-06-27 16:19                       ` Andrey Savochkin
2006-06-27 16:40                       ` Eric W. Biederman
2006-06-26 22:13                 ` Ben Greear
2006-06-26 22:54                   ` Herbert Poetzl
2006-06-26 23:08                     ` Ben Greear
2006-06-27 16:07                       ` Ben Greear
2006-06-27 22:48                         ` Herbert Poetzl
2006-06-27  9:11           ` Andrey Savochkin
2006-06-27  9:34             ` Daniel Lezcano
2006-06-27  9:38               ` Andrey Savochkin
2006-06-27 11:21                 ` Daniel Lezcano
2006-06-27 11:52                   ` Eric W. Biederman
2006-06-27 16:02                     ` Herbert Poetzl
2006-06-27 16:47                       ` Eric W. Biederman
2006-06-27 17:19                         ` Ben Greear
2006-06-27 22:52                           ` Herbert Poetzl
2006-06-27 23:12                             ` Dave Hansen
2006-06-27 23:42                               ` Alexey Kuznetsov
2006-06-28  3:38                                 ` Eric W. Biederman
2006-06-28 13:36                                   ` Herbert Poetzl
2006-06-28 13:53                                     ` jamal
2006-06-28 14:19                                       ` Andrey Savochkin
2006-06-28 16:17                                         ` jamal
2006-06-28 16:58                                           ` Andrey Savochkin
2006-06-28 17:17                                           ` Eric W. Biederman
2006-06-28 17:04                                         ` Herbert Poetzl
2006-06-28 14:39                                       ` Eric W. Biederman
2006-06-30  1:41                                         ` Sam Vilain
2006-06-29 21:07                                       ` Sam Vilain
2006-06-29 22:14                                         ` strict isolation of net interfaces Cedric Le Goater
2006-06-30  2:39                                           ` Serge E. Hallyn
2006-06-30  2:49                                             ` Sam Vilain
2006-07-03 14:53                                               ` Andrey Savochkin
2006-07-04  3:00                                                 ` Sam Vilain
2006-07-04 12:29                                                 ` Daniel Lezcano
2006-07-04 13:13                                                   ` Sam Vilain
2006-07-04 13:19                                                     ` Daniel Lezcano
2006-06-30  8:56                                             ` Cedric Le Goater
2006-07-03 13:36                                               ` Herbert Poetzl
2006-06-30 12:23                                             ` Daniel Lezcano
2006-06-30 14:20                                               ` Eric W. Biederman
2006-06-30 15:22                                                 ` Daniel Lezcano
2006-06-30 17:58                                                   ` Eric W. Biederman
2006-06-30 16:14                                                 ` Serge E. Hallyn
2006-06-30 17:41                                                   ` Eric W. Biederman
2006-06-30 18:09                                               ` Eric W. Biederman
2006-06-30  0:15                                         ` [patch 2/6] [Network namespace] Network device sharing by view jamal
2006-06-30  3:35                                           ` Herbert Poetzl
2006-06-30  7:45                                           ` Andrey Savochkin
2006-06-30 13:50                                             ` jamal
2006-06-30 15:01                                               ` Andrey Savochkin
2006-06-30 18:22                                               ` Eric W. Biederman
2006-06-30 21:51                                                 ` jamal
2006-07-01  0:50                                                   ` Eric W. Biederman
2006-06-28 14:21                                     ` Eric W. Biederman
2006-06-28 14:51                               ` Eric W. Biederman
2006-06-27 16:49                       ` Alexey Kuznetsov
2006-06-27 11:55                   ` Andrey Savochkin
2006-06-27  9:54               ` Kirill Korotaev
2006-06-27 16:09                 ` Herbert Poetzl
2006-06-27 16:29                   ` Eric W. Biederman
2006-06-27 23:07                     ` Herbert Poetzl
2006-06-28  4:07                       ` Eric W. Biederman
2006-06-28  6:31                         ` Sam Vilain
2006-06-28 14:15                           ` Herbert Poetzl
2006-06-28 15:36                             ` Eric W. Biederman
2006-06-28 17:18                               ` Herbert Poetzl
2006-06-28 10:14                         ` Cedric Le Goater
2006-06-28 14:11                         ` Herbert Poetzl
2006-06-28 16:10                           ` Eric W. Biederman
2006-07-06  9:45               ` Routing tables (Re: [patch 2/6] [Network namespace] Network device sharing by view) Kari Hurtta
2006-06-09 21:02 ` [RFC] [patch 3/6] [Network namespace] Network devices isolation dlezcano
2006-06-18 18:57   ` Al Viro
2006-06-09 21:02 ` [RFC] [patch 4/6] [Network namespace] Network inet " dlezcano
2006-06-09 21:02 ` [RFC] [patch 5/6] [Network namespace] ipv4 isolation dlezcano
2006-06-10  0:23   ` James Morris
2006-06-10  0:27     ` Rick Jones
2006-06-10  0:47       ` James Morris
2006-06-09 21:02 ` [RFC] [patch 6/6] [Network namespace] Network namespace debugfs dlezcano
2006-06-10  7:16 ` [RFC] [patch 0/6] [Network namespace] introduction Kari Hurtta
2006-06-16  4:23 ` Eric W. Biederman
2006-06-16  9:06   ` Daniel Lezcano
2006-06-16  9:22     ` Eric W. Biederman
2006-06-18 18:47 ` Al Viro
2006-06-20 21:21   ` Daniel Lezcano
2006-06-20 21:25     ` Al Viro
2006-06-20 22:45       ` Daniel Lezcano
2006-06-26 23:38 ` Patrick McHardy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1ac80vwpn.fsf@ebiederm.dsl.xmission.com \
    --to=ebiederm@xmission.com \
    --cc=akpm@osdl.org \
    --cc=clg@fr.ibm.com \
    --cc=dev@sw.ru \
    --cc=devel@openvz.org \
    --cc=dlezcano@fr.ibm.com \
    --cc=haveblue@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sam@vilain.net \
    --cc=saw@sw.ru \
    --cc=serue@us.ibm.com \
    --cc=viro@ftp.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).