* [PATCH 0/6] netns: add linux-vrf features via network namespaces
@ 2008-10-30 13:05 Vivien Chappelier
       [not found] ` <4909B10A.8090403-L+G57L1VLRbR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Vivien Chappelier @ 2008-10-30 13:05 UTC (permalink / raw)
  To: containers-qjLDD68F18O7TbgM5vRIOg

[-- Attachment #1: Type: text/plain, Size: 2488 bytes --]

Hi,

    The recently introduced network namespaces allow separate standalone 
network stacks to coexist on the same machine. This is a very useful 
functionality that we have been needing and using in our products for 
some time, through the VRF patchset (http://linux-vrf.sourceforge.net/). 
The goals of the VRF patchset and network namespaces are very similar, 
yet some features of the VRF are missing that these patches intend to 
provide.

    The network namespaces are currently tied to a process, and 
referenced by its pid. However, a networking stack has no particular 
reason to be associated with any process and it should be possible to 
use and set up additional networking stacks without the need to clone() 
or unshare(). The initial version of the "Coexist with the sysfs 
limitations" patches by Benjamin Thery introduced the notion of a unique 
network namespace identifier (nsid) that is perfectly suited to the 
purpose of referencing networking stacks independently of any process. 
The first two patches of his set are therefore reused here to identify 
networking stacks.

    These patches additionally introduce the following features that 
were initially provided by the VRF patchset:
- the ability to move a socket to a different network namespace, through 
the new SO_NSID setsockopt(), given the nsid
- the ability to move a process to an existing network namespace, 
through the new SO_NETNS setsockopt(), given the nsid
- the ability to move an interface to a different namespace by nsid 
instead of pid
- the ability to create additional network namespaces on startup 
(dynamic addition/deletion is not supported but should be easy to add)
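
As a rough illustration of the first item, the sketch below shows how an
application might place a socket into a namespace by nsid. It is only a
sketch under stated assumptions: SO_NSID is proposed by this patchset and is
not in mainline headers, so the numeric value below is a placeholder, and the
helper name socket_in_nsid() and the rule that an nsid must be non-negative
are hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* HYPOTHETICAL: SO_NSID comes from the proposed patches, not from mainline.
 * The value here is a placeholder for illustration only. */
#ifndef SO_NSID
#define SO_NSID 0x4e53
#endif

/* Create a TCP socket and ask the kernel to move it into the network
 * namespace identified by nsid. Returns the fd, or -1 with errno set. */
int socket_in_nsid(int nsid)
{
	int fd, saved;

	if (nsid < 0) {		/* assumption: valid nsids are non-negative */
		errno = EINVAL;
		return -1;
	}

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_NSID, &nsid, sizeof(nsid)) < 0) {
		saved = errno;	/* close() may overwrite errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

On a kernel without the patches, the setsockopt() call is expected to fail,
so the helper returns -1 and the caller can fall back to the current
namespace.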

   To test those features, the chvrf tools attached to this mail have 
been ported to the new setsockopt() API. Example usage:

$ chnetns 1 /bin/sh   # This will attach a shell to existing network 
namespace 1

$ port -n 1 -p 3434   # This will open a listening socket on port 3434 
of network namespace 1

   Also attached is a patch to iproute2 to add the ability to move an 
interface to a different namespace by nsid, used this way:

$ ip link set eth0 nsid 1    # This will move eth0 to network namespace 1

   The patches should apply cleanly to net-next-2.6, version 2.6.28-rc2, 
commit 3891845e1ef6e6807075d4241966b26f6ecb0a5c.

   I would be glad to have your impressions and comments on these 
patches, and to have them merged upstream once everybody is satisfied 
with them.

regards,
Vivien Chappelier.


[-- Attachment #2: iproute2.patch --]
[-- Type: text/x-diff, Size: 2479 bytes --]

From 698eb7aeb60baca7fc7f0fda9080174c96f92e02 Mon Sep 17 00:00:00 2001
From: Vivien Chappelier <vivien.chappelier-L+G57L1VLRbR7s880joybQ@public.gmane.org>
Date: Tue, 28 Oct 2008 18:06:13 +0100
Subject: [PATCH] Add support for testing IFLA_NET_NS

---
 include/linux/if_link.h |    2 ++
 ip/iplink.c             |    9 +++++++++
 misc/Makefile           |    2 +-
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index c948395..fab393d 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -79,6 +79,8 @@ enum
 	IFLA_LINKINFO,
 #define IFLA_LINKINFO IFLA_LINKINFO
 	IFLA_NET_NS_PID,
+	IFLA_IFALIAS,
+	IFLA_NET_NS,
 	__IFLA_MAX
 };
 
diff --git a/ip/iplink.c b/ip/iplink.c
index fd23db1..ffb0d39 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -67,6 +67,7 @@ void iplink_usage(void)
 	fprintf(stderr, "	                  [ broadcast LLADDR ]\n");
 	fprintf(stderr, "	                  [ mtu MTU ]\n");
 	fprintf(stderr, "	                  [ netns PID ]\n");
+	fprintf(stderr, "	                  [ nsid NSID ]\n");
 	fprintf(stderr, "       ip link show [ DEVICE ]\n");
 
 	if (iplink_have_newlink()) {
@@ -179,6 +180,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 	char abuf[32];
 	int qlen = -1;
 	int mtu = -1;
+	int net = -1;
 	int netns = -1;
 
 	ret = argc;
@@ -228,6 +230,13 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
                         if (get_integer(&netns, *argv, 0))
                                 invarg("Invalid \"netns\" value\n", *argv);
                         addattr_l(&req->n, sizeof(*req), IFLA_NET_NS_PID, &netns, 4);
+		} else if (strcmp(*argv, "nsid") == 0) {
+                        NEXT_ARG();
+                        if (net != -1)
+                                duparg("nsid", *argv);
+                        if (get_integer(&net, *argv, 0))
+                                invarg("Invalid \"nsid\" value\n", *argv);
+                        addattr_l(&req->n, sizeof(*req), IFLA_NET_NS, &net, 4);
 		} else if (strcmp(*argv, "multicast") == 0) {
 			NEXT_ARG();
 			req->i.ifi_change |= IFF_MULTICAST;
diff --git a/misc/Makefile b/misc/Makefile
index 8c25381..a4c9591 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -1,7 +1,7 @@
 SSOBJ=ss.o ssfilter.o
 LNSTATOBJ=lnstat.o lnstat_util.o
 
-TARGETS=ss nstat ifstat rtacct arpd lnstat
+TARGETS=ss nstat ifstat rtacct lnstat
 
 include ../Config
 
-- 
1.5.4.4
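
The new "nsid" branch in iplink_parse() attaches the identifier to the
request as a 4-byte netlink attribute via addattr_l(). The sketch below shows
what appending such an attribute to a netlink message involves. It is not
iproute2's implementation: append_attr() is a hypothetical name, and the
attribute type is left to the caller because IFLA_NET_NS only exists in the
patched headers.

```c
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Append one attribute (type plus payload) to the netlink message n, whose
 * buffer holds at most maxlen bytes. Returns 0 on success, -1 if the
 * attribute does not fit. */
int append_attr(struct nlmsghdr *n, size_t maxlen, unsigned short type,
		const void *data, size_t alen)
{
	size_t len = RTA_LENGTH(alen);		/* attribute header + payload */
	size_t off = NLMSG_ALIGN(n->nlmsg_len);	/* where the attribute starts */
	struct rtattr *rta;

	if (off + RTA_ALIGN(len) > maxlen)
		return -1;

	rta = (struct rtattr *)((char *)n + off);
	rta->rta_type = type;
	rta->rta_len = (unsigned short)len;
	memcpy(RTA_DATA(rta), data, alen);

	/* The message grows by the padded attribute size. */
	n->nlmsg_len = (unsigned int)(off + RTA_ALIGN(len));
	return 0;
}
```

For a 4-byte nsid the message grows by RTA_LENGTH(4) bytes: the attribute
header followed by the integer payload.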


[-- Attachment #3: chnetns.tar.gz --]
[-- Type: application/x-gzip, Size: 1553 bytes --]

[-- Attachment #4: Type: text/plain, Size: 206 bytes --]

_______________________________________________
Containers mailing list
Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
https://lists.linux-foundation.org/mailman/listinfo/containers


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found] ` <4909B10A.8090403-L+G57L1VLRbR7s880joybQ@public.gmane.org>
@ 2008-10-30 14:38   ` Andreas B Aaen
       [not found]     ` <200810301538.08032.andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org>
  2009-03-25 18:21   ` Bruce Jones
  1 sibling, 1 reply; 14+ messages in thread
From: Andreas B Aaen @ 2008-10-30 14:38 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Hi,

On Thursday 30 October 2008 14:05, Vivien Chappelier wrote:
>     The recently introduced network namespaces allow separate standalone
> network stacks to coexist on the same machine. This is a very useful
> functionality that we have been needing and using in our products for
> some time, through the VRF ptchset (http://linux-vrf.sourceforge.net/).
> The goal of the VRF patchset and network namespaces are very similar,
> yet some features of the VRF are missing that these patches intend to
> provide.

I have worked with a similar patchset. The goal was to be able to terminate 
traffic from different IPv4 nets with possibly overlapping IP addresses. You 
should be able to communicate with all IPv4 nets from the same process.

>     The network namespaces are currently tied to a process, and
> referenced by its pid. However, a networking stack has no particular
> reason to be associated with any process and it should be possible to
> use and setup additional networking stacks without the need to clone()

Right.

> or unshare(). The initial version of the "Coexist with the sysfs
> limitations" patches by Benjamin Thery introduced the notion of a unique
> network namespace identifier (nsid)  that is perfectly fit for the
> purpose of referencing networking stacks independently of any process.
> The first two patches of his set are therefore reused here to identify
> networking stacks.

I have proposed such a global namespace before on this list, but no one seemed 
interested.

>     These patches additionally introduce the following features that
> were initially provided by the VRF patchset:
> - the ability to move a socket to a different network namespace, through
> the new SO_NSID setsockopt(), given the nsid

This was exactly our solution, although the name of the option was different.
A very elegant solution for scalability. A huge number of networks can be 
reached from within the same process through different sockets.

> - the ability to move a process to an existing network namespace,
> through the new SO_NETNS setsockopt(), given the nsid

I don't see the need for this. The current network namespace implementation 
handles this just fine without giving the network namespaces numbers.
I see the usecase for this as isolation. Standard applications without any 
setsockopt() use of SO_NSID 

> - the ability to move an interface to a different namespace by nsid
> instead of pid

Here you can use the SO_NSID option through the netlink socket.

> - the ability to create additional network namespaces on startup

Why do it at startup?

> (dynamic addition/deletion is not supported but should be easy to add)

I have only one problem left here on this. When I call copy_net_ns() through 
the netlink socket the rtnl_lock() is already taken.

>    Also attached is a patch to iproute2 to add the ability to move an
> interface to a different namespace by nsid, used this way:
>
> $ ip link set eth0 nsid 1    # This will move eth0 to network namespace 1

ip netns add 1    # create network namespace with index 1
ip link set eth1 nsid 1    # move eth1 to network namespace with index 1
ip -nsid 1 addr add 192.168.50.1/24 dev eth1
ip -nsid 1 addr add 127.0.0.1/8 dev lo
ip -nsid 1 link set eth1 up
ip -nsid 1 link set lo up

The nsid option uses SO_NSID on the socket. This makes sure that the device 
name to index conversion inside iproute2 and the kernel works as it should.

Regards,
-- 
Andreas Bach Aaen              System Developer, M. Sc. 
Tieto Enator A/S               tel: +45 89 38 51 00
Skanderborgvej 232             fax: +45 89 38 51 01
8260 Viby J      Denmark       andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]     ` <200810301538.08032.andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org>
@ 2008-10-30 15:03       ` Serge E. Hallyn
  2008-10-30 16:20       ` Vivien Chappelier
  1 sibling, 0 replies; 14+ messages in thread
From: Serge E. Hallyn @ 2008-10-30 15:03 UTC (permalink / raw)
  To: Andreas B Aaen; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Andreas B Aaen (andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org):
> Hi,
> 
> On Thursday 30 October 2008 14:05, Vivien Chappelier wrote:
> >     The recently introduced network namespaces allow separate standalone
> > network stacks to coexist on the same machine. This is a very useful
> > functionality that we have been needing and using in our products for
> > some time, through the VRF ptchset (http://linux-vrf.sourceforge.net/).
> > The goal of the VRF patchset and network namespaces are very similar,
> > yet some features of the VRF are missing that these patches intend to
> > provide.
> 
> I have worked with a similar patchset. the goal was to be able to terminate 
> traffic from different IPv4 nets with possible overlapping IP addresses. You 
> should be able to communicate with all IPv4 nets from the same process.
> 
> >     The network namespaces are currently tied to a process, and
> > referenced by its pid. However, a networking stack has no particular
> > reason to be associated with any process and it should be possible to
> > use and setup additional networking stacks without the need to clone()
> 
> Right.
> 
> > or unshare(). The initial version of the "Coexist with the sysfs
> > limitations" patches by Benjamin Thery introduced the notion of a unique
> > network namespace identifier (nsid)  that is perfectly fit for the
> > purpose of referencing networking stacks independently of any process.
> > The first two patches of his set are therefore reused here to identify
> > networking stacks.
> 
> I have proposed such a global namespace before on this list, but no one seemed 
> interested.

Eric in particular is opposed to any "nsid" because it introduces yet
another namespace to worry about at checkpoint/restart.  A reasonable
concern.

There was quite a bit of talk at the containers mini-summit about
creating a minimal filesystem to represent the namespaces.  (See 
http://wiki.openvz.org/Containers/Mini-summit_2008_notes for the notes,
but they're not particularly helpful on their own).

Eric, if you have a moment, could you recap your proposal?

thanks,
-serge


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]     ` <200810301538.08032.andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org>
  2008-10-30 15:03       ` Serge E. Hallyn
@ 2008-10-30 16:20       ` Vivien Chappelier
       [not found]         ` <4909DEC8.9090102-L+G57L1VLRbR7s880joybQ@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Vivien Chappelier @ 2008-10-30 16:20 UTC (permalink / raw)
  To: Andreas B Aaen
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Hi,

Andreas B Aaen wrote:
> I have worked with a similar patchset. the goal was to be able to terminate 
> traffic from different IPv4 nets with possible overlapping IP addresses. You 
> should be able to communicate with all IPv4 nets from the same process.
>   
> I have proposed such a global namespace before on this list, but no one seemed 
> interested.
>
>   
Good, that makes at least two people interested in this feature. It has 
actually nothing to do with isolation, which is probably why interest 
may be low here, but I tried posting on netdev and was redirected here.

>> - the ability to move a process to an existing network namespace,
>> through the new SO_NETNS setsockopt(), given the nsid
>>     
>
> I don't see the need for this. The current network namespace implementation 
> handles this just fine without giving the network namespaces numbers.
>   
Maybe I missed something: how do you join an existing namespace without 
being the child of the process that created it?

> I see the usecase for this as isolation. Standard applications without any 
> setsockopt() use of SO_NSID
>   
Here the use case is rather to use existing tools (without adding 
SO_NSID setsockopt() and recompiling) in a different namespace easily, e.g.
$ chnetns 1 ifconfig

> Here use can use the SO_NSID option through the netlink socket.
>   
This is indeed an elegant solution (works with ioctl() like interfaces 
too by the way) to avoid adding a lot of options to the already existing 
netlink interface.

>> - the ability to create additional network namespaces on startup
>>     
>
> Why do it at startup?
>   
Mostly because I've not ported dynamic addition/deletion yet like you did...

>> (dynamic addition/deletion is not supported but should be easy to add)
>>     
>
> I have only one problem left here on this. When I call copy_net_ns() through 
> the netlink socket the rtnl_lock() is already taken.
>   
I see. And locking order with net_mutex has to be maintained, so perhaps 
deferring addition/deletion to a workqueue would be the solution?

> ip netns add 1    # create network namespace with index 1
> ip link set eth1 nsid 1    # move eth1 to network namespace with index 1
> ip -nsid 1 addr add 192.168.50.1/24 dev eth1
> ip -nsid 1 addr add 127.0.0.1/8 dev lo
> ip -nsid 1 link set eth1 up
> ip -nsid 1 link set lo up
>
> The nsid option uses SO_NSID on the socket. This makes sure that the device 
> name to index conversion inside iproute2 and the kernel works as it should.
>   
I really like this idea of calling setsockopt(SO_NSID) on the netlink 
socket directly. Did you post any patches of your implementation?

To bring back the discussion you had with Eric W. Biederman, it seems to 
me that the only real issue is on the addition of the global nsid index 
that is not really fit to be used hierarchically. However, I don't 
understand why having both this global nsid interface and the pid 
interface would hurt, as the goals of isolation and VRF-like support are 
really separate. The idea behind those patches is to provide a way to 
manage the networking stack instances easily. The fact that new 
networking stack instances are created and used for process isolation 
is independent and mostly unrelated. So the IFLA_NET_NS_PID option would 
be kept anyway for this purpose of nesting containers, although adding a 
new internal get_nsid_from_pid() or something similar would make sense 
to avoid duplication of code.

regards,
Vivien.


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]         ` <4909DEC8.9090102-L+G57L1VLRbR7s880joybQ@public.gmane.org>
@ 2008-10-30 23:07           ` Eric W. Biederman
       [not found]             ` <m14p2tznoz.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2008-10-30 23:07 UTC (permalink / raw)
  To: Vivien Chappelier; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Vivien Chappelier <vivien.chappelier-L+G57L1VLRbR7s880joybQ@public.gmane.org> writes:

> To bring back the discussion you had with Eric W. Biederman, it seems to 
> me that the only real issue is on the addition of the global nsid index 
> that is not really fit to be used hierarchically. However, I don't 
> understand why having both this global nsid interface and the pid 
> interface would hurt, as the goals of isolation and VRF-like support are 
> really separate.

A global nsid breaks migration, it breaks nested containers, in general it
just hurts.  So it is a bad choice for an interface.

Personally if I have vrf I want to set up a test environment in a container so
I can isolate it from the rest of the system.   Allowing me to play with the
user space side of the functionality without  So these things are not completely
separate concerns.

So from a design point of view I see the following questions.
1) How do we pin a network namespace to allow for routing when no process uses it?
2) How do we create sockets into that pinned network namespace?
3) How do we enter that network namespace so that sockets by default are created in it?

All of these are technically easy things to implement and design wise a challenge.

The best solution I see at the moment is to have something (a fs) we can mount in
the filesystem, keeping the network namespace alive as long as it is mounted.

i.e
mount -t netns none /dev/nets/1
mount -t netns -o newinstance none /dev/nets/2

(The new instance parameter creates the network namespace as well as capturing the
 current one)

char netns[] = "/dev/nets/2";
fd = socket();
err = setsockopt(fd, SOL_SOCKET, SO_NETPATH, netns, strlen(netns) + 1);
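
Filled out with socket arguments and error handling, the snippet above might
look like the following sketch. SO_NETPATH is only a proposal in this thread
and does not exist in mainline, so the option value is a placeholder, and
socket_in_netns_path() is a hypothetical helper name.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* HYPOTHETICAL: SO_NETPATH is the option proposed above, not a mainline one.
 * The value here is a placeholder for illustration only. */
#ifndef SO_NETPATH
#define SO_NETPATH 0x4e50
#endif

/* Create a TCP socket and ask the kernel to attach it to the network
 * namespace pinned at path (for example "/dev/nets/2").
 * Returns the fd, or -1 with errno set. */
int socket_in_netns_path(const char *path)
{
	int fd, saved;

	if (path == NULL) {
		errno = EINVAL;
		return -1;
	}

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* Pass the terminating NUL as well, as in the snippet above. */
	if (setsockopt(fd, SOL_SOCKET, SO_NETPATH, path,
		       (socklen_t)(strlen(path) + 1)) < 0) {
		saved = errno;	/* close() may overwrite errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

Because the namespace is named by a path, the mount namespace decides what
"/dev/nets/2" refers to, which is what lets the naming nest.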

Eric


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]             ` <m14p2tznoz.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
@ 2008-10-31  9:46               ` Andreas B Aaen
       [not found]                 ` <200810311046.17506.andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas B Aaen @ 2008-10-31  9:46 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Friday 31 October 2008 00:07, Eric W. Biederman wrote:
> A global nsid breaks migration,
Yes.

> it breaks nested containers,
Yes.

> in general it  just hurts.
No.

> So it is a bad choice for an interface. 
Not necessarily. There is a reason why vrf is designed the way it is - and the 
patches that I have worked with had a similar design.

> Personally if I have vrf I want to set up a test environment in a container
> so I can isolate it from the rest of the system.   Allowing me to play with
> the user space side of the functionality without  So these things are not
> completely separate concerns.

Ok. Here is my use case.
I need to talk to 500 IPv4 networks with possibly overlapping IP addresses. 
The packets arrive on 500 VLANs. I want one process to listen on a port on 
each of these networks. I don't want 500 processes, each running in its own 
network namespace, that then communicate with each other through e.g. unix 
sockets. This just complicates the task.

> So from a design point of view I see the following questions.
> 1) How do we pin a network namespace to allow for routing when no process
> uses it?
We introduce a global namespace, or at least a namespace that is unique to a 
process and its children.
Maybe a vrf container of network namespaces.
The vrf container numbers its network namespaces. Each pid points to a vrf 
container. New vrf containers can be made through e.g. unshare(). Migration 
and nesting should be possible.

> 2) How do we create sockets into that pinned network namespace? 
Add a socket option that uses an index (global namespace)

> 3) How do we enter that network namespace so that sockets by default are
> created in it?
I don't need this feature. The VRF patchset does this, so they can implement a 
chvrf utility.

> All of these are technically easy things to implement and design wise a
> challenge.
Yes.

As I see it, network namespaces have provided the splitting of all the protocols 
in the network code. This was the huge task. The vrf patches that I saw 
a few years back weren't as mature as this. What's left is actually the 
management of these network namespaces. 

Binding network namespaces to processes isn't a good idea for all use cases.  

> The best solution I see at the moment is to have something (a fs) we can
> mount in the filesystem, keeping the network namespace alive as long as it
> is mounted.
>
> i.e
> mount -t netns none /dev/nets/1
> mount -t netns -o newinstance none /dev/nets/2
>
> (The new instance parameter creates the network namespace as well as
> capturing the current one)
>
> char netns[] = "/dev/nets/2"
> fd = socket();
> err = setsockopt(fd, SOL_SOCKET, SO_NETPATH, netns, strlen(netns) + 1);

So the idea here is to let userspace choose the naming, and to make nesting 
possible by using the filesystem.

Would you configure this interface on "/dev/nets/2" like this:

ip addr add 10.0.0.1/24 dev eth1 nets "/dev/nets/2" ?

Where the "/dev/nets/2" parameter is set through a SO_NETPATH option on the 
netlink socket that iproute2 uses in its implementation.

Is this better or worse than a vrf container with numbered network namespaces 
in it?

Regards,
-- 
Andreas Bach Aaen              System Developer, M. Sc. 
Tieto Enator A/S               tel: +45 89 38 51 00
Skanderborgvej 232             fax: +45 89 38 51 01
8260 Viby J      Denmark       andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]                 ` <200810311046.17506.andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org>
@ 2008-10-31 14:17                   ` Daniel Lezcano
       [not found]                     ` <490B1384.7030001-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
  2008-10-31 18:43                   ` Eric W. Biederman
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Lezcano @ 2008-10-31 14:17 UTC (permalink / raw)
  To: Andreas B Aaen
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Andreas B Aaen wrote:
> On Friday 31 October 2008 00:07, Eric W. Biederman wrote:
>> A global nsid breaks migration,
> Yes.
> 
>> it breaks nested containers,
> Yes.
> 
>> in general it  just hurts.
> No.
> 
>> So it is a bad choice for an interface. 
> Not necessarily. There is a reason why vrf is designed the way it is - and the 
> patches that I have worked with had a similar design.
> 
>> Personally if I have vrf I want to set up a test environment in a container
>> so I can isolate it from the rest of the system.   Allowing me to play with
>> the user space side of the functionality without  So these things are not
>> completely separate concerns.
> 
> Ok. Here is my use case.
> I need a to talk to 500 IPv4 networks with possible overlapping IP addresses. 
> The packages arrive on 500 VLANs. I want one process to listen to a port on 
> each of these networks. I don't want 500 processes that runs in each their 
> network namespace and then communicate with each other through e.g. unix 
> sockets. This just complicates the task.

Why don't you unshare 500 times in the same process? In each namespace 
you create a control socket and the fd number is the identifier of your 
namespace.
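
A minimal sketch of that idea, assuming the process has the privilege needed
for unshare(CLONE_NEWNET) (CAP_SYS_ADMIN); the helper name
open_sockets_in_new_namespaces() is hypothetical. Each socket stays in the
namespace it was created in, even after the process unshares again, so the
fd serves as a handle on that namespace.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for unshare() and CLONE_NEWNET */
#endif
#include <sched.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* For each of count iterations, move the calling process into a fresh
 * network namespace and open one UDP control socket there. The fds are
 * stored in fds[]. Returns how many sockets were created; this is less
 * than count if unshare() or socket() fails (check errno in that case). */
int open_sockets_in_new_namespaces(int *fds, int count)
{
	int i;

	for (i = 0; i < count; i++) {
		if (unshare(CLONE_NEWNET) < 0)	/* typically EPERM if unprivileged */
			break;
		fds[i] = socket(AF_INET, SOCK_DGRAM, 0);
		if (fds[i] < 0)
			break;
	}
	return i;
}
```

Note that after the loop the process itself is left in the last namespace
created, so a caller that needs its original namespace has to arrange for
that separately.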

>> So from a design point of view I see the following questions.
>> 1) How do we pin a network namespace to allow for routing when no process
>> uses it?
> We introduce a global namespace or at least a namespace that unique for a 
> process and it's sons.
> Maybe a vrf container of network namespaces.
> The vrf container numbers it's network namespaces. Each pid points to a vrf 
> container. New vrf containers can be made through e.g. unshare(). Migration 
> and nesting should be possible.
> 
>> 2) How do we create sockets into that pinned network namespace? 
> Add a socket option that uses an index (global namespace)
> 
>> 3) How do we enter that network namespace so that sockets by default are
>> created in it?
> I don't need this feature. The VRF patchset does this, so they can implement a 
> chvrf utillity.
> 
>> All of these are technically easy things to implement and design wise a
>> challenge.
> Yes.
> 
> As I see it network namespaces has provided the splitting of all the protocols 
> in the network code. This was the huge task. The vrf patches that I have seen 
> a few years back wasn't as mature as this. What's left is actually the 
> management of these network namespaces. 
> 
> binding network namespaces to processes isn't a good idea for all use cases.  
> 
>> The best solution I see at the moment is to have something (a fs) we can
>> mount in the filesystem, keeping the network namespace alive as long as it
>> is mounted.
>>
>> i.e
>> mount -t netns none /dev/nets/1
>> mount -t netns -o newinstance none /dev/nets/2
>>
>> (The new instance parameter creates the network namespace as well as
>> capturing the current one)
>>
>> char netns[] = "/dev/nets/2"
>> fd = socket();
>> err = setsockopt(fd, SOL_SOCKET, SO_NETPATH, netns, strlen(netns) + 1);
> 
> So the idea here is to let the userspace side choose the naming and ensuring 
> the nesting possibility by using the filesystem.
> 
> Would you configure this interface on "/dev/nets/2" like this:
> 
> ip addr add 10.0.0.1/24 dev eth1 nets "/dev/nets/2" ?
> 
> Where the "/dev/nets/2" parameter is set through a SO_NETPATH option to the 
> netlink socket that the iproute2 uses in it's implementation.
> 
> Is this better or worse than a vrf container with numbered network namespaces 
> in?
> 
> Regards,




* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]                 ` <200810311046.17506.andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org>
  2008-10-31 14:17                   ` Daniel Lezcano
@ 2008-10-31 18:43                   ` Eric W. Biederman
  1 sibling, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2008-10-31 18:43 UTC (permalink / raw)
  To: Andreas B Aaen; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Andreas B Aaen <andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org> writes:

> On Friday 31 October 2008 00:07, Eric W. Biederman wrote:

> Ok. Here is my use case.
> I need a to talk to 500 IPv4 networks with possible overlapping IP addresses. 
> The packages arrive on 500 VLANs. I want one process to listen to a port on 
> each of these networks. I don't want 500 processes that runs in each their 
> network namespace and then communicate with each other through e.g. unix 
> sockets. This just complicates the task.

Yep.

>> So from a design point of view I see the following questions.
>> 1) How do we pin a network namespace to allow for routing when no process
>> uses it?
> We introduce a global namespace or at least a namespace that unique for a 
> process and it's sons.
> Maybe a vrf container of network namespaces.
> The vrf container numbers it's network namespaces. Each pid points to a vrf 
> container. New vrf containers can be made through e.g. unshare(). Migration 
> and nesting should be possible.

Ah.  The additional namespace approach.

>> 2) How do we create sockets into that pinned network namespace? 
> Add a socket option that uses an index (global namespace)
>
>> 3) How do we enter that network namespace so that sockets by default are
>> created in it?
> I don't need this feature. The VRF patchset does this, so they can implement a 
> chvrf utillity.
>
>> All of these are technically easy things to implement and design wise a
>> challenge.
> Yes.
>
> As I see it network namespaces has provided the splitting of all the protocols 
> in the network code. This was the huge task. The vrf patches that I have seen 
> a few years back wasn't as mature as this. What's left is actually the 
> management of these network namespaces. 
>
> binding network namespaces to processes isn't a good idea for all use cases.  
>
>> The best solution I see at the moment is to have something (a fs) we can
>> mount in the filesystem, keeping the network namespace alive as long as it
>> is mounted.
>>
>> i.e
>> mount -t netns none /dev/nets/1
>> mount -t netns -o newinstance none /dev/nets/2
>>
>> (The new instance parameter creates the network namespace as well as
>> capturing the current one)
>>
>> char netns[] = "/dev/nets/2"
>> fd = socket();
>> err = setsockopt(fd, SOL_SOCKET, SO_NETPATH, netns, strlen(netns) + 1);
>
> So the idea here is to let the userspace side choose the naming and ensuring 
> the nesting possibility by using the filesystem.
>
> Would you configure this interface on "/dev/nets/2" like this:
>
> ip addr add 10.0.0.1/24 dev eth1 nets "/dev/nets/2" ?


Essentially.  I was thinking that you could document /dev/nets in
devices.txt.  Making it the standard and default place for this to
happen so you would only need to say:

ip -nets 2 addr add  10.0.0.1/24 dev eth1

Very much like was previously discussed on this thread.

> Where the "/dev/nets/2" parameter is set through a SO_NETPATH option to the 
> netlink socket that the iproute2 uses in it's implementation.

Yes.

> Is this better or worse than a vrf container with numbered network namespaces 
> in?

Much better, although possibly a little more boilerplate code.

It uses existing namespaces, and in particular the mount namespace, so you can
create sets of processes that use it, and when they all exit the namespaces
all go away.

It allows recursive containers.
It allows migration.

And all for slightly fewer unique pieces of code than were in the last patchset.

As for the vrf container idea.  I think it gains us just about nothing
in comparison to using the filesystem aka mount namespace (which is
very good at dealing with names), and there are some very useful
things you can do with the mount namespace like mount propagation
which come for free.

From an implementation standpoint we really need as few namespaces
as we can get away with.

Eric


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]                     ` <490B1384.7030001-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
@ 2008-10-31 18:59                       ` Eric W. Biederman
       [not found]                         ` <m1zlkksi91.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2008-10-31 18:59 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes:

> Andreas B Aaen wrote:

>> Ok. Here is my use case.
>> I need to talk to 500 IPv4 networks with possibly overlapping IP
>> addresses. The packets arrive on 500 VLANs. I want one process to listen to a
>> port on each of these networks. I don't want 500 processes that each run in
>> their own network namespace and then communicate with each other through e.g.
>> unix sockets. This just complicates the task.
>
> Why don't you unshare 500 times in the same process? In each namespace you
> create a control socket, and the fd number is the identifier of your namespace.

That is the other good option I have thought of for doing this.
It is certainly a bit easier to implement.

There are problems with application restart.  So I am concerned with
how well using sockets as identifiers will scale.  But I don't have
any problems in principle.
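
A minimal sketch of the unshare-per-namespace idea, assuming a kernel with
CONFIG_NET_NS and a caller holding CAP_SYS_ADMIN. The function name
create_netns_handles is hypothetical; each socket fd is the handle for the
namespace it was created in.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/socket.h>

/*
 * Create up to n network namespaces from one process. After each
 * unshare(CLONE_NEWNET) a datagram socket is opened; the socket stays in the
 * namespace it was created in, so its fd serves as the handle.
 * Returns the number of handles stored in fds, which is less than n if
 * unshare() or socket() fails (for example without CAP_SYS_ADMIN).
 * Side effect: the caller is left in the last namespace it created.
 */
int create_netns_handles(int *fds, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        if (unshare(CLONE_NEWNET) != 0)
            break;
        fds[i] = socket(AF_INET, SOCK_DGRAM, 0);
        if (fds[i] < 0)
            break;
    }
    return i;
}
```

The fds only live as long as the process does, which is the restart concern
raised above.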

There is a similar use case where you simply have several disjoint domains
that you are performing software routing between, and except for
configuration the kernel doesn't need any special support.

I do think just using unshare for the creation and not implementing
a newinstance filesystem option for now makes sense.  That way we can
support mounting of /proc/net and sysfs in those network namespaces
without having to teach them how to parse options as well.

Making the application creation loop something like:
for name in $(seq 1 500) ; do
	# pseudocode: the process calls unshare(2) to create a network namespace
	unshare(CLONE_NEWNET);
	mkdir -p /dev/vrf/$name/proc
	mkdir -p /dev/vrf/$name/sys
	mkdir -p /dev/vrf/$name/handle
	mount -t netns none /dev/vrf/$name/handle
	mount -t proc/net none /dev/vrf/$name/proc
	mount -t sysfs none /dev/vrf/$name/sys
done

Eric


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]                         ` <m1zlkksi91.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
@ 2008-10-31 19:32                           ` Eric W. Biederman
       [not found]                             ` <m13aicsgr2.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2008-10-31 19:32 UTC (permalink / raw)
  To: Daniel Lezcano, Andreas B Aaen,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Vivien Chappelier


Thinking it over a little more I have the following thought.

For binding a socket to a namespace let's use an fd arg.
That way we can either supply another existing network socket
or the result of an open call.  Simple, and faster if you
are creating more than one socket in the other network namespace.

I really don't like the idea of binding a socket into a namespace.
Especially after looking at the arguments to socket(2).
The network namespace may be incomplete and you may create a socket
in a network namespace that way that could not exist normally.
That plus it puts lots of races in code that finds the namespace of
a socket.


So in some form let's implement socketat. 
int socketat(int ns, int domain, int type, int protocol, int flags);

We need the flags field so we can accommodate the O_CLOEXEC flag.


That should be very straightforward.  Implementable now, without
a magic filesystem.   And then the filesystem would just provide
the global naming and process independence.
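
For illustration, the socketat() semantics can be approximated in userspace.
This is a sketch, not the proposed syscall: it assumes a later kernel that
provides setns(2) and /proc/self/ns/net (neither existed when this thread was
written), it assumes ns is an fd from opening a namespace file rather than a
socket fd, and the name socketat_sketch is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Userspace approximation of socketat() for a single-threaded caller.
 * ns must refer to a network namespace file (assumption, see above).
 * Returns the new socket fd, or -1 on error.
 */
int socketat_sketch(int ns, int domain, int type, int protocol, int flags)
{
    int self, fd;

    /* Remember the caller's namespace so it can be restored afterwards. */
    self = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
    if (self < 0)
        return -1;

    /* Enter the target namespace; needs CAP_SYS_ADMIN. */
    if (setns(ns, CLONE_NEWNET) != 0) {
        close(self);
        return -1;
    }

    /* Map the O_CLOEXEC request onto the socket type flag. */
    if (flags & O_CLOEXEC)
        type |= SOCK_CLOEXEC;
    fd = socket(domain, type, protocol);

    /* Switch back; if that fails, do not hand out the socket. */
    if (setns(self, CLONE_NEWNET) != 0 && fd >= 0) {
        close(fd);
        fd = -1;
    }
    close(self);
    return fd;
}
```

Switching namespaces around socket() is exactly the window a real socketat()
would avoid, which is one argument for doing it in the kernel.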

Eric


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]                             ` <m13aicsgr2.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
@ 2008-10-31 20:48                               ` Daniel Lezcano
       [not found]                                 ` <490B6F19.4060206-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Lezcano @ 2008-10-31 20:48 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Eric W. Biederman wrote:
> Thinking it over a little more I have the following thought.
> 
> For binding a socket to a namespace let's use an fd arg.
> That way we can either supply another existing network socket
> or the result of an open call.  Simple, and faster if you
> are creating more than one socket in the other network namespace.
> 
> I really don't like the idea of binding a socket into a namespace.
> Especially after looking at the arguments to socket(2).
> The network namespace may be incomplete and you may create a socket
> in a network namespace that way that could not exist normally.
> That plus it puts lots of races in code that finds the namespace of
> a socket.
> 
> 
> So in some form let's implement socketat. 
> int socketat(int ns, int domain, int type, int protocol, int flags);

Is the 'ns' arg a fd from a socket just after the unshare ?

> We need the flags field so we can accommodate the O_CLOEXEC flag.
> 
> 
> That should be very straightforward.  Implementable now, without
> a magic filesystem.   And then the filesystem would just provide
> the global naming and process independence.

Assuming the ns arg is an fd from a socket created in a specific network
namespace, I agree this is quite easy to implement and consistent with
the refcounting of the netns. Furthermore, that follows the logic of
network devices: one can be created in another netns using the pid as
identifier.


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found]                                 ` <490B6F19.4060206-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
@ 2008-10-31 23:10                                   ` Eric W. Biederman
  0 siblings, 0 replies; 14+ messages in thread
From: Eric W. Biederman @ 2008-10-31 23:10 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Daniel Lezcano <dlezcano-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> writes:

> Eric W. Biederman wrote:
>> Thinking it over a little more I have the following thought.
>>
>> For binding a socket to a namespace let's use an fd arg.
>> That way we can either supply another existing network socket
>> or the result of an open call.  Simple, and faster if you
>> are creating more than one socket in the other network namespace.
>>
>> I really don't like the idea of binding a socket into a namespace.
>> Especially after looking at the arguments to socket(2).
>> The network namespace may be incomplete and you may create a socket
>> in a network namespace that way that could not exist normally.
>> That plus it puts lots of races in code that finds the namespace of
>> a socket.
>>
>>
>> So in some form let's implement socketat.
>> int socketat(int ns, int domain, int type, int protocol, int flags);
>
> Is the 'ns' arg a fd from a socket just after the unshare ?

Yes.   Any socket in the target namespace will do.

>> We need the flags field so we can accommodate the O_CLOEXEC flag.
>>
>>
>> That should be very straightforward.  Implementable now, without
>> a magic filesystem.   And then the filesystem would just provide
>> the global naming and process independence.
>
> Assuming the ns arg is an fd from a socket created in a specific network
> namespace, I agree this is quite easy to implement and consistent with the
> refcounting of the netns. Furthermore, that follows the logic of network
> devices: one can be created in another netns using the pid as identifier.

Yes.  Your assumption is right.

Using an fd as the descriptor, we need to touch both socket creation and network
device movement, but that should be sufficient.
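
For the device-movement half, a rough sketch of what an fd-based request
could look like on the wire. It assumes the IFLA_NET_NS_FD rtnetlink
attribute found in later kernels; whether that matches what was intended here
is an assumption. The helper build_move_link_msg is hypothetical and only
fills a buffer, it sends nothing.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/*
 * Fill buf with an RTM_SETLINK request asking the kernel to move the
 * interface with index ifindex into the namespace referred to by ns_fd.
 * buf must be suitably aligned for struct nlmsghdr.
 * Returns the message length, or 0 if buf is too small.
 */
size_t build_move_link_msg(void *buf, size_t buflen, int ifindex, int ns_fd)
{
    size_t hdr = NLMSG_ALIGN(NLMSG_LENGTH(sizeof(struct ifinfomsg)));
    size_t len = hdr + RTA_LENGTH(sizeof(int));
    struct nlmsghdr *nh = buf;
    struct ifinfomsg *ifi;
    struct rtattr *rta;

    if (buflen < len)
        return 0;
    memset(buf, 0, len);

    /* Netlink header: a request that wants an acknowledgement. */
    nh->nlmsg_len = (unsigned int)len;
    nh->nlmsg_type = RTM_SETLINK;
    nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;

    /* Link header: identify the device by index. */
    ifi = NLMSG_DATA(nh);
    ifi->ifi_family = AF_UNSPEC;
    ifi->ifi_index = ifindex;

    /* Single attribute: the fd naming the target namespace. */
    rta = (struct rtattr *)((char *)buf + hdr);
    rta->rta_type = IFLA_NET_NS_FD;
    rta->rta_len = RTA_LENGTH(sizeof(int));
    memcpy(RTA_DATA(rta), &ns_fd, sizeof(int));

    return len;
}
```

The pid-based equivalent carries a pid in the attribute instead; the rest of
the message is unchanged, which is why only the identifier lookup needs to be
touched.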

Eric


* Re: [PATCH 0/6] netns: add linux-vrf features via network namespaces
       [not found] ` <4909B10A.8090403-L+G57L1VLRbR7s880joybQ@public.gmane.org>
  2008-10-30 14:38   ` Andreas B Aaen
@ 2009-03-25 18:21   ` Bruce Jones
  1 sibling, 0 replies; 14+ messages in thread
From: Bruce Jones @ 2009-03-25 18:21 UTC (permalink / raw)
  To: containers-qjLDD68F18O7TbgM5vRIOg; +Cc: Vivien Chappelier

Vivien Chappelier wrote:
> Hi,
> 
> The recently introduced network namespaces allow separate standalone
> network stacks to coexist on the same machine. This is a very useful
> functionality that we have been needing and using in our products
> for some time, through the VRF patchset
> (http://linux-vrf.sourceforge.net/). The goals of the VRF patchset and
> network namespaces are very similar, yet some features of the VRF are
> missing that these patches intend to provide.
> 
> 
Hi,

I have been a silent observer of the network namespace work for a few 
months. Like a couple of others, I also am interested in the functionality 
provided by the linux-vrf patches presented here by Vivien. Eric Biederman 
mentioned some sysfs patches that he was working on. Daniel Lezcano 
submitted a proposed new socketat syscall. Other than that, this thread
seems to have died a few months ago as far as I have been able to find.

I would like to find out if there is any updated status. I offer my help 
in this effort. I am a new Linux developer, but I developed similar vrf 
features in a BSD-based stack under VxWorks.

Thanks,
Bruce


* [PATCH 0/6] netns: add linux-vrf features via network namespaces
@ 2009-04-15  3:14 Krishna Vamsi-B22174
  0 siblings, 0 replies; 14+ messages in thread
From: Krishna Vamsi-B22174 @ 2009-04-15  3:14 UTC (permalink / raw)
  To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

 
 
Hi Vivien Chappelier,
 
 
In my use case, a security appliance provides security to 100 networks,
and a user space process receives control traffic from these 100
networks. Earlier we were using the VRF ID patch, so my objective of
having a separate routing table for each network was achieved.

Now I have to customize the 2.6.27 kernel to achieve the above requirement.

 
I have compiled the 2.6.27 kernel with 
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
CONFIG_NET_NS=y
CONFIG_SYSFS=y
CONFIG_MACVLAN=y
CONFIG_VETH=y
 
 
Please clarify:

1) Will these 6 patches satisfy my requirement?
   Do I need any additional patches other than these 6?

2) Please let me know the recommended iproute2 version.

3) Network namespace object IDs range from 1 to 4095. Is my
   understanding correct?

4) Are there any test programs to verify this? If so, what is the
   recommended glibc version to compile these test programs?

I will post my comments after testing this patch.
 
Regards
    Vamsi


end of thread, other threads:[~2009-04-15  3:14 UTC | newest]

Thread overview: 14+ messages
2008-10-30 13:05 [PATCH 0/6] netns: add linux-vrf features via network namespaces Vivien Chappelier
     [not found] ` <4909B10A.8090403-L+G57L1VLRbR7s880joybQ@public.gmane.org>
2008-10-30 14:38   ` Andreas B Aaen
     [not found]     ` <200810301538.08032.andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org>
2008-10-30 15:03       ` Serge E. Hallyn
2008-10-30 16:20       ` Vivien Chappelier
     [not found]         ` <4909DEC8.9090102-L+G57L1VLRbR7s880joybQ@public.gmane.org>
2008-10-30 23:07           ` Eric W. Biederman
     [not found]             ` <m14p2tznoz.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-10-31  9:46               ` Andreas B Aaen
     [not found]                 ` <200810311046.17506.andreas.aaen-546VmZ+UeKYX2WXlbB3fKg@public.gmane.org>
2008-10-31 14:17                   ` Daniel Lezcano
     [not found]                     ` <490B1384.7030001-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-10-31 18:59                       ` Eric W. Biederman
     [not found]                         ` <m1zlkksi91.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-10-31 19:32                           ` Eric W. Biederman
     [not found]                             ` <m13aicsgr2.fsf-B27657KtZYmhTnVgQlOflh2eb7JE58TQ@public.gmane.org>
2008-10-31 20:48                               ` Daniel Lezcano
     [not found]                                 ` <490B6F19.4060206-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-10-31 23:10                                   ` Eric W. Biederman
2008-10-31 18:43                   ` Eric W. Biederman
2009-03-25 18:21   ` Bruce Jones
  -- strict thread matches above, loose matches on Subject: below --
2009-04-15  3:14 Krishna Vamsi-B22174
