netdev.vger.kernel.org archive mirror
* VRFs and the scalability of namespaces
@ 2014-09-26 22:37 David Ahern
  2014-09-26 23:52 ` Stephen Hemminger
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: David Ahern @ 2014-09-26 22:37 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: nicolas.dichtel, netdev@vger.kernel.org

Hi Eric:

As you suggested [1] I am starting a new thread to discuss scalability 
problems using namespaces for VRFs.

Background
----------
Consider a single system that wants to provide VRF-based features with 
support for N VRFs. N could easily be 2048 (e.g., 6Wind, [2]), 4000 
(e.g., Cisco, [3]) or even higher.

The single system with support for N VRFs runs M services (e.g., quagga, 
cdp, lldp, stp, strongswan, some homegrown routing protocol) and 
includes standard system services like sshd. Furthermore, a system also 
includes monitoring programs like snmpd and tcollector. In short, M is 
easily 20 processes that need to have a presence across all VRFs.


Network Namespaces for VRFs
---------------------------
For the past 4 years or so the response to VRF questions has been a 
drumbeat of "use network namespaces". But namespaces are not a good 
match for VRFs.

1. Network namespaces are a complete separation of the networking stack 
from network devices up. VRFs are an L3 concept. Using namespaces forces 
an L3 separation concept onto L2 apps -- lldp, cdp, etc.

There are use cases where you want device-level separation, use cases 
where you want only L3-and-up separation, and cases where you want both 
(e.g., divvy up the netdevices in a system across some small number of 
namespaces and then provide VRF-based features within a namespace).


2. Scalability of apps providing services as namespaces are created. How 
do you establish a presence for each service in every network namespace?

a. Spawn a new process for each namespace? A brute-force approach that 
is extremely resource intensive, e.g., the quagga example [4].

b. Spawn a thread for each namespace? Better than a full process, but 
still a heavyweight solution.

c. Create a socket per namespace? Better, but still a resource-intensive 
solution -- N listen sockets per service, and each service needs to be 
modified for namespace support. For open-source software that means each 
project has to agree that namespace awareness is relevant and agree to 
take the patches.


3. Just creating a network namespace consumes a non-negligible amount of 
memory -- ~200kB on the 3.10 kernel. I believe the /proc entries are the 
bulk of that usage. At 200kB per namespace that again adds up to a lot 
of wasted memory and overhead.
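To put rough numbers on points 2 and 3, here is a back-of-the-envelope sketch using the figures quoted above (the constants are illustrative, not measured):

```python
# Back-of-the-envelope cost of the namespace-per-VRF model, using the
# figures quoted above. The constants are illustrative, not measured here.
N_VRFS = 4000            # VRFs on a single system (Cisco-scale example)
M_SERVICES = 20          # daemons needing a presence in every VRF
NETNS_OVERHEAD_KB = 200  # approximate per-netns memory on a 3.10 kernel

# Options (a)-(c): one process, thread, or listen socket per service
# per namespace all scale as N * M.
per_namespace_instances = N_VRFS * M_SERVICES
print("process/thread/socket instances:", per_namespace_instances)  # 80000

# Namespace bookkeeping alone, before a single service is started.
netns_overhead_mb = N_VRFS * NETNS_OVERHEAD_KB / 1024
print("netns memory overhead: ~%.0f MB" % netns_overhead_mb)  # ~781 MB
```

Even before any service starts, the bookkeeping alone approaches a gigabyte at this scale.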


4. For a single process to straddle multiple namespaces it has to run 
with full root privileges -- CAP_SYS_ADMIN -- to use setns. Using 
network sockets does not require a process to run as root at all; if it 
wants privileged ports, CAP_NET_BIND_SERVICE is sufficient, not full 
root.
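The privilege requirement is easy to demonstrate; a minimal sketch that calls setns(2) through ctypes, re-joining the process's own current network namespace (even this no-op join requires CAP_SYS_ADMIN):

```python
import ctypes
import ctypes.util
import errno
import os

CLONE_NEWNET = 0x40000000  # from <sched.h>

libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6",
                   use_errno=True)

# setns() into the network namespace we are already in. Without
# CAP_SYS_ADMIN even this no-op join fails with EPERM.
fd = os.open("/proc/self/ns/net", os.O_RDONLY)
rc = libc.setns(fd, CLONE_NEWNET)
err = ctypes.get_errno() if rc != 0 else 0
os.close(fd)

if rc == 0:
    print("setns succeeded: process has CAP_SYS_ADMIN")
else:
    # Expected for an unprivileged process: EPERM.
    print("setns failed:", errno.errorcode.get(err, err))
```

A daemon that merely wants a socket in 4000 namespaces therefore needs the broadest capability the kernel offers, just to move between them.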


The Linux kernel needs proper VRF support -- as an L3 concept. A 
capability to run a process in a "VRF any" context provides a 
resource-efficient solution where a single process with a single listen 
socket works across all VRFs in a namespace, with connected sockets then 
taking on a specific VRF context.
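To make the proposed semantics concrete, here is a sketch of how a "VRF any" socket might look from userspace. SO_VRF and VRF_ANY are hypothetical names invented purely for illustration -- no such socket option exists in the kernel -- so the sketch probes for it and reports the (expected) failure:

```python
import socket

# Hypothetical constants -- NOT a real kernel API, invented for illustration.
SO_VRF = 0x9999   # would set/select a VRF context on a socket
VRF_ANY = 0       # "accept connections arriving on any VRF"

lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # One listen socket serving every VRF; each accepted socket would
    # then carry the specific VRF context its connection arrived on.
    lsock.setsockopt(socket.SOL_SOCKET, SO_VRF, VRF_ANY)
    supported = True
except OSError:
    supported = False  # expected on any current kernel
finally:
    lsock.close()

print("VRF-any sockets supported:", supported)
```

The point of the sketch is the shape of the API, not the names: one daemon, one listen socket, N VRFs.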

Before droning on even more, does the above provide better context on 
the general problem?

Thanks,
David


[1] https://lkml.org/lkml/2014/9/26/840

[2] http://www.6wind.com/6windgate-performance/ip-forwarding

[3] 
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/sw/verified_scalability/b_Cisco_Nexus_7000_Series_NX-OS_Verified_Scalability_Guide.html

[4] 
https://lists.quagga.net/pipermail/quagga-users/2010-February/011351.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: VRFs and the scalability of namespaces
  2014-09-26 22:37 VRFs and the scalability of namespaces David Ahern
@ 2014-09-26 23:52 ` Stephen Hemminger
  2014-09-27  0:00   ` David Ahern
  2014-09-27  1:25 ` Eric W. Biederman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2014-09-26 23:52 UTC (permalink / raw)
  To: David Ahern; +Cc: Eric W. Biederman, nicolas.dichtel, netdev@vger.kernel.org

On Fri, 26 Sep 2014 16:37:26 -0600
David Ahern <dsahern@gmail.com> wrote:

> Hi Eric:
> 
> As you suggested [1] I am starting a new thread to discuss scalability 
> problems using namespaces for VRFs.
> 

Whining without suggesting a solution or providing code will probably result
in no action.


* Re: VRFs and the scalability of namespaces
  2014-09-26 23:52 ` Stephen Hemminger
@ 2014-09-27  0:00   ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2014-09-27  0:00 UTC (permalink / raw)
  To: Stephen Hemminger, David Ahern
  Cc: Eric W. Biederman, nicolas.dichtel, netdev@vger.kernel.org

On 9/26/14, 5:52 PM, Stephen Hemminger wrote:
> On Fri, 26 Sep 2014 16:37:26 -0600
> David Ahern <dsahern@gmail.com> wrote:
>
>> Hi Eric:
>>
>> As you suggested [1] I am starting a new thread to discuss scalability
>> problems using namespaces for VRFs.
>>
>
> Whining without suggesting a solution or providing code will probably result
> in no action.

I have a suggestion; I was trying to set a context on the problem.

David


* Re: VRFs and the scalability of namespaces
  2014-09-26 22:37 VRFs and the scalability of namespaces David Ahern
  2014-09-26 23:52 ` Stephen Hemminger
@ 2014-09-27  1:25 ` Eric W. Biederman
  2014-09-29 12:34   ` David Ahern
  2014-09-27 13:29 ` Hannes Frederic Sowa
  2014-09-29 18:05 ` Cong Wang
  3 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2014-09-27  1:25 UTC (permalink / raw)
  To: David Ahern; +Cc: nicolas.dichtel, netdev@vger.kernel.org, Stephen Hemminger

David Ahern <dsahern@gmail.com> writes:

> Hi Eric:
>
> As you suggested [1] I am starting a new thread to discuss scalability
> problems using namespaces for VRFs.

I will accept as a given that using network namespaces at a scale of 
1000s, with lots of little applications listening for new connections 
(but generally not doing anything), is outside the classic networking 
usage of linux and of namespaces, so it does not work out of the box and 
at the very least some fixes are necessary.

However, your premise that network namespaces are unsupportable, 
unfixable, and fundamentally unscalable for what you want to do is 
itself unsupported.

The most difficult problem for high levels of efficiency is that of 
modifying applications so that they are VRF/namespace aware: that they 
look at the appropriate set of DNS resolvers for the namespace, and that 
when messages are logged they report not just the IP address but the 
context that IP address came from. There are no magic solutions for 
those kinds of deep and fundamental code modifications.

I can totally see it being frustrating to use linux as a switch OS
when it doesn't quite do what you want on the hardware you want to use,
and definitely not with the efficiencies you want.

I will tell you what network namespaces get you.  Network namespaces
deliver the full power of the linux network stack, every interesting
feature works, and network namespaces provide a path where you can use
unmodified linux applications.

When you say "proper VRF support" what I hear is that you think
something new needs to be added to the linux network stack (called a
VRF) with a new userspace interface that is somehow better because it
lacks features.

> Background
> ----------
> Consider a single system that wants to provide VRF-based features with
> support for N VRFs. N could easily be 2048 (e.g., 6Wind, [2]), 4000
> (e.g., Cisco, [3]) or even higher.
>
> The single system with support for N VRFs runs M services (e.g.,
> quagga, cdp, lldp, stp, strongswan, some homegrown routing protocol)
> and includes standard system services like sshd. Furthermore, a system
> also includes monitoring programs like snmpd and tcollector. In short,
> M is easily 20 processes that need to have a presence across all VRFs.

And trying to run it all on what would be considered underpowered 
hardware in most contexts.

> Before droning on even more, does the above provide better context on
> the general problem?

It provides a rough context on what you are trying to do.  Use linux as
the OS to run on a switch.

It doesn't actually provide much in the way of context on the actual 
problems that show up when you try to use network namespaces, which is 
what I was expecting the discussion to be about and which I expect would 
be a productive conversation.

Eric


* Re: VRFs and the scalability of namespaces
  2014-09-26 22:37 VRFs and the scalability of namespaces David Ahern
  2014-09-26 23:52 ` Stephen Hemminger
  2014-09-27  1:25 ` Eric W. Biederman
@ 2014-09-27 13:29 ` Hannes Frederic Sowa
  2014-09-27 14:09   ` Hannes Frederic Sowa
  2014-09-29 13:06   ` David Ahern
  2014-09-29 18:05 ` Cong Wang
  3 siblings, 2 replies; 15+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-27 13:29 UTC (permalink / raw)
  To: David Ahern, Eric W. Biederman; +Cc: nicolas.dichtel, netdev

Hi,

On Sat, Sep 27, 2014, at 00:37, David Ahern wrote:
> The Linux kernel needs proper VRF support -- as an L3 concept. A 
> capability to run a process in a "VRF any" context provides a resource 
> efficient solution where a single process with a single listen socket 
> works across all VRFs in a namespace and then connected sockets have a 
> specific VRF context.

In case you want full-blown VRF support as in BGP/MPLS VRF setups, I
agree that namespaces seem not to be the right tool for the job, not
even close. And even if those namespaces were split up into a more
lightweight routing/forwarding namespace and the normal full-blown
network namespace, it would seem too cumbersome for user space to manage
them.

Did you already do an investigation of how the rule and table
features could be exploited to suit your needs? Some time back I
suggested something like "ip route table foo exec ....": keep a default
routing lookup indicator in task_struct which gets implicitly propagated
to rtnetlink routing table requests/modifications for the requested
table. Tables can already be specified via rtnetlink, so no change would
be needed here.

For sockets something like SO_BINDTOTABLE might work; maybe we can even
by default use the task_struct information to also bind the sockets to
the per-process table. We certainly need to preserve the routing
information on the socket as we need it in ICMP error handling (e.g.
where to apply IPv4/IPv6 redirects to). Directing incoming packets to a
specific table also works via an ip-rule iif match.

The advantage of the "ip route table foo exec ..." method would be that
conversion of some unmodified routing management daemons might be
easier; others can use the rtnetlink extended attributes which are
already available, and we only need to have per-process routing table
context control, which seems not too hard to implement in the ip-rule
subsystem, but I haven't checked.

The problem I see with rules is that some of those tables already work
hand in hand; they already have implicit semantics, e.g. local, main,
default and unspec (this is even worse for IPv6, where addrconf already
uses hardcoded tables). Working around this might be very tricky and
even more problematic to do from user space.

I am not yet sure what features you want from VRFs; some things seem to
match the rule/table features but others I think are pretty hard to
implement.

I worry a bit about ICMP error handling and updates of the routing
table, but this is a detail we can look into later once all the needed
features are known.

Bye,
Hannes


* Re: VRFs and the scalability of namespaces
  2014-09-27 13:29 ` Hannes Frederic Sowa
@ 2014-09-27 14:09   ` Hannes Frederic Sowa
  2014-09-29 13:06   ` David Ahern
  1 sibling, 0 replies; 15+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-27 14:09 UTC (permalink / raw)
  To: David Ahern, Eric W. Biederman; +Cc: nicolas.dichtel, netdev

Hi,

Addendum:

On Sat, Sep 27, 2014, at 15:29, Hannes Frederic Sowa wrote:
> Did you already do an investigation of how the rule and table
> features could be exploited to suit your needs? Some time back I
> suggested something like "ip route table foo exec ....": keep a default
> routing lookup indicator in task_struct which gets implicitly propagated
> to rtnetlink routing table requests/modifications for the requested
> table. Tables can already be specified via rtnetlink, so no change would
> be needed here.
>
> For sockets something like SO_BINDTOTABLE might work; maybe we can even
> by default use the task_struct information to also bind the sockets to
> the per-process table. We certainly need to preserve the routing
> information on the socket as we need it in ICMP error handling (e.g.
> where to apply IPv4/IPv6 redirects to). Directing incoming packets to a
> specific table also works via an ip-rule iif match.

Update and lookup rule ids must be separated, so a process might need a
tuple of references indicating which table to update and which tables to
match in ip rules.

Also, some data structures used for matching might need to change, e.g.
an ->action which takes an interface and returns the routing table id in
O(1) instead of walking the rules and executing the actions in order.
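The difference between today's in-order rule walk and an O(1) interface-to-table action can be sketched with a toy model (pure illustration, nothing like the actual kernel data structures):

```python
# Toy model of ip-rule evaluation. Each rule is (priority, iif, table);
# today's behaviour walks the rules in priority order until one matches.
RT_TABLE_MAIN = 254

rules = [
    (20, "eth1", 10001),
    (20, "vap0", 10001),
    (20, "vap1", 10001),
]

def lookup_linear(iif):
    """Walk all rules in order -- O(number of rules) per packet."""
    for _prio, rule_iif, table in sorted(rules):
        if rule_iif == iif:
            return table
    return RT_TABLE_MAIN

# The suggested alternative: an action resolving iif -> table directly,
# so each incoming packet costs O(1) instead of a full rule walk.
iif_to_table = {iif: table for _prio, iif, table in rules}

def lookup_indexed(iif):
    return iif_to_table.get(iif, RT_TABLE_MAIN)

assert lookup_linear("vap1") == lookup_indexed("vap1") == 10001
print(lookup_indexed("eth1"))
```

With thousands of interfaces and rules, that per-packet walk is exactly where the linear-evaluation cost shows up.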

> The problem I see with rules is that some of those tables already work
> hand in hand; they already have implicit semantics, e.g. local, main,
> default and unspec (this is even worse for IPv6, where addrconf already
> uses hardcoded tables). Working around this might be very tricky and
> even more problematic to do from user space.

We might also add a rule reference to net_device so that route changes
during address addition/deletion are redirected to a separate table;
otherwise user space has to move them non-atomically.

Bye,
Hannes


* Re: VRFs and the scalability of namespaces
  2014-09-27  1:25 ` Eric W. Biederman
@ 2014-09-29 12:34   ` David Ahern
  0 siblings, 0 replies; 15+ messages in thread
From: David Ahern @ 2014-09-29 12:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: nicolas.dichtel, netdev@vger.kernel.org, Stephen Hemminger

Hi Eric

On 9/26/14, 7:25 PM, Eric W. Biederman wrote:
> When you say "proper VRF support" what I hear is that you think
> something new needs to be added to the linux network stack (called a
> VRF) with a new userspace interface that is somehow better because it
> lacks features.

From my perspective the existing mechanisms do not seem to provide a 
sufficient solution for VRFs.


>> Before droning on even more, does the above provide better context on
>> the general problem?
>
> It provides a rough context on what you are trying to do.  Use linux as
> the OS to run on a switch.
>
> It doesn't actually provide much in the way of context on the actual
> problems that show up when you try to use network namespaces, which is
> what I was expecting the discussion to be about and which I expect
> would be a productive conversation.

I don't know how else to explain it beyond what I said in the first 
email. I listed several specific examples of how namespaces are not an 
appropriate model for VRFs. Do you disagree with any of those points? 
Need clarification on any of them? I.e., what more were you expecting?

David


* Re: VRFs and the scalability of namespaces
  2014-09-27 13:29 ` Hannes Frederic Sowa
  2014-09-27 14:09   ` Hannes Frederic Sowa
@ 2014-09-29 13:06   ` David Ahern
  2014-09-29 16:40     ` Ben Greear
  1 sibling, 1 reply; 15+ messages in thread
From: David Ahern @ 2014-09-29 13:06 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Eric W. Biederman; +Cc: nicolas.dichtel, netdev

Hi Hannes:

On 9/27/14, 7:29 AM, Hannes Frederic Sowa wrote:
> Did you already do an investigation of how the rule and table
> features could be exploited to suit your needs? Some time back I

I did look into the existing multiple-table option but not to the extent 
of creating a POC. It has been on my to-do list for 4+ months now; I 
just have not had time to get to it. Based on a number of Google 
searches reviewing the history of VRFs and the kernel, I did see that 
the use of multiple routing tables has been suggested as well and that 
its problems have been delineated, e.g.,

     http://www.spinics.net/lists/linux-net/msg17502.html


> suggested something like "ip route table foo exec ....": keep a default
> routing lookup indicator in task_struct which gets implicitly propagated
> to rtnetlink routing table requests/modifications for the requested
> table. Tables can already be specified via rtnetlink, so no change would
> be needed here.
>
> For sockets something like SO_BINDTOTABLE might work; maybe we can even
> by default use the task_struct information to also bind the sockets to
> the per-process table. We certainly need to preserve the routing
> information on the socket as we need it in ICMP error handling (e.g.
> where to apply IPv4/IPv6 redirects to). Directing incoming packets to a
> specific table also works via an ip-rule iif match.
>
> The advantage of the "ip route table foo exec ..." method would be that
> conversion of some unmodified routing management daemons might be
> easier; others can use the rtnetlink extended attributes which are
> already available, and we only need to have per-process routing table
> context control, which seems not too hard to implement in the ip-rule
> subsystem, but I haven't checked.
>
> The problem I see with rules is that some of those tables already work
> hand in hand; they already have implicit semantics, e.g. local, main,
> default and unspec (this is even worse for IPv6, where addrconf already
> uses hardcoded tables). Working around this might be very tricky and
> even more problematic to do from user space.
>
> I am not yet sure what features you want from VRFs; some things seem to
> match the rule/table features but others I think are pretty hard to
> implement.

The features of note:
- Resource efficiency -- not having to create a process/thread/socket 
per VRF to have a "presence" in all VRFs, e.g., a "VRF any" context that 
allows 1 socket to work across VRFs (L3 raw socket, TCP listen socket, 
unconnected UDP socket). Daemons run in a 'vrf any' context; connected 
clients run in a specific VRF context. For non-connected sockets the VRF 
context can be passed via cmsg.

- Same IP address on different interfaces in different VRFs, i.e., 
VRF-specific routing and neighbor tables.

- Cross-VRF routing: the ability to receive a message on one VRF and 
send it on another. This can be handled by the process itself (e.g., L3 
VPNs).
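The cmsg idea has a close analogue in what IP_PKTINFO already does today: the kernel delivers the receiving interface index as ancillary data with each datagram, and a per-packet VRF id could ride the same channel. A runnable sketch of the existing mechanism (Linux-only; IP_PKTINFO = 8 is assumed where the socket module does not expose the constant):

```python
import socket
import struct

IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)  # 8 on Linux

# IP_PKTINFO delivers a struct in_pktinfo (ifindex, local addr, dst addr)
# as ancillary data with each datagram -- the same cmsg channel a
# per-packet VRF context could use.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
sock.bind(("127.0.0.1", 0))
sock.settimeout(2.0)
sock.sendto(b"ping", sock.getsockname())  # send a datagram to ourselves

data, ancdata, _flags, _addr = sock.recvmsg(64, socket.CMSG_SPACE(12))
ifindex = None
for level, ctype, cdata in ancdata:
    if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
        ifindex, _spec_dst, _dst = struct.unpack("I4s4s", cdata[:12])
print("datagram", data, "received on ifindex", ifindex)
sock.close()
```

An unconnected daemon already demultiplexes interfaces this way; a VRF id in the same ancillary data would let one socket serve every VRF.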

Thanks,
David


* Re: VRFs and the scalability of namespaces
  2014-09-29 13:06   ` David Ahern
@ 2014-09-29 16:40     ` Ben Greear
  2014-09-29 16:50       ` Sowmini Varadhan
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Greear @ 2014-09-29 16:40 UTC (permalink / raw)
  To: David Ahern
  Cc: Hannes Frederic Sowa, Eric W. Biederman, nicolas.dichtel, netdev

On 09/29/2014 06:06 AM, David Ahern wrote:

> The features of note:
> - resource efficiency -- not having to create a process/thread/socket per VRF to have a "presence" in all VRFs. e.g., a VRF any context that allows 1 socket to
> work across VRFs (L3 raw socket, TCP listen socket, unconnected UDP socket). Daemons run a 'vrf any' context; connected clients run a specific vrf context. For
> non-connected sockets VRF context can be passed via cmsg.
> 
> - same IP address on different interfaces in different vrfs. i.e., VRF specific routing and neighbor tables
> 
> - cross VRF routing. ability to receive message on 1 vrf and send it on another. Can be handled by the process itself (e.g., L3 vpns).

We have implemented support for at least most of this (excepting duplicate IPs)
using routing tables, rules, and (optionally) xorp as the router.

It works OK for our purposes (a network simulator), but performance is not great
because you end up with a large number of ip rules, and they are effectively
evaluated linearly, it seems.

A quick way to improve performance in our scenario would be to bind rules to
specific interfaces, so that packets traverse a smaller number of rules when
they enter an interface, I think... but I have not looked into it closely.


It is hard to show you an example of this without you installing our
software to visualize what we are trying to do, but our software
will work on standard kernels, and we auto-generate a Perl script
that sets up all of the rules and such.  You could compare the network
diagram in our GUI with the Perl script and, I think, understand the
basics of what we are doing fairly quickly.  If you want to take a
detailed look, let me know and I'll set you up with a demo license.



Thanks,
Ben


> 
> Thanks,
> David
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: VRFs and the scalability of namespaces
  2014-09-29 16:40     ` Ben Greear
@ 2014-09-29 16:50       ` Sowmini Varadhan
  2014-09-29 17:00         ` Ben Greear
  0 siblings, 1 reply; 15+ messages in thread
From: Sowmini Varadhan @ 2014-09-29 16:50 UTC (permalink / raw)
  To: Ben Greear
  Cc: David Ahern, Hannes Frederic Sowa, Eric W. Biederman,
	Nicolas Dichtel, netdev

On Mon, Sep 29, 2014 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 09/29/2014 06:06 AM, David Ahern wrote:

>
> We have implemented support for at least most of this (excepting duplicate IPs)
> using routing tables, rules, and (optionally, xorp as the router).
>

My understanding of multiple routing tables/rules was that they
are closer in semantics to switch/router ACLs than to VRFs; e.g.,
one big difference is that an interface can belong to exactly one
VRF at a time, which is not mandated by multiple routing tables/rules.

Was I mistaken?

--Sowmini


* Re: VRFs and the scalability of namespaces
  2014-09-29 16:50       ` Sowmini Varadhan
@ 2014-09-29 17:00         ` Ben Greear
  2014-09-29 23:43           ` David Ahern
  0 siblings, 1 reply; 15+ messages in thread
From: Ben Greear @ 2014-09-29 17:00 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: David Ahern, Hannes Frederic Sowa, Eric W. Biederman,
	Nicolas Dichtel, netdev

On 09/29/2014 09:50 AM, Sowmini Varadhan wrote:
> On Mon, Sep 29, 2014 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>> On 09/29/2014 06:06 AM, David Ahern wrote:
> 
>>
>> We have implemented support for at least most of this (excepting duplicate IPs)
>> using routing tables, rules, and (optionally, xorp as the router).
>>
> 
> My understanding of multiple routing-tables/rules was that they
> are closer in semantics to switch/router ACLs than to VRFs, eg.,
> one big difference is that an interface can belong to exactly one
> VRF at a time, which is not mandated by multiple routing-tables/rules.
> 
> Was I mistaken?

You can effectively force an interface to belong to a particular virtual
router (table).  It is not trivial to do, and possibly I have still not
covered every possible case.  Some rules grow somewhat exponentially as
interfaces are added to virtual routers (i.e., the preference 10 rules).

Here is our setup for a system with a single virtual router, which uses
table 10001.  vap0, vap1, and eth1 are in this virtual router.  There are other
interfaces on this system outside of the virtual router, so you can ignore rules
related to those.

You have to add CT zones for each virtual router as well.

[root@ath10k-2220 ~]# ip ru show
10:	from all to 5.1.1.1 iif eth1 lookup local
10:	from all to 4.1.0.1 iif vap0 lookup local
10:	from all to 4.2.0.1 iif vap0 lookup local
10:	from all to 4.2.0.1 iif vap1 lookup local
10:	from all to 5.1.1.1 iif vap0 lookup local
10:	from all to 4.1.0.1 iif vap1 lookup local
10:	from all to 4.1.0.1 iif vap1 lookup local
10:	from all to 5.1.1.1 iif vap1 lookup local
10:	from all to 4.1.0.1 iif eth1 lookup local
10:	from all to 4.2.0.1 iif vap0 lookup local
10:	from all to 4.2.0.1 iif eth1 lookup local
20:	from all iif eth1 lookup 10001
20:	from all iif vap0 lookup 10001
20:	from all iif vap1 lookup 10001
30:	from 5.1.1.1 lookup 10001
30:	from 4.1.0.1 lookup 10001
30:	from 4.2.0.1 lookup 10001
50:	from all oif rddVR0 lookup 6
50:	from all oif rddVR1 lookup 7
50:	from all oif rddVR2 lookup 8
50:	from all oif rddVR3 lookup 9
50:	from all oif wlan0 lookup 4
50:	from all oif wlan1 lookup 5
50:	from all oif eth1 lookup 10001
50:	from all oif vap0 lookup 10001
50:	from all oif vap1 lookup 10001
512:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

[root@ath10k-2220 ~]# ip -4 route show table all
unreachable default  table 10001
4.1.0.0/16 via 4.1.0.1 dev vap0  table 10001
4.2.0.0/16 via 4.2.0.1 dev vap1  table 10001
5.1.1.0/24 dev eth1  table 10001  scope link
default via 192.168.100.1 dev eth0
4.1.0.0/16 dev vap0  proto kernel  scope link  src 4.1.0.1
4.2.0.0/16 dev vap1  proto kernel  scope link  src 4.2.0.1
5.1.1.0/24 dev eth1  proto kernel  scope link  src 5.1.1.1
169.254.0.0/16 dev eth0  scope link  metric 1002
192.168.100.0/24 dev eth0  proto kernel  scope link  src 192.168.100.179
broadcast 4.1.0.0 dev vap0  table local  proto kernel  scope link  src 4.1.0.1
local 4.1.0.1 dev vap0  table local  proto kernel  scope host  src 4.1.0.1
broadcast 4.1.255.255 dev vap0  table local  proto kernel  scope link  src 4.1.0.1
broadcast 4.2.0.0 dev vap1  table local  proto kernel  scope link  src 4.2.0.1
local 4.2.0.1 dev vap1  table local  proto kernel  scope host  src 4.2.0.1
broadcast 4.2.255.255 dev vap1  table local  proto kernel  scope link  src 4.2.0.1
broadcast 5.1.1.0 dev eth1  table local  proto kernel  scope link  src 5.1.1.1
local 5.1.1.1 dev eth1  table local  proto kernel  scope host  src 5.1.1.1
broadcast 5.1.1.255 dev eth1  table local  proto kernel  scope link  src 5.1.1.1
broadcast 127.0.0.0 dev lo  table local  proto kernel  scope link  src 127.0.0.1
local 127.0.0.0/8 dev lo  table local  proto kernel  scope host  src 127.0.0.1
local 127.0.0.1 dev lo  table local  proto kernel  scope host  src 127.0.0.1
broadcast 127.255.255.255 dev lo  table local  proto kernel  scope link  src 127.0.0.1
broadcast 192.168.100.0 dev eth0  table local  proto kernel  scope link  src 192.168.100.179
local 192.168.100.179 dev eth0  table local  proto kernel  scope host  src 192.168.100.179
broadcast 192.168.100.255 dev eth0  table local  proto kernel  scope link  src 192.168.100.179

[root@ath10k-2220 ~]# ip route show table 10001
unreachable default
4.1.0.0/16 via 4.1.0.1 dev vap0
4.2.0.0/16 via 4.2.0.1 dev vap1
5.1.1.0/24 dev eth1  scope link


Thanks,
Ben

> 
> --Sowmini


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: VRFs and the scalability of namespaces
  2014-09-26 22:37 VRFs and the scalability of namespaces David Ahern
                   ` (2 preceding siblings ...)
  2014-09-27 13:29 ` Hannes Frederic Sowa
@ 2014-09-29 18:05 ` Cong Wang
  3 siblings, 0 replies; 15+ messages in thread
From: Cong Wang @ 2014-09-29 18:05 UTC (permalink / raw)
  To: David Ahern; +Cc: Eric W. Biederman, Nicolas Dichtel, netdev@vger.kernel.org

On Fri, Sep 26, 2014 at 3:37 PM, David Ahern <dsahern@gmail.com> wrote:
>
> Network Namespaces for VRFs
> ---------------------------
> For the past 4 years or so the response to VRF questions is a drum beat of
> "use network namespaces". But namespaces are not a good match for VRFs.
>
> 1. Network namespaces are a complete separation of the networking stack from
> network devices up. VRFs are an L3 concept. Using namespaces forces an L3
> separation concept onto L2 apps -- lldp, cdp, etc.
>
> There are use cases when you want device level separation, use cases where
> you want only L3 and up separation, and cases where you want both (e.g.,
> divvy up the netdevices in a system across some small number of namespaces
> and then provide VRF based features within a namespace).


With regard to L3 isolation, not limited to VRF, it _does_ sound like a
cleaner solution than the current L2 isolation, given that:

1) Isolating L2 relies on /sys etc. in practice, which means we have to
use the mount namespace as well. Ideally network isolation should be
independent of other namespaces.

2) Some resources still need to be shared; otherwise we have to restart
some services in each of the network namespaces, as you mentioned. The
~200kB memory overhead is not a problem I think, as most servers now
have 8GB+ of memory; the real problem is that some services, for example
udevd, have dependencies, and we would fall back to re-implementing a
whole init service chain for each netns, which is the expensive part.

Overall, rather than all-or-nothing isolation, we should give the user
some flexibility to specify which network resources to isolate and which
not to. Again, I am not saying VRF is the solution; I am looking for a
more flexible isolation solution.

Thanks.


* Re: VRFs and the scalability of namespaces
  2014-09-29 17:00         ` Ben Greear
@ 2014-09-29 23:43           ` David Ahern
  2014-09-29 23:50             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 15+ messages in thread
From: David Ahern @ 2014-09-29 23:43 UTC (permalink / raw)
  To: Ben Greear, Sowmini Varadhan
  Cc: Hannes Frederic Sowa, Eric W. Biederman, Nicolas Dichtel, netdev

On 9/29/14, 11:00 AM, Ben Greear wrote:
> On 09/29/2014 09:50 AM, Sowmini Varadhan wrote:
>> On Mon, Sep 29, 2014 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>>> On 09/29/2014 06:06 AM, David Ahern wrote:
>>
>>>
>>> We have implemented support for at least most of this (excepting duplicate IPs)
>>> using routing tables, rules, and (optionally, xorp as the router).
>>>
>>
>> My understanding of multiple routing-tables/rules was that they
>> are closer in semantics to switch/router ACLs than to VRFs, eg.,
>> one big difference is that an interface can belong to exactly one
>> VRF at a time, which is not mandated by multiple routing-tables/rules.
>>
>> Was I mistaken?
>
> You can effectively force an interface to belong to a particular virtual
> router (table).  It is not trivial to do, and possibly I have still not
> covered every possible case.  Some rules grow somewhat exponentially as
> interfaces are added to virtual routers (i.e., the preference 10 rules).

An interesting way of doing it; thanks for the reference point.

Fundamentally the design should be able to assign interfaces to a single 
VRF, support duplicate IP addresses on different interfaces in different 
VRFs and be able to scale to 10,000+ netdevices -- devices representing 
physical ports as well as logical interfaces built on top of them (e.g., 
sub-interfaces).

David


* Re: VRFs and the scalability of namespaces
  2014-09-29 23:43           ` David Ahern
@ 2014-09-29 23:50             ` Hannes Frederic Sowa
  2014-09-30  1:15               ` Ben Greear
  0 siblings, 1 reply; 15+ messages in thread
From: Hannes Frederic Sowa @ 2014-09-29 23:50 UTC (permalink / raw)
  To: David Ahern, Ben Greear, Sowmini Varadhan
  Cc: Eric W. Biederman, Nicolas Dichtel, netdev

On Tue, Sep 30, 2014, at 01:43, David Ahern wrote:
> On 9/29/14, 11:00 AM, Ben Greear wrote:
> > On 09/29/2014 09:50 AM, Sowmini Varadhan wrote:
> >> On Mon, Sep 29, 2014 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
> >>> On 09/29/2014 06:06 AM, David Ahern wrote:
> >>
> >>>
> >>> We have implemented support for at least most of this (excepting duplicate IPs)
> >>> using routing tables, rules, and (optionally, xorp as the router).
> >>>
> >>
> >> My understanding of multiple routing-tables/rules was that they
> >> are closer in semantics to switch/router ACLs than to VRFs, e.g.,
> >> one big difference is that an interface can belong to exactly one
> >> VRF at a time, which is not mandated by multiple routing-tables/rules.
> >>
> >> Was I mistaken?
> >
> > You can effectively force an interface to belong to a particular virtual
> > router (table).  It is not trivial to do, and possibly I have still not
> > covered every possible case.  Some rules grow somewhat exponentially as
> > interfaces are added to virtual routers (ie, preference 10 rules).
> 
> An interesting way of doing it; thanks for the reference point.
> 
> Fundamentally the design should be able to assign interfaces to a single 
> VRF, support duplicate IP addresses on different interfaces in different 
> VRFs and be able to scale to 10,000+ netdevices -- devices representing 
> physical ports as well as logical interfaces built on top of them (e.g., 
> sub-interfaces).

Duplicate IP addresses don't go well with the current Linux stack, which
follows the weak end-system (host) model by default. Today that separation
is done at the ARP level when some kind of strong host model is desired.
This calls for some kind of namespaces again. ;)
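As a rough illustration of the ARP-level separation mentioned above, the
usual knobs are the per-interface ARP sysctls (the interface name is
illustrative; exact semantics are in Documentation/networking/ip-sysctl.txt):

```shell
# Approximate a strong host model per interface via ARP sysctls.

# Answer an ARP request only if the kernel would route the reply
# out the interface the request arrived on:
sysctl -w net.ipv4.conf.all.arp_filter=1

# Reply only for addresses configured on the receiving interface:
sysctl -w net.ipv4.conf.eth1.arp_ignore=1

# Always use the best local address when sending ARP requests on eth1:
sysctl -w net.ipv4.conf.eth1.arp_announce=2
```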

Bye,
Hannes


* Re: VRFs and the scalability of namespaces
  2014-09-29 23:50             ` Hannes Frederic Sowa
@ 2014-09-30  1:15               ` Ben Greear
  0 siblings, 0 replies; 15+ messages in thread
From: Ben Greear @ 2014-09-30  1:15 UTC (permalink / raw)
  To: Hannes Frederic Sowa, David Ahern, Sowmini Varadhan
  Cc: Eric W. Biederman, Nicolas Dichtel, netdev



On 09/29/2014 04:50 PM, Hannes Frederic Sowa wrote:
> On Tue, Sep 30, 2014, at 01:43, David Ahern wrote:
>> On 9/29/14, 11:00 AM, Ben Greear wrote:
>>> On 09/29/2014 09:50 AM, Sowmini Varadhan wrote:
>>>> On Mon, Sep 29, 2014 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>>> On 09/29/2014 06:06 AM, David Ahern wrote:
>>>>
>>>>>
>>>>> We have implemented support for at least most of this (excepting duplicate IPs)
>>>>> using routing tables, rules, and (optionally, xorp as the router).
>>>>>
>>>>
>>>> My understanding of multiple routing-tables/rules was that they
>>>> are closer in semantics to switch/router ACLs than to VRFs, e.g.,
>>>> one big difference is that an interface can belong to exactly one
>>>> VRF at a time, which is not mandated by multiple routing-tables/rules.
>>>>
>>>> Was I mistaken?
>>>
>>> You can effectively force an interface to belong to a particular virtual
>>> router (table).  It is not trivial to do, and possibly I have still not
>>> covered every possible case.  Some rules grow somewhat exponentially as
>>> interfaces are added to virtual routers (ie, preference 10 rules).
>>
>> An interesting way of doing it; thanks for the reference point.
>>
>> Fundamentally the design should be able to assign interfaces to a single
>> VRF, support duplicate IP addresses on different interfaces in different
>> VRFs and be able to scale to 10,000+ netdevices -- devices representing
>> physical ports as well as logical interfaces built on top of them (e.g.,
>> sub-interfaces).
>
> Duplicate IP addresses don't go well with the current Linux stack, which
> follows the weak end-system (host) model by default. Today that separation
> is done at the ARP level when some kind of strong host model is desired.
> This calls for some kind of namespaces again. ;)

ARP is per interface as well if you set arp_filter properly; the main
problem with duplicate IPs is that you can't (easily?) set up routing rules
that match them properly...
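To make the difficulty concrete: with the same address configured on two
interfaces, a plain source-address rule cannot tell the two "VRFs" apart.
The addresses, interfaces, and table numbers below are illustrative:

```shell
# 10.1.1.2 is configured on both eth1 and eth2, in different "VRFs":
ip rule add from 10.1.1.2 lookup 10       # ambiguous: matches either interface

# Received traffic can be disambiguated by incoming interface...
ip rule add iif eth1 from 10.1.1.2 lookup 10 pref 100
ip rule add iif eth2 from 10.1.1.2 lookup 20 pref 100

# ...but locally originated traffic has no incoming interface, so which
# table it hits cannot be decided from the duplicate source address alone.
```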

Thanks,
Ben

>
> Bye,
> Hannes
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



Thread overview: 15+ messages
2014-09-26 22:37 VRFs and the scalability of namespaces David Ahern
2014-09-26 23:52 ` Stephen Hemminger
2014-09-27  0:00   ` David Ahern
2014-09-27  1:25 ` Eric W. Biederman
2014-09-29 12:34   ` David Ahern
2014-09-27 13:29 ` Hannes Frederic Sowa
2014-09-27 14:09   ` Hannes Frederic Sowa
2014-09-29 13:06   ` David Ahern
2014-09-29 16:40     ` Ben Greear
2014-09-29 16:50       ` Sowmini Varadhan
2014-09-29 17:00         ` Ben Greear
2014-09-29 23:43           ` David Ahern
2014-09-29 23:50             ` Hannes Frederic Sowa
2014-09-30  1:15               ` Ben Greear
2014-09-29 18:05 ` Cong Wang
