From: David Ahern <dsahern@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: nicolas.dichtel@6wind.com,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: VRFs and the scalability of namespaces
Date: Fri, 26 Sep 2014 16:37:26 -0600
Message-ID: <5425EAA6.7040302@gmail.com>
Hi Eric:
As you suggested [1], I am starting a new thread to discuss the
scalability problems of using namespaces for VRFs.
Background
----------
Consider a single system that wants to provide VRF-based features with
support for N VRFs. N could easily be 2048 (e.g., 6Wind, [2]), 4000
(e.g., Cisco, [3]) or even higher.
The single system with support for N VRFs runs M services (e.g., quagga,
cdp, lldp, stp, strongswan, some homegrown routing protocol) and
includes standard system services like sshd. Furthermore, a system also
includes monitoring programs like snmpd and tcollector. In short, M is
easily 20 processes that need to have a presence across all VRFs.
Network Namespaces for VRFs
---------------------------
For the past 4 years or so the response to VRF questions has been a
drumbeat of "use network namespaces". But namespaces are not a good
match for VRFs.
1. Network namespaces are a complete separation of the networking stack
from network devices up. VRFs are an L3 concept. Using namespaces forces
an L3 separation concept onto L2 apps -- lldp, cdp, etc.
There are use cases when you want device level separation, use cases
where you want only L3 and up separation, and cases where you want both
(e.g., divvy up the netdevices in a system across some small number of
namespaces and then provide VRF-based features within a namespace).
2. Scalability of apps providing service as namespaces are created. How
do you create the presence for each service in a network namespace?
a. Spawn a new process for each namespace? A brute-force approach and
extremely resource intensive, e.g., the quagga example [4].
b. Spawn a thread for each namespace? Better than a full process but
still a heavyweight solution.
c. Create a socket per namespace? Better, but this is still a resource
intensive solution -- N listen sockets per service -- and each service
needs to be modified for namespace support. For open source software
that means each project has to agree that namespace awareness is
relevant and agree to take the patches.
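To make option (c) concrete, here is a minimal sketch of what "a socket per namespace" asks of each service. It is an illustration under stated assumptions, not code from any of the projects mentioned: the namespaces are assumed to have been created with `ip netns add NAME` (so their handles live under /var/run/netns), the helper names `netns_path` and `listen_in_netns` are hypothetical, and `os.setns` needs Python 3.12 or later.

```python
import os
import socket

# Directory where `ip netns add NAME` creates namespace handles.
NETNS_DIR = "/var/run/netns"


def netns_path(name):
    """Path of the handle file for a named network namespace."""
    return os.path.join(NETNS_DIR, name)


def listen_in_netns(name, port, backlog=128):
    """Return a TCP listen socket that lives in the named namespace.

    Needs CAP_SYS_ADMIN (for setns) and Python 3.12+ (for os.setns).
    Assumes a single-threaded caller: setns moves only the calling thread.
    """
    target = os.open(netns_path(name), os.O_RDONLY)
    try:
        # Keep a handle on the current namespace so we can return to it.
        home = os.open("/proc/self/ns/net", os.O_RDONLY)
        try:
            os.setns(target, os.CLONE_NEWNET)  # enter the target namespace
            try:
                # A socket belongs to the namespace it was created in.
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                try:
                    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
                    sock.bind(("0.0.0.0", port))
                    sock.listen(backlog)
                except OSError:
                    sock.close()
                    raise
            finally:
                os.setns(home, os.CLONE_NEWNET)  # always switch back
        finally:
            os.close(home)
    finally:
        os.close(target)
    return sock
```

A service would have to call something like `listen_in_netns(name, port)` once per VRF namespace and then poll all N resulting sockets, which is the per-service modification and the N listen sockets described in (c).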
3. Just creating a network namespace consumes a non-negligible amount
of memory -- ~200kB for the 3.10 kernel. I believe the /proc entries are
the bulk of that memory usage. At 200kB/namespace, N namespaces add up
to a lot of wasted memory and overhead.
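A back-of-the-envelope calculation shows how items 2 and 3 scale. This is only a sketch using the figures quoted in this mail (N VRFs, M services, ~200kB per namespace); the function name `namespace_overhead` is made up for illustration, and 1 MB is taken as 1024 kB.

```python
def namespace_overhead(n_vrfs, m_services, kb_per_netns=200):
    """Rough cost of modelling each VRF as a network namespace."""
    return {
        # Memory consumed just by the namespaces existing (item 3).
        "netns_memory_mb": n_vrfs * kb_per_netns / 1024,
        # Processes, threads, or listen sockets needed so that every
        # service is present in every namespace (item 2, options a-c).
        "service_instances": n_vrfs * m_services,
    }


for n in (2048, 4000):
    print(n, namespace_overhead(n, 20))
```

With N = 2048 and M = 20 this gives 400 MB of namespace memory and 40960 service instances before any service does useful work; with N = 4000 it is about 781 MB and 80000 instances.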
4. For a single process to straddle multiple namespaces it has to run
with full root privileges -- CAP_SYS_ADMIN -- to use setns. Using
network sockets does not require a process to run as root at all unless
it wants privileged ports, in which case CAP_NET_BIND_SERVICE is
sufficient, not full root.
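The privilege gap in item 4 can be checked from userspace. The sketch below, offered as an illustration only, parses the `CapEff` line of /proc/self/status and tests the two capability bits involved; the helper names are hypothetical, while the bit numbers (10 for CAP_NET_BIND_SERVICE, 21 for CAP_SYS_ADMIN) come from the kernel's capability definitions.

```python
# Capability bit numbers from <linux/capability.h>.
CAP_NET_BIND_SERVICE = 10
CAP_SYS_ADMIN = 21


def effective_caps(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def can_straddle_namespaces(mask):
    """setns() into another network namespace needs CAP_SYS_ADMIN."""
    return has_cap(mask, CAP_SYS_ADMIN)


def can_bind_privileged_ports(mask):
    """Binding ports below 1024 needs only CAP_NET_BIND_SERVICE."""
    return has_cap(mask, CAP_NET_BIND_SERVICE)


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        mask = effective_caps(f.read())
    print("can setns:", can_straddle_namespaces(mask))
    print("can bind privileged ports:", can_bind_privileged_ports(mask))
```

A daemon that has dropped everything except CAP_NET_BIND_SERVICE can still serve on a privileged port, but it reports False for the setns check, so it cannot move between namespaces.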
The Linux kernel needs proper VRF support -- as an L3 concept. A
capability to run a process in a "VRF any" context provides a resource
efficient solution where a single process with a single listen socket
works across all VRFs in a namespace and then connected sockets have a
specific VRF context.
Before droning on even more, does the above provide better context on
the general problem?
Thanks,
David
[1] https://lkml.org/lkml/2014/9/26/840
[2] http://www.6wind.com/6windgate-performance/ip-forwarding
[3]
http://www.cisco.com/c/en/us/td/docs/switches/datacenter/sw/verified_scalability/b_Cisco_Nexus_7000_Series_NX-OS_Verified_Scalability_Guide.html
[4]
https://lists.quagga.net/pipermail/quagga-users/2010-February/011351.html