From: David Ahern <dsahern@gmail.com>
To: Thomas Graf <tgraf@suug.ch>
Cc: netdev@vger.kernel.org, ebiederm@xmission.com
Subject: Re: [RFC PATCH 00/29] net: VRF support
Date: Tue, 10 Feb 2015 13:54:05 -0700 [thread overview]
Message-ID: <54DA6FED.5020907@gmail.com> (raw)
In-Reply-To: <20150210005344.GA6293@casper.infradead.org>
On 2/9/15 5:53 PM, Thomas Graf wrote:
> On 02/04/15 at 06:34pm, David Ahern wrote:
>> Namespaces provide excellent separation of the networking stack from the
>> netdevices and up. The intent of VRFs is to provide an additional,
>> logical separation at the L3 layer within a namespace.
>
> What you ask for seems to be L3 micro segmentation inside netns. I
I would not label it 'micro' but yes a L3 separation within a L1 separation.
> would argue that we already support this through multiple routing
> tables. I would prefer improving the existing architecture to cover
> your use cases: Increase the number of supported tables, extend
> routing rules as needed, ...
I've seen that response for VRFs as well. I have not personally tried
it, but from what I have read it does not work well. I think Roopa
responded that Cumulus has spent time on that path and has hit some
roadblocks.
>
>> The VRF id of tasks defaults to 1 and is inherited parent to child. It can
>> be read via the file '/proc/<pid>/vrf' and can be changed anytime by writing
>> to this file (if preferred this can be made a prctl to change the VRF id).
>> This allows services to be launched in a VRF context using ip, similar to
>> what is done for network namespaces.
>> e.g., ip vrf exec 99 /usr/sbin/sshd
>
> I think such as classification should occur through cgroups instead
> of touching PIDs directly.
That is an interesting idea -- using cgroups for task labeling. It
presents a creation / deletion event for VRFs which I was trying to
avoid, and there will be some amount of overhead with a cgroup. I'll
take a look at that option when I get some time.
As for as the current proposal I am treating VRF as part of a network
context. Today 'ip netns' is used to run a command in a specific network
namespace; the proposal with the VRF layering is to add a vrf context
within a namespace so in keeping with how 'ip netns' works the above
syntax allows a user to supply both a network namespace + VRF for
running a command.
>
>> Network devices belong to a single VRF context which defaults to VRF 1.
>> They can be assigned to another VRF using IFLA_VRF attribute in link
>> messages. Similarly the VRF assignment is returned in the IFLA_VRF
>> attribute. The ip command has been modified to display the VRF id of a
>> device. L2 applications like lldp are not VRF aware and still work through
>> through all network devices within the namespace.
>
> I believe that binding net_devices to VRFs is misleading and the
> concept by itself is non-scalable. You do not want to create 10k
> net_devices for your overlay of choice just to tie them to a
> particular VRF. You want to store the VRF identifier as metadata and
> have a stateless classifier included it in the VRF decision. See the
> recent VXLAN-GBP work.
I'll take a look when I get time.
I have not seen scalability issues creating 1,000+ net_devices.
Certainly the 40k'ish memory per net_device is noticeable but I believe
that can be improved (e.g., a number of entries can be moved under
proper CONFIG_ checks). I do need to repeat the tests on newer kernels.
>
> You could either map whatever selects the VRF to the mark or support it
> natively in the routing rules classifier.
>
> An obvious alternative is OVS. What you describe can be implemented in
> a scalable matter using OVS and mark. I understand that OVS is not for
> everybody but it gets a fundamental principle right: Scalability
> demands for programmability.
>
> I don’t think we should be adding a new single purpose metadata field
> to arbitrary structures for every new use case that comes up. We
> should work on programmability which increases flexibility and allows
> decoupling application interest from networking details.
>
>> On RX skbs get their VRF context from the netdevice the packet is received
>> on. For TX the VRF context for an skb is taken from the socket. The
>> intention is for L3/raw sockets to be able to set the VRF context for a
>> packet TX using cmsg (not coded in this patch set).
>
> Specyfing L3 context in cmsg seems very broken to me. We do not want
> to bind applications any closer to underlying networking infrastructure.
> In fact, we should do the opposite and decouple this completely.
That suggestion is inline with what is done today for other L3
parameters -- TOS, TTL, and a few others.
>
>> The 'any' context applies to listen sockets only; connected sockets are in
>> a VRF context. Child sockets accepted by the daemon acquire the VRF context
>> of the network device the connection originated on.
>
> Linux considers an address local regardless of the interface the packet
> was received on. So you would accept the packet on any interface and
> then bind it to the VRF of that interface even though the route for it
> might be on a different interface.
>
> This really belongs into routing rules from my perspective which takes
> mark and the cgroup context into account.
Expanding the current network namespace checks to a networking context
is a very simple and clean way of implementing VRFs versus cobbling
together a 'VRF like' capability using marks, multiple tables, etc (ie.,
the existing capabilities). Further, the VRF tagging of net_devices
seems to readily fit into the hardware offload and switchdev
capabilities (e.g., add a ndo operation for setting the VRF tag on a
device which passes it to the driver).
Big picture wise where is OCP and switchdev headed? Top-of-rack switches
seem to be the first target, but after that? Will the kernel ever
support MPLS? Will the kernel attain the richer feature set of high-end
routers? If so, how does VRF support fit into the design? As I
understand it a scalable VRF solution is a fundamental building block.
Will a cobbled together solution of cgroups, marks, rules, multiple
tables really work versus the simplicity of an expanded network context?
David
next prev parent reply other threads:[~2015-02-10 20:54 UTC|newest]
Thread overview: 119+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-05 1:34 [RFC PATCH 00/29] net: VRF support David Ahern
2015-02-05 1:34 ` [RFC PATCH 01/29] net: Introduce net_ctx and macro for context comparison David Ahern
2015-02-05 1:34 ` [RFC PATCH 02/29] net: Flip net_device to use net_ctx David Ahern
2015-02-05 13:47 ` Nicolas Dichtel
2015-02-06 0:45 ` David Ahern
2015-02-05 1:34 ` [RFC PATCH 03/29] net: Flip sock_common to net_ctx David Ahern
2015-02-05 1:34 ` [RFC PATCH 04/29] net: Add net_ctx macros for skbuffs David Ahern
2015-02-05 1:34 ` [RFC PATCH 05/29] net: Flip seq_net_private to net_ctx David Ahern
2015-02-05 1:34 ` [RFC PATCH 06/29] net: Flip fib_rules and fib_rules_ops to use net_ctx David Ahern
2015-02-05 1:34 ` [RFC PATCH 07/29] net: Flip inet_bind_bucket to net_ctx David Ahern
2015-02-05 1:34 ` [RFC PATCH 08/29] net: Flip fib_info " David Ahern
2015-02-05 1:34 ` [RFC PATCH 09/29] net: Flip ip6_flowlabel " David Ahern
2015-02-05 1:34 ` [RFC PATCH 10/29] net: Flip neigh structs " David Ahern
2015-02-05 1:34 ` [RFC PATCH 11/29] net: Flip nl_info " David Ahern
2015-02-05 1:34 ` [RFC PATCH 12/29] net: Add device lookups by net_ctx David Ahern
2015-02-05 1:34 ` [RFC PATCH 13/29] net: Convert function arg from struct net to struct net_ctx David Ahern
2015-02-05 1:34 ` [RFC PATCH 14/29] net: vrf: Introduce vrf header file David Ahern
2015-02-05 13:44 ` Nicolas Dichtel
2015-02-06 0:52 ` David Ahern
2015-02-06 8:53 ` Nicolas Dichtel
2015-02-05 1:34 ` [RFC PATCH 15/29] net: vrf: Add vrf to net_ctx struct David Ahern
2015-02-05 1:34 ` [RFC PATCH 16/29] net: vrf: Set default vrf David Ahern
2015-02-05 1:34 ` [RFC PATCH 17/29] net: vrf: Add vrf context to task struct David Ahern
2015-02-05 1:34 ` [RFC PATCH 18/29] net: vrf: Plumbing for vrf context on a socket David Ahern
2015-02-05 13:44 ` Nicolas Dichtel
2015-02-06 1:18 ` David Ahern
2015-02-05 1:34 ` [RFC PATCH 19/29] net: vrf: Add vrf context to skb David Ahern
2015-02-05 13:45 ` Nicolas Dichtel
2015-02-06 1:21 ` David Ahern
2015-02-06 3:54 ` Eric W. Biederman
2015-02-06 6:00 ` David Ahern
2015-02-05 1:34 ` [RFC PATCH 20/29] net: vrf: Add vrf context to flow struct David Ahern
2015-02-05 1:34 ` [RFC PATCH 21/29] net: vrf: Add vrf context to genid's David Ahern
2015-02-05 1:34 ` [RFC PATCH 22/29] net: vrf: Set VRF id in various network structs David Ahern
2015-02-05 1:34 ` [RFC PATCH 23/29] net: vrf: Enable vrf checks David Ahern
2015-02-05 1:34 ` [RFC PATCH 24/29] net: vrf: Add support to get/set vrf context on a device David Ahern
2015-02-05 1:34 ` [RFC PATCH 25/29] net: vrf: Handle VRF any context David Ahern
2015-02-05 13:46 ` Nicolas Dichtel
2015-02-06 1:23 ` David Ahern
2015-02-05 1:34 ` [RFC PATCH 26/29] net: vrf: Change single_open_net to pass net_ctx David Ahern
2015-02-05 1:34 ` [RFC PATCH 27/29] net: vrf: Add vrf checks and context to ipv4 proc files David Ahern
2015-02-05 1:34 ` [RFC PATCH 28/29] iproute2: vrf: Add vrf subcommand David Ahern
2015-02-05 1:34 ` [RFC PATCH 29/29] iproute2: Add vrf option to ip link command David Ahern
2015-02-05 5:17 ` [RFC PATCH 00/29] net: VRF support roopa
2015-02-05 13:44 ` Nicolas Dichtel
2015-02-06 1:32 ` David Ahern
2015-02-06 8:53 ` Nicolas Dichtel
2015-02-05 23:12 ` roopa
2015-02-06 2:19 ` David Ahern
2015-02-09 16:38 ` roopa
2015-02-10 10:43 ` Derek Fawcus
2015-02-06 6:10 ` Shmulik Ladkani
2015-02-09 15:54 ` roopa
2015-02-11 7:42 ` Shmulik Ladkani
2015-02-06 1:33 ` Stephen Hemminger
2015-02-06 2:10 ` David Ahern
2015-02-06 4:14 ` Eric W. Biederman
2015-02-06 6:15 ` David Ahern
2015-02-06 15:08 ` Nicolas Dichtel
[not found] ` <87iofe7n1x.fsf@x220.int.ebiederm.org>
2015-02-09 20:48 ` Nicolas Dichtel
2015-02-11 4:14 ` David Ahern
2015-02-06 15:10 ` Nicolas Dichtel
2015-02-06 20:50 ` Eric W. Biederman
2015-02-09 0:36 ` David Ahern
2015-02-09 11:30 ` Derek Fawcus
[not found] ` <871tlxtbhd.fsf_-_@x220.int.ebiederm.org>
2015-02-11 2:55 ` network namespace bloat Eric Dumazet
2015-02-11 3:18 ` Eric W. Biederman
2015-02-19 19:49 ` David Miller
2015-03-09 18:22 ` [PATCH net-next 0/6] tcp_metrics: Network namespace bloat reduction Eric W. Biederman
2015-03-09 18:27 ` [PATCH net-next 1/6] tcp_metrics: panic when tcp_metrics can not be allocated Eric W. Biederman
2015-03-09 18:50 ` Sergei Shtylyov
2015-03-11 19:22 ` Sergei Shtylyov
2015-03-09 18:27 ` [PATCH net-next 2/6] tcp_metrics: Mix the network namespace into the hash function Eric W. Biederman
2015-03-09 18:29 ` [PATCH net-next 3/6] tcp_metrics: Add a field tcpm_net and verify it matches on lookup Eric W. Biederman
2015-03-09 20:25 ` Julian Anastasov
2015-03-10 6:59 ` Eric W. Biederman
2015-03-10 8:23 ` Julian Anastasov
2015-03-11 0:58 ` Eric W. Biederman
2015-03-10 16:36 ` David Miller
2015-03-10 17:06 ` Eric W. Biederman
2015-03-10 17:29 ` David Miller
2015-03-10 17:56 ` Eric W. Biederman
2015-03-09 18:30 ` [PATCH net-next 4/6] tcp_metrics: Remove the unused return code from tcp_metrics_flush_all Eric W. Biederman
2015-03-09 18:30 ` [PATCH net-next 5/6] tcp_metrics: Rewrite tcp_metrics_flush_all Eric W. Biederman
2015-03-09 18:31 ` [PATCH net-next 6/6] tcp_metrics: Use a single hash table for all network namespaces Eric W. Biederman
2015-03-09 18:43 ` Eric Dumazet
2015-03-09 18:47 ` Eric Dumazet
2015-03-09 19:35 ` Eric W. Biederman
2015-03-09 20:21 ` Eric Dumazet
2015-03-09 20:09 ` [PATCH net-next 0/6] tcp_metrics: Network namespace bloat reduction David Miller
2015-03-09 20:21 ` Eric W. Biederman
2015-03-11 16:33 ` [PATCH net-next 0/8] tcp_metrics: Network namespace bloat reduction v2 Eric W. Biederman
2015-03-11 16:35 ` [PATCH net-next 1/8] net: Kill hold_net release_net Eric W. Biederman
2015-03-11 16:55 ` Eric Dumazet
2015-03-11 17:34 ` Eric W. Biederman
2015-03-11 17:07 ` Eric Dumazet
2015-03-11 17:08 ` Eric Dumazet
2015-03-11 17:10 ` Eric Dumazet
2015-03-11 17:36 ` Eric W. Biederman
2015-03-11 16:36 ` [PATCH net-next 2/8] net: Introduce possible_net_t Eric W. Biederman
2015-03-11 16:38 ` [PATCH net-next 3/8] tcp_metrics: panic when tcp_metrics_init fails Eric W. Biederman
2015-03-11 16:38 ` [PATCH net-next 4/8] tcp_metrics: Mix the network namespace into the hash function Eric W. Biederman
2015-03-11 16:40 ` [PATCH net-next 5/8] tcp_metrics: Add a field tcpm_net and verify it matches on lookup Eric W. Biederman
2015-03-11 16:41 ` [PATCH net-next 6/8] tcp_metrics: Remove the unused return code from tcp_metrics_flush_all Eric W. Biederman
2015-03-11 16:43 ` [PATCH net-next 7/8] tcp_metrics: Rewrite tcp_metrics_flush_all Eric W. Biederman
2015-03-11 16:43 ` [PATCH net-next 8/8] tcp_metrics: Use a single hash table for all network namespaces Eric W. Biederman
2015-03-13 5:04 ` [PATCH net-next 0/6] tcp_metrics: Network namespace bloat reduction v3 Eric W. Biederman
2015-03-13 5:04 ` [PATCH net-next 1/6] tcp_metrics: panic when tcp_metrics_init fails Eric W. Biederman
2015-03-13 5:05 ` [PATCH net-next 2/6] tcp_metrics: Mix the network namespace into the hash function Eric W. Biederman
2015-03-13 5:05 ` [PATCH net-next 3/6] tcp_metrics: Add a field tcpm_net and verify it matches on lookup Eric W. Biederman
2015-03-13 5:06 ` [PATCH net-next 4/6] tcp_metrics: Remove the unused return code from tcp_metrics_flush_all Eric W. Biederman
2015-03-13 5:07 ` [PATCH net-next 5/6] tcp_metrics: Rewrite tcp_metrics_flush_all Eric W. Biederman
2015-03-13 5:07 ` [PATCH net-next 6/6] tcp_metrics: Use a single hash table for all network namespaces Eric W. Biederman
2015-03-13 5:57 ` [PATCH net-next 0/6] tcp_metrics: Network namespace bloat reduction v3 David Miller
2015-02-11 17:09 ` network namespace bloat Nicolas Dichtel
2015-02-10 0:53 ` [RFC PATCH 00/29] net: VRF support Thomas Graf
2015-02-10 20:54 ` David Ahern [this message]
2016-05-25 16:04 ` Chenna
2016-05-25 19:04 ` David Ahern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54DA6FED.5020907@gmail.com \
--to=dsahern@gmail.com \
--cc=ebiederm@xmission.com \
--cc=netdev@vger.kernel.org \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).