netdev.vger.kernel.org archive mirror
From: Ido Schimmel <idosch@idosch.org>
To: Ted Chen <znscnchen@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, andrew+netdev@lunn.ch, netdev@vger.kernel.org
Subject: Re: [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI
Date: Sun, 2 Feb 2025 15:40:35 +0200
Message-ID: <Z59109CGe8WmZVsJ@shredder>
In-Reply-To: <20250201113207.107798-1-znscnchen@gmail.com>

On Sat, Feb 01, 2025 at 07:32:07PM +0800, Ted Chen wrote:
> This RFC series proposes an implementation to enable the configuration of vxlan
> devices in a Hub-Spoke Network, allowing multiple vxlan devices to share the
> same VNI while being associated with different remote IPs under the same UDP
> port.
> 
> == Use case ==
> In a Hub-Spoke Network, there is a central VTEP acting as the gateway, along
> with multiple outer VTEPs. Each outer VTEP communicates exclusively with the
> central VTEP and has no direct connection to other outer VTEPs. As a result,
> data exchanged between outer VTEPs must traverse the central VTEP. This design
> enhances security and enables centralized auditing and monitoring at the
> central VTEP.
> 
> == Existing methods ==
> Currently, there are three methods to implement the use case.
> 
> Method 1:
>          The central VTEP establishes a separate vxlan tunnel with each outer
>          VTEP, creating a vxlan device with a different VNI for each tunnel.
>          All vxlan devices are then added to the same Linux bridge to enable
>          forwarding.
> 
>          Drawbacks: Complex configuration.
>          Each tenant requires multiple VNIs.

This looks like the most straightforward option to me.

Why do you view it as complex? Why are multiple VNIs per tenant a
problem when we have 16M of them?

> 
> Method 2:
>         The central VTEP creates a single vxlan device using the same VNI,
>         without configuring a remote IP. The IP addresses of all outer VTEPs
>         are stored in the fdb. To enable forwarding, the vxlan device is added
>         to a Linux bridge with hairpin mode enabled.
> 
>         Drawbacks: unnecessary overhead or network anomalies
>         The hairpin mode may broadcast packets to all outer VTEPs, causing the
>         source outer VTEP receiving packets it originally sent to the central
>         VTEP. If the packet from the source outer VTEP is a broadcast packet,
>         the broadcasting back of the packet can cause network anomalies.
> 
> Method 3:
>         The central VTEP uses the same VNI but different UDP ports to create a
>         vxlan device for each outer VTEP, each tunneling to its corresponding
>         outer VTEP. All the vxlan devices in the central VTEP are then added to
>         the same Linux bridge to enable forwarding.
> 
>         Drawbacks: complex configuration and potential security issues.
>         Multiple UDP ports are required.
> 
> == Proposed implementation ==
> In the central VTEP, each tenant only requires a single VNI, and all tenants
> share the same UDP port. This can avoid the drawbacks of the above three
> methods.

This method also has drawbacks. It breaks existing behavior (see my
comment on patch #1) and it also bloats the VXLAN receive path.

I want to suggest an alternative which allows you to keep the existing
topology (same VNI), but without kernel changes. The configuration of
the outer VTEPs remains the same. The steps below are for the central
VTEP.

First, create a VXLAN device in "external" mode. It will consume all the
VNIs in a namespace, but you can limit it with the "vnifilter" keyword,
if needed:

# ip -n ns_c link add name vx0 type vxlan dstport 4789 nolearning external
# tc -n ns_c qdisc add dev vx0 clsact
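
For illustration, the "vnifilter" variant could look roughly like the
sketch below. It assumes a kernel and iproute2 recent enough to support
VXLAN VNI filtering. The run() helper and the DRY_RUN variable are
hypothetical and only there so the commands are printed instead of
executed by default:

```shell
#!/bin/sh
# Sketch: limit the external VXLAN device to VNI 42 with "vnifilter".
# DRY_RUN and run() are illustrative helpers, not part of iproute2.
DRY_RUN=${DRY_RUN:-1}

run() {
	if [ "$DRY_RUN" = 1 ]; then
		# Print the command instead of running it.
		echo "$*"
	else
		"$@"
	fi
}

vx0_with_vnifilter() {
	# Same device as above, but with VNI filtering enabled.
	run ip -n ns_c link add name vx0 type vxlan dstport 4789 nolearning external vnifilter
	# Explicitly admit VNI 42 on vx0.
	run bridge -n ns_c vni add dev vx0 vni 42
	run tc -n ns_c qdisc add dev vx0 clsact
}

vx0_with_vnifilter
```

Running it with DRY_RUN=0 (as root, with ns_c already created) would
apply the commands instead of printing them.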

Then, for each outer VTEP, create a dummy device and enslave it to the
bridge. Taking outer VTEP1 as an example:

# ip -n ns_c link add name dummy_vtep1 up master br0 type dummy
# tc -n ns_c qdisc add dev dummy_vtep1 clsact

In order to demultiplex incoming VXLAN packets to the appropriate bridge
member, use an ingress tc filter on the VXLAN device that matches on the
encapsulating source IP (you can't do it without the "external" keyword) and
redirects the traffic to the corresponding bridge member:

# tc -n ns_c filter add dev vx0 ingress pref 1 proto all \
	flower enc_key_id 42 enc_src_ip 10.0.0.1 \
	action mirred ingress redirect dev dummy_vtep1

(Add the filters for the other VTEPs with the same "pref 1" to avoid
unnecessary lookups.)

For Tx, on each bridge member, configure an egress tc filter that
attaches tunnel metadata for the matching outer VTEP and redirects to
the VXLAN device:

# tc -n ns_c filter add dev dummy_vtep1 egress pref 1 proto all \
	matchall \
	action tunnel_key set src_ip 10.0.0.3 dst_ip 10.0.0.1 id 42 dst_port 4789 \
	action mirred egress redirect dev vx0

The end result should be that the bridge forwards known unicast traffic
to the appropriate outer VTEP and floods BUM traffic to all the outer
VTEPs but the one from which the traffic was received.
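
Putting the per-VTEP steps together for more than one outer VTEP, a
rough sketch could look like the script below. The dummy_<name> naming,
the VNI and LOCAL_IP variables, and the run() / add_outer_vtep()
helpers are assumptions made for illustration. By default it only
prints the commands:

```shell
#!/bin/sh
# Sketch: generate the per-VTEP commands for several outer VTEPs.
# All helper and variable names here are illustrative assumptions.
VNI=42
LOCAL_IP=10.0.0.3
DRY_RUN=${DRY_RUN:-1}

run() {
	if [ "$DRY_RUN" = 1 ]; then
		# Print the command instead of running it.
		echo "$*"
	else
		"$@"
	fi
}

# Usage: add_outer_vtep NAME REMOTE_IP
add_outer_vtep() {
	name=$1
	remote=$2
	dev="dummy_$name"

	# Bridge member that stands in for this outer VTEP.
	run ip -n ns_c link add name "$dev" up master br0 type dummy
	run tc -n ns_c qdisc add dev "$dev" clsact

	# Rx: demultiplex on VNI and encapsulating source IP.
	run tc -n ns_c filter add dev vx0 ingress pref 1 proto all \
		flower enc_key_id "$VNI" enc_src_ip "$remote" \
		action mirred ingress redirect dev "$dev"

	# Tx: attach tunnel metadata and hand the packet to vx0.
	run tc -n ns_c filter add dev "$dev" egress pref 1 proto all \
		matchall \
		action tunnel_key set src_ip "$LOCAL_IP" dst_ip "$remote" id "$VNI" dst_port 4789 \
		action mirred egress redirect dev vx0
}

add_outer_vtep vtep1 10.0.0.1
add_outer_vtep vtep2 10.0.0.2
```

It assumes that br0 and vx0 (with its clsact qdisc) already exist in
ns_c. With DRY_RUN=0 the commands are executed rather than printed.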

> 
> As in below example,
> - a tunnel is established between vxlan42.1 in the central VTEP and vxlan42 in
>   the outer VTEP1:
>   ip link add vxlan42.1 type vxlan id 42 \
>           local 10.0.0.3 remote 10.0.0.1 dstport 4789
> 
> - a tunnel is established between vxlan42.2 in the central VTEP and vxlan42 in
>   the outer VTEP2:
>   ip link add vxlan42.2 type vxlan id 42 \
>   		  local 10.0.0.3 remote 10.0.0.2 dstport 4789
> 
> 
>     ┌────────────────────────────────────────────┐
>     │       ┌─────────────────────────┐  central │
>     │       │          br0            │    VTEP  │
>     │       └─┬────────────────────┬──┘          │
>     │   ┌─────┴───────┐      ┌─────┴───────┐     │          
>     │   │ vxlan42.1   │      │  vxlan42.2  │     │
>     │   └─────────────┘      └─────────────┘     │  
>     └───────────────────┬─┬──────────────────────┘
>                         │ │ eth0 10.0.0.3:4789
>                         │ │            
>                         │ │            
>        ┌────────────────┘ └───────────────┐
>        │eth0 10.0.0.1:4789                │eth0 10.0.0.2:4789
>  ┌─────┴───────┐                    ┌─────┴───────┐
>  │outer VTEP1  │                    │outer VTEP2  │
>  │     vxlan42 │                    │     vxlan42 │
>  └─────────────┘                    └─────────────┘
> 
> 
> == Test scenario ==
> ip netns add ns_1
> ip link add veth1 type veth peer name veth1-peer
> ip link set veth1 netns ns_1
> ip netns exec ns_1 ip addr add 10.0.1.1/24 dev veth1
> ip netns exec ns_1 ip link set veth1 up
> ip netns exec ns_1 ip link add vxlan42 type vxlan id 42 \
>                    remote 10.0.1.3 dstport 4789
> ip netns exec ns_1 ip addr add 192.168.0.1/24 dev vxlan42
> ip netns exec ns_1 ip link set up dev vxlan42
> 
> ip netns add ns_2
> ip link add veth2 type veth peer name veth2-peer
> ip link set veth2 netns ns_2
> ip netns exec ns_2 ip addr add 10.0.1.2/24 dev veth2
> ip netns exec ns_2 ip link set veth2 up
> ip netns exec ns_2 ip link add vxlan42 type vxlan id 42 \
>                    remote 10.0.1.3 dstport 4789
> ip netns exec ns_2 ip addr add 192.168.0.2/24 dev vxlan42
> ip netns exec ns_2 ip link set up dev vxlan42
> 
> ip netns add ns_c
> ip link add veth3 type veth peer name veth3-peer
> ip link set veth3 netns ns_c
> ip netns exec ns_c ip addr add 10.0.1.3/24 dev veth3
> ip netns exec ns_c ip link set veth3 up
> ip netns exec ns_c ip link add vxlan42.1 type vxlan id 42 \
>                    local 10.0.1.3 remote 10.0.1.1 dstport 4789
> ip netns exec ns_c ip link add vxlan42.2 type vxlan id 42 \
>                    local 10.0.1.3 remote 10.0.1.2 dstport 4789
> ip netns exec ns_c ip link set up dev vxlan42.1
> ip netns exec ns_c ip link set up dev vxlan42.2
> ip netns exec ns_c ip link add name br0 type bridge
> ip netns exec ns_c ip link set br0 up
> ip netns exec ns_c ip link set vxlan42.1 master br0
> ip netns exec ns_c ip link set vxlan42.2 master br0
> 
> ip link add name br1 type bridge
> ip link set br1 up
> ip link set veth1-peer up
> ip link set veth2-peer up
> ip link set veth3-peer up
> ip link set veth1-peer master br1
> ip link set veth2-peer master br1
> ip link set veth3-peer master br1
> 
> ip netns exec ns_1 ping 192.168.0.2 -I 192.168.0.1
> 
> Ted Chen (3):
>   vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and
>     remote_ip
>   vxlan: Do not treat vxlan dev as used when unicast remote_ip
>     mismatches
>   vxlan: vxlan_rcv(): Update comment to inlucde ipv6
> 
>  drivers/net/vxlan/vxlan_core.c | 38 +++++++++++++++++++++++++++-------
>  1 file changed, 31 insertions(+), 7 deletions(-)
> 
> -- 
> 2.39.2
> 
> 
