* [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI
@ 2025-02-01 11:32 Ted Chen
2025-02-01 11:34 ` [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip Ted Chen
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Ted Chen @ 2025-02-01 11:32 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, andrew+netdev; +Cc: netdev, Ted Chen
This RFC series proposes an implementation to enable the configuration of vxlan
devices in a Hub-Spoke Network, allowing multiple vxlan devices to share the
same VNI while being associated with different remote IPs under the same UDP
port.
== Use case ==
In a Hub-Spoke Network, there is a central VTEP acting as the gateway, along
with multiple outer VTEPs. Each outer VTEP communicates exclusively with the
central VTEP and has no direct connection to other outer VTEPs. As a result,
data exchanged between outer VTEPs must traverse the central VTEP. This design
enhances security and enables centralized auditing and monitoring at the
central VTEP.
== Existing methods ==
Currently, there are three methods to implement the use case.
Method 1:
The central VTEP establishes a separate vxlan tunnel with each outer
VTEP, creating a vxlan device with a different VNI for each tunnel.
All vxlan devices are then added to the same Linux bridge to enable
forwarding.
Drawbacks: Complex configuration.
Each tenant requires multiple VNIs.
Method 2:
The central VTEP creates a single vxlan device using the same VNI,
without configuring a remote IP. The IP addresses of all outer VTEPs
are stored in the fdb. To enable forwarding, the vxlan device is added
to a Linux bridge with hairpin mode enabled.
Drawbacks: unnecessary overhead or network anomalies
The hairpin mode may broadcast packets to all outer VTEPs, causing the
source outer VTEP to receive packets it originally sent to the central
VTEP. If the packet from the source outer VTEP is a broadcast packet,
sending it back to its source can cause network anomalies.
Method 3:
The central VTEP uses the same VNI but different UDP ports to create a
vxlan device for each outer VTEP, each tunneling to its corresponding
outer VTEP. All the vxlan devices in the central VTEP are then added to
the same Linux bridge to enable forwarding.
Drawbacks: complex configuration and potential security issues.
Multiple UDP ports are required.
== Proposed implementation ==
In the central VTEP, each tenant only requires a single VNI, and all tenants
share the same UDP port. This can avoid the drawbacks of the above three
methods.
As in the example below,
- a tunnel is established between vxlan42.1 in the central VTEP and vxlan42 in
the outer VTEP1:
ip link add vxlan42.1 type vxlan id 42 \
local 10.0.0.3 remote 10.0.0.1 dstport 4789
- a tunnel is established between vxlan42.2 in the central VTEP and vxlan42 in
the outer VTEP2:
ip link add vxlan42.2 type vxlan id 42 \
local 10.0.0.3 remote 10.0.0.2 dstport 4789
    ┌────────────────────────────────────────────┐
    │       ┌─────────────────────────┐  central │
    │       │           br0           │   VTEP   │
    │       └─┬────────────────────┬──┘          │
    │   ┌─────┴───────┐      ┌─────┴───────┐     │
    │   │  vxlan42.1  │      │  vxlan42.2  │     │
    │   └─────────────┘      └─────────────┘     │
    └───────────────────┬─┬──────────────────────┘
                        │ │ eth0 10.0.0.3:4789
                        │ │
                        │ │
       ┌────────────────┘ └───────────────┐
       │eth0 10.0.0.1:4789                │eth0 10.0.0.2:4789
 ┌─────┴───────┐                    ┌─────┴───────┐
 │ outer VTEP1 │                    │ outer VTEP2 │
 │   vxlan42   │                    │   vxlan42   │
 └─────────────┘                    └─────────────┘
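The creation-time rule that makes the two commands above coexist (patch 2/3 in
this series) can be sketched in userspace C. This is only a model under stated
assumptions: struct model_cfg, model_is_multicast() and model_vni_in_use() are
hypothetical names, addresses are IPv4 in host byte order, and the flag
comparisons done by the real vxlan_vni_in_use() are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields of struct vxlan_config
 * that matter here. remote_ip == 0 means "no remote configured". */
struct model_cfg {
    uint32_t vni;
    uint16_t dst_port;
    uint32_t remote_ip; /* IPv4, host byte order */
};

/* 224.0.0.0/4 is the IPv4 multicast range. */
static bool model_is_multicast(uint32_t ip)
{
    return (ip & 0xf0000000u) == 0xe0000000u;
}

/* Returns true when conf collides with an already configured device.
 * A new device with a unicast remote only collides with devices that
 * share VNI, port and the same remote. */
static bool model_vni_in_use(const struct model_cfg *existing, size_t n,
                             const struct model_cfg *conf)
{
    assert(existing != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (existing[i].vni != conf->vni)
            continue;
        if (conf->remote_ip != 0 &&
            !model_is_multicast(conf->remote_ip) &&
            existing[i].remote_ip != conf->remote_ip)
            continue;
        if (existing[i].dst_port != conf->dst_port)
            continue;
        return true;
    }
    return false;
}
```

With vxlan42.1 (id 42, remote 10.0.0.1, port 4789) already present, a second
device with remote 10.0.0.2 is accepted by this model, while a device without
a remote, or with the same remote, is still reported as a conflict.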
== Test scenario ==
ip netns add ns_1
ip link add veth1 type veth peer name veth1-peer
ip link set veth1 netns ns_1
ip netns exec ns_1 ip addr add 10.0.1.1/24 dev veth1
ip netns exec ns_1 ip link set veth1 up
ip netns exec ns_1 ip link add vxlan42 type vxlan id 42 \
remote 10.0.1.3 dstport 4789
ip netns exec ns_1 ip addr add 192.168.0.1/24 dev vxlan42
ip netns exec ns_1 ip link set up dev vxlan42
ip netns add ns_2
ip link add veth2 type veth peer name veth2-peer
ip link set veth2 netns ns_2
ip netns exec ns_2 ip addr add 10.0.1.2/24 dev veth2
ip netns exec ns_2 ip link set veth2 up
ip netns exec ns_2 ip link add vxlan42 type vxlan id 42 \
remote 10.0.1.3 dstport 4789
ip netns exec ns_2 ip addr add 192.168.0.2/24 dev vxlan42
ip netns exec ns_2 ip link set up dev vxlan42
ip netns add ns_c
ip link add veth3 type veth peer name veth3-peer
ip link set veth3 netns ns_c
ip netns exec ns_c ip addr add 10.0.1.3/24 dev veth3
ip netns exec ns_c ip link set veth3 up
ip netns exec ns_c ip link add vxlan42.1 type vxlan id 42 \
local 10.0.1.3 remote 10.0.1.1 dstport 4789
ip netns exec ns_c ip link add vxlan42.2 type vxlan id 42 \
local 10.0.1.3 remote 10.0.1.2 dstport 4789
ip netns exec ns_c ip link set up dev vxlan42.1
ip netns exec ns_c ip link set up dev vxlan42.2
ip netns exec ns_c ip link add name br0 type bridge
ip netns exec ns_c ip link set br0 up
ip netns exec ns_c ip link set vxlan42.1 master br0
ip netns exec ns_c ip link set vxlan42.2 master br0
ip link add name br1 type bridge
ip link set br1 up
ip link set veth1-peer up
ip link set veth2-peer up
ip link set veth3-peer up
ip link set veth1-peer master br1
ip link set veth2-peer master br1
ip link set veth3-peer master br1
ip netns exec ns_1 ping 192.168.0.2 -I 192.168.0.1
Ted Chen (3):
vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and
remote_ip
vxlan: Do not treat vxlan dev as used when unicast remote_ip
mismatches
vxlan: vxlan_rcv(): Update comment to inlucde ipv6
drivers/net/vxlan/vxlan_core.c | 38 +++++++++++++++++++++++++++-------
1 file changed, 31 insertions(+), 7 deletions(-)
--
2.39.2
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip
2025-02-01 11:32 [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ted Chen
@ 2025-02-01 11:34 ` Ted Chen
2025-02-02 11:56 ` Ido Schimmel
2025-02-01 11:34 ` [PATCH RFC net-next 2/3] vxlan: Do not treat vxlan dev as used when unicast remote_ip mismatches Ted Chen
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Ted Chen @ 2025-02-01 11:34 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, andrew+netdev; +Cc: netdev, Ted Chen
vxlan_vs_find_vni() currently searches the vni hash table in a vs and
returns a vxlan_dev associated with the specified "vni". While this works
when the remote_ips are stored in the vxlan fdb, it fails to handle the
case where the remote_ip is just configured in the vxlan device outside of
the vxlan fdb, because multiple vxlan devices with different remote_ip may
share a single vni when the remote_ip is configured in the vxlan device
(i.e., not stored in the vxlan fdb). In that case, additionally check the
remote_ip to identify the vxlan_dev more precisely.
Signed-off-by: Ted Chen <znscnchen@gmail.com>
---
drivers/net/vxlan/vxlan_core.c | 32 ++++++++++++++++++++++++++------
1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 05c10acb2a57..3ca74a97c44f 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -94,8 +94,10 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
int ifindex, __be32 vni,
+ const struct sk_buff *skb,
struct vxlan_vni_node **vninode)
{
+ union vxlan_addr saddr;
struct vxlan_vni_node *vnode;
struct vxlan_dev_node *node;
@@ -116,14 +118,31 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
continue;
}
- if (IS_ENABLED(CONFIG_IPV6)) {
- const struct vxlan_config *cfg = &node->vxlan->cfg;
+ const struct vxlan_config *cfg = &node->vxlan->cfg;
+ if (IS_ENABLED(CONFIG_IPV6)) {
if ((cfg->flags & VXLAN_F_IPV6_LINKLOCAL) &&
cfg->remote_ifindex != ifindex)
continue;
}
+ if (vni && !vxlan_addr_any(&cfg->remote_ip) &&
+ !vxlan_addr_multicast(&cfg->remote_ip)) {
+ /* Get address from the outer IP header */
+ if (vxlan_get_sk_family(vs) == AF_INET) {
+ saddr.sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
+ saddr.sa.sa_family = AF_INET;
+#if IS_ENABLED(CONFIG_IPV6)
+ } else {
+ saddr.sin6.sin6_addr = ipv6_hdr(skb)->saddr;
+ saddr.sa.sa_family = AF_INET6;
+#endif
+ }
+
+ if (!vxlan_addr_equal(&cfg->remote_ip, &saddr))
+ continue;
+ }
+
if (vninode)
*vninode = vnode;
return node->vxlan;
@@ -134,6 +153,7 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
/* Look up VNI in a per net namespace table */
static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
+ const struct sk_buff *skb,
__be32 vni, sa_family_t family,
__be16 port, u32 flags)
{
@@ -143,7 +163,7 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
if (!vs)
return NULL;
- return vxlan_vs_find_vni(vs, ifindex, vni, NULL);
+ return vxlan_vs_find_vni(vs, ifindex, vni, skb, NULL);
}
/* Fill in neighbour message in skbuff. */
@@ -1701,7 +1721,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
vni = vxlan_vni(vh->vx_vni);
- vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, &vninode);
+ vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, skb, &vninode);
if (!vxlan) {
reason = SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND;
goto drop;
@@ -1855,7 +1875,7 @@ static int vxlan_err_lookup(struct sock *sk, struct sk_buff *skb)
return -ENOENT;
vni = vxlan_vni(hdr->vx_vni);
- vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, NULL);
+ vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, skb, NULL);
if (!vxlan)
return -ENOENT;
@@ -2330,7 +2350,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
struct vxlan_dev *dst_vxlan;
dst_release(dst);
- dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni,
+ dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, skb, vni,
addr_family, dst_port,
vxlan->cfg.flags);
if (!dst_vxlan) {
--
2.39.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH RFC net-next 2/3] vxlan: Do not treat vxlan dev as used when unicast remote_ip mismatches
2025-02-01 11:32 [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ted Chen
2025-02-01 11:34 ` [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip Ted Chen
@ 2025-02-01 11:34 ` Ted Chen
2025-02-01 11:34 ` [PATCH RFC net-next 3/3] vxlan: vxlan_rcv(): Update comment to inlucde ipv6 Ted Chen
2025-02-02 13:40 ` [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ido Schimmel
3 siblings, 0 replies; 13+ messages in thread
From: Ted Chen @ 2025-02-01 11:34 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, andrew+netdev; +Cc: netdev, Ted Chen
Do not treat a vxlan_dev as used if the vxlan to be configured shares the
same vni as an existing vxlan_dev but has a different unicast remote_ip.
This enables multiple vxlan devices with distinct unicast remote_ips to be
bound to a single vni.
Signed-off-by: Ted Chen <znscnchen@gmail.com>
---
drivers/net/vxlan/vxlan_core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 3ca74a97c44f..5ef40ac816cc 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -3723,6 +3723,10 @@ int vxlan_vni_in_use(struct net *src_net, struct vxlan_dev *vxlan,
} else if (tmp->cfg.vni != vni) {
continue;
}
+ if (!vxlan_addr_any(&conf->remote_ip) &&
+ !vxlan_addr_multicast(&conf->remote_ip) &&
+ !vxlan_addr_equal(&tmp->cfg.remote_ip, &conf->remote_ip))
+ continue;
if (tmp->cfg.dst_port != conf->dst_port)
continue;
if ((tmp->cfg.flags & (VXLAN_F_RCV_FLAGS | VXLAN_F_IPV6)) !=
--
2.39.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH RFC net-next 3/3] vxlan: vxlan_rcv(): Update comment to inlucde ipv6
2025-02-01 11:32 [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ted Chen
2025-02-01 11:34 ` [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip Ted Chen
2025-02-01 11:34 ` [PATCH RFC net-next 2/3] vxlan: Do not treat vxlan dev as used when unicast remote_ip mismatches Ted Chen
@ 2025-02-01 11:34 ` Ted Chen
2025-02-02 12:09 ` Ido Schimmel
2025-02-02 13:40 ` [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ido Schimmel
3 siblings, 1 reply; 13+ messages in thread
From: Ted Chen @ 2025-02-01 11:34 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, andrew+netdev; +Cc: netdev, Ted Chen
Update the comment to indicate that both ipv4/udp.c and ipv6/udp.c invoke
vxlan_rcv() to process packets.
Signed-off-by: Ted Chen <znscnchen@gmail.com>
---
drivers/net/vxlan/vxlan_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 5ef40ac816cc..8bdf91d1fdfe 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -1684,7 +1684,7 @@ static bool vxlan_ecn_decapsulate(struct vxlan_sock *vs, void *oiph,
return err <= 1;
}
-/* Callback from net/ipv4/udp.c to receive packets */
+/* Callback from net/ipv{4,6}/udp.c to receive packets */
static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
{
struct vxlan_vni_node *vninode = NULL;
--
2.39.2
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip
2025-02-01 11:34 ` [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip Ted Chen
@ 2025-02-02 11:56 ` Ido Schimmel
2025-02-04 13:09 ` Ted Chen
0 siblings, 1 reply; 13+ messages in thread
From: Ido Schimmel @ 2025-02-02 11:56 UTC (permalink / raw)
To: Ted Chen; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Sat, Feb 01, 2025 at 07:34:00PM +0800, Ted Chen wrote:
> vxlan_vs_find_vni() currently searches the vni hash table in a vs and
> returns a vxlan_dev associated with the specified "vni". While this works
> when the remote_ips are stored in the vxlan fdb, it fails to handle the
> case where the remote_ip is just configured in the vxlan device outside of
> the vxlan fdb, because multiple vxlan devices with different remote_ip may
> share a single vni when the remote_ip is configured in the vxlan device
> (i.e., not stored in the vxlan fdb). In that case, further check of
> remote_ip to identify vxlan_dev more precisely.
>
> Signed-off-by: Ted Chen <znscnchen@gmail.com>
> ---
> drivers/net/vxlan/vxlan_core.c | 32 ++++++++++++++++++++++++++------
> 1 file changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
> index 05c10acb2a57..3ca74a97c44f 100644
> --- a/drivers/net/vxlan/vxlan_core.c
> +++ b/drivers/net/vxlan/vxlan_core.c
> @@ -94,8 +94,10 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
>
> static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
> int ifindex, __be32 vni,
> + const struct sk_buff *skb,
> struct vxlan_vni_node **vninode)
> {
> + union vxlan_addr saddr;
> struct vxlan_vni_node *vnode;
> struct vxlan_dev_node *node;
>
> @@ -116,14 +118,31 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
> continue;
> }
>
> - if (IS_ENABLED(CONFIG_IPV6)) {
> - const struct vxlan_config *cfg = &node->vxlan->cfg;
> + const struct vxlan_config *cfg = &node->vxlan->cfg;
>
> + if (IS_ENABLED(CONFIG_IPV6)) {
> if ((cfg->flags & VXLAN_F_IPV6_LINKLOCAL) &&
> cfg->remote_ifindex != ifindex)
> continue;
> }
>
> + if (vni && !vxlan_addr_any(&cfg->remote_ip) &&
> + !vxlan_addr_multicast(&cfg->remote_ip)) {
> + /* Get address from the outer IP header */
> + if (vxlan_get_sk_family(vs) == AF_INET) {
> + saddr.sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
> + saddr.sa.sa_family = AF_INET;
> +#if IS_ENABLED(CONFIG_IPV6)
> + } else {
> + saddr.sin6.sin6_addr = ipv6_hdr(skb)->saddr;
> + saddr.sa.sa_family = AF_INET6;
> +#endif
> + }
> +
> + if (!vxlan_addr_equal(&cfg->remote_ip, &saddr))
> + continue;
This breaks existing behavior. Before this patch, a VXLAN device with a
remote address could receive traffic from any VTEP (in the same
broadcast domain).
I think this patch misinterprets the "remote" keyword as P2P when it is
not the case. It is merely the VTEP to which packets are sent when no
other VTEP was found in the FDB. A VXLAN device that was configured with
the "remote" keyword can still send packets to other VTEPs and it should
therefore be able to receive packets from them.
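The behavior change described above can be illustrated with a small userspace
model. It is a sketch only: struct model_dev and both lookup helpers are
hypothetical names, addresses are IPv4 in host byte order, and the hash table,
ifindex and IPv6 handling of the real vxlan_vs_find_vni() are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a VXLAN device. */
struct model_dev {
    uint32_t vni;
    uint32_t remote_ip; /* IPv4, host byte order; 0 = no "remote" */
};

static bool model_is_multicast(uint32_t ip)
{
    return (ip & 0xf0000000u) == 0xe0000000u;
}

/* Existing behavior: the VNI alone selects the device. */
static const struct model_dev *
model_find_by_vni(const struct model_dev *devs, size_t n, uint32_t vni)
{
    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (devs[i].vni == vni)
            return &devs[i];
    return NULL;
}

/* Behavior with the patch: a unicast "remote" additionally filters on
 * the outer source address of the received packet. */
static const struct model_dev *
model_find_by_vni_and_saddr(const struct model_dev *devs, size_t n,
                            uint32_t vni, uint32_t saddr)
{
    assert(devs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (devs[i].vni != vni)
            continue;
        if (vni != 0 && devs[i].remote_ip != 0 &&
            !model_is_multicast(devs[i].remote_ip) &&
            devs[i].remote_ip != saddr)
            continue;
        return &devs[i];
    }
    return NULL;
}
```

For a single device with VNI 42 and remote 10.0.0.2, a packet from 10.0.0.3 is
delivered by the first helper and rejected by the second, which is the
regression being pointed out.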
> + }
> +
> if (vninode)
> *vninode = vnode;
> return node->vxlan;
> @@ -134,6 +153,7 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
>
> /* Look up VNI in a per net namespace table */
> static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
> + const struct sk_buff *skb,
> __be32 vni, sa_family_t family,
> __be16 port, u32 flags)
> {
> @@ -143,7 +163,7 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
> if (!vs)
> return NULL;
>
> - return vxlan_vs_find_vni(vs, ifindex, vni, NULL);
> + return vxlan_vs_find_vni(vs, ifindex, vni, skb, NULL);
> }
>
> /* Fill in neighbour message in skbuff. */
> @@ -1701,7 +1721,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
>
> vni = vxlan_vni(vh->vx_vni);
>
> - vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, &vninode);
> + vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, skb, &vninode);
> if (!vxlan) {
> reason = SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND;
> goto drop;
> @@ -1855,7 +1875,7 @@ static int vxlan_err_lookup(struct sock *sk, struct sk_buff *skb)
> return -ENOENT;
>
> vni = vxlan_vni(hdr->vx_vni);
> - vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, NULL);
> + vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, skb, NULL);
> if (!vxlan)
> return -ENOENT;
>
> @@ -2330,7 +2350,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
> struct vxlan_dev *dst_vxlan;
>
> dst_release(dst);
> - dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni,
> + dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, skb, vni,
> addr_family, dst_port,
> vxlan->cfg.flags);
> if (!dst_vxlan) {
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC net-next 3/3] vxlan: vxlan_rcv(): Update comment to inlucde ipv6
2025-02-01 11:34 ` [PATCH RFC net-next 3/3] vxlan: vxlan_rcv(): Update comment to inlucde ipv6 Ted Chen
@ 2025-02-02 12:09 ` Ido Schimmel
2025-02-04 13:13 ` Ted Chen
0 siblings, 1 reply; 13+ messages in thread
From: Ido Schimmel @ 2025-02-02 12:09 UTC (permalink / raw)
To: Ted Chen; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Sat, Feb 01, 2025 at 07:34:22PM +0800, Ted Chen wrote:
> Update the comment to indicate that both ipv4/udp.c and ipv6/udp.c invoke
Nit: net/ipv4/udp.c and net/ipv6/udp.c
> vxlan_rcv() to process packets.
>
> Signed-off-by: Ted Chen <znscnchen@gmail.com>
> ---
> drivers/net/vxlan/vxlan_core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
> index 5ef40ac816cc..8bdf91d1fdfe 100644
> --- a/drivers/net/vxlan/vxlan_core.c
> +++ b/drivers/net/vxlan/vxlan_core.c
> @@ -1684,7 +1684,7 @@ static bool vxlan_ecn_decapsulate(struct vxlan_sock *vs, void *oiph,
> return err <= 1;
> }
>
> -/* Callback from net/ipv4/udp.c to receive packets */
> +/* Callback from net/ipv{4,6}/udp.c to receive packets */
Maybe just remove the comment? I don't see how anyone can find it
useful.
Regardless, please submit this patch separately as it's not related to
the other patches in the series.
> static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
> {
> struct vxlan_vni_node *vninode = NULL;
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI
2025-02-01 11:32 [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ted Chen
` (2 preceding siblings ...)
2025-02-01 11:34 ` [PATCH RFC net-next 3/3] vxlan: vxlan_rcv(): Update comment to inlucde ipv6 Ted Chen
@ 2025-02-02 13:40 ` Ido Schimmel
2025-02-04 13:27 ` Ted Chen
3 siblings, 1 reply; 13+ messages in thread
From: Ido Schimmel @ 2025-02-02 13:40 UTC (permalink / raw)
To: Ted Chen; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Sat, Feb 01, 2025 at 07:32:07PM +0800, Ted Chen wrote:
> This RFC series proposes an implementation to enable the configuration of vxlan
> devices in a Hub-Spoke Network, allowing multiple vxlan devices to share the
> same VNI while being associated with different remote IPs under the same UDP
> port.
>
> == Use case ==
> In a Hub-Spoke Network, there is a central VTEP acting as the gateway, along
> with multiple outer VTEPs. Each outer VTEP communicates exclusively with the
> central VTEP and has no direct connection to other outer VTEPs. As a result,
> data exchanged between outer VTEPs must traverse the central VTEP. This design
> enhances security and enables centralized auditing and monitoring at the
> central VTEP.
>
> == Existing methods ==
> Currently, there are three methods to implement the use case.
>
> Method 1:
> The central VTEP establishes a separate vxlan tunnel with each outer
> VTEP, creating a vxlan device with a different VNI for each tunnel.
> All vxlan devices are then added to the same Linux bridge to enable
> forwarding.
>
> Drawbacks: Complex configuration.
> Each tenant requires multiple VNIs.
This looks like the most straightforward option to me.
Why do you view it as complex? Why are multiple VNIs per tenant a
problem when we have 16M of them?
>
> Method 2:
> The central VTEP creates a single vxlan device using the same VNI,
> without configuring a remote IP. The IP addresses of all outer VTEPs
> are stored in the fdb. To enable forwarding, the vxlan device is added
> to a Linux bridge with hairpin mode enabled.
>
> Drawbacks: unnecessary overhead or network anomalies
> The hairpin mode may broadcast packets to all outer VTEPs, causing the
> source outer VTEP receiving packets it originally sent to the central
> VTEP. If the packet from the source outer VTEP is a broadcast packet,
> the broadcasting back of the packet can cause network anomalies.
>
> Method 3:
> The central VTEP uses the same VNI but different UDP ports to create a
> vxlan device for each outer VTEP, each tunneling to its corresponding
> outer VTEP. All the vxlan devices in the central VTEP are then added to
> the same Linux bridge to enable forwarding.
>
> Drawbacks: complex configuration and potential security issues.
> Multiple UDP ports are required.
>
> == Proposed implementation ==
> In the central VTEP, each tenant only requires a single VNI, and all tenants
> share the same UDP port. This can avoid the drawbacks of the above three
> methods.
This method also has drawbacks. It breaks existing behavior (see my
comment on patch #1) and it also bloats the VXLAN receive path.
I want to suggest an alternative which allows you to keep the existing
topology (same VNI), but without kernel changes. The configuration of
the outer VTEPs remains the same. The steps below are for the central
VTEP.
First, create a VXLAN device in "external" mode. It will consume all the
VNIs in a namespace, but you can limit it with the "vnifilter" keyword,
if needed:
# ip -n ns_c link add name vx0 type vxlan dstport 4789 nolearning external
# tc -n ns_c qdisc add dev vx0 clsact
Then, for each outer VTEP, create a dummy device and enslave it to the
bridge. Taking outer VTEP1 as an example:
# ip -n ns_c link add name dummy_vtep1 up master br0 type dummy
# tc -n ns_c qdisc add dev dummy_vtep1 clsact
In order to demultiplex incoming VXLAN packets to the appropriate bridge
member, use an ingress tc filter on the VXLAN device that matches on the
encapsulating source IP (you can't do it w/o the "external" keyword) and
redirects the traffic to the corresponding bridge member:
# tc -n ns_c filter add dev vx0 ingress pref 1 proto all \
flower enc_key_id 42 enc_src_ip 10.0.0.1 \
action mirred ingress redirect dev dummy_vtep1
(add filters for other VTEPs with "pref 1" to avoid unnecessary
lookups).
For Tx, on each bridge member, configure an egress tc filter that
attaches tunnel metadata for the matching outer VTEP and redirects to
the VXLAN device:
# tc -n ns_c filter add dev dummy_vtep1 egress pref 1 proto all \
matchall \
action tunnel_key set src_ip 10.0.0.3 dst_ip 10.0.0.1 id 42 dst_port 4789 \
action mirred egress redirect dev vx0
The end result should be that the bridge forwards known unicast traffic
to the appropriate outer VTEP and floods BUM traffic to all the outer
VTEPs but the one from which the traffic was received.
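The forwarding result of the steps above can be modeled in a few lines of
userspace C. This is a rough sketch, not a description of tc or bridge
internals: struct model_port, model_demux() and model_flood() are hypothetical
names, one table entry stands for one dummy bridge member, and addresses are
IPv4 in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_PORTS 8

/* One bridge member per outer VTEP (hypothetical model). */
struct model_port {
    uint32_t vni;     /* matched like enc_key_id on ingress */
    uint32_t vtep_ip; /* matched like enc_src_ip, used as dst_ip on egress */
};

/* Ingress: pick the bridge member from (VNI, outer source IP).
 * Returns the port index, or -1 when no filter matches. */
static int model_demux(const struct model_port *ports, size_t n,
                       uint32_t vni, uint32_t src_ip)
{
    assert(n <= MODEL_MAX_PORTS);

    for (size_t i = 0; i < n; i++)
        if (ports[i].vni == vni && ports[i].vtep_ip == src_ip)
            return (int)i;
    return -1;
}

/* Flooding: BUM traffic goes to every member except the ingress one.
 * Writes the destination VTEP addresses to out[] and returns the count. */
static size_t model_flood(const struct model_port *ports, size_t n,
                          int ingress, uint32_t *out)
{
    size_t count = 0;

    assert(n <= MODEL_MAX_PORTS);

    for (size_t i = 0; i < n; i++)
        if ((int)i != ingress)
            out[count++] = ports[i].vtep_ip;
    return count;
}
```

With two members for 10.0.0.1 and 10.0.0.2 on VNI 42, a broadcast received
from 10.0.0.1 is flooded only towards 10.0.0.2 in this model, so the sender
does not get its own packet back.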
>
> As in below example,
> - a tunnel is established between vxlan42.1 in the central VTEP and vxlan42 in
> the outer VTEP1:
> ip link add vxlan42.1 type vxlan id 42 \
> local 10.0.0.3 remote 10.0.0.1 dstport 4789
>
> - a tunnel is established between vxlan42.2 in the central VTEP and vxlan42 in
> the outer VTEP2:
> ip link add vxlan42.2 type vxlan id 42 \
> local 10.0.0.3 remote 10.0.0.2 dstport 4789
>
>
> ┌────────────────────────────────────────────┐
> │ ┌─────────────────────────┐ central │
> │ │ br0 │ VTEP │
> │ └─┬────────────────────┬──┘ │
> │ ┌─────┴───────┐ ┌─────┴───────┐ │
> │ │ vxlan42.1 │ │ vxlan42.2 │ │
> │ └─────────────┘ └─────────────┘ │
> └───────────────────┬─┬──────────────────────┘
> │ │ eth0 10.0.0.3:4789
> │ │
> │ │
> ┌────────────────┘ └───────────────┐
> │eth0 10.0.0.1:4789 │eth0 10.0.0.2:4789
> ┌─────┴───────┐ ┌─────┴───────┐
> │outer VTEP1 │ │outer VTEP2 │
> │ vxlan42 │ │ vxlan42 │
> └─────────────┘ └─────────────┘
>
>
> == Test scenario ==
> ip netns add ns_1
> ip link add veth1 type veth peer name veth1-peer
> ip link set veth1 netns ns_1
> ip netns exec ns_1 ip addr add 10.0.1.1/24 dev veth1
> ip netns exec ns_1 ip link set veth1 up
> ip netns exec ns_1 ip link add vxlan42 type vxlan id 42 \
> remote 10.0.1.3 dstport 4789
> ip netns exec ns_1 ip addr add 192.168.0.1/24 dev vxlan42
> ip netns exec ns_1 ip link set up dev vxlan42
>
> ip netns add ns_2
> ip link add veth2 type veth peer name veth2-peer
> ip link set veth2 netns ns_2
> ip netns exec ns_2 ip addr add 10.0.1.2/24 dev veth2
> ip netns exec ns_2 ip link set veth2 up
> ip netns exec ns_2 ip link add vxlan42 type vxlan id 42 \
> remote 10.0.1.3 dstport 4789
> ip netns exec ns_2 ip addr add 192.168.0.2/24 dev vxlan42
> ip netns exec ns_2 ip link set up dev vxlan42
>
> ip netns add ns_c
> ip link add veth3 type veth peer name veth3-peer
> ip link set veth3 netns ns_c
> ip netns exec ns_c ip addr add 10.0.1.3/24 dev veth3
> ip netns exec ns_c ip link set veth3 up
> ip netns exec ns_c ip link add vxlan42.1 type vxlan id 42 \
> local 10.0.1.3 remote 10.0.1.1 dstport 4789
> ip netns exec ns_c ip link add vxlan42.2 type vxlan id 42 \
> local 10.0.1.3 remote 10.0.1.2 dstport 4789
> ip netns exec ns_c ip link set up dev vxlan42.1
> ip netns exec ns_c ip link set up dev vxlan42.2
> ip netns exec ns_c ip link add name br0 type bridge
> ip netns exec ns_c ip link set br0 up
> ip netns exec ns_c ip link set vxlan42.1 master br0
> ip netns exec ns_c ip link set vxlan42.2 master br0
>
> ip link add name br1 type bridge
> ip link set br1 up
> ip link set veth1-peer up
> ip link set veth2-peer up
> ip link set veth3-peer up
> ip link set veth1-peer master br1
> ip link set veth2-peer master br1
> ip link set veth3-peer master br1
>
> ip netns exec ns_1 ping 192.168.0.2 -I 192.168.0.1
>
> Ted Chen (3):
> vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and
> remote_ip
> vxlan: Do not treat vxlan dev as used when unicast remote_ip
> mismatches
> vxlan: vxlan_rcv(): Update comment to inlucde ipv6
>
> drivers/net/vxlan/vxlan_core.c | 38 +++++++++++++++++++++++++++-------
> 1 file changed, 31 insertions(+), 7 deletions(-)
>
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip
2025-02-02 11:56 ` Ido Schimmel
@ 2025-02-04 13:09 ` Ted Chen
2025-02-04 14:16 ` Ido Schimmel
0 siblings, 1 reply; 13+ messages in thread
From: Ted Chen @ 2025-02-04 13:09 UTC (permalink / raw)
To: Ido Schimmel; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Sun, Feb 02, 2025 at 01:56:36PM +0200, Ido Schimmel wrote:
> On Sat, Feb 01, 2025 at 07:34:00PM +0800, Ted Chen wrote:
> > vxlan_vs_find_vni() currently searches the vni hash table in a vs and
> > returns a vxlan_dev associated with the specified "vni". While this works
> > when the remote_ips are stored in the vxlan fdb, it fails to handle the
> > case where the remote_ip is just configured in the vxlan device outside of
> > the vxlan fdb, because multiple vxlan devices with different remote_ip may
> > share a single vni when the remote_ip is configured in the vxlan device
> > (i.e., not stored in the vxlan fdb). In that case, further check of
> > remote_ip to identify vxlan_dev more precisely.
> >
> > Signed-off-by: Ted Chen <znscnchen@gmail.com>
> > ---
> > drivers/net/vxlan/vxlan_core.c | 32 ++++++++++++++++++++++++++------
> > 1 file changed, 26 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
> > index 05c10acb2a57..3ca74a97c44f 100644
> > --- a/drivers/net/vxlan/vxlan_core.c
> > +++ b/drivers/net/vxlan/vxlan_core.c
> > @@ -94,8 +94,10 @@ static struct vxlan_sock *vxlan_find_sock(struct net *net, sa_family_t family,
> >
> > static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
> > int ifindex, __be32 vni,
> > + const struct sk_buff *skb,
> > struct vxlan_vni_node **vninode)
> > {
> > + union vxlan_addr saddr;
> > struct vxlan_vni_node *vnode;
> > struct vxlan_dev_node *node;
> >
> > @@ -116,14 +118,31 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
> > continue;
> > }
> >
> > - if (IS_ENABLED(CONFIG_IPV6)) {
> > - const struct vxlan_config *cfg = &node->vxlan->cfg;
> > + const struct vxlan_config *cfg = &node->vxlan->cfg;
> >
> > + if (IS_ENABLED(CONFIG_IPV6)) {
> > if ((cfg->flags & VXLAN_F_IPV6_LINKLOCAL) &&
> > cfg->remote_ifindex != ifindex)
> > continue;
> > }
> >
> > + if (vni && !vxlan_addr_any(&cfg->remote_ip) &&
> > + !vxlan_addr_multicast(&cfg->remote_ip)) {
> > + /* Get address from the outer IP header */
> > + if (vxlan_get_sk_family(vs) == AF_INET) {
> > + saddr.sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
> > + saddr.sa.sa_family = AF_INET;
> > +#if IS_ENABLED(CONFIG_IPV6)
> > + } else {
> > + saddr.sin6.sin6_addr = ipv6_hdr(skb)->saddr;
> > + saddr.sa.sa_family = AF_INET6;
> > +#endif
> > + }
> > +
> > + if (!vxlan_addr_equal(&cfg->remote_ip, &saddr))
> > + continue;
>
> This breaks existing behavior. Before this patch, a VXLAN device with a
> remote address could receive traffic from any VTEP (in the same
> broadcast domain).
>
> I think this patch misinterprets the "remote" keyword as P2P when it is
> not the case. It is merely the VTEP to which packets are sent when no
Yes. Thanks for pointing that out.
I hadn't seen target addresses being appended to the FDB when a unicast
remote_ip had already been configured.
e.g.
Usually, when (2) and (3) are used, (1) is not used to configure a unicast
remote_ip on the VTEP (though using (1) is allowed).
(1) ip link add vxlan42 type vxlan id 42 \
local 10.0.0.1 remote 10.0.0.2 dstport 4789
(2) bridge fdb append to 00:00:00:00:00:00 dst 10.0.0.3 dev vxlan42
(3) bridge fdb append to 00:00:00:00:00:00 dst 10.0.0.4 dev vxlan42
So this patch simply treats the case where a remote_ip is configured on the
VTEP as standing for P2P.
Do you think there's a better way to identify P2P more precisely?
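For reference, the state produced by commands (1) to (3) above can be modeled
as a single default (all-zeros MAC) entry with several remotes. This is a
sketch under assumptions: struct model_default_fdb and its helpers are
hypothetical names, and it assumes, as I understand the driver, that the
"remote" address of (1) ends up in the same default entry that (2) and (3)
append to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_REMOTES 8

/* Hypothetical model of the all-zeros-MAC FDB entry of one device. */
struct model_default_fdb {
    uint32_t remotes[MODEL_MAX_REMOTES]; /* IPv4, host byte order */
    size_t n;
};

/* Models both "remote ADDR" at link creation and
 * "bridge fdb append to 00:00:00:00:00:00 dst ADDR". */
static int model_default_append(struct model_default_fdb *fdb, uint32_t ip)
{
    assert(fdb->n <= MODEL_MAX_REMOTES);

    if (fdb->n == MODEL_MAX_REMOTES)
        return -1;
    fdb->remotes[fdb->n++] = ip;
    return 0;
}

/* True when ip is one of the VTEPs the device floods to. */
static bool model_default_has(const struct model_default_fdb *fdb,
                              uint32_t ip)
{
    for (size_t i = 0; i < fdb->n; i++)
        if (fdb->remotes[i] == ip)
            return true;
    return false;
}
```

After (1) to (3), the entry holds 10.0.0.2, 10.0.0.3 and 10.0.0.4. A receive
filter that compares only against the configured remote (10.0.0.2) would drop
replies from the other two, even though the device sends to them.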
> other VTEP was found in the FDB. A VXLAN device that was configured with
> the "remote" keyword can still send packets to other VTEPs and it should
> therefore be able to receive packets from them.
>
> > + }
> > +
> > if (vninode)
> > *vninode = vnode;
> > return node->vxlan;
> > @@ -134,6 +153,7 @@ static struct vxlan_dev *vxlan_vs_find_vni(struct vxlan_sock *vs,
> >
> > /* Look up VNI in a per net namespace table */
> > static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
> > + const struct sk_buff *skb,
> > __be32 vni, sa_family_t family,
> > __be16 port, u32 flags)
> > {
> > @@ -143,7 +163,7 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, int ifindex,
> > if (!vs)
> > return NULL;
> >
> > - return vxlan_vs_find_vni(vs, ifindex, vni, NULL);
> > + return vxlan_vs_find_vni(vs, ifindex, vni, skb, NULL);
> > }
> >
> > /* Fill in neighbour message in skbuff. */
> > @@ -1701,7 +1721,7 @@ static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
> >
> > vni = vxlan_vni(vh->vx_vni);
> >
> > - vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, &vninode);
> > + vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, skb, &vninode);
> > if (!vxlan) {
> > reason = SKB_DROP_REASON_VXLAN_VNI_NOT_FOUND;
> > goto drop;
> > @@ -1855,7 +1875,7 @@ static int vxlan_err_lookup(struct sock *sk, struct sk_buff *skb)
> > return -ENOENT;
> >
> > vni = vxlan_vni(hdr->vx_vni);
> > - vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, NULL);
> > + vxlan = vxlan_vs_find_vni(vs, skb->dev->ifindex, vni, skb, NULL);
> > if (!vxlan)
> > return -ENOENT;
> >
> > @@ -2330,7 +2350,7 @@ static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
> > struct vxlan_dev *dst_vxlan;
> >
> > dst_release(dst);
> > - dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, vni,
> > + dst_vxlan = vxlan_find_vni(vxlan->net, dst_ifindex, skb, vni,
> > addr_family, dst_port,
> > vxlan->cfg.flags);
> > if (!dst_vxlan) {
> > --
> > 2.39.2
> >
> >
* Re: [PATCH RFC net-next 3/3] vxlan: vxlan_rcv(): Update comment to inlucde ipv6
2025-02-02 12:09 ` Ido Schimmel
@ 2025-02-04 13:13 ` Ted Chen
2025-02-04 14:38 ` Ido Schimmel
0 siblings, 1 reply; 13+ messages in thread
From: Ted Chen @ 2025-02-04 13:13 UTC (permalink / raw)
To: Ido Schimmel; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Sun, Feb 02, 2025 at 02:09:23PM +0200, Ido Schimmel wrote:
> On Sat, Feb 01, 2025 at 07:34:22PM +0800, Ted Chen wrote:
> > Update the comment to indicate that both ipv4/udp.c and ipv6/udp.c invoke
>
> Nit: net/ipv4/udp.c and net/ipv6/udp.c
>
> > vxlan_rcv() to process packets.
> >
> > Signed-off-by: Ted Chen <znscnchen@gmail.com>
> > ---
> > drivers/net/vxlan/vxlan_core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
> > index 5ef40ac816cc..8bdf91d1fdfe 100644
> > --- a/drivers/net/vxlan/vxlan_core.c
> > +++ b/drivers/net/vxlan/vxlan_core.c
> > @@ -1684,7 +1684,7 @@ static bool vxlan_ecn_decapsulate(struct vxlan_sock *vs, void *oiph,
> > return err <= 1;
> > }
> >
> > -/* Callback from net/ipv4/udp.c to receive packets */
> > +/* Callback from net/ipv{4,6}/udp.c to receive packets */
>
> Maybe just remove the comment? I don't see how anyone can find it
> useful.
I'm OK with either.
Please let me know whether I should send a separate patch or whether the
current one is fine.
> Regardless, please submit this patch separately as it's not related to
> the other patches in the series.
I came across this comment while writing this series, when I noticed that
vxlan_rcv() is called for both IPv4 and IPv6. Besides, vxlan_err_lookup() has
a similar comment.
> > static int vxlan_rcv(struct sock *sk, struct sk_buff *skb)
> > {
> > struct vxlan_vni_node *vninode = NULL;
> > --
> > 2.39.2
> >
> >
* Re: [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI
2025-02-02 13:40 ` [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ido Schimmel
@ 2025-02-04 13:27 ` Ted Chen
0 siblings, 0 replies; 13+ messages in thread
From: Ted Chen @ 2025-02-04 13:27 UTC (permalink / raw)
To: Ido Schimmel; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Sun, Feb 02, 2025 at 03:40:35PM +0200, Ido Schimmel wrote:
> On Sat, Feb 01, 2025 at 07:32:07PM +0800, Ted Chen wrote:
> > This RFC series proposes an implementation to enable the configuration of vxlan
> > devices in a Hub-Spoke Network, allowing multiple vxlan devices to share the
> > same VNI while being associated with different remote IPs under the same UDP
> > port.
> >
> > == Use case ==
> > In a Hub-Spoke Network, there is a central VTEP acting as the gateway, along
> > with multiple outer VTEPs. Each outer VTEP communicates exclusively with the
> > central VTEP and has no direct connection to other outer VTEPs. As a result,
> > data exchanged between outer VTEPs must traverse the central VTEP. This design
> > enhances security and enables centralized auditing and monitoring at the
> > central VTEP.
> >
> > == Existing methods ==
> > Currently, there are three methods to implement the use case.
> >
> > Method 1:
> > The central VTEP establishes a separate vxlan tunnel with each outer
> > VTEP, creating a vxlan device with a different VNI for each tunnel.
> > All vxlan devices are then added to the same Linux bridge to enable
> > forwarding.
> >
> > Drawbacks: Complex configuration.
> > Each tenant requires multiple VNIs.
>
> This looks like the most straightforward option to me.
>
> Why do you view it as complex? Why multiple VNIs per tenant are a
> problem when we have 16M of them?
Yes, the issue is not due to a lack of VNIs.
IMO, using a single VNI within a single Layer 2 network is clearer and more
intuitive.
> >
> > Method 2:
> > The central VTEP creates a single vxlan device using the same VNI,
> > without configuring a remote IP. The IP addresses of all outer VTEPs
> > are stored in the fdb. To enable forwarding, the vxlan device is added
> > to a Linux bridge with hairpin mode enabled.
> >
> > Drawbacks: unnecessary overhead or network anomalies
> > The hairpin mode may broadcast packets to all outer VTEPs, causing the
> > source outer VTEP receiving packets it originally sent to the central
> > VTEP. If the packet from the source outer VTEP is a broadcast packet,
> > the broadcasting back of the packet can cause network anomalies.
> >
> > Method 3:
> > The central VTEP uses the same VNI but different UDP ports to create a
> > vxlan device for each outer VTEP, each tunneling to its corresponding
> > outer VTEP. All the vxlan devices in the central VTEP are then added to
> > the same Linux bridge to enable forwarding.
> >
> > Drawbacks: complex configuration and potential security issues.
> > Multiple UDP ports are required.
> >
> > == Proposed implementation ==
> > In the central VTEP, each tenant only requires a single VNI, and all tenants
> > share the same UDP port. This can avoid the drawbacks of the above three
> > methods.
>
> This method also has drawbacks. It breaks existing behavior (see my
> comment on patch #1) and it also bloats the VXLAN receive path.
>
> I want to suggest an alternative which allows you to keep the existing
> topology (same VNI), but without kernel changes. The configuration of
> the outer VTEPs remains the same. The steps below are for the central
> VTEP.
>
> First, create a VXLAN device in "external" mode. It will consume all the
> VNIs in a namespace, but you can limit it with the "vnifilter" keyword,
> if needed:
>
> # ip -n ns_c link add name vx0 type vxlan dstport 4789 nolearning external
> # tc -n ns_c qdisc add dev vx0 clsact
>
> Then, for each outer VTEP, create a dummy device and enslave it to the
> bridge. Taking outer VTEP1 as an example:
>
> # ip -n ns_c link add name dummy_vtep1 up master br0 type dummy
> # tc -n ns_c qdisc add dev dummy_vtep1 clsact
>
> In order to demultiplex incoming VXLAN packets to the appropriate bridge
> member, use an ingress tc filter on the VXLAN device that matches on the
> encapsulating source IP (you can't do it w/o the "external" keyword) and
> redirects the traffic to the corresponding bridge member:
>
> # tc -n ns_c filter add dev vx0 ingress pref 1 proto all \
> flower enc_key_id 42 enc_src_ip 10.0.0.1 \
> action mirred ingress redirect dev dummy_vtep1
>
> (add filters for other VTEPs with "pref 1" to avoid unnecessary
> lookups).
>
> For Tx, on each bridge member, configure an egress tc filter that
> attaches tunnel metadata for the matching outer VTEP and redirects to
> the VXLAN device:
>
> # tc -n ns_c filter add dev dummy_vtep1 egress pref 1 proto all \
> matchall \
> action tunnel_key set src_ip 10.0.0.3 dst_ip 10.0.0.1 id 42 dst_port 4789 \
> action mirred egress redirect dev vx0
>
> The end result should be that the bridge forwards known unicast traffic
> to the appropriate outer VTEP and floods BUM traffic to all the outer
> VTEPs but the one from which the traffic was received.
Cool!
I wasn’t aware that TC could be used in this way. Will give it a try.
Thanks a lot!
> >
> > As in below example,
> > - a tunnel is established between vxlan42.1 in the central VTEP and vxlan42 in
> > the outer VTEP1:
> > ip link add vxlan42.1 type vxlan id 42 \
> > local 10.0.0.3 remote 10.0.0.1 dstport 4789
> >
> > - a tunnel is established between vxlan42.2 in the central VTEP and vxlan42 in
> > the outer VTEP2:
> > ip link add vxlan42.2 type vxlan id 42 \
> > local 10.0.0.3 remote 10.0.0.2 dstport 4789
> >
> >
> > ┌────────────────────────────────────────────┐
> > │ ┌─────────────────────────┐ central │
> > │ │ br0 │ VTEP │
> > │ └─┬────────────────────┬──┘ │
> > │ ┌─────┴───────┐ ┌─────┴───────┐ │
> > │ │ vxlan42.1 │ │ vxlan42.2 │ │
> > │ └─────────────┘ └─────────────┘ │
> > └───────────────────┬─┬──────────────────────┘
> > │ │ eth0 10.0.0.3:4789
> > │ │
> > │ │
> > ┌────────────────┘ └───────────────┐
> > │eth0 10.0.0.1:4789 │eth0 10.0.0.2:4789
> > ┌─────┴───────┐ ┌─────┴───────┐
> > │outer VTEP1 │ │outer VTEP2 │
> > │ vxlan42 │ │ vxlan42 │
> > └─────────────┘ └─────────────┘
> >
> >
> > == Test scenario ==
> > ip netns add ns_1
> > ip link add veth1 type veth peer name veth1-peer
> > ip link set veth1 netns ns_1
> > ip netns exec ns_1 ip addr add 10.0.1.1/24 dev veth1
> > ip netns exec ns_1 ip link set veth1 up
> > ip netns exec ns_1 ip link add vxlan42 type vxlan id 42 \
> > remote 10.0.1.3 dstport 4789
> > ip netns exec ns_1 ip addr add 192.168.0.1/24 dev vxlan42
> > ip netns exec ns_1 ip link set up dev vxlan42
> >
> > ip netns add ns_2
> > ip link add veth2 type veth peer name veth2-peer
> > ip link set veth2 netns ns_2
> > ip netns exec ns_2 ip addr add 10.0.1.2/24 dev veth2
> > ip netns exec ns_2 ip link set veth2 up
> > ip netns exec ns_2 ip link add vxlan42 type vxlan id 42 \
> > remote 10.0.1.3 dstport 4789
> > ip netns exec ns_2 ip addr add 192.168.0.2/24 dev vxlan42
> > ip netns exec ns_2 ip link set up dev vxlan42
> >
> > ip netns add ns_c
> > ip link add veth3 type veth peer name veth3-peer
> > ip link set veth3 netns ns_c
> > ip netns exec ns_c ip addr add 10.0.1.3/24 dev veth3
> > ip netns exec ns_c ip link set veth3 up
> > ip netns exec ns_c ip link add vxlan42.1 type vxlan id 42 \
> > local 10.0.1.3 remote 10.0.1.1 dstport 4789
> > ip netns exec ns_c ip link add vxlan42.2 type vxlan id 42 \
> > local 10.0.1.3 remote 10.0.1.2 dstport 4789
> > ip netns exec ns_c ip link set up dev vxlan42.1
> > ip netns exec ns_c ip link set up dev vxlan42.2
> > ip netns exec ns_c ip link add name br0 type bridge
> > ip netns exec ns_c ip link set br0 up
> > ip netns exec ns_c ip link set vxlan42.1 master br0
> > ip netns exec ns_c ip link set vxlan42.2 master br0
> >
> > ip link add name br1 type bridge
> > ip link set br1 up
> > ip link set veth1-peer up
> > ip link set veth2-peer up
> > ip link set veth3-peer up
> > ip link set veth1-peer master br1
> > ip link set veth2-peer master br1
> > ip link set veth3-peer master br1
> >
> > ip netns exec ns_1 ping 192.168.0.2 -I 192.168.0.1
> >
> > Ted Chen (3):
> > vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and
> > remote_ip
> > vxlan: Do not treat vxlan dev as used when unicast remote_ip
> > mismatches
> > vxlan: vxlan_rcv(): Update comment to inlucde ipv6
> >
> > drivers/net/vxlan/vxlan_core.c | 38 +++++++++++++++++++++++++++-------
> > 1 file changed, 31 insertions(+), 7 deletions(-)
> >
> > --
> > 2.39.2
> >
> >
* Re: [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip
2025-02-04 13:09 ` Ted Chen
@ 2025-02-04 14:16 ` Ido Schimmel
2025-02-05 12:27 ` Ted Chen
0 siblings, 1 reply; 13+ messages in thread
From: Ido Schimmel @ 2025-02-04 14:16 UTC (permalink / raw)
To: Ted Chen; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Tue, Feb 04, 2025 at 09:09:02PM +0800, Ted Chen wrote:
> I had not seen target addresses being appended to the FDB when a unicast
> remote_ip was already configured.
>
> e.g.
> Usually when (2) and (3) are invoked, (1) is not used to configure a unicast
> remote_ip on the VTEP (though using (1) as well is allowed).
>
> (1) ip link add vxlan42 type vxlan id 42 \
>     local 10.0.0.1 remote 10.0.0.2 dstport 4789
> (2) bridge fdb append to 00:00:00:00:00:00 dst 10.0.0.3 dev vxlan42
> (3) bridge fdb append to 00:00:00:00:00:00 dst 10.0.0.4 dev vxlan42
>
> So this patch simply treats a unicast remote_ip configured on the VTEP as
> meaning P2P.
>
> Do you think there's a better way to identify P2P more precisely?
I think it will require a new uAPI (e.g., a new VXLAN netlink attribute)
as it's a behavior change, but I really prefer not to go there when the
problem can be solved in other ways (e.g., the tc solution I mentioned
or using multiple VNIs).
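
For reference, the "multiple VNIs" alternative mentioned above (Method 1 in the
cover letter) could look roughly like the sketch below on the central VTEP. It
is a dry run that only prints commands. The namespace, bridge, local address
and port are taken from the examples in this thread; the VNI numbering
(BASE_VNI), the device names and the helper gen_multi_vni are hypothetical, and
each outer VTEP would have to be configured with the matching VNI.

```shell
#!/bin/sh
# Dry-run sketch of the multiple-VNI layout: prints commands, runs nothing.
NS=ns_c            # central VTEP namespace, as in the thread's test scenario
LOCAL_IP=10.0.0.3  # central VTEP address from the cover letter example
PORT=4789
BASE_VNI=100       # hypothetical: outer VTEP n uses VNI BASE_VNI + n

# gen_multi_vni REMOTE_IP...: one vxlan device (own VNI) per outer VTEP,
# all enslaved to the same bridge br0 (assumed to exist already).
gen_multi_vni() {
    n=1
    for remote in "$@"; do
        vni=$((BASE_VNI + n))
        dev="vxlan$vni"   # hypothetical device name
        echo "ip -n $NS link add $dev type vxlan id $vni local $LOCAL_IP remote $remote dstport $PORT"
        echo "ip -n $NS link set $dev master br0"
        echo "ip -n $NS link set up dev $dev"
        n=$((n + 1))
    done
}

gen_multi_vni 10.0.0.1 10.0.0.2
```

Piping the output to sh as root would apply it. The trade-off is the one
raised in the cover letter: one VNI per outer VTEP instead of one per tenant.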
* Re: [PATCH RFC net-next 3/3] vxlan: vxlan_rcv(): Update comment to inlucde ipv6
2025-02-04 13:13 ` Ted Chen
@ 2025-02-04 14:38 ` Ido Schimmel
0 siblings, 0 replies; 13+ messages in thread
From: Ido Schimmel @ 2025-02-04 14:38 UTC (permalink / raw)
To: Ted Chen; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Tue, Feb 04, 2025 at 09:13:53PM +0800, Ted Chen wrote:
> I'm OK with either.
> Please let me know whether I should send a separate patch or whether the
> current one is fine.
Yes, please resend it as a standalone patch.
> > Regardless, please submit this patch separately as it's not related to
> > the other patches in the series.
> I came across this comment while writing this series, when I noticed that
> vxlan_rcv() is called for both IPv4 and IPv6. Besides, vxlan_err_lookup() has
> a similar comment.
OK, so it makes sense to adjust the comment like you did.
* Re: [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip
2025-02-04 14:16 ` Ido Schimmel
@ 2025-02-05 12:27 ` Ted Chen
0 siblings, 0 replies; 13+ messages in thread
From: Ted Chen @ 2025-02-05 12:27 UTC (permalink / raw)
To: Ido Schimmel; +Cc: davem, edumazet, kuba, pabeni, andrew+netdev, netdev
On Tue, Feb 04, 2025 at 04:16:05PM +0200, Ido Schimmel wrote:
> On Tue, Feb 04, 2025 at 09:09:02PM +0800, Ted Chen wrote:
> > I had not seen target addresses being appended to the FDB when a unicast
> > remote_ip was already configured.
> >
> > e.g.
> > Usually when (2) and (3) are invoked, (1) is not used to configure a unicast
> > remote_ip on the VTEP (though using (1) as well is allowed).
> >
> > (1) ip link add vxlan42 type vxlan id 42 \
> >     local 10.0.0.1 remote 10.0.0.2 dstport 4789
> > (2) bridge fdb append to 00:00:00:00:00:00 dst 10.0.0.3 dev vxlan42
> > (3) bridge fdb append to 00:00:00:00:00:00 dst 10.0.0.4 dev vxlan42
> >
> > So this patch simply treats a unicast remote_ip configured on the VTEP as
> > meaning P2P.
> >
> > Do you think there's a better way to identify P2P more precisely?
>
> I think it will require a new uAPI (e.g., a new VXLAN netlink attribute)
> as it's a behavior change, but I really prefer not to go there when the
> problem can be solved in other ways (e.g., the tc solution I mentioned
> or using multiple VNIs).
I tried the tc solution you mentioned and it works well for my use case.
Thanks a lot!!
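
For readers who want to repeat this, the tc steps from the earlier reply can be
collected into one loop over the outer VTEPs. The following is only a sketch
under stated assumptions: it is a dry run that prints the commands rather than
executing them, the helper gen_central_vtep is hypothetical, and "type dummy"
is added on the assumption that ip link add needs an explicit type. The names
ns_c, vx0, br0 and dummy_vtepN, VNI 42, and the addresses come from the
examples in the thread. Creating br0 and bringing vx0 up are left out, as in
the original steps.

```shell
#!/bin/sh
# Dry-run sketch: prints the central-VTEP commands, executes nothing.
NS=ns_c
VNI=42
LOCAL_IP=10.0.0.3
PORT=4789

# gen_central_vtep REMOTE_IP...: one shared VXLAN device in "external" mode
# plus one dummy bridge port (with Rx and Tx filters) per outer VTEP.
gen_central_vtep() {
    echo "ip -n $NS link add name vx0 type vxlan dstport $PORT nolearning external"
    echo "tc -n $NS qdisc add dev vx0 clsact"
    i=1
    for remote in "$@"; do
        dev="dummy_vtep$i"
        echo "ip -n $NS link add name $dev up master br0 type dummy"
        echo "tc -n $NS qdisc add dev $dev clsact"
        # Rx: match VNI and outer source IP, redirect to this bridge port.
        echo "tc -n $NS filter add dev vx0 ingress pref 1 proto all" \
             "flower enc_key_id $VNI enc_src_ip $remote" \
             "action mirred ingress redirect dev $dev"
        # Tx: attach tunnel metadata for this VTEP, then send through vx0.
        echo "tc -n $NS filter add dev $dev egress pref 1 proto all matchall" \
             "action tunnel_key set src_ip $LOCAL_IP dst_ip $remote id $VNI dst_port $PORT" \
             "action mirred egress redirect dev vx0"
        i=$((i + 1))
    done
}

gen_central_vtep 10.0.0.1 10.0.0.2
```

Reviewing the printed commands before piping them to sh as root keeps the
sketch safe to run on any machine. All ingress filters share "pref 1", as
suggested in the original reply, to avoid unnecessary lookups.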
end of thread, other threads:[~2025-02-05 12:27 UTC | newest]
Thread overview: 13+ messages
2025-02-01 11:32 [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ted Chen
2025-02-01 11:34 ` [PATCH RFC net-next 1/3] vxlan: vxlan_vs_find_vni(): Find vxlan_dev according to vni and remote_ip Ted Chen
2025-02-02 11:56 ` Ido Schimmel
2025-02-04 13:09 ` Ted Chen
2025-02-04 14:16 ` Ido Schimmel
2025-02-05 12:27 ` Ted Chen
2025-02-01 11:34 ` [PATCH RFC net-next 2/3] vxlan: Do not treat vxlan dev as used when unicast remote_ip mismatches Ted Chen
2025-02-01 11:34 ` [PATCH RFC net-next 3/3] vxlan: vxlan_rcv(): Update comment to inlucde ipv6 Ted Chen
2025-02-02 12:09 ` Ido Schimmel
2025-02-04 13:13 ` Ted Chen
2025-02-04 14:38 ` Ido Schimmel
2025-02-02 13:40 ` [PATCH RFC net-next 0/3] vxlan: Support of Hub Spoke Network to use the same VNI Ido Schimmel
2025-02-04 13:27 ` Ted Chen