[PATCH net-next v2] seg6: enable route leak for encap routes

* [PATCH net-next v2] seg6: enable route leak for encap routes
From: Nicolas Dichtel @ 2026-03-27 14:06 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	David Lebrun, Andrea Mayer, Paolo Lungaroni, David Ahern
  Cc: netdev, Nicolas Dichtel

The goal is to support cross-VRF (x-vrf) routes. To avoid breaking existing
setups, a new flag is introduced: nh-vrf.

The dev parameter is mandatory when a seg6 encap route is configured, but
until this commit it was ignored. After the SRv6 encapsulation, a second
route lookup is performed in the same VRF.

The new nh-vrf flag requests that this second route lookup be performed in
the VRF associated with the dev parameter.

The l3vpn tests show the inconsistency: the specified nexthop dev is
ignored, so a second route for the segment address has to be added in the
same VRF (before the commit, the route to 'fc00:21:100::6046' was put in the
vrf-100 table while the encap route pointed to veth0, which is not
associated with any VRF).

The tests are updated to use the nh-vrf flag when available.

Before:
> $ ip -n rt_2-Rh5GP7 -6 r list vrf vrf-100 | grep fc00:21:100::6046
> cafe::1  encap seg6 mode encap segs 1 [ fc00:21:100::6046 ] dev veth0 metric 1024 pref medium
> fc00:21:100::6046 via fd00::1 dev veth0 metric 1024 pref medium

After:
> $ ip -n rt_2-Rh5GP7 -6 r list vrf vrf-100 | grep fc00:21:100::6046
> cafe::1  encap seg6 mode encap segs 1 [ fc00:21:100::6046 ] nh-vrf dev veth0 metric 1024 pref medium
> $ ip -n rt_2-Rh5GP7 -6 r list | grep fc00:21:100::6046
> fc00:21:100::6046 via fd00::1 dev veth0 metric 1024 pref medium

Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

v1 -> v2:
 - target net-next instead of net
 - add a new attribute to avoid breaking the legacy behavior
 - use dst_dev_rcu()

 include/uapi/linux/seg6_iptunnel.h            |  1 +
 net/ipv6/seg6_iptunnel.c                      | 23 ++++++++++--
 .../selftests/net/srv6_end_dt46_l3vpn_test.sh | 35 +++++++++++++++++--
 .../selftests/net/srv6_end_dt4_l3vpn_test.sh  | 33 +++++++++++++++--
 .../selftests/net/srv6_end_dt6_l3vpn_test.sh  | 33 +++++++++++++++--
 5 files changed, 116 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h
index 485889b19900..d7d6aa2f72c5 100644
--- a/include/uapi/linux/seg6_iptunnel.h
+++ b/include/uapi/linux/seg6_iptunnel.h
@@ -21,6 +21,7 @@ enum {
 	SEG6_IPTUNNEL_UNSPEC,
 	SEG6_IPTUNNEL_SRH,
 	SEG6_IPTUNNEL_SRC,	/* struct in6_addr */
+	SEG6_IPTUNNEL_NH_VRF,
 	__SEG6_IPTUNNEL_MAX,
 };
 #define SEG6_IPTUNNEL_MAX (__SEG6_IPTUNNEL_MAX - 1)
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index e76cc0cc481e..f8c6f0d719be 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -50,6 +50,7 @@ static size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo)
 struct seg6_lwt {
 	struct dst_cache cache;
 	struct in6_addr tunsrc;
+	bool nh_vrf;
 	struct seg6_iptunnel_encap tuninfo[];
 };
 
@@ -67,6 +68,7 @@ seg6_encap_lwtunnel(struct lwtunnel_state *lwt)
 static const struct nla_policy seg6_iptunnel_policy[SEG6_IPTUNNEL_MAX + 1] = {
 	[SEG6_IPTUNNEL_SRH]	= { .type = NLA_BINARY },
 	[SEG6_IPTUNNEL_SRC]	= NLA_POLICY_EXACT_LEN(sizeof(struct in6_addr)),
+	[SEG6_IPTUNNEL_NH_VRF]	= { .type = NLA_FLAG },
 };
 
 static int nla_put_srh(struct sk_buff *skb, int attrtype,
@@ -499,9 +501,15 @@ static int seg6_input_core(struct net *net, struct sock *sk,
 	 * now and use it later as a comparison.
 	 */
 	lwtst = orig_dst->lwtstate;
-
 	slwt = seg6_lwt_lwtunnel(lwtst);
 
+	if (slwt->nh_vrf) {
+		rcu_read_lock();
+		skb->dev = l3mdev_master_dev_rcu(dst_dev_rcu(orig_dst)) ?:
+			dev_net(skb->dev)->loopback_dev;
+		rcu_read_unlock();
+	}
+
 	local_bh_disable();
 	dst = dst_cache_get(&slwt->cache);
 	local_bh_enable();
@@ -724,6 +732,7 @@ static int seg6_build_state(struct net *net, struct nlattr *nla,
 	if (err)
 		goto free_lwt_state;
 
+	slwt->nh_vrf = !!tb[SEG6_IPTUNNEL_NH_VRF];
 	memcpy(&slwt->tuninfo, tuninfo, tuninfo_len);
 
 	if (tb[SEG6_IPTUNNEL_SRC]) {
@@ -775,6 +784,10 @@ static int seg6_fill_encap_info(struct sk_buff *skb,
 	    nla_put_in6_addr(skb, SEG6_IPTUNNEL_SRC, &slwt->tunsrc))
 		return -EMSGSIZE;
 
+	if (slwt->nh_vrf &&
+	    nla_put_flag(skb, SEG6_IPTUNNEL_NH_VRF))
+		return -EMSGSIZE;
+
 	return 0;
 }
 
@@ -786,9 +799,14 @@ static int seg6_encap_nlsize(struct lwtunnel_state *lwtstate)
 
 	nlsize = nla_total_size(SEG6_IPTUN_ENCAP_SIZE(tuninfo));
 
+	/* SEG6_IPTUNNEL_SRC */
 	if (!ipv6_addr_any(&slwt->tunsrc))
 		nlsize += nla_total_size(sizeof(slwt->tunsrc));
 
+	/* SEG6_IPTUNNEL_NH_VRF */
+	if (slwt->nh_vrf)
+		nlsize += nla_total_size(0);
+
 	return nlsize;
 }
 
@@ -803,7 +821,8 @@ static int seg6_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
 	if (len != SEG6_IPTUN_ENCAP_SIZE(b_hdr))
 		return 1;
 
-	if (!ipv6_addr_equal(&a_slwt->tunsrc, &b_slwt->tunsrc))
+	if (!ipv6_addr_equal(&a_slwt->tunsrc, &b_slwt->tunsrc) ||
+	    a_slwt->nh_vrf != b_slwt->nh_vrf)
 		return 1;
 
 	return memcmp(a_hdr, b_hdr, len);
diff --git a/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
index a5e959a080bb..abf0a523518a 100755
--- a/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
@@ -201,6 +201,7 @@ readonly IPv6_HS_NETWORK=cafe
 readonly IPv4_HS_NETWORK=10.0.0
 readonly VPN_LOCATOR_SERVICE=fc00
 PING_TIMEOUT_SEC=4
+NH_VRF_FLAG=
 
 ret=0
 
@@ -319,6 +320,7 @@ setup_vpn_config()
 	local hsdst=$3
 	local rtdst=$4
 	local tid=$5
+	local vrf=
 
 	eval local rtsrc_name=\${rt_${rtsrc}}
 	eval local rtdst_name=\${rt_${rtdst}}
@@ -330,10 +332,11 @@ setup_vpn_config()
 	# set the encap route for encapsulating packets which arrive from the
 	# host hssrc and destined to the access router rtsrc.
 	ip -netns ${rtsrc_name} -6 route add ${IPv6_HS_NETWORK}::${hsdst}/128 vrf vrf-${tid} \
-		encap seg6 mode encap segs ${vpn_sid} dev veth0
+		encap seg6 mode encap segs ${vpn_sid} ${NH_VRF_FLAG} dev veth0
 	ip -netns ${rtsrc_name} -4 route add ${IPv4_HS_NETWORK}.${hsdst}/32 vrf vrf-${tid} \
-		encap seg6 mode encap segs ${vpn_sid} dev veth0
-	ip -netns ${rtsrc_name} -6 route add ${vpn_sid}/128 vrf vrf-${tid} \
+		encap seg6 mode encap segs ${vpn_sid} ${NH_VRF_FLAG} dev veth0
+	[ -z ${NH_VRF_FLAG} ] && vrf="vrf vrf-${tid}"
+	ip -netns ${rtsrc_name} -6 route add ${vpn_sid}/128 ${vrf} \
 		via fd00::${rtdst} dev veth0
 
 	# set the decap route for decapsulating packets which arrive from
@@ -536,6 +539,26 @@ host_vpn_isolation_tests()
 	done
 }
 
+# check if the nh-vrf flag is supported
+check_nh_vrf_support()
+{
+	setup_ns nh_vrf_ns
+
+	if ! ip -netns "${nh_vrf_ns}" nexthop add id 1235 encap seg6 \
+			mode encap nh-vrf segs 2001:db8:1:1:1::2 dev lo &>/dev/null; then
+		cleanup_ns "${nh_vrf_ns}"
+		return
+	fi
+
+	if ! ip -netns "${nh_vrf_ns}" nexthop get id 1235 &>/dev/null; then
+		cleanup_ns "${nh_vrf_ns}"
+		return
+	fi
+
+	cleanup_ns "${nh_vrf_ns}"
+	NH_VRF_FLAG="nh-vrf"
+}
+
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
 	exit $ksft_skip
@@ -554,6 +577,12 @@ fi
 
 cleanup &>/dev/null
 
+check_nh_vrf_support
+if [ -z ${NH_VRF_FLAG} ]; then
+	echo "nh-vrf support: no"
+else
+	echo "nh-vrf support: yes"
+fi
 setup
 
 router_tests
diff --git a/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
index a649dba3cb77..e1ae80bcb86d 100755
--- a/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
@@ -170,6 +170,7 @@ readonly IPv6_RT_NETWORK=fd00
 readonly IPv4_HS_NETWORK=10.0.0
 readonly VPN_LOCATOR_SERVICE=fc00
 PING_TIMEOUT_SEC=4
+NH_VRF_FLAG=
 
 ret=0
 
@@ -278,6 +279,7 @@ setup_vpn_config()
 	local hsdst=$3
 	local rtdst=$4
 	local tid=$5
+	local vrf=
 
 	eval local rtsrc_name=\${rt_${rtsrc}}
 	eval local rtdst_name=\${rt_${rtdst}}
@@ -286,8 +288,9 @@ setup_vpn_config()
 	# set the encap route for encapsulating packets which arrive from the
 	# host hssrc and destined to the access router rtsrc.
 	ip -netns ${rtsrc_name} -4 route add ${IPv4_HS_NETWORK}.${hsdst}/32 vrf vrf-${tid} \
-		encap seg6 mode encap segs ${vpn_sid} dev veth0
-	ip -netns ${rtsrc_name} -6 route add ${vpn_sid}/128 vrf vrf-${tid} \
+		encap seg6 mode encap segs ${vpn_sid} ${NH_VRF_FLAG} dev veth0
+	[ -z ${NH_VRF_FLAG} ] && vrf="vrf vrf-${tid}"
+	ip -netns ${rtsrc_name} -6 route add ${vpn_sid}/128 ${vrf} \
 		via fd00::${rtdst} dev veth0
 
 	# set the decap route for decapsulating packets which arrive from
@@ -459,6 +462,26 @@ host_vpn_isolation_tests()
 	done
 }
 
+# check if the nh-vrf flag is supported
+check_nh_vrf_support()
+{
+	setup_ns nh_vrf_ns
+
+	if ! ip -netns "${nh_vrf_ns}" nexthop add id 1235 encap seg6 \
+			mode encap nh-vrf segs 2001:db8:1:1:1::2 dev lo &>/dev/null; then
+		cleanup_ns "${nh_vrf_ns}"
+		return
+	fi
+
+	if ! ip -netns "${nh_vrf_ns}" nexthop get id 1235 &>/dev/null; then
+		cleanup_ns "${nh_vrf_ns}"
+		return
+	fi
+
+	cleanup_ns "${nh_vrf_ns}"
+	NH_VRF_FLAG="nh-vrf"
+}
+
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
 	exit $ksft_skip
@@ -477,6 +500,12 @@ fi
 
 cleanup &>/dev/null
 
+check_nh_vrf_support
+if [ -z ${NH_VRF_FLAG} ]; then
+	echo "nh-vrf support: no"
+else
+	echo "nh-vrf support: yes"
+fi
 setup
 
 router_tests
diff --git a/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
index e408406d8489..3a8ff96a78ef 100755
--- a/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
@@ -171,6 +171,7 @@ readonly IPv6_RT_NETWORK=fd00
 readonly IPv6_HS_NETWORK=cafe
 readonly VPN_LOCATOR_SERVICE=fc00
 PING_TIMEOUT_SEC=4
+NH_VRF_FLAG=
 
 ret=0
 
@@ -285,6 +286,7 @@ setup_vpn_config()
 	local hsdst=$3
 	local rtdst=$4
 	local tid=$5
+	local vrf=
 
 	eval local rtsrc_name=\${rt_${rtsrc}}
 	eval local rtdst_name=\${rt_${rtdst}}
@@ -296,8 +298,9 @@ setup_vpn_config()
 	# set the encap route for encapsulating packets which arrive from the
 	# host hssrc and destined to the access router rtsrc.
 	ip -netns ${rtsrc_name} -6 route add ${IPv6_HS_NETWORK}::${hsdst}/128 vrf vrf-${tid} \
-		encap seg6 mode encap segs ${vpn_sid} dev veth0
-	ip -netns ${rtsrc_name} -6 route add ${vpn_sid}/128 vrf vrf-${tid} \
+		encap seg6 mode encap segs ${vpn_sid} ${NH_VRF_FLAG} dev veth0
+	[ -z ${NH_VRF_FLAG} ] && vrf="vrf vrf-${tid}"
+	ip -netns ${rtsrc_name} -6 route add ${vpn_sid}/128 ${vrf} \
 		via fd00::${rtdst} dev veth0
 
 	# set the decap route for decapsulating packets which arrive from
@@ -469,6 +472,26 @@ host_vpn_isolation_tests()
 	done
 }
 
+# check if the nh-vrf flag is supported
+check_nh_vrf_support()
+{
+	setup_ns nh_vrf_ns
+
+	if ! ip -netns "${nh_vrf_ns}" nexthop add id 1235 encap seg6 \
+			mode encap nh-vrf segs 2001:db8:1:1:1::2 dev lo &>/dev/null; then
+		cleanup_ns "${nh_vrf_ns}"
+		return
+	fi
+
+	if ! ip -netns "${nh_vrf_ns}" nexthop get id 1235 &>/dev/null; then
+		cleanup_ns "${nh_vrf_ns}"
+		return
+	fi
+
+	cleanup_ns "${nh_vrf_ns}"
+	NH_VRF_FLAG="nh-vrf"
+}
+
 if [ "$(id -u)" -ne 0 ];then
 	echo "SKIP: Need root privileges"
 	exit $ksft_skip
@@ -487,6 +510,12 @@ fi
 
 cleanup &>/dev/null
 
+check_nh_vrf_support
+if [ -z ${NH_VRF_FLAG} ]; then
+	echo "nh-vrf support: no"
+else
+	echo "nh-vrf support: yes"
+fi
 setup
 
 router_tests
-- 
2.52.0


* Re: [PATCH net-next v2] seg6: enable route leak for encap routes
From: Andrea Mayer @ 2026-03-29 18:58 UTC
  To: Nicolas Dichtel
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	David Lebrun, David Ahern, netdev, stefano.salsano,
	Paolo Lungaroni, ahabdels, Andrea Mayer

On Fri, 27 Mar 2026 15:06:24 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> The goal is to support cross-VRF (x-vrf) routes. To avoid breaking existing
> setups, a new flag is introduced: nh-vrf.
> 
> The dev parameter is mandatory when a seg6 encap route is configured, but
> until this commit it was ignored. After the SRv6 encapsulation, a second
> route lookup is performed in the same VRF.
> 
> The new nh-vrf flag requests that this second route lookup be performed in
> the VRF associated with the dev parameter.
> 

Hi Nicolas,

thanks for looking into this. The use case is valid and limiting the
effect of nh-vrf to routes that explicitly carry the attribute avoids
regressions, which is good.

I have a few thoughts on the nh-vrf semantics though. The attribute
says "use the VRF of dev", but in your use case dev is not in any
VRF, so the lookup ends up in the main table as a side effect of
the loopback fallback, which is not obvious from the attribute
name. Today dev is used for source address selection in
set_tun_src() and has no routing role; nh-vrf would give it one
that does not match what actually happens.
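
To make the fallback concrete, here is a minimal user-space sketch of the
expression used in the patch (l3mdev_master_dev_rcu(dev) ?: loopback_dev).
It is only a model: struct model_dev and model_nh_vrf_lookup_dev() are
hypothetical names, not kernel code, and the sketch assumes a device is
either enslaved to exactly one VRF master or to none.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a net_device; not the kernel struct. */
struct model_dev {
	const char *name;
	const struct model_dev *l3_master;	/* VRF master, NULL if not enslaved */
};

/*
 * Same shape as the expression in the patch:
 *   l3mdev_master_dev_rcu(dev) ?: loopback_dev
 * Returns the device that the second lookup would be attributed to.
 */
const struct model_dev *
model_nh_vrf_lookup_dev(const struct model_dev *nh_dev,
			const struct model_dev *loopback)
{
	/* No VRF master: fall back to loopback, i.e. the main table. */
	return nh_dev->l3_master ? nh_dev->l3_master : loopback;
}
```

With a veth0 that has no master, as in the selftests,
assert(model_nh_vrf_lookup_dev(&veth0, &lo) == &lo) holds: the lookup
leaves vrf-100 only because of the fallback branch, not because veth0
belongs to some VRF.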

>
> [snip]
> 
> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")

This looks like a leftover from v1; since this is new UAPI on
net-next, the Fixes tag should not be here.

> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> v1 -> v2:
>  - target net-next instead of net
>  - add a new attribute to avoid breaking the legacy behavior
>  - use dst_dev_rcu()
> 
>  include/uapi/linux/seg6_iptunnel.h            |  1 +
>  net/ipv6/seg6_iptunnel.c                      | 23 ++++++++++--
>  .../selftests/net/srv6_end_dt46_l3vpn_test.sh | 35 +++++++++++++++++--
>  .../selftests/net/srv6_end_dt4_l3vpn_test.sh  | 33 +++++++++++++++--
>  .../selftests/net/srv6_end_dt6_l3vpn_test.sh  | 33 +++++++++++++++--
>  5 files changed, 116 insertions(+), 9 deletions(-)
> 
> diff --git a/include/uapi/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h
> index 485889b19900..d7d6aa2f72c5 100644
> --- a/include/uapi/linux/seg6_iptunnel.h
> +++ b/include/uapi/linux/seg6_iptunnel.h
> @@ -21,6 +21,7 @@ enum {
>  	SEG6_IPTUNNEL_UNSPEC,
>  	SEG6_IPTUNNEL_SRH,
>  	SEG6_IPTUNNEL_SRC,	/* struct in6_addr */
> +	SEG6_IPTUNNEL_NH_VRF,
>  	__SEG6_IPTUNNEL_MAX,
>  };
>  #define SEG6_IPTUNNEL_MAX (__SEG6_IPTUNNEL_MAX - 1)
> diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
> index e76cc0cc481e..f8c6f0d719be 100644
> --- a/net/ipv6/seg6_iptunnel.c
> +++ b/net/ipv6/seg6_iptunnel.c
> @@ -50,6 +50,7 @@ static size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo)
>  struct seg6_lwt {
>  	struct dst_cache cache;
>  	struct in6_addr tunsrc;
> +	bool nh_vrf;
>  	struct seg6_iptunnel_encap tuninfo[];
>  };
>  
> @@ -67,6 +68,7 @@ seg6_encap_lwtunnel(struct lwtunnel_state *lwt)
>  static const struct nla_policy seg6_iptunnel_policy[SEG6_IPTUNNEL_MAX + 1] = {
>  	[SEG6_IPTUNNEL_SRH]	= { .type = NLA_BINARY },
>  	[SEG6_IPTUNNEL_SRC]	= NLA_POLICY_EXACT_LEN(sizeof(struct in6_addr)),
> +	[SEG6_IPTUNNEL_NH_VRF]	= { .type = NLA_FLAG },
>  };
>  
>  static int nla_put_srh(struct sk_buff *skb, int attrtype,
> @@ -499,9 +501,15 @@ static int seg6_input_core(struct net *net, struct sock *sk,
>  	 * now and use it later as a comparison.
>  	 */
>  	lwtst = orig_dst->lwtstate;
> -
>  	slwt = seg6_lwt_lwtunnel(lwtst);
>  
> +	if (slwt->nh_vrf) {
> +		rcu_read_lock();
> +		skb->dev = l3mdev_master_dev_rcu(dst_dev_rcu(orig_dst)) ?:
> +			dev_net(skb->dev)->loopback_dev;
> +		rcu_read_unlock();
> +	}
> +
> [snip]
>

Overwriting skb->dev alters flowi6_iif, which affects ip rule matching
on the ingress interface and changes what netfilter FORWARD sees as "in"
device. Also, seg6_output_core() never checks nh_vrf and its fl6 is
built without involving skb->dev, so this only works for forwarded
traffic. These aspects would need to be addressed in any case.

Looking at this from a different angle, specifying the FIB table ID
directly could be a more natural fit here. Something like
SEG6_IPTUNNEL_TABLE (u32) with fib6_get_table() + ip6_pol_route(),
similar to seg6_lookup_any_nexthop() in seg6_local.c.
It would work for both input and output with no need to touch skb->dev.
For example:

  ip -6 route add cafe::1/128 vrf vrf-100 \
      encap seg6 mode encap segs fc00::1 lookup 254 dev veth0

Beyond the semantics, a table ID is also more general: it covers
the main table, tables associated with VRFs, and custom tables with
the same mechanism, and keeps dev consistent with its current role
across behaviors. I think we should explore this direction before
moving forward. I am happy to help if you want.
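
As a rough illustration of the semantics I have in mind, here is a
user-space sketch, not kernel code. The struct, the has_table field and
model_second_lookup_table() are hypothetical names; the only real value
is 254, which is RT_TABLE_MAIN. The point is that the table comes from
the route itself and does not depend on skb->dev or on the path taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_RT_TABLE_MAIN 254u	/* same value as RT_TABLE_MAIN */

/* Hypothetical per-route state; models an optional u32 table attribute. */
struct model_seg6_encap {
	bool has_table;		/* true when a table ID was supplied */
	uint32_t table;		/* only meaningful when has_table is true */
};

/*
 * Pick the FIB table for the post-encap lookup.
 * default_table stands for whatever table is used today when no
 * explicit table is configured (assumption: the caller knows it).
 */
uint32_t model_second_lookup_table(const struct model_seg6_encap *encap,
				   uint32_t default_table)
{
	return encap->has_table ? encap->table : default_table;
}
```

A route carrying "lookup 254" would resolve to MODEL_RT_TABLE_MAIN
whatever the default is, and a route without the attribute keeps the
existing behavior.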

Thanks,
Andrea

* Re: [PATCH net-next v2] seg6: enable route leak for encap routes
From: Nicolas Dichtel @ 2026-03-31 15:57 UTC
  To: Andrea Mayer
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	David Lebrun, David Ahern, netdev, stefano.salsano,
	Paolo Lungaroni, ahabdels

On 29/03/2026 at 20:58, Andrea Mayer wrote:
> On Fri, 27 Mar 2026 15:06:24 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> 
>> The goal is to support cross-VRF (x-vrf) routes. To avoid breaking existing
>> setups, a new flag is introduced: nh-vrf.
>>
>> The dev parameter is mandatory when a seg6 encap route is configured, but
>> until this commit it was ignored. After the SRv6 encapsulation, a second
>> route lookup is performed in the same VRF.
>>
>> The new nh-vrf flag requests that this second route lookup be performed in
>> the VRF associated with the dev parameter.
>>
> 
> Hi Nicolas,
>  
> thanks for looking into this. The use case is valid and limiting the
> effect of nh-vrf to routes that explicitly carry the attribute avoids
> regressions, which is good.
> 
> 
> I have a few thoughts on the nh-vrf semantics though. The attribute
> says "use the VRF of dev", but in your use case dev is not in any
> VRF, so the lookup ends up in the main table as a side effect of
> the loopback fallback, which is not obvious from the attribute
> name. Today dev is used for source address selection in
The 'no-vrf' / 'default-vrf' (I don't know what to call it) is a kind of VRF. I
can try to find another name if that would be clearer.
What about 'dev-context'?

> set_tun_src() and has no routing role; nh-vrf would give it one
> that does not match what actually happens.
> 
> 
>>
>> [snip]
>>
>> Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
> 
> This looks like a leftover from v1; since this is new UAPI on
> net-next, the Fixes tag should not be here.
Yep, I will remove it.

> 
> 
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> ---
>>
>> v1 -> v2:
>>  - target net-next instead of net
>>  - add a new attribute to avoid breaking the legacy behavior
>>  - use dst_dev_rcu()
>>
>>  include/uapi/linux/seg6_iptunnel.h            |  1 +
>>  net/ipv6/seg6_iptunnel.c                      | 23 ++++++++++--
>>  .../selftests/net/srv6_end_dt46_l3vpn_test.sh | 35 +++++++++++++++++--
>>  .../selftests/net/srv6_end_dt4_l3vpn_test.sh  | 33 +++++++++++++++--
>>  .../selftests/net/srv6_end_dt6_l3vpn_test.sh  | 33 +++++++++++++++--
>>  5 files changed, 116 insertions(+), 9 deletions(-)
>>
>> diff --git a/include/uapi/linux/seg6_iptunnel.h b/include/uapi/linux/seg6_iptunnel.h
>> index 485889b19900..d7d6aa2f72c5 100644
>> --- a/include/uapi/linux/seg6_iptunnel.h
>> +++ b/include/uapi/linux/seg6_iptunnel.h
>> @@ -21,6 +21,7 @@ enum {
>>  	SEG6_IPTUNNEL_UNSPEC,
>>  	SEG6_IPTUNNEL_SRH,
>>  	SEG6_IPTUNNEL_SRC,	/* struct in6_addr */
>> +	SEG6_IPTUNNEL_NH_VRF,
>>  	__SEG6_IPTUNNEL_MAX,
>>  };
>>  #define SEG6_IPTUNNEL_MAX (__SEG6_IPTUNNEL_MAX - 1)
>> diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
>> index e76cc0cc481e..f8c6f0d719be 100644
>> --- a/net/ipv6/seg6_iptunnel.c
>> +++ b/net/ipv6/seg6_iptunnel.c
>> @@ -50,6 +50,7 @@ static size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo)
>>  struct seg6_lwt {
>>  	struct dst_cache cache;
>>  	struct in6_addr tunsrc;
>> +	bool nh_vrf;
>>  	struct seg6_iptunnel_encap tuninfo[];
>>  };
>>  
>> @@ -67,6 +68,7 @@ seg6_encap_lwtunnel(struct lwtunnel_state *lwt)
>>  static const struct nla_policy seg6_iptunnel_policy[SEG6_IPTUNNEL_MAX + 1] = {
>>  	[SEG6_IPTUNNEL_SRH]	= { .type = NLA_BINARY },
>>  	[SEG6_IPTUNNEL_SRC]	= NLA_POLICY_EXACT_LEN(sizeof(struct in6_addr)),
>> +	[SEG6_IPTUNNEL_NH_VRF]	= { .type = NLA_FLAG },
>>  };
>>  
>>  static int nla_put_srh(struct sk_buff *skb, int attrtype,
>> @@ -499,9 +501,15 @@ static int seg6_input_core(struct net *net, struct sock *sk,
>>  	 * now and use it later as a comparison.
>>  	 */
>>  	lwtst = orig_dst->lwtstate;
>> -
>>  	slwt = seg6_lwt_lwtunnel(lwtst);
>>  
>> +	if (slwt->nh_vrf) {
>> +		rcu_read_lock();
>> +		skb->dev = l3mdev_master_dev_rcu(dst_dev_rcu(orig_dst)) ?:
>> +			dev_net(skb->dev)->loopback_dev;
>> +		rcu_read_unlock();
>> +	}
>> +
>> [snip]
>>
> 
> Overwriting skb->dev alters flowi6_iif, which affects ip rule matching
Yes, this is the goal of the patch.

> on the ingress interface and changes what netfilter FORWARD sees as "in"
> device. Also, seg6_output_core() never checks nh_vrf and its fl6 is
Sure, but it enables filtering on the VRF interface. It won't break anything
because the flag does not exist yet.

> built without involving skb->dev, so this only works for forwarded
> traffic. These aspects would need to be addressed in any case.
Yes, I saw this. There is already a difference between the forwarding path and
the local output path. On the forwarding path, the input vrf is used for the
second route lookup. On the local output path, the 'default-vrf' is always used.
I didn't want to mix problems, so I targeted the forwarding path to start.
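
To summarize the behavior described above, here is a small user-space
sketch, not kernel code. The enums and model_lookup_domain() are
hypothetical names; it only encodes what v2 does: nh_vrf is honored on
the forwarding path and has no effect on the local output path.

```c
#include <assert.h>
#include <stdbool.h>

enum model_path {
	MODEL_PATH_FORWARD,	/* seg6_input_core() */
	MODEL_PATH_LOCAL_OUT,	/* seg6_output_core() */
};

enum model_domain {
	MODEL_DOMAIN_INGRESS_VRF,	/* VRF the packet arrived in */
	MODEL_DOMAIN_DEFAULT,		/* 'default-vrf' */
	MODEL_DOMAIN_NH_DEV,		/* VRF of the nexthop dev (or default) */
};

/* Which L3 domain the second route lookup uses with this v2 patch. */
enum model_domain model_lookup_domain(enum model_path path, bool nh_vrf)
{
	/* The local output path does not consult nh_vrf in v2. */
	if (path == MODEL_PATH_LOCAL_OUT)
		return MODEL_DOMAIN_DEFAULT;

	return nh_vrf ? MODEL_DOMAIN_NH_DEV : MODEL_DOMAIN_INGRESS_VRF;
}
```

Without the flag, both paths behave exactly as they do today.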

> 
> 
> Looking at this from a different angle, specifying the FIB table ID
> directly could be a more natural fit here. Something like
> SEG6_IPTUNNEL_TABLE (u32) with fib6_get_table() + ip6_pol_route(),
> similar to seg6_lookup_any_nexthop() in seg6_local.c.
> It would work for both input and output with no need to touch skb->dev.
> For example:
>  
>   ip -6 route add cafe::1/128 vrf vrf-100 \
>       encap seg6 mode encap segs fc00::1 lookup 254 dev veth0
> 
>  
> Beyond the semantics, a table ID is also more general: it covers
> the main table, tables associated with VRFs, and custom tables with
> the same mechanism, and keeps dev consistent with its current role
> across behaviors. I think we should explore this direction before
> moving forward. I am happy to help if you want.
My goal is to 'fix' the current behavior, not to add a new feature.
Today, the dev arg is mandatory but not used, which is misleading. The selftests
show the inconsistency: the device of the encap route is in the 'default-vrf',
but another route in the same VRF is needed, with the same nexthop (dev).

Regards,
Nicolas
