linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs
@ 2025-09-01  8:30 Ido Schimmel
  2025-09-01  8:30 ` [PATCH net-next 1/8] ipv4: cipso: Simplify IP options handling in cipso_v4_error() Ido Schimmel
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

Align IPv4 with IPv6 and in the presence of VRFs generate ICMP error
messages with a source IP that is derived from the receiving interface
and not from its VRF master. This is especially important when the error
messages are "Time Exceeded" messages as it means that utilities like
traceroute will show an incorrect packet path.

Patches #1-#2 are preparations.

Patch #3 is the actual change.

Patches #4-#7 make small improvements in the existing traceroute test.

Patch #8 extends the traceroute test with VRF test cases for both IPv4
and IPv6.

Ido Schimmel (8):
  ipv4: cipso: Simplify IP options handling in cipso_v4_error()
  ipv4: icmp: Pass IPv4 control block structure as an argument to
    __icmp_send()
  ipv4: icmp: Fix source IP derivation in presence of VRFs
  selftests: traceroute: Return correct value on failure
  selftests: traceroute: Use require_command()
  selftests: traceroute: Reword comment
  selftests: traceroute: Test traceroute with different source IPs
  selftests: traceroute: Add VRF tests

 include/net/icmp.h                        |  10 +-
 net/ipv4/cipso_ipv4.c                     |  13 +-
 net/ipv4/icmp.c                           |  15 +-
 net/ipv4/route.c                          |  10 +-
 tools/testing/selftests/net/traceroute.sh | 250 ++++++++++++++++++----
 5 files changed, 229 insertions(+), 69 deletions(-)

-- 
2.51.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next 1/8] ipv4: cipso: Simplify IP options handling in cipso_v4_error()
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
@ 2025-09-01  8:30 ` Ido Schimmel
  2025-09-02  2:36   ` David Ahern
  2025-09-01  8:30 ` [PATCH net-next 2/8] ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send() Ido Schimmel
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

When __ip_options_compile() is called with an skb, the IP options are
parsed from the skb data into the provided IP option argument. This is
in contrast to the case where the skb argument is NULL and the options
are parsed from opt->__data.

Given that cipso_v4_error() always passes an skb to
__ip_options_compile(), there is no need to allocate an extra 40 bytes
(maximum IP options size).

Therefore, simplify the function by removing these extra bytes and make
the function similar to ipv4_send_dest_unreach() which also calls both
__ip_options_compile() and __icmp_send().

This is a preparation for changing the arguments being passed to
__icmp_send().

No functional changes intended.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 net/ipv4/cipso_ipv4.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 740af8541d2f..c7c949c37e2d 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1715,8 +1715,7 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
  */
 void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
 {
-	unsigned char optbuf[sizeof(struct ip_options) + 40];
-	struct ip_options *opt = (struct ip_options *)optbuf;
+	struct ip_options opt;
 	int res;
 
 	if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
@@ -1727,19 +1726,19 @@ void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
 	 * so we can not use icmp_send and IPCB here.
 	 */
 
-	memset(opt, 0, sizeof(struct ip_options));
-	opt->optlen = ip_hdr(skb)->ihl*4 - sizeof(struct iphdr);
+	memset(&opt, 0, sizeof(opt));
+	opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr);
 	rcu_read_lock();
-	res = __ip_options_compile(dev_net(skb->dev), opt, skb, NULL);
+	res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL);
 	rcu_read_unlock();
 
 	if (res)
 		return;
 
 	if (gateway)
-		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, opt);
+		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, &opt);
 	else
-		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, opt);
+		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, &opt);
 }
 
 /**
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 2/8] ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send()
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
  2025-09-01  8:30 ` [PATCH net-next 1/8] ipv4: cipso: Simplify IP options handling in cipso_v4_error() Ido Schimmel
@ 2025-09-01  8:30 ` Ido Schimmel
  2025-09-02  2:36   ` David Ahern
  2025-09-01  8:30 ` [PATCH net-next 3/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

__icmp_send() is used to generate ICMP error messages in response to
various situations such as MTU errors (i.e., "Fragmentation Required")
and too many hops (i.e., "Time Exceeded").

The skb that generated the error does not necessarily come from the IPv4
layer and does not always have a valid IPv4 control block in skb->cb.

Therefore, commit 9ef6b42ad6fd ("net: Add __icmp_send helper.") changed
the function to take the IP options structure as argument instead of
deriving it from the skb's control block. Some callers of this function
such as icmp_send() pass the IP options structure from the skb's control
block as in these call paths the control block is known to be valid, but
other callers simply pass a zeroed structure.

A subsequent patch will need __icmp_send() to access more information
from the IPv4 control block (specifically, the ifindex of the input
interface). As a preparation for this change, change the function to
take the IPv4 control block structure as an argument instead of the IP
options structure. This makes the function similar to its IPv6
counterpart that already takes the IPv6 control block structure as an
argument.

No functional changes intended.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 include/net/icmp.h    | 10 ++++++----
 net/ipv4/cipso_ipv4.c | 12 ++++++------
 net/ipv4/icmp.c       | 12 +++++++-----
 net/ipv4/route.c      | 10 +++++-----
 4 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/include/net/icmp.h b/include/net/icmp.h
index caddf4a59ad1..935ee13d9ae9 100644
--- a/include/net/icmp.h
+++ b/include/net/icmp.h
@@ -37,10 +37,10 @@ struct sk_buff;
 struct net;
 
 void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
-		 const struct ip_options *opt);
+		 const struct inet_skb_parm *parm);
 static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 {
-	__icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
+	__icmp_send(skb_in, type, code, info, IPCB(skb_in));
 }
 
 #if IS_ENABLED(CONFIG_NF_NAT)
@@ -48,8 +48,10 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info);
 #else
 static inline void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 {
-	struct ip_options opts = { 0 };
-	__icmp_send(skb_in, type, code, info, &opts);
+	struct inet_skb_parm parm;
+
+	memset(&parm, 0, sizeof(parm));
+	__icmp_send(skb_in, type, code, info, &parm);
 }
 #endif
 
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index c7c949c37e2d..709021197e1c 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1715,7 +1715,7 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
  */
 void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
 {
-	struct ip_options opt;
+	struct inet_skb_parm parm;
 	int res;
 
 	if (ip_hdr(skb)->protocol == IPPROTO_ICMP || error != -EACCES)
@@ -1726,19 +1726,19 @@ void cipso_v4_error(struct sk_buff *skb, int error, u32 gateway)
 	 * so we can not use icmp_send and IPCB here.
 	 */
 
-	memset(&opt, 0, sizeof(opt));
-	opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr);
+	memset(&parm, 0, sizeof(parm));
+	parm.opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr);
 	rcu_read_lock();
-	res = __ip_options_compile(dev_net(skb->dev), &opt, skb, NULL);
+	res = __ip_options_compile(dev_net(skb->dev), &parm.opt, skb, NULL);
 	rcu_read_unlock();
 
 	if (res)
 		return;
 
 	if (gateway)
-		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, &opt);
+		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_NET_ANO, 0, &parm);
 	else
-		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, &opt);
+		__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_ANO, 0, &parm);
 }
 
 /**
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 823c70e34de8..92fb7aef4abf 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -594,7 +594,7 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
  */
 
 void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
-		 const struct ip_options *opt)
+		 const struct inet_skb_parm *parm)
 {
 	struct iphdr *iph;
 	int room;
@@ -725,7 +725,8 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
 					   iph->tos;
 	mark = IP4_REPLY_MARK(net, skb_in->mark);
 
-	if (__ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in, opt))
+	if (__ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in,
+			      &parm->opt))
 		goto out_unlock;
 
 
@@ -799,14 +800,15 @@ EXPORT_SYMBOL(__icmp_send);
 void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 {
 	struct sk_buff *cloned_skb = NULL;
-	struct ip_options opts = { 0 };
 	enum ip_conntrack_info ctinfo;
+	struct inet_skb_parm parm;
 	struct nf_conn *ct;
 	__be32 orig_ip;
 
+	memset(&parm, 0, sizeof(parm));
 	ct = nf_ct_get(skb_in, &ctinfo);
 	if (!ct || !(ct->status & IPS_SRC_NAT)) {
-		__icmp_send(skb_in, type, code, info, &opts);
+		__icmp_send(skb_in, type, code, info, &parm);
 		return;
 	}
 
@@ -821,7 +823,7 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 
 	orig_ip = ip_hdr(skb_in)->saddr;
 	ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
-	__icmp_send(skb_in, type, code, info, &opts);
+	__icmp_send(skb_in, type, code, info, &parm);
 	ip_hdr(skb_in)->saddr = orig_ip;
 out:
 	consume_skb(cloned_skb);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 50309f2ab132..6d27d3610c1c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1222,8 +1222,8 @@ EXPORT_INDIRECT_CALLABLE(ipv4_dst_check);
 
 static void ipv4_send_dest_unreach(struct sk_buff *skb)
 {
+	struct inet_skb_parm parm;
 	struct net_device *dev;
-	struct ip_options opt;
 	int res;
 
 	/* Recompile ip options since IPCB may not be valid anymore.
@@ -1233,21 +1233,21 @@ static void ipv4_send_dest_unreach(struct sk_buff *skb)
 	    ip_hdr(skb)->version != 4 || ip_hdr(skb)->ihl < 5)
 		return;
 
-	memset(&opt, 0, sizeof(opt));
+	memset(&parm, 0, sizeof(parm));
 	if (ip_hdr(skb)->ihl > 5) {
 		if (!pskb_network_may_pull(skb, ip_hdr(skb)->ihl * 4))
 			return;
-		opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr);
+		parm.opt.optlen = ip_hdr(skb)->ihl * 4 - sizeof(struct iphdr);
 
 		rcu_read_lock();
 		dev = skb->dev ? skb->dev : skb_rtable(skb)->dst.dev;
-		res = __ip_options_compile(dev_net(dev), &opt, skb, NULL);
+		res = __ip_options_compile(dev_net(dev), &parm.opt, skb, NULL);
 		rcu_read_unlock();
 
 		if (res)
 			return;
 	}
-	__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &opt);
+	__icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0, &parm);
 }
 
 static void ipv4_link_failure(struct sk_buff *skb)
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 3/8] ipv4: icmp: Fix source IP derivation in presence of VRFs
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
  2025-09-01  8:30 ` [PATCH net-next 1/8] ipv4: cipso: Simplify IP options handling in cipso_v4_error() Ido Schimmel
  2025-09-01  8:30 ` [PATCH net-next 2/8] ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send() Ido Schimmel
@ 2025-09-01  8:30 ` Ido Schimmel
  2025-09-02  2:37   ` David Ahern
  2025-09-01  8:30 ` [PATCH net-next 4/8] selftests: traceroute: Return correct value on failure Ido Schimmel
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

When the "icmp_errors_use_inbound_ifaddr" sysctl is enabled, the source
IP of ICMP error messages should be the "primary address of the
interface that received the packet that caused the icmp error".

The IPv4 ICMP code determines this interface using inet_iif() which in
the input path translates to skb->skb_iif. If the interface that
received the packet is a VRF port, skb->skb_iif will contain the ifindex
of the VRF device and not that of the receiving interface. This is
because in the input path the VRF driver overrides skb->skb_iif with the
ifindex of the VRF device itself (see vrf_ip_rcv()).

As such, the source IP that will be chosen for the ICMP error message is
either an address assigned to the VRF device itself (if present) or an
address assigned to some VRF port, not necessarily the input or output
interface.

This behavior is especially problematic when the error messages are
"Time Exceeded" messages as it means that utilities like traceroute will
show an incorrect packet path.

Solve this by determining the input interface based on the iif field in
the control block, if present. This field is set in the input path to
skb->skb_iif and is not later overridden by the VRF driver, unlike
skb->skb_iif.

This behavior is consistent with the IPv6 counterpart that already uses
the iif from the control block.

Reported-by: Andy Roulin <aroulin@nvidia.com>
Reported-by: Rajkumar Srinivasan <rajsrinivasa@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 net/ipv4/icmp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 92fb7aef4abf..7547db362ec7 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -710,7 +710,8 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
 		rcu_read_lock();
 		if (rt_is_input_route(rt) &&
 		    READ_ONCE(net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr))
-			dev = dev_get_by_index_rcu(net, inet_iif(skb_in));
+			dev = dev_get_by_index_rcu(net, parm->iif ? parm->iif :
+						   inet_iif(skb_in));
 
 		if (dev)
 			saddr = inet_select_addr(dev, iph->saddr,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 4/8] selftests: traceroute: Return correct value on failure
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
                   ` (2 preceding siblings ...)
  2025-09-01  8:30 ` [PATCH net-next 3/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
@ 2025-09-01  8:30 ` Ido Schimmel
  2025-09-02  2:37   ` David Ahern
  2025-09-01  8:30 ` [PATCH net-next 5/8] selftests: traceroute: Use require_command() Ido Schimmel
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

The test always returns success even if some tests were modified to
fail. Fix by converting the test to use the appropriate library
functions instead of using its own functions.

Before:

 # ./traceroute.sh
 TEST: IPV6 traceroute                                               [FAIL]
 TEST: IPV4 traceroute                                               [ OK ]

 Tests passed:   1
 Tests failed:   1
 $ echo $?
 0

After:

 # ./traceroute.sh
 TEST: IPv6 traceroute                                               [FAIL]
         traceroute6 did not return 2000:102::2
 TEST: IPv4 traceroute                                               [ OK ]
 $ echo $?
 1

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 tools/testing/selftests/net/traceroute.sh | 38 ++++++-----------------
 1 file changed, 9 insertions(+), 29 deletions(-)

diff --git a/tools/testing/selftests/net/traceroute.sh b/tools/testing/selftests/net/traceroute.sh
index 282f14760940..46cb37e124ce 100755
--- a/tools/testing/selftests/net/traceroute.sh
+++ b/tools/testing/selftests/net/traceroute.sh
@@ -10,28 +10,6 @@ PAUSE_ON_FAIL=no
 
 ################################################################################
 #
-log_test()
-{
-	local rc=$1
-	local expected=$2
-	local msg="$3"
-
-	if [ ${rc} -eq ${expected} ]; then
-		printf "TEST: %-60s  [ OK ]\n" "${msg}"
-		nsuccess=$((nsuccess+1))
-	else
-		ret=1
-		nfail=$((nfail+1))
-		printf "TEST: %-60s  [FAIL]\n" "${msg}"
-		if [ "${PAUSE_ON_FAIL}" = "yes" ]; then
-			echo
-			echo "hit enter to continue, 'q' to quit"
-			read a
-			[ "$a" = "q" ] && exit 1
-		fi
-	fi
-}
-
 run_cmd()
 {
 	local ns
@@ -210,9 +188,12 @@ run_traceroute6()
 
 	setup_traceroute6
 
+	RET=0
+
 	# traceroute6 host-2 from host-1 (expects 2000:102::2)
 	run_cmd $h1 "traceroute6 2000:103::4 | grep -q 2000:102::2"
-	log_test $? 0 "IPV6 traceroute"
+	check_err $? "traceroute6 did not return 2000:102::2"
+	log_test "IPv6 traceroute"
 
 	cleanup_traceroute6
 }
@@ -275,9 +256,12 @@ run_traceroute()
 
 	setup_traceroute
 
+	RET=0
+
 	# traceroute host-2 from host-1 (expects 1.0.1.1). Takes a while.
 	run_cmd $h1 "traceroute 1.0.2.4 | grep -q 1.0.1.1"
-	log_test $? 0 "IPV4 traceroute"
+	check_err $? "traceroute did not return 1.0.1.1"
+	log_test "IPv4 traceroute"
 
 	cleanup_traceroute
 }
@@ -294,9 +278,6 @@ run_tests()
 ################################################################################
 # main
 
-declare -i nfail=0
-declare -i nsuccess=0
-
 while getopts :pv o
 do
 	case $o in
@@ -308,5 +289,4 @@ done
 
 run_tests
 
-printf "\nTests passed: %3d\n" ${nsuccess}
-printf "Tests failed: %3d\n"   ${nfail}
+exit "${EXIT_STATUS}"
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 5/8] selftests: traceroute: Use require_command()
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
                   ` (3 preceding siblings ...)
  2025-09-01  8:30 ` [PATCH net-next 4/8] selftests: traceroute: Return correct value on failure Ido Schimmel
@ 2025-09-01  8:30 ` Ido Schimmel
  2025-09-02  2:38   ` David Ahern
  2025-09-01  8:30 ` [PATCH net-next 6/8] selftests: traceroute: Reword comment Ido Schimmel
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

Use require_command() so that the test will return SKIP (4) when a
required command is not present.

Before:

 # ./traceroute.sh
 SKIP: Could not run IPV6 test without traceroute6
 SKIP: Could not run IPV4 test without traceroute
 $ echo $?
 0

After:

 # ./traceroute.sh
 TEST: traceroute6 not installed                                    [SKIP]
 $ echo $?
 4

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 tools/testing/selftests/net/traceroute.sh | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/net/traceroute.sh b/tools/testing/selftests/net/traceroute.sh
index 46cb37e124ce..1ac91eebd16f 100755
--- a/tools/testing/selftests/net/traceroute.sh
+++ b/tools/testing/selftests/net/traceroute.sh
@@ -181,11 +181,6 @@ setup_traceroute6()
 
 run_traceroute6()
 {
-	if [ ! -x "$(command -v traceroute6)" ]; then
-		echo "SKIP: Could not run IPV6 test without traceroute6"
-		return
-	fi
-
 	setup_traceroute6
 
 	RET=0
@@ -249,11 +244,6 @@ setup_traceroute()
 
 run_traceroute()
 {
-	if [ ! -x "$(command -v traceroute)" ]; then
-		echo "SKIP: Could not run IPV4 test without traceroute"
-		return
-	fi
-
 	setup_traceroute
 
 	RET=0
@@ -287,6 +277,9 @@ do
 	esac
 done
 
+require_command traceroute6
+require_command traceroute
+
 run_tests
 
 exit "${EXIT_STATUS}"
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 6/8] selftests: traceroute: Reword comment
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
                   ` (4 preceding siblings ...)
  2025-09-01  8:30 ` [PATCH net-next 5/8] selftests: traceroute: Use require_command() Ido Schimmel
@ 2025-09-01  8:30 ` Ido Schimmel
  2025-09-01  8:30 ` [PATCH net-next 7/8] selftests: traceroute: Test traceroute with different source IPs Ido Schimmel
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

Both of the addresses are configured as primary addresses, but the
kernel is expected to choose 10.0.1.1/24 as the source IP of the ICMP
error message since it is on the same subnet as the destination IP of
the message (10.0.1.3/24). Reword the comment to reflect that.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 tools/testing/selftests/net/traceroute.sh | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/traceroute.sh b/tools/testing/selftests/net/traceroute.sh
index 1ac91eebd16f..8dc4e5d03e43 100755
--- a/tools/testing/selftests/net/traceroute.sh
+++ b/tools/testing/selftests/net/traceroute.sh
@@ -203,10 +203,10 @@ run_traceroute6()
 # |H1|--------------------------|R1|--------------------------|H2|
 # ----            N1            ----            N2            ----
 #
-# where net.ipv4.icmp_errors_use_inbound_ifaddr is set on R1 and
-# 1.0.3.1/24 and 1.0.1.1/24 are respectively R1's primary and secondary
-# address on N1.
-#
+# where net.ipv4.icmp_errors_use_inbound_ifaddr is set on R1 and 1.0.3.1/24 and
+# 1.0.1.1/24 are R1's primary addresses on N1. The kernel is expected to prefer
+# a source address that is on the same subnet as the destination IP of the ICMP
+# error message.
 
 cleanup_traceroute()
 {
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 7/8] selftests: traceroute: Test traceroute with different source IPs
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
                   ` (5 preceding siblings ...)
  2025-09-01  8:30 ` [PATCH net-next 6/8] selftests: traceroute: Reword comment Ido Schimmel
@ 2025-09-01  8:30 ` Ido Schimmel
  2025-09-02  2:39   ` David Ahern
  2025-09-01  8:30 ` [PATCH net-next 8/8] selftests: traceroute: Add VRF tests Ido Schimmel
  2025-09-01  8:40 ` [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
  8 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

When generating ICMP error messages, the kernel will prefer a source IP
that is on the same subnet as the destination IP (see
inet_select_addr()). Test this behavior by invoking traceroute with
different source IPs and checking that the ICMP error message is
generated with a source IP in the same subnet.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 tools/testing/selftests/net/traceroute.sh | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/traceroute.sh b/tools/testing/selftests/net/traceroute.sh
index 8dc4e5d03e43..0ab9eccf1499 100755
--- a/tools/testing/selftests/net/traceroute.sh
+++ b/tools/testing/selftests/net/traceroute.sh
@@ -196,9 +196,10 @@ run_traceroute6()
 ################################################################################
 # traceroute test
 #
-# Verify that traceroute from H1 to H2 shows 1.0.1.1 in this scenario
+# Verify that traceroute from H1 to H2 shows 1.0.3.1 and 1.0.1.1 when
+# traceroute uses 1.0.3.3 and 1.0.1.3 as the source IP, respectively.
 #
-#                    1.0.3.1/24
+#      1.0.3.3/24    1.0.3.1/24
 # ---- 1.0.1.3/24    1.0.1.1/24 ---- 1.0.2.1/24    1.0.2.4/24 ----
 # |H1|--------------------------|R1|--------------------------|H2|
 # ----            N1            ----            N2            ----
@@ -226,6 +227,7 @@ setup_traceroute()
 
 	connect_ns $h1 eth0 1.0.1.3/24 - \
 	           $router eth1 1.0.3.1/24 -
+	ip -n "$h1" addr add 1.0.3.3/24 dev eth0
 	ip netns exec $h1 ip route add default via 1.0.1.1
 
 	ip netns exec $router ip addr add 1.0.1.1/24 dev eth1
@@ -248,9 +250,12 @@ run_traceroute()
 
 	RET=0
 
-	# traceroute host-2 from host-1 (expects 1.0.1.1). Takes a while.
-	run_cmd $h1 "traceroute 1.0.2.4 | grep -q 1.0.1.1"
+	# traceroute host-2 from host-1. Expect a source IP that is on the same
+	# subnet as destination IP of the ICMP error message.
+	run_cmd "$h1" "traceroute -s 1.0.1.3 1.0.2.4 | grep -q 1.0.1.1"
 	check_err $? "traceroute did not return 1.0.1.1"
+	run_cmd "$h1" "traceroute -s 1.0.3.3 1.0.2.4 | grep -q 1.0.3.1"
+	check_err $? "traceroute did not return 1.0.3.1"
 	log_test "IPv4 traceroute"
 
 	cleanup_traceroute
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next 8/8] selftests: traceroute: Add VRF tests
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
                   ` (6 preceding siblings ...)
  2025-09-01  8:30 ` [PATCH net-next 7/8] selftests: traceroute: Test traceroute with different source IPs Ido Schimmel
@ 2025-09-01  8:30 ` Ido Schimmel
  2025-09-02  2:40   ` David Ahern
  2025-09-01  8:40 ` [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
  8 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:30 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, dsahern, petrm,
	linux-security-module, Ido Schimmel

Create versions of the existing test cases where the routers generating
the ICMP error messages are using VRFs. Check that the source IPs of
these messages do not change in the presence of VRFs.

IPv6 always behaved correctly, but IPv4 fails when reverting "ipv4:
icmp: Fix source IP derivation in presence of VRFs".

Without IPv4 change:

 # ./traceroute.sh
 TEST: IPv6 traceroute                                               [ OK ]
 TEST: IPv6 traceroute with VRF                                      [ OK ]
 TEST: IPv4 traceroute                                               [ OK ]
 TEST: IPv4 traceroute with VRF                                      [FAIL]
         traceroute did not return 1.0.3.1
 $ echo $?
 1

The test fails because the ICMP error message is sent with the VRF
device's IP (1.0.4.1):

 # traceroute -n -s 1.0.1.3 1.0.2.4
 traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
  1  1.0.4.1  0.165 ms  0.110 ms  0.103 ms
  2  1.0.2.4  0.098 ms  0.085 ms  0.078 ms
 # traceroute -n -s 1.0.3.3 1.0.2.4
 traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
  1  1.0.4.1  0.201 ms  0.138 ms  0.129 ms
  2  1.0.2.4  0.123 ms  0.105 ms  0.098 ms

With IPv4 change:

 # ./traceroute.sh
 TEST: IPv6 traceroute                                               [ OK ]
 TEST: IPv6 traceroute with VRF                                      [ OK ]
 TEST: IPv4 traceroute                                               [ OK ]
 TEST: IPv4 traceroute with VRF                                      [ OK ]
 $ echo $?
 0

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 tools/testing/selftests/net/traceroute.sh | 178 ++++++++++++++++++++++
 1 file changed, 178 insertions(+)

diff --git a/tools/testing/selftests/net/traceroute.sh b/tools/testing/selftests/net/traceroute.sh
index 0ab9eccf1499..dbb34c7e09ce 100755
--- a/tools/testing/selftests/net/traceroute.sh
+++ b/tools/testing/selftests/net/traceroute.sh
@@ -193,6 +193,110 @@ run_traceroute6()
 	cleanup_traceroute6
 }
 
+################################################################################
+# traceroute6 with VRF test
+#
+# Verify that in this scenario
+#
+#        ------------------------ N2
+#         |                    |
+#       ------              ------  N3  ----
+#       | R1 |              | R2 |------|H2|
+#       ------              ------      ----
+#         |                    |
+#        ------------------------ N1
+#                  |
+#                 ----
+#                 |H1|
+#                 ----
+#
+# Where H1's default route goes through R1 and R1's default route goes through
+# R2 over N2, traceroute6 from H1 to H2 reports R2's address on N2 and not N1.
+# The interfaces connecting R2 to the different subnets are membmer in a VRF
+# and the intention is to check that traceroute6 does not report the VRF's
+# address.
+#
+# Addresses are assigned as follows:
+#
+# N1: 2000:101::/64
+# N2: 2000:102::/64
+# N3: 2000:103::/64
+#
+# R1's host part of address: 1
+# R2's host part of address: 2
+# H1's host part of address: 3
+# H2's host part of address: 4
+#
+# For example:
+# the IPv6 address of R1's interface on N2 is 2000:102::1/64
+
+cleanup_traceroute6_vrf()
+{
+	cleanup_all_ns
+}
+
+setup_traceroute6_vrf()
+{
+	# Start clean
+	cleanup_traceroute6_vrf
+
+	setup_ns h1 h2 r1 r2
+	create_ns "$h1"
+	create_ns "$h2"
+	create_ns "$r1"
+	create_ns "$r2"
+
+	ip -n "$r2" link add name vrf100 up type vrf table 100
+	ip -n "$r2" addr add 2001:db8:100::1/64 dev vrf100
+
+	# Setup N3
+	connect_ns "$r2" eth3 - 2000:103::2/64 "$h2" eth3 - 2000:103::4/64
+
+	ip -n "$r2" link set dev eth3 master vrf100
+
+	ip -n "$h2" route add default via 2000:103::2
+
+	# Setup N2
+	connect_ns "$r1" eth2 - 2000:102::1/64 "$r2" eth2 - 2000:102::2/64
+
+	ip -n "$r1" route add default via 2000:102::2
+
+	ip -n "$r2" link set dev eth2 master vrf100
+
+	# Setup N1. host-1 and router-2 connect to a bridge in router-1.
+	ip -n "$r1" link add name br100 up type bridge
+	ip -n "$r1" addr add 2000:101::1/64 dev br100
+
+	connect_ns "$h1" eth0 - 2000:101::3/64 "$r1" eth0 - -
+
+	ip -n "$h1" route add default via 2000:101::1
+
+	ip -n "$r1" link set dev eth0 master br100
+
+	connect_ns "$r2" eth1 - 2000:101::2/64 "$r1" eth1 - -
+
+	ip -n "$r2" link set dev eth1 master vrf100
+
+	ip -n "$r1" link set dev eth1 master br100
+
+	# Prime the network
+	ip netns exec "$h1" ping6 -c5 2000:103::4 >/dev/null 2>&1
+}
+
+run_traceroute6_vrf()
+{
+	setup_traceroute6_vrf
+
+	RET=0
+
+	# traceroute6 host-2 from host-1 (expects 2000:102::2)
+	run_cmd "$h1" "traceroute6 2000:103::4 | grep 2000:102::2"
+	check_err $? "traceroute6 did not return 2000:102::2"
+	log_test "IPv6 traceroute with VRF"
+
+	cleanup_traceroute6_vrf
+}
+
 ################################################################################
 # traceroute test
 #
@@ -261,13 +365,87 @@ run_traceroute()
 	cleanup_traceroute
 }
 
+################################################################################
+# traceroute with VRF test
+#
+# Verify that traceroute from H1 to H2 shows 1.0.3.1 and 1.0.1.1 when
+# traceroute uses 1.0.3.3 and 1.0.1.3 as the source IP, respectively. The
+# intention is to check that the kernel does not choose an IP assigned to the
+# VRF device, but rather an address from the VRF port (eth1) that received the
+# packet that generates the ICMP error message.
+#
+#                          1.0.4.1/24 (vrf100)
+#      1.0.3.3/24    1.0.3.1/24
+# ---- 1.0.1.3/24    1.0.1.1/24 ---- 1.0.2.1/24    1.0.2.4/24 ----
+# |H1|--------------------------|R1|--------------------------|H2|
+# ----            N1            ----            N2            ----
+
+cleanup_traceroute_vrf()
+{
+	cleanup_all_ns
+}
+
+setup_traceroute_vrf()
+{
+	# Start clean
+	cleanup_traceroute_vrf
+
+	setup_ns h1 h2 router
+	create_ns "$h1"
+	create_ns "$h2"
+	create_ns "$router"
+
+	ip -n "$router" link add name vrf100 up type vrf table 100
+	ip -n "$router" addr add 1.0.4.1/24 dev vrf100
+
+	connect_ns "$h1" eth0 1.0.1.3/24 - \
+	           "$router" eth1 1.0.1.1/24 -
+
+	ip -n "$h1" addr add 1.0.3.3/24 dev eth0
+	ip -n "$h1" route add default via 1.0.1.1
+
+	ip -n "$router" link set dev eth1 master vrf100
+	ip -n "$router" addr add 1.0.3.1/24 dev eth1
+	ip netns exec "$router" sysctl -qw \
+		net.ipv4.icmp_errors_use_inbound_ifaddr=1
+
+	connect_ns "$h2" eth0 1.0.2.4/24 - \
+	           "$router" eth2 1.0.2.1/24 -
+
+	ip -n "$h2" route add default via 1.0.2.1
+
+	ip -n "$router" link set dev eth2 master vrf100
+
+	# Prime the network
+	ip netns exec "$h1" ping -c5 1.0.2.4 >/dev/null 2>&1
+}
+
+run_traceroute_vrf()
+{
+	setup_traceroute_vrf
+
+	RET=0
+
+	# traceroute host-2 from host-1. Expect a source IP that is on the same
+	# subnet as destination IP of the ICMP error message.
+	run_cmd "$h1" "traceroute -s 1.0.1.3 1.0.2.4 | grep 1.0.1.1"
+	check_err $? "traceroute did not return 1.0.1.1"
+	run_cmd "$h1" "traceroute -s 1.0.3.3 1.0.2.4 | grep 1.0.3.1"
+	check_err $? "traceroute did not return 1.0.3.1"
+	log_test "IPv4 traceroute with VRF"
+
+	cleanup_traceroute_vrf
+}
+
 ################################################################################
 # Run tests
 
 run_tests()
 {
 	run_traceroute6
+	run_traceroute6_vrf
 	run_traceroute
+	run_traceroute_vrf
 }
 
 ################################################################################
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs
  2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
                   ` (7 preceding siblings ...)
  2025-09-01  8:30 ` [PATCH net-next 8/8] selftests: traceroute: Add VRF tests Ido Schimmel
@ 2025-09-01  8:40 ` Ido Schimmel
  2025-09-01 18:43   ` Jakub Kicinski
  8 siblings, 1 reply; 18+ messages in thread
From: Ido Schimmel @ 2025-09-01  8:40 UTC (permalink / raw)
  To: netdev, kuba, pabeni
  Cc: davem, edumazet, horms, paul, dsahern, petrm,
	linux-security-module

On Mon, Sep 01, 2025 at 11:30:19AM +0300, Ido Schimmel wrote:
> Align IPv4 with IPv6 and in the presence of VRFs generate ICMP error
> messages with a source IP that is derived from the receiving interface
> and not from its VRF master. This is especially important when the error
> messages are "Time Exceeded" messages as it means that utilities like
> traceroute will show an incorrect packet path.
> 
> Patches #1-#2 are preparations.
> 
> Patch #3 is the actual change.
> 
> Patches #4-#7 make small improvements in the existing traceroute test.
> 
> Patch #8 extends the traceroute test with VRF test cases for both IPv4
> and IPv6.

Jakub / Paolo, patch #2 is going to conflict with the following net
patch:

https://lore.kernel.org/all/20250828091435.161962-1-fabian@blaese.de/

Resolution is below. Please let me know if you prefer that I repost next
week in order to avoid the conflict.

@@ -799,15 +800,16 @@ EXPORT_SYMBOL(__icmp_send);
 void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 {
        struct sk_buff *cloned_skb = NULL;
-       struct ip_options opts = { 0 };
        enum ip_conntrack_info ctinfo;
        enum ip_conntrack_dir dir;
+       struct inet_skb_parm parm;
        struct nf_conn *ct;
        __be32 orig_ip;
 
+       memset(&parm, 0, sizeof(parm));
        ct = nf_ct_get(skb_in, &ctinfo);
        if (!ct || !(READ_ONCE(ct->status) & IPS_NAT_MASK)) {
-               __icmp_send(skb_in, type, code, info, &opts);
+               __icmp_send(skb_in, type, code, info, &parm);
                return;
        }
 
@@ -823,7 +825,7 @@ void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
        orig_ip = ip_hdr(skb_in)->saddr;
        dir = CTINFO2DIR(ctinfo);
        ip_hdr(skb_in)->saddr = ct->tuplehash[dir].tuple.src.u3.ip;
-       __icmp_send(skb_in, type, code, info, &opts);
+       __icmp_send(skb_in, type, code, info, &parm);
        ip_hdr(skb_in)->saddr = orig_ip;
 out:
        consume_skb(cloned_skb);


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs
  2025-09-01  8:40 ` [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
@ 2025-09-01 18:43   ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2025-09-01 18:43 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, pabeni, davem, edumazet, horms, paul, dsahern, petrm,
	linux-security-module

On Mon, 1 Sep 2025 11:40:26 +0300 Ido Schimmel wrote:
> Resolution is below. Please let me know if you prefer that I repost next
> week in order to avoid the conflict.

Yes, please!
-- 
pw-bot: defer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 1/8] ipv4: cipso: Simplify IP options handling in cipso_v4_error()
  2025-09-01  8:30 ` [PATCH net-next 1/8] ipv4: cipso: Simplify IP options handling in cipso_v4_error() Ido Schimmel
@ 2025-09-02  2:36   ` David Ahern
  0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2025-09-02  2:36 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, petrm,
	linux-security-module

On 9/1/25 2:30 AM, Ido Schimmel wrote:
> When __ip_options_compile() is called with an skb, the IP options are
> parsed from the skb data into the provided IP option argument. This is
> in contrast to the case where the skb argument is NULL and the options
> are parsed from opt->__data.
> 
> Given that cipso_v4_error() always passes an skb to
> __ip_options_compile(), there is no need to allocate an extra 40 bytes
> (maximum IP options size).
> 
> Therefore, simplify the function by removing these extra bytes and make
> the function similar to ipv4_send_dest_unreach() which also calls both
> __ip_options_compile() and __icmp_send().
> 
> This is a preparation for changing the arguments being passed to
> __icmp_send().
> 
> No functional changes intended.
> 
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/ipv4/cipso_ipv4.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 2/8] ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send()
  2025-09-01  8:30 ` [PATCH net-next 2/8] ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send() Ido Schimmel
@ 2025-09-02  2:36   ` David Ahern
  0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2025-09-02  2:36 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, petrm,
	linux-security-module

On 9/1/25 2:30 AM, Ido Schimmel wrote:
> __icmp_send() is used to generate ICMP error messages in response to
> various situations such as MTU errors (i.e., "Fragmentation Required")
> and too many hops (i.e., "Time Exceeded").
> 
> The skb that generated the error does not necessarily come from the IPv4
> layer and does not always have a valid IPv4 control block in skb->cb.
> 
> Therefore, commit 9ef6b42ad6fd ("net: Add __icmp_send helper.") changed
> the function to take the IP options structure as argument instead of
> deriving it from the skb's control block. Some callers of this function
> such as icmp_send() pass the IP options structure from the skb's control
> block as in these call paths the control block is known to be valid, but
> other callers simply pass a zeroed structure.
> 
> A subsequent patch will need __icmp_send() to access more information
> from the IPv4 control block (specifically, the ifindex of the input
> interface). As a preparation for this change, change the function to
> take the IPv4 control block structure as an argument instead of the IP
> options structure. This makes the function similar to its IPv6
> counterpart that already takes the IPv6 control block structure as an
> argument.
> 
> No functional changes intended.
> 
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  include/net/icmp.h    | 10 ++++++----
>  net/ipv4/cipso_ipv4.c | 12 ++++++------
>  net/ipv4/icmp.c       | 12 +++++++-----
>  net/ipv4/route.c      | 10 +++++-----
>  4 files changed, 24 insertions(+), 20 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/8] ipv4: icmp: Fix source IP derivation in presence of VRFs
  2025-09-01  8:30 ` [PATCH net-next 3/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
@ 2025-09-02  2:37   ` David Ahern
  0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2025-09-02  2:37 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, petrm,
	linux-security-module

On 9/1/25 2:30 AM, Ido Schimmel wrote:
> When the "icmp_errors_use_inbound_ifaddr" sysctl is enabled, the source
> IP of ICMP error messages should be the "primary address of the
> interface that received the packet that caused the icmp error".
> 
> The IPv4 ICMP code determines this interface using inet_iif() which in
> the input path translates to skb->skb_iif. If the interface that
> received the packet is a VRF port, skb->skb_iif will contain the ifindex
> of the VRF device and not that of the receiving interface. This is
> because in the input path the VRF driver overrides skb->skb_iif with the
> ifindex of the VRF device itself (see vrf_ip_rcv()).
> 
> As such, the source IP that will be chosen for the ICMP error message is
> either an address assigned to the VRF device itself (if present) or an
> address assigned to some VRF port, not necessarily the input or output
> interface.
> 
> This behavior is especially problematic when the error messages are
> "Time Exceeded" messages as it means that utilities like traceroute will
> show an incorrect packet path.
> 
> Solve this by determining the input interface based on the iif field in
> the control block, if present. This field is set in the input path to
> skb->skb_iif and is not later overridden by the VRF driver, unlike
> skb->skb_iif.
> 
> This behavior is consistent with the IPv6 counterpart that already uses
> the iif from the control block.
> 
> Reported-by: Andy Roulin <aroulin@nvidia.com>
> Reported-by: Rajkumar Srinivasan <rajsrinivasa@nvidia.com>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  net/ipv4/icmp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 4/8] selftests: traceroute: Return correct value on failure
  2025-09-01  8:30 ` [PATCH net-next 4/8] selftests: traceroute: Return correct value on failure Ido Schimmel
@ 2025-09-02  2:37   ` David Ahern
  0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2025-09-02  2:37 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, petrm,
	linux-security-module

On 9/1/25 2:30 AM, Ido Schimmel wrote:
> The test always returns success even if some tests were modified to
> fail. Fix by converting the test to use the appropriate library
> functions instead of using its own functions.
> 
> Before:
> 
>  # ./traceroute.sh
>  TEST: IPV6 traceroute                                               [FAIL]
>  TEST: IPV4 traceroute                                               [ OK ]
> 
>  Tests passed:   1
>  Tests failed:   1
>  $ echo $?
>  0
> 
> After:
> 
>  # ./traceroute.sh
>  TEST: IPv6 traceroute                                               [FAIL]
>          traceroute6 did not return 2000:102::2
>  TEST: IPv4 traceroute                                               [ OK ]
>  $ echo $?
>  1
> 
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  tools/testing/selftests/net/traceroute.sh | 38 ++++++-----------------
>  1 file changed, 9 insertions(+), 29 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 5/8] selftests: traceroute: Use require_command()
  2025-09-01  8:30 ` [PATCH net-next 5/8] selftests: traceroute: Use require_command() Ido Schimmel
@ 2025-09-02  2:38   ` David Ahern
  0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2025-09-02  2:38 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, petrm,
	linux-security-module

On 9/1/25 2:30 AM, Ido Schimmel wrote:
> Use require_command() so that the test will return SKIP (4) when a
> required command is not present.
> 
> Before:
> 
>  # ./traceroute.sh
>  SKIP: Could not run IPV6 test without traceroute6
>  SKIP: Could not run IPV4 test without traceroute
>  $ echo $?
>  0
> 
> After:
> 
>  # ./traceroute.sh
>  TEST: traceroute6 not installed                                    [SKIP]
>  $ echo $?
>  4
> 
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  tools/testing/selftests/net/traceroute.sh | 13 +++----------
>  1 file changed, 3 insertions(+), 10 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 7/8] selftests: traceroute: Test traceroute with different source IPs
  2025-09-01  8:30 ` [PATCH net-next 7/8] selftests: traceroute: Test traceroute with different source IPs Ido Schimmel
@ 2025-09-02  2:39   ` David Ahern
  0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2025-09-02  2:39 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, petrm,
	linux-security-module

On 9/1/25 2:30 AM, Ido Schimmel wrote:
> When generating ICMP error messages, the kernel will prefer a source IP
> that is on the same subnet as the destination IP (see
> inet_select_addr()). Test this behavior by invoking traceroute with
> different source IPs and checking that the ICMP error message is
> generated with a source IP in the same subnet.
> 
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  tools/testing/selftests/net/traceroute.sh | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 8/8] selftests: traceroute: Add VRF tests
  2025-09-01  8:30 ` [PATCH net-next 8/8] selftests: traceroute: Add VRF tests Ido Schimmel
@ 2025-09-02  2:40   ` David Ahern
  0 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2025-09-02  2:40 UTC (permalink / raw)
  To: Ido Schimmel, netdev
  Cc: davem, kuba, pabeni, edumazet, horms, paul, petrm,
	linux-security-module

On 9/1/25 2:30 AM, Ido Schimmel wrote:
> Create versions of the existing test cases where the routers generating
> the ICMP error messages are using VRFs. Check that the source IPs of
> these messages do not change in the presence of VRFs.
> 
> IPv6 always behaved correctly, but IPv4 fails when reverting "ipv4:
> icmp: Fix source IP derivation in presence of VRFs".
> 
> Without IPv4 change:
> 
>  # ./traceroute.sh
>  TEST: IPv6 traceroute                                               [ OK ]
>  TEST: IPv6 traceroute with VRF                                      [ OK ]
>  TEST: IPv4 traceroute                                               [ OK ]
>  TEST: IPv4 traceroute with VRF                                      [FAIL]
>          traceroute did not return 1.0.3.1
>  $ echo $?
>  1
> 
> The test fails because the ICMP error message is sent with the VRF
> device's IP (1.0.4.1):
> 
>  # traceroute -n -s 1.0.1.3 1.0.2.4
>  traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
>   1  1.0.4.1  0.165 ms  0.110 ms  0.103 ms
>   2  1.0.2.4  0.098 ms  0.085 ms  0.078 ms
>  # traceroute -n -s 1.0.3.3 1.0.2.4
>  traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
>   1  1.0.4.1  0.201 ms  0.138 ms  0.129 ms
>   2  1.0.2.4  0.123 ms  0.105 ms  0.098 ms
> 
> With IPv4 change:
> 
>  # ./traceroute.sh
>  TEST: IPv6 traceroute                                               [ OK ]
>  TEST: IPv6 traceroute with VRF                                      [ OK ]
>  TEST: IPv4 traceroute                                               [ OK ]
>  TEST: IPv4 traceroute with VRF                                      [ OK ]
>  $ echo $?
>  0
> 
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>  tools/testing/selftests/net/traceroute.sh | 178 ++++++++++++++++++++++
>  1 file changed, 178 insertions(+)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-09-02  2:40 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-01  8:30 [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
2025-09-01  8:30 ` [PATCH net-next 1/8] ipv4: cipso: Simplify IP options handling in cipso_v4_error() Ido Schimmel
2025-09-02  2:36   ` David Ahern
2025-09-01  8:30 ` [PATCH net-next 2/8] ipv4: icmp: Pass IPv4 control block structure as an argument to __icmp_send() Ido Schimmel
2025-09-02  2:36   ` David Ahern
2025-09-01  8:30 ` [PATCH net-next 3/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
2025-09-02  2:37   ` David Ahern
2025-09-01  8:30 ` [PATCH net-next 4/8] selftests: traceroute: Return correct value on failure Ido Schimmel
2025-09-02  2:37   ` David Ahern
2025-09-01  8:30 ` [PATCH net-next 5/8] selftests: traceroute: Use require_command() Ido Schimmel
2025-09-02  2:38   ` David Ahern
2025-09-01  8:30 ` [PATCH net-next 6/8] selftests: traceroute: Reword comment Ido Schimmel
2025-09-01  8:30 ` [PATCH net-next 7/8] selftests: traceroute: Test traceroute with different source IPs Ido Schimmel
2025-09-02  2:39   ` David Ahern
2025-09-01  8:30 ` [PATCH net-next 8/8] selftests: traceroute: Add VRF tests Ido Schimmel
2025-09-02  2:40   ` David Ahern
2025-09-01  8:40 ` [PATCH net-next 0/8] ipv4: icmp: Fix source IP derivation in presence of VRFs Ido Schimmel
2025-09-01 18:43   ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).