[PATCH net-next v2 0/3] icmp: Add RFC 5837 support

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next v2 0/3] icmp: Add RFC 5837 support
@ 2025-10-27  8:22 Ido Schimmel
  2025-10-27  8:22 ` [PATCH net-next v2 1/3] ipv4: " Ido Schimmel
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Ido Schimmel @ 2025-10-27  8:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom, Ido Schimmel

tl;dr
=====

This patchset extends certain ICMP error messages (e.g., "Time
Exceeded") with incoming interface information in accordance with RFC
5837 [1]. This is required for more meaningful traceroute results in
unnumbered networks. Like other ICMP settings, the feature is controlled
via a per-{netns, address family} sysctl. The interface and the
implementation are designed to support more ICMP extensions.

Motivation
==========

Over the years, the kernel was extended with the ability to derive the
source IP of ICMP error messages from the interface that received the
datagram which elicited the ICMP error [2][3][4]. This is especially
important for "Time Exceeded" messages as it allows traceroute users to
trace the actual packet path along the network.

The above scheme does not work in unnumbered networks. In these
networks, only the loopback / VRF interface is assigned a global IP
address while router interfaces are assigned IPv6 link-local addresses.
As such, ICMP error messages are generated with a source IP derived from
the loopback / VRF interface, making it impossible to trace the actual
packet path when parallel links exist between routers.

The problem can be solved by implementing the solution proposed by RFC
4884 [5] and RFC 5837. The former defines an ICMP extension structure
that can be appended to selected ICMP messages and carry extension
objects. The latter defines an extension object called the "Interface
Information Object" (IIO) that can carry interface information (e.g.,
name, index, MTU) about interfaces with certain roles such as the
interface that received the datagram which elicited the ICMP error.

The payload of the datagram that elicited the error (potentially padded
/ trimmed) along with the ICMP extension structure will be queued to the
error queue of the originating socket, thereby allowing traceroute
applications to parse and display the information encoded in the ICMP
extension structure. Example:

 # traceroute6 -e 2001:db8:1::3
 traceroute to 2001:db8:1::3 (2001:db8:1::3), 30 hops max, 80 byte packets
  1  2001:db8:1::2 (2001:db8:1::2) <INC:11,"eth1",mtu=1500>  0.214 ms  0.171 ms  0.162 ms
  2  2001:db8:1::3 (2001:db8:1::3) <INC:12,"eth2",mtu=1500>  0.154 ms  0.135 ms  0.127 ms

 # traceroute -e 192.0.2.3
 traceroute to 192.0.2.3 (192.0.2.3), 30 hops max, 60 byte packets
  1  192.0.2.2 (192.0.2.2) <INC:11,"eth1",mtu=1500>  0.191 ms  0.148 ms  0.144 ms
  2  192.0.2.3 (192.0.2.3) <INC:12,"eth2",mtu=1500>  0.137 ms  0.122 ms  0.114 ms

Implementation
==============

As previously stated, the feature is controlled via a per-{netns,
address} sysctl. Specifically, a bit mask where each bit controls the
addition of a different ICMP extension to ICMP error messages.
Currently, only a single value is supported, to append the incoming
interface information.

Key points:

1. Global knob vs finer control. I am not aware of users who require
finer control, but it is possible that some users will want to avoid
appending ICMP extensions when the packet is sent out of a specific
interface (e.g., the management interface) or to a specific subnet. This
can be accomplished via a tc-bpf program that trims the ICMP extension
structure. An example program can be found here [6].

2. Split implementation between IPv4 / IPv6. While the implementation is
currently similar, there are some differences between both address
families. In addition, some extensions (e.g., RFC 8883 [7]) are
IPv6-specific. Given the above and given that the implementation is not
very complex, it makes sense to keep both implementations separate.

3. Compatibility with legacy applications. RFC 4884 from 2007 extended
certain ICMP messages with a length field that encodes the length of the
"original datagram" field, so that applications will be able to tell
where the "original datagram" ends and where the ICMP extension
structure starts.

Before the introduction of the IP{,6}_RECVERR_RFC4884 socket options
[8][9] in 2020 it was impossible for applications to know where the ICMP
extension structure starts and to this day some applications assume that
it starts at offset 128, which is the minimum length of the "original
datagram" field as specified by RFC 4884.

Therefore, in order to be compatible with both legacy and modern
applications, the datagram that elicited the ICMP error is trimmed /
padded to 128 bytes before appending the ICMP extension structure.

This behavior is specifically called out by RFC 4884: "Those wishing to
be backward compatible with non-compliant TRACEROUTE implementations
will include exactly 128 octets" [10].

Note that in 128 bytes we should be able to include enough headers for
the originating node to match the ICMP error message with the relevant
socket. For example, the following headers will be present in the
"original datagram" field when a VXLAN encapsulated IPv6 packet elicits
an ICMP error in an IPv6 underlay: IPv6 (40) | UDP (8) | VXLAN (8) | Eth
(14) | IPv6 (40) | UDP (8). Overall, 118 bytes.

If the 128 bytes limit proves to be insufficient for some use case, we
can consider dedicating a new bit in the previously mentioned sysctl to
allow for more bytes to be included in the "original datagram" field.

4. Extensibility. This patchset adds partial support for a single ICMP
extension. However, the interface and the implementation should be able
to support more extensions, if needed. Examples:

* More interface information objects as part of RFC 5837. We should be
  able to derive the outgoing interface information and nexthop IP from
  the dst entry attached to the packet that elicited the error.

* Node identification object (e.g., hostname / loopback IP) [11].

* Extended Information object which encodes aggregate header limits as
  part of RFC 8883.

A previous proposal from Ishaan Gandhi and Ron Bonica is available here
[12].

Testing
=======

The existing traceroute selftest is extended to test that ICMP
extensions are reported correctly when enabled. Both address families
are tested and with different packet sizes in order to make sure that
trimming / padding works correctly. Tested that packets are parsed
correctly by the IP{,6}_RECVERR_RFC4884 socket options using Willem's
selftest [13].

Changelog
=========

Changes since v1 [14]:

* Patches #1-#2: Added a comment about field ordering and review tags.

* Patch #3: Converted "sysctl" to "echo" when testing the return value.
  Added a check to skip the test if traceroute version is older
  than 2.1.5.

[1] https://datatracker.ietf.org/doc/html/rfc5837
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c2fb7f93cb20621772bf304f3dba0849942e5db
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fac6fce9bdb59837bb89930c3a92f5e0d1482f0b
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4a8c416602d97a4e2073ed563d4d4c7627de19cf
[5] https://datatracker.ietf.org/doc/html/rfc4884
[6] https://gist.github.com/idosch/5013448cdb5e9e060e6bfdc8b433577c
[7] https://datatracker.ietf.org/doc/html/rfc8883
[8] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eba75c587e811d3249c8bd50d22bb2266ccd3c0f
[9] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=01370434df85eb76ecb1527a4466013c4aca2436
[10] https://datatracker.ietf.org/doc/html/rfc4884#section-5.3
[11] https://datatracker.ietf.org/doc/html/draft-ietf-intarea-extended-icmp-nodeid-04
[12] https://lore.kernel.org/netdev/20210317221959.4410-1-ishaangandhi@gmail.com/
[13] https://lore.kernel.org/netdev/aPpMItF35gwpgzZx@shredder/
[14] https://lore.kernel.org/netdev/20251022065349.434123-1-idosch@nvidia.com/

Ido Schimmel (3):
  ipv4: icmp: Add RFC 5837 support
  ipv6: icmp: Add RFC 5837 support
  selftests: traceroute: Add ICMP extensions tests

 Documentation/networking/ip-sysctl.rst    |  34 +++
 include/linux/icmp.h                      |  32 +++
 include/net/netns/ipv4.h                  |   1 +
 include/net/netns/ipv6.h                  |   1 +
 net/core/dev.c                            |   1 +
 net/ipv4/icmp.c                           | 191 ++++++++++++-
 net/ipv4/sysctl_net_ipv4.c                |  11 +
 net/ipv6/af_inet6.c                       |   1 +
 net/ipv6/icmp.c                           | 214 ++++++++++++++-
 tools/testing/selftests/net/traceroute.sh | 313 ++++++++++++++++++++++
 10 files changed, 796 insertions(+), 3 deletions(-)

-- 
2.51.0

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 1/3] ipv4: icmp: Add RFC 5837 support
  2025-10-27  8:22 [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Ido Schimmel
@ 2025-10-27  8:22 ` Ido Schimmel
  2025-10-27  8:35   ` Eric Dumazet
  2025-11-07 17:33   ` David 'equinox' Lamparter
  2025-10-27  8:22 ` [PATCH net-next v2 2/3] ipv6: " Ido Schimmel
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Ido Schimmel @ 2025-10-27  8:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom, Ido Schimmel

Add the ability to append the incoming IP interface information to
ICMPv4 error messages in accordance with RFC 5837 and RFC 4884. This is
required for more meaningful traceroute results in unnumbered networks.

The feature is disabled by default and controlled via a new sysctl
("net.ipv4.icmp_errors_extension_mask") which accepts a bitmask of ICMP
extensions to append to ICMP error messages. Currently, only a single
value is supported, but the interface and the implementation should be
able to support more extensions, if needed.

Clone the skb and copy the relevant data portions before modifying the
skb as the caller of __icmp_send() still owns the skb after the function
returns. This should be fine since by default ICMP error messages are
rate limited to 1000 per second and no more than 1 per second per
specific host.

Trim or pad the packet to 128 bytes before appending the ICMP extension
structure in order to be compatible with legacy applications that assume
that the ICMP extension structure always starts at this offset (the
minimum length specified by RFC 4884).

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---

Notes:
    v2:
    * Add a comment about field ordering.

 Documentation/networking/ip-sysctl.rst |  17 +++
 include/linux/icmp.h                   |  32 +++++
 include/net/netns/ipv4.h               |   1 +
 net/ipv4/icmp.c                        | 191 ++++++++++++++++++++++++-
 net/ipv4/sysctl_net_ipv4.c             |  11 ++
 5 files changed, 251 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index a06cb99d66dc..ece1187ba0f1 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1796,6 +1796,23 @@ icmp_errors_use_inbound_ifaddr - BOOLEAN
 
 	Default: 0 (disabled)
 
+icmp_errors_extension_mask - UNSIGNED INTEGER
+	Bitmask of ICMP extensions to append to ICMPv4 error messages
+	("Destination Unreachable", "Time Exceeded" and "Parameter Problem").
+	The original datagram is trimmed / padded to 128 bytes in order to be
+	compatible with applications that do not comply with RFC 4884.
+
+	Possible extensions are:
+
+	==== ==============================================================
+	0x01 Incoming IP interface information according to RFC 5837.
+	     Extension will include the index, IPv4 address (if present),
+	     name and MTU of the IP interface that received the datagram
+	     which elicited the ICMP error.
+	==== ==============================================================
+
+	Default: 0x00 (no extensions)
+
 igmp_max_memberships - INTEGER
 	Change the maximum number of multicast groups we can subscribe to.
 	Default: 20
diff --git a/include/linux/icmp.h b/include/linux/icmp.h
index 0af4d210ee31..043ec5d9c882 100644
--- a/include/linux/icmp.h
+++ b/include/linux/icmp.h
@@ -40,4 +40,36 @@ void ip_icmp_error_rfc4884(const struct sk_buff *skb,
 			   struct sock_ee_data_rfc4884 *out,
 			   int thlen, int off);
 
+/* RFC 4884 */
+#define ICMP_EXT_ORIG_DGRAM_MIN_LEN	128
+#define ICMP_EXT_VERSION_2		2
+
+/* ICMP Extension Object Classes */
+#define ICMP_EXT_OBJ_CLASS_IIO		2	/* RFC 5837 */
+
+/* Interface Information Object - RFC 5837 */
+enum {
+	ICMP_EXT_CTYPE_IIO_ROLE_IIF,
+};
+
+#define ICMP_EXT_CTYPE_IIO_ROLE(ROLE)	((ROLE) << 6)
+#define ICMP_EXT_CTYPE_IIO_MTU		BIT(0)
+#define ICMP_EXT_CTYPE_IIO_NAME		BIT(1)
+#define ICMP_EXT_CTYPE_IIO_IPADDR	BIT(2)
+#define ICMP_EXT_CTYPE_IIO_IFINDEX	BIT(3)
+
+struct icmp_ext_iio_name_subobj {
+	u8 len;
+	char name[IFNAMSIZ];
+};
+
+enum {
+	/* RFC 5837 - Incoming IP Interface Role */
+	ICMP_ERR_EXT_IIO_IIF,
+	/* Add new constants above. Used by "icmp_errors_extension_mask"
+	 * sysctl.
+	 */
+	ICMP_ERR_EXT_COUNT,
+};
+
 #endif	/* _LINUX_ICMP_H */
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 34eb3aecb3f2..0e96c90e56c6 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -135,6 +135,7 @@ struct netns_ipv4 {
 	u8 sysctl_icmp_echo_ignore_broadcasts;
 	u8 sysctl_icmp_ignore_bogus_error_responses;
 	u8 sysctl_icmp_errors_use_inbound_ifaddr;
+	u8 sysctl_icmp_errors_extension_mask;
 	int sysctl_icmp_ratelimit;
 	int sysctl_icmp_ratemask;
 	int sysctl_icmp_msgs_per_sec;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 1b7fb5d935ed..4abbec2f47ef 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -582,6 +582,185 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
 	return ERR_PTR(err);
 }
 
+struct icmp_ext_iio_addr4_subobj {
+	__be16 afi;
+	__be16 reserved;
+	__be32 addr4;
+};
+
+static unsigned int icmp_ext_iio_len(void)
+{
+	return sizeof(struct icmp_extobj_hdr) +
+		/* ifIndex */
+		sizeof(__be32) +
+		/* Interface Address Sub-Object */
+		sizeof(struct icmp_ext_iio_addr4_subobj) +
+		/* Interface Name Sub-Object. Length must be a multiple of 4
+		 * bytes.
+		 */
+		ALIGN(sizeof(struct icmp_ext_iio_name_subobj), 4) +
+		/* MTU */
+		sizeof(__be32);
+}
+
+static unsigned int icmp_ext_max_len(u8 ext_objs)
+{
+	unsigned int ext_max_len;
+
+	ext_max_len = sizeof(struct icmp_ext_hdr);
+
+	if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
+		ext_max_len += icmp_ext_iio_len();
+
+	return ext_max_len;
+}
+
+static __be32 icmp_ext_iio_addr4_find(const struct net_device *dev)
+{
+	struct in_device *in_dev;
+	struct in_ifaddr *ifa;
+
+	in_dev = __in_dev_get_rcu(dev);
+	if (!in_dev)
+		return 0;
+
+	/* It is unclear from RFC 5837 which IP address should be chosen, but
+	 * it makes sense to choose a global unicast address.
+	 */
+	in_dev_for_each_ifa_rcu(ifa, in_dev) {
+		if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
+			continue;
+		if (ifa->ifa_scope != RT_SCOPE_UNIVERSE ||
+		    ipv4_is_multicast(ifa->ifa_address))
+			continue;
+		return ifa->ifa_address;
+	}
+
+	return 0;
+}
+
+static void icmp_ext_iio_iif_append(struct net *net, struct sk_buff *skb,
+				    int iif)
+{
+	struct icmp_ext_iio_name_subobj *name_subobj;
+	struct icmp_extobj_hdr *objh;
+	struct net_device *dev;
+	__be32 data;
+
+	if (!iif)
+		return;
+
+	/* Add the fields in the order specified by RFC 5837. */
+	objh = skb_put(skb, sizeof(*objh));
+	objh->class_num = ICMP_EXT_OBJ_CLASS_IIO;
+	objh->class_type = ICMP_EXT_CTYPE_IIO_ROLE(ICMP_EXT_CTYPE_IIO_ROLE_IIF);
+
+	data = htonl(iif);
+	skb_put_data(skb, &data, sizeof(__be32));
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_IFINDEX;
+
+	rcu_read_lock();
+
+	dev = dev_get_by_index_rcu(net, iif);
+	if (!dev)
+		goto out;
+
+	data = icmp_ext_iio_addr4_find(dev);
+	if (data) {
+		struct icmp_ext_iio_addr4_subobj *addr4_subobj;
+
+		addr4_subobj = skb_put_zero(skb, sizeof(*addr4_subobj));
+		addr4_subobj->afi = htons(ICMP_AFI_IP);
+		addr4_subobj->addr4 = data;
+		objh->class_type |= ICMP_EXT_CTYPE_IIO_IPADDR;
+	}
+
+	name_subobj = skb_put_zero(skb, ALIGN(sizeof(*name_subobj), 4));
+	name_subobj->len = ALIGN(sizeof(*name_subobj), 4);
+	netdev_copy_name(dev, name_subobj->name);
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_NAME;
+
+	data = htonl(READ_ONCE(dev->mtu));
+	skb_put_data(skb, &data, sizeof(__be32));
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_MTU;
+
+out:
+	rcu_read_unlock();
+	objh->length = htons(skb_tail_pointer(skb) - (unsigned char *)objh);
+}
+
+static void icmp_ext_objs_append(struct net *net, struct sk_buff *skb,
+				 u8 ext_objs, int iif)
+{
+	if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
+		icmp_ext_iio_iif_append(net, skb, iif);
+}
+
+static struct sk_buff *
+icmp_ext_append(struct net *net, struct sk_buff *skb_in, struct icmphdr *icmph,
+		unsigned int room, int iif)
+{
+	unsigned int payload_len, ext_max_len, ext_len;
+	struct icmp_ext_hdr *ext_hdr;
+	struct sk_buff *skb;
+	u8 ext_objs;
+	int nhoff;
+
+	switch (icmph->type) {
+	case ICMP_DEST_UNREACH:
+	case ICMP_TIME_EXCEEDED:
+	case ICMP_PARAMETERPROB:
+		break;
+	default:
+		return NULL;
+	}
+
+	ext_objs = READ_ONCE(net->ipv4.sysctl_icmp_errors_extension_mask);
+	if (!ext_objs)
+		return NULL;
+
+	ext_max_len = icmp_ext_max_len(ext_objs);
+	if (ICMP_EXT_ORIG_DGRAM_MIN_LEN + ext_max_len > room)
+		return NULL;
+
+	skb = skb_clone(skb_in, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+
+	nhoff = skb_network_offset(skb);
+	payload_len = min(skb->len - nhoff, ICMP_EXT_ORIG_DGRAM_MIN_LEN);
+
+	if (!pskb_network_may_pull(skb, payload_len))
+		goto free_skb;
+
+	if (pskb_trim(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN) ||
+	    __skb_put_padto(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN, false))
+		goto free_skb;
+
+	if (pskb_expand_head(skb, 0, ext_max_len, GFP_ATOMIC))
+		goto free_skb;
+
+	ext_hdr = skb_put_zero(skb, sizeof(*ext_hdr));
+	ext_hdr->version = ICMP_EXT_VERSION_2;
+
+	icmp_ext_objs_append(net, skb, ext_objs, iif);
+
+	/* Do not send an empty extension structure. */
+	ext_len = skb_tail_pointer(skb) - (unsigned char *)ext_hdr;
+	if (ext_len == sizeof(*ext_hdr))
+		goto free_skb;
+
+	ext_hdr->checksum = ip_compute_csum(ext_hdr, ext_len);
+	/* The length of the original datagram in 32-bit words (RFC 4884). */
+	icmph->un.reserved[1] = ICMP_EXT_ORIG_DGRAM_MIN_LEN / sizeof(u32);
+
+	return skb;
+
+free_skb:
+	consume_skb(skb);
+	return NULL;
+}
+
 /*
  *	Send an ICMP message in response to a situation
  *
@@ -601,6 +780,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
 	struct icmp_bxm icmp_param;
 	struct rtable *rt = skb_rtable(skb_in);
 	bool apply_ratelimit = false;
+	struct sk_buff *ext_skb;
 	struct ipcm_cookie ipc;
 	struct flowi4 fl4;
 	__be32 saddr;
@@ -770,7 +950,12 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
 	if (room <= (int)sizeof(struct iphdr))
 		goto ende;
 
-	icmp_param.data_len = skb_in->len - icmp_param.offset;
+	ext_skb = icmp_ext_append(net, skb_in, &icmp_param.data.icmph, room,
+				  parm->iif);
+	if (ext_skb)
+		icmp_param.skb = ext_skb;
+
+	icmp_param.data_len = icmp_param.skb->len - icmp_param.offset;
 	if (icmp_param.data_len > room)
 		icmp_param.data_len = room;
 	icmp_param.head_len = sizeof(struct icmphdr);
@@ -785,6 +970,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
 	trace_icmp_send(skb_in, type, code);
 
 	icmp_push_reply(sk, &icmp_param, &fl4, &ipc, &rt);
+
+	if (ext_skb)
+		consume_skb(ext_skb);
 ende:
 	ip_rt_put(rt);
 out_unlock:
@@ -1502,6 +1690,7 @@ static int __net_init icmp_sk_init(struct net *net)
 	net->ipv4.sysctl_icmp_ratelimit = 1 * HZ;
 	net->ipv4.sysctl_icmp_ratemask = 0x1818;
 	net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr = 0;
+	net->ipv4.sysctl_icmp_errors_extension_mask = 0;
 	net->ipv4.sysctl_icmp_msgs_per_sec = 1000;
 	net->ipv4.sysctl_icmp_msgs_burst = 50;
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 24dbc603cc44..0c7c8f9041cb 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -48,6 +48,8 @@ static int tcp_plb_max_rounds = 31;
 static int tcp_plb_max_cong_thresh = 256;
 static unsigned int tcp_tw_reuse_delay_max = TCP_PAWS_MSL * MSEC_PER_SEC;
 static int tcp_ecn_mode_max = 2;
+static u32 icmp_errors_extension_mask_all =
+	GENMASK_U8(ICMP_ERR_EXT_COUNT - 1, 0);
 
 /* obsolete */
 static int sysctl_tcp_low_latency __read_mostly;
@@ -674,6 +676,15 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE
 	},
+	{
+		.procname	= "icmp_errors_extension_mask",
+		.data		= &init_net.ipv4.sysctl_icmp_errors_extension_mask,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= &icmp_errors_extension_mask_all,
+	},
 	{
 		.procname	= "icmp_ratelimit",
 		.data		= &init_net.ipv4.sysctl_icmp_ratelimit,
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 2/3] ipv6: icmp: Add RFC 5837 support
  2025-10-27  8:22 [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Ido Schimmel
  2025-10-27  8:22 ` [PATCH net-next v2 1/3] ipv4: " Ido Schimmel
@ 2025-10-27  8:22 ` Ido Schimmel
  2025-10-27  8:39   ` Eric Dumazet
  2025-10-27  8:22 ` [PATCH net-next v2 3/3] selftests: traceroute: Add ICMP extensions tests Ido Schimmel
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Ido Schimmel @ 2025-10-27  8:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom, Ido Schimmel

Add the ability to append the incoming IP interface information to
ICMPv6 error messages in accordance with RFC 5837 and RFC 4884. This is
required for more meaningful traceroute results in unnumbered networks.

The feature is disabled by default and controlled via a new sysctl
("net.ipv6.icmp.errors_extension_mask") which accepts a bitmask of ICMP
extensions to append to ICMP error messages. Currently, only a single
value is supported, but the interface and the implementation should be
able to support more extensions, if needed.

Clone the skb and copy the relevant data portions before modifying the
skb as the caller of icmp6_send() still owns the skb after the function
returns. This should be fine since by default ICMP error messages are
rate limited to 1000 per second and no more than 1 per second per
specific host.

Trim or pad the packet to 128 bytes before appending the ICMP extension
structure in order to be compatible with legacy applications that assume
that the ICMP extension structure always starts at this offset (the
minimum length specified by RFC 4884).

Since commit 20e1954fe238 ("ipv6: RFC 4884 partial support for SIT/GRE
tunnels") it is possible for icmp6_send() to be called with an skb that
already contains ICMP extensions. This can happen when we receive an
ICMPv4 message with extensions from a tunnel and translate it to an
ICMPv6 message towards an IPv6 host in the overlay network. I could not
find an RFC that supports this behavior, but it makes sense to not
overwrite the original extensions that were appended to the packet.
Therefore, avoid appending extensions if the length field in the
provided ICMPv6 header is already filled.

Export netdev_copy_name() using EXPORT_IPV6_MOD_GPL() to make it
available to IPv6 when it is built as a module.

Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---

Notes:
    v2:
    * Add a comment about field ordering.

 Documentation/networking/ip-sysctl.rst |  17 ++
 include/net/netns/ipv6.h               |   1 +
 net/core/dev.c                         |   1 +
 net/ipv6/af_inet6.c                    |   1 +
 net/ipv6/icmp.c                        | 214 ++++++++++++++++++++++++-
 5 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index ece1187ba0f1..7cd35bfd39e6 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -3279,6 +3279,23 @@ error_anycast_as_unicast - BOOLEAN
 
 	Default: 0 (disabled)
 
+errors_extension_mask - UNSIGNED INTEGER
+	Bitmask of ICMP extensions to append to ICMPv6 error messages
+	("Destination Unreachable" and "Time Exceeded"). The original datagram
+	is trimmed / padded to 128 bytes in order to be compatible with
+	applications that do not comply with RFC 4884.
+
+	Possible extensions are:
+
+	==== ==============================================================
+	0x01 Incoming IP interface information according to RFC 5837.
+	     Extension will include the index, IPv6 address (if present),
+	     name and MTU of the IP interface that received the datagram
+	     which elicited the ICMP error.
+	==== ==============================================================
+
+	Default: 0x00 (no extensions)
+
 xfrm6_gc_thresh - INTEGER
 	(Obsolete since linux-4.14)
 	The threshold at which we will start garbage collecting for IPv6
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 47dc70d8100a..08d2ecc96e2b 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -56,6 +56,7 @@ struct netns_sysctl_ipv6 {
 	u8 skip_notify_on_dev_down;
 	u8 fib_notify_on_flag_change;
 	u8 icmpv6_error_anycast_as_unicast;
+	u8 icmpv6_errors_extension_mask;
 };
 
 struct netns_ipv6 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 378c2d010faf..e6cc0fbc5e2a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1163,6 +1163,7 @@ void netdev_copy_name(struct net_device *dev, char *name)
 		strscpy(name, dev->name, IFNAMSIZ);
 	} while (read_seqretry(&netdev_rename_lock, seq));
 }
+EXPORT_IPV6_MOD_GPL(netdev_copy_name);
 
 /**
  *	netdev_get_name - get a netdevice name, knowing its ifindex.
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 1b0314644e0c..44d7de1eec4f 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -960,6 +960,7 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.icmpv6_echo_ignore_multicast = 0;
 	net->ipv6.sysctl.icmpv6_echo_ignore_anycast = 0;
 	net->ipv6.sysctl.icmpv6_error_anycast_as_unicast = 0;
+	net->ipv6.sysctl.icmpv6_errors_extension_mask = 0;
 
 	/* By default, rate limit error messages.
 	 * Except for pmtu discovery, it would break it.
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 56c974cf75d1..5d2f90babaa5 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -444,6 +444,193 @@ static int icmp6_iif(const struct sk_buff *skb)
 	return icmp6_dev(skb)->ifindex;
 }
 
+struct icmp6_ext_iio_addr6_subobj {
+	__be16 afi;
+	__be16 reserved;
+	struct in6_addr addr6;
+};
+
+static unsigned int icmp6_ext_iio_len(void)
+{
+	return sizeof(struct icmp_extobj_hdr) +
+		/* ifIndex */
+		sizeof(__be32) +
+		/* Interface Address Sub-Object */
+		sizeof(struct icmp6_ext_iio_addr6_subobj) +
+		/* Interface Name Sub-Object. Length must be a multiple of 4
+		 * bytes.
+		 */
+		ALIGN(sizeof(struct icmp_ext_iio_name_subobj), 4) +
+		/* MTU */
+		sizeof(__be32);
+}
+
+static unsigned int icmp6_ext_max_len(u8 ext_objs)
+{
+	unsigned int ext_max_len;
+
+	ext_max_len = sizeof(struct icmp_ext_hdr);
+
+	if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
+		ext_max_len += icmp6_ext_iio_len();
+
+	return ext_max_len;
+}
+
+static struct in6_addr *icmp6_ext_iio_addr6_find(const struct net_device *dev)
+{
+	struct inet6_dev *in6_dev;
+	struct inet6_ifaddr *ifa;
+
+	in6_dev = __in6_dev_get(dev);
+	if (!in6_dev)
+		return NULL;
+
+	/* It is unclear from RFC 5837 which IP address should be chosen, but
+	 * it makes sense to choose a global unicast address.
+	 */
+	list_for_each_entry_rcu(ifa, &in6_dev->addr_list, if_list) {
+		if (ifa->flags & (IFA_F_TENTATIVE | IFA_F_DADFAILED))
+			continue;
+		if (ipv6_addr_type(&ifa->addr) != IPV6_ADDR_UNICAST ||
+		    ipv6_addr_src_scope(&ifa->addr) != IPV6_ADDR_SCOPE_GLOBAL)
+			continue;
+		return &ifa->addr;
+	}
+
+	return NULL;
+}
+
+static void icmp6_ext_iio_iif_append(struct net *net, struct sk_buff *skb,
+				     int iif)
+{
+	struct icmp_ext_iio_name_subobj *name_subobj;
+	struct icmp_extobj_hdr *objh;
+	struct net_device *dev;
+	struct in6_addr *addr6;
+	__be32 data;
+
+	if (!iif)
+		return;
+
+	/* Add the fields in the order specified by RFC 5837. */
+	objh = skb_put(skb, sizeof(*objh));
+	objh->class_num = ICMP_EXT_OBJ_CLASS_IIO;
+	objh->class_type = ICMP_EXT_CTYPE_IIO_ROLE(ICMP_EXT_CTYPE_IIO_ROLE_IIF);
+
+	data = htonl(iif);
+	skb_put_data(skb, &data, sizeof(__be32));
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_IFINDEX;
+
+	rcu_read_lock();
+
+	dev = dev_get_by_index_rcu(net, iif);
+	if (!dev)
+		goto out;
+
+	addr6 = icmp6_ext_iio_addr6_find(dev);
+	if (addr6) {
+		struct icmp6_ext_iio_addr6_subobj *addr6_subobj;
+
+		addr6_subobj = skb_put_zero(skb, sizeof(*addr6_subobj));
+		addr6_subobj->afi = htons(ICMP_AFI_IP6);
+		addr6_subobj->addr6 = *addr6;
+		objh->class_type |= ICMP_EXT_CTYPE_IIO_IPADDR;
+	}
+
+	name_subobj = skb_put_zero(skb, ALIGN(sizeof(*name_subobj), 4));
+	name_subobj->len = ALIGN(sizeof(*name_subobj), 4);
+	netdev_copy_name(dev, name_subobj->name);
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_NAME;
+
+	data = htonl(READ_ONCE(dev->mtu));
+	skb_put_data(skb, &data, sizeof(__be32));
+	objh->class_type |= ICMP_EXT_CTYPE_IIO_MTU;
+
+out:
+	rcu_read_unlock();
+	objh->length = htons(skb_tail_pointer(skb) - (unsigned char *)objh);
+}
+
+static void icmp6_ext_objs_append(struct net *net, struct sk_buff *skb,
+				  u8 ext_objs, int iif)
+{
+	if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
+		icmp6_ext_iio_iif_append(net, skb, iif);
+}
+
+static struct sk_buff *
+icmp6_ext_append(struct net *net, struct sk_buff *skb_in,
+		 struct icmp6hdr *icmp6h, unsigned int room, int iif)
+{
+	unsigned int payload_len, ext_max_len, ext_len;
+	struct icmp_ext_hdr *ext_hdr;
+	struct sk_buff *skb;
+	u8 ext_objs;
+	int nhoff;
+
+	switch (icmp6h->icmp6_type) {
+	case ICMPV6_DEST_UNREACH:
+	case ICMPV6_TIME_EXCEED:
+		break;
+	default:
+		return NULL;
+	}
+
+	/* Do not overwrite existing extensions. This can happen when we
+	 * receive an ICMPv4 message with extensions from a tunnel and
+	 * translate it to an ICMPv6 message towards an IPv6 host in the
+	 * overlay network.
+	 */
+	if (icmp6h->icmp6_datagram_len)
+		return NULL;
+
+	ext_objs = READ_ONCE(net->ipv6.sysctl.icmpv6_errors_extension_mask);
+	if (!ext_objs)
+		return NULL;
+
+	ext_max_len = icmp6_ext_max_len(ext_objs);
+	if (ICMP_EXT_ORIG_DGRAM_MIN_LEN + ext_max_len > room)
+		return NULL;
+
+	skb = skb_clone(skb_in, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+
+	nhoff = skb_network_offset(skb);
+	payload_len = min(skb->len - nhoff, ICMP_EXT_ORIG_DGRAM_MIN_LEN);
+
+	if (!pskb_network_may_pull(skb, payload_len))
+		goto free_skb;
+
+	if (pskb_trim(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN) ||
+	    __skb_put_padto(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN, false))
+		goto free_skb;
+
+	if (pskb_expand_head(skb, 0, ext_max_len, GFP_ATOMIC))
+		goto free_skb;
+
+	ext_hdr = skb_put_zero(skb, sizeof(*ext_hdr));
+	ext_hdr->version = ICMP_EXT_VERSION_2;
+
+	icmp6_ext_objs_append(net, skb, ext_objs, iif);
+
+	/* Do not send an empty extension structure. */
+	ext_len = skb_tail_pointer(skb) - (unsigned char *)ext_hdr;
+	if (ext_len == sizeof(*ext_hdr))
+		goto free_skb;
+
+	ext_hdr->checksum = ip_compute_csum(ext_hdr, ext_len);
+	/* The length of the original datagram in 64-bit words (RFC 4884). */
+	icmp6h->icmp6_datagram_len = ICMP_EXT_ORIG_DGRAM_MIN_LEN / sizeof(u64);
+
+	return skb;
+
+free_skb:
+	consume_skb(skb);
+	return NULL;
+}
+
 /*
  *	Send an ICMP message in response to a packet in error
  */
@@ -458,7 +645,9 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	struct ipv6_pinfo *np;
 	const struct in6_addr *saddr = NULL;
 	bool apply_ratelimit = false;
+	struct sk_buff *ext_skb;
 	struct dst_entry *dst;
+	unsigned int room;
 	struct icmp6hdr tmp_hdr;
 	struct flowi6 fl6;
 	struct icmpv6_msg msg;
@@ -612,8 +801,13 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	msg.offset = skb_network_offset(skb);
 	msg.type = type;
 
-	len = skb->len - msg.offset;
-	len = min_t(unsigned int, len, IPV6_MIN_MTU - sizeof(struct ipv6hdr) - sizeof(struct icmp6hdr));
+	room = IPV6_MIN_MTU - sizeof(struct ipv6hdr) - sizeof(struct icmp6hdr);
+	ext_skb = icmp6_ext_append(net, skb, &tmp_hdr, room, parm->iif);
+	if (ext_skb)
+		msg.skb = ext_skb;
+
+	len = msg.skb->len - msg.offset;
+	len = min_t(unsigned int, len, room);
 	if (len < 0) {
 		net_dbg_ratelimited("icmp: len problem [%pI6c > %pI6c]\n",
 				    &hdr->saddr, &hdr->daddr);
@@ -635,6 +829,8 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
 	}
 
 out_dst_release:
+	if (ext_skb)
+		consume_skb(ext_skb);
 	dst_release(dst);
 out_unlock:
 	icmpv6_xmit_unlock(sk);
@@ -1171,6 +1367,10 @@ int icmpv6_err_convert(u8 type, u8 code, int *err)
 EXPORT_SYMBOL(icmpv6_err_convert);
 
 #ifdef CONFIG_SYSCTL
+
+static u32 icmpv6_errors_extension_mask_all =
+	GENMASK_U8(ICMP_ERR_EXT_COUNT - 1, 0);
+
 static struct ctl_table ipv6_icmp_table_template[] = {
 	{
 		.procname	= "ratelimit",
@@ -1216,6 +1416,15 @@ static struct ctl_table ipv6_icmp_table_template[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "errors_extension_mask",
+		.data		= &init_net.ipv6.sysctl.icmpv6_errors_extension_mask,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= &icmpv6_errors_extension_mask_all,
+	},
 };
 
 struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
@@ -1233,6 +1442,7 @@ struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
 		table[3].data = &net->ipv6.sysctl.icmpv6_echo_ignore_anycast;
 		table[4].data = &net->ipv6.sysctl.icmpv6_ratemask_ptr;
 		table[5].data = &net->ipv6.sysctl.icmpv6_error_anycast_as_unicast;
+		table[6].data = &net->ipv6.sysctl.icmpv6_errors_extension_mask;
 	}
 	return table;
 }
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v2 3/3] selftests: traceroute: Add ICMP extensions tests
  2025-10-27  8:22 [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Ido Schimmel
  2025-10-27  8:22 ` [PATCH net-next v2 1/3] ipv4: " Ido Schimmel
  2025-10-27  8:22 ` [PATCH net-next v2 2/3] ipv6: " Ido Schimmel
@ 2025-10-27  8:22 ` Ido Schimmel
  2025-10-29  1:04 ` [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Jakub Kicinski
  2025-10-30  1:50 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2025-10-27  8:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, pabeni, edumazet, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom, Ido Schimmel

Test that ICMP extensions are reported correctly when enabled and not
reported when disabled. Test both IPv4 and IPv6 and using different
packet sizes, to make sure trimming / padding works correctly.

Disable ICMP rate limiting (defaults to 1 per-second per-target) so that
the kernel will always generate ICMP errors when needed.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---

Notes:
    v2:
    * Use "echo" instead of "sysctl" when we are testing the return value.
    * Skip if traceroute version is older than 2.1.5.

 tools/testing/selftests/net/traceroute.sh | 313 ++++++++++++++++++++++
 1 file changed, 313 insertions(+)

diff --git a/tools/testing/selftests/net/traceroute.sh b/tools/testing/selftests/net/traceroute.sh
index dbb34c7e09ce..a7c6ab8a0347 100755
--- a/tools/testing/selftests/net/traceroute.sh
+++ b/tools/testing/selftests/net/traceroute.sh
@@ -36,6 +36,35 @@ run_cmd()
 	return $rc
 }
 
+__check_traceroute_version()
+{
+	local cmd=$1; shift
+	local req_ver=$1; shift
+	local ver
+
+	req_ver=$(echo "$req_ver" | sed 's/\.//g')
+	ver=$($cmd -V 2>&1 | grep -Eo '[0-9]+.[0-9]+.[0-9]+' | sed 's/\.//g')
+	if [[ $ver -lt $req_ver ]]; then
+		return 1
+	else
+		return 0
+	fi
+}
+
+check_traceroute6_version()
+{
+	local req_ver=$1; shift
+
+	__check_traceroute_version traceroute6 "$req_ver"
+}
+
+check_traceroute_version()
+{
+	local req_ver=$1; shift
+
+	__check_traceroute_version traceroute "$req_ver"
+}
+
 ################################################################################
 # create namespaces and interconnects
 
@@ -59,6 +88,8 @@ create_ns()
 	ip netns exec ${ns} ip -6 ro add unreachable default metric 8192
 
 	ip netns exec ${ns} sysctl -qw net.ipv4.ip_forward=1
+	ip netns exec ${ns} sysctl -qw net.ipv4.icmp_ratelimit=0
+	ip netns exec ${ns} sysctl -qw net.ipv6.icmp.ratelimit=0
 	ip netns exec ${ns} sysctl -qw net.ipv6.conf.all.keep_addr_on_down=1
 	ip netns exec ${ns} sysctl -qw net.ipv6.conf.all.forwarding=1
 	ip netns exec ${ns} sysctl -qw net.ipv6.conf.default.forwarding=1
@@ -297,6 +328,144 @@ run_traceroute6_vrf()
 	cleanup_traceroute6_vrf
 }
 
+################################################################################
+# traceroute6 with ICMP extensions test
+#
+# Verify that in this scenario
+#
+# ----                          ----                          ----
+# |H1|--------------------------|R1|--------------------------|H2|
+# ----            N1            ----            N2            ----
+#
+# ICMP extensions are correctly reported. The loopback interfaces on all the
+# nodes are assigned global addresses and the interfaces connecting the nodes
+# are assigned IPv6 link-local addresses.
+
+cleanup_traceroute6_ext()
+{
+	cleanup_all_ns
+}
+
+setup_traceroute6_ext()
+{
+	# Start clean
+	cleanup_traceroute6_ext
+
+	setup_ns h1 r1 h2
+	create_ns "$h1"
+	create_ns "$r1"
+	create_ns "$h2"
+
+	# Setup N1
+	connect_ns "$h1" eth1 - fe80::1/64 "$r1" eth1 - fe80::2/64
+	# Setup N2
+	connect_ns "$r1" eth2 - fe80::3/64 "$h2" eth2 - fe80::4/64
+
+	# Setup H1
+	ip -n "$h1" address add 2001:db8:1::1/128 dev lo
+	ip -n "$h1" route add ::/0 nexthop via fe80::2 dev eth1
+
+	# Setup R1
+	ip -n "$r1" address add 2001:db8:1::2/128 dev lo
+	ip -n "$r1" route add 2001:db8:1::1/128 nexthop via fe80::1 dev eth1
+	ip -n "$r1" route add 2001:db8:1::3/128 nexthop via fe80::4 dev eth2
+
+	# Setup H2
+	ip -n "$h2" address add 2001:db8:1::3/128 dev lo
+	ip -n "$h2" route add ::/0 nexthop via fe80::3 dev eth2
+
+	# Prime the network
+	ip netns exec "$h1" ping6 -c5 2001:db8:1::3 >/dev/null 2>&1
+}
+
+traceroute6_ext_iio_iif_test()
+{
+	local r1_ifindex h2_ifindex
+	local pkt_len=$1; shift
+
+	# Test that incoming interface info is not appended by default.
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep INC"
+	check_fail $? "Incoming interface info appended by default when should not"
+
+	# Test that the extension is appended when enabled.
+	run_cmd "$r1" "bash -c \"echo 0x01 > /proc/sys/net/ipv6/icmp/errors_extension_mask\""
+	check_err $? "Failed to enable incoming interface info extension on R1"
+
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep INC"
+	check_err $? "Incoming interface info not appended after enable"
+
+	# Test that the extension is not appended when disabled.
+	run_cmd "$r1" "bash -c \"echo 0x00 > /proc/sys/net/ipv6/icmp/errors_extension_mask\""
+	check_err $? "Failed to disable incoming interface info extension on R1"
+
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep INC"
+	check_fail $? "Incoming interface info appended after disable"
+
+	# Test that the extension is sent correctly from both R1 and H2.
+	run_cmd "$r1" "sysctl -w net.ipv6.icmp.errors_extension_mask=0x01"
+	r1_ifindex=$(ip -n "$r1" -j link show dev eth1 | jq '.[]["ifindex"]')
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep '<INC:$r1_ifindex,\"eth1\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from R1"
+
+	run_cmd "$h2" "sysctl -w net.ipv6.icmp.errors_extension_mask=0x01"
+	h2_ifindex=$(ip -n "$h2" -j link show dev eth2 | jq '.[]["ifindex"]')
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep '<INC:$h2_ifindex,\"eth2\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from H2"
+
+	# Add a global address on the incoming interface of R1 and check that
+	# it is reported.
+	run_cmd "$r1" "ip address add 2001:db8:100::1/64 dev eth1 nodad"
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep '<INC:$r1_ifindex,2001:db8:100::1,\"eth1\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from R1 after address addition"
+	run_cmd "$r1" "ip address del 2001:db8:100::1/64 dev eth1"
+
+	# Change name and MTU and make sure the result is still correct.
+	run_cmd "$r1" "ip link set dev eth1 name eth1tag mtu 1501"
+	run_cmd "$h1" "traceroute6 -e 2001:db8:1::3 $pkt_len | grep '<INC:$r1_ifindex,\"eth1tag\",mtu=1501>'"
+	check_err $? "Wrong incoming interface info reported from R1 after name and MTU change"
+	run_cmd "$r1" "ip link set dev eth1tag name eth1 mtu 1500"
+
+	run_cmd "$r1" "sysctl -w net.ipv6.icmp.errors_extension_mask=0x00"
+	run_cmd "$h2" "sysctl -w net.ipv6.icmp.errors_extension_mask=0x00"
+}
+
+run_traceroute6_ext()
+{
+	# Need at least version 2.1.5 for RFC 5837 support.
+	if ! check_traceroute6_version 2.1.5; then
+		log_test_skip "traceroute6 too old, missing ICMP extensions support"
+		return
+	fi
+
+	setup_traceroute6_ext
+
+	RET=0
+
+	## General ICMP extensions tests
+
+	# Test that ICMP extensions are disabled by default.
+	run_cmd "$h1" "sysctl net.ipv6.icmp.errors_extension_mask | grep \"= 0$\""
+	check_err $? "ICMP extensions are not disabled by default"
+
+	# Test that unsupported values are rejected. Do not use "sysctl" as
+	# older versions do not return an error code upon failure.
+	run_cmd "$h1" "bash -c \"echo 0x80 > /proc/sys/net/ipv6/icmp/errors_extension_mask\""
+	check_fail $? "Unsupported sysctl value was not rejected"
+
+	## Extension-specific tests
+
+	# Incoming interface info test. Test with various packet sizes,
+	# including the default one.
+	traceroute6_ext_iio_iif_test
+	traceroute6_ext_iio_iif_test 127
+	traceroute6_ext_iio_iif_test 128
+	traceroute6_ext_iio_iif_test 129
+
+	log_test "IPv6 traceroute with ICMP extensions"
+
+	cleanup_traceroute6_ext
+}
+
 ################################################################################
 # traceroute test
 #
@@ -437,6 +606,147 @@ run_traceroute_vrf()
 	cleanup_traceroute_vrf
 }
 
+################################################################################
+# traceroute with ICMP extensions test
+#
+# Verify that in this scenario
+#
+# ----                          ----                          ----
+# |H1|--------------------------|R1|--------------------------|H2|
+# ----            N1            ----            N2            ----
+#
+# ICMP extensions are correctly reported. The loopback interfaces on all the
+# nodes are assigned global addresses and the interfaces connecting the nodes
+# are assigned IPv6 link-local addresses.
+
+cleanup_traceroute_ext()
+{
+	cleanup_all_ns
+}
+
+setup_traceroute_ext()
+{
+	# Start clean
+	cleanup_traceroute_ext
+
+	setup_ns h1 r1 h2
+	create_ns "$h1"
+	create_ns "$r1"
+	create_ns "$h2"
+
+	# Setup N1
+	connect_ns "$h1" eth1 - fe80::1/64 "$r1" eth1 - fe80::2/64
+	# Setup N2
+	connect_ns "$r1" eth2 - fe80::3/64 "$h2" eth2 - fe80::4/64
+
+	# Setup H1
+	ip -n "$h1" address add 192.0.2.1/32 dev lo
+	ip -n "$h1" route add 0.0.0.0/0 nexthop via inet6 fe80::2 dev eth1
+
+	# Setup R1
+	ip -n "$r1" address add 192.0.2.2/32 dev lo
+	ip -n "$r1" route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev eth1
+	ip -n "$r1" route add 192.0.2.3/32 nexthop via inet6 fe80::4 dev eth2
+
+	# Setup H2
+	ip -n "$h2" address add 192.0.2.3/32 dev lo
+	ip -n "$h2" route add 0.0.0.0/0 nexthop via inet6 fe80::3 dev eth2
+
+	# Prime the network
+	ip netns exec "$h1" ping -c5 192.0.2.3 >/dev/null 2>&1
+}
+
+traceroute_ext_iio_iif_test()
+{
+	local r1_ifindex h2_ifindex
+	local pkt_len=$1; shift
+
+	# Test that incoming interface info is not appended by default.
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep INC"
+	check_fail $? "Incoming interface info appended by default when should not"
+
+	# Test that the extension is appended when enabled.
+	run_cmd "$r1" "bash -c \"echo 0x01 > /proc/sys/net/ipv4/icmp_errors_extension_mask\""
+	check_err $? "Failed to enable incoming interface info extension on R1"
+
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep INC"
+	check_err $? "Incoming interface info not appended after enable"
+
+	# Test that the extension is not appended when disabled.
+	run_cmd "$r1" "bash -c \"echo 0x00 > /proc/sys/net/ipv4/icmp_errors_extension_mask\""
+	check_err $? "Failed to disable incoming interface info extension on R1"
+
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep INC"
+	check_fail $? "Incoming interface info appended after disable"
+
+	# Test that the extension is sent correctly from both R1 and H2.
+	run_cmd "$r1" "sysctl -w net.ipv4.icmp_errors_extension_mask=0x01"
+	r1_ifindex=$(ip -n "$r1" -j link show dev eth1 | jq '.[]["ifindex"]')
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep '<INC:$r1_ifindex,\"eth1\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from R1"
+
+	run_cmd "$h2" "sysctl -w net.ipv4.icmp_errors_extension_mask=0x01"
+	h2_ifindex=$(ip -n "$h2" -j link show dev eth2 | jq '.[]["ifindex"]')
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep '<INC:$h2_ifindex,\"eth2\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from H2"
+
+	# Add a global address on the incoming interface of R1 and check that
+	# it is reported.
+	run_cmd "$r1" "ip address add 198.51.100.1/24 dev eth1"
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep '<INC:$r1_ifindex,198.51.100.1,\"eth1\",mtu=1500>'"
+	check_err $? "Wrong incoming interface info reported from R1 after address addition"
+	run_cmd "$r1" "ip address del 198.51.100.1/24 dev eth1"
+
+	# Change name and MTU and make sure the result is still correct.
+	# Re-add the route towards H1 since it was deleted when we removed the
+	# last IPv4 address from eth1 on R1.
+	run_cmd "$r1" "ip route add 192.0.2.1/32 nexthop via inet6 fe80::1 dev eth1"
+	run_cmd "$r1" "ip link set dev eth1 name eth1tag mtu 1501"
+	run_cmd "$h1" "traceroute -e 192.0.2.3 $pkt_len | grep '<INC:$r1_ifindex,\"eth1tag\",mtu=1501>'"
+	check_err $? "Wrong incoming interface info reported from R1 after name and MTU change"
+	run_cmd "$r1" "ip link set dev eth1tag name eth1 mtu 1500"
+
+	run_cmd "$r1" "sysctl -w net.ipv4.icmp_errors_extension_mask=0x00"
+	run_cmd "$h2" "sysctl -w net.ipv4.icmp_errors_extension_mask=0x00"
+}
+
+run_traceroute_ext()
+{
+	# Need at least version 2.1.5 for RFC 5837 support.
+	if ! check_traceroute_version 2.1.5; then
+		log_test_skip "traceroute too old, missing ICMP extensions support"
+		return
+	fi
+
+	setup_traceroute_ext
+
+	RET=0
+
+	## General ICMP extensions tests
+
+	# Test that ICMP extensions are disabled by default.
+	run_cmd "$h1" "sysctl net.ipv4.icmp_errors_extension_mask | grep \"= 0$\""
+	check_err $? "ICMP extensions are not disabled by default"
+
+	# Test that unsupported values are rejected. Do not use "sysctl" as
+	# older versions do not return an error code upon failure.
+	run_cmd "$h1" "bash -c \"echo 0x80 > /proc/sys/net/ipv4/icmp_errors_extension_mask\""
+	check_fail $? "Unsupported sysctl value was not rejected"
+
+	## Extension-specific tests
+
+	# Incoming interface info test. Test with various packet sizes,
+	# including the default one.
+	traceroute_ext_iio_iif_test
+	traceroute_ext_iio_iif_test 127
+	traceroute_ext_iio_iif_test 128
+	traceroute_ext_iio_iif_test 129
+
+	log_test "IPv4 traceroute with ICMP extensions"
+
+	cleanup_traceroute_ext
+}
+
 ################################################################################
 # Run tests
 
@@ -444,8 +754,10 @@ run_tests()
 {
 	run_traceroute6
 	run_traceroute6_vrf
+	run_traceroute6_ext
 	run_traceroute
 	run_traceroute_vrf
+	run_traceroute_ext
 }
 
 ################################################################################
@@ -462,6 +774,7 @@ done
 
 require_command traceroute6
 require_command traceroute
+require_command jq
 
 run_tests
 
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 1/3] ipv4: icmp: Add RFC 5837 support
  2025-10-27  8:22 ` [PATCH net-next v2 1/3] ipv4: " Ido Schimmel
@ 2025-10-27  8:35   ` Eric Dumazet
  2025-11-07 17:33   ` David 'equinox' Lamparter
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2025-10-27  8:35 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, pabeni, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom

On Mon, Oct 27, 2025 at 1:24 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> Add the ability to append the incoming IP interface information to
> ICMPv4 error messages in accordance with RFC 5837 and RFC 4884. This is
> required for more meaningful traceroute results in unnumbered networks.
>
> The feature is disabled by default and controlled via a new sysctl
> ("net.ipv4.icmp_errors_extension_mask") which accepts a bitmask of ICMP
> extensions to append to ICMP error messages. Currently, only a single
> value is supported, but the interface and the implementation should be
> able to support more extensions, if needed.
>
> Clone the skb and copy the relevant data portions before modifying the
> skb as the caller of __icmp_send() still owns the skb after the function
> returns. This should be fine since by default ICMP error messages are
> rate limited to 1000 per second and no more than 1 per second per
> specific host.
>
> Trim or pad the packet to 128 bytes before appending the ICMP extension
> structure in order to be compatible with legacy applications that assume
> that the ICMP extension structure always starts at this offset (the
> minimum length specified by RFC 4884).
>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: David Ahern <dsahern@kernel.org>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
>
> Notes:
>     v2:
>     * Add a comment about field ordering.
>
>  Documentation/networking/ip-sysctl.rst |  17 +++
>  include/linux/icmp.h                   |  32 +++++
>  include/net/netns/ipv4.h               |   1 +
>  net/ipv4/icmp.c                        | 191 ++++++++++++++++++++++++-
>  net/ipv4/sysctl_net_ipv4.c             |  11 ++
>  5 files changed, 251 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index a06cb99d66dc..ece1187ba0f1 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1796,6 +1796,23 @@ icmp_errors_use_inbound_ifaddr - BOOLEAN
>
>         Default: 0 (disabled)
>
> +icmp_errors_extension_mask - UNSIGNED INTEGER
> +       Bitmask of ICMP extensions to append to ICMPv4 error messages
> +       ("Destination Unreachable", "Time Exceeded" and "Parameter Problem").
> +       The original datagram is trimmed / padded to 128 bytes in order to be
> +       compatible with applications that do not comply with RFC 4884.
> +
> +       Possible extensions are:
> +
> +       ==== ==============================================================
> +       0x01 Incoming IP interface information according to RFC 5837.
> +            Extension will include the index, IPv4 address (if present),
> +            name and MTU of the IP interface that received the datagram
> +            which elicited the ICMP error.
> +       ==== ==============================================================
> +
> +       Default: 0x00 (no extensions)
> +
>  igmp_max_memberships - INTEGER
>         Change the maximum number of multicast groups we can subscribe to.
>         Default: 20
> diff --git a/include/linux/icmp.h b/include/linux/icmp.h
> index 0af4d210ee31..043ec5d9c882 100644
> --- a/include/linux/icmp.h
> +++ b/include/linux/icmp.h
> @@ -40,4 +40,36 @@ void ip_icmp_error_rfc4884(const struct sk_buff *skb,
>                            struct sock_ee_data_rfc4884 *out,
>                            int thlen, int off);
>
> +/* RFC 4884 */
> +#define ICMP_EXT_ORIG_DGRAM_MIN_LEN    128
> +#define ICMP_EXT_VERSION_2             2
> +
> +/* ICMP Extension Object Classes */
> +#define ICMP_EXT_OBJ_CLASS_IIO         2       /* RFC 5837 */
> +
> +/* Interface Information Object - RFC 5837 */
> +enum {
> +       ICMP_EXT_CTYPE_IIO_ROLE_IIF,
> +};
> +
> +#define ICMP_EXT_CTYPE_IIO_ROLE(ROLE)  ((ROLE) << 6)
> +#define ICMP_EXT_CTYPE_IIO_MTU         BIT(0)
> +#define ICMP_EXT_CTYPE_IIO_NAME                BIT(1)
> +#define ICMP_EXT_CTYPE_IIO_IPADDR      BIT(2)
> +#define ICMP_EXT_CTYPE_IIO_IFINDEX     BIT(3)
> +
> +struct icmp_ext_iio_name_subobj {
> +       u8 len;
> +       char name[IFNAMSIZ];
> +};
> +
> +enum {
> +       /* RFC 5837 - Incoming IP Interface Role */
> +       ICMP_ERR_EXT_IIO_IIF,
> +       /* Add new constants above. Used by "icmp_errors_extension_mask"
> +        * sysctl.
> +        */
> +       ICMP_ERR_EXT_COUNT,
> +};
> +
>  #endif /* _LINUX_ICMP_H */
> diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
> index 34eb3aecb3f2..0e96c90e56c6 100644
> --- a/include/net/netns/ipv4.h
> +++ b/include/net/netns/ipv4.h
> @@ -135,6 +135,7 @@ struct netns_ipv4 {
>         u8 sysctl_icmp_echo_ignore_broadcasts;
>         u8 sysctl_icmp_ignore_bogus_error_responses;
>         u8 sysctl_icmp_errors_use_inbound_ifaddr;
> +       u8 sysctl_icmp_errors_extension_mask;
>         int sysctl_icmp_ratelimit;
>         int sysctl_icmp_ratemask;
>         int sysctl_icmp_msgs_per_sec;
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index 1b7fb5d935ed..4abbec2f47ef 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -582,6 +582,185 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
>         return ERR_PTR(err);
>  }
>
> +struct icmp_ext_iio_addr4_subobj {
> +       __be16 afi;
> +       __be16 reserved;
> +       __be32 addr4;
> +};
> +
> +static unsigned int icmp_ext_iio_len(void)
> +{
> +       return sizeof(struct icmp_extobj_hdr) +
> +               /* ifIndex */
> +               sizeof(__be32) +
> +               /* Interface Address Sub-Object */
> +               sizeof(struct icmp_ext_iio_addr4_subobj) +
> +               /* Interface Name Sub-Object. Length must be a multiple of 4
> +                * bytes.
> +                */
> +               ALIGN(sizeof(struct icmp_ext_iio_name_subobj), 4) +
> +               /* MTU */
> +               sizeof(__be32);
> +}
> +
> +static unsigned int icmp_ext_max_len(u8 ext_objs)
> +{
> +       unsigned int ext_max_len;
> +
> +       ext_max_len = sizeof(struct icmp_ext_hdr);
> +
> +       if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
> +               ext_max_len += icmp_ext_iio_len();
> +
> +       return ext_max_len;
> +}
> +
> +static __be32 icmp_ext_iio_addr4_find(const struct net_device *dev)
> +{
> +       struct in_device *in_dev;
> +       struct in_ifaddr *ifa;
> +
> +       in_dev = __in_dev_get_rcu(dev);
> +       if (!in_dev)
> +               return 0;
> +
> +       /* It is unclear from RFC 5837 which IP address should be chosen, but
> +        * it makes sense to choose a global unicast address.
> +        */
> +       in_dev_for_each_ifa_rcu(ifa, in_dev) {
> +               if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
> +                       continue;
> +               if (ifa->ifa_scope != RT_SCOPE_UNIVERSE ||
> +                   ipv4_is_multicast(ifa->ifa_address))
> +                       continue;
> +               return ifa->ifa_address;
> +       }
> +
> +       return 0;
> +}
> +
> +static void icmp_ext_iio_iif_append(struct net *net, struct sk_buff *skb,
> +                                   int iif)
> +{
> +       struct icmp_ext_iio_name_subobj *name_subobj;
> +       struct icmp_extobj_hdr *objh;
> +       struct net_device *dev;
> +       __be32 data;
> +
> +       if (!iif)
> +               return;
> +
> +       /* Add the fields in the order specified by RFC 5837. */
> +       objh = skb_put(skb, sizeof(*objh));
> +       objh->class_num = ICMP_EXT_OBJ_CLASS_IIO;
> +       objh->class_type = ICMP_EXT_CTYPE_IIO_ROLE(ICMP_EXT_CTYPE_IIO_ROLE_IIF);
> +
> +       data = htonl(iif);
> +       skb_put_data(skb, &data, sizeof(__be32));
> +       objh->class_type |= ICMP_EXT_CTYPE_IIO_IFINDEX;
> +
> +       rcu_read_lock();
> +
> +       dev = dev_get_by_index_rcu(net, iif);
> +       if (!dev)
> +               goto out;
> +
> +       data = icmp_ext_iio_addr4_find(dev);
> +       if (data) {
> +               struct icmp_ext_iio_addr4_subobj *addr4_subobj;
> +
> +               addr4_subobj = skb_put_zero(skb, sizeof(*addr4_subobj));
> +               addr4_subobj->afi = htons(ICMP_AFI_IP);
> +               addr4_subobj->addr4 = data;
> +               objh->class_type |= ICMP_EXT_CTYPE_IIO_IPADDR;
> +       }
> +
> +       name_subobj = skb_put_zero(skb, ALIGN(sizeof(*name_subobj), 4));
> +       name_subobj->len = ALIGN(sizeof(*name_subobj), 4);
> +       netdev_copy_name(dev, name_subobj->name);
> +       objh->class_type |= ICMP_EXT_CTYPE_IIO_NAME;
> +
> +       data = htonl(READ_ONCE(dev->mtu));
> +       skb_put_data(skb, &data, sizeof(__be32));
> +       objh->class_type |= ICMP_EXT_CTYPE_IIO_MTU;
> +
> +out:
> +       rcu_read_unlock();
> +       objh->length = htons(skb_tail_pointer(skb) - (unsigned char *)objh);
> +}
> +
> +static void icmp_ext_objs_append(struct net *net, struct sk_buff *skb,
> +                                u8 ext_objs, int iif)
> +{
> +       if (ext_objs & BIT(ICMP_ERR_EXT_IIO_IIF))
> +               icmp_ext_iio_iif_append(net, skb, iif);
> +}
> +
> +static struct sk_buff *
> +icmp_ext_append(struct net *net, struct sk_buff *skb_in, struct icmphdr *icmph,
> +               unsigned int room, int iif)
> +{
> +       unsigned int payload_len, ext_max_len, ext_len;
> +       struct icmp_ext_hdr *ext_hdr;
> +       struct sk_buff *skb;
> +       u8 ext_objs;
> +       int nhoff;
> +
> +       switch (icmph->type) {
> +       case ICMP_DEST_UNREACH:
> +       case ICMP_TIME_EXCEEDED:
> +       case ICMP_PARAMETERPROB:
> +               break;
> +       default:
> +               return NULL;
> +       }
> +
> +       ext_objs = READ_ONCE(net->ipv4.sysctl_icmp_errors_extension_mask);
> +       if (!ext_objs)
> +               return NULL;
> +
> +       ext_max_len = icmp_ext_max_len(ext_objs);
> +       if (ICMP_EXT_ORIG_DGRAM_MIN_LEN + ext_max_len > room)
> +               return NULL;
> +
> +       skb = skb_clone(skb_in, GFP_ATOMIC);
> +       if (!skb)
> +               return NULL;
> +
> +       nhoff = skb_network_offset(skb);
> +       payload_len = min(skb->len - nhoff, ICMP_EXT_ORIG_DGRAM_MIN_LEN);
> +
> +       if (!pskb_network_may_pull(skb, payload_len))
> +               goto free_skb;
> +
> +       if (pskb_trim(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN) ||
> +           __skb_put_padto(skb, nhoff + ICMP_EXT_ORIG_DGRAM_MIN_LEN, false))
> +               goto free_skb;
> +
> +       if (pskb_expand_head(skb, 0, ext_max_len, GFP_ATOMIC))
> +               goto free_skb;
> +
> +       ext_hdr = skb_put_zero(skb, sizeof(*ext_hdr));
> +       ext_hdr->version = ICMP_EXT_VERSION_2;
> +
> +       icmp_ext_objs_append(net, skb, ext_objs, iif);
> +
> +       /* Do not send an empty extension structure. */
> +       ext_len = skb_tail_pointer(skb) - (unsigned char *)ext_hdr;
> +       if (ext_len == sizeof(*ext_hdr))
> +               goto free_skb;
> +
> +       ext_hdr->checksum = ip_compute_csum(ext_hdr, ext_len);
> +       /* The length of the original datagram in 32-bit words (RFC 4884). */
> +       icmph->un.reserved[1] = ICMP_EXT_ORIG_DGRAM_MIN_LEN / sizeof(u32);
> +
> +       return skb;
> +
> +free_skb:
> +       consume_skb(skb);
> +       return NULL;
> +}
> +
>  /*
>   *     Send an ICMP message in response to a situation
>   *
> @@ -601,6 +780,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
>         struct icmp_bxm icmp_param;
>         struct rtable *rt = skb_rtable(skb_in);
>         bool apply_ratelimit = false;
> +       struct sk_buff *ext_skb;
>         struct ipcm_cookie ipc;
>         struct flowi4 fl4;
>         __be32 saddr;
> @@ -770,7 +950,12 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
>         if (room <= (int)sizeof(struct iphdr))
>                 goto ende;
>
> -       icmp_param.data_len = skb_in->len - icmp_param.offset;
> +       ext_skb = icmp_ext_append(net, skb_in, &icmp_param.data.icmph, room,
> +                                 parm->iif);
> +       if (ext_skb)
> +               icmp_param.skb = ext_skb;
> +
> +       icmp_param.data_len = icmp_param.skb->len - icmp_param.offset;
>         if (icmp_param.data_len > room)
>                 icmp_param.data_len = room;
>         icmp_param.head_len = sizeof(struct icmphdr);
> @@ -785,6 +970,9 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
>         trace_icmp_send(skb_in, type, code);
>
>         icmp_push_reply(sk, &icmp_param, &fl4, &ipc, &rt);
> +
> +       if (ext_skb)
> +               consume_skb(ext_skb);

nit : if (ext_skb) is not needed, consume_skb(NULL) is ok.

No need to resend.

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 2/3] ipv6: icmp: Add RFC 5837 support
  2025-10-27  8:22 ` [PATCH net-next v2 2/3] ipv6: " Ido Schimmel
@ 2025-10-27  8:39   ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2025-10-27  8:39 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, pabeni, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom

On Mon, Oct 27, 2025 at 1:24 AM Ido Schimmel <idosch@nvidia.com> wrote:
>
> Add the ability to append the incoming IP interface information to
> ICMPv6 error messages in accordance with RFC 5837 and RFC 4884. This is
> required for more meaningful traceroute results in unnumbered networks.
>
> The feature is disabled by default and controlled via a new sysctl
> ("net.ipv6.icmp.errors_extension_mask") which accepts a bitmask of ICMP
> extensions to append to ICMP error messages. Currently, only a single
> value is supported, but the interface and the implementation should be
> able to support more extensions, if needed.
>
> Clone the skb and copy the relevant data portions before modifying the
> skb as the caller of icmp6_send() still owns the skb after the function
> returns. This should be fine since by default ICMP error messages are
> rate limited to 1000 per second and no more than 1 per second per
> specific host.
>
> Trim or pad the packet to 128 bytes before appending the ICMP extension
> structure in order to be compatible with legacy applications that assume
> that the ICMP extension structure always starts at this offset (the
> minimum length specified by RFC 4884).
>
> Since commit 20e1954fe238 ("ipv6: RFC 4884 partial support for SIT/GRE
> tunnels") it is possible for icmp6_send() to be called with an skb that
> already contains ICMP extensions. This can happen when we receive an
> ICMPv4 message with extensions from a tunnel and translate it to an
> ICMPv6 message towards an IPv6 host in the overlay network. I could not
> find an RFC that supports this behavior, but it makes sense to not
> overwrite the original extensions that were appended to the packet.
> Therefore, avoid appending extensions if the length field in the
> provided ICMPv6 header is already filled.
>
> Export netdev_copy_name() using EXPORT_IPV6_MOD_GPL() to make it
> available to IPv6 when it is built as a module.
>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Reviewed-by: David Ahern <dsahern@kernel.org>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---

Same small remark about consume_skb().

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/3] icmp: Add RFC 5837 support
  2025-10-27  8:22 [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Ido Schimmel
                   ` (2 preceding siblings ...)
  2025-10-27  8:22 ` [PATCH net-next v2 3/3] selftests: traceroute: Add ICMP extensions tests Ido Schimmel
@ 2025-10-29  1:04 ` Jakub Kicinski
  2025-10-29  9:54   ` Ido Schimmel
  2025-10-30  1:38   ` Justin Iurman
  2025-10-30  1:50 ` patchwork-bot+netdevbpf
  4 siblings, 2 replies; 15+ messages in thread
From: Jakub Kicinski @ 2025-10-29  1:04 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, pabeni, edumazet, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom, Justin Iurman

On Mon, 27 Oct 2025 10:22:29 +0200 Ido Schimmel wrote:
> This patchset extends certain ICMP error messages (e.g., "Time
> Exceeded") with incoming interface information in accordance with RFC
> 5837 [1]. This is required for more meaningful traceroute results in
> unnumbered networks. Like other ICMP settings, the feature is controlled
> via a per-{netns, address family} sysctl. The interface and the
> implementation are designed to support more ICMP extensions.

Is there supposed to be any relation between the ICMP message attrs 
and what's provided via IOAM? For interface ID in IOAM we have
the ioam6_id attr instead of ifindex.

Would it make sense to add some info about relation to IOAM to the
commit msg (or even docs?). Or is it obvious to folks more familiar
with IP RFCs than I am?

cc: Justin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/3] icmp: Add RFC 5837 support
  2025-10-29  1:04 ` [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Jakub Kicinski
@ 2025-10-29  9:54   ` Ido Schimmel
  2025-10-30  1:31     ` Jakub Kicinski
  2025-10-30  1:38   ` Justin Iurman
  1 sibling, 1 reply; 15+ messages in thread
From: Ido Schimmel @ 2025-10-29  9:54 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, davem, pabeni, edumazet, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom, Justin Iurman

On Tue, Oct 28, 2025 at 06:04:32PM -0700, Jakub Kicinski wrote:
> On Mon, 27 Oct 2025 10:22:29 +0200 Ido Schimmel wrote:
> > This patchset extends certain ICMP error messages (e.g., "Time
> > Exceeded") with incoming interface information in accordance with RFC
> > 5837 [1]. This is required for more meaningful traceroute results in
> > unnumbered networks. Like other ICMP settings, the feature is controlled
> > via a per-{netns, address family} sysctl. The interface and the
> > implementation are designed to support more ICMP extensions.
> 
> Is there supposed to be any relation between the ICMP message attrs 
> and what's provided via IOAM? For interface ID in IOAM we have
> the ioam6_id attr instead of ifindex.

RFC 5837 precedes IOAM and I don't see any references from IOAM to RFC
5837. RFC 5837 is pretty clear about the interface index that should be
provided:

"The ifIndex of the interface of interest MAY be included. This is the
32-bit ifIndex assigned to the interface by the device as specified by
the Interfaces Group MIB [RFC2863]".

> Would it make sense to add some info about relation to IOAM to the
> commit msg (or even docs?). Or is it obvious to folks more familiar
> with IP RFCs than I am?

I'm pretty sure that when people are told that the message contains "the
32-bit ifIndex" they don't think about "the wide IOAM id of this
interface".

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/3] icmp: Add RFC 5837 support
  2025-10-29  9:54   ` Ido Schimmel
@ 2025-10-30  1:31     ` Jakub Kicinski
  2025-11-07 17:20       ` David 'equinox' Lamparter
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2025-10-30  1:31 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, pabeni, edumazet, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom, Justin Iurman

On Wed, 29 Oct 2025 11:54:43 +0200 Ido Schimmel wrote:
> > Is there supposed to be any relation between the ICMP message attrs 
> > and what's provided via IOAM? For interface ID in IOAM we have
> > the ioam6_id attr instead of ifindex.  
> 
> RFC 5837 precedes IOAM and I don't see any references from IOAM to RFC
> 5837. RFC 5837 is pretty clear about the interface index that should be
> provided:
> 
> "The ifIndex of the interface of interest MAY be included. This is the
> 32-bit ifIndex assigned to the interface by the device as specified by
> the Interfaces Group MIB [RFC2863]".

Makes sense, thanks. And we have another 4 weeks to change our mind, 
in case someone from IETF pipes up..

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/3] icmp: Add RFC 5837 support
  2025-10-29  1:04 ` [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Jakub Kicinski
  2025-10-29  9:54   ` Ido Schimmel
@ 2025-10-30  1:38   ` Justin Iurman
  1 sibling, 0 replies; 15+ messages in thread
From: Justin Iurman @ 2025-10-30  1:38 UTC (permalink / raw)
  To: Jakub Kicinski, Ido Schimmel
  Cc: netdev, davem, pabeni, edumazet, horms, dsahern, petrm, willemb,
	daniel, fw, ishaangandhi, rbonica, tom

On 10/29/25 02:04, Jakub Kicinski wrote:
> On Mon, 27 Oct 2025 10:22:29 +0200 Ido Schimmel wrote:
>> This patchset extends certain ICMP error messages (e.g., "Time
>> Exceeded") with incoming interface information in accordance with RFC
>> 5837 [1]. This is required for more meaningful traceroute results in
>> unnumbered networks. Like other ICMP settings, the feature is controlled
>> via a per-{netns, address family} sysctl. The interface and the
>> implementation are designed to support more ICMP extensions.
> 
> Is there supposed to be any relation between the ICMP message attrs
> and what's provided via IOAM? For interface ID in IOAM we have
> the ioam6_id attr instead of ifindex.
> 
> Would it make sense to add some info about relation to IOAM to the
> commit msg (or even docs?). Or is it obvious to folks more familiar
> with IP RFCs than I am?
> 
> cc: Justin

I concur with what Ido said in his reply. There is no direct relation 
between them, unfortunately. The interface ID in IOAM context could be 
totally different, although one could see the benefit of having the same 
value.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/3] icmp: Add RFC 5837 support
  2025-10-27  8:22 [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Ido Schimmel
                   ` (3 preceding siblings ...)
  2025-10-29  1:04 ` [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Jakub Kicinski
@ 2025-10-30  1:50 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-10-30  1:50 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, pabeni, edumazet, horms, dsahern, petrm,
	willemb, daniel, fw, ishaangandhi, rbonica, tom

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 27 Oct 2025 10:22:29 +0200 you wrote:
> tl;dr
> =====
> 
> This patchset extends certain ICMP error messages (e.g., "Time
> Exceeded") with incoming interface information in accordance with RFC
> 5837 [1]. This is required for more meaningful traceroute results in
> unnumbered networks. Like other ICMP settings, the feature is controlled
> via a per-{netns, address family} sysctl. The interface and the
> implementation are designed to support more ICMP extensions.
> 
> [...]

Here is the summary with links:
  - [net-next,v2,1/3] ipv4: icmp: Add RFC 5837 support
    https://git.kernel.org/netdev/net-next/c/f0e7036fc9cb
  - [net-next,v2,2/3] ipv6: icmp: Add RFC 5837 support
    https://git.kernel.org/netdev/net-next/c/d12d04d221f8
  - [net-next,v2,3/3] selftests: traceroute: Add ICMP extensions tests
    https://git.kernel.org/netdev/net-next/c/02da59575183

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/3] icmp: Add RFC 5837 support
  2025-10-30  1:31     ` Jakub Kicinski
@ 2025-11-07 17:20       ` David 'equinox' Lamparter
  2025-11-08 15:39         ` Ido Schimmel
  0 siblings, 1 reply; 15+ messages in thread
From: David 'equinox' Lamparter @ 2025-11-07 17:20 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jakub Kicinski, netdev, davem, pabeni, edumazet, horms, dsahern,
	petrm, willemb, daniel, fw, ishaangandhi, rbonica, tom,
	Justin Iurman

On Wed, Oct 29, 2025 at 06:31:43PM -0700, Jakub Kicinski wrote:
> On Wed, 29 Oct 2025 11:54:43 +0200 Ido Schimmel wrote:
> > > Is there supposed to be any relation between the ICMP message attrs 
> > > and what's provided via IOAM? For interface ID in IOAM we have
> > > the ioam6_id attr instead of ifindex.  
> > 
> > RFC 5837 precedes IOAM and I don't see any references from IOAM to RFC
> > 5837. RFC 5837 is pretty clear about the interface index that should be
> > provided:
> > 
> > "The ifIndex of the interface of interest MAY be included. This is the
> > 32-bit ifIndex assigned to the interface by the device as specified by
> > the Interfaces Group MIB [RFC2863]".
> 
> Makes sense, thanks. And we have another 4 weeks to change our mind, 
> in case someone from IETF pipes up..

The IETF is in fact doing draft-ietf-intarea-extended-icmp-nodeid, which
is past last call.  The good news is that it's extremely similar,
different class value but same C-Type bitmask, the main distinction is
that 5837 had forbidden the use of "cross-address-family" addresses.

Note that for unnumbered networks, 5837 is wrong - it's
interface/nexthop information.  But the interface has no address, the
node does.  draft-ietf-intarea-extended-icmp-nodeid is about node
information and the correct thing to use for that case.

The good news is that the draft is past last call, IANA values have been
assigned, there's a bunch of text bashing going on but it's well into
the publishing process.

-David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 1/3] ipv4: icmp: Add RFC 5837 support
  2025-10-27  8:22 ` [PATCH net-next v2 1/3] ipv4: " Ido Schimmel
  2025-10-27  8:35   ` Eric Dumazet
@ 2025-11-07 17:33   ` David 'equinox' Lamparter
  2025-11-08 16:01     ` Ido Schimmel
  1 sibling, 1 reply; 15+ messages in thread
From: David 'equinox' Lamparter @ 2025-11-07 17:33 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, pabeni, edumazet, horms, dsahern, petrm,
	willemb, daniel, fw, ishaangandhi, rbonica, tom

On Mon, Oct 27, 2025 at 10:22:30AM +0200, Ido Schimmel wrote:
> +/* ICMP Extension Object Classes */
> +#define ICMP_EXT_OBJ_CLASS_IIO		2	/* RFC 5837 */
> +
> +/* Interface Information Object - RFC 5837 */
> +enum {
> +	ICMP_EXT_CTYPE_IIO_ROLE_IIF,
> +};

...

> +static __be32 icmp_ext_iio_addr4_find(const struct net_device *dev)
> +{
> +	struct in_device *in_dev;
> +	struct in_ifaddr *ifa;
> +
> +	in_dev = __in_dev_get_rcu(dev);
> +	if (!in_dev)
> +		return 0;
> +
> +	/* It is unclear from RFC 5837 which IP address should be chosen, but
> +	 * it makes sense to choose a global unicast address.
> +	 */
> +	in_dev_for_each_ifa_rcu(ifa, in_dev) {
> +		if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
> +			continue;
> +		if (ifa->ifa_scope != RT_SCOPE_UNIVERSE ||
> +		    ipv4_is_multicast(ifa->ifa_address))
> +			continue;
> +		return ifa->ifa_address;

For 5837, this should be an address identifying the interface.  This
sets up a rather tricky situation if there's a /32 configured on the
interface in the context of unnumbered operation.  Arguably, in that
case class 5 (node info) should be used rather than class 2 (interface
info).  Class 5 also allows sticking an IPv6 address in an ICMPv4 reply.

I would argue the logic here should be an order of preference:

1. any global non-/32 address on the interface, in a class 2 object
2. any global /32 on the interface, in a class 5 object
3. any global IPv6 on the interface, in a class 5 object
4. any global address from any interface in the VRF, preferring
   loopback, in a class 5 object (addrsel logic, really)

[class 5 is draft-ietf-intarea-extended-icmp-nodeid]

+ analog for IPv6

(cf. my other mail in the thread)

Cheers,

-equi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 0/3] icmp: Add RFC 5837 support
  2025-11-07 17:20       ` David 'equinox' Lamparter
@ 2025-11-08 15:39         ` Ido Schimmel
  0 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2025-11-08 15:39 UTC (permalink / raw)
  To: David 'equinox' Lamparter
  Cc: Jakub Kicinski, netdev, davem, pabeni, edumazet, horms, dsahern,
	petrm, willemb, daniel, fw, ishaangandhi, rbonica, tom,
	Justin Iurman

On Fri, Nov 07, 2025 at 06:20:57PM +0100, David 'equinox' Lamparter wrote:
> The IETF is in fact doing draft-ietf-intarea-extended-icmp-nodeid, which
> is past last call.  The good news is that it's extremely similar,
> different class value but same C-Type bitmask, the main distinction is
> that 5837 had forbidden the use of "cross-address-family" addresses.

I mentioned the node ID extension in both the cover letter and in my
reply to Willem:

https://lore.kernel.org/netdev/20251027082232.232571-1-idosch@nvidia.com/
https://lore.kernel.org/netdev/aPnw2PkF3ZMP9EJr@shredder/

> Note that for unnumbered networks, 5837 is wrong - it's
> interface/nexthop information.  But the interface has no address, the
> node does.  draft-ietf-intarea-extended-icmp-nodeid is about node
> information and the correct thing to use for that case.

RFC 5837 and draft-ietf-intarea-extended-icmp-nodeid solve different
problems.

The motivating use case for this work is a deployment where router /
infrastructure interfaces are only assigned IPv6 link-local addresses
and the loopback devices are assigned global IPv4 and IPv6 addresses. As
such, each node will generate ICMPv4 error messages with a source IP
that uniquely identifies the node.

draft-ietf-intarea-extended-icmp-nodeid is needed in cases where nodes
completely lack IPv4 addresses. In these cases all the nodes will
generate ICMPv4 error messages with the same source IP of 192.0.0.8
("IPv4 dummy address").

While this patchset does not add support for the node ID extension (as I
don't currently have a use case for it), the implementation does not
preclude it. After it is implemented, an administrator can then choose
to include both the incoming interface information and the node
identification in ICMP error messages.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v2 1/3] ipv4: icmp: Add RFC 5837 support
  2025-11-07 17:33   ` David 'equinox' Lamparter
@ 2025-11-08 16:01     ` Ido Schimmel
  0 siblings, 0 replies; 15+ messages in thread
From: Ido Schimmel @ 2025-11-08 16:01 UTC (permalink / raw)
  To: David 'equinox' Lamparter
  Cc: netdev, davem, kuba, pabeni, edumazet, horms, dsahern, petrm,
	willemb, daniel, fw, ishaangandhi, rbonica, tom

On Fri, Nov 07, 2025 at 06:33:30PM +0100, David 'equinox' Lamparter wrote:
> On Mon, Oct 27, 2025 at 10:22:30AM +0200, Ido Schimmel wrote:
> > +/* ICMP Extension Object Classes */
> > +#define ICMP_EXT_OBJ_CLASS_IIO		2	/* RFC 5837 */
> > +
> > +/* Interface Information Object - RFC 5837 */
> > +enum {
> > +	ICMP_EXT_CTYPE_IIO_ROLE_IIF,
> > +};
> 
> ...
> 
> > +static __be32 icmp_ext_iio_addr4_find(const struct net_device *dev)
> > +{
> > +	struct in_device *in_dev;
> > +	struct in_ifaddr *ifa;
> > +
> > +	in_dev = __in_dev_get_rcu(dev);
> > +	if (!in_dev)
> > +		return 0;
> > +
> > +	/* It is unclear from RFC 5837 which IP address should be chosen, but
> > +	 * it makes sense to choose a global unicast address.
> > +	 */
> > +	in_dev_for_each_ifa_rcu(ifa, in_dev) {
> > +		if (READ_ONCE(ifa->ifa_flags) & IFA_F_SECONDARY)
> > +			continue;
> > +		if (ifa->ifa_scope != RT_SCOPE_UNIVERSE ||
> > +		    ipv4_is_multicast(ifa->ifa_address))
> > +			continue;
> > +		return ifa->ifa_address;
> 
> For 5837, this should be an address identifying the interface.  This
> sets up a rather tricky situation if there's a /32 configured on the
> interface in the context of unnumbered operation.  Arguably, in that
> case class 5 (node info) should be used rather than class 2 (interface
> info).  Class 5 also allows sticking an IPv6 address in an ICMPv4 reply.

This patchset does not add support for class 5 objects and the above
code is strictly about choosing an address for the "IP Address
Sub-Object" in a class 2 object. Support for class 5 objects can be
added in a different patchset. An administrator can then choose to
include both objects in ICMP error messages.

The kernel does not avoid using a /32 as the source IP of ICMP error
messages, so I don't see a reason to avoid using such an address in the
"IP Address Sub-Object".

> 
> I would argue the logic here should be an order of preference:
> 
> 1. any global non-/32 address on the interface, in a class 2 object
> 2. any global /32 on the interface, in a class 5 object
> 3. any global IPv6 on the interface, in a class 5 object
> 4. any global address from any interface in the VRF, preferring
>    loopback, in a class 5 object (addrsel logic, really)
> 
> [class 5 is draft-ietf-intarea-extended-icmp-nodeid]
> 
> + analog for IPv6
> 
> (cf. my other mail in the thread)
> 
> Cheers,
> 
> 
> -equi

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-11-08 16:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-27  8:22 [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Ido Schimmel
2025-10-27  8:22 ` [PATCH net-next v2 1/3] ipv4: " Ido Schimmel
2025-10-27  8:35   ` Eric Dumazet
2025-11-07 17:33   ` David 'equinox' Lamparter
2025-11-08 16:01     ` Ido Schimmel
2025-10-27  8:22 ` [PATCH net-next v2 2/3] ipv6: " Ido Schimmel
2025-10-27  8:39   ` Eric Dumazet
2025-10-27  8:22 ` [PATCH net-next v2 3/3] selftests: traceroute: Add ICMP extensions tests Ido Schimmel
2025-10-29  1:04 ` [PATCH net-next v2 0/3] icmp: Add RFC 5837 support Jakub Kicinski
2025-10-29  9:54   ` Ido Schimmel
2025-10-30  1:31     ` Jakub Kicinski
2025-11-07 17:20       ` David 'equinox' Lamparter
2025-11-08 15:39         ` Ido Schimmel
2025-10-30  1:38   ` Justin Iurman
2025-10-30  1:50 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).