netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v6 0/5] net: gso: restore outer ip ids correctly
@ 2025-09-16 14:48 Richard Gobert
  2025-09-16 14:48 ` [PATCH net-next v6 1/5] net: gro: remove is_ipv6 from napi_gro_cb Richard Gobert
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Richard Gobert @ 2025-09-16 14:48 UTC (permalink / raw)
  To: netdev, pabeni, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers, Richard Gobert

GRO currently ignores outer IPv4 header IDs for encapsulated packets
that have their don't-fragment flag set. GSO, however, always assumes
that outer IP IDs are incrementing. This results in GSO mangling the
outer IDs when they aren't incrementing. For example, GSO mangles the
outer IDs of IPv6 packets that were converted to IPv4, which must
have an ID of 0 according to RFC 6145, sect. 5.1.

GRO+GSO is supposed to be entirely transparent by default. GSO already
correctly restores inner IDs and IDs of non-encapsulated packets. The
tx-tcp-mangleid-segmentation feature can be enabled to allow the
mangling of such IDs so that TSO can be used.

This series fixes outer ID restoration for encapsulated packets when
tx-tcp-mangleid-segmentation is disabled. It also allows GRO to merge
packets with fixed IDs that don't have their don't-fragment flag set.

v5 -> v6:
 - Fix typo
 - Fix formatting
 - Update comment and commit message

v4 -> v5:
 - Updated documentation and comments
 - Remove explicit inline keyword in fou_core.c
 - Fix reverse xmas tree formatting in ef100_tx.c
 - Remove added KSFT_MACHINE_SLOW check in selftest

v3 -> v4:
 - Specify that mangleid for outer ids cannot turn incrementing ids to fixed if DF is unset
 - Update segmentation-offload documentation
 - Fix setting fixed ids in ef100 TSO
 - Reformat gro_receive_network_flush again

v2 -> v3:
 - Make argument const in fou_gro_ops helper
 - Rename SKB_GSO_TCP_FIXEDID_OUTER to SKB_GSO_TCP_FIXEDID
 - Fix formatting in selftest, gro_receive_network_flush and tcp4_gro_complete

v1 -> v2:
 - Add fou_gro_ops helper
 - Clarify why sk_family check works
 - Fix ipip packet generation in selftest

Links:
 - v1: https://lore.kernel.org/netdev/20250814114030.7683-1-richardbgobert@gmail.com/
 - v2: https://lore.kernel.org/netdev/20250819063223.5239-1-richardbgobert@gmail.com/
 - v3: https://lore.kernel.org/netdev/20250821073047.2091-1-richardbgobert@gmail.com/
 - v4: https://lore.kernel.org/netdev/20250901113826.6508-1-richardbgobert@gmail.com/
 - v5: https://lore.kernel.org/netdev/20250915113933.3293-1-richardbgobert@gmail.com/

Richard Gobert (5):
  net: gro: remove is_ipv6 from napi_gro_cb
  net: gro: only merge packets with incrementing or fixed outer ids
  net: gso: restore ids of outer ip headers correctly
  net: gro: remove unnecessary df checks
  selftests/net: test ipip packets in gro.sh

 .../networking/segmentation-offloads.rst      | 22 ++++---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  8 ++-
 drivers/net/ethernet/sfc/ef100_tx.c           | 17 ++++--
 include/linux/netdevice.h                     |  9 ++-
 include/linux/skbuff.h                        |  8 ++-
 include/net/gro.h                             | 32 ++++------
 net/core/dev.c                                |  8 ++-
 net/ipv4/af_inet.c                            | 10 +---
 net/ipv4/fou_core.c                           | 32 +++++-----
 net/ipv4/udp_offload.c                        |  2 -
 net/ipv6/udp_offload.c                        |  2 -
 tools/testing/selftests/net/gro.c             | 58 ++++++++++++++-----
 tools/testing/selftests/net/gro.sh            |  2 +-
 13 files changed, 126 insertions(+), 84 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next v6 1/5] net: gro: remove is_ipv6 from napi_gro_cb
  2025-09-16 14:48 [PATCH net-next v6 0/5] net: gso: restore outer ip ids correctly Richard Gobert
@ 2025-09-16 14:48 ` Richard Gobert
  2025-09-16 14:48 ` [PATCH net-next v6 2/5] net: gro: only merge packets with incrementing or fixed outer ids Richard Gobert
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Richard Gobert @ 2025-09-16 14:48 UTC (permalink / raw)
  To: netdev, pabeni, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers, Richard Gobert, Willem de Bruijn

Remove is_ipv6 from napi_gro_cb and use sk->sk_family instead.
This frees up space for another ip_fixedid bit that will be added
in the next commit.

udp_sock_create always creates either a AF_INET or a AF_INET6 socket,
so using sk->sk_family is reliable. In IPv6-FOU, cfg->ipv6_v6only is
always enabled.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 include/net/gro.h      |  3 ---
 net/ipv4/fou_core.c    | 32 ++++++++++++++------------------
 net/ipv4/udp_offload.c |  2 --
 net/ipv6/udp_offload.c |  2 --
 4 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index a0fca7ac6e7e..87c68007f949 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -71,9 +71,6 @@ struct napi_gro_cb {
 		/* Free the skb? */
 		u8	free:2;
 
-		/* Used in foo-over-udp, set in udp[46]_gro_receive */
-		u8	is_ipv6:1;
-
 		/* Used in GRE, set in fou/gue_gro_receive */
 		u8	is_fou:1;
 
diff --git a/net/ipv4/fou_core.c b/net/ipv4/fou_core.c
index 3e30745e2c09..3970b6b7ace5 100644
--- a/net/ipv4/fou_core.c
+++ b/net/ipv4/fou_core.c
@@ -228,21 +228,27 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
+static const struct net_offload *fou_gro_ops(const struct sock *sk,
+					     int proto)
+{
+	const struct net_offload __rcu **offloads;
+
+	/* FOU doesn't allow IPv4 on IPv6 sockets. */
+	offloads = sk->sk_family == AF_INET6 ? inet6_offloads : inet_offloads;
+	return rcu_dereference(offloads[proto]);
+}
+
 static struct sk_buff *fou_gro_receive(struct sock *sk,
 				       struct list_head *head,
 				       struct sk_buff *skb)
 {
-	const struct net_offload __rcu **offloads;
 	struct fou *fou = fou_from_sock(sk);
 	const struct net_offload *ops;
 	struct sk_buff *pp = NULL;
-	u8 proto;
 
 	if (!fou)
 		goto out;
 
-	proto = fou->protocol;
-
 	/* We can clear the encap_mark for FOU as we are essentially doing
 	 * one of two possible things.  We are either adding an L4 tunnel
 	 * header to the outer L3 tunnel header, or we are simply
@@ -254,8 +260,7 @@ static struct sk_buff *fou_gro_receive(struct sock *sk,
 	/* Flag this frame as already having an outer encap header */
 	NAPI_GRO_CB(skb)->is_fou = 1;
 
-	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
+	ops = fou_gro_ops(sk, fou->protocol);
 	if (!ops || !ops->callbacks.gro_receive)
 		goto out;
 
@@ -268,10 +273,8 @@ static struct sk_buff *fou_gro_receive(struct sock *sk,
 static int fou_gro_complete(struct sock *sk, struct sk_buff *skb,
 			    int nhoff)
 {
-	const struct net_offload __rcu **offloads;
 	struct fou *fou = fou_from_sock(sk);
 	const struct net_offload *ops;
-	u8 proto;
 	int err;
 
 	if (!fou) {
@@ -279,10 +282,7 @@ static int fou_gro_complete(struct sock *sk, struct sk_buff *skb,
 		goto out;
 	}
 
-	proto = fou->protocol;
-
-	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
+	ops = fou_gro_ops(sk, fou->protocol);
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete)) {
 		err = -ENOSYS;
 		goto out;
@@ -323,7 +323,6 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
 				       struct list_head *head,
 				       struct sk_buff *skb)
 {
-	const struct net_offload __rcu **offloads;
 	const struct net_offload *ops;
 	struct sk_buff *pp = NULL;
 	struct sk_buff *p;
@@ -450,8 +449,7 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
 	/* Flag this frame as already having an outer encap header */
 	NAPI_GRO_CB(skb)->is_fou = 1;
 
-	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
+	ops = fou_gro_ops(sk, proto);
 	if (!ops || !ops->callbacks.gro_receive)
 		goto out;
 
@@ -467,7 +465,6 @@ static struct sk_buff *gue_gro_receive(struct sock *sk,
 static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 {
 	struct guehdr *guehdr = (struct guehdr *)(skb->data + nhoff);
-	const struct net_offload __rcu **offloads;
 	const struct net_offload *ops;
 	unsigned int guehlen = 0;
 	u8 proto;
@@ -494,8 +491,7 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 		return err;
 	}
 
-	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
+	ops = fou_gro_ops(sk, proto);
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
 		goto out;
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index b1f3fd302e9d..19d0b5b09ffa 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -891,8 +891,6 @@ struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb)
 		skb_gro_checksum_try_convert(skb, IPPROTO_UDP,
 					     inet_gro_compute_pseudo);
 skip:
-	NAPI_GRO_CB(skb)->is_ipv6 = 0;
-
 	if (static_branch_unlikely(&udp_encap_needed_key))
 		sk = udp4_gro_lookup_skb(skb, uh->source, uh->dest);
 
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index d8445ac1b2e4..046f13b1d77a 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -154,8 +154,6 @@ struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 					     ip6_gro_compute_pseudo);
 
 skip:
-	NAPI_GRO_CB(skb)->is_ipv6 = 1;
-
 	if (static_branch_unlikely(&udpv6_encap_needed_key))
 		sk = udp6_gro_lookup_skb(skb, uh->source, uh->dest);
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v6 2/5] net: gro: only merge packets with incrementing or fixed outer ids
  2025-09-16 14:48 [PATCH net-next v6 0/5] net: gso: restore outer ip ids correctly Richard Gobert
  2025-09-16 14:48 ` [PATCH net-next v6 1/5] net: gro: remove is_ipv6 from napi_gro_cb Richard Gobert
@ 2025-09-16 14:48 ` Richard Gobert
  2025-09-22 19:28   ` Willem de Bruijn
  2025-09-16 14:48 ` [PATCH net-next v6 3/5] net: gso: restore ids of outer ip headers correctly Richard Gobert
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Richard Gobert @ 2025-09-16 14:48 UTC (permalink / raw)
  To: netdev, pabeni, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers, Richard Gobert

Only merge encapsulated packets if their outer IDs are either
incrementing or fixed, just like for inner IDs and IDs of non-encapsulated
packets.

Add another ip_fixedid bit for a total of two bits: one for outer IDs (and
for unencapsulated packets) and one for inner IDs.

This commit preserves the current behavior of GSO where only the IDs of the
inner-most headers are restored correctly.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 include/net/gro.h      | 26 +++++++++++---------------
 net/ipv4/tcp_offload.c |  5 ++++-
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index 87c68007f949..e7997a9fb30b 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -75,7 +75,7 @@ struct napi_gro_cb {
 		u8	is_fou:1;
 
 		/* Used to determine if ipid_offset can be ignored */
-		u8	ip_fixedid:1;
+		u8	ip_fixedid:2;
 
 		/* Number of gro_receive callbacks this packet already went through */
 		u8 recursion_counter:4;
@@ -442,29 +442,26 @@ static inline __wsum ip6_gro_compute_pseudo(const struct sk_buff *skb,
 }
 
 static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
-				 struct sk_buff *p, bool outer)
+				 struct sk_buff *p, bool inner)
 {
 	const u32 id = ntohl(*(__be32 *)&iph->id);
 	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
 	const u16 ipid_offset = (id >> 16) - (id2 >> 16);
 	const u16 count = NAPI_GRO_CB(p)->count;
 	const u32 df = id & IP_DF;
-	int flush;
 
 	/* All fields must match except length and checksum. */
-	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
-
-	if (flush | (outer && df))
-		return flush;
+	if ((iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF)))
+		return true;
 
 	/* When we receive our second frame we can make a decision on if we
 	 * continue this flow as an atomic flow with a fixed ID or if we use
 	 * an incrementing ID.
 	 */
 	if (count == 1 && df && !ipid_offset)
-		NAPI_GRO_CB(p)->ip_fixedid = true;
+		NAPI_GRO_CB(p)->ip_fixedid |= 1 << inner;
 
-	return ipid_offset ^ (count * !NAPI_GRO_CB(p)->ip_fixedid);
+	return ipid_offset ^ (count * !(NAPI_GRO_CB(p)->ip_fixedid & (1 << inner)));
 }
 
 static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2)
@@ -479,7 +476,7 @@ static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr
 
 static inline int __gro_receive_network_flush(const void *th, const void *th2,
 					      struct sk_buff *p, const u16 diff,
-					      bool outer)
+					      bool inner)
 {
 	const void *nh = th - diff;
 	const void *nh2 = th2 - diff;
@@ -487,19 +484,18 @@ static inline int __gro_receive_network_flush(const void *th, const void *th2,
 	if (((struct iphdr *)nh)->version == 6)
 		return ipv6_gro_flush(nh, nh2);
 	else
-		return inet_gro_flush(nh, nh2, p, outer);
+		return inet_gro_flush(nh, nh2, p, inner);
 }
 
 static inline int gro_receive_network_flush(const void *th, const void *th2,
 					    struct sk_buff *p)
 {
-	const bool encap_mark = NAPI_GRO_CB(p)->encap_mark;
 	int off = skb_transport_offset(p);
 	int flush;
 
-	flush = __gro_receive_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offset, encap_mark);
-	if (encap_mark)
-		flush |= __gro_receive_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->inner_network_offset, false);
+	flush = __gro_receive_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offset, false);
+	if (NAPI_GRO_CB(p)->encap_mark)
+		flush |= __gro_receive_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->inner_network_offset, true);
 
 	return flush;
 }
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index e6612bd84d09..1949eede9ec9 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -471,6 +471,7 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
 	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
 	struct tcphdr *th = tcp_hdr(skb);
+	bool is_fixedid;
 
 	if (unlikely(NAPI_GRO_CB(skb)->is_flist)) {
 		skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV4;
@@ -484,8 +485,10 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 	th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr,
 				  iph->daddr, 0);
 
+	is_fixedid = (NAPI_GRO_CB(skb)->ip_fixedid >> skb->encapsulation) & 1;
+
 	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4 |
-			(NAPI_GRO_CB(skb)->ip_fixedid * SKB_GSO_TCP_FIXEDID);
+			(is_fixedid * SKB_GSO_TCP_FIXEDID);
 
 	tcp_gro_complete(skb);
 	return 0;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v6 3/5] net: gso: restore ids of outer ip headers correctly
  2025-09-16 14:48 [PATCH net-next v6 0/5] net: gso: restore outer ip ids correctly Richard Gobert
  2025-09-16 14:48 ` [PATCH net-next v6 1/5] net: gro: remove is_ipv6 from napi_gro_cb Richard Gobert
  2025-09-16 14:48 ` [PATCH net-next v6 2/5] net: gro: only merge packets with incrementing or fixed outer ids Richard Gobert
@ 2025-09-16 14:48 ` Richard Gobert
  2025-09-22 19:35   ` Willem de Bruijn
  2025-09-16 14:48 ` [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks Richard Gobert
  2025-09-16 14:48 ` [PATCH net-next v6 5/5] selftests/net: test ipip packets in gro.sh Richard Gobert
  4 siblings, 1 reply; 15+ messages in thread
From: Richard Gobert @ 2025-09-16 14:48 UTC (permalink / raw)
  To: netdev, pabeni, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers, Richard Gobert

Currently, NETIF_F_TSO_MANGLEID indicates that the inner-most ID can
be mangled. Outer IDs can always be mangled.

Make GSO preserve outer IDs by default, with NETIF_F_TSO_MANGLEID allowing
both inner and outer IDs to be mangled.

This commit also modifies a few drivers that use SKB_GSO_FIXEDID directly.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc
---
 .../networking/segmentation-offloads.rst      | 22 ++++++++++++-------
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  8 +++++--
 drivers/net/ethernet/sfc/ef100_tx.c           | 17 ++++++++++----
 include/linux/netdevice.h                     |  9 ++++++--
 include/linux/skbuff.h                        |  8 ++++++-
 net/core/dev.c                                |  8 +++++--
 net/ipv4/af_inet.c                            | 13 +++++------
 net/ipv4/tcp_offload.c                        |  5 +----
 8 files changed, 60 insertions(+), 30 deletions(-)

diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst
index 085e8fab03fd..72f69b22b28c 100644
--- a/Documentation/networking/segmentation-offloads.rst
+++ b/Documentation/networking/segmentation-offloads.rst
@@ -43,10 +43,19 @@ also point to the TCP header of the packet.
 For IPv4 segmentation we support one of two types in terms of the IP ID.
 The default behavior is to increment the IP ID with every segment.  If the
 GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
-ID and all segments will use the same IP ID.  If a device has
-NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
-and we will either increment the IP ID for all frames, or leave it at a
-static value based on driver preference.
+ID and all segments will use the same IP ID.
+
+For encapsulated packets, SKB_GSO_TCP_FIXEDID refers only to the outer header.
+SKB_GSO_TCP_FIXEDID_INNER can be used to specify the same for the inner header.
+Any combination of these two GSO types is allowed.
+
+If a device has NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when
+performing TSO and we will either increment the IP ID for all frames, or leave
+it at a static value based on driver preference.  For encapsulated packets,
+NETIF_F_TSO_MANGLEID is relevant for both outer and inner headers, unless the
+DF bit is not set on the outer header, in which case the device driver must
+guarantee that the IP ID field is incremented in the outer header with every
+segment.
 
 
 UDP Fragmentation Offload
@@ -124,10 +133,7 @@ Generic Receive Offload
 Generic receive offload is the complement to GSO.  Ideally any frame
 assembled by GRO should be segmented to create an identical sequence of
 frames using GSO, and any sequence of frames segmented by GSO should be
-able to be reassembled back to the original by GRO.  The only exception to
-this is IPv4 ID in the case that the DF bit is set for a given IP header.
-If the value of the IPv4 ID is not sequentially incrementing it will be
-altered so that it is when a frame assembled via GRO is segmented via GSO.
+able to be reassembled back to the original by GRO.
 
 
 Partial Generic Segmentation Offload
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b8c609d91d11..480f66e21132 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1289,8 +1289,12 @@ static void mlx5e_shampo_update_ipv4_tcp_hdr(struct mlx5e_rq *rq, struct iphdr *
 	tcp->check = ~tcp_v4_check(skb->len - tcp_off, ipv4->saddr,
 				   ipv4->daddr, 0);
 	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4;
-	if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id)
-		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
+	if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id) {
+		bool encap = rq->hw_gro_data->fk.control.flags & FLOW_DIS_ENCAPSULATION;
+
+		skb_shinfo(skb)->gso_type |= encap ? SKB_GSO_TCP_FIXEDID_INNER :
+						     SKB_GSO_TCP_FIXEDID;
+	}
 
 	skb->csum_start = (unsigned char *)tcp - skb->head;
 	skb->csum_offset = offsetof(struct tcphdr, check);
diff --git a/drivers/net/ethernet/sfc/ef100_tx.c b/drivers/net/ethernet/sfc/ef100_tx.c
index e6b6be549581..03005757c060 100644
--- a/drivers/net/ethernet/sfc/ef100_tx.c
+++ b/drivers/net/ethernet/sfc/ef100_tx.c
@@ -189,6 +189,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
 {
 	bool gso_partial = skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL;
 	unsigned int len, ip_offset, tcp_offset, payload_segs;
+	u32 mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
 	u32 mangleid = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
 	unsigned int outer_ip_offset, outer_l4_offset;
 	u16 vlan_tci = skb_vlan_tag_get(skb);
@@ -200,8 +201,17 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
 	bool outer_csum;
 	u32 paylen;
 
-	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
-		mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
+	if (encap) {
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID_INNER)
+			mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
+			mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
+	} else {
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
+			mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
+		mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
+	}
+
 	if (efx->net_dev->features & NETIF_F_HW_VLAN_CTAG_TX)
 		vlan_enable = skb_vlan_tag_present(skb);
 
@@ -245,8 +255,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
 			      ESF_GZ_TX_TSO_OUTER_L4_OFF_W, outer_l4_offset >> 1,
 			      ESF_GZ_TX_TSO_ED_OUTER_UDP_LEN, udp_encap && !gso_partial,
 			      ESF_GZ_TX_TSO_ED_OUTER_IP_LEN, encap && !gso_partial,
-			      ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, encap ? mangleid :
-								     ESE_GZ_TX_DESC_IP4_ID_NO_OP,
+			      ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, mangleid_outer,
 			      ESF_GZ_TX_TSO_VLAN_INSERT_EN, vlan_enable,
 			      ESF_GZ_TX_TSO_VLAN_INSERT_TCI, vlan_tci
 		);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f3a3b761abfb..3d19c888b839 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -5290,13 +5290,18 @@ void skb_warn_bad_offload(const struct sk_buff *skb);
 
 static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 {
-	netdev_features_t feature = (netdev_features_t)gso_type << NETIF_F_GSO_SHIFT;
+	netdev_features_t feature;
+
+	if (gso_type & (SKB_GSO_TCP_FIXEDID | SKB_GSO_TCP_FIXEDID_INNER))
+		gso_type |= __SKB_GSO_TCP_FIXEDID;
+
+	feature = ((netdev_features_t)gso_type << NETIF_F_GSO_SHIFT) & NETIF_F_GSO_MASK;
 
 	/* check flags correspondence */
 	BUILD_BUG_ON(SKB_GSO_TCPV4   != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_DODGY   != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
-	BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
+	BUILD_BUG_ON(__SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_TCPV6   != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_FCOE    != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_GRE     != (NETIF_F_GSO_GRE >> NETIF_F_GSO_SHIFT));
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ca8be45dd8be..646fb66ba948 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -674,7 +674,7 @@ enum {
 	/* This indicates the tcp segment has CWR set. */
 	SKB_GSO_TCP_ECN = 1 << 2,
 
-	SKB_GSO_TCP_FIXEDID = 1 << 3,
+	__SKB_GSO_TCP_FIXEDID = 1 << 3,
 
 	SKB_GSO_TCPV6 = 1 << 4,
 
@@ -707,6 +707,12 @@ enum {
 	SKB_GSO_FRAGLIST = 1 << 18,
 
 	SKB_GSO_TCP_ACCECN = 1 << 19,
+
+	/* These indirectly map onto the same netdev feature.
+	 * If NETIF_F_TSO_MANGLEID is set it may mangle both inner and outer IDs.
+	 */
+	SKB_GSO_TCP_FIXEDID = 1 << 30,
+	SKB_GSO_TCP_FIXEDID_INNER = 1 << 31,
 };
 
 #if BITS_PER_LONG > 32
diff --git a/net/core/dev.c b/net/core/dev.c
index 93a25d87b86b..6b34b3e857d4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3768,8 +3768,12 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
 	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL))
 		features &= ~dev->gso_partial_features;
 
-	/* Make sure to clear the IPv4 ID mangling feature if the
-	 * IPv4 header has the potential to be fragmented.
+	/* Make sure to clear the IPv4 ID mangling feature if the IPv4 header
+	 * has the potential to be fragmented. For encapsulated packets, the ID
+	 * mangling feature is guaranteed not to use the same ID for the outer
+	 * IPv4 headers of the generated segments if the headers have the
+	 * potential to be fragmented, so there is no need to clear the IPv4 ID
+	 * mangling feature.
 	 */
 	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
 		struct iphdr *iph = skb->encapsulation ?
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 76e38092cd8a..fc7a6955fa0a 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1393,14 +1393,13 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 
 	segs = ERR_PTR(-EPROTONOSUPPORT);
 
-	if (!skb->encapsulation || encap) {
-		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
-		fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID);
+	/* fixed ID is invalid if DF bit is not set */
+	fixedid = !!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCP_FIXEDID << encap));
+	if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
+		goto out;
 
-		/* fixed ID is invalid if DF bit is not set */
-		if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
-			goto out;
-	}
+	if (!skb->encapsulation || encap)
+		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
 
 	ops = rcu_dereference(inet_offloads[proto]);
 	if (likely(ops && ops->callbacks.gso_segment)) {
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 1949eede9ec9..e6612bd84d09 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -471,7 +471,6 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
 	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
 	struct tcphdr *th = tcp_hdr(skb);
-	bool is_fixedid;
 
 	if (unlikely(NAPI_GRO_CB(skb)->is_flist)) {
 		skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV4;
@@ -485,10 +484,8 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 	th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr,
 				  iph->daddr, 0);
 
-	is_fixedid = (NAPI_GRO_CB(skb)->ip_fixedid >> skb->encapsulation) & 1;
-
 	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4 |
-			(is_fixedid * SKB_GSO_TCP_FIXEDID);
+			(NAPI_GRO_CB(skb)->ip_fixedid * SKB_GSO_TCP_FIXEDID);
 
 	tcp_gro_complete(skb);
 	return 0;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks
  2025-09-16 14:48 [PATCH net-next v6 0/5] net: gso: restore outer ip ids correctly Richard Gobert
                   ` (2 preceding siblings ...)
  2025-09-16 14:48 ` [PATCH net-next v6 3/5] net: gso: restore ids of outer ip headers correctly Richard Gobert
@ 2025-09-16 14:48 ` Richard Gobert
  2025-09-18  8:10   ` Paolo Abeni
  2025-09-16 14:48 ` [PATCH net-next v6 5/5] selftests/net: test ipip packets in gro.sh Richard Gobert
  4 siblings, 1 reply; 15+ messages in thread
From: Richard Gobert @ 2025-09-16 14:48 UTC (permalink / raw)
  To: netdev, pabeni, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers, Richard Gobert

Currently, packets with fixed IDs will be merged only if their
don't-fragment bit is set. This restriction is unnecessary since
packets without the don't-fragment bit will be forwarded as-is even
if they were merged together. The merged packets will be segmented
into their original forms before being forwarded, either by GSO or
by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
is set, in which case the IDs can become incrementing, which is also fine.

Note that IP fragmentation is not an issue here, since packets are
segmented before being further fragmented. Fragmentation happens the
same way regardless of whether the packets were first merged together.

Clean up the code by removing the unnecessary don't-fragment checks.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 include/net/gro.h                 | 5 ++---
 net/ipv4/af_inet.c                | 3 ---
 tools/testing/selftests/net/gro.c | 9 ++++-----
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/net/gro.h b/include/net/gro.h
index e7997a9fb30b..e3affb2e2ca8 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -448,17 +448,16 @@ static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *ip
 	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
 	const u16 ipid_offset = (id >> 16) - (id2 >> 16);
 	const u16 count = NAPI_GRO_CB(p)->count;
-	const u32 df = id & IP_DF;
 
 	/* All fields must match except length and checksum. */
-	if ((iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF)))
+	if ((iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | ((id ^ id2) & IP_DF))
 		return true;
 
 	/* When we receive our second frame we can make a decision on if we
 	 * continue this flow as an atomic flow with a fixed ID or if we use
 	 * an incrementing ID.
 	 */
-	if (count == 1 && df && !ipid_offset)
+	if (count == 1 && !ipid_offset)
 		NAPI_GRO_CB(p)->ip_fixedid |= 1 << inner;
 
 	return ipid_offset ^ (count * !(NAPI_GRO_CB(p)->ip_fixedid & (1 << inner)));
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index fc7a6955fa0a..c0542d9187e2 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1393,10 +1393,7 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 
 	segs = ERR_PTR(-EPROTONOSUPPORT);
 
-	/* fixed ID is invalid if DF bit is not set */
 	fixedid = !!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCP_FIXEDID << encap));
-	if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
-		goto out;
 
 	if (!skb->encapsulation || encap)
 		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index d5824eadea10..3d4a82a2607c 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -670,7 +670,7 @@ static void send_flush_id_case(int fd, struct sockaddr_ll *daddr, int tcase)
 		iph2->id = htons(9);
 		break;
 
-	case 3: /* DF=0, Fixed - should not coalesce */
+	case 3: /* DF=0, Fixed - should coalesce */
 		iph1->frag_off &= ~htons(IP_DF);
 		iph1->id = htons(8);
 
@@ -1188,10 +1188,9 @@ static void gro_receiver(void)
 			correct_payload[0] = PAYLOAD_LEN * 2;
 			check_recv_pkts(rxfd, correct_payload, 1);
 
-			printf("DF=0, Fixed - should not coalesce: ");
-			correct_payload[0] = PAYLOAD_LEN;
-			correct_payload[1] = PAYLOAD_LEN;
-			check_recv_pkts(rxfd, correct_payload, 2);
+			printf("DF=0, Fixed - should coalesce: ");
+			correct_payload[0] = PAYLOAD_LEN * 2;
+			check_recv_pkts(rxfd, correct_payload, 1);
 
 			printf("DF=1, 2 Incrementing and one fixed - should coalesce only first 2 packets: ");
 			correct_payload[0] = PAYLOAD_LEN * 2;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next v6 5/5] selftests/net: test ipip packets in gro.sh
  2025-09-16 14:48 [PATCH net-next v6 0/5] net: gso: restore outer ip ids correctly Richard Gobert
                   ` (3 preceding siblings ...)
  2025-09-16 14:48 ` [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks Richard Gobert
@ 2025-09-16 14:48 ` Richard Gobert
  4 siblings, 0 replies; 15+ messages in thread
From: Richard Gobert @ 2025-09-16 14:48 UTC (permalink / raw)
  To: netdev, pabeni, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers, Richard Gobert, Willem de Bruijn

Add IPIP test-cases to the GRO selftest.

This selftest already contains IP ID test-cases. They are now
also tested for encapsulated packets.

This commit also fixes ipip packet generation in the test.

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/net/gro.c  | 49 ++++++++++++++++++++++++------
 tools/testing/selftests/net/gro.sh |  2 +-
 2 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index 3d4a82a2607c..2b1d9f2b3e9e 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -93,6 +93,7 @@ static bool tx_socket = true;
 static int tcp_offset = -1;
 static int total_hdr_len = -1;
 static int ethhdr_proto = -1;
+static bool ipip;
 static const int num_flush_id_cases = 6;
 
 static void vlog(const char *fmt, ...)
@@ -114,7 +115,9 @@ static void setup_sock_filter(int fd)
 	int ipproto_off, opt_ipproto_off;
 	int next_off;
 
-	if (proto == PF_INET)
+	if (ipip)
+		next_off = sizeof(struct iphdr) + offsetof(struct iphdr, protocol);
+	else if (proto == PF_INET)
 		next_off = offsetof(struct iphdr, protocol);
 	else
 		next_off = offsetof(struct ipv6hdr, nexthdr);
@@ -244,7 +247,7 @@ static void fill_datalinklayer(void *buf)
 	eth->h_proto = ethhdr_proto;
 }
 
-static void fill_networklayer(void *buf, int payload_len)
+static void fill_networklayer(void *buf, int payload_len, int protocol)
 {
 	struct ipv6hdr *ip6h = buf;
 	struct iphdr *iph = buf;
@@ -254,7 +257,7 @@ static void fill_networklayer(void *buf, int payload_len)
 
 		ip6h->version = 6;
 		ip6h->payload_len = htons(sizeof(struct tcphdr) + payload_len);
-		ip6h->nexthdr = IPPROTO_TCP;
+		ip6h->nexthdr = protocol;
 		ip6h->hop_limit = 8;
 		if (inet_pton(AF_INET6, addr6_src, &ip6h->saddr) != 1)
 			error(1, errno, "inet_pton source ip6");
@@ -266,7 +269,7 @@ static void fill_networklayer(void *buf, int payload_len)
 		iph->version = 4;
 		iph->ihl = 5;
 		iph->ttl = 8;
-		iph->protocol	= IPPROTO_TCP;
+		iph->protocol	= protocol;
 		iph->tot_len = htons(sizeof(struct tcphdr) +
 				payload_len + sizeof(struct iphdr));
 		iph->frag_off = htons(0x4000); /* DF = 1, MF = 0 */
@@ -313,9 +316,19 @@ static void create_packet(void *buf, int seq_offset, int ack_offset,
 {
 	memset(buf, 0, total_hdr_len);
 	memset(buf + total_hdr_len, 'a', payload_len);
+
 	fill_transportlayer(buf + tcp_offset, seq_offset, ack_offset,
 			    payload_len, fin);
-	fill_networklayer(buf + ETH_HLEN, payload_len);
+
+	if (ipip) {
+		fill_networklayer(buf + ETH_HLEN, payload_len + sizeof(struct iphdr),
+				  IPPROTO_IPIP);
+		fill_networklayer(buf + ETH_HLEN + sizeof(struct iphdr),
+				  payload_len, IPPROTO_TCP);
+	} else {
+		fill_networklayer(buf + ETH_HLEN, payload_len, IPPROTO_TCP);
+	}
+
 	fill_datalinklayer(buf);
 }
 
@@ -416,6 +429,13 @@ static void recompute_packet(char *buf, char *no_ext, int extlen)
 		iph->tot_len = htons(ntohs(iph->tot_len) + extlen);
 		iph->check = 0;
 		iph->check = checksum_fold(iph, sizeof(struct iphdr), 0);
+
+		if (ipip) {
+			iph += 1;
+			iph->tot_len = htons(ntohs(iph->tot_len) + extlen);
+			iph->check = 0;
+			iph->check = checksum_fold(iph, sizeof(struct iphdr), 0);
+		}
 	} else {
 		ip6h->payload_len = htons(ntohs(ip6h->payload_len) + extlen);
 	}
@@ -777,7 +797,7 @@ static void send_fragment4(int fd, struct sockaddr_ll *daddr)
 	 */
 	memset(buf + total_hdr_len, 'a', PAYLOAD_LEN * 2);
 	fill_transportlayer(buf + tcp_offset, PAYLOAD_LEN, 0, PAYLOAD_LEN * 2, 0);
-	fill_networklayer(buf + ETH_HLEN, PAYLOAD_LEN);
+	fill_networklayer(buf + ETH_HLEN, PAYLOAD_LEN, IPPROTO_TCP);
 	fill_datalinklayer(buf);
 
 	iph->frag_off = htons(0x6000); // DF = 1, MF = 1
@@ -1071,7 +1091,7 @@ static void gro_sender(void)
 		 * and min ipv6hdr size. Like MAX_HDR_SIZE,
 		 * MAX_PAYLOAD is defined with the larger header of the two.
 		 */
-		int offset = proto == PF_INET ? 20 : 0;
+		int offset = (proto == PF_INET && !ipip) ? 20 : 0;
 		int remainder = (MAX_PAYLOAD + offset) % MSS;
 
 		send_large(txfd, &daddr, remainder);
@@ -1221,7 +1241,7 @@ static void gro_receiver(void)
 			check_recv_pkts(rxfd, correct_payload, 2);
 		}
 	} else if (strcmp(testname, "large") == 0) {
-		int offset = proto == PF_INET ? 20 : 0;
+		int offset = (proto == PF_INET && !ipip) ? 20 : 0;
 		int remainder = (MAX_PAYLOAD + offset) % MSS;
 
 		correct_payload[0] = (MAX_PAYLOAD + offset);
@@ -1250,6 +1270,7 @@ static void parse_args(int argc, char **argv)
 		{ "iface", required_argument, NULL, 'i' },
 		{ "ipv4", no_argument, NULL, '4' },
 		{ "ipv6", no_argument, NULL, '6' },
+		{ "ipip", no_argument, NULL, 'e' },
 		{ "rx", no_argument, NULL, 'r' },
 		{ "saddr", required_argument, NULL, 's' },
 		{ "smac", required_argument, NULL, 'S' },
@@ -1259,7 +1280,7 @@ static void parse_args(int argc, char **argv)
 	};
 	int c;
 
-	while ((c = getopt_long(argc, argv, "46d:D:i:rs:S:t:v", opts, NULL)) != -1) {
+	while ((c = getopt_long(argc, argv, "46d:D:ei:rs:S:t:v", opts, NULL)) != -1) {
 		switch (c) {
 		case '4':
 			proto = PF_INET;
@@ -1269,6 +1290,11 @@ static void parse_args(int argc, char **argv)
 			proto = PF_INET6;
 			ethhdr_proto = htons(ETH_P_IPV6);
 			break;
+		case 'e':
+			ipip = true;
+			proto = PF_INET;
+			ethhdr_proto = htons(ETH_P_IP);
+			break;
 		case 'd':
 			addr4_dst = addr6_dst = optarg;
 			break;
@@ -1304,7 +1330,10 @@ int main(int argc, char **argv)
 {
 	parse_args(argc, argv);
 
-	if (proto == PF_INET) {
+	if (ipip) {
+		tcp_offset = ETH_HLEN + sizeof(struct iphdr) * 2;
+		total_hdr_len = tcp_offset + sizeof(struct tcphdr);
+	} else if (proto == PF_INET) {
 		tcp_offset = ETH_HLEN + sizeof(struct iphdr);
 		total_hdr_len = tcp_offset + sizeof(struct tcphdr);
 	} else if (proto == PF_INET6) {
diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
index 9e3f186bc2a1..4c5144c6f652 100755
--- a/tools/testing/selftests/net/gro.sh
+++ b/tools/testing/selftests/net/gro.sh
@@ -4,7 +4,7 @@
 readonly SERVER_MAC="aa:00:00:00:00:02"
 readonly CLIENT_MAC="aa:00:00:00:00:01"
 readonly TESTS=("data" "ack" "flags" "tcp" "ip" "large")
-readonly PROTOS=("ipv4" "ipv6")
+readonly PROTOS=("ipv4" "ipv6" "ipip")
 dev=""
 test="all"
 proto="ipv4"
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks
  2025-09-16 14:48 ` [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks Richard Gobert
@ 2025-09-18  8:10   ` Paolo Abeni
  2025-09-18 14:01     ` Richard Gobert
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2025-09-18  8:10 UTC (permalink / raw)
  To: Richard Gobert, netdev, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers

On 9/16/25 4:48 PM, Richard Gobert wrote:
> Currently, packets with fixed IDs will be merged only if their
> don't-fragment bit is set. This restriction is unnecessary since
> packets without the don't-fragment bit will be forwarded as-is even
> if they were merged together. The merged packets will be segmented
> into their original forms before being forwarded, either by GSO or
> by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
> is set, in which case the IDs can become incrementing, which is also fine.
> 
> Note that IP fragmentation is not an issue here, since packets are
> segmented before being further fragmented. Fragmentation happens the
> same way regardless of whether the packets were first merged together.

I agree with Willem, that an explicit assertion somewhere (in
ip_do_fragmentation?!?) could be useful.

Also I'm not sure that "packets are segmented before being further
fragmented" is always true for the OVS forwarding scenario.

/P


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks
  2025-09-18  8:10   ` Paolo Abeni
@ 2025-09-18 14:01     ` Richard Gobert
  2025-09-22  8:19       ` Richard Gobert
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Gobert @ 2025-09-18 14:01 UTC (permalink / raw)
  To: Paolo Abeni, netdev, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers

Paolo Abeni wrote:
> On 9/16/25 4:48 PM, Richard Gobert wrote:
>> Currently, packets with fixed IDs will be merged only if their
>> don't-fragment bit is set. This restriction is unnecessary since
>> packets without the don't-fragment bit will be forwarded as-is even
>> if they were merged together. The merged packets will be segmented
>> into their original forms before being forwarded, either by GSO or
>> by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
>> is set, in which case the IDs can become incrementing, which is also fine.
>>
>> Note that IP fragmentation is not an issue here, since packets are
>> segmented before being further fragmented. Fragmentation happens the
>> same way regardless of whether the packets were first merged together.
> 
> I agree with Willem, that an explicit assertion somewhere (in
> ip_do_fragmentation?!?) could be useful.
> 

As I replied to Willem, I'll mention ip_finish_output_gso explicitly in the
commit message.

Or did you mean I should add some type of WARN_ON assertion that ip_do_fragment isn't
called for GSO packets?

> Also I'm not sure that "packets are segmented before being further
> fragmented" is always true for the OVS forwarding scenario.
> 

If this is really the case, it is a bug in OVS. Segmentation is required before
fragmentation as otherwise GRO isn't transparent and fragments will be forwarded
that contain data from multiple different packets. It's also probably less efficient,
if the segment size is smaller than the MTU. I think this should be addressed in a
separate patch series.

I'll also mention OVS in the commit message.

> /P
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks
  2025-09-18 14:01     ` Richard Gobert
@ 2025-09-22  8:19       ` Richard Gobert
  2025-09-25 10:15         ` Paolo Abeni
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Gobert @ 2025-09-22  8:19 UTC (permalink / raw)
  To: Paolo Abeni, netdev, ecree.xilinx, willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers

Richard Gobert wrote:
> Paolo Abeni wrote:
>> On 9/16/25 4:48 PM, Richard Gobert wrote:
>>> Currently, packets with fixed IDs will be merged only if their
>>> don't-fragment bit is set. This restriction is unnecessary since
>>> packets without the don't-fragment bit will be forwarded as-is even
>>> if they were merged together. The merged packets will be segmented
>>> into their original forms before being forwarded, either by GSO or
>>> by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
>>> is set, in which case the IDs can become incrementing, which is also fine.
>>>
>>> Note that IP fragmentation is not an issue here, since packets are
>>> segmented before being further fragmented. Fragmentation happens the
>>> same way regardless of whether the packets were first merged together.
>>
>> I agree with Willem, that an explicit assertion somewhere (in
>> ip_do_fragmentation?!?) could be useful.
>>
> 
> As I replied to Willem, I'll mention ip_finish_output_gso explicitly in the
> commit message.
> 
> Or did you mean I should add some type of WARN_ON assertion that ip_do_fragment isn't
> called for GSO packets?
> 
>> Also I'm not sure that "packets are segmented before being further
>> fragmented" is always true for the OVS forwarding scenario.
>>
> 
> If this is really the case, it is a bug in OVS. Segmentation is required before
> fragmentation as otherwise GRO isn't transparent and fragments will be forwarded
> that contain data from multiple different packets. It's also probably less efficient,
> if the segment size is smaller than the MTU. I think this should be addressed in a
> separate patch series.
> 
> I'll also mention OVS in the commit message.
> 

I looked into it, and it seems that you are correct. Looks like fragmentation
can occur without segmentation in the OVS forwarding case. As I said, this is
a bug since generated fragments may contain data from multiple packets. Still,
this can already happen for packets with incrementing IDs and nothing special
in particular will happen for the packets discussed in this patch. This should
be fixed in a separate patch series, as do all other cases where ip_do_fragment
is called directly without segmenting the packets first.

No changes are required for the current series as it does not introduce this
issue or give it more exposure. I'll remove the comment about fragmentation
from the commit message since it's not entirely correct and it is rather
irrelevant to this patch anyway.

>> /P
>>
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 2/5] net: gro: only merge packets with incrementing or fixed outer ids
  2025-09-16 14:48 ` [PATCH net-next v6 2/5] net: gro: only merge packets with incrementing or fixed outer ids Richard Gobert
@ 2025-09-22 19:28   ` Willem de Bruijn
  0 siblings, 0 replies; 15+ messages in thread
From: Willem de Bruijn @ 2025-09-22 19:28 UTC (permalink / raw)
  To: Richard Gobert, netdev, pabeni, ecree.xilinx,
	willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers, Richard Gobert

Richard Gobert wrote:
> Only merge encapsulated packets if their outer IDs are either
> incrementing or fixed, just like for inner IDs and IDs of non-encapsulated
> packets.
> 
> Add another ip_fixedid bit for a total of two bits: one for outer IDs (and
> for unencapsulated packets) and one for inner IDs.
> 
> This commit preserves the current behavior of GSO where only the IDs of the
> inner-most headers are restored correctly.
> 
> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 3/5] net: gso: restore ids of outer ip headers correctly
  2025-09-16 14:48 ` [PATCH net-next v6 3/5] net: gso: restore ids of outer ip headers correctly Richard Gobert
@ 2025-09-22 19:35   ` Willem de Bruijn
  2025-09-23  8:29     ` Richard Gobert
  0 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2025-09-22 19:35 UTC (permalink / raw)
  To: Richard Gobert, netdev, pabeni, ecree.xilinx,
	willemdebruijn.kernel
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers, Richard Gobert

Richard Gobert wrote:
> Currently, NETIF_F_TSO_MANGLEID indicates that the inner-most ID can
> be mangled. Outer IDs can always be mangled.
> 
> Make GSO preserve outer IDs by default, with NETIF_F_TSO_MANGLEID allowing
> both inner and outer IDs to be mangled.
> 
> This commit also modifies a few drivers that use SKB_GSO_FIXEDID directly.
> 
> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc
> ---
>  .../networking/segmentation-offloads.rst      | 22 ++++++++++++-------
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  8 +++++--
>  drivers/net/ethernet/sfc/ef100_tx.c           | 17 ++++++++++----
>  include/linux/netdevice.h                     |  9 ++++++--
>  include/linux/skbuff.h                        |  8 ++++++-
>  net/core/dev.c                                |  8 +++++--
>  net/ipv4/af_inet.c                            | 13 +++++------
>  net/ipv4/tcp_offload.c                        |  5 +----
>  8 files changed, 60 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst
> index 085e8fab03fd..72f69b22b28c 100644
> --- a/Documentation/networking/segmentation-offloads.rst
> +++ b/Documentation/networking/segmentation-offloads.rst
> @@ -43,10 +43,19 @@ also point to the TCP header of the packet.
>  For IPv4 segmentation we support one of two types in terms of the IP ID.
>  The default behavior is to increment the IP ID with every segment.  If the
>  GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
> -ID and all segments will use the same IP ID.  If a device has
> -NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
> -and we will either increment the IP ID for all frames, or leave it at a
> -static value based on driver preference.
> +ID and all segments will use the same IP ID.
> +
> +For encapsulated packets, SKB_GSO_TCP_FIXEDID refers only to the outer header.
> +SKB_GSO_TCP_FIXEDID_INNER can be used to specify the same for the inner header.
> +Any combination of these two GSO types is allowed.
> +
> +If a device has NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when
> +performing TSO and we will either increment the IP ID for all frames, or leave
> +it at a static value based on driver preference.  For encapsulated packets,
> +NETIF_F_TSO_MANGLEID is relevant for both outer and inner headers, unless the
> +DF bit is not set on the outer header, in which case the device driver must
> +guarantee that the IP ID field is incremented in the outer header with every
> +segment.
>  
>  
>  UDP Fragmentation Offload
> @@ -124,10 +133,7 @@ Generic Receive Offload
>  Generic receive offload is the complement to GSO.  Ideally any frame
>  assembled by GRO should be segmented to create an identical sequence of
>  frames using GSO, and any sequence of frames segmented by GSO should be
> -able to be reassembled back to the original by GRO.  The only exception to
> -this is IPv4 ID in the case that the DF bit is set for a given IP header.
> -If the value of the IPv4 ID is not sequentially incrementing it will be
> -altered so that it is when a frame assembled via GRO is segmented via GSO.
> +able to be reassembled back to the original by GRO.
>  
>  
>  Partial Generic Segmentation Offload
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index b8c609d91d11..480f66e21132 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1289,8 +1289,12 @@ static void mlx5e_shampo_update_ipv4_tcp_hdr(struct mlx5e_rq *rq, struct iphdr *
>  	tcp->check = ~tcp_v4_check(skb->len - tcp_off, ipv4->saddr,
>  				   ipv4->daddr, 0);
>  	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4;
> -	if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id)
> -		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
> +	if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id) {
> +		bool encap = rq->hw_gro_data->fk.control.flags & FLOW_DIS_ENCAPSULATION;
> +
> +		skb_shinfo(skb)->gso_type |= encap ? SKB_GSO_TCP_FIXEDID_INNER :
> +						     SKB_GSO_TCP_FIXEDID;
> +	}
>  
>  	skb->csum_start = (unsigned char *)tcp - skb->head;
>  	skb->csum_offset = offsetof(struct tcphdr, check);
> diff --git a/drivers/net/ethernet/sfc/ef100_tx.c b/drivers/net/ethernet/sfc/ef100_tx.c
> index e6b6be549581..03005757c060 100644
> --- a/drivers/net/ethernet/sfc/ef100_tx.c
> +++ b/drivers/net/ethernet/sfc/ef100_tx.c
> @@ -189,6 +189,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>  {
>  	bool gso_partial = skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL;
>  	unsigned int len, ip_offset, tcp_offset, payload_segs;
> +	u32 mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
>  	u32 mangleid = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
>  	unsigned int outer_ip_offset, outer_l4_offset;
>  	u16 vlan_tci = skb_vlan_tag_get(skb);
> @@ -200,8 +201,17 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>  	bool outer_csum;
>  	u32 paylen;
>  
> -	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
> -		mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +	if (encap) {
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID_INNER)
> +			mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
> +			mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +	} else {
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
> +			mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +		mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +	}
> +
>  	if (efx->net_dev->features & NETIF_F_HW_VLAN_CTAG_TX)
>  		vlan_enable = skb_vlan_tag_present(skb);
>  
> @@ -245,8 +255,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>  			      ESF_GZ_TX_TSO_OUTER_L4_OFF_W, outer_l4_offset >> 1,
>  			      ESF_GZ_TX_TSO_ED_OUTER_UDP_LEN, udp_encap && !gso_partial,
>  			      ESF_GZ_TX_TSO_ED_OUTER_IP_LEN, encap && !gso_partial,
> -			      ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, encap ? mangleid :
> -								     ESE_GZ_TX_DESC_IP4_ID_NO_OP,
> +			      ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, mangleid_outer,
>  			      ESF_GZ_TX_TSO_VLAN_INSERT_EN, vlan_enable,
>  			      ESF_GZ_TX_TSO_VLAN_INSERT_TCI, vlan_tci
>  		);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f3a3b761abfb..3d19c888b839 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -5290,13 +5290,18 @@ void skb_warn_bad_offload(const struct sk_buff *skb);
>  
>  static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>  {
> -	netdev_features_t feature = (netdev_features_t)gso_type << NETIF_F_GSO_SHIFT;
> +	netdev_features_t feature;
> +
> +	if (gso_type & (SKB_GSO_TCP_FIXEDID | SKB_GSO_TCP_FIXEDID_INNER))
> +		gso_type |= __SKB_GSO_TCP_FIXEDID;

Should this also remove the original features from the type. Given
that no NETIF_F equivalent exists for those.

> +
> +	feature = ((netdev_features_t)gso_type << NETIF_F_GSO_SHIFT) & NETIF_F_GSO_MASK;
>  
>  	/* check flags correspondence */
>  	BUILD_BUG_ON(SKB_GSO_TCPV4   != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_DODGY   != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
> -	BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
> +	BUILD_BUG_ON(__SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_TCPV6   != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_FCOE    != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_GRE     != (NETIF_F_GSO_GRE >> NETIF_F_GSO_SHIFT));
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index ca8be45dd8be..646fb66ba948 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -674,7 +674,7 @@ enum {
>  	/* This indicates the tcp segment has CWR set. */
>  	SKB_GSO_TCP_ECN = 1 << 2,
>  
> -	SKB_GSO_TCP_FIXEDID = 1 << 3,
> +	__SKB_GSO_TCP_FIXEDID = 1 << 3,
>  
>  	SKB_GSO_TCPV6 = 1 << 4,
>  
> @@ -707,6 +707,12 @@ enum {
>  	SKB_GSO_FRAGLIST = 1 << 18,
>  
>  	SKB_GSO_TCP_ACCECN = 1 << 19,
> +
> +	/* These indirectly map onto the same netdev feature.
> +	 * If NETIF_F_TSO_MANGLEID is set it may mangle both inner and outer IDs.
> +	 */
> +	SKB_GSO_TCP_FIXEDID = 1 << 30,
> +	SKB_GSO_TCP_FIXEDID_INNER = 1 << 31,
>  };
>  
>  #if BITS_PER_LONG > 32
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 93a25d87b86b..6b34b3e857d4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3768,8 +3768,12 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>  	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL))
>  		features &= ~dev->gso_partial_features;
>  
> -	/* Make sure to clear the IPv4 ID mangling feature if the
> -	 * IPv4 header has the potential to be fragmented.
> +	/* Make sure to clear the IPv4 ID mangling feature if the IPv4 header
> +	 * has the potential to be fragmented. For encapsulated packets, the ID
> +	 * mangling feature is guaranteed not to use the same ID for the outer
> +	 * IPv4 headers of the generated segments if the headers have the
> +	 * potential to be fragmented, so there is no need to clear the IPv4 ID
> +	 * mangling feature.

If respinning: same comment from v5: please convert the assertion
that ID mangling is guaranteed to not to use the same ID for !DF
to an explanation: point to or copy the statement from the GSO
partial documentation.

>  	 */
>  	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
>  		struct iphdr *iph = skb->encapsulation ?
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 76e38092cd8a..fc7a6955fa0a 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1393,14 +1393,13 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>  
>  	segs = ERR_PTR(-EPROTONOSUPPORT);
>  
> -	if (!skb->encapsulation || encap) {
> -		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
> -		fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID);
> +	/* fixed ID is invalid if DF bit is not set */
> +	fixedid = !!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCP_FIXEDID << encap));
> +	if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
> +		goto out;
>  
> -		/* fixed ID is invalid if DF bit is not set */
> -		if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
> -			goto out;
> -	}
> +	if (!skb->encapsulation || encap)
> +		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
>  
>  	ops = rcu_dereference(inet_offloads[proto]);
>  	if (likely(ops && ops->callbacks.gso_segment)) {
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 1949eede9ec9..e6612bd84d09 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -471,7 +471,6 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>  	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
>  	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
>  	struct tcphdr *th = tcp_hdr(skb);
> -	bool is_fixedid;
>  
>  	if (unlikely(NAPI_GRO_CB(skb)->is_flist)) {
>  		skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV4;
> @@ -485,10 +484,8 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>  	th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr,
>  				  iph->daddr, 0);
>  
> -	is_fixedid = (NAPI_GRO_CB(skb)->ip_fixedid >> skb->encapsulation) & 1;
> -
>  	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4 |
> -			(is_fixedid * SKB_GSO_TCP_FIXEDID);
> +			(NAPI_GRO_CB(skb)->ip_fixedid * SKB_GSO_TCP_FIXEDID);
>  
>  	tcp_gro_complete(skb);
>  	return 0;
> -- 
> 2.36.1
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 3/5] net: gso: restore ids of outer ip headers correctly
  2025-09-22 19:35   ` Willem de Bruijn
@ 2025-09-23  8:29     ` Richard Gobert
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Gobert @ 2025-09-23  8:29 UTC (permalink / raw)
  To: Willem de Bruijn, netdev, pabeni, ecree.xilinx
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, kuniyu, shuah, sdf, aleksander.lobakin,
	florian.fainelli, alexander.duyck, linux-kernel,
	linux-net-drivers

Willem de Bruijn wrote:
> Richard Gobert wrote:
>> Currently, NETIF_F_TSO_MANGLEID indicates that the inner-most ID can
>> be mangled. Outer IDs can always be mangled.
>>
>> Make GSO preserve outer IDs by default, with NETIF_F_TSO_MANGLEID allowing
>> both inner and outer IDs to be mangled.
>>
>> This commit also modifies a few drivers that use SKB_GSO_FIXEDID directly.
>>
>> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
>> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc
>> ---
>>  .../networking/segmentation-offloads.rst      | 22 ++++++++++++-------
>>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  8 +++++--
>>  drivers/net/ethernet/sfc/ef100_tx.c           | 17 ++++++++++----
>>  include/linux/netdevice.h                     |  9 ++++++--
>>  include/linux/skbuff.h                        |  8 ++++++-
>>  net/core/dev.c                                |  8 +++++--
>>  net/ipv4/af_inet.c                            | 13 +++++------
>>  net/ipv4/tcp_offload.c                        |  5 +----
>>  8 files changed, 60 insertions(+), 30 deletions(-)
>>
>> diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst
>> index 085e8fab03fd..72f69b22b28c 100644
>> --- a/Documentation/networking/segmentation-offloads.rst
>> +++ b/Documentation/networking/segmentation-offloads.rst
>> @@ -43,10 +43,19 @@ also point to the TCP header of the packet.
>>  For IPv4 segmentation we support one of two types in terms of the IP ID.
>>  The default behavior is to increment the IP ID with every segment.  If the
>>  GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
>> -ID and all segments will use the same IP ID.  If a device has
>> -NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
>> -and we will either increment the IP ID for all frames, or leave it at a
>> -static value based on driver preference.
>> +ID and all segments will use the same IP ID.
>> +
>> +For encapsulated packets, SKB_GSO_TCP_FIXEDID refers only to the outer header.
>> +SKB_GSO_TCP_FIXEDID_INNER can be used to specify the same for the inner header.
>> +Any combination of these two GSO types is allowed.
>> +
>> +If a device has NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when
>> +performing TSO and we will either increment the IP ID for all frames, or leave
>> +it at a static value based on driver preference.  For encapsulated packets,
>> +NETIF_F_TSO_MANGLEID is relevant for both outer and inner headers, unless the
>> +DF bit is not set on the outer header, in which case the device driver must
>> +guarantee that the IP ID field is incremented in the outer header with every
>> +segment.
>>  
>>  
>>  UDP Fragmentation Offload
>> @@ -124,10 +133,7 @@ Generic Receive Offload
>>  Generic receive offload is the complement to GSO.  Ideally any frame
>>  assembled by GRO should be segmented to create an identical sequence of
>>  frames using GSO, and any sequence of frames segmented by GSO should be
>> -able to be reassembled back to the original by GRO.  The only exception to
>> -this is IPv4 ID in the case that the DF bit is set for a given IP header.
>> -If the value of the IPv4 ID is not sequentially incrementing it will be
>> -altered so that it is when a frame assembled via GRO is segmented via GSO.
>> +able to be reassembled back to the original by GRO.
>>  
>>  
>>  Partial Generic Segmentation Offload
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> index b8c609d91d11..480f66e21132 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> @@ -1289,8 +1289,12 @@ static void mlx5e_shampo_update_ipv4_tcp_hdr(struct mlx5e_rq *rq, struct iphdr *
>>  	tcp->check = ~tcp_v4_check(skb->len - tcp_off, ipv4->saddr,
>>  				   ipv4->daddr, 0);
>>  	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4;
>> -	if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id)
>> -		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
>> +	if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id) {
>> +		bool encap = rq->hw_gro_data->fk.control.flags & FLOW_DIS_ENCAPSULATION;
>> +
>> +		skb_shinfo(skb)->gso_type |= encap ? SKB_GSO_TCP_FIXEDID_INNER :
>> +						     SKB_GSO_TCP_FIXEDID;
>> +	}
>>  
>>  	skb->csum_start = (unsigned char *)tcp - skb->head;
>>  	skb->csum_offset = offsetof(struct tcphdr, check);
>> diff --git a/drivers/net/ethernet/sfc/ef100_tx.c b/drivers/net/ethernet/sfc/ef100_tx.c
>> index e6b6be549581..03005757c060 100644
>> --- a/drivers/net/ethernet/sfc/ef100_tx.c
>> +++ b/drivers/net/ethernet/sfc/ef100_tx.c
>> @@ -189,6 +189,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>>  {
>>  	bool gso_partial = skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL;
>>  	unsigned int len, ip_offset, tcp_offset, payload_segs;
>> +	u32 mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
>>  	u32 mangleid = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
>>  	unsigned int outer_ip_offset, outer_l4_offset;
>>  	u16 vlan_tci = skb_vlan_tag_get(skb);
>> @@ -200,8 +201,17 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>>  	bool outer_csum;
>>  	u32 paylen;
>>  
>> -	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
>> -		mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> +	if (encap) {
>> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID_INNER)
>> +			mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
>> +			mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> +	} else {
>> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
>> +			mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> +		mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> +	}
>> +
>>  	if (efx->net_dev->features & NETIF_F_HW_VLAN_CTAG_TX)
>>  		vlan_enable = skb_vlan_tag_present(skb);
>>  
>> @@ -245,8 +255,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>>  			      ESF_GZ_TX_TSO_OUTER_L4_OFF_W, outer_l4_offset >> 1,
>>  			      ESF_GZ_TX_TSO_ED_OUTER_UDP_LEN, udp_encap && !gso_partial,
>>  			      ESF_GZ_TX_TSO_ED_OUTER_IP_LEN, encap && !gso_partial,
>> -			      ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, encap ? mangleid :
>> -								     ESE_GZ_TX_DESC_IP4_ID_NO_OP,
>> +			      ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, mangleid_outer,
>>  			      ESF_GZ_TX_TSO_VLAN_INSERT_EN, vlan_enable,
>>  			      ESF_GZ_TX_TSO_VLAN_INSERT_TCI, vlan_tci
>>  		);
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index f3a3b761abfb..3d19c888b839 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -5290,13 +5290,18 @@ void skb_warn_bad_offload(const struct sk_buff *skb);
>>  
>>  static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>>  {
>> -	netdev_features_t feature = (netdev_features_t)gso_type << NETIF_F_GSO_SHIFT;
>> +	netdev_features_t feature;
>> +
>> +	if (gso_type & (SKB_GSO_TCP_FIXEDID | SKB_GSO_TCP_FIXEDID_INNER))
>> +		gso_type |= __SKB_GSO_TCP_FIXEDID;
> 
> Should this also remove the original features from the type. Given
> that no NETIF_F equivalent exists for those.
> 

This is already done with NETIF_F_GSO_MASK in the line below.

>> +
>> +	feature = ((netdev_features_t)gso_type << NETIF_F_GSO_SHIFT) & NETIF_F_GSO_MASK;
>>  
>>  	/* check flags correspondence */
>>  	BUILD_BUG_ON(SKB_GSO_TCPV4   != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_DODGY   != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>> -	BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
>> +	BUILD_BUG_ON(__SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_TCPV6   != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_FCOE    != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
>>  	BUILD_BUG_ON(SKB_GSO_GRE     != (NETIF_F_GSO_GRE >> NETIF_F_GSO_SHIFT));
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index ca8be45dd8be..646fb66ba948 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -674,7 +674,7 @@ enum {
>>  	/* This indicates the tcp segment has CWR set. */
>>  	SKB_GSO_TCP_ECN = 1 << 2,
>>  
>> -	SKB_GSO_TCP_FIXEDID = 1 << 3,
>> +	__SKB_GSO_TCP_FIXEDID = 1 << 3,
>>  
>>  	SKB_GSO_TCPV6 = 1 << 4,
>>  
>> @@ -707,6 +707,12 @@ enum {
>>  	SKB_GSO_FRAGLIST = 1 << 18,
>>  
>>  	SKB_GSO_TCP_ACCECN = 1 << 19,
>> +
>> +	/* These indirectly map onto the same netdev feature.
>> +	 * If NETIF_F_TSO_MANGLEID is set it may mangle both inner and outer IDs.
>> +	 */
>> +	SKB_GSO_TCP_FIXEDID = 1 << 30,
>> +	SKB_GSO_TCP_FIXEDID_INNER = 1 << 31,
>>  };
>>  
>>  #if BITS_PER_LONG > 32
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 93a25d87b86b..6b34b3e857d4 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3768,8 +3768,12 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>>  	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL))
>>  		features &= ~dev->gso_partial_features;
>>  
>> -	/* Make sure to clear the IPv4 ID mangling feature if the
>> -	 * IPv4 header has the potential to be fragmented.
>> +	/* Make sure to clear the IPv4 ID mangling feature if the IPv4 header
>> +	 * has the potential to be fragmented. For encapsulated packets, the ID
>> +	 * mangling feature is guaranteed not to use the same ID for the outer
>> +	 * IPv4 headers of the generated segments if the headers have the
>> +	 * potential to be fragmented, so there is no need to clear the IPv4 ID
>> +	 * mangling feature.
> 
> If respinning: same comment from v5: please convert the assertion
> that ID mangling is guaranteed to not to use the same ID for !DF
> to an explanation: point to or copy the statement from the GSO
> partial documentation.
> 

I'll add a direct reference to the documentation and send v8 shortly.

>>  	 */
>>  	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
>>  		struct iphdr *iph = skb->encapsulation ?
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 76e38092cd8a..fc7a6955fa0a 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -1393,14 +1393,13 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>>  
>>  	segs = ERR_PTR(-EPROTONOSUPPORT);
>>  
>> -	if (!skb->encapsulation || encap) {
>> -		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
>> -		fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID);
>> +	/* fixed ID is invalid if DF bit is not set */
>> +	fixedid = !!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCP_FIXEDID << encap));
>> +	if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
>> +		goto out;
>>  
>> -		/* fixed ID is invalid if DF bit is not set */
>> -		if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
>> -			goto out;
>> -	}
>> +	if (!skb->encapsulation || encap)
>> +		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
>>  
>>  	ops = rcu_dereference(inet_offloads[proto]);
>>  	if (likely(ops && ops->callbacks.gso_segment)) {
>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
>> index 1949eede9ec9..e6612bd84d09 100644
>> --- a/net/ipv4/tcp_offload.c
>> +++ b/net/ipv4/tcp_offload.c
>> @@ -471,7 +471,6 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>>  	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
>>  	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
>>  	struct tcphdr *th = tcp_hdr(skb);
>> -	bool is_fixedid;
>>  
>>  	if (unlikely(NAPI_GRO_CB(skb)->is_flist)) {
>>  		skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV4;
>> @@ -485,10 +484,8 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>>  	th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr,
>>  				  iph->daddr, 0);
>>  
>> -	is_fixedid = (NAPI_GRO_CB(skb)->ip_fixedid >> skb->encapsulation) & 1;
>> -
>>  	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4 |
>> -			(is_fixedid * SKB_GSO_TCP_FIXEDID);
>> +			(NAPI_GRO_CB(skb)->ip_fixedid * SKB_GSO_TCP_FIXEDID);
>>  
>>  	tcp_gro_complete(skb);
>>  	return 0;
>> -- 
>> 2.36.1
>>
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks
  2025-09-22  8:19       ` Richard Gobert
@ 2025-09-25 10:15         ` Paolo Abeni
  2025-09-29 11:09           ` Ilya Maximets
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2025-09-25 10:15 UTC (permalink / raw)
  To: Aaron Conole, Eelco Chaudron, Ilya Maximets
  Cc: davem, edumazet, kuba, horms, corbet, saeedm, tariqt, mbloch,
	leon, dsahern, ncardwell, ecree.xilinx, Richard Gobert, kuniyu,
	shuah, sdf, aleksander.lobakin, florian.fainelli, alexander.duyck,
	linux-kernel, linux-net-drivers, netdev, willemdebruijn.kernel

Adding the OVS maintainers for awareness..

On 9/22/25 10:19 AM, Richard Gobert wrote:
> Richard Gobert wrote:
>> Paolo Abeni wrote:
>>> On 9/16/25 4:48 PM, Richard Gobert wrote:
>>>> Currently, packets with fixed IDs will be merged only if their
>>>> don't-fragment bit is set. This restriction is unnecessary since
>>>> packets without the don't-fragment bit will be forwarded as-is even
>>>> if they were merged together. The merged packets will be segmented
>>>> into their original forms before being forwarded, either by GSO or
>>>> by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
>>>> is set, in which case the IDs can become incrementing, which is also fine.
>>>>
>>>> Note that IP fragmentation is not an issue here, since packets are
>>>> segmented before being further fragmented. Fragmentation happens the
>>>> same way regardless of whether the packets were first merged together.
>>>
>>> I agree with Willem, that an explicit assertion somewhere (in
>>> ip_do_fragmentation?!?) could be useful.
>>>
>>
>> As I replied to Willem, I'll mention ip_finish_output_gso explicitly in the
>> commit message.
>>
>> Or did you mean I should add some type of WARN_ON assertion that ip_do_fragment isn't
>> called for GSO packets?
>>
>>> Also I'm not sure that "packets are segmented before being further
>>> fragmented" is always true for the OVS forwarding scenario.
>>>
>>
>> If this is really the case, it is a bug in OVS. Segmentation is required before
>> fragmentation as otherwise GRO isn't transparent and fragments will be forwarded
>> that contain data from multiple different packets. It's also probably less efficient,
>> if the segment size is smaller than the MTU. I think this should be addressed in a
>> separate patch series.
>>
>> I'll also mention OVS in the commit message.
>>
> 
> I looked into it, and it seems that you are correct. Looks like fragmentation
> can occur without segmentation in the OVS forwarding case. As I said, this is
> a bug since generated fragments may contain data from multiple packets. Still,
> this can already happen for packets with incrementing IDs and nothing special
> in particular will happen for the packets discussed in this patch. This should
> be fixed in a separate patch series, as do all other cases where ip_do_fragment
> is called directly without segmenting the packets first.

TL;DR: apparently there is a bug in OVS segmentation/fragmentation code:
OVS can do fragmentation of GSO packets without segmenting them
beforehands, please see the threads under:

https://lore.kernel.org/netdev/20250916144841.4884-5-richardbgobert@gmail.com/

for the whole discussion.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks
  2025-09-25 10:15         ` Paolo Abeni
@ 2025-09-29 11:09           ` Ilya Maximets
  2025-09-29 14:20             ` Willem de Bruijn
  0 siblings, 1 reply; 15+ messages in thread
From: Ilya Maximets @ 2025-09-29 11:09 UTC (permalink / raw)
  To: Paolo Abeni, Aaron Conole, Eelco Chaudron
  Cc: i.maximets, davem, edumazet, kuba, horms, corbet, saeedm, tariqt,
	mbloch, leon, dsahern, ncardwell, ecree.xilinx, Richard Gobert,
	kuniyu, shuah, sdf, aleksander.lobakin, florian.fainelli,
	alexander.duyck, linux-kernel, linux-net-drivers, netdev,
	willemdebruijn.kernel

On 9/25/25 12:15 PM, Paolo Abeni wrote:
> Adding the OVS maintainers for awareness..
> 
> On 9/22/25 10:19 AM, Richard Gobert wrote:
>> Richard Gobert wrote:
>>> Paolo Abeni wrote:
>>>> On 9/16/25 4:48 PM, Richard Gobert wrote:
>>>>> Currently, packets with fixed IDs will be merged only if their
>>>>> don't-fragment bit is set. This restriction is unnecessary since
>>>>> packets without the don't-fragment bit will be forwarded as-is even
>>>>> if they were merged together. The merged packets will be segmented
>>>>> into their original forms before being forwarded, either by GSO or
>>>>> by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
>>>>> is set, in which case the IDs can become incrementing, which is also fine.
>>>>>
>>>>> Note that IP fragmentation is not an issue here, since packets are
>>>>> segmented before being further fragmented. Fragmentation happens the
>>>>> same way regardless of whether the packets were first merged together.
>>>>
>>>> I agree with Willem, that an explicit assertion somewhere (in
>>>> ip_do_fragmentation?!?) could be useful.
>>>>
>>>
>>> As I replied to Willem, I'll mention ip_finish_output_gso explicitly in the
>>> commit message.
>>>
>>> Or did you mean I should add some type of WARN_ON assertion that ip_do_fragment isn't
>>> called for GSO packets?
>>>
>>>> Also I'm not sure that "packets are segmented before being further
>>>> fragmented" is always true for the OVS forwarding scenario.
>>>>
>>>
>>> If this is really the case, it is a bug in OVS. Segmentation is required before
>>> fragmentation as otherwise GRO isn't transparent and fragments will be forwarded
>>> that contain data from multiple different packets. It's also probably less efficient,
>>> if the segment size is smaller than the MTU. I think this should be addressed in a
>>> separate patch series.
>>>
>>> I'll also mention OVS in the commit message.
>>>
>>
>> I looked into it, and it seems that you are correct. Looks like fragmentation
>> can occur without segmentation in the OVS forwarding case. As I said, this is
>> a bug since generated fragments may contain data from multiple packets. Still,
>> this can already happen for packets with incrementing IDs and nothing special
>> in particular will happen for the packets discussed in this patch. This should
>> be fixed in a separate patch series, as do all other cases where ip_do_fragment
>> is called directly without segmenting the packets first.
> 
> TL;DR: apparently there is a bug in OVS segmentation/fragmentation code:
> OVS can do fragmentation of GSO packets without segmenting them
> beforehands, please see the threads under:
> 
> https://lore.kernel.org/netdev/20250916144841.4884-5-richardbgobert@gmail.com/
> 
> for the whole discussion.

Hmm.  Thanks for pointing that out.  It does seem like OVS will fragment
GSO packets without segmenting them first in case MRU of that packet is
larger than the MTU of the destination port.  In practice though, MRU of
a GSO packet should not exceed path MTU in a general case.  I suppose it
can still happen in some corner cases, e.g. if MTU suddenly changed, in
which case the packet should probably be dropped instead of re-fragmenting.

I also looked through other parts of the kernel and it seems like GSO
packets are not fragmented after being segmented in other places like
the br-netfilter code.  Which suggests that MRU supposed to be smaller
than MTU and so the fragmentation is not necessary, otherwise the packets
will be dropped.

Does that sound correct or am I missing some cases here?

Best regards, Ilya Maximets.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks
  2025-09-29 11:09           ` Ilya Maximets
@ 2025-09-29 14:20             ` Willem de Bruijn
  0 siblings, 0 replies; 15+ messages in thread
From: Willem de Bruijn @ 2025-09-29 14:20 UTC (permalink / raw)
  To: Ilya Maximets, Paolo Abeni, Aaron Conole, Eelco Chaudron
  Cc: i.maximets, davem, edumazet, kuba, horms, corbet, saeedm, tariqt,
	mbloch, leon, dsahern, ncardwell, ecree.xilinx, Richard Gobert,
	kuniyu, shuah, sdf, aleksander.lobakin, florian.fainelli,
	alexander.duyck, linux-kernel, linux-net-drivers, netdev,
	willemdebruijn.kernel

Ilya Maximets wrote:
> On 9/25/25 12:15 PM, Paolo Abeni wrote:
> > Adding the OVS maintainers for awareness..
> > 
> > On 9/22/25 10:19 AM, Richard Gobert wrote:
> >> Richard Gobert wrote:
> >>> Paolo Abeni wrote:
> >>>> On 9/16/25 4:48 PM, Richard Gobert wrote:
> >>>>> Currently, packets with fixed IDs will be merged only if their
> >>>>> don't-fragment bit is set. This restriction is unnecessary since
> >>>>> packets without the don't-fragment bit will be forwarded as-is even
> >>>>> if they were merged together. The merged packets will be segmented
> >>>>> into their original forms before being forwarded, either by GSO or
> >>>>> by TSO. The IDs will also remain identical unless NETIF_F_TSO_MANGLEID
> >>>>> is set, in which case the IDs can become incrementing, which is also fine.
> >>>>>
> >>>>> Note that IP fragmentation is not an issue here, since packets are
> >>>>> segmented before being further fragmented. Fragmentation happens the
> >>>>> same way regardless of whether the packets were first merged together.
> >>>>
> >>>> I agree with Willem, that an explicit assertion somewhere (in
> >>>> ip_do_fragmentation?!?) could be useful.
> >>>>
> >>>
> >>> As I replied to Willem, I'll mention ip_finish_output_gso explicitly in the
> >>> commit message.
> >>>
> >>> Or did you mean I should add some type of WARN_ON assertion that ip_do_fragment isn't
> >>> called for GSO packets?
> >>>
> >>>> Also I'm not sure that "packets are segmented before being further
> >>>> fragmented" is always true for the OVS forwarding scenario.
> >>>>
> >>>
> >>> If this is really the case, it is a bug in OVS. Segmentation is required before
> >>> fragmentation as otherwise GRO isn't transparent and fragments will be forwarded
> >>> that contain data from multiple different packets. It's also probably less efficient,
> >>> if the segment size is smaller than the MTU. I think this should be addressed in a
> >>> separate patch series.
> >>>
> >>> I'll also mention OVS in the commit message.
> >>>
> >>
> >> I looked into it, and it seems that you are correct. Looks like fragmentation
> >> can occur without segmentation in the OVS forwarding case. As I said, this is
> >> a bug since generated fragments may contain data from multiple packets. Still,
> >> this can already happen for packets with incrementing IDs and nothing special
> >> in particular will happen for the packets discussed in this patch. This should
> >> be fixed in a separate patch series, as do all other cases where ip_do_fragment
> >> is called directly without segmenting the packets first.
> > 
> > TL;DR: apparently there is a bug in OVS segmentation/fragmentation code:
> > OVS can do fragmentation of GSO packets without segmenting them
> > beforehands, please see the threads under:
> > 
> > https://lore.kernel.org/netdev/20250916144841.4884-5-richardbgobert@gmail.com/
> > 
> > for the whole discussion.
> 
> Hmm.  Thanks for pointing that out.  It does seem like OVS will fragment
> GSO packets without segmenting them first in case MRU of that packet is
> larger than the MTU of the destination port.  In practice though, MRU of
> a GSO packet should not exceed path MTU in a general case.  I suppose it
> can still happen in some corner cases, e.g. if MTU suddenly changed, in
> which case the packet should probably be dropped instead of re-fragmenting.
> 
> I also looked through other parts of the kernel and it seems like GSO
> packets are not fragmented after being segmented in other places like
> the br-netfilter code.  Which suggests that MRU supposed to be smaller
> than MTU and so the fragmentation is not necessary, otherwise the packets
> will be dropped.
> 
> Does that sound correct or am I missing some cases here?

One of the discussed cases is where a packet is transformed from
IPv4 to IPv6, e.g., with a BPF program. Similar would be tunnel encap.
Or just forwarding between devices with different MTU.



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-09-29 14:20 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-16 14:48 [PATCH net-next v6 0/5] net: gso: restore outer ip ids correctly Richard Gobert
2025-09-16 14:48 ` [PATCH net-next v6 1/5] net: gro: remove is_ipv6 from napi_gro_cb Richard Gobert
2025-09-16 14:48 ` [PATCH net-next v6 2/5] net: gro: only merge packets with incrementing or fixed outer ids Richard Gobert
2025-09-22 19:28   ` Willem de Bruijn
2025-09-16 14:48 ` [PATCH net-next v6 3/5] net: gso: restore ids of outer ip headers correctly Richard Gobert
2025-09-22 19:35   ` Willem de Bruijn
2025-09-23  8:29     ` Richard Gobert
2025-09-16 14:48 ` [PATCH net-next v6 4/5] net: gro: remove unnecessary df checks Richard Gobert
2025-09-18  8:10   ` Paolo Abeni
2025-09-18 14:01     ` Richard Gobert
2025-09-22  8:19       ` Richard Gobert
2025-09-25 10:15         ` Paolo Abeni
2025-09-29 11:09           ` Ilya Maximets
2025-09-29 14:20             ` Willem de Bruijn
2025-09-16 14:48 ` [PATCH net-next v6 5/5] selftests/net: test ipip packets in gro.sh Richard Gobert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).