netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Richard Gobert <richardbgobert@gmail.com>,
	 netdev@vger.kernel.org,  pabeni@redhat.com,
	 ecree.xilinx@gmail.com,  willemdebruijn.kernel@gmail.com
Cc: davem@davemloft.net,  edumazet@google.com,  kuba@kernel.org,
	 horms@kernel.org,  corbet@lwn.net,  saeedm@nvidia.com,
	 tariqt@nvidia.com,  mbloch@nvidia.com,  leon@kernel.org,
	 dsahern@kernel.org,  ncardwell@google.com,  kuniyu@google.com,
	 shuah@kernel.org,  sdf@fomichev.me,
	 aleksander.lobakin@intel.com,  florian.fainelli@broadcom.com,
	 alexander.duyck@gmail.com,  linux-kernel@vger.kernel.org,
	 linux-net-drivers@amd.com,
	 Richard Gobert <richardbgobert@gmail.com>
Subject: Re: [PATCH net-next v5 3/5] net: gso: restore ids of outer ip headers correctly
Date: Mon, 15 Sep 2025 20:26:10 -0400	[thread overview]
Message-ID: <willemdebruijn.kernel.1b773a265e8dc@gmail.com> (raw)
In-Reply-To: <20250915113933.3293-4-richardbgobert@gmail.com>

Richard Gobert wrote:
> Currently, NETIF_F_TSO_MANGLEID indicates that the inner-most ID can
> be mangled. Outer IDs can always be mangled.
> 
> Make GSO preserve outer IDs by default, with NETIF_F_TSO_MANGLEID allowing
> both inner and outer IDs to be mangled.
> 
> This commit also modifies a few drivers that use SKB_GSO_FIXEDID directly.
> 
> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc
> ---
>  .../networking/segmentation-offloads.rst      | 22 ++++++++++++-------
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  8 +++++--
>  drivers/net/ethernet/sfc/ef100_tx.c           | 17 ++++++++++----
>  include/linux/netdevice.h                     |  9 ++++++--
>  include/linux/skbuff.h                        |  9 +++++++-
>  net/core/dev.c                                |  5 ++++-
>  net/ipv4/af_inet.c                            | 13 +++++------
>  net/ipv4/tcp_offload.c                        |  5 +----
>  8 files changed, 59 insertions(+), 29 deletions(-)
> 
> diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst
> index 085e8fab03fd..72f69b22b28c 100644
> --- a/Documentation/networking/segmentation-offloads.rst
> +++ b/Documentation/networking/segmentation-offloads.rst
> @@ -43,10 +43,19 @@ also point to the TCP header of the packet.
>  For IPv4 segmentation we support one of two types in terms of the IP ID.
>  The default behavior is to increment the IP ID with every segment.  If the
>  GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
> -ID and all segments will use the same IP ID.  If a device has
> -NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
> -and we will either increment the IP ID for all frames, or leave it at a
> -static value based on driver preference.
> +ID and all segments will use the same IP ID.
> +
> +For encapsulated packets, SKB_GSO_TCP_FIXEDID refers only to the outer header.
> +SKB_GSO_TCP_FIXEDID_INNER can be used to specify the same for the inner header.
> +Any combination of these two GSO types is allowed.
> +
> +If a device has NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when
> +performing TSO and we will either increment the IP ID for all frames, or leave
> +it at a static value based on driver preference.  For encapsulated packets,
> +NETIF_F_TSO_MANGLEID is relevant for both outer and inner headers, unless the
> +DF bit is not set on the outer header, in which case the device driver must
> +guarantee that the IP ID field is incremented in the outer header with every
> +segment.

Is this introducing a new device requirement for advertising
NETIF_F_TSO_MANGLEID that existing devices may not meet?
  
>  
>  UDP Fragmentation Offload
> @@ -124,10 +133,7 @@ Generic Receive Offload
>  Generic receive offload is the complement to GSO.  Ideally any frame
>  assembled by GRO should be segmented to create an identical sequence of
>  frames using GSO, and any sequence of frames segmented by GSO should be
> -able to be reassembled back to the original by GRO.  The only exception to
> -this is IPv4 ID in the case that the DF bit is set for a given IP header.
> -If the value of the IPv4 ID is not sequentially incrementing it will be
> -altered so that it is when a frame assembled via GRO is segmented via GSO.
> +able to be reassembled back to the original by GRO.
>  
>  
>  Partial Generic Segmentation Offload
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index b8c609d91d11..505c4ce7cef8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1289,8 +1289,12 @@ static void mlx5e_shampo_update_ipv4_tcp_hdr(struct mlx5e_rq *rq, struct iphdr *
>  	tcp->check = ~tcp_v4_check(skb->len - tcp_off, ipv4->saddr,
>  				   ipv4->daddr, 0);
>  	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4;
> -	if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id)
> -		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
> +	if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id) {
> +		bool encap = rq->hw_gro_data->fk.control.flags & FLOW_DIS_ENCAPSULATION;
> +
> +		skb_shinfo(skb)->gso_type |= encap ?
> +					     SKB_GSO_TCP_FIXEDID_INNER : SKB_GSO_TCP_FIXEDID;

I think more common style is

    encap ? SKB_GSO_TCP_FIXEDID_INNER :
            SKB_GSO_TCP_FIXEDID;

(as used in ef100_make_tso_desc below)

> +	}
>  
>  	skb->csum_start = (unsigned char *)tcp - skb->head;
>  	skb->csum_offset = offsetof(struct tcphdr, check);
> diff --git a/drivers/net/ethernet/sfc/ef100_tx.c b/drivers/net/ethernet/sfc/ef100_tx.c
> index e6b6be549581..03005757c060 100644
> --- a/drivers/net/ethernet/sfc/ef100_tx.c
> +++ b/drivers/net/ethernet/sfc/ef100_tx.c
> @@ -189,6 +189,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>  {
>  	bool gso_partial = skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL;
>  	unsigned int len, ip_offset, tcp_offset, payload_segs;
> +	u32 mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
>  	u32 mangleid = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
>  	unsigned int outer_ip_offset, outer_l4_offset;
>  	u16 vlan_tci = skb_vlan_tag_get(skb);
> @@ -200,8 +201,17 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>  	bool outer_csum;
>  	u32 paylen;
>  
> -	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
> -		mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +	if (encap) {
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID_INNER)
> +			mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
> +			mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +	} else {
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
> +			mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +		mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
> +	}
> +
>  	if (efx->net_dev->features & NETIF_F_HW_VLAN_CTAG_TX)
>  		vlan_enable = skb_vlan_tag_present(skb);
>  
> @@ -245,8 +255,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>  			      ESF_GZ_TX_TSO_OUTER_L4_OFF_W, outer_l4_offset >> 1,
>  			      ESF_GZ_TX_TSO_ED_OUTER_UDP_LEN, udp_encap && !gso_partial,
>  			      ESF_GZ_TX_TSO_ED_OUTER_IP_LEN, encap && !gso_partial,
> -			      ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, encap ? mangleid :
> -								     ESE_GZ_TX_DESC_IP4_ID_NO_OP,
> +			      ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, mangleid_outer,
>  			      ESF_GZ_TX_TSO_VLAN_INSERT_EN, vlan_enable,
>  			      ESF_GZ_TX_TSO_VLAN_INSERT_TCI, vlan_tci
>  		);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f3a3b761abfb..3d19c888b839 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -5290,13 +5290,18 @@ void skb_warn_bad_offload(const struct sk_buff *skb);
>  
>  static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>  {
> -	netdev_features_t feature = (netdev_features_t)gso_type << NETIF_F_GSO_SHIFT;
> +	netdev_features_t feature;
> +
> +	if (gso_type & (SKB_GSO_TCP_FIXEDID | SKB_GSO_TCP_FIXEDID_INNER))
> +		gso_type |= __SKB_GSO_TCP_FIXEDID;
> +
> +	feature = ((netdev_features_t)gso_type << NETIF_F_GSO_SHIFT) & NETIF_F_GSO_MASK;
>  
>  	/* check flags correspondence */
>  	BUILD_BUG_ON(SKB_GSO_TCPV4   != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_DODGY   != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
> -	BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
> +	BUILD_BUG_ON(__SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_TCPV6   != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_FCOE    != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_GRE     != (NETIF_F_GSO_GRE >> NETIF_F_GSO_SHIFT));
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index ca8be45dd8be..937acb1869a1 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -674,7 +674,7 @@ enum {
>  	/* This indicates the tcp segment has CWR set. */
>  	SKB_GSO_TCP_ECN = 1 << 2,
>  
> -	SKB_GSO_TCP_FIXEDID = 1 << 3,
> +	__SKB_GSO_TCP_FIXEDID = 1 << 3,
>  
>  	SKB_GSO_TCPV6 = 1 << 4,
>  
> @@ -707,6 +707,13 @@ enum {
>  	SKB_GSO_FRAGLIST = 1 << 18,
>  
>  	SKB_GSO_TCP_ACCECN = 1 << 19,
> +
> +	/* These indirectly map onto the same netdev feature.
> +	 * If NETIF_F_TSO_MANGLEID is set it may mangle both inner and outer
> +	 * IDs.

prefer to keep IDs. on the previous line even if over 80 chars.

> +	 */
> +	SKB_GSO_TCP_FIXEDID = 1 << 30,
> +	SKB_GSO_TCP_FIXEDID_INNER = 1 << 31,
>  };
>  
>  #if BITS_PER_LONG > 32
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 93a25d87b86b..17cb399cdc2a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3769,7 +3769,10 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>  		features &= ~dev->gso_partial_features;
>  
>  	/* Make sure to clear the IPv4 ID mangling feature if the
> -	 * IPv4 header has the potential to be fragmented.
> +	 * IPv4 header has the potential to be fragmented. For
> +	 * encapsulated packets, the outer headers are guaranteed to
> +	 * have incrementing IDs if DF is not set so there is no need
> +	 * to clear the IPv4 ID mangling feature.

Why is this true. Or, why is it not also true for non-encapsulated or
inner headers?

The same preconditions (incl on DF) are now tested in inet_gro_flush

>  	 */
>  	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
>  		struct iphdr *iph = skb->encapsulation ?
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 76e38092cd8a..fc7a6955fa0a 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1393,14 +1393,13 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>  
>  	segs = ERR_PTR(-EPROTONOSUPPORT);
>  
> -	if (!skb->encapsulation || encap) {
> -		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
> -		fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID);
> +	/* fixed ID is invalid if DF bit is not set */
> +	fixedid = !!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCP_FIXEDID << encap));
> +	if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
> +		goto out;
>  
> -		/* fixed ID is invalid if DF bit is not set */
> -		if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
> -			goto out;
> -	}
> +	if (!skb->encapsulation || encap)
> +		udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
>  
>  	ops = rcu_dereference(inet_offloads[proto]);
>  	if (likely(ops && ops->callbacks.gso_segment)) {
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 1949eede9ec9..e6612bd84d09 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -471,7 +471,6 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>  	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
>  	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
>  	struct tcphdr *th = tcp_hdr(skb);
> -	bool is_fixedid;
>  
>  	if (unlikely(NAPI_GRO_CB(skb)->is_flist)) {
>  		skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV4;
> @@ -485,10 +484,8 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>  	th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr,
>  				  iph->daddr, 0);
>  
> -	is_fixedid = (NAPI_GRO_CB(skb)->ip_fixedid >> skb->encapsulation) & 1;
> -
>  	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4 |
> -			(is_fixedid * SKB_GSO_TCP_FIXEDID);
> +			(NAPI_GRO_CB(skb)->ip_fixedid * SKB_GSO_TCP_FIXEDID);

ip_fixedid can now be 0, 1 (outer), 2 (inner) or 3 (inner and outer).

Not all generate a valid SKB_GSO type.
>  
>  	tcp_gro_complete(skb);
>  	return 0;
> -- 
> 2.36.1
> 



  reply	other threads:[~2025-09-16  0:26 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-15 11:39 [PATCH net-next v5 0/5] net: gso: restore outer ip ids correctly Richard Gobert
2025-09-15 11:39 ` [PATCH net-next v5 1/5] net: gro: remove is_ipv6 from napi_gro_cb Richard Gobert
2025-09-16  0:01   ` Willem de Bruijn
2025-09-15 11:39 ` [PATCH net-next v5 2/5] net: gro: only merge packets with incrementing or fixed outer ids Richard Gobert
2025-09-16  0:05   ` Willem de Bruijn
2025-09-15 11:39 ` [PATCH net-next v5 3/5] net: gso: restore ids of outer ip headers correctly Richard Gobert
2025-09-16  0:26   ` Willem de Bruijn [this message]
2025-09-16 13:46     ` Richard Gobert
2025-09-16 15:52       ` Willem de Bruijn
2025-09-18 11:56         ` Richard Gobert
2025-09-15 11:39 ` [PATCH net-next v5 4/5] net: gro: remove unnecessary df checks Richard Gobert
2025-09-16  0:32   ` Willem de Bruijn
2025-09-16 13:53     ` Richard Gobert
2025-09-16 15:57       ` Willem de Bruijn
2025-09-18 11:59         ` Richard Gobert
2025-09-15 11:39 ` [PATCH net-next v5 5/5] selftests/net: test ipip packets in gro.sh Richard Gobert
2025-09-16  0:34   ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=willemdebruijn.kernel.1b773a265e8dc@gmail.com \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=aleksander.lobakin@intel.com \
    --cc=alexander.duyck@gmail.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=ecree.xilinx@gmail.com \
    --cc=edumazet@google.com \
    --cc=florian.fainelli@broadcom.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-net-drivers@amd.com \
    --cc=mbloch@nvidia.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=richardbgobert@gmail.com \
    --cc=saeedm@nvidia.com \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).