Re: [PATCH net v3] net: gso: Forbid IPv6 TSO with extensions on devices with only IPV6_CSUM

From: xietangxin <xietangxin@yeah.net>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Jakub Ramaseuski <jramaseu@redhat.com>,
	netdev@vger.kernel.org
Cc: kuba@kernel.org, horms@kernel.org, edumazet@google.com,
	sdf@fomichev.me, ahmed.zaki@intel.com,
	aleksander.lobakin@intel.com, benoit.monin@gmx.fr,
	willemb@google.com, Tianhao Zhao <tizhao@redhat.com>,
	Michal Schmidt <mschmidt@redhat.com>
Subject: Re: [PATCH net v3] net: gso: Forbid IPv6 TSO with extensions on devices with only IPV6_CSUM
Date: Fri, 20 Mar 2026 17:38:43 +0800
Message-ID: <c0779757-8318-4ecb-93ee-284b325c11d2@yeah.net>
In-Reply-To: <willemdebruijn.kernel.2e5ca236c023e@gmail.com>



On 3/15/2026 12:19 AM, Willem de Bruijn wrote:
> Paolo Abeni wrote:
>> On 3/6/26 7:32 AM, xietangxin wrote:
>>> On 3/5/2026 11:21 PM, Paolo Abeni wrote:
>>>> On 3/5/26 3:57 PM, Willem de Bruijn wrote:
>>>>> xietangxin wrote:
>>>>>> On 2025/8/14 18:51, Jakub Ramaseuski wrote:
>>>>>>> When performing Generic Segmentation Offload (GSO) on an IPv6 packet that
>>>>>>> contains extension headers, the kernel incorrectly requests checksum offload
>>>>>>> if the egress device only advertises NETIF_F_IPV6_CSUM feature, which has 
>>>>>>> a strict contract: it supports checksum offload only for plain TCP or UDP 
>>>>>>> over IPv6 and explicitly does not support packets with extension headers.
>>>>>>> The current GSO logic violates this contract by failing to disable the feature
>>>>>>> for packets with extension headers, such as those used in GREoIPv6 tunnels.
>>>>>>>
>>>>>>> This violation results in the device being asked to perform an operation
>>>>>>> it cannot support, leading to a `skb_warn_bad_offload` warning and a collapse
>>>>>>> of network throughput. While device TSO/USO is correctly bypassed in favor
>>>>>>> of software GSO for these packets, the GSO stack must be explicitly told not 
>>>>>>> to request checksum offload.
>>>>>>>
>>>>>>> Mask NETIF_F_IPV6_CSUM, NETIF_F_TSO6 and NETIF_F_GSO_UDP_L4
>>>>>>> in gso_features_check if the IPv6 header contains extension headers to compute
>>>>>>> checksum in software.
>>>>>>>
>>>>>>> The exception is a BIG TCP extension, which, as stated in commit
>>>>>>> 68e068cabd2c6c53 ("net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets"):
>>>>>>> "The feature is only enabled on devices that support BIG TCP TSO.
>>>>>>> The header is only present for PF_PACKET taps like tcpdump,
>>>>>>> and not transmitted by physical devices."
>>>>>>>
>>>>>>> kernel log output (truncated):
>>>>>>> WARNING: CPU: 1 PID: 5273 at net/core/dev.c:3535 skb_warn_bad_offload+0x81/0x140
>>>>>>> ...
>>>>>>> Call Trace:
>>>>>>>  <TASK>
>>>>>>>  skb_checksum_help+0x12a/0x1f0
>>>>>>>  validate_xmit_skb+0x1a3/0x2d0
>>>>>>>  validate_xmit_skb_list+0x4f/0x80
>>>>>>>  sch_direct_xmit+0x1a2/0x380
>>>>>>>  __dev_xmit_skb+0x242/0x670
>>>>>>>  __dev_queue_xmit+0x3fc/0x7f0
>>>>>>>  ip6_finish_output2+0x25e/0x5d0
>>>>>>>  ip6_finish_output+0x1fc/0x3f0
>>>>>>>  ip6_tnl_xmit+0x608/0xc00 [ip6_tunnel]
>>>>>>>  ip6gre_tunnel_xmit+0x1c0/0x390 [ip6_gre]
>>>>>>>  dev_hard_start_xmit+0x63/0x1c0
>>>>>>>  __dev_queue_xmit+0x6d0/0x7f0
>>>>>>>  ip6_finish_output2+0x214/0x5d0
>>>>>>>  ip6_finish_output+0x1fc/0x3f0
>>>>>>>  ip6_xmit+0x2ca/0x6f0
>>>>>>>  ip6_finish_output+0x1fc/0x3f0
>>>>>>>  ip6_xmit+0x2ca/0x6f0
>>>>>>>  inet6_csk_xmit+0xeb/0x150
>>>>>>>  __tcp_transmit_skb+0x555/0xa80
>>>>>>>  tcp_write_xmit+0x32a/0xe90
>>>>>>>  tcp_sendmsg_locked+0x437/0x1110
>>>>>>>  tcp_sendmsg+0x2f/0x50
>>>>>>> ...
>>>>>>> skb linear:   00000000: e4 3d 1a 7d ec 30 e4 3d 1a 7e 5d 90 86 dd 60 0e
>>>>>>> skb linear:   00000010: 00 0a 1b 34 3c 40 20 11 00 00 00 00 00 00 00 00
>>>>>>> skb linear:   00000020: 00 00 00 00 00 12 20 11 00 00 00 00 00 00 00 00
>>>>>>> skb linear:   00000030: 00 00 00 00 00 11 2f 00 04 01 04 01 01 00 00 00
>>>>>>> skb linear:   00000040: 86 dd 60 0e 00 0a 1b 00 06 40 20 23 00 00 00 00
>>>>>>> skb linear:   00000050: 00 00 00 00 00 00 00 00 00 12 20 23 00 00 00 00
>>>>>>> skb linear:   00000060: 00 00 00 00 00 00 00 00 00 11 bf 96 14 51 13 f9
>>>>>>> skb linear:   00000070: ae 27 a0 a8 2b e3 80 18 00 40 5b 6f 00 00 01 01
>>>>>>> skb linear:   00000080: 08 0a 42 d4 50 d5 4b 70 f8 1a
>>>>>>>
>>>>>>> Fixes: 04c20a9356f283da ("net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension")
>>>>>>> Reported-by: Tianhao Zhao <tizhao@redhat.com>
>>>>>>> Suggested-by: Michal Schmidt <mschmidt@redhat.com>
>>>>>>> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
>>>>>>> Signed-off-by: Jakub Ramaseuski <jramaseu@redhat.com>
>>>>>>> ---
>>>>>>> ---
>>>>>>>  net/core/dev.c | 12 ++++++++++++
>>>>>>>  1 file changed, 12 insertions(+)
>>>>>>>
>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>> index b28ce68830b2b..1d8a4d1da911e 100644
>>>>>>> --- a/net/core/dev.c
>>>>>>> +++ b/net/core/dev.c
>>>>>>> @@ -3778,6 +3778,18 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>>>>>>>  		if (!(iph->frag_off & htons(IP_DF)))
>>>>>>>  			features &= ~NETIF_F_TSO_MANGLEID;
>>>>>>>  	}
>>>>>>> +
>>>>>>> +	/* NETIF_F_IPV6_CSUM does not support IPv6 extension headers,
>>>>>>> +	 * so neither does TSO that depends on it.
>>>>>>> +	 */
>>>>>>> +	if (features & NETIF_F_IPV6_CSUM &&
>>>>>>> +	    (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
>>>>>>> +	     (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
>>>>>>> +	      vlan_get_protocol(skb) == htons(ETH_P_IPV6))) &&
>>>>>>> +	    skb_transport_header_was_set(skb) &&
>>>>>>> +	    skb_network_header_len(skb) != sizeof(struct ipv6hdr) &&
>>>>>>> +	    !ipv6_has_hopopt_jumbo(skb))
>>>>>>> +		features &= ~(NETIF_F_IPV6_CSUM | NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4);
>>>>>>>  
>>>>>>>  	return features;
>>>>>>>  }
>>>>>> I have a question about how this patch affects tunneled IPv6-in-IPv4 packets.
>>>>>>
>>>>>> In our environment with a hinic NIC, we use VXLAN tunnels where
>>>>>> the outer header is IPv4 and the inner is IPv6. After this commit,
>>>>>> large packets no longer use hardware TSO and fall back to software segmentation.
>>>>>>
>>>>>> In the VXLAN IPv6-in-IPv4 case, `skb_shinfo(skb)->gso_type` includes
>>>>>> `SKB_GSO_TCPV6` (inner is IPv6 TCP), but the network header points to the outer
>>>>>> IPv4 header. Thus `skb_network_header_len(skb)` returns the IPv4 header length
>>>>>> (usually 20), which is not equal to `sizeof(struct ipv6hdr)` (40). This causes
>>>>>> the condition to trigger and clears `NETIF_F_TSO6`, even though the inner IPv6
>>>>>> packet has no extension headers and the device is capable of handling TSO for
>>>>>> such packets.
>>>>>>
>>>>>> Is it the intended behavior to disable TSO for all tunneled IPv6-in-IPv4 packets
>>>>>> when the NIC lacks NETIF_F_HW_CSUM, even if the inner IPv6 header has no extensions?
>>>>>>
>>>>>> Any feedback or guidance would be greatly appreciated.
>>>>>
>>>>> That is definitely unintended.
>>>>>
>>>>> Thanks for the clear analysis.
>>>>>
>>>>> I was about to write a refinement that might catch this case,
>>>>> something like
>>>>>
>>>>> @@ -3819,8 +3819,10 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>>>>>             (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
>>>>>              (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
>>>>>               vlan_get_protocol(skb) == htons(ETH_P_IPV6))) &&
>>>>> -           skb_transport_header_was_set(skb) &&
>>>>> -           skb_network_header_len(skb) != sizeof(struct ipv6hdr))
>>>>> +             ((!skb->encapsulation &&
>>>>> +               skb_transport_header_was_set(skb) &&
>>>>> +               skb_network_header_len(skb) != sizeof(struct ipv6hdr)) ||
>>>>> +              (skb_inner_network_header_len(skb) != sizeof(struct ipv6hdr))))
>>>>>                 features &= ~(NETIF_F_IPV6_CSUM | NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4);
>>>>>
>>>>> But, how are these VXLAN IPv6-in-IPv4 packets having
>>>>> vlan_get_protocol(skb) == htons(ETH_P_IPV6)?
>>>>>
>>>>> Shouldn't that be the protocol of the outer header, so ETH_P_IP, and
>>>>> thus this branch not reached at all? (Which itself would leave a false
>>>>> positive as now an inner network header with extensions would not be
>>>>> caught..)
>>>>
>>>> Also, the tunnel could have ENCAP_TYPE_IPPROTO, and we likely need to
>>>> disable csum in that case too? Possibly something like the following
>>>> could work?
>>>>
>>>> Side note: I *think* that replacing SKB_GSO_UDP_L4 with separate
>>>> SKB_GSO_UDPV4_L4 and SKB_GSO_UDPV6_L4 flags would remove a bit of complexity in
>>>> several places, but I'm not sure how invasive such a change would be.
>>>>
>>>> ---
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 4af4cf2d63a4..f9824dfef376 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -3769,6 +3769,22 @@ static netdev_features_t dflt_features_check(struct sk_buff *skb,
>>>>  	return vlan_features_check(skb, features);
>>>>  }
>>>>
>>>> +static bool skb_gso_has_extension_hdr(const struct sk_buff *skb)
>>>> +{
>>>> +	if (!skb->encapsulation)
>>>> +		return ((skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
>>>> +			(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
>>>> +			 vlan_get_protocol(skb) == htons(ETH_P_IPV6))) &&
>>>> +			skb_transport_header_was_set(skb) &&
>>>> +			skb_network_header_len(skb) != sizeof(struct ipv6hdr));
>>>> +
>>>> +	return (skb->inner_protocol_type == ENCAP_TYPE_IPPROTO ||
>>>> +		((skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
>>>> +		  (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
>>>> +		   inner_ip_hdr(skb)->version == 6)) &&
>>>> +		 skb_inner_network_header_len(skb) != sizeof(struct ipv6hdr)));
>>>> +}
>>>> +
>>>>  static netdev_features_t gso_features_check(const struct sk_buff *skb,
>>>>  					    struct net_device *dev,
>>>>  					    netdev_features_t features)
>>>> @@ -3815,12 +3831,7 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>>>>  	/* NETIF_F_IPV6_CSUM does not support IPv6 extension headers,
>>>>  	 * so neither does TSO that depends on it.
>>>>  	 */
>>>> -	if (features & NETIF_F_IPV6_CSUM &&
>>>> -	    (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
>>>> -	     (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
>>>> -	      vlan_get_protocol(skb) == htons(ETH_P_IPV6))) &&
>>>> -	    skb_transport_header_was_set(skb) &&
>>>> -	    skb_network_header_len(skb) != sizeof(struct ipv6hdr))
>>>> +	if (features & NETIF_F_IPV6_CSUM && skb_gso_has_extension_hdr(skb))
>>>>  		features &= ~(NETIF_F_IPV6_CSUM | NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4);
>>>>
>>>>  	return features;
>>>>
>>> Hi Paolo, Willem,
>>>
>>> Thank you both for the insightful analysis and the proposed fix.
>>>
>>> I have backported and tested Paolo's patch in our environment with a hinic NIC.
>>> We focused on the VXLAN (IPv6-in-IPv4) and native IPv6 scenarios:
>>>
>>> Scenario               | IPv6 Ext-Headers | Result | Behavior
>>> -----------------------|------------------|--------|---------------
>>> VXLAN (IPv6-in-IPv4)   | No               | PASS   | HW TSO enabled
>>> VXLAN (IPv6-in-IPv4)   | Yes              | PASS   | SW GSO fallback
>>> Native IPv6            | No               | PASS   | HW TSO enabled
>>> Native IPv6            | Yes              | PASS   | SW GSO fallback
>>>
>>> Thanks again for the help!
>> If you are willing and able, please take this over and turn it into a formal patch.
> 
> Otherwise I can.
> 
> The check is also needed for tunnels that set ENCAP_TYPE_IPPROTO, such
> as sit. That condition can be removed as far as I can tell?
> 
> Only, I still do not see how this condition can have triggered, as
> vlan_get_protocol(skb) should be htons(ETH_P_IP).
> 
> I built a simple reproducer using vxlan over veth in virtme-ng, while
> changing veth's NETIF_F_.._CSUM to reach this code. That indeed shows
> correct ETH_P_IP.
> 
> Tangxin, can you show a stack trace when this condition hits? For
> instance by adding a WARN_ON_ONCE(1) inside that branch, or by using
> bpftrace:
> 
> sudo bpftrace -e 'kfunc:netif_skb_features { if (args->skb->encapsulation && args->skb->protocol == 0xDD86) { @[kstack] = count(); } }'
Hi Willem,

Sorry for the late reply.

I have tested this on Linux 7.0-rc4
(commit f338e77383789c0cae23ca3d48adcc5e9e137e3c).

In my VXLAN (IPv6-in-IPv4) environment,
vlan_get_protocol(skb) indeed returns htons(ETH_P_IP) as you expected.
However, the condition is still triggered because skb_shinfo(skb)->gso_type
contains SKB_GSO_TCPV6 (since the inner packet is IPv6 TCP).
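
To make the false positive concrete, here is a small userspace sketch of
the condition as it stands today. It is not kernel code: struct model_skb,
the MODEL_* constants and the model_* functions are hypothetical stand-ins
made up for illustration, and the device is assumed to advertise
NETIF_F_IPV6_CSUM. It only shows that once the TCPv6 GSO bit is set, the
outer protocol is never consulted and the outer IPv4 header length (20) is
compared against sizeof(struct ipv6hdr) (40).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; flag values are illustrative. */
#define MODEL_GSO_TCPV6    (1u << 0)
#define MODEL_GSO_UDP_L4   (1u << 1)
#define MODEL_ETH_P_IP     0x0800u
#define MODEL_ETH_P_IPV6   0x86DDu
#define MODEL_IPV6_HDR_LEN 40u

/* Simplified view of the skb state that the check reads. */
struct model_skb {
	unsigned int gso_type;       /* models skb_shinfo(skb)->gso_type */
	unsigned int protocol;       /* models vlan_get_protocol(skb), host order */
	bool transport_header_set;   /* models skb_transport_header_was_set(skb) */
	size_t network_header_len;   /* models skb_network_header_len(skb): OUTER header */
	bool has_hopopt_jumbo;       /* models ipv6_has_hopopt_jumbo(skb) */
};

/* Same shape as the condition in gso_features_check(); true means the
 * IPv6 checksum/TSO feature bits would be masked off.
 */
bool model_masks_tso6(const struct model_skb *skb)
{
	bool v6_gso = (skb->gso_type & MODEL_GSO_TCPV6) ||
		      ((skb->gso_type & MODEL_GSO_UDP_L4) &&
		       skb->protocol == MODEL_ETH_P_IPV6);

	return v6_gso && skb->transport_header_set &&
	       skb->network_header_len != MODEL_IPV6_HDR_LEN &&
	       !skb->has_hopopt_jumbo;
}

/* Convenience wrapper: transport header set, no BIG TCP jumbo header. */
bool model_check(unsigned int gso_type, unsigned int protocol, size_t nh_len)
{
	struct model_skb skb = {
		.gso_type = gso_type,
		.protocol = protocol,
		.transport_header_set = true,
		.network_header_len = nh_len,
		.has_hopopt_jumbo = false,
	};

	return model_masks_tso6(&skb);
}

void model_self_check(void)
{
	/* VXLAN, outer IPv4 (20 bytes), inner TCP/IPv6 without extensions:
	 * masked anyway, which is the regression described above.
	 */
	assert(model_check(MODEL_GSO_TCPV6, MODEL_ETH_P_IP, 20));
	/* Native IPv6 without extension headers: not masked. */
	assert(!model_check(MODEL_GSO_TCPV6, MODEL_ETH_P_IPV6, 40));
}
```

In this model the UDP_L4 arm is protected by the protocol test, but the
TCPV6 arm is not, which matches what I observe.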

Below is the call trace captured via WARN_ON_ONCE when the condition is hit:

WARNING: net/core/dev.c:3824 at gso_features_check+0xbc/0x158, CPU#10: python3/16193
CPU: 10 UID: 0 PID: 16193 Comm: python3 Kdump: loaded Not tainted 7.0.0-rc4+ #12 PREEMPT
Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 1.86 01/10/2022
pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : gso_features_check (net/core/dev.c:3824 (discriminator 1))
lr : netif_skb_features (net/core/dev.c:3837)
...
Call trace:
 gso_features_check (net/core/dev.c:3824 (discriminator 1))
 netif_skb_features (net/core/dev.c:3837)
 validate_xmit_skb (net/core/dev.c:4012)
 validate_xmit_skb_list (net/core/dev.c:4075)
 sch_direct_xmit (net/sched/sch_generic.c:335)
 __dev_xmit_skb (net/core/dev.c:4255 (discriminator 4))
 __dev_queue_xmit (net/core/dev.c:4804)
 neigh_hh_output (./include/net/neighbour.h:541)
 ip_finish_output2 (./include/net/neighbour.h:554 net/ipv4/ip_output.c:237)
 __ip_finish_output (net/ipv4/ip_output.c:315 net/ipv4/ip_output.c:297)
 ip_finish_output (net/ipv4/ip_output.c:325)
 ip_output (./include/linux/netfilter.h:307 net/ipv4/ip_output.c:438)
 ip_local_out (net/ipv4/ip_output.c:134)
 iptunnel_xmit (net/ipv4/ip_tunnel_core.c:99 (discriminator 4))
 udp_tunnel_xmit_skb (net/ipv4/udp_tunnel_core.c:195) [udp_tunnel]
 vxlan_xmit_one (drivers/net/vxlan/vxlan_core.c:2544) [vxlan]
 vxlan_xmit (drivers/net/vxlan/vxlan_core.c:2832) [vxlan]
 dev_hard_start_xmit (net/core/dev.c:3889)
 __dev_queue_xmit (net/core/dev.c:4836)
 ... (TCP/IPv6 stack)
 tcp_sendmsg (net/ipv4/tcp.c:1465)
 inet6_sendmsg (net/ipv6/af_inet6.c:653 (discriminator 2))
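
For comparison, the idea behind the encapsulation-aware check proposed
earlier in this thread can be modelled in userspace roughly as below. This
is only a sketch under my own assumptions: struct model_pkt and the model_*
functions are hypothetical names, the ENCAP_TYPE_IPPROTO case discussed
above is deliberately left out, and the 48-byte length stands for a 40-byte
IPv6 header plus one 8-byte extension header. The point is simply that the
header length is taken from the inner network header when the skb is
encapsulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_IPV6_HDR_LEN 40u   /* stands in for sizeof(struct ipv6hdr) */

/* Hypothetical, simplified packet description. */
struct model_pkt {
	bool encapsulation;   /* models skb->encapsulation */
	bool ipv6_gso;        /* models the TCPv6 / UDP_L4-over-IPv6 GSO test */
	size_t outer_nh_len;  /* models skb_network_header_len(skb) */
	size_t inner_nh_len;  /* models skb_inner_network_header_len(skb) */
};

/* True when the IPv6 header that the GSO type refers to carries extension
 * headers, i.e. when software GSO would be needed on an IPV6_CSUM-only device.
 */
bool model_has_ext_hdr(const struct model_pkt *p)
{
	size_t len;

	if (!p->ipv6_gso)
		return false;

	/* Look at the inner header for tunnels, the outer one otherwise. */
	len = p->encapsulation ? p->inner_nh_len : p->outer_nh_len;
	return len != MODEL_IPV6_HDR_LEN;
}

/* Convenience wrapper for an IPv6 GSO packet. */
bool model_needs_sw_gso(bool encap, size_t outer_len, size_t inner_len)
{
	struct model_pkt p = {
		.encapsulation = encap,
		.ipv6_gso = true,
		.outer_nh_len = outer_len,
		.inner_nh_len = inner_len,
	};

	return model_has_ext_hdr(&p);
}

void model_self_check(void)
{
	assert(!model_needs_sw_gso(true, 20, 40));   /* VXLAN, no ext headers */
	assert(model_needs_sw_gso(true, 20, 48));    /* VXLAN, ext headers */
	assert(!model_needs_sw_gso(false, 40, 0));   /* native IPv6, no ext headers */
	assert(model_needs_sw_gso(false, 48, 0));    /* native IPv6, ext headers */
}
```

The four assertions correspond to the four rows of the test table I posted
on 3/6 (HW TSO stays enabled without extension headers, SW GSO fallback with
them).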

I look forward to your patch.

Best regards,
Tangxin Xie

Thread overview: 13+ messages
2025-08-14 10:51 [PATCH net v3] net: gso: Forbid IPv6 TSO with extensions on devices with only IPV6_CSUM Jakub Ramaseuski
2025-08-14 13:11 ` Willem de Bruijn
2025-08-19  0:30 ` patchwork-bot+netdevbpf
2026-03-05  7:42 ` xietangxin
2026-03-05 14:57   ` Willem de Bruijn
2026-03-05 15:21     ` Paolo Abeni
2026-03-06  6:32       ` xietangxin
2026-03-06  8:29         ` Paolo Abeni
2026-03-14 16:19           ` Willem de Bruijn
2026-03-16  8:38             ` Paolo Abeni
2026-03-16 16:55               ` Willem de Bruijn
2026-03-20  9:38             ` xietangxin [this message]
2026-03-20 19:03               ` Willem de Bruijn
