From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: xietangxin <xietangxin@yeah.net>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Paolo Abeni <pabeni@redhat.com>,
Jakub Ramaseuski <jramaseu@redhat.com>,
netdev@vger.kernel.org
Cc: kuba@kernel.org, horms@kernel.org, edumazet@google.com,
sdf@fomichev.me, ahmed.zaki@intel.com,
aleksander.lobakin@intel.com, benoit.monin@gmx.fr,
willemb@google.com, Tianhao Zhao <tizhao@redhat.com>,
Michal Schmidt <mschmidt@redhat.com>
Subject: Re: [PATCH net v3] net: gso: Forbid IPv6 TSO with extensions on devices with only IPV6_CSUM
Date: Fri, 20 Mar 2026 15:03:53 -0400
Message-ID: <willemdebruijn.kernel.1f887430ab7b4@gmail.com>
In-Reply-To: <c0779757-8318-4ecb-93ee-284b325c11d2@yeah.net>
xietangxin wrote:
>
>
> On 3/15/2026 12:19 AM, Willem de Bruijn wrote:
> > Paolo Abeni wrote:
> >> On 3/6/26 7:32 AM, xietangxin wrote:
> >>> On 3/5/2026 11:21 PM, Paolo Abeni wrote:
> >>>> On 3/5/26 3:57 PM, Willem de Bruijn wrote:
> >>>>> xietangxin wrote:
> >>>>>> On 2025/8/14 18:51, Jakub Ramaseuski wrote:
> >>>>>>> When performing Generic Segmentation Offload (GSO) on an IPv6 packet that
> >>>>>>> contains extension headers, the kernel incorrectly requests checksum offload
> >>>>>>> if the egress device only advertises NETIF_F_IPV6_CSUM feature, which has
> >>>>>>> a strict contract: it supports checksum offload only for plain TCP or UDP
> >>>>>>> over IPv6 and explicitly does not support packets with extension headers.
> >>>>>>> The current GSO logic violates this contract by failing to disable the feature
> >>>>>>> for packets with extension headers, such as those used in GREoIPv6 tunnels.
> >>>>>>>
> >>>>>>> This violation results in the device being asked to perform an operation
> >>>>>>> it cannot support, leading to a `skb_warn_bad_offload` warning and a collapse
> >>>>>>> of network throughput. While device TSO/USO is correctly bypassed in favor
> >>>>>>> of software GSO for these packets, the GSO stack must be explicitly told not
> >>>>>>> to request checksum offload.
> >>>>>>>
> >>>>>>> Mask NETIF_F_IPV6_CSUM, NETIF_F_TSO6 and NETIF_F_GSO_UDP_L4
> >>>>>>> in gso_features_check if the IPv6 header contains extension headers to compute
> >>>>>>> checksum in software.
> >>>>>>>
> >>>>>>> The exception is a BIG TCP extension, which, as stated in commit
> >>>>>>> 68e068cabd2c6c53 ("net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets"):
> >>>>>>> "The feature is only enabled on devices that support BIG TCP TSO.
> >>>>>>> The header is only present for PF_PACKET taps like tcpdump,
> >>>>>>> and not transmitted by physical devices."
> >>>>>>>
> >>>>>>> kernel log output (truncated):
> >>>>>>> WARNING: CPU: 1 PID: 5273 at net/core/dev.c:3535 skb_warn_bad_offload+0x81/0x140
> >>>>>>> ...
> >>>>>>> Call Trace:
> >>>>>>> <TASK>
> >>>>>>> skb_checksum_help+0x12a/0x1f0
> >>>>>>> validate_xmit_skb+0x1a3/0x2d0
> >>>>>>> validate_xmit_skb_list+0x4f/0x80
> >>>>>>> sch_direct_xmit+0x1a2/0x380
> >>>>>>> __dev_xmit_skb+0x242/0x670
> >>>>>>> __dev_queue_xmit+0x3fc/0x7f0
> >>>>>>> ip6_finish_output2+0x25e/0x5d0
> >>>>>>> ip6_finish_output+0x1fc/0x3f0
> >>>>>>> ip6_tnl_xmit+0x608/0xc00 [ip6_tunnel]
> >>>>>>> ip6gre_tunnel_xmit+0x1c0/0x390 [ip6_gre]
> >>>>>>> dev_hard_start_xmit+0x63/0x1c0
> >>>>>>> __dev_queue_xmit+0x6d0/0x7f0
> >>>>>>> ip6_finish_output2+0x214/0x5d0
> >>>>>>> ip6_finish_output+0x1fc/0x3f0
> >>>>>>> ip6_xmit+0x2ca/0x6f0
> >>>>>>> ip6_finish_output+0x1fc/0x3f0
> >>>>>>> ip6_xmit+0x2ca/0x6f0
> >>>>>>> inet6_csk_xmit+0xeb/0x150
> >>>>>>> __tcp_transmit_skb+0x555/0xa80
> >>>>>>> tcp_write_xmit+0x32a/0xe90
> >>>>>>> tcp_sendmsg_locked+0x437/0x1110
> >>>>>>> tcp_sendmsg+0x2f/0x50
> >>>>>>> ...
> >>>>>>> skb linear: 00000000: e4 3d 1a 7d ec 30 e4 3d 1a 7e 5d 90 86 dd 60 0e
> >>>>>>> skb linear: 00000010: 00 0a 1b 34 3c 40 20 11 00 00 00 00 00 00 00 00
> >>>>>>> skb linear: 00000020: 00 00 00 00 00 12 20 11 00 00 00 00 00 00 00 00
> >>>>>>> skb linear: 00000030: 00 00 00 00 00 11 2f 00 04 01 04 01 01 00 00 00
> >>>>>>> skb linear: 00000040: 86 dd 60 0e 00 0a 1b 00 06 40 20 23 00 00 00 00
> >>>>>>> skb linear: 00000050: 00 00 00 00 00 00 00 00 00 12 20 23 00 00 00 00
> >>>>>>> skb linear: 00000060: 00 00 00 00 00 00 00 00 00 11 bf 96 14 51 13 f9
> >>>>>>> skb linear: 00000070: ae 27 a0 a8 2b e3 80 18 00 40 5b 6f 00 00 01 01
> >>>>>>> skb linear: 00000080: 08 0a 42 d4 50 d5 4b 70 f8 1a
> >>>>>>>
> >>>>>>> Fixes: 04c20a9356f283da ("net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension")
> >>>>>>> Reported-by: Tianhao Zhao <tizhao@redhat.com>
> >>>>>>> Suggested-by: Michal Schmidt <mschmidt@redhat.com>
> >>>>>>> Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> >>>>>>> Signed-off-by: Jakub Ramaseuski <jramaseu@redhat.com>
> >>>>>>> ---
> >>>>>>> ---
> >>>>>>> net/core/dev.c | 12 ++++++++++++
> >>>>>>> 1 file changed, 12 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>>>>> index b28ce68830b2b..1d8a4d1da911e 100644
> >>>>>>> --- a/net/core/dev.c
> >>>>>>> +++ b/net/core/dev.c
> >>>>>>> @@ -3778,6 +3778,18 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
> >>>>>>>  		if (!(iph->frag_off & htons(IP_DF)))
> >>>>>>>  			features &= ~NETIF_F_TSO_MANGLEID;
> >>>>>>>  	}
> >>>>>>> +
> >>>>>>> +	/* NETIF_F_IPV6_CSUM does not support IPv6 extension headers,
> >>>>>>> +	 * so neither does TSO that depends on it.
> >>>>>>> +	 */
> >>>>>>> +	if (features & NETIF_F_IPV6_CSUM &&
> >>>>>>> +	    (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
> >>>>>>> +	     (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
> >>>>>>> +	      vlan_get_protocol(skb) == htons(ETH_P_IPV6))) &&
> >>>>>>> +	    skb_transport_header_was_set(skb) &&
> >>>>>>> +	    skb_network_header_len(skb) != sizeof(struct ipv6hdr) &&
> >>>>>>> +	    !ipv6_has_hopopt_jumbo(skb))
> >>>>>>> +		features &= ~(NETIF_F_IPV6_CSUM | NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4);
> >>>>>>>
> >>>>>>>  	return features;
> >>>>>>>  }
> >>>>>> I have a question about how this patch affects tunneled IPv6-in-IPv4 packets.
> >>>>>>
> >>>>>> In our environment with a hinic NIC, we use VXLAN tunnels where
> >>>>>> the outer header is IPv4 and the inner is IPv6. After this commit,
> >>>>>> large packets no longer use hardware TSO and fall back to software segmentation.
> >>>>>>
> >>>>>> In the VXLAN IPv6-in-IPv4 case, `skb_shinfo(skb)->gso_type` includes
> >>>>>> `SKB_GSO_TCPV6` (inner is IPv6 TCP), but the network header points to the outer
> >>>>>> IPv4 header. Thus `skb_network_header_len(skb)` returns the IPv4 header length
> >>>>>> (usually 20), which is not equal to `sizeof(struct ipv6hdr)` (40). This causes
> >>>>>> the condition to trigger and clears `NETIF_F_TSO6`, even though the inner IPv6
> >>>>>> packet has no extension headers and the device is capable of handling TSO for
> >>>>>> such packets.
> >>>>>>
> >>>>>> Is it the intended behavior to disable TSO for all tunneled IPv6-in-IPv4 packets
> >>>>>> when the NIC lacks NETIF_F_HW_CSUM, even if the inner IPv6 header has no extensions?
> >>>>>>
> >>>>>> Any feedback or guidance would be greatly appreciated.
> >>>>>
> >>>>> That is definitely unintended.
> >>>>>
> >>>>> Thanks for the clear analysis.
> >>>>>
> >>>>> I was about to write a refinement that might catch this case,
> >>>>> something like
> >>>>>
> >>>>> @@ -3819,8 +3819,10 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
> >>>>>  	    (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
> >>>>>  	     (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
> >>>>>  	      vlan_get_protocol(skb) == htons(ETH_P_IPV6))) &&
> >>>>> -	    skb_transport_header_was_set(skb) &&
> >>>>> -	    skb_network_header_len(skb) != sizeof(struct ipv6hdr))
> >>>>> +	    ((!skb->encapsulation &&
> >>>>> +	      skb_transport_header_was_set(skb) &&
> >>>>> +	      skb_network_header_len(skb) != sizeof(struct ipv6hdr)) ||
> >>>>> +	     (skb_inner_network_header_len(skb) != sizeof(struct ipv6hdr))))
> >>>>>  		features &= ~(NETIF_F_IPV6_CSUM | NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4);
> >>>>>
> >>>>> But, how are these VXLAN IPv6-in-IPv4 packets having
> >>>>> vlan_get_protocol(skb) == htons(ETH_P_IPV6)?
> >>>>>
> >>>>> Shouldn't that be the protocol of the outer header, so ETH_P_IP, and
> >>>>> thus this branch not reached at all? (Which itself would leave a false
> >>>>> positive as now an inner network header with extensions would not be
> >>>>> caught..)
> >>>>
> >>>> Also the tunnel could have ENCAP_TYPE_IPPROTO, and likely we need to
> >>>> disable csum even in that case? Possibly something like the following
> >>>> could work?
> >>>>
> >>>> Side note, I *think* that replacing SKB_GSO_UDP_L4 with separate
> >>>> SKB_GSO_UDPV4_L4 and SKB_GSO_UDPV6_L4 would remove a bit of complexity in
> >>>> several places, but I'm not sure how invasive such a change would be.
> >>>>
> >>>> ---
> >>>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>>> index 4af4cf2d63a4..f9824dfef376 100644
> >>>> --- a/net/core/dev.c
> >>>> +++ b/net/core/dev.c
> >>>> @@ -3769,6 +3769,22 @@ static netdev_features_t dflt_features_check(struct sk_buff *skb,
> >>>>  	return vlan_features_check(skb, features);
> >>>>  }
> >>>>
> >>>> +static bool skb_gso_has_extension_hdr(const struct sk_buff *skb)
> >>>> +{
> >>>> +	if (!skb->encapsulation)
> >>>> +		return ((skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
> >>>> +			 (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
> >>>> +			  vlan_get_protocol(skb) == htons(ETH_P_IPV6))) &&
> >>>> +			skb_transport_header_was_set(skb) &&
> >>>> +			skb_network_header_len(skb) != sizeof(struct ipv6hdr));
> >>>> +
> >>>> +	return (skb->inner_protocol_type == ENCAP_TYPE_IPPROTO ||
> >>>> +		((skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
> >>>> +		  (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
> >>>> +		   inner_ip_hdr(skb)->version == 6)) &&
> >>>> +		 skb_inner_network_header_len(skb) != sizeof(struct ipv6hdr)));
> >>>> +}
> >>>> +
> >>>>  static netdev_features_t gso_features_check(const struct sk_buff *skb,
> >>>>  					    struct net_device *dev,
> >>>>  					    netdev_features_t features)
> >>>> @@ -3815,12 +3831,7 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
> >>>>  	/* NETIF_F_IPV6_CSUM does not support IPv6 extension headers,
> >>>>  	 * so neither does TSO that depends on it.
> >>>>  	 */
> >>>> -	if (features & NETIF_F_IPV6_CSUM &&
> >>>> -	    (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6 ||
> >>>> -	     (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4 &&
> >>>> -	      vlan_get_protocol(skb) == htons(ETH_P_IPV6))) &&
> >>>> -	    skb_transport_header_was_set(skb) &&
> >>>> -	    skb_network_header_len(skb) != sizeof(struct ipv6hdr))
> >>>> +	if (features & NETIF_F_IPV6_CSUM && skb_gso_has_extension_hdr(skb))
> >>>>  		features &= ~(NETIF_F_IPV6_CSUM | NETIF_F_TSO6 | NETIF_F_GSO_UDP_L4);
> >>>>
> >>>>  	return features;
> >>>>
> >>> Hi Paolo, Willem,
> >>>
> >>> Thank you both for the insightful analysis and the proposed fix.
> >>>
> >>> I have backported and tested Paolo's patch in our environment with a hinic NIC.
> >>> We focused on the VXLAN (IPv6-in-IPv4) scenario and the native IPv6 scenario:
> >>>
> >>> Scenario | IPv6 Ext-Headers | Result | Behavior
> >>> -----------------------|------------------|--------|---------------
> >>> VXLAN (IPv6-in-IPv4) | No | PASS | HW TSO enabled
> >>> VXLAN (IPv6-in-IPv4) | Yes | PASS | SW GSO fallback
> >>> Native IPv6 | No | PASS | HW TSO enabled
> >>> Native IPv6 | Yes | PASS | SW GSO fallback
> >>>
> >>> Thanks again for the help!
> >> If you are willing and able, please take it over and turn it into a formal patch.
> >
> > Otherwise I can.
> >
> > The check is also needed for tunnels that set ENCAP_TYPE_IPPROTO, such
> > as sit. That condition can be removed as far as I can tell?
> >
> > Only, I still do not see how this condition can have triggered, as
> > vlan_get_protocol(skb) should be htons(ETH_P_IP).
> >
> > I built a simple reproducer using vxlan over veth in virtme-ng, while
> > changing veth's NETIF_F_.._CSUM to reach this code. That indeed shows
> > correct ETH_P_IP.
> >
> > Tangxin, can you show a stack trace when this condition hits? For
> > instance by adding a WARN_ON_ONCE(1) inside that branch, or by using
> > bpftrace:
> >
> > sudo bpftrace -e 'kfunc:netif_skb_features { if (args->skb->encapsulation && args->skb->protocol == 0xDD86) { @[kstack] = count(); } }'
> Hi Willem,
>
> Sorry for the late reply.
>
> I have tested this on Linux 7.0-rc4
> (commit f338e77383789c0cae23ca3d48adcc5e9e137e3c).
>
> In my VXLAN (IPv6-in-IPv4) environment,
> vlan_get_protocol(skb) indeed returns htons(ETH_P_IP) as you expected.
> However, the condition is still triggered because skb_shinfo(skb)->gso_type
> contains SKB_GSO_TCPV6 (since the inner packet is IPv6 TCP).
>
> Below is the call trace captured via WARN_ON_ONCE when the condition is hit:
>
> WARNING: net/core/dev.c:3824 at gso_features_check+0xbc/0x158, CPU#10: python3/16193
> CPU: 10 UID: 0 PID: 16193 Comm: python3 Kdump: loaded Not tainted 7.0.0-rc4+ #12 PREEMPT
> Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 1.86 01/10/2022
> pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : gso_features_check (net/core/dev.c:3824 (discriminator 1))
> lr : netif_skb_features (net/core/dev.c:3837)
> ...
> Call trace:
> gso_features_check (net/core/dev.c:3824 (discriminator 1))
> netif_skb_features (net/core/dev.c:3837)
> validate_xmit_skb (net/core/dev.c:4012)
> validate_xmit_skb_list (net/core/dev.c:4075)
> sch_direct_xmit (net/sched/sch_generic.c:335)
> __dev_xmit_skb (net/core/dev.c:4255 (discriminator 4))
> __dev_queue_xmit (net/core/dev.c:4804)
> neigh_hh_output (./include/net/neighbour.h:541)
> ip_finish_output2 (./include/net/neighbour.h:554 net/ipv4/ip_output.c:237)
> __ip_finish_output (net/ipv4/ip_output.c:315 net/ipv4/ip_output.c:297)
> ip_finish_output (net/ipv4/ip_output.c:325)
> ip_output (./include/linux/netfilter.h:307 net/ipv4/ip_output.c:438)
> ip_local_out (net/ipv4/ip_output.c:134)
> iptunnel_xmit (net/ipv4/ip_tunnel_core.c:99 (discriminator 4))
> udp_tunnel_xmit_skb (net/ipv4/udp_tunnel_core.c:195) [udp_tunnel]
> vxlan_xmit_one (drivers/net/vxlan/vxlan_core.c:2544) [vxlan]
> vxlan_xmit (drivers/net/vxlan/vxlan_core.c:2832) [vxlan]
> dev_hard_start_xmit (net/core/dev.c:3889)
> __dev_queue_xmit (net/core/dev.c:4836)
> ... (TCP/IPv6 stack)
> tcp_sendmsg (net/ipv4/tcp.c:1465)
> inet6_sendmsg (net/ipv6/af_inet6.c:653 (discriminator 2))
>
> Looking forward to your patch.
Thanks for the extra info and testing. Sent:
https://lore.kernel.org/netdev/20260320190148.2409107-1-willemdebruijn.kernel@gmail.com/T/#u
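
For readers following the thread, the false positive and the general shape
of an encapsulation-aware check can be sketched in plain C. This is a
minimal userspace model under stated assumptions, not kernel code and not
the patch linked above: struct pkt_model and both function names are
hypothetical, each field stands in for the skb accessor named in its
comment, and the UDP_L4 branch, the BIG TCP exception, the
ENCAP_TYPE_IPPROTO case and the NETIF_F_IPV6_CSUM precondition are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the sk_buff state the check reads. */
struct pkt_model {
	bool encapsulation;        /* models skb->encapsulation */
	bool gso_tcpv6;            /* models gso_type & SKB_GSO_TCPV6 */
	bool transport_header_set; /* models skb_transport_header_was_set() */
	size_t outer_nh_len;       /* models skb_network_header_len() */
	size_t inner_nh_len;       /* models skb_inner_network_header_len() */
};

/* sizeof(struct ipv6hdr): a fixed IPv6 header is 40 bytes. */
enum { IPV6_HDR_LEN = 40 };

/*
 * Models the check as originally merged: only the outer network header
 * length is compared, even when SKB_GSO_TCPV6 describes the inner packet.
 */
bool check_outer_only(const struct pkt_model *p)
{
	return p->gso_tcpv6 &&
	       p->transport_header_set &&
	       p->outer_nh_len != IPV6_HDR_LEN;
}

/*
 * Models the direction discussed in the thread: for encapsulated packets
 * compare the inner network header length, otherwise the outer one.
 */
bool check_encap_aware(const struct pkt_model *p)
{
	if (!p->gso_tcpv6)
		return false;

	if (!p->encapsulation)
		return p->transport_header_set &&
		       p->outer_nh_len != IPV6_HDR_LEN;

	return p->inner_nh_len != IPV6_HDR_LEN;
}
```

For a VXLAN IPv6-in-IPv4 packet without extension headers (outer header of
20 bytes, inner header of 40 bytes), check_outer_only() returns true and
TSO would be masked, while check_encap_aware() returns false. For native
IPv6 the two functions agree, which matches the four scenarios in
Tangxin's test table.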
Thread overview: 13+ messages
2025-08-14 10:51 [PATCH net v3] net: gso: Forbid IPv6 TSO with extensions on devices with only IPV6_CSUM Jakub Ramaseuski
2025-08-14 13:11 ` Willem de Bruijn
2025-08-19 0:30 ` patchwork-bot+netdevbpf
2026-03-05 7:42 ` xietangxin
2026-03-05 14:57 ` Willem de Bruijn
2026-03-05 15:21 ` Paolo Abeni
2026-03-06 6:32 ` xietangxin
2026-03-06 8:29 ` Paolo Abeni
2026-03-14 16:19 ` Willem de Bruijn
2026-03-16 8:38 ` Paolo Abeni
2026-03-16 16:55 ` Willem de Bruijn
2026-03-20 9:38 ` xietangxin
2026-03-20 19:03 ` Willem de Bruijn [this message]