From: Cosmin Ratiu <cratiu@nvidia.com>
To: <netdev@vger.kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>,
Herbert Xu <herbert@gondor.apana.org.au>,
"David S . Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Shuah Khan <shuah@kernel.org>, Cosmin Ratiu <cratiu@nvidia.com>,
Nimrod Oren <noren@nvidia.com>,
Carolina Jubran <cjubran@nvidia.com>,
Gal Pressman <gal@nvidia.com>, <linux-kselftest@vger.kernel.org>
Subject: [PATCH ipsec 3/3] xfrm: Don't clobber inner headers when already set
Date: Wed, 22 Apr 2026 17:06:48 +0300 [thread overview]
Message-ID: <20260422140648.3877129-4-cratiu@nvidia.com> (raw)
In-Reply-To: <20260422140648.3877129-1-cratiu@nvidia.com>
On VXLAN over IPsec egress, xfrm{4,6}_transport_output() blindly
overwrite inner_transport_header (== the inner TCP header saved in VXLAN
iptunnel_handle_offloads() -> skb_reset_inner_headers()) with the
current transport_header (== the VXLAN outer UDP header set by
udp_tunnel_xmit_skb()).
This was a latent bug, harmless until commit [1] added a doff validation
check in qdisc_pkt_len_segs_init() for encapsulated GSO packets. With
the wrong inner_transport_header set by xfrm, qdisc_pkt_len_segs_init()
interprets inner_transport_header as a TCP header, reads doff=0 from the
upper byte of the VNI and drops the packet with DROP_REASON_SKB_BAD_GSO.
Besides the use in GSO to determine the header size of segmented
packets, inner_transport_header might be used by drivers to set up
inner checksum offloading by pointing the HW to the inner transport
header. A quick browse through available drivers shows that mlx5 uses
skb->csum_start specifically for this scenario, while others either
don't support VXLAN over IPsec crypto offload (ixgbe) or the HW is
capable of parsing the packets itself (nfp, Chelsio).
But in all cases, it is more correct to let the inner_transport_header
point to the innermost header instead of overwriting it in xfrm.
So fix this by guarding all four inner header save sites in
xfrm_output.c (xfrm{4,6}_transport_output, xfrm{4,6}_tunnel_encap_add)
with a check for skb->inner_protocol. When inner_protocol is set, a
tunnel layer (VXLAN, Geneve, GRE, etc.) has already saved the correct
inner header offsets and they must not be overwritten. When
inner_protocol is zero, no prior tunnel encapsulation exists and xfrm
must save the inner headers itself. The tunnel mode checks are only
added for completion, since they aren't strictly required, as
xfrm_output() forces software GSO in tunnel mode before encap.
This makes the previously added test pass:
# ./tools/testing/selftests/drivers/net/hw/ipsec_vxlan.py
TAP version 13
1..4
ok 1 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v4_inner_v4
ok 2 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v4_inner_v6
ok 3 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v6_inner_v4
ok 4 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v6_inner_v6
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
[1] commit 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
Fixes: f1bd7d659ef0 ("xfrm: Add encapsulation header offsets while SKB is not encrypted")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
---
net/xfrm/xfrm_output.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index a9652b422f51..cc35c2fcbbe0 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -66,7 +66,9 @@ static int xfrm4_transport_output(struct xfrm_state *x, struct sk_buff *skb)
struct iphdr *iph = ip_hdr(skb);
int ihl = iph->ihl * 4;
- skb_set_inner_transport_header(skb, skb_transport_offset(skb));
+ if (!skb->inner_protocol)
+ skb_set_inner_transport_header(skb,
+ skb_transport_offset(skb));
skb_set_network_header(skb, -x->props.header_len);
skb->mac_header = skb->network_header +
@@ -167,7 +169,9 @@ static int xfrm6_transport_output(struct xfrm_state *x, struct sk_buff *skb)
int hdr_len;
iph = ipv6_hdr(skb);
- skb_set_inner_transport_header(skb, skb_transport_offset(skb));
+ if (!skb->inner_protocol)
+ skb_set_inner_transport_header(skb,
+ skb_transport_offset(skb));
hdr_len = xfrm6_hdr_offset(x, skb, &prevhdr);
if (hdr_len < 0)
@@ -276,8 +280,10 @@ static int xfrm4_tunnel_encap_add(struct xfrm_state *x, struct sk_buff *skb)
struct iphdr *top_iph;
int flags;
- skb_set_inner_network_header(skb, skb_network_offset(skb));
- skb_set_inner_transport_header(skb, skb_transport_offset(skb));
+ if (!skb->inner_protocol) {
+ skb_set_inner_network_header(skb, skb_network_offset(skb));
+ skb_set_inner_transport_header(skb, skb_transport_offset(skb));
+ }
skb_set_network_header(skb, -x->props.header_len);
skb->mac_header = skb->network_header +
@@ -321,8 +327,10 @@ static int xfrm6_tunnel_encap_add(struct xfrm_state *x, struct sk_buff *skb)
struct ipv6hdr *top_iph;
int dsfield;
- skb_set_inner_network_header(skb, skb_network_offset(skb));
- skb_set_inner_transport_header(skb, skb_transport_offset(skb));
+ if (!skb->inner_protocol) {
+ skb_set_inner_network_header(skb, skb_network_offset(skb));
+ skb_set_inner_transport_header(skb, skb_transport_offset(skb));
+ }
skb_set_network_header(skb, -x->props.header_len);
skb->mac_header = skb->network_header +
--
2.53.0
prev parent reply other threads:[~2026-04-22 14:07 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-22 14:06 [PATCH ipsec 0/3] xfrm: Don't clobber inner headers when already set Cosmin Ratiu
2026-04-22 14:06 ` [PATCH ipsec 1/3] tools/selftests: Use a sensible timeout value for iperf3 client Cosmin Ratiu
2026-04-22 14:06 ` [PATCH ipsec 2/3] tools/selftests: Add a VXLAN+IPsec traffic test Cosmin Ratiu
2026-04-22 14:06 ` Cosmin Ratiu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260422140648.3877129-4-cratiu@nvidia.com \
--to=cratiu@nvidia.com \
--cc=andrew+netdev@lunn.ch \
--cc=cjubran@nvidia.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=gal@nvidia.com \
--cc=herbert@gondor.apana.org.au \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=noren@nvidia.com \
--cc=pabeni@redhat.com \
--cc=shuah@kernel.org \
--cc=steffen.klassert@secunet.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox