From: Alice Mikityanska <alice.kernel@fastmail.im>
To: Daniel Borkmann <daniel@iogearbox.net>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Xin Long <lucien.xin@gmail.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Willem de Bruijn <willemb@google.com>,
David Ahern <dsahern@kernel.org>,
Nikolay Aleksandrov <razor@blackwall.org>
Cc: Shuah Khan <shuah@kernel.org>,
Stanislav Fomichev <stfomichev@gmail.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Simon Horman <horms@kernel.org>, Florian Westphal <fw@strlen.de>,
netdev@vger.kernel.org, Alice Mikityanska <alice@isovalent.com>
Subject: [PATCH net-next v4 10/12] vxlan: Enable BIG TCP packets
Date: Tue, 12 May 2026 18:56:46 +0200
Message-ID: <20260512165648.386518-11-alice.kernel@fastmail.im>
In-Reply-To: <20260512165648.386518-1-alice.kernel@fastmail.im>
From: Alice Mikityanska <alice@isovalent.com>
In Cilium we support BIG TCP, but so far it has only been enabled for
direct routing use cases. Many users, however, run Cilium with
vxlan/geneve tunneling, and until now the underlying kernel tunneling
infrastructure has not supported BIG TCP.
Now that the earlier patches in this series add that support, bump
tso_max_size for vxlan netdevs up to GSO_MAX_SIZE so that the admin can
enable BIG TCP on vxlan tunnels.
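To illustrate why the one-line change is needed, here is a small userspace C model of the size limits involved. It is a simplified sketch under stated assumptions, not kernel code: struct model_netdev and the model_* functions are hypothetical, the constant values reflect our reading of include/linux/netdevice.h, and the -EINVAL check stands in for the rtnetlink validation of gso_max_size requests.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model, not kernel code. Constant values follow our reading
 * of include/linux/netdevice.h (assumption). */
#define GSO_MAX_SEGS        65535u
#define GSO_MAX_SIZE        (8u * GSO_MAX_SEGS) /* 524280 bytes */
#define GSO_LEGACY_MAX_SIZE 65536u
#define TSO_LEGACY_MAX_SIZE 65536u

/* Hypothetical stand-in for the two struct net_device fields involved. */
struct model_netdev {
	unsigned int tso_max_size; /* ceiling for admin-set gso_max_size */
	unsigned int gso_max_size; /* size of GSO packets the stack builds */
};

/* Defaults a newly allocated netdev starts with in this model. */
static void model_alloc_netdev(struct model_netdev *dev)
{
	dev->tso_max_size = TSO_LEGACY_MAX_SIZE;
	dev->gso_max_size = GSO_LEGACY_MAX_SIZE;
}

/* Simplified model of netif_set_tso_max_size(): clamp the ceiling to
 * GSO_MAX_SIZE and shrink gso_max_size if it no longer fits. */
static void model_set_tso_max_size(struct model_netdev *dev, unsigned int size)
{
	dev->tso_max_size = size < GSO_MAX_SIZE ? size : GSO_MAX_SIZE;
	if (size < dev->gso_max_size)
		dev->gso_max_size = size;
}

/* Model of "ip link set dev DEV gso_max_size N": requests above
 * tso_max_size are rejected. */
static int model_admin_set_gso_max_size(struct model_netdev *dev, unsigned int n)
{
	if (n > dev->tso_max_size)
		return -EINVAL;
	dev->gso_max_size = n;
	return 0;
}
```

In this model a freshly allocated device rejects a 128 KiB gso_max_size request, and accepts it once the setup path has raised tso_max_size to GSO_MAX_SIZE, which is the role the netif_set_tso_max_size() call plays in vxlan_setup().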
BIG TCP on vxlan disabled:
Standard MTU:
# netperf -H 10.1.0.2 -t TCP_STREAM -l60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.0.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072 16384   16384    30.00    34440.00
8k MTU:
# netperf -H 10.1.0.2 -t TCP_STREAM -l60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.0.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
262144 32768   32768    30.00    55684.26
BIG TCP on vxlan enabled:
Standard MTU:
# netperf -H 10.1.0.2 -t TCP_STREAM -l60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.0.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072 16384   16384    30.00    39564.78
8k MTU:
# netperf -H 10.1.0.2 -t TCP_STREAM -l60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.0.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
262144 32768   32768    30.00    61466.47
When tunnel offloads are not enabled or exposed and we have to rely
entirely on software segmentation on transmit (e.g. on Azure), the more
aggressive batching also has a visible effect. The example below was run
on the same setup as the benchmarks above, but with the NIC's UDP tunnel
offloads disabled:
# ethtool -k enp10s0f0np0 | grep udp
tx-udp_tnl-segmentation: off
tx-udp_tnl-csum-segmentation: off
tx-udp-segmentation: off
rx-udp_tunnel-port-offload: off
rx-udp-gro-forwarding: off
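The fallback to software segmentation hinges on a feature check: a GSO packet is handed to the NIC unsegmented only if the device advertises every GSO type the packet carries. The sketch below models that idea, loosely after the kernel's net_gso_ok(); the MODEL_GSO_* bits are hypothetical and do not match the kernel's actual feature bit layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GSO type bits for illustration only; they do not match
 * the kernel's SKB_GSO_* / NETIF_F_GSO_* values. */
#define MODEL_GSO_TCPV4           (UINT32_C(1) << 0)
#define MODEL_GSO_UDP_TUNNEL      (UINT32_C(1) << 1) /* tx-udp_tnl-segmentation */
#define MODEL_GSO_UDP_TUNNEL_CSUM (UINT32_C(1) << 2) /* tx-udp_tnl-csum-segmentation */

/* In the spirit of net_gso_ok(): the NIC may segment the packet only if
 * it advertises every GSO type bit the packet carries. Otherwise the
 * stack has to segment the packet in software before calling the driver. */
static bool model_hw_can_segment(uint32_t dev_features, uint32_t skb_gso_type)
{
	return (dev_features & skb_gso_type) == skb_gso_type;
}
```

With tx-udp_tnl-segmentation off, the tunnel GSO bit is missing from the device features, so the check fails for vxlan-encapsulated TCP and every large packet is split in software before the driver sees it.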
Before:
# netperf -H 10.1.0.2 -t TCP_STREAM -l60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.0.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072 16384   16384    60.00    21820.82
After:
# netperf -H 10.1.0.2 -t TCP_STREAM -l60
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.1.0.2 () port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
131072 16384   16384    60.00    29390.78
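A rough way to see why larger GSO packets help when segmentation happens in software: the cost of the TCP, vxlan and GSO entry path is paid roughly once per GSO packet, so doubling the GSO packet size halves how often it is paid. The helpers below are hypothetical and ignore header overhead; they illustrate the batching argument and do not model the measured throughput.

```c
#include <assert.h>

/* Hypothetical helpers; header overhead is ignored throughout. */

/* MSS-sized segments carried by one GSO packet of gso_bytes payload. */
static unsigned int segs_per_gso(unsigned int gso_bytes, unsigned int mss)
{
	return (gso_bytes + mss - 1) / mss; /* ceiling division */
}

/* Trips through the TCP -> vxlan_xmit -> GSO path needed to send
 * total_bytes when each trip carries gso_bytes. */
static unsigned long long stack_traversals(unsigned long long total_bytes,
					   unsigned int gso_bytes)
{
	return (total_bytes + gso_bytes - 1) / gso_bytes;
}
```

For example, sending 1 GiB as 64 KiB GSO packets takes 16384 trips through that path, and 8192 with 128 KiB packets. The segments on the wire are the same, but each trip is amortized over about twice as many of them.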
Example receive side (perf trace; len=129542 is a GRO packet larger than 64 KiB):
swapper 0 [002] 4712.645070: net:netif_receive_skb: dev=enp10s0f0np0 skbaddr=0xffff8f3b086e0200 len=129542
ffffffff8cfe3aaa __netif_receive_skb_core.constprop.0+0x6ca ([kernel.kallsyms])
ffffffff8cfe3aaa __netif_receive_skb_core.constprop.0+0x6ca ([kernel.kallsyms])
ffffffff8cfe47dd __netif_receive_skb_list_core+0xed ([kernel.kallsyms])
ffffffff8cfe4e52 netif_receive_skb_list_internal+0x1d2 ([kernel.kallsyms])
ffffffff8d0210d8 gro_complete.constprop.0+0x108 ([kernel.kallsyms])
ffffffff8d021724 dev_gro_receive+0x4e4 ([kernel.kallsyms])
ffffffff8d021a99 gro_receive_skb+0x89 ([kernel.kallsyms])
ffffffffc06edb71 mlx5e_handle_rx_cqe_mpwrq+0x131 ([kernel.kallsyms])
ffffffffc06ee38a mlx5e_poll_rx_cq+0x9a ([kernel.kallsyms])
ffffffffc06ef2c7 mlx5e_napi_poll+0x107 ([kernel.kallsyms])
ffffffff8cfe586d __napi_poll+0x2d ([kernel.kallsyms])
ffffffff8cfe5f8d net_rx_action+0x20d ([kernel.kallsyms])
ffffffff8c35d252 handle_softirqs+0xe2 ([kernel.kallsyms])
ffffffff8c35d556 __irq_exit_rcu+0xd6 ([kernel.kallsyms])
ffffffff8c35d81e irq_exit_rcu+0xe ([kernel.kallsyms])
ffffffff8d2602b8 common_interrupt+0x98 ([kernel.kallsyms])
ffffffff8c000da7 asm_common_interrupt+0x27 ([kernel.kallsyms])
ffffffff8d2645c5 cpuidle_enter_state+0xd5 ([kernel.kallsyms])
ffffffff8cf6358e cpuidle_enter+0x2e ([kernel.kallsyms])
ffffffff8c3ba932 call_cpuidle+0x22 ([kernel.kallsyms])
ffffffff8c3bfb5e do_idle+0x1ce ([kernel.kallsyms])
ffffffff8c3bfd79 cpu_startup_entry+0x29 ([kernel.kallsyms])
ffffffff8c30a6c2 start_secondary+0x112 ([kernel.kallsyms])
ffffffff8c2c142d common_startup_64+0x13e ([kernel.kallsyms])
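On the receive side, GRO builds such large packets by merging consecutive segments of a flow until a size limit is reached; the series (see patch 06/12, "udp: Support gro_ipv4_max_size > 65536") lets that limit exceed 64 KiB on the UDP path. A simplified model of the merge loop follows. model_gro_merge() is hypothetical, counts payload bytes only, and glosses over the real GRO flush conditions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified GRO model: merge consecutive same-flow segments into one
 * packet while the merged length stays below gro_max_size. Returns the
 * number of segments merged into the first aggregate and stores its
 * length in *agg_len. */
static size_t model_gro_merge(const unsigned int *seg_len, size_t n,
			      unsigned int gro_max_size, unsigned int *agg_len)
{
	size_t i;

	if (n == 0) {
		*agg_len = 0;
		return 0;
	}
	*agg_len = seg_len[0];
	for (i = 1; i < n; i++) {
		if (*agg_len + seg_len[i] >= gro_max_size)
			break; /* limit reached: flush the aggregate */
		*agg_len += seg_len[i];
	}
	return i;
}
```

With an assumed 1448-byte payload per segment, a 65536-byte limit stops after 45 segments (65160 bytes), while a 131072-byte limit merges 90 (130320 bytes), comparable in size to the packet in the trace above.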
Example transmit side (perf trace; len=129556 is a GSO packet larger than 64 KiB leaving vxlan for the physical device):
swapper 0 [005] 4768.021375: net:net_dev_xmit: dev=enp10s0f0np0 skbaddr=0xffff8af32ebe1200 len=129556 rc=0
ffffffffa75e19c3 dev_hard_start_xmit+0x173 ([kernel.kallsyms])
ffffffffa75e19c3 dev_hard_start_xmit+0x173 ([kernel.kallsyms])
ffffffffa7653823 sch_direct_xmit+0x143 ([kernel.kallsyms])
ffffffffa75e2780 __dev_queue_xmit+0xc70 ([kernel.kallsyms])
ffffffffa76a1205 ip_finish_output2+0x265 ([kernel.kallsyms])
ffffffffa76a1577 __ip_finish_output+0x87 ([kernel.kallsyms])
ffffffffa76a165b ip_finish_output+0x2b ([kernel.kallsyms])
ffffffffa76a179e ip_output+0x5e ([kernel.kallsyms])
ffffffffa76a19d5 ip_local_out+0x35 ([kernel.kallsyms])
ffffffffa770d0e5 iptunnel_xmit+0x185 ([kernel.kallsyms])
ffffffffc179634e nf_nat_used_tuple_new.cold+0x1129 ([kernel.kallsyms])
ffffffffc17a7301 vxlan_xmit_one+0xc21 ([kernel.kallsyms])
ffffffffc17a80a2 vxlan_xmit+0x4a2 ([kernel.kallsyms])
ffffffffa75e18af dev_hard_start_xmit+0x5f ([kernel.kallsyms])
ffffffffa75e1d3f __dev_queue_xmit+0x22f ([kernel.kallsyms])
ffffffffa76a1205 ip_finish_output2+0x265 ([kernel.kallsyms])
ffffffffa76a1577 __ip_finish_output+0x87 ([kernel.kallsyms])
ffffffffa76a165b ip_finish_output+0x2b ([kernel.kallsyms])
ffffffffa76a179e ip_output+0x5e ([kernel.kallsyms])
ffffffffa76a1de2 __ip_queue_xmit+0x1b2 ([kernel.kallsyms])
ffffffffa76a2135 ip_queue_xmit+0x15 ([kernel.kallsyms])
ffffffffa76c70a2 __tcp_transmit_skb+0x522 ([kernel.kallsyms])
ffffffffa76c931a tcp_write_xmit+0x65a ([kernel.kallsyms])
ffffffffa76cb42e tcp_tsq_write+0x5e ([kernel.kallsyms])
ffffffffa76cb7ef tcp_tasklet_func+0x10f ([kernel.kallsyms])
ffffffffa695d9f7 tasklet_action_common+0x107 ([kernel.kallsyms])
ffffffffa695db99 tasklet_action+0x29 ([kernel.kallsyms])
ffffffffa695d252 handle_softirqs+0xe2 ([kernel.kallsyms])
ffffffffa695d556 __irq_exit_rcu+0xd6 ([kernel.kallsyms])
ffffffffa695d81e irq_exit_rcu+0xe ([kernel.kallsyms])
ffffffffa78602b8 common_interrupt+0x98 ([kernel.kallsyms])
ffffffffa6600da7 asm_common_interrupt+0x27 ([kernel.kallsyms])
ffffffffa78645c5 cpuidle_enter_state+0xd5 ([kernel.kallsyms])
ffffffffa756358e cpuidle_enter+0x2e ([kernel.kallsyms])
ffffffffa69ba932 call_cpuidle+0x22 ([kernel.kallsyms])
ffffffffa69bfb5e do_idle+0x1ce ([kernel.kallsyms])
ffffffffa69bfd79 cpu_startup_entry+0x29 ([kernel.kallsyms])
ffffffffa690a6c2 start_secondary+0x112 ([kernel.kallsyms])
ffffffffa68c142d common_startup_64+0x13e ([kernel.kallsyms])
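A packet like the 129556-byte one above no longer fits the 16-bit length field of the outer UDP header. Patch 09/12 in this series ("udp: Set length in UDP header to 0 for big GSO packets") handles this by writing 0 there. The sketch below models that convention with hypothetical helpers, assuming the true length is taken from the packet itself whenever the field is 0; the names and the exact threshold are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modeling the convention named in patch 09/12. */

/* Value to write into the 16-bit UDP length field: the real length when
 * it fits, 0 for a big GSO packet longer than UINT16_MAX bytes. */
static uint16_t model_udp_len_field(uint32_t udp_len)
{
	return udp_len > UINT16_MAX ? 0 : (uint16_t)udp_len;
}

/* Reader side: a 0 field means "take the length from the packet itself",
 * passed in here as pkt_udp_len. */
static uint32_t model_udp_len(uint16_t field, uint32_t pkt_udp_len)
{
	return field != 0 ? field : pkt_udp_len;
}
```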
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
drivers/net/vxlan/vxlan_core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 00facbfabced..d3249c319779 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -3371,6 +3371,8 @@ static void vxlan_setup(struct net_device *dev)
 	dev->mangleid_features = NETIF_F_GSO_PARTIAL;
 
 	netif_keep_dst(dev);
+	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
+
 	dev->priv_flags |= IFF_NO_QUEUE;
 	dev->change_proto_down = true;
 	dev->lltx = true;
--
2.54.0
Thread overview: 16+ messages
2026-05-12 16:56 [PATCH net-next v4 00/12] BIG TCP for UDP tunnels Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 01/12] net/sched: act_csum: don't mangle UDP tunnel GSO packets Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 02/12] udp: gso: Simplify handling length in GSO_PARTIAL Alice Mikityanska
2026-05-13 7:53 ` Gal Pressman
2026-05-13 9:23 ` Alice Mikityanska
2026-05-13 9:40 ` Gal Pressman
2026-05-12 16:56 ` [PATCH net-next v4 03/12] geneve: Fix off-by-one comparing with GRO_LEGACY_MAX_SIZE Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 04/12] net: Use helpers to get/set UDP len tree-wide Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 05/12] net: Enable BIG TCP with partial GSO Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 06/12] udp: Support gro_ipv4_max_size > 65536 Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 07/12] udp: Support BIG TCP GSO packets where they can occur Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 08/12] udp: Validate UDP length in udp_gro_receive Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 09/12] udp: Set length in UDP header to 0 for big GSO packets Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 10/12] vxlan: Enable BIG TCP packets Alice Mikityanska (this message)
2026-05-12 16:56 ` [PATCH net-next v4 11/12] geneve: Enable BIG TCP packets Alice Mikityanska
2026-05-12 16:56 ` [PATCH net-next v4 12/12] selftests: net: Add a test for BIG TCP in UDP tunnels Alice Mikityanska