* [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb()
@ 2026-03-03 10:04 Keita Morisaki
2026-03-03 10:19 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Keita Morisaki @ 2026-03-03 10:04 UTC (permalink / raw)
To: Eric Dumazet, Neal Cardwell, David S . Miller, David Ahern,
Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Simon Horman, netdev, linux-kernel, bpf,
Keita Morisaki

Use struct_group() to group the three fields in tcp_out_options that are
read unconditionally by tcp_options_write() and bpf_skops_write_hdr_opt()
(mss, bpf_opt_len, num_sack_blocks), then replace the full-struct memset
with a targeted memset of only that group.

struct tcp_out_options is 40 bytes without MPTCP and 96 bytes with
CONFIG_MPTCP=y (typical distro config). Every remaining field is either
assigned before first use by tcp_established_options()/tcp_syn_options(),
or gated behind its OPTION_* flag in tcp_options_write(). This memset
runs on every transmitted TCP packet, so shrinking it from 96 (or 40)
bytes to 4 bytes reduces per-packet overhead on the hot path.

Assembly comparison (x86-64, GCC 13, CONFIG_MPTCP=y):

  Before: rep stos zeroing 96 bytes (5 instructions, 12 8-byte stores)
  After:  movl $0x0 zeroing 4 bytes (1 instruction, 1 store)

Also add opts->options = 0 at the top of tcp_syn_options(), which
already used |= without a prior clear. tcp_established_options() already
clears opts->options at its top.

Signed-off-by: Keita Morisaki <kmta1236@gmail.com>
---
 net/ipv4/tcp_output.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 326b58ff1118d..63ee037f46e50 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -429,14 +429,19 @@ static void smc_options_write(__be32 *ptr, u16 *options)
 }

 struct tcp_out_options {
+	/* Following group is cleared in __tcp_transmit_skb() */
+	struct_group(cleared,
+		u16 mss;		/* 0 to disable */
+		u8 bpf_opt_len;		/* length of BPF hdr option */
+		u8 num_sack_blocks;	/* number of SACK blocks to include */
+	);
+
+	/* Caution: following fields are not cleared in __tcp_transmit_skb() */
 	u16 options;		/* bit field of OPTION_* */
-	u16 mss;		/* 0 to disable */
 	u8 ws;			/* window scale, 0 to disable */
-	u8 num_sack_blocks;	/* number of SACK blocks to include */
 	u8 num_accecn_fields:7,	/* number of AccECN fields needed */
 	   use_synack_ecn_bytes:1; /* Use synack_ecn_bytes or not */
 	u8 hash_size;		/* bytes in hash_location */
-	u8 bpf_opt_len;		/* length of BPF hdr option */
 	__u8 *hash_location;	/* temporary pointer, overloaded */
 	__u32 tsval, tsecr;	/* need to include OPTION_TS */
 	struct tcp_fastopen_cookie *fastopen_cookie;	/* Fast open cookie */
@@ -965,6 +970,8 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 	struct tcp_fastopen_request *fastopen = tp->fastopen_req;
 	bool timestamps;

+	opts->options = 0;
+
 	/* Better than switch (key.type) as it has static branches */
 	if (tcp_key_is_md5(key)) {
 		timestamps = false;
@@ -1549,7 +1556,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,

 	inet = inet_sk(sk);
 	tcb = TCP_SKB_CB(skb);
-	memset(&opts, 0, sizeof(opts));
+	memset(&opts.cleared, 0, sizeof(opts.cleared));

 	tcp_get_current_key(sk, &key);
 	if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) {

base-commit: af4e9ef3d78420feb8fe58cd9a1ab80c501b3c08
--
2.34.1

* Re: [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb()
2026-03-03 10:04 [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb() Keita Morisaki
@ 2026-03-03 10:19 ` Eric Dumazet
2026-03-04 1:50 ` Keita Morisaki
2026-03-03 12:28 ` Jakub Sitnicki
2026-03-04 10:51 ` Eric Dumazet
2 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2026-03-03 10:19 UTC (permalink / raw)
To: Keita Morisaki
Cc: Neal Cardwell, David S . Miller, David Ahern, Jakub Kicinski,
Paolo Abeni, Kuniyuki Iwashima, Simon Horman, netdev,
linux-kernel, bpf
On Tue, Mar 3, 2026 at 11:04 AM Keita Morisaki <kmta1236@gmail.com> wrote:
>
> Use struct_group() to group the three fields in tcp_out_options that are
> read unconditionally by tcp_options_write() and bpf_skops_write_hdr_opt()
> (mss, bpf_opt_len, num_sack_blocks), then replace the full-struct memset
> with a targeted memset of only that group.
>
> struct tcp_out_options is 40 bytes without MPTCP and 96 bytes with
> CONFIG_MPTCP=y (typical distro config). Every remaining field is either
> assigned before first use by tcp_established_options()/tcp_syn_options(),
> or gated behind its OPTION_* flag in tcp_options_write(). This memset
> runs on every transmitted TCP packet, so shrinking it from 96 (or 40)
> bytes to 4 bytes reduces per-packet overhead on the hot path.
>
> Assembly comparison (x86-64, GCC 13, CONFIG_MPTCP=y):
>
> Before: rep stos zeroing 96 bytes (5 instructions, 12 8-byte stores)
> After: movl $0x0 zeroing 4 bytes (1 instruction, 1 store)
>
> Also add opts->options = 0 at the top of tcp_syn_options(), which
> already used |= without a prior clear. tcp_established_options() already
> clears opts->options at its top.

Please read Documentation/process/maintainer-netdev.rst (about the ~24
hours delay)

Also, please CC MPTCP maintainers, since this patch could impact MPTCP.

* Re: [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb()
2026-03-03 10:19 ` Eric Dumazet
@ 2026-03-04 1:50 ` Keita Morisaki
0 siblings, 0 replies; 6+ messages in thread
From: Keita Morisaki @ 2026-03-04 1:50 UTC (permalink / raw)
To: edumazet
Cc: bpf, davem, dsahern, horms, kmta1236, kuba, kuniyu, linux-kernel,
ncardwell, netdev, pabeni
> Please read Documentation/process/maintainer-netdev.rst (about the ~24
> hours delay)
>
> Also, please CC MPTCP maintainers, since this patch could impact MPTCP.

Oops, thank you for the pointer. I will wait at least 24 hours before
posting an updated version of the patch, if one is needed.

I will also CC the MPTCP maintainers.

* Re: [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb()
2026-03-03 10:04 [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb() Keita Morisaki
2026-03-03 10:19 ` Eric Dumazet
@ 2026-03-03 12:28 ` Jakub Sitnicki
2026-03-04 1:52 ` Keita Morisaki
2026-03-04 10:51 ` Eric Dumazet
2 siblings, 1 reply; 6+ messages in thread
From: Jakub Sitnicki @ 2026-03-03 12:28 UTC (permalink / raw)
To: Keita Morisaki
Cc: Eric Dumazet, Neal Cardwell, David S . Miller, David Ahern,
Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima, Simon Horman,
netdev, linux-kernel, bpf
On Tue, Mar 03, 2026 at 07:04 PM +09, Keita Morisaki wrote:
> Use struct_group() to group the three fields in tcp_out_options that are
> read unconditionally by tcp_options_write() and bpf_skops_write_hdr_opt()
> (mss, bpf_opt_len, num_sack_blocks), then replace the full-struct memset
> with a targeted memset of only that group.
>
> struct tcp_out_options is 40 bytes without MPTCP and 96 bytes with
> CONFIG_MPTCP=y (typical distro config). Every remaining field is either
> assigned before first use by tcp_established_options()/tcp_syn_options(),
> or gated behind its OPTION_* flag in tcp_options_write(). This memset
> runs on every transmitted TCP packet, so shrinking it from 96 (or 40)
> bytes to 4 bytes reduces per-packet overhead on the hot path.
>
> Assembly comparison (x86-64, GCC 13, CONFIG_MPTCP=y):
>
> Before: rep stos zeroing 96 bytes (5 instructions, 12 8-byte stores)
> After: movl $0x0 zeroing 4 bytes (1 instruction, 1 store)
>
> Also add opts->options = 0 at the top of tcp_syn_options(), which
> already used |= without a prior clear. tcp_established_options() already
> clears opts->options at its top.
>
> Signed-off-by: Keita Morisaki <kmta1236@gmail.com>
> ---

Loosely related, noticed while reviewing - tcp_out_options.hash_size
looks unused:

  * MD5 - always 16
  * AO - uses tcp_ao_maclen()

Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>

* Re: [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb()
2026-03-03 12:28 ` Jakub Sitnicki
@ 2026-03-04 1:52 ` Keita Morisaki
0 siblings, 0 replies; 6+ messages in thread
From: Keita Morisaki @ 2026-03-04 1:52 UTC (permalink / raw)
To: jakub
Cc: bpf, davem, dsahern, edumazet, horms, kmta1236, kuba, kuniyu,
linux-kernel, ncardwell, netdev, pabeni
> Loosely related, noticed while reviewing - tcp_out_options.hash_size
> looks unused:
>
> * MD5 - always 16
> * AO - uses tcp_ao_maclen()
Great catch. I will follow up.
> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Thank you for the review, Jakub.
* Re: [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb()
2026-03-03 10:04 [PATCH net-next v2] tcp: shrink per-packet memset in __tcp_transmit_skb() Keita Morisaki
2026-03-03 10:19 ` Eric Dumazet
2026-03-03 12:28 ` Jakub Sitnicki
@ 2026-03-04 10:51 ` Eric Dumazet
2 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2026-03-04 10:51 UTC (permalink / raw)
To: Keita Morisaki
Cc: Neal Cardwell, David S . Miller, David Ahern, Jakub Kicinski,
Paolo Abeni, Kuniyuki Iwashima, Simon Horman, netdev,
linux-kernel, bpf
On Tue, Mar 3, 2026 at 11:04 AM Keita Morisaki <kmta1236@gmail.com> wrote:
>
> Use struct_group() to group the three fields in tcp_out_options that are
> read unconditionally by tcp_options_write() and bpf_skops_write_hdr_opt()
> (mss, bpf_opt_len, num_sack_blocks), then replace the full-struct memset
> with a targeted memset of only that group.
>
> struct tcp_out_options is 40 bytes without MPTCP and 96 bytes with
> CONFIG_MPTCP=y (typical distro config). Every remaining field is either
> assigned before first use by tcp_established_options()/tcp_syn_options(),
> or gated behind its OPTION_* flag in tcp_options_write(). This memset
> runs on every transmitted TCP packet, so shrinking it from 96 (or 40)
> bytes to 4 bytes reduces per-packet overhead on the hot path.
>
> Assembly comparison (x86-64, GCC 13, CONFIG_MPTCP=y):
>
> Before: rep stos zeroing 96 bytes (5 instructions, 12 8-byte stores)
> After: movl $0x0 zeroing 4 bytes (1 instruction, 1 store)
>
> Also add opts->options = 0 at the top of tcp_syn_options(), which
> already used |= without a prior clear. tcp_established_options() already
> clears opts->options at its top.
>
> Signed-off-by: Keita Morisaki <kmta1236@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>