* [PATCH net-next] tcp: replace per-packet memset in __tcp_transmit_skb() @ 2026-03-03 7:48 Keita Morisaki 2026-03-03 8:49 ` Eric Dumazet 0 siblings, 1 reply; 3+ messages in thread From: Keita Morisaki @ 2026-03-03 7:48 UTC (permalink / raw) To: Eric Dumazet, Neal Cardwell, David S . Miller, David Ahern, Jakub Kicinski, Paolo Abeni Cc: Kuniyuki Iwashima, Simon Horman, netdev, linux-kernel, bpf, Keita Morisaki Replace memset(&opts, 0, sizeof(opts)) with targeted initialization of only the three fields read unconditionally by tcp_options_write() and bpf_skops_write_hdr_opt(): mss, num_sack_blocks, and bpf_opt_len. struct tcp_out_options is 40 bytes without MPTCP and 96 bytes with CONFIG_MPTCP=y (typical distro config). Every remaining field is either assigned before first use by tcp_established_options()/tcp_syn_options(), or gated behind its OPTION_* flag in tcp_options_write(). This memset runs on every transmitted TCP packet, so removing it reduces per-packet overhead on the hot path. Assembly comparison (x86-64, GCC 13, CONFIG_MPTCP=n): Before: 5 store instructions zeroing 40 bytes After: 3 store instructions zeroing 4 bytes With CONFIG_MPTCP=y the savings are larger: 12 stores (96 bytes) become 3 stores (4 bytes). A BUILD_BUG_ON guards against future field additions: if the struct size changes, the assertion fires and forces the developer to audit whether the new field needs explicit zeroing. Also add opts->options = 0 at the top of tcp_syn_options(), which already used |= without a prior clear. tcp_established_options() already clears opts->options at its top. Signed-off-by: Keita Morisaki <kmta1236@gmail.com> --- net/ipv4/tcp_output.c | 16 ++++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 326b58ff1118d..ae04c697dbacc 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -965,6 +965,8 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb, struct tcp_fastopen_request *fastopen = tp->fastopen_req; bool timestamps; + opts->options = 0; + /* Better than switch (key.type) as it has static branches */ if (tcp_key_is_md5(key)) { timestamps = false; @@ -1549,7 +1551,20 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, inet = inet_sk(sk); tcb = TCP_SKB_CB(skb); - memset(&opts, 0, sizeof(opts)); + /* Only zero fields read unconditionally by tcp_options_write() + * or bpf_skops_write_hdr_opt(). Other fields are set by + * tcp_established_options()/tcp_syn_options() before use, or + * gated behind their OPTION_* flag. + * + * If you add a field to tcp_out_options, this will fire — + * audit whether the new field needs zeroing here. + */ + BUILD_BUG_ON(sizeof(opts) != + sizeof(struct mptcp_out_options) + 40); + opts.mss = 0; + opts.num_sack_blocks = 0; + opts.bpf_opt_len = 0; tcp_get_current_key(sk, &key); if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) { base-commit: 7dff99b354601e04378b4490e1b1b5765a3c3a88 -- 2.43.0 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net-next] tcp: replace per-packet memset in __tcp_transmit_skb() 2026-03-03 7:48 [PATCH net-next] tcp: replace per-packet memset in __tcp_transmit_skb() Keita Morisaki @ 2026-03-03 8:49 ` Eric Dumazet 2026-03-03 9:54 ` Keita Morisaki 0 siblings, 1 reply; 3+ messages in thread From: Eric Dumazet @ 2026-03-03 8:49 UTC (permalink / raw) To: Keita Morisaki Cc: Neal Cardwell, David S . Miller, David Ahern, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima, Simon Horman, netdev, linux-kernel, bpf On Tue, Mar 3, 2026 at 8:49 AM Keita Morisaki <kmta1236@gmail.com> wrote: > > Replace memset(&opts, 0, sizeof(opts)) with targeted initialization of > only the three fields read unconditionally by tcp_options_write() and > bpf_skops_write_hdr_opt(): mss, num_sack_blocks, and bpf_opt_len. > > struct tcp_out_options is 40 bytes without MPTCP and 96 bytes with > CONFIG_MPTCP=y (typical distro config). Every remaining field is either > assigned before first use by tcp_established_options()/tcp_syn_options(), > or gated behind its OPTION_* flag in tcp_options_write(). This memset > runs on every transmitted TCP packet, so removing it reduces per-packet > overhead on the hot path. > > Assembly comparison (x86-64, GCC 13, CONFIG_MPTCP=n): > > Before: 5 store instructions zeroing 40 bytes > After: 3 store instructions zeroing 4 bytes > > With CONFIG_MPTCP=y the savings are larger: 12 stores (96 bytes) become > 3 stores (4 bytes). > > A BUILD_BUG_ON guards against future field additions: if the struct size > changes, the assertion fires and forces the developer to audit whether > the new field needs explicit zeroing. > > Also add opts->options = 0 at the top of tcp_syn_options(), which > already used |= without a prior clear. tcp_established_options() already > clears opts->options at its top. > > Signed-off-by: Keita Morisaki <kmta1236@gmail.com> > --- > net/ipv4/tcp_output.c | 16 ++++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 326b58ff1118d..ae04c697dbacc 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -965,6 +965,8 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb, > struct tcp_fastopen_request *fastopen = tp->fastopen_req; > bool timestamps; > > + opts->options = 0; > + > /* Better than switch (key.type) as it has static branches */ > if (tcp_key_is_md5(key)) { > timestamps = false; > @@ -1549,7 +1551,20 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, > > inet = inet_sk(sk); > tcb = TCP_SKB_CB(skb); > - memset(&opts, 0, sizeof(opts)); > + /* Only zero fields read unconditionally by tcp_options_write() > + * or bpf_skops_write_hdr_opt(). Other fields are set by > + * tcp_established_options()/tcp_syn_options() before use, or > + * gated behind their OPTION_* flag. > + * > + * If you add a field to tcp_out_options, this will fire — > + * audit whether the new field needs zeroing here. > + */ > + BUILD_BUG_ON(sizeof(opts) != > + sizeof(struct mptcp_out_options) + 40); > + opts.mss = 0; > + opts.num_sack_blocks = 0; > + opts.bpf_opt_len = 0; > > tcp_get_current_key(sk, &key); > if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) { > Interesting idea, but the implementation is a bit fragile. I would instead add a group in struct tcp_out_options to clearly mark which group is zeroed in __tcp_transmit_skb(). Partial patch to give the idea : diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 46bd48cf776a6a37e6ab2664245cd1e35a88d4f8..b96168f50170b7359d26de2389128bc80d4d549e 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -429,14 +429,19 @@ static void smc_options_write(__be32 *ptr, u16 *options) } struct tcp_out_options { + /* Following group is cleared in __tcp_transmit_skb() */ + struct_group(cleared, + u16 mss; /* 0 to disable */ + u8 bpf_opt_len; /* length of BPF hdr option */ + u8 num_sack_blocks; /* number of SACK blocks to include */ + ); + + /* Caution : Following fields are not cleared in __tcp_transmit_skb(). */ u16 options; /* bit field of OPTION_* */ - u16 mss; /* 0 to disable */ u8 ws; /* window scale, 0 to disable */ - u8 num_sack_blocks; /* number of SACK blocks to include */ u8 num_accecn_fields:7, /* number of AccECN fields needed */ use_synack_ecn_bytes:1; /* Use synack_ecn_bytes or not */ u8 hash_size; /* bytes in hash_location */ - u8 bpf_opt_len; /* length of BPF hdr option */ __u8 *hash_location; /* temporary pointer, overloaded */ __u32 tsval, tsecr; /* need to include OPTION_TS */ struct tcp_fastopen_cookie *fastopen_cookie; /* Fast open cookie */ @@ -1565,7 +1570,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, inet = inet_sk(sk); tcb = TCP_SKB_CB(skb); - memset(&opts, 0, sizeof(opts)); + memset(&opts.cleared, 0, sizeof(opts.cleared)); tcp_get_current_key(sk, &key); if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) { ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net-next] tcp: replace per-packet memset in __tcp_transmit_skb() 2026-03-03 8:49 ` Eric Dumazet @ 2026-03-03 9:54 ` Keita Morisaki 0 siblings, 0 replies; 3+ messages in thread From: Keita Morisaki @ 2026-03-03 9:54 UTC (permalink / raw) To: edumazet Cc: bpf, davem, dsahern, horms, kmta1236, kuba, kuniyu, linux-kernel, ncardwell, netdev, pabeni > Interesting idea, but the implementation is a bit fragile. > > I would instead add a group in struct tcp_out_options to clearly mark > which group is zeroed in __tcp_transmit_skb(). That's a great idea. > > Partial patch to give the idea : > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 46bd48cf776a6a37e6ab2664245cd1e35a88d4f8..b96168f50170b7359d26de2389128bc80d4d549e > 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -429,14 +429,19 @@ static void smc_options_write(__be32 *ptr, u16 *options) > } > > struct tcp_out_options { > + /* Following group is cleared in __tcp_transmit_skb() */ > + struct_group(cleared, > + u16 mss; /* 0 to disable */ > + u8 bpf_opt_len; /* length of BPF hdr option */ > + u8 num_sack_blocks; /* number of SACK blocks to include */ > + ); > + > + /* Caution : Following fields are not cleared in > __tcp_transmit_skb(). */ > u16 options; /* bit field of OPTION_* */ > - u16 mss; /* 0 to disable */ > u8 ws; /* window scale, 0 to disable */ > - u8 num_sack_blocks; /* number of SACK blocks to include */ > u8 num_accecn_fields:7, /* number of AccECN fields needed */ > use_synack_ecn_bytes:1; /* Use synack_ecn_bytes or not */ > u8 hash_size; /* bytes in hash_location */ > - u8 bpf_opt_len; /* length of BPF hdr option */ > __u8 *hash_location; /* temporary pointer, overloaded */ > __u32 tsval, tsecr; /* need to include OPTION_TS */ > struct tcp_fastopen_cookie *fastopen_cookie; /* Fast open cookie */ > @@ -1565,7 +1570,7 @@ static int __tcp_transmit_skb(struct sock *sk, > struct sk_buff *skb, > > inet = inet_sk(sk); > tcb = TCP_SKB_CB(skb); > - memset(&opts, 0, sizeof(opts)); > + memset(&opts.cleared, 0, sizeof(opts.cleared)); Will send V2 with this change. Thank you for the review! ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-03-03 9:54 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-03 7:48 [PATCH net-next] tcp: replace per-packet memset in __tcp_transmit_skb() Keita Morisaki 2026-03-03 8:49 ` Eric Dumazet 2026-03-03 9:54 ` Keita Morisaki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox