* [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
@ 2026-04-06 22:49 Werner Kasselman
2026-04-07 1:25 ` Martin KaFai Lau
2026-04-09 6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
0 siblings, 2 replies; 5+ messages in thread
From: Werner Kasselman @ 2026-04-06 22:49 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, John Fastabend, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
netdev@vger.kernel.org, stable@vger.kernel.org
sock_ops_convert_ctx_access() generates BPF instructions to inline
context field accesses for BPF_PROG_TYPE_SOCK_OPS programs. For
tcp_sock-specific fields like snd_cwnd, srtt_us, etc., it uses the
SOCK_OPS_GET_TCP_SOCK_FIELD() macro which checks is_locked_tcp_sock
and returns 0 when the socket is not a locked full TCP socket.
However, the rtt_min field bypasses this guard entirely: it emits a raw
two-instruction load sequence (load sk pointer, then load from
tcp_sock->rtt_min offset) without checking is_locked_tcp_sock first.
This is a problem because bpf_skops_hdr_opt_len() and
bpf_skops_write_hdr_opt() in tcp_output.c set sock_ops.sk to a
tcp_request_sock (cast from request_sock) during SYN-ACK processing,
with is_fullsock=0 and is_locked_tcp_sock=0. If a SOCK_OPS program
with BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG reads ctx->rtt_min in this
callback, the generated code treats the tcp_request_sock pointer as a
tcp_sock and reads at offsetof(struct tcp_sock, rtt_min) -- which is
well past the end of the tcp_request_sock allocation, causing an
out-of-bounds slab read.
The rtt_min field was introduced in the same commit as the other
tcp_sock fields but was given hand-rolled access code because it reads
a sub-field (rtt_min.s[0].v, the value member of a struct
minmax_sample) rather than a direct struct member, making it
incompatible with the SOCK_OPS_GET_FIELD() macro. This hand-rolled
code omitted the is_fullsock guard that the macro provides. The guard
was later renamed to is_locked_tcp_sock in commit fd93eaffb3f9 ("bpf:
Prevent unsafe access to the sock fields in the BPF timestamping
callback").
Add the is_locked_tcp_sock guard to the rtt_min case, replicating the
exact instruction pattern used by SOCK_OPS_GET_FIELD() including
proper handling of the dst_reg==src_reg case with temp register
save/restore. Use offsetof(struct minmax_sample, v) for the sub-field
offset to match the style in bpf_tcp_sock_convert_ctx_access().
Found via AST-based call-graph analysis using sqry.
Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
Cc: stable@vger.kernel.org
Signed-off-by: Werner Kasselman <werner@verivus.com>
---
net/core/filter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 78b548158fb0..58f0735b18d9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10830,13 +10830,54 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
BUILD_BUG_ON(sizeof(struct minmax) <
sizeof(struct minmax_sample));
+ /* Unlike other tcp_sock fields that use
+ * SOCK_OPS_GET_TCP_SOCK_FIELD(), rtt_min requires a
+ * custom access pattern because it reads a sub-field
+ * (rtt_min.s[0].v) rather than a direct struct member.
+ * We must still guard the access with is_locked_tcp_sock
+ * to prevent an OOB read when sk points to a
+ * tcp_request_sock (e.g., during SYN-ACK processing via
+ * bpf_skops_hdr_opt_len/bpf_skops_write_hdr_opt).
+ */
+ off = offsetof(struct tcp_sock, rtt_min) +
+ offsetof(struct minmax_sample, v);
+ {
+ int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
+
+ if (si->dst_reg == reg || si->src_reg == reg)
+ reg--;
+ if (si->dst_reg == reg || si->src_reg == reg)
+ reg--;
+ if (si->dst_reg == si->src_reg) {
+ *insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
+ fullsock_reg = reg;
+ jmp += 2;
+ }
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct bpf_sock_ops_kern,
+ is_locked_tcp_sock),
+ fullsock_reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ is_locked_tcp_sock));
+ *insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);
+ if (si->dst_reg == si->src_reg)
+ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
struct bpf_sock_ops_kern, sk),
si->dst_reg, si->src_reg,
offsetof(struct bpf_sock_ops_kern, sk));
- *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
- offsetof(struct tcp_sock, rtt_min) +
- sizeof_field(struct minmax_sample, t));
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
+ if (si->dst_reg == si->src_reg) {
+ *insn++ = BPF_JMP_A(1);
+ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
+ }
+ }
break;
case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
--
2.43.0
^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
  2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
@ 2026-04-07  1:25 ` Martin KaFai Lau
  2026-04-07  1:56   ` Werner Kasselman
  2026-04-09  6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
  1 sibling, 1 reply; 5+ messages in thread
From: Martin KaFai Lau @ 2026-04-07 1:25 UTC (permalink / raw)
To: Werner Kasselman
Cc: Jiayuan Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
	netdev@vger.kernel.org, stable@vger.kernel.org

On Mon, Apr 06, 2026 at 10:49:56PM +0000, Werner Kasselman wrote:
[...]
> This is a problem because bpf_skops_hdr_opt_len() and
> bpf_skops_write_hdr_opt() in tcp_output.c set sock_ops.sk to a
> tcp_request_sock (cast from request_sock) during SYN-ACK processing,
> with is_fullsock=0 and is_locked_tcp_sock=0. If a SOCK_OPS program
> with BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG reads ctx->rtt_min in this
> callback, the generated code treats the tcp_request_sock pointer as a
> tcp_sock and reads at offsetof(struct tcp_sock, rtt_min) -- which is
> well past the end of the tcp_request_sock allocation, causing an
> out-of-bounds slab read.
This is not limited to hdr related CB flags. It also happens to earlier
CB flags that have a request_sock, such as BPF_SOCK_OPS_RWND_INIT.

[...]
[...]
> +			if (si->dst_reg == si->src_reg) {
> +				*insn++ = BPF_JMP_A(1);
> +				*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
> +						      offsetof(struct bpf_sock_ops_kern,
> +							       temp));

There is an existing bug in this copy-and-pasted code [1], which is now
repeated here, so please find a way to refactor it into something
reusable instead of duplicating it.

This also needs a test. It should be a subtest of [1], so [1] needs to
land first.

[1]: https://lore.kernel.org/bpf/20260406031330.187630-1-jiayuan.chen@linux.dev/

^ permalink raw reply	[flat|nested] 5+ messages in thread
* RE: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
  2026-04-07  1:25 ` Martin KaFai Lau
@ 2026-04-07  1:56   ` Werner Kasselman
  0 siblings, 0 replies; 5+ messages in thread
From: Werner Kasselman @ 2026-04-07 1:56 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Jiayuan Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
	netdev@vger.kernel.org, stable@vger.kernel.org

Thanks for the review.

You are right that this is not limited to the SYN-ACK header-option
paths. The same unguarded ctx->rtt_min access is reachable from
request_sock-backed sock_ops invocations through tcp_call_bpf(),
including BPF_SOCK_OPS_RWND_INIT, BPF_SOCK_OPS_TIMEOUT_INIT, and
BPF_SOCK_OPS_NEEDS_ECN. I'll update the changelog to describe the
affected paths more accurately.

I also agree that this should not duplicate the existing guarded
ctx-access sequence. I'll rework the change to reuse a common
helper/macro instead of open-coding another copy, and I'll add a
selftest as a subtest of [1] after [1] lands.

Thanks,
Werner

-----Original Message-----
From: Martin KaFai Lau <martin.lau@linux.dev>
Sent: Tuesday, 7 April 2026 11:26 AM
To: Werner Kasselman <werner@verivus.ai>
Subject: Re: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access

[...]

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock
  2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
  2026-04-07  1:25 ` Martin KaFai Lau
@ 2026-04-09  6:10 ` Werner Kasselman
  2026-04-09 14:29   ` Alexei Starovoitov
  1 sibling, 1 reply; 5+ messages in thread
From: Werner Kasselman @ 2026-04-09 6:10 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, bpf@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, stable@vger.kernel.org,
	Werner Kasselman

sock_ops_convert_ctx_access() emits guarded reads for tcp_sock-backed
bpf_sock_ops fields such as snd_cwnd, srtt_us, snd_ssthresh, rcv_nxt,
snd_nxt, snd_una, mss_cache, ecn_flags, rate_delivered, and
rate_interval_us. Those accesses go through
SOCK_OPS_GET_TCP_SOCK_FIELD(), which checks is_locked_tcp_sock before
dereferencing sock_ops.sk.

The rtt_min case is different. Because it reads a subfield of struct
minmax, it uses a custom open-coded load sequence instead of the usual
helper macro, and that sequence currently dereferences sock_ops.sk
without checking is_locked_tcp_sock first.

This is unsafe when sock_ops.sk points to a request_sock-backed object
instead of a locked full tcp_sock. That is reachable not only from the
SYNACK header option callbacks, but also from other request_sock-backed
sock_ops callbacks such as BPF_SOCK_OPS_TIMEOUT_INIT,
BPF_SOCK_OPS_RWND_INIT, and BPF_SOCK_OPS_NEEDS_ECN. In those cases,
reading ctx->rtt_min makes the generated code treat a request_sock as
a tcp_sock and read beyond the end of the request_sock allocation.

Fix the rtt_min conversion by adding the same is_locked_tcp_sock guard
used for the other tcp_sock field reads. Also make the accessed
subfield explicit by using offsetof(struct minmax_sample, v).

Add a selftest that verifies request_sock-backed sock_ops callbacks
see ctx->rtt_min as zero after the fix.

Found via AST-based call-graph analysis using sqry.

Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
Cc: stable@vger.kernel.org
Signed-off-by: Werner Kasselman <werner@verivus.com>
---
 net/core/filter.c                                  | 53 +++++++++++++++----
 .../selftests/bpf/prog_tests/tcpbpf_user.c         |  9 ++++
 .../selftests/bpf/progs/test_tcpbpf_kern.c         | 21 ++++++++
 tools/testing/selftests/bpf/test_tcpbpf.h          |  6 +++
 4 files changed, 79 insertions(+), 10 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 78b548158..5040bf7e4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10827,16 +10827,49 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct bpf_sock_ops, rtt_min):
 		BUILD_BUG_ON(sizeof_field(struct tcp_sock, rtt_min) !=
 			     sizeof(struct minmax));
-		BUILD_BUG_ON(sizeof(struct minmax) <
-			     sizeof(struct minmax_sample));
-
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
-						struct bpf_sock_ops_kern, sk),
-				      si->dst_reg, si->src_reg,
-				      offsetof(struct bpf_sock_ops_kern, sk));
-		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
-				      offsetof(struct tcp_sock, rtt_min) +
-				      sizeof_field(struct minmax_sample, t));
+		BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, rtt_min) !=
+			     sizeof_field(struct minmax_sample, v));
+		off = offsetof(struct tcp_sock, rtt_min) +
+		      offsetof(struct minmax_sample, v);
+
+		{
+			int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
+
+			if (si->dst_reg == reg || si->src_reg == reg)
+				reg--;
+			if (si->dst_reg == reg || si->src_reg == reg)
+				reg--;
+			if (si->dst_reg == si->src_reg) {
+				*insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,
+						      offsetof(struct bpf_sock_ops_kern,
+							       temp));
+				fullsock_reg = reg;
+				jmp += 2;
+			}
+			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern,
+						is_locked_tcp_sock),
+					      fullsock_reg, si->src_reg,
+					      offsetof(struct bpf_sock_ops_kern,
+						       is_locked_tcp_sock));
+			*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);
+			if (si->dst_reg == si->src_reg)
+				*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+						      offsetof(struct bpf_sock_ops_kern,
+							       temp));
+			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+					      si->dst_reg, si->src_reg,
+					      offsetof(struct bpf_sock_ops_kern, sk));
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+					      off);
+			if (si->dst_reg == si->src_reg) {
+				*insn++ = BPF_JMP_A(1);
+				*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+						      offsetof(struct bpf_sock_ops_kern,
+							       temp));
+			}
+		}
 		break;
 
 	case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
diff --git a/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c b/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
index 7e8fe1bad..d243d6713 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
@@ -42,6 +42,15 @@ static void verify_result(struct tcpbpf_globals *result)
 	/* check getsockopt for window_clamp */
 	ASSERT_EQ(result->window_clamp_client, 9216, "window_clamp_client");
 	ASSERT_EQ(result->window_clamp_server, 9216, "window_clamp_server");
+
+	ASSERT_EQ(result->timeout_init_req_seen, 1, "timeout_init_req_seen");
+	ASSERT_EQ(result->timeout_init_req_rtt_min, 0, "timeout_init_req_rtt_min");
+
+	ASSERT_EQ(result->rwnd_init_req_seen, 1, "rwnd_init_req_seen");
+	ASSERT_EQ(result->rwnd_init_req_rtt_min, 0, "rwnd_init_req_rtt_min");
+
+	ASSERT_EQ(result->needs_ecn_req_seen, 1, "needs_ecn_req_seen");
+	ASSERT_EQ(result->needs_ecn_req_rtt_min, 0, "needs_ecn_req_rtt_min");
 }
 
 static void run_test(struct tcpbpf_globals *result)
diff --git a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
index 6935f32ee..79757a19b 100644
--- a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
@@ -85,6 +85,27 @@ int bpf_testcb(struct bpf_sock_ops *skops)
 	global.event_map |= (1 << op);
 
 	switch (op) {
+	case BPF_SOCK_OPS_TIMEOUT_INIT:
+		if (!skops->is_fullsock) {
+			global.timeout_init_req_seen = 1;
+			global.timeout_init_req_rtt_min = skops->rtt_min;
+		}
+		rv = -1;
+		break;
+	case BPF_SOCK_OPS_RWND_INIT:
+		if (!skops->is_fullsock) {
+			global.rwnd_init_req_seen = 1;
+			global.rwnd_init_req_rtt_min = skops->rtt_min;
+		}
+		rv = 0;
+		break;
+	case BPF_SOCK_OPS_NEEDS_ECN:
+		if (!skops->is_fullsock) {
+			global.needs_ecn_req_seen = 1;
+			global.needs_ecn_req_rtt_min = skops->rtt_min;
+		}
+		rv = 0;
+		break;
 	case BPF_SOCK_OPS_TCP_CONNECT_CB:
 		rv = bpf_setsockopt(skops, SOL_TCP, TCP_WINDOW_CLAMP,
 				    &window_clamp, sizeof(window_clamp));
diff --git a/tools/testing/selftests/bpf/test_tcpbpf.h b/tools/testing/selftests/bpf/test_tcpbpf.h
index 9dd9b5590..46500c1d6 100644
--- a/tools/testing/selftests/bpf/test_tcpbpf.h
+++ b/tools/testing/selftests/bpf/test_tcpbpf.h
@@ -18,5 +18,11 @@ struct tcpbpf_globals {
 	__u32 tcp_saved_syn;
 	__u32 window_clamp_client;
 	__u32 window_clamp_server;
+	__u32 timeout_init_req_seen;
+	__u32 timeout_init_req_rtt_min;
+	__u32 rwnd_init_req_seen;
+	__u32 rwnd_init_req_rtt_min;
+	__u32 needs_ecn_req_seen;
+	__u32 needs_ecn_req_rtt_min;
 };
 #endif
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock
  2026-04-09  6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
@ 2026-04-09 14:29   ` Alexei Starovoitov
  0 siblings, 0 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2026-04-09 14:29 UTC (permalink / raw)
To: Werner Kasselman
Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Shuah Khan, bpf@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, stable@vger.kernel.org

On Wed, Apr 8, 2026 at 11:10 PM Werner Kasselman <werner@verivus.ai> wrote:
>
[...]
> +		{
> +			int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
> +

please de-claude your patches before posting.

pw-bot: cr

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-04-09 14:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
2026-04-07  1:25 ` Martin KaFai Lau
2026-04-07  1:56   ` Werner Kasselman
2026-04-09  6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
2026-04-09 14:29   ` Alexei Starovoitov