* [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
@ 2026-04-06 22:49 Werner Kasselman
2026-04-07 1:25 ` Martin KaFai Lau
0 siblings, 1 reply; 3+ messages in thread
From: Werner Kasselman @ 2026-04-06 22:49 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, John Fastabend, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
netdev@vger.kernel.org, stable@vger.kernel.org
sock_ops_convert_ctx_access() generates BPF instructions to inline
context field accesses for BPF_PROG_TYPE_SOCK_OPS programs. For
tcp_sock-specific fields like snd_cwnd, srtt_us, etc., it uses the
SOCK_OPS_GET_TCP_SOCK_FIELD() macro which checks is_locked_tcp_sock
and returns 0 when the socket is not a locked full TCP socket.
However, the rtt_min field bypasses this guard entirely: it emits a raw
two-instruction load sequence (load sk pointer, then load from
tcp_sock->rtt_min offset) without checking is_locked_tcp_sock first.
This is a problem because bpf_skops_hdr_opt_len() and
bpf_skops_write_hdr_opt() in tcp_output.c set sock_ops.sk to a
tcp_request_sock (cast from request_sock) during SYN-ACK processing,
with is_fullsock=0 and is_locked_tcp_sock=0. If a SOCK_OPS program
with BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG reads ctx->rtt_min in this
callback, the generated code treats the tcp_request_sock pointer as a
tcp_sock and reads at offsetof(struct tcp_sock, rtt_min) -- which is
well past the end of the tcp_request_sock allocation, causing an
out-of-bounds slab read.
The rtt_min field was introduced in the same commit as the other
tcp_sock fields but was given hand-rolled access code because it reads
a sub-field (rtt_min.s[0].v, the value member of a struct minmax_sample)
struct member, making it incompatible with the SOCK_OPS_GET_FIELD()
macro. This hand-rolled code omitted the is_fullsock guard that the
macro provides. The guard was later renamed to is_locked_tcp_sock in
commit fd93eaffb3f9 ("bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback").
Add the is_locked_tcp_sock guard to the rtt_min case, replicating the
exact instruction pattern used by SOCK_OPS_GET_FIELD() including
proper handling of the dst_reg==src_reg case with temp register
save/restore. Use offsetof(struct minmax_sample, v) for the sub-field
offset to match the style in bpf_tcp_sock_convert_ctx_access().
Found via AST-based call-graph analysis using sqry.
Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
Cc: stable@vger.kernel.org
Signed-off-by: Werner Kasselman <werner@verivus.com>
---
net/core/filter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 78b548158fb0..58f0735b18d9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10830,13 +10830,54 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
BUILD_BUG_ON(sizeof(struct minmax) <
sizeof(struct minmax_sample));
+ /* Unlike other tcp_sock fields that use
+ * SOCK_OPS_GET_TCP_SOCK_FIELD(), rtt_min requires a
+ * custom access pattern because it reads a sub-field
+ * (rtt_min.s[0].v) rather than a direct struct member.
+ * We must still guard the access with is_locked_tcp_sock
+ * to prevent an OOB read when sk points to a
+ * tcp_request_sock (e.g., during SYN-ACK processing via
+ * bpf_skops_hdr_opt_len/bpf_skops_write_hdr_opt).
+ */
+ off = offsetof(struct tcp_sock, rtt_min) +
+ offsetof(struct minmax_sample, v);
+ {
+ int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
+
+ if (si->dst_reg == reg || si->src_reg == reg)
+ reg--;
+ if (si->dst_reg == reg || si->src_reg == reg)
+ reg--;
+ if (si->dst_reg == si->src_reg) {
+ *insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
+ fullsock_reg = reg;
+ jmp += 2;
+ }
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct bpf_sock_ops_kern,
+ is_locked_tcp_sock),
+ fullsock_reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ is_locked_tcp_sock));
+ *insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);
+ if (si->dst_reg == si->src_reg)
+ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
struct bpf_sock_ops_kern, sk),
si->dst_reg, si->src_reg,
offsetof(struct bpf_sock_ops_kern, sk));
- *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
- offsetof(struct tcp_sock, rtt_min) +
- sizeof_field(struct minmax_sample, t));
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
+ if (si->dst_reg == si->src_reg) {
+ *insn++ = BPF_JMP_A(1);
+ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
+ }
+ }
break;
case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
--
2.43.0
* Re: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
@ 2026-04-07 1:25 ` Martin KaFai Lau
2026-04-07 1:56 ` Werner Kasselman
0 siblings, 1 reply; 3+ messages in thread
From: Martin KaFai Lau @ 2026-04-07 1:25 UTC (permalink / raw)
To: Werner Kasselman
Cc: Jiayuan Chen, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, John Fastabend, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
netdev@vger.kernel.org, stable@vger.kernel.org
On Mon, Apr 06, 2026 at 10:49:56PM +0000, Werner Kasselman wrote:
> sock_ops_convert_ctx_access() generates BPF instructions to inline
> context field accesses for BPF_PROG_TYPE_SOCK_OPS programs. For
> tcp_sock-specific fields like snd_cwnd, srtt_us, etc., it uses the
> SOCK_OPS_GET_TCP_SOCK_FIELD() macro which checks is_locked_tcp_sock
> and returns 0 when the socket is not a locked full TCP socket.
>
> However, the rtt_min field bypasses this guard entirely: it emits a raw
> two-instruction load sequence (load sk pointer, then load from
> tcp_sock->rtt_min offset) without checking is_locked_tcp_sock first.
>
> This is a problem because bpf_skops_hdr_opt_len() and
> bpf_skops_write_hdr_opt() in tcp_output.c set sock_ops.sk to a
> tcp_request_sock (cast from request_sock) during SYN-ACK processing,
> with is_fullsock=0 and is_locked_tcp_sock=0. If a SOCK_OPS program
> with BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG reads ctx->rtt_min in this
> callback, the generated code treats the tcp_request_sock pointer as a
> tcp_sock and reads at offsetof(struct tcp_sock, rtt_min) -- which is
> well past the end of the tcp_request_sock allocation, causing an
> out-of-bounds slab read.
This is not limited to the hdr-related CB flags.
It also happens to earlier CB flags that have a request_sock, such as
BPF_SOCK_OPS_RWND_INIT.
>
> The rtt_min field was introduced in the same commit as the other
> tcp_sock fields but was given hand-rolled access code because it reads
> a sub-field (rtt_min.s[0].v, the value member of a struct minmax_sample)
> struct member, making it incompatible with the SOCK_OPS_GET_FIELD()
> macro. This hand-rolled code omitted the is_fullsock guard that the
> macro provides. The guard was later renamed to is_locked_tcp_sock in
> commit fd93eaffb3f9 ("bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback").
>
> Add the is_locked_tcp_sock guard to the rtt_min case, replicating the
> exact instruction pattern used by SOCK_OPS_GET_FIELD() including
> proper handling of the dst_reg==src_reg case with temp register
> save/restore. Use offsetof(struct minmax_sample, v) for the sub-field
> offset to match the style in bpf_tcp_sock_convert_ctx_access().
>
> Found via AST-based call-graph analysis using sqry.
>
> Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
> Cc: stable@vger.kernel.org
> Signed-off-by: Werner Kasselman <werner@verivus.com>
> ---
> net/core/filter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 78b548158fb0..58f0735b18d9 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -10830,13 +10830,54 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
> BUILD_BUG_ON(sizeof(struct minmax) <
> sizeof(struct minmax_sample));
>
> + /* Unlike other tcp_sock fields that use
> + * SOCK_OPS_GET_TCP_SOCK_FIELD(), rtt_min requires a
> + * custom access pattern because it reads a sub-field
> + * (rtt_min.s[0].v) rather than a direct struct member.
> + * We must still guard the access with is_locked_tcp_sock
> + * to prevent an OOB read when sk points to a
> + * tcp_request_sock (e.g., during SYN-ACK processing via
> + * bpf_skops_hdr_opt_len/bpf_skops_write_hdr_opt).
> + */
> + off = offsetof(struct tcp_sock, rtt_min) +
> + offsetof(struct minmax_sample, v);
> + {
> + int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
> +
> + if (si->dst_reg == reg || si->src_reg == reg)
> + reg--;
> + if (si->dst_reg == reg || si->src_reg == reg)
> + reg--;
> + if (si->dst_reg == si->src_reg) {
> + *insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,
> + offsetof(struct bpf_sock_ops_kern,
> + temp));
> + fullsock_reg = reg;
> + jmp += 2;
> + }
> + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> + struct bpf_sock_ops_kern,
> + is_locked_tcp_sock),
> + fullsock_reg, si->src_reg,
> + offsetof(struct bpf_sock_ops_kern,
> + is_locked_tcp_sock));
> + *insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);
> + if (si->dst_reg == si->src_reg)
> + *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
> + offsetof(struct bpf_sock_ops_kern,
> + temp));
> *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> struct bpf_sock_ops_kern, sk),
> si->dst_reg, si->src_reg,
> offsetof(struct bpf_sock_ops_kern, sk));
> - *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> - offsetof(struct tcp_sock, rtt_min) +
> - sizeof_field(struct minmax_sample, t));
> + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
> + if (si->dst_reg == si->src_reg) {
> + *insn++ = BPF_JMP_A(1);
> + *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
> + offsetof(struct bpf_sock_ops_kern,
> + temp));
There is an existing bug in this copy-and-pasted code [1], and it is now
repeated here, so please find a way to refactor it into something
reusable instead of duplicating it.
This also needs a test. It should be a subtest of [1],
so [1] needs to land first.
[1]: https://lore.kernel.org/bpf/20260406031330.187630-1-jiayuan.chen@linux.dev/
* RE: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
2026-04-07 1:25 ` Martin KaFai Lau
@ 2026-04-07 1:56 ` Werner Kasselman
0 siblings, 0 replies; 3+ messages in thread
From: Werner Kasselman @ 2026-04-07 1:56 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Jiayuan Chen, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, John Fastabend, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
netdev@vger.kernel.org, stable@vger.kernel.org
Thanks for the review.
You are right that this is not limited to the SYN-ACK header-option paths.
The same unguarded ctx->rtt_min access is reachable from request_sock-backed
sock_ops invocations through tcp_call_bpf(), including BPF_SOCK_OPS_RWND_INIT,
BPF_SOCK_OPS_TIMEOUT_INIT, and BPF_SOCK_OPS_NEEDS_ECN. I'll update the
changelog to describe the affected paths more accurately.
I also agree that this should not duplicate the existing guarded ctx-access
sequence. I'll rework the change to reuse a common helper/macro instead of
open-coding another copy, and I'll add a selftest as a subtest of [1] after
[1] lands.
Thanks,
Werner