* [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
@ 2026-04-06 22:49 Werner Kasselman
2026-04-07 1:25 ` Martin KaFai Lau
2026-04-09 6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
0 siblings, 2 replies; 5+ messages in thread
From: Werner Kasselman @ 2026-04-06 22:49 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
Cc: Martin KaFai Lau, John Fastabend, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
netdev@vger.kernel.org, stable@vger.kernel.org
sock_ops_convert_ctx_access() generates BPF instructions to inline
context field accesses for BPF_PROG_TYPE_SOCK_OPS programs. For
tcp_sock-specific fields like snd_cwnd, srtt_us, etc., it uses the
SOCK_OPS_GET_TCP_SOCK_FIELD() macro which checks is_locked_tcp_sock
and returns 0 when the socket is not a locked full TCP socket.
However, the rtt_min field bypasses this guard entirely: it emits a raw
two-instruction load sequence (load sk pointer, then load from
tcp_sock->rtt_min offset) without checking is_locked_tcp_sock first.
This is a problem because bpf_skops_hdr_opt_len() and
bpf_skops_write_hdr_opt() in tcp_output.c set sock_ops.sk to a
tcp_request_sock (cast from request_sock) during SYN-ACK processing,
with is_fullsock=0 and is_locked_tcp_sock=0. If a SOCK_OPS program
with BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG reads ctx->rtt_min in this
callback, the generated code treats the tcp_request_sock pointer as a
tcp_sock and reads at offsetof(struct tcp_sock, rtt_min) -- which is
well past the end of the tcp_request_sock allocation, causing an
out-of-bounds slab read.
The rtt_min field was introduced in the same commit as the other
tcp_sock fields but was given hand-rolled access code because it reads
a sub-field (rtt_min.s[0].v, the value member of a struct
minmax_sample) rather than a direct struct member, making it
incompatible with the SOCK_OPS_GET_FIELD() macro. This hand-rolled
code omitted the is_fullsock guard that the macro provides. The guard
was later renamed to is_locked_tcp_sock in commit fd93eaffb3f9 ("bpf:
Prevent unsafe access to the sock fields in the BPF timestamping
callback").
Add the is_locked_tcp_sock guard to the rtt_min case, replicating the
exact instruction pattern used by SOCK_OPS_GET_FIELD() including
proper handling of the dst_reg==src_reg case with temp register
save/restore. Use offsetof(struct minmax_sample, v) for the sub-field
offset to match the style in bpf_tcp_sock_convert_ctx_access().
Found via AST-based call-graph analysis using sqry.
Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
Cc: stable@vger.kernel.org
Signed-off-by: Werner Kasselman <werner@verivus.com>
---
net/core/filter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 78b548158fb0..58f0735b18d9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10830,13 +10830,54 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
BUILD_BUG_ON(sizeof(struct minmax) <
sizeof(struct minmax_sample));
+ /* Unlike other tcp_sock fields that use
+ * SOCK_OPS_GET_TCP_SOCK_FIELD(), rtt_min requires a
+ * custom access pattern because it reads a sub-field
+ * (rtt_min.s[0].v) rather than a direct struct member.
+ * We must still guard the access with is_locked_tcp_sock
+ * to prevent an OOB read when sk points to a
+ * tcp_request_sock (e.g., during SYN-ACK processing via
+ * bpf_skops_hdr_opt_len/bpf_skops_write_hdr_opt).
+ */
+ off = offsetof(struct tcp_sock, rtt_min) +
+ offsetof(struct minmax_sample, v);
+ {
+ int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
+
+ if (si->dst_reg == reg || si->src_reg == reg)
+ reg--;
+ if (si->dst_reg == reg || si->src_reg == reg)
+ reg--;
+ if (si->dst_reg == si->src_reg) {
+ *insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
+ fullsock_reg = reg;
+ jmp += 2;
+ }
+ *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+ struct bpf_sock_ops_kern,
+ is_locked_tcp_sock),
+ fullsock_reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ is_locked_tcp_sock));
+ *insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);
+ if (si->dst_reg == si->src_reg)
+ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
struct bpf_sock_ops_kern, sk),
si->dst_reg, si->src_reg,
offsetof(struct bpf_sock_ops_kern, sk));
- *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
- offsetof(struct tcp_sock, rtt_min) +
- sizeof_field(struct minmax_sample, t));
+ *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
+ if (si->dst_reg == si->src_reg) {
+ *insn++ = BPF_JMP_A(1);
+ *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+ offsetof(struct bpf_sock_ops_kern,
+ temp));
+ }
+ }
break;
case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
--
2.43.0
^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
  2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
@ 2026-04-07  1:25 ` Martin KaFai Lau
  2026-04-07  1:56   ` Werner Kasselman
  2026-04-09  6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
  1 sibling, 1 reply; 5+ messages in thread
From: Martin KaFai Lau @ 2026-04-07 1:25 UTC (permalink / raw)
To: Werner Kasselman
Cc: Jiayuan Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
	netdev@vger.kernel.org, stable@vger.kernel.org

On Mon, Apr 06, 2026 at 10:49:56PM +0000, Werner Kasselman wrote:
[...]
> This is a problem because bpf_skops_hdr_opt_len() and
> bpf_skops_write_hdr_opt() in tcp_output.c set sock_ops.sk to a
> tcp_request_sock (cast from request_sock) during SYN-ACK processing,
> with is_fullsock=0 and is_locked_tcp_sock=0. If a SOCK_OPS program
> with BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG reads ctx->rtt_min in this
> callback, the generated code treats the tcp_request_sock pointer as a
> tcp_sock and reads at offsetof(struct tcp_sock, rtt_min) -- which is
> well past the end of the tcp_request_sock allocation, causing an
> out-of-bounds slab read.
This is not limited to hdr related CB flags. It also happens to earlier
CB flags that have a request_sock, such as BPF_SOCK_OPS_RWND_INIT.

[...]
[...]
> +			if (si->dst_reg == si->src_reg) {
> +				*insn++ = BPF_JMP_A(1);
> +				*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
> +						      offsetof(struct bpf_sock_ops_kern,
> +							       temp));

There is an existing bug in this copy-and-pasted code [1], which is now
repeated here, so please find a way to refactor it into something
reusable instead of duplicating it.

This also needs a test. It should be a subtest of [1], so [1] needs to
land first.

[1]: https://lore.kernel.org/bpf/20260406031330.187630-1-jiayuan.chen@linux.dev/

^ permalink raw reply	[flat|nested] 5+ messages in thread
* RE: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
  2026-04-07  1:25 ` Martin KaFai Lau
@ 2026-04-07  1:56   ` Werner Kasselman
  0 siblings, 0 replies; 5+ messages in thread
From: Werner Kasselman @ 2026-04-07 1:56 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Jiayuan Chen, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lawrence Brakmo, bpf@vger.kernel.org,
	netdev@vger.kernel.org, stable@vger.kernel.org

Thanks for the review.

You are right that this is not limited to the SYN-ACK header-option
paths. The same unguarded ctx->rtt_min access is reachable from
request_sock-backed sock_ops invocations through tcp_call_bpf(),
including BPF_SOCK_OPS_RWND_INIT, BPF_SOCK_OPS_TIMEOUT_INIT, and
BPF_SOCK_OPS_NEEDS_ECN. I'll update the changelog to describe the
affected paths more accurately.

I also agree that this should not duplicate the existing guarded
ctx-access sequence. I'll rework the change to reuse a common
helper/macro instead of open-coding another copy, and I'll add a
selftest as a subtest of [1] after [1] lands.

Thanks,
Werner

-----Original Message-----
From: Martin KaFai Lau <martin.lau@linux.dev>
Sent: Tuesday, 7 April 2026 11:26 AM
To: Werner Kasselman <werner@verivus.ai>
Subject: Re: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access

[...]

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock
  2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
  2026-04-07  1:25 ` Martin KaFai Lau
@ 2026-04-09  6:10 ` Werner Kasselman
  2026-04-09 14:29   ` Alexei Starovoitov
  1 sibling, 1 reply; 5+ messages in thread
From: Werner Kasselman @ 2026-04-09 6:10 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan, bpf@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, stable@vger.kernel.org,
	Werner Kasselman

sock_ops_convert_ctx_access() emits guarded reads for tcp_sock-backed
bpf_sock_ops fields such as snd_cwnd, srtt_us, snd_ssthresh, rcv_nxt,
snd_nxt, snd_una, mss_cache, ecn_flags, rate_delivered, and
rate_interval_us. Those accesses go through
SOCK_OPS_GET_TCP_SOCK_FIELD(), which checks is_locked_tcp_sock before
dereferencing sock_ops.sk.

The rtt_min case is different. Because it reads a subfield of struct
minmax, it uses a custom open-coded load sequence instead of the usual
helper macro, and that sequence currently dereferences sock_ops.sk
without checking is_locked_tcp_sock first.

This is unsafe when sock_ops.sk points to a request_sock-backed object
instead of a locked full tcp_sock. That is reachable not only from the
SYNACK header option callbacks, but also from other request_sock-backed
sock_ops callbacks such as BPF_SOCK_OPS_TIMEOUT_INIT,
BPF_SOCK_OPS_RWND_INIT, and BPF_SOCK_OPS_NEEDS_ECN. In those cases,
reading ctx->rtt_min makes the generated code treat a request_sock as
a tcp_sock and read beyond the end of the request_sock allocation.

Fix the rtt_min conversion by adding the same is_locked_tcp_sock guard
used for the other tcp_sock field reads. Also make the accessed
subfield explicit by using offsetof(struct minmax_sample, v).

Add a selftest that verifies request_sock-backed sock_ops callbacks
see ctx->rtt_min as zero after the fix.

Found via AST-based call-graph analysis using sqry.

Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
Cc: stable@vger.kernel.org
Signed-off-by: Werner Kasselman <werner@verivus.com>
---
 net/core/filter.c                                  | 53 +++++++++++++++----
 .../selftests/bpf/prog_tests/tcpbpf_user.c         |  9 ++++
 .../selftests/bpf/progs/test_tcpbpf_kern.c         | 21 ++++++++
 tools/testing/selftests/bpf/test_tcpbpf.h          |  6 +++
 4 files changed, 79 insertions(+), 10 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 78b548158..5040bf7e4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10827,16 +10827,49 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct bpf_sock_ops, rtt_min):
 		BUILD_BUG_ON(sizeof_field(struct tcp_sock, rtt_min) !=
 			     sizeof(struct minmax));
-		BUILD_BUG_ON(sizeof(struct minmax) <
-			     sizeof(struct minmax_sample));
-
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
-						struct bpf_sock_ops_kern, sk),
-				      si->dst_reg, si->src_reg,
-				      offsetof(struct bpf_sock_ops_kern, sk));
-		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
-				      offsetof(struct tcp_sock, rtt_min) +
-				      sizeof_field(struct minmax_sample, t));
+		BUILD_BUG_ON(sizeof_field(struct bpf_sock_ops, rtt_min) !=
+			     sizeof_field(struct minmax_sample, v));
+		off = offsetof(struct tcp_sock, rtt_min) +
+		      offsetof(struct minmax_sample, v);
+
+		{
+			int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
+
+			if (si->dst_reg == reg || si->src_reg == reg)
+				reg--;
+			if (si->dst_reg == reg || si->src_reg == reg)
+				reg--;
+			if (si->dst_reg == si->src_reg) {
+				*insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,
+						      offsetof(struct bpf_sock_ops_kern,
+							       temp));
+				fullsock_reg = reg;
+				jmp += 2;
+			}
+			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern,
+						is_locked_tcp_sock),
+					      fullsock_reg, si->src_reg,
+					      offsetof(struct bpf_sock_ops_kern,
+						       is_locked_tcp_sock));
+			*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);
+			if (si->dst_reg == si->src_reg)
+				*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+						      offsetof(struct bpf_sock_ops_kern,
+							       temp));
+			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
+						struct bpf_sock_ops_kern, sk),
+					      si->dst_reg, si->src_reg,
+					      offsetof(struct bpf_sock_ops_kern, sk));
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+					      off);
+			if (si->dst_reg == si->src_reg) {
+				*insn++ = BPF_JMP_A(1);
+				*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
+						      offsetof(struct bpf_sock_ops_kern,
+							       temp));
+			}
+		}
 		break;
 
 	case offsetof(struct bpf_sock_ops, bpf_sock_ops_cb_flags):
diff --git a/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c b/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
index 7e8fe1bad..d243d6713 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcpbpf_user.c
@@ -42,6 +42,15 @@ static void verify_result(struct tcpbpf_globals *result)
 	/* check getsockopt for window_clamp */
 	ASSERT_EQ(result->window_clamp_client, 9216, "window_clamp_client");
 	ASSERT_EQ(result->window_clamp_server, 9216, "window_clamp_server");
+
+	ASSERT_EQ(result->timeout_init_req_seen, 1, "timeout_init_req_seen");
+	ASSERT_EQ(result->timeout_init_req_rtt_min, 0, "timeout_init_req_rtt_min");
+
+	ASSERT_EQ(result->rwnd_init_req_seen, 1, "rwnd_init_req_seen");
+	ASSERT_EQ(result->rwnd_init_req_rtt_min, 0, "rwnd_init_req_rtt_min");
+
+	ASSERT_EQ(result->needs_ecn_req_seen, 1, "needs_ecn_req_seen");
+	ASSERT_EQ(result->needs_ecn_req_rtt_min, 0, "needs_ecn_req_rtt_min");
 }
 
 static void run_test(struct tcpbpf_globals *result)
diff --git a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
index 6935f32ee..79757a19b 100644
--- a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
@@ -85,6 +85,27 @@ int bpf_testcb(struct bpf_sock_ops *skops)
 	global.event_map |= (1 << op);
 
 	switch (op) {
+	case BPF_SOCK_OPS_TIMEOUT_INIT:
+		if (!skops->is_fullsock) {
+			global.timeout_init_req_seen = 1;
+			global.timeout_init_req_rtt_min = skops->rtt_min;
+		}
+		rv = -1;
+		break;
+	case BPF_SOCK_OPS_RWND_INIT:
+		if (!skops->is_fullsock) {
+			global.rwnd_init_req_seen = 1;
+			global.rwnd_init_req_rtt_min = skops->rtt_min;
+		}
+		rv = 0;
+		break;
+	case BPF_SOCK_OPS_NEEDS_ECN:
+		if (!skops->is_fullsock) {
+			global.needs_ecn_req_seen = 1;
+			global.needs_ecn_req_rtt_min = skops->rtt_min;
+		}
+		rv = 0;
+		break;
 	case BPF_SOCK_OPS_TCP_CONNECT_CB:
 		rv = bpf_setsockopt(skops, SOL_TCP, TCP_WINDOW_CLAMP,
 				    &window_clamp, sizeof(window_clamp));
diff --git a/tools/testing/selftests/bpf/test_tcpbpf.h b/tools/testing/selftests/bpf/test_tcpbpf.h
index 9dd9b5590..46500c1d6 100644
--- a/tools/testing/selftests/bpf/test_tcpbpf.h
+++ b/tools/testing/selftests/bpf/test_tcpbpf.h
@@ -18,5 +18,11 @@ struct tcpbpf_globals {
 	__u32 tcp_saved_syn;
 	__u32 window_clamp_client;
 	__u32 window_clamp_server;
+	__u32 timeout_init_req_seen;
+	__u32 timeout_init_req_rtt_min;
+	__u32 rwnd_init_req_seen;
+	__u32 rwnd_init_req_rtt_min;
+	__u32 needs_ecn_req_seen;
+	__u32 needs_ecn_req_rtt_min;
 };
 #endif
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock
  2026-04-09  6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
@ 2026-04-09 14:29   ` Alexei Starovoitov
  0 siblings, 0 replies; 5+ messages in thread
From: Alexei Starovoitov @ 2026-04-09 14:29 UTC (permalink / raw)
To: Werner Kasselman
Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, John Fastabend, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Shuah Khan, bpf@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, stable@vger.kernel.org

On Wed, Apr 8, 2026 at 11:10 PM Werner Kasselman <werner@verivus.ai> wrote:
>
[...]
> +		{
> +			int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
> +

please de-claude your patches before posting.

pw-bot: cr

^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-04-09 14:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
2026-04-07  1:25 ` Martin KaFai Lau
2026-04-07  1:56   ` Werner Kasselman
2026-04-09  6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
2026-04-09 14:29   ` Alexei Starovoitov