public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Werner Kasselman <werner@verivus.ai>
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>,
	 Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	 "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>, Lawrence Brakmo <brakmo@fb.com>,
	 "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	 "stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
Date: Mon, 6 Apr 2026 18:25:52 -0700	[thread overview]
Message-ID: <2026471147.8A6S.martin.lau@linux.dev> (raw)
In-Reply-To: <20260406224953.2787289-1-werner@verivus.com>

On Mon, Apr 06, 2026 at 10:49:56PM +0000, Werner Kasselman wrote:
> sock_ops_convert_ctx_access() generates BPF instructions to inline
> context field accesses for BPF_PROG_TYPE_SOCK_OPS programs. For
> tcp_sock-specific fields like snd_cwnd, srtt_us, etc., it uses the
> SOCK_OPS_GET_TCP_SOCK_FIELD() macro which checks is_locked_tcp_sock
> and returns 0 when the socket is not a locked full TCP socket.
> 
> However, the rtt_min field bypasses this guard entirely: it emits a raw
> two-instruction load sequence (load sk pointer, then load from
> tcp_sock->rtt_min offset) without checking is_locked_tcp_sock first.
> 
> This is a problem because bpf_skops_hdr_opt_len() and
> bpf_skops_write_hdr_opt() in tcp_output.c set sock_ops.sk to a
> tcp_request_sock (cast from request_sock) during SYN-ACK processing,
> with is_fullsock=0 and is_locked_tcp_sock=0. If a SOCK_OPS program
> with BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG reads ctx->rtt_min in this
> callback, the generated code treats the tcp_request_sock pointer as a
> tcp_sock and reads at offsetof(struct tcp_sock, rtt_min) -- which is
> well past the end of the tcp_request_sock allocation, causing an
> out-of-bounds slab read.

This is not limited to hdr related CB flags.

It also happens to earlier CB flags that have a request_sock, such as
BPF_SOCK_OPS_RWND_INIT.

> 
> The rtt_min field was introduced in the same commit as the other
> tcp_sock fields but was given hand-rolled access code because it reads
> a sub-field (rtt_min.s[0].v, a minmax_sample) rather than a direct
> struct member, making it incompatible with the SOCK_OPS_GET_FIELD()
> macro. This hand-rolled code omitted the is_fullsock guard that the
> macro provides. The guard was later renamed to is_locked_tcp_sock in
> commit fd93eaffb3f9 ("bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback").
> 
> Add the is_locked_tcp_sock guard to the rtt_min case, replicating the
> exact instruction pattern used by SOCK_OPS_GET_FIELD() including
> proper handling of the dst_reg==src_reg case with temp register
> save/restore. Use offsetof(struct minmax_sample, v) for the sub-field
> offset to match the style in bpf_tcp_sock_convert_ctx_access().
> 
> Found via AST-based call-graph analysis using sqry.
> 
> Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
> Cc: stable@vger.kernel.org
> Signed-off-by: Werner Kasselman <werner@verivus.com>
> ---
>  net/core/filter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 78b548158fb0..58f0735b18d9 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -10830,13 +10830,54 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>  		BUILD_BUG_ON(sizeof(struct minmax) <
>  			     sizeof(struct minmax_sample));
>  
> +		/* Unlike other tcp_sock fields that use
> +		 * SOCK_OPS_GET_TCP_SOCK_FIELD(), rtt_min requires a
> +		 * custom access pattern because it reads a sub-field
> +		 * (rtt_min.s[0].v) rather than a direct struct member.
> +		 * We must still guard the access with is_locked_tcp_sock
> +		 * to prevent an OOB read when sk points to a
> +		 * tcp_request_sock (e.g., during SYN-ACK processing via
> +		 * bpf_skops_hdr_opt_len/bpf_skops_write_hdr_opt).
> +		 */
> +		off = offsetof(struct tcp_sock, rtt_min) +
> +		      offsetof(struct minmax_sample, v);
> +	{
> +		int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
> +
> +		if (si->dst_reg == reg || si->src_reg == reg)
> +			reg--;
> +		if (si->dst_reg == reg || si->src_reg == reg)
> +			reg--;
> +		if (si->dst_reg == si->src_reg) {
> +			*insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,
> +					  offsetof(struct bpf_sock_ops_kern,
> +					  temp));
> +			fullsock_reg = reg;
> +			jmp += 2;
> +		}
> +		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> +						struct bpf_sock_ops_kern,
> +						is_locked_tcp_sock),
> +				      fullsock_reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern,
> +					       is_locked_tcp_sock));
> +		*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);
> +		if (si->dst_reg == si->src_reg)
> +			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern,
> +				      temp));
>  		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
>  						struct bpf_sock_ops_kern, sk),
>  				      si->dst_reg, si->src_reg,
>  				      offsetof(struct bpf_sock_ops_kern, sk));
> -		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> -				      offsetof(struct tcp_sock, rtt_min) +
> -				      sizeof_field(struct minmax_sample, t));
> +		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
> +		if (si->dst_reg == si->src_reg) {
> +			*insn++ = BPF_JMP_A(1);
> +			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
> +				      offsetof(struct bpf_sock_ops_kern,
> +				      temp));

There is an existing bug in this copy-and-paste codes [1] and now is
repeated here, so please find a way to refactor it to be reusable instead of
duplicating it.

This also needs a test. It should be a subtest of [1],
so [1] need to land first.

[1]: https://lore.kernel.org/bpf/20260406031330.187630-1-jiayuan.chen@linux.dev/

  reply	other threads:[~2026-04-07  1:26 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
2026-04-07  1:25 ` Martin KaFai Lau [this message]
2026-04-07  1:56   ` Werner Kasselman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2026471147.8A6S.martin.lau@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brakmo@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=jiayuan.chen@linux.dev \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=werner@verivus.ai \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox