From: Martin KaFai Lau <martin.lau@linux.dev>
To: Werner Kasselman <werner@verivus.ai>
Cc: Jiayuan Chen <jiayuan.chen@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
John Fastabend <john.fastabend@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Lawrence Brakmo <brakmo@fb.com>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access
Date: Mon, 6 Apr 2026 18:25:52 -0700 [thread overview]
Message-ID: <2026471147.8A6S.martin.lau@linux.dev> (raw)
In-Reply-To: <20260406224953.2787289-1-werner@verivus.com>
On Mon, Apr 06, 2026 at 10:49:56PM +0000, Werner Kasselman wrote:
> sock_ops_convert_ctx_access() generates BPF instructions to inline
> context field accesses for BPF_PROG_TYPE_SOCK_OPS programs. For
> tcp_sock-specific fields like snd_cwnd, srtt_us, etc., it uses the
> SOCK_OPS_GET_TCP_SOCK_FIELD() macro which checks is_locked_tcp_sock
> and returns 0 when the socket is not a locked full TCP socket.
>
> However, the rtt_min field bypasses this guard entirely: it emits a raw
> two-instruction load sequence (load sk pointer, then load from
> tcp_sock->rtt_min offset) without checking is_locked_tcp_sock first.
>
> This is a problem because bpf_skops_hdr_opt_len() and
> bpf_skops_write_hdr_opt() in tcp_output.c set sock_ops.sk to a
> tcp_request_sock (cast from request_sock) during SYN-ACK processing,
> with is_fullsock=0 and is_locked_tcp_sock=0. If a SOCK_OPS program
> with BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG reads ctx->rtt_min in this
> callback, the generated code treats the tcp_request_sock pointer as a
> tcp_sock and reads at offsetof(struct tcp_sock, rtt_min) -- which is
> well past the end of the tcp_request_sock allocation, causing an
> out-of-bounds slab read.
This is not limited to hdr related CB flags.
It also happens to earlier CB flags that have a request_sock, such as
BPF_SOCK_OPS_RWND_INIT.
>
> The rtt_min field was introduced in the same commit as the other
> tcp_sock fields but was given hand-rolled access code because it reads
> a sub-field (rtt_min.s[0].v, a minmax_sample) rather than a direct
> struct member, making it incompatible with the SOCK_OPS_GET_FIELD()
> macro. This hand-rolled code omitted the is_fullsock guard that the
> macro provides. The guard was later renamed to is_locked_tcp_sock in
> commit fd93eaffb3f9 ("bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback").
>
> Add the is_locked_tcp_sock guard to the rtt_min case, replicating the
> exact instruction pattern used by SOCK_OPS_GET_FIELD() including
> proper handling of the dst_reg==src_reg case with temp register
> save/restore. Use offsetof(struct minmax_sample, v) for the sub-field
> offset to match the style in bpf_tcp_sock_convert_ctx_access().
>
> Found via AST-based call-graph analysis using sqry.
>
> Fixes: 44f0e43037d3 ("bpf: Add support for reading sk_state and more")
> Cc: stable@vger.kernel.org
> Signed-off-by: Werner Kasselman <werner@verivus.com>
> ---
> net/core/filter.c | 47 ++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 78b548158fb0..58f0735b18d9 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -10830,13 +10830,54 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
> BUILD_BUG_ON(sizeof(struct minmax) <
> sizeof(struct minmax_sample));
>
> + /* Unlike other tcp_sock fields that use
> + * SOCK_OPS_GET_TCP_SOCK_FIELD(), rtt_min requires a
> + * custom access pattern because it reads a sub-field
> + * (rtt_min.s[0].v) rather than a direct struct member.
> + * We must still guard the access with is_locked_tcp_sock
> + * to prevent an OOB read when sk points to a
> + * tcp_request_sock (e.g., during SYN-ACK processing via
> + * bpf_skops_hdr_opt_len/bpf_skops_write_hdr_opt).
> + */
> + off = offsetof(struct tcp_sock, rtt_min) +
> + offsetof(struct minmax_sample, v);
> + {
> + int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;
> +
> + if (si->dst_reg == reg || si->src_reg == reg)
> + reg--;
> + if (si->dst_reg == reg || si->src_reg == reg)
> + reg--;
> + if (si->dst_reg == si->src_reg) {
> + *insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,
> + offsetof(struct bpf_sock_ops_kern,
> + temp));
> + fullsock_reg = reg;
> + jmp += 2;
> + }
> + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> + struct bpf_sock_ops_kern,
> + is_locked_tcp_sock),
> + fullsock_reg, si->src_reg,
> + offsetof(struct bpf_sock_ops_kern,
> + is_locked_tcp_sock));
> + *insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);
> + if (si->dst_reg == si->src_reg)
> + *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
> + offsetof(struct bpf_sock_ops_kern,
> + temp));
> *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
> struct bpf_sock_ops_kern, sk),
> si->dst_reg, si->src_reg,
> offsetof(struct bpf_sock_ops_kern, sk));
> - *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> - offsetof(struct tcp_sock, rtt_min) +
> - sizeof_field(struct minmax_sample, t));
> + *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
> + if (si->dst_reg == si->src_reg) {
> + *insn++ = BPF_JMP_A(1);
> + *insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,
> + offsetof(struct bpf_sock_ops_kern,
> + temp));
There is an existing bug in this copy-and-paste codes [1] and now is
repeated here, so please find a way to refactor it to be reusable instead of
duplicating it.
This also needs a test. It should be a subtest of [1],
so [1] need to land first.
[1]: https://lore.kernel.org/bpf/20260406031330.187630-1-jiayuan.chen@linux.dev/
next prev parent reply other threads:[~2026-04-07 1:26 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-06 22:49 [PATCH] bpf: add is_locked_tcp_sock guard for sock_ops rtt_min access Werner Kasselman
2026-04-07 1:25 ` Martin KaFai Lau [this message]
2026-04-07 1:56 ` Werner Kasselman
2026-04-09 6:10 ` [PATCH v2] bpf: guard sock_ops rtt_min access with is_locked_tcp_sock Werner Kasselman
2026-04-09 14:29 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2026471147.8A6S.martin.lau@linux.dev \
--to=martin.lau@linux.dev \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brakmo@fb.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jiayuan.chen@linux.dev \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stable@vger.kernel.org \
--cc=werner@verivus.ai \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.