From: Stanislav Fomichev <sdf.kernel@gmail.com>
To: Quan Sun <2022090917019@std.uestc.edu.cn>
Cc: daniel@iogearbox.net, bpf@vger.kernel.org, dddddd@hust.edu.cn,
	 M202472210@hust.edu.cn, dzm91@hust.edu.cn,
	hust-os-kernel-patches@googlegroups.com,  ast@kernel.org,
	andrii@kernel.org, jiayuan.chen@linux.dev
Subject: Re: Infinite Recursion / Kernel Stack Overflow in bpf_skops_hdr_opt_len() via TCP_NODELAY setsockopt
Date: Mon, 13 Apr 2026 08:55:44 -0700
Message-ID: <ad0Row2av871JSJL@devvm17672.vll0.facebook.com>
In-Reply-To: <d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn>

On 04/13, Quan Sun wrote:
> Our fuzzing found a Stack Guard Page hit / Infinite Recursion vulnerability
> in the Linux TCP BPF Subsystem. The issue is triggered when a
> `BPF_PROG_TYPE_SOCK_OPS` program is attached and uses the `bpf_setsockopt()`
> helper inside the `BPF_SOCK_OPS_HDR_OPT_LEN_CB` callback to set
> `TCP_NODELAY` on the associated socket. This creates a logical loop that
> unconditionally pushes pending frames and re-invokes the same option-length
> BPF callback until the kernel stack overflows.
> 
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
> 
> ## Root Cause
> 
> This vulnerability is caused by a semantic loop created by mixing BPF TCP
> hooks tightly bound to transmission paths with auxiliary socket state
> mutations like TCP Nagle transitions.
> 
> 1. A user loads a `BPF_PROG_TYPE_SOCK_OPS` program and attaches it to a
> cgroup via `BPF_CGROUP_SOCK_OPS`.
> 2. The program intercepts `BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB` or
> `BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB` events and sets the
> `BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG` on the socket to enable custom TCP
> header options injection.
> 3. During the standard TCP frame transmission process (e.g., when sending
> data), the kernel needs to calculate the header's precise length. To
> retrieve the size of the injected option, `tcp_established_options()`
> invokes `bpf_skops_hdr_opt_len()`, which correctly triggers the
> `BPF_SOCK_OPS_HDR_OPT_LEN_CB` callback inside the BPF program.
> 4. Inside this callback, the malicious BPF program uses the
> `bpf_setsockopt()` helper function to force `TCP_NODELAY`.
> 5. Down in the kernel, setting `TCP_NODELAY` alters the connection context.
> Doing so invokes `__tcp_sock_set_nodelay()`, which unconditionally calls
> `tcp_push_pending_frames()` to immediately dispatch any packets that were
> previously waiting under the Nagle algorithm logic.
> 6. The recursive sub-call to `tcp_push_pending_frames()` initiates packet
> building again, causing a cascading invocation of
> `tcp_established_options()` -> `bpf_skops_hdr_opt_len()` ->
> `BPF_SOCK_OPS_HDR_OPT_LEN_CB` -> `bpf_setsockopt()` ->
> `tcp_push_pending_frames()`...
> 7. Without a depth limit or re-entrancy check on these socket callbacks,
> the repeated nesting rapidly exhausts the kernel stack and hits the stack
> guard page. The result is an immediate kernel panic leading to Denial of
> Service.
> 
> #### Execution Flow Visualization
> 
> ```text
> Vulnerability Execution Flow
> |
> |--- 1. `BPF_SOCK_OPS_HDR_OPT_LEN_CB` BPF Handler Executed
> |    |\
> |    | `-- `bpf_setsockopt(..., SOL_TCP, TCP_NODELAY, ...)`
> |    |
> |--- 2. `do_tcp_setsockopt()` called by BPF Helper
> |    |\
> |    | `-- `__tcp_sock_set_nodelay()`
> |    |     |
> |    |     `-- `tcp_push_pending_frames()` (Immediate TCP transmission)
> |    |
> |--- 3. Control passes to frame building
> |    |\
> |    | `-- `tcp_current_mss()`
> |    |     |
> |    |     `-- `tcp_established_options()`
> |    |
> |--- 4. TCP Header calls BPF back again for size computation
> |    |\
> |    | `-- `bpf_skops_hdr_opt_len()`
> |    |     |
> |    |     `-- Invokes BPF Callback: `BPF_SOCK_OPS_HDR_OPT_LEN_CB`
> |    |         |
> |    |         `-- (Reverts to Step 1 directly) Infinite recursion depth.
> <==== KERNEL PANIC
> ```
> 
> ## Reproduction Steps
> 
> 1. Load a `BPF_PROG_TYPE_SOCK_OPS` BPF program that:
>    - Checks the `op` field in the `bpf_sock_ops` context.
>    - If the `op` is `BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB` or
>      `BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB`, calls the
>      `bpf_sock_ops_cb_flags_set()` helper to enable the
>      `BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG`.
>    - If the `op` is `BPF_SOCK_OPS_HDR_OPT_LEN_CB`, calls the
>      `bpf_setsockopt()` helper to forcefully enable `TCP_NODELAY`.
> 2. Attach the loaded program to a chosen cgroup directory using
> `BPF_CGROUP_SOCK_OPS`.
> 3. Trigger a standard TCP connection (e.g., using `connect()`) within the
> targeted cgroup to establish the socket and trigger the initial established
> callbacks.
> 4. Force a packet transmission (e.g., using `send()`). This forces the
> kernel to compute the MSS and invoke the option length callback.
> 5. The forced `TCP_NODELAY` inside the callback will trap the kernel in an
> infinite recursive call sequence traversing `tcp_push_pending_frames()` and
> `bpf_skops_hdr_opt_len()` until the stack overflows, leading to a stack
> guard page crash.

The easiest fix is probably to return early from tcp_push_pending_frames()
when has_current_bpf_ctx() is true?
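
To make the idea concrete, here is a rough userspace model of the loop and
of the early return. It is only a sketch, not kernel code: every model_*
name, the in_bpf_ctx flag (standing in for has_current_bpf_ctx()), and
MODEL_MAX_DEPTH (standing in for the finite kernel stack) are assumptions
made up for illustration, and the real call chain has more steps than the
two functions shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the depth cap stands in for the kernel stack size. */
#define MODEL_MAX_DEPTH 64

struct model_sock {
	bool guard_enabled;  /* apply the proposed early return? */
	bool in_bpf_ctx;     /* stand-in for has_current_bpf_ctx() */
	int depth;           /* current nesting of push calls */
	int max_depth_seen;  /* deepest nesting reached */
	bool overflowed;     /* set when the model "stack" is exhausted */
};

static void model_push_pending_frames(struct model_sock *sk);

/*
 * Models the HDR_OPT_LEN callback: the BPF program calls
 * bpf_setsockopt(TCP_NODELAY), which ends up pushing pending frames.
 */
static void model_hdr_opt_len_cb(struct model_sock *sk)
{
	bool prev = sk->in_bpf_ctx;

	sk->in_bpf_ctx = true;
	model_push_pending_frames(sk);
	sk->in_bpf_ctx = prev;
}

/*
 * Models the push path: building a frame needs the option length,
 * which invokes the callback again.
 */
static void model_push_pending_frames(struct model_sock *sk)
{
	/* Proposed guard: do nothing when called from BPF context. */
	if (sk->guard_enabled && sk->in_bpf_ctx)
		return;

	if (sk->depth >= MODEL_MAX_DEPTH) {
		sk->overflowed = true;  /* the kernel would hit the guard page */
		return;
	}

	sk->depth++;
	if (sk->depth > sk->max_depth_seen)
		sk->max_depth_seen = sk->depth;
	model_hdr_opt_len_cb(sk);
	sk->depth--;
}

/* Models a send() from outside BPF context and returns the final state. */
static struct model_sock model_send(bool guard_enabled)
{
	struct model_sock sk = { .guard_enabled = guard_enabled };

	model_push_pending_frames(&sk);
	assert(sk.depth == 0);  /* every nested call has unwound */
	return sk;
}
```

In this model, model_send(false) recurses until the depth cap is reached,
while model_send(true) stops after a single push. Note that the guard skips
the push for any call made from BPF context, not only the recursive one, so
a TCP_NODELAY set from BPF would no longer flush pending frames right away.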

