public inbox for bpf@vger.kernel.org
* [PATCH bpf-next] bpf: reuseport: add cond_resched_rcu() in reuseport_array_free()
@ 2026-04-10 14:07 Zijing Yin
  2026-04-10 19:53 ` Alexei Starovoitov
  0 siblings, 1 reply; 2+ messages in thread
From: Zijing Yin @ 2026-04-10 14:07 UTC (permalink / raw)
  To: bpf
  Cc: linux-kernel, ast, daniel, andrii, martin.lau, eddyz87, song,
	yonghong.song, john.fastabend, kpsingh, sdf, haoluo, jolsa,
	Zijing Yin

reuseport_array_free() iterates over all map entries inside
rcu_read_lock() to detach sockets from the array. When max_entries is
very large (e.g., hundreds of millions), this loop runs for an extended
period without yielding the CPU, triggering RCU stall warnings in the
kworker thread that executes bpf_map_free_deferred().
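
For reference, the loop in question is roughly the following (abridged
sketch; see the diff below for the exact context in
kernel/bpf/reuseport_array.c):

  rcu_read_lock();
  for (i = 0; i < map->max_entries; i++) {
      sk = rcu_dereference(array->ptrs[i]);
      if (sk) {
          write_lock_bh(&sk->sk_callback_lock);
          sk->sk_user_data = NULL;
          write_unlock_bh(&sk->sk_callback_lock);
          RCU_INIT_POINTER(array->ptrs[i], NULL);
      }
      /* no scheduling point anywhere in this loop */
  }
  rcu_read_unlock();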

The observed stall occurs because the loop has no scheduling point:

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  Workqueue: events_unbound bpf_map_free_deferred
  Call Trace:
   reuseport_array_free+0x1ec/0x470 kernel/bpf/reuseport_array.c:127
   bpf_map_free_deferred+0x34a/0x7e0 kernel/bpf/syscall.c:893
   process_one_work+0x952/0x1a80
   worker_thread+0x87b/0x11f0

Add cond_resched_rcu() in the loop body to allow the scheduler to run
and RCU grace periods to complete. This is safe because each iteration
processes a single entry independently, sk->sk_callback_lock is not held
across the yield point, and the map is fully detached from userspace so
no concurrent insertions can occur.
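
Note that cond_resched_rcu() (include/linux/sched.h) may briefly exit
the read-side critical section, which is why the conditions above
matter:

  static inline void cond_resched_rcu(void)
  {
  #if defined(CONFIG_DEBUG_ATOMIC_SLEEP) || !defined(CONFIG_PREEMPT_RCU)
  	rcu_read_unlock();
  	cond_resched();
  	rcu_read_lock();
  #endif
  }

Since each iteration re-reads array->ptrs[i] under its own
rcu_dereference() and nothing RCU-protected is carried across loop
iterations, momentarily leaving the read-side section is safe here.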

This follows an established pattern for long-running kernel loops that
must run under rcu_read_lock().  The closest precedent is in another BPF
map free function:

  kernel/bpf/hashtab.c:1600
    htab_free_malloced_internal_structs()
      rcu_read_lock();
      for (i = 0; i < htab->n_buckets; i++) {
          ... walk bucket ...
          cond_resched_rcu();
      }
      rcu_read_unlock();

Fixes: 5dc4c4b7d4e8 ("bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY")
Signed-off-by: Zijing Yin <yzjaurora@gmail.com>
---
Base: bpf-next.git master branch
      (tip a0c584fc18056709c8e047a82a6045d6c209f4ce
       "bpf: Fix use-after-free in offloaded map/prog info fill"
       as of 2026-04-09).

Tested with CONFIG_PREEMPT_RCU=y, CONFIG_KASAN=y (inline),
CONFIG_SMP=n (single vCPU QEMU VM), gcc 13.3.0.

To reproduce: create a BPF_MAP_TYPE_REUSEPORT_SOCKARRAY with
max_entries >= 100M, set rcu_cpu_stall_timeout low, pin the CPU with a
SCHED_FIFO thread so the kworker stays in rcu_read_lock() long enough
to trip the stall timeout, then close the fd. Without the fix the
reuseport_array_free() kworker stalls RCU reliably; with the fix,
cond_resched_rcu() yields periodically and no stall is observed.
Reproducer (C source): repro_reuseport.c (https://pastebin.com/YjdwqdX1)
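
For illustration, the core of such a reproducer reduces to roughly the
following (a sketch only; the constants and the spinner setup are
illustrative, not the exact pastebin contents):

  union bpf_attr attr = {
      .map_type    = BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
      .key_size    = sizeof(__u32),
      .value_size  = sizeof(__u64),   /* socket cookie */
      .max_entries = 200 * 1000 * 1000,
  };
  int fd = syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
  /* ... start a SCHED_FIFO spinner pinned to the CPU ... */
  close(fd);   /* queues bpf_map_free_deferred() on the workqueue */

The stall detector window can be shrunk beforehand via
/sys/module/rcupdate/parameters/rcu_cpu_stall_timeout.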

 kernel/bpf/reuseport_array.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 49b8e5a0c6b4f..e3c789b80e2b8 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -5,6 +5,7 @@
 #include <linux/bpf.h>
 #include <linux/err.h>
 #include <linux/sock_diag.h>
+#include <linux/rcupdate_wait.h>
 #include <net/sock_reuseport.h>
 #include <linux/btf_ids.h>

@@ -136,6 +137,7 @@ static void reuseport_array_free(struct bpf_map *map)
 			write_unlock_bh(&sk->sk_callback_lock);
 			RCU_INIT_POINTER(array->ptrs[i], NULL);
 		}
+		cond_resched_rcu();
 	}
 	rcu_read_unlock();

-- 
2.43.0


* Re: [PATCH bpf-next] bpf: reuseport: add cond_resched_rcu() in reuseport_array_free()
  2026-04-10 14:07 [PATCH bpf-next] bpf: reuseport: add cond_resched_rcu() in reuseport_array_free() Zijing Yin
@ 2026-04-10 19:53 ` Alexei Starovoitov
  0 siblings, 0 replies; 2+ messages in thread
From: Alexei Starovoitov @ 2026-04-10 19:53 UTC (permalink / raw)
  To: Zijing Yin
  Cc: bpf, LKML, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Eduard, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa

On Fri, Apr 10, 2026 at 7:07 AM Zijing Yin <yzjaurora@gmail.com> wrote:
> [...]
>
> To reproduce: create a BPF_MAP_TYPE_REUSEPORT_SOCKARRAY with
> max_entries >= 100M, set rcu_cpu_stall_timeout low, pin the CPU with a
> SCHED_FIFO thread so the kworker stays in rcu_read_lock() long enough
> to trip the stall timeout, then close the fd. Without the fix the
> reuseport_array_free() kworker stalls RCU reliably; with the fix,
> cond_resched_rcu() yields periodically and no stall is observed.
> Reproducer (C source): repro_reuseport.c (https://pastebin.com/YjdwqdX1)

This is not a realistic scenario that is worth fixing.

pw-bot: cr

