* [PATCH bpf-next v2 0/2] bpf: Optimize recursion detection on arm64
@ 2025-12-17 23:35 Puranjay Mohan
From: Puranjay Mohan @ 2025-12-17 23:35 UTC
  To: bpf
  Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, kernel-team,
	Catalin Marinas, Will Deacon, Mark Rutland, linux-arm-kernel

V1: https://lore.kernel.org/all/20251217162830.2597286-1-puranjay@kernel.org/
Changes in V1->V2:
- Patch 2:
	- Wrap the RMW accesses in preempt_disable()/preempt_enable() to
	  mitigate race conditions: with CONFIG_PREEMPT_RCU and sleepable
	  bpf programs, preemption can cause no prog to execute.

BPF programs detect recursion using a per-CPU 'active' flag in struct
bpf_prog. The trampoline currently sets/clears this flag with atomic
operations.
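
As a rough illustration of the scheme described above, here is a minimal
userspace sketch. It is a model under stated assumptions, not the kernel
code: per-CPU storage is modelled as an array indexed by a CPU number, C11
atomics stand in for the kernel's per-CPU atomic operations, and the names
prog_model, prog_model_enter(), prog_model_exit() and MODEL_NR_CPUS are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8 /* hypothetical CPU count for this model */

/* Hypothetical stand-in for the per-CPU 'active' flag in struct bpf_prog. */
struct prog_model {
	atomic_int active[MODEL_NR_CPUS];
};

/* Returns true if the prog may run on @cpu, false if it is already active there. */
static bool prog_model_enter(struct prog_model *p, int cpu)
{
	/* Atomic increment; only the first entrant on this CPU observes 1. */
	if (atomic_fetch_add(&p->active[cpu], 1) + 1 != 1) {
		/* Recursion detected: undo the increment and refuse to run. */
		atomic_fetch_sub(&p->active[cpu], 1);
		return false;
	}
	return true;
}

/* Clears the flag once the prog has finished running on @cpu. */
static void prog_model_exit(struct prog_model *p, int cpu)
{
	atomic_fetch_sub(&p->active[cpu], 1);
}
```

A nested prog_model_enter() on the same CPU is rejected until the outer
invocation calls prog_model_exit(); other CPUs are unaffected.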

On some arm64 platforms (e.g., Neoverse V2 with LSE), per-CPU atomic
operations are relatively slow. Unlike x86_64, where per-CPU updates
can avoid cross-core atomicity, arm64 LSE atomics are always atomic
across all cores, which is unnecessary overhead for strictly per-CPU
state.

This patch removes atomics from the recursion detection path on arm64.
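
The general idea can be sketched in userspace as follows. This is only an
illustration under assumptions, not the code from patch 2: the counter is
a plain int updated with an ordinary read-modify-write, and
preempt_disable()/preempt_enable() are userspace stubs standing in for
the kernel primitives. The names prog_plain, prog_plain_enter(),
prog_plain_exit() and PLAIN_NR_CPUS are hypothetical, and nesting from
interrupt or NMI context is not modelled here.

```c
#include <stdbool.h>

#define PLAIN_NR_CPUS 8 /* hypothetical CPU count for this model */

/* Userspace stubs; in the kernel these are the real preemption primitives. */
static int preempt_depth;
static void preempt_disable(void) { preempt_depth++; }
static void preempt_enable(void)  { preempt_depth--; }

/* Hypothetical stand-in for a non-atomic per-CPU 'active' counter. */
struct prog_plain {
	int active[PLAIN_NR_CPUS];
};

/* Returns true if the prog may run on @cpu, false on recursion. */
static bool prog_plain_enter(struct prog_plain *p, int cpu)
{
	int val;

	/* Keep the plain RMW on one CPU by disabling preemption around it. */
	preempt_disable();
	val = ++p->active[cpu];
	preempt_enable();

	if (val != 1) {
		/* Recursion detected: undo the increment under the same guard. */
		preempt_disable();
		p->active[cpu]--;
		preempt_enable();
		return false;
	}
	return true;
}

/* Clears the flag once the prog has finished running on @cpu. */
static void prog_plain_exit(struct prog_plain *p, int cpu)
{
	preempt_disable();
	p->active[cpu]--;
	preempt_enable();
}
```

The observable behaviour matches an atomic counter for a single context
per CPU; the difference is that no cross-core atomic instruction is issued.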

It was discovered in [1] that per-CPU atomics that don't return a value
were extremely slow on some arm64 platforms. Catalin added a fix in
commit 535fdfc5a228 ("arm64: Use load LSE atomics for the non-return
per-CPU atomic operations") to solve this issue, but it seems to have
caused a regression on the fentry benchmark.

Using the fentry benchmark from the bpf selftests shows the following:

  ./tools/testing/selftests/bpf/bench trig-fentry

 +---------------------------------------------+------------------------+
 |               Configuration                 | Total Operations (M/s) |
 +---------------------------------------------+------------------------+
 | bpf-next/master with Catalin's fix reverted |         51.862         |
 | bpf-next/master                             |         43.067         |
 | bpf-next/master with this change            |         53.856         |
 +---------------------------------------------+------------------------+

All benchmarks were run on a KVM-based VM with Neoverse-V2 and 8 CPUs.

This patch yields a 25% improvement in this benchmark compared to
bpf-next (43.067 -> 53.856 M/s). Notably, reverting Catalin's fix also
results in a performance gain for this benchmark (43.067 -> 51.862 M/s,
about 20%), which is interesting but expected.
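
The percentages can be checked against the table with a small helper. The
function name pct_change() is hypothetical and exists only for this
check; the inputs are the M/s figures from the table above.

```c
/* Relative change from @base to @value, in percent. */
static double pct_change(double base, double value)
{
	return (value - base) / base * 100.0;
}
```

pct_change(43.067, 53.856) gives roughly 25% for this change, and
pct_change(43.067, 51.862) gives roughly 20% for the revert.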

For completeness, this benchmark was also run with the change enabled on
x86-64, where it resulted in a 30% regression. The change is therefore
enabled only on arm64.

[1] https://lore.kernel.org/all/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop/

Puranjay Mohan (2):
  bpf: move recursion detection logic to helpers
  bpf: arm64: Optimize recursion detection by not using atomics

 include/linux/bpf.h      | 39 ++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/core.c        |  3 ++-
 kernel/bpf/trampoline.c  |  8 ++++----
 kernel/trace/bpf_trace.c |  4 ++--
 4 files changed, 46 insertions(+), 8 deletions(-)


base-commit: ec439c38013550420aecc15988ae6acb670838c1
-- 
2.47.3

