* [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot()
@ 2026-03-13 18:03 Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 1/3] perf/arm_pmuv3: Fix NULL pointer dereference in armv8pmu_sched_task() Puranjay Mohan
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Puranjay Mohan @ 2026-03-13 18:03 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Will Deacon,
Mark Rutland, Catalin Marinas, Leo Yan, Rob Herring, Breno Leitao,
linux-arm-kernel, linux-perf-users, kernel-team
RFC: https://lore.kernel.org/all/20260102214043.1410242-1-puranjay@kernel.org/
Changes from RFC:
- Fix pre-existing NULL pointer dereference in armv8pmu_sched_task()
found by Leo Yan during testing (patch 1)
- Pause BRBE before local_daif_save() to avoid branch pollution from
trace_hardirqs_off()
- Use local_daif_save() to prevent pNMI race from counter overflow
(Mark Rutland)
- Reuse perf_entry_from_brbe_regset() instead of duplicating register
read logic, by making it accept NULL event (Mark Rutland)
- Invalidate BRBE after reading to maintain record contiguity for
other consumers (Mark Rutland)
- Adjust selftest wasted_entries threshold for ARM64 (patch 3)
- Tested on ARM FVP with BRBE enabled
This series enables the bpf_get_branch_snapshot() BPF helper on ARM64
by implementing the perf_snapshot_branch_stack static call for ARM's
Branch Record Buffer Extension (BRBE).
bpf_get_branch_snapshot() [1] allows BPF programs to capture hardware
branch records on-demand from any BPF tracing context. Until now, this
has been available only on x86 (Intel LBR), since v5.16. With BRBE
available on ARMv9, this series closes the gap for ARM64.
Usage model
-----------
The helper works in conjunction with perf events. The userspace
component of the BPF application opens a perf event with
PERF_SAMPLE_BRANCH_STACK on each CPU, which configures the hardware
to continuously record branches into BRBE (on ARM64) or LBR (on x86).
A BPF program attached to a tracepoint, kprobe, or fentry hook can
then call bpf_get_branch_snapshot() to snapshot the branch buffer at
any point. Without an active perf event, BRBE is not recording and
the buffer is empty.
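As a rough sketch of the userspace side (the attribute choices below
are illustrative assumptions, not taken from any specific tool), the
per-CPU event that arms the branch recorder can be set up like this:

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Illustrative sketch: build a perf_event_attr that requests hardware
 * branch recording. Opening this event (once per CPU) configures
 * BRBE/LBR so a BPF program can later call bpf_get_branch_snapshot().
 * The perf_event_open() call itself and error handling are omitted. */
static struct perf_event_attr branch_snapshot_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
	attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY |
				  PERF_SAMPLE_BRANCH_KERNEL;

	return attr;
}
```

The event would then be opened with perf_event_open(&attr, -1, cpu,
-1, 0) for each online CPU; the BPF side only needs the events to
exist and be enabled.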
On-demand branch snapshots from BPF are useful for diagnosing which
specific code path was taken inside a function. Stack traces only show
function boundaries, but branch records reveal the exact sequence of
jumps, calls, and returns within a function -- making it possible to
identify which specific error check triggered a failure, or which
callback implementation was invoked through a function pointer.
For example, retsnoop [2] is a BPF-based tool for non-intrusive
mass-tracing of kernel internals. Its LBR mode (--lbr) creates per-CPU
perf events with PERF_SAMPLE_BRANCH_STACK and then uses
bpf_get_branch_snapshot() in its fentry/fexit BPF programs to capture
branch records whenever a traced function returns an error.
Consider debugging a bpf() syscall that returns -EINVAL when creating
a BPF map with invalid parameters. Running retsnoop on an ARM64 FVP
with BRBE to trace the bpf() syscall and array_map_alloc_check():
$ retsnoop -e '*sys_bpf' -a 'array_map_alloc_check' --lbr=any \
-F -k vmlinux --debug full-lbr
$ simfail bpf-bad-map-max-entries-array # in another terminal
Output of retsnoop:
--- fentry BPF program (entries #63-#17) ---
[#63-#59] __htab_map_lookup_elem: hash table walk with memcmp (hashtab.c)
[#58] __htab_map_lookup_elem+0x98 -> dump_bpf_prog+0xc850 (hashtab.c:750)
[#57-#55] ... dump_bpf_prog internal branches ...
[#54] dump_bpf_prog+0xcab8 -> bpf_get_current_pid_tgid+0x0 (helpers.c:225)
[#53] bpf_get_current_pid_tgid+0x1c -> dump_bpf_prog+0xcabc (helpers.c:225)
[#52-#51] ... dump_bpf_prog -> __htab_map_lookup_elem ...
[#50-#47] __htab_map_lookup_elem: htab_map_hash (jhash2), select_bucket
[#46-#42] lookup_nulls_elem_raw: hash chain walk with memcmp (hashtab.c:717)
[#41] __htab_map_lookup_elem+0x98 -> dump_bpf_prog+0xcaf8 (hashtab.c:750)
[#40-#37] ... dump_bpf_prog -> bpf_ktime_get_ns ...
[#36] bpf_ktime_get_ns+0x10 -> ktime_get_mono_fast_ns+0x0 (helpers.c:178)
[#35-#32] ktime_get_mono_fast_ns: tk_clock_read -> arch_counter_get_cntpct
[#31] ktime_get_mono_fast_ns+0x9c -> bpf_ktime_get_ns+0x14 (timekeeping.c:493)
[#30] bpf_ktime_get_ns+0x18 -> dump_bpf_prog+0xcd50 (helpers.c:178)
[#29-#25] ... dump_bpf_prog internal branches ...
[#24] dump_bpf_prog+0x11b28 -> __bpf_prog_exit_recur+0x0 (trampoline.c:1190)
[#23-#17] __bpf_prog_exit_recur: rcu_read_unlock, migrate_enable (trampoline.c:1195)
--- array_map_alloc_check (entries #16-#12) ---
[#16] dump_bpf_prog+0x11b38 -> array_map_alloc_check+0x8 (arraymap.c:55)
[#15] array_map_alloc_check+0x18 -> array_map_alloc_check+0xb8 (arraymap.c:56)
. bpf_map_attr_numa_node . bpf_map_attr_numa_node
[#14] array_map_alloc_check+0xbc -> array_map_alloc_check+0x20 (arraymap.c:59)
. bpf_map_attr_numa_node
[#13] array_map_alloc_check+0x24 -> array_map_alloc_check+0x94 (arraymap.c:64)
[#12] array_map_alloc_check+0x98 -> dump_bpf_prog+0x11b3c (arraymap.c:82)
--- fexit trampoline overhead (entries #11-#00) ---
[#11] dump_bpf_prog+0x11b5c -> __bpf_prog_enter_recur+0x0 (trampoline.c:1145)
[#10-#03] __bpf_prog_enter_recur: rcu_read_lock, migrate_disable (trampoline.c:1146)
[#02] __bpf_prog_enter_recur+0x114 -> dump_bpf_prog+0x11b60 (trampoline.c:1157)
[#01] dump_bpf_prog+0x11b6c -> dump_bpf_prog+0xd230
[#00] dump_bpf_prog+0xd340 -> arm_brbe_snapshot_branch_stack+0x0 (arm_brbe.c:814)
el0t_64_sync+0x168
el0t_64_sync_handler+0x98
el0_svc+0x28
do_el0_svc+0x4c
invoke_syscall.constprop.0+0x54
373us [-EINVAL] __arm64_sys_bpf+0x8
__sys_bpf+0x87c
map_create+0x120
95us [-EINVAL] array_map_alloc_check+0x8
The FVP's BRBE buffer has 64 entries (BRBE supports 8, 16, 32, or
64). Of these, entries #63-#17 (47) are consumed by the fentry BPF
trampoline that ran before the function, and entries #11-#00 (12)
are consumed by the fexit trampoline that runs after. Entry #00
shows the very last branch recorded before BRBE is paused: the call
into arm_brbe_snapshot_branch_stack().
The 5 useful entries (#16-#12) show the exact path taken inside
array_map_alloc_check(). Record #14 shows a jump from line 56
(bpf_map_attr_numa_node) to line 59 (the if-condition), and #13
shows an immediate jump from line 59 (attr->max_entries == 0) to
line 64 (return -EINVAL), skipping lines 60-63. This pinpoints
max_entries==0 as the cause -- a diagnosis impossible with stack
traces alone.
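For context, the check sequence being pinpointed has roughly this
shape. This is a simplified, self-contained rendition of the checks in
array_map_alloc_check(); the struct, field names, and exact check
order are illustrative, not the actual kernel source:

```c
#include <assert.h>
#include <errno.h>

/* Simplified sketch of array_map_alloc_check(). The branch records in
 * the example above show control flow jumping straight from the
 * max_entries test to the return -EINVAL path. */
struct map_attr_sketch {
	unsigned int max_entries;
	unsigned int key_size;
	unsigned int value_size;
};

static int array_map_alloc_check_sketch(const struct map_attr_sketch *attr)
{
	/* Array maps need at least one entry and a 4-byte index key. */
	if (attr->max_entries == 0 || attr->key_size != 4 ||
	    attr->value_size == 0)
		return -EINVAL;

	return 0;
}
```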
[1] 856c02dbce4f ("bpf: Introduce helper bpf_get_branch_snapshot")
[2] https://github.com/anakryiko/retsnoop
Puranjay Mohan (3):
perf/arm_pmuv3: Fix NULL pointer dereference in armv8pmu_sched_task()
perf/arm64: Add BRBE support for bpf_get_branch_snapshot()
selftests/bpf: Adjust wasted entries threshold for ARM64 BRBE
drivers/perf/arm_brbe.c | 70 ++++++++++++++++++-
drivers/perf/arm_brbe.h | 9 +++
drivers/perf/arm_pmuv3.c | 16 ++++-
.../bpf/prog_tests/get_branch_snapshot.c | 9 +--
4 files changed, 95 insertions(+), 9 deletions(-)
base-commit: ca0f39a369c5f927c3d004e63a5a778b08a9df94
--
2.52.0
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH bpf 1/3] perf/arm_pmuv3: Fix NULL pointer dereference in armv8pmu_sched_task()
2026-03-13 18:03 [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
@ 2026-03-13 18:03 ` Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 2/3] perf/arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2026-03-13 18:03 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Will Deacon,
Mark Rutland, Catalin Marinas, Leo Yan, Rob Herring, Breno Leitao,
linux-arm-kernel, linux-perf-users, kernel-team
The NULL pointer dereference in armv8pmu_sched_task() is easily
triggered with:
perf record -b -e cycles -a -- ls
which crashes on the first context switch with:
Unable to handle kernel NULL pointer dereference at virtual address 00[.]
PC is at armv8pmu_sched_task+0x14/0x50
LR is at perf_pmu_sched_task+0xac/0x108
Call trace:
armv8pmu_sched_task+0x14/0x50 (P)
perf_pmu_sched_task+0xac/0x108
__perf_event_task_sched_out+0x6c/0xe0
prepare_task_switch+0x120/0x268
__schedule+0x1e8/0x828
...
perf_pmu_sched_task() invokes the PMU sched callback with cpc->task_epc,
which is NULL when no per-task events exist for this PMU. With CPU-wide
branch-stack events, armv8pmu_sched_task() is still registered and
dereferences pmu_ctx->pmu unconditionally, causing the crash.
The bug was introduced by commit fa9d27773873 ("perf: arm_pmu: Kill last
use of per-CPU cpu_armpmu pointer") which changed the function from
using the per-CPU cpu_armpmu pointer (always valid) to dereferencing
pmu_ctx->pmu without adding a NULL check.
Add a NULL check for pmu_ctx to avoid the crash.
Fixes: fa9d27773873 ("perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
drivers/perf/arm_pmuv3.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 8014ff766cff..2d097fad9c10 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1074,8 +1074,15 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx,
struct task_struct *task, bool sched_in)
{
- struct arm_pmu *armpmu = to_arm_pmu(pmu_ctx->pmu);
- struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
+ struct arm_pmu *armpmu;
+ struct pmu_hw_events *hw_events;
+
+ /* cpc->task_epc is NULL when no per-task events exist for this PMU */
+ if (!pmu_ctx)
+ return;
+
+ armpmu = to_arm_pmu(pmu_ctx->pmu);
+ hw_events = this_cpu_ptr(armpmu->hw_events);
if (!hw_events->branch_users)
return;
--
2.52.0
* [PATCH bpf 2/3] perf/arm64: Add BRBE support for bpf_get_branch_snapshot()
2026-03-13 18:03 [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 1/3] perf/arm_pmuv3: Fix NULL pointer dereference in armv8pmu_sched_task() Puranjay Mohan
@ 2026-03-13 18:03 ` Puranjay Mohan
2026-03-13 19:59 ` Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 3/3] selftests/bpf: Adjust wasted entries threshold for ARM64 BRBE Puranjay Mohan
2026-03-18 13:36 ` [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
3 siblings, 1 reply; 7+ messages in thread
From: Puranjay Mohan @ 2026-03-13 18:03 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Will Deacon,
Mark Rutland, Catalin Marinas, Leo Yan, Rob Herring, Breno Leitao,
linux-arm-kernel, linux-perf-users, kernel-team
Implement the perf_snapshot_branch_stack static call for ARM's Branch
Record Buffer Extension (BRBE), enabling the bpf_get_branch_snapshot()
BPF helper on ARM64.
This is a best-effort snapshot helper intended for tracing and debugging
use. It favors non-invasive snapshotting over strong serialization, and
returns 0 whenever a clean snapshot cannot be obtained. Nested
invocations are not serialized; callers may observe a 0-length result
when a clean snapshot cannot be preserved.
BRBE is paused before the helper does any other work to avoid recording
its own branches. The sysreg writes used to pause are branchless.
local_daif_save() blocks local exception delivery while reading the
buffer. If a PMU overflow raced before that point and re-enabled BRBE,
the helper detects the cleared PAUSED state and returns 0.
Branch records are read using perf_entry_from_brbe_regset() without
event-specific filtering. The BPF program is responsible for applying
its own filter criteria. The BRBE buffer is invalidated after reading
to maintain contiguity for other consumers.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
drivers/perf/arm_brbe.c | 70 ++++++++++++++++++++++++++++++++++++++--
drivers/perf/arm_brbe.h | 9 ++++++
drivers/perf/arm_pmuv3.c | 5 ++-
3 files changed, 81 insertions(+), 3 deletions(-)
diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
index ba554e0c846c..db5e000b2575 100644
--- a/drivers/perf/arm_brbe.c
+++ b/drivers/perf/arm_brbe.c
@@ -9,6 +9,7 @@
#include <linux/types.h>
#include <linux/bitmap.h>
#include <linux/perf/arm_pmu.h>
+#include <asm/daifflags.h>
#include "arm_brbe.h"
#define BRBFCR_EL1_BRANCH_FILTERS (BRBFCR_EL1_DIRECT | \
@@ -618,10 +619,10 @@ static bool perf_entry_from_brbe_regset(int index, struct perf_branch_entry *ent
brbe_set_perf_entry_type(entry, brbinf);
- if (!branch_sample_no_cycles(event))
+ if (!event || !branch_sample_no_cycles(event))
entry->cycles = brbinf_get_cycles(brbinf);
- if (!branch_sample_no_flags(event)) {
+ if (!event || !branch_sample_no_flags(event)) {
/* Mispredict info is available for source only and complete branch records. */
if (!brbe_record_is_target_only(brbinf)) {
entry->mispred = brbinf_get_mispredict(brbinf);
@@ -803,3 +804,68 @@ void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
done:
branch_stack->nr = nr_filtered;
}
+
+/*
+ * Best-effort BRBE snapshot for BPF tracing. Pause BRBE to avoid
+ * self-recording and return 0 if the snapshot state appears disturbed.
+ */
+int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int cnt)
+{
+ unsigned long flags;
+ int nr_hw, nr_banks, nr_copied = 0;
+ u64 brbidr, brbfcr, brbcr;
+
+ if (!cnt)
+ return 0;
+
+ /* Pause BRBE first to avoid recording our own branches. */
+ brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+ brbcr = read_sysreg_s(SYS_BRBCR_EL1);
+ write_sysreg_s(brbfcr | BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
+ isb();
+
+ /* Block local exception delivery while reading the buffer. */
+ flags = local_daif_save();
+
+ /*
+ * A PMU overflow before local_daif_save() could have re-enabled
+ * BRBE, clearing the PAUSED bit. Bail out.
+ */
+ if (!(read_sysreg_s(SYS_BRBFCR_EL1) & BRBFCR_EL1_PAUSED))
+ goto out;
+
+ brbidr = read_sysreg_s(SYS_BRBIDR0_EL1);
+ if (!valid_brbidr(brbidr))
+ goto out;
+
+ nr_hw = FIELD_GET(BRBIDR0_EL1_NUMREC_MASK, brbidr);
+ nr_banks = DIV_ROUND_UP(nr_hw, BRBE_BANK_MAX_ENTRIES);
+
+ for (int bank = 0; bank < nr_banks; bank++) {
+ int nr_remaining = nr_hw - (bank * BRBE_BANK_MAX_ENTRIES);
+ int nr_this_bank = min(nr_remaining, BRBE_BANK_MAX_ENTRIES);
+
+ select_brbe_bank(bank);
+
+ for (int i = 0; i < nr_this_bank; i++) {
+ if (nr_copied >= cnt)
+ goto done;
+
+ if (!perf_entry_from_brbe_regset(i, &entries[nr_copied], NULL))
+ goto done;
+
+ nr_copied++;
+ }
+ }
+
+done:
+ brbe_invalidate();
+out:
+ /* Restore BRBCR before unpausing via BRBFCR, matching brbe_enable(). */
+ write_sysreg_s(brbcr, SYS_BRBCR_EL1);
+ isb();
+ write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+ local_daif_restore(flags);
+
+ return nr_copied;
+}
diff --git a/drivers/perf/arm_brbe.h b/drivers/perf/arm_brbe.h
index b7c7d8796c86..c2a1824437fb 100644
--- a/drivers/perf/arm_brbe.h
+++ b/drivers/perf/arm_brbe.h
@@ -10,6 +10,7 @@
struct arm_pmu;
struct perf_branch_stack;
struct perf_event;
+struct perf_branch_entry;
#ifdef CONFIG_ARM64_BRBE
void brbe_probe(struct arm_pmu *arm_pmu);
@@ -22,6 +23,8 @@ void brbe_disable(void);
bool brbe_branch_attr_valid(struct perf_event *event);
void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
const struct perf_event *event);
+int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries,
+ unsigned int cnt);
#else
static inline void brbe_probe(struct arm_pmu *arm_pmu) { }
static inline unsigned int brbe_num_branch_records(const struct arm_pmu *armpmu)
@@ -44,4 +47,10 @@ static void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
const struct perf_event *event)
{
}
+
+static inline int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries,
+ unsigned int cnt)
+{
+ return 0;
+}
#endif
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 2d097fad9c10..e00c7c47a98d 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -1456,8 +1456,11 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
cpu_pmu->set_event_filter = armv8pmu_set_event_filter;
cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
- if (brbe_num_branch_records(cpu_pmu))
+ if (brbe_num_branch_records(cpu_pmu)) {
cpu_pmu->pmu.sched_task = armv8pmu_sched_task;
+ static_call_update(perf_snapshot_branch_stack,
+ arm_brbe_snapshot_branch_stack);
+ }
cpu_pmu->name = name;
cpu_pmu->map_event = map_event;
--
2.52.0
* [PATCH bpf 3/3] selftests/bpf: Adjust wasted entries threshold for ARM64 BRBE
2026-03-13 18:03 [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 1/3] perf/arm_pmuv3: Fix NULL pointer dereference in armv8pmu_sched_task() Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 2/3] perf/arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
@ 2026-03-13 18:03 ` Puranjay Mohan
2026-03-18 13:36 ` [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
3 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2026-03-13 18:03 UTC (permalink / raw)
To: bpf
Cc: Puranjay Mohan, Puranjay Mohan, Alexei Starovoitov,
Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau,
Eduard Zingerman, Kumar Kartikeya Dwivedi, Will Deacon,
Mark Rutland, Catalin Marinas, Leo Yan, Rob Herring, Breno Leitao,
linux-arm-kernel, linux-perf-users, kernel-team
The get_branch_snapshot test checks that bpf_get_branch_snapshot()
doesn't waste too many branch entries on infrastructure overhead. The
threshold of < 10 was calibrated for x86 where about 7 entries are
wasted.
On ARM64, the BPF trampoline generates more branches than x86,
resulting in about 13 wasted entries. The overhead comes from
__bpf_prog_exit_recur, which on ARM64 makes out-of-line calls to
__rcu_read_unlock and generates more conditional branches than on x86:
[#24] dump_bpf_prog+0x118d0 -> __bpf_prog_exit_recur+0x0
[#23] __bpf_prog_exit_recur+0x78 -> __bpf_prog_exit_recur+0xf4
[#22] __bpf_prog_exit_recur+0xf8 -> __bpf_prog_exit_recur+0x80
[#21] __bpf_prog_exit_recur+0x80 -> __rcu_read_unlock+0x0
[#20] __rcu_read_unlock+0x24 -> __bpf_prog_exit_recur+0x84
[#19] __bpf_prog_exit_recur+0xe0 -> __bpf_prog_exit_recur+0x11c
[#18] __bpf_prog_exit_recur+0x120 -> __bpf_prog_exit_recur+0xe8
[#17] __bpf_prog_exit_recur+0xf0 -> dump_bpf_prog+0x118d4
Increase the threshold to < 16 to accommodate ARM64.
The test passes after the change:
[root@(none) bpf]# ./test_progs -t get_branch_snapshot
#136 get_branch_snapshot:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
---
.../selftests/bpf/prog_tests/get_branch_snapshot.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c b/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
index 0394a1156d99..dcb0ba3d6285 100644
--- a/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
+++ b/tools/testing/selftests/bpf/prog_tests/get_branch_snapshot.c
@@ -116,13 +116,14 @@ void serial_test_get_branch_snapshot(void)
ASSERT_GT(skel->bss->test1_hits, 6, "find_looptest_in_lbr");
- /* Given we stop LBR in software, we will waste a few entries.
+ /* Given we stop LBR/BRBE in software, we will waste a few entries.
* But we should try to waste as few as possible entries. We are at
- * about 7 on x86_64 systems.
- * Add a check for < 10 so that we get heads-up when something
+ * about 7 on x86_64 and about 13 on arm64 systems (the arm64 BPF
+ * trampoline generates more branches than x86_64).
+ * Add a check for < 16 so that we get heads-up when something
* changes and wastes too many entries.
*/
- ASSERT_LT(skel->bss->wasted_entries, 10, "check_wasted_entries");
+ ASSERT_LT(skel->bss->wasted_entries, 16, "check_wasted_entries");
cleanup:
get_branch_snapshot__destroy(skel);
--
2.52.0
* Re: [PATCH bpf 2/3] perf/arm64: Add BRBE support for bpf_get_branch_snapshot()
2026-03-13 18:03 ` [PATCH bpf 2/3] perf/arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
@ 2026-03-13 19:59 ` Puranjay Mohan
2026-03-13 21:03 ` Puranjay Mohan
0 siblings, 1 reply; 7+ messages in thread
From: Puranjay Mohan @ 2026-03-13 19:59 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Will Deacon, Mark Rutland, Catalin Marinas, Leo Yan, Rob Herring,
Breno Leitao, linux-arm-kernel, linux-perf-users, kernel-team
On Fri, Mar 13, 2026 at 6:04 PM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> Implement the perf_snapshot_branch_stack static call for ARM's Branch
> Record Buffer Extension (BRBE), enabling the bpf_get_branch_snapshot()
> BPF helper on ARM64.
>
> This is a best-effort snapshot helper intended for tracing and debugging
> use. It favors non-invasive snapshotting over strong serialization, and
> returns 0 whenever a clean snapshot cannot be obtained. Nested
> invocations are not serialized; callers may observe a 0-length result
> when a clean snapshot cannot be preserved.
>
> BRBE is paused before the helper does any other work to avoid recording
> its own branches. The sysreg writes used to pause are branchless.
> local_daif_save() blocks local exception delivery while reading the
> buffer. If a PMU overflow raced before that point and re-enabled BRBE,
> the helper detects the cleared PAUSED state and returns 0.
>
> Branch records are read using perf_entry_from_brbe_regset() without
> event-specific filtering. The BPF program is responsible for applying
> its own filter criteria. The BRBE buffer is invalidated after reading
> to maintain contiguity for other consumers.
>
> Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> ---
> drivers/perf/arm_brbe.c | 70 ++++++++++++++++++++++++++++++++++++++--
> drivers/perf/arm_brbe.h | 9 ++++++
> drivers/perf/arm_pmuv3.c | 5 ++-
> 3 files changed, 81 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
> index ba554e0c846c..db5e000b2575 100644
> --- a/drivers/perf/arm_brbe.c
> +++ b/drivers/perf/arm_brbe.c
> @@ -9,6 +9,7 @@
> #include <linux/types.h>
> #include <linux/bitmap.h>
> #include <linux/perf/arm_pmu.h>
> +#include <asm/daifflags.h>
> #include "arm_brbe.h"
>
> #define BRBFCR_EL1_BRANCH_FILTERS (BRBFCR_EL1_DIRECT | \
> @@ -618,10 +619,10 @@ static bool perf_entry_from_brbe_regset(int index, struct perf_branch_entry *ent
>
> brbe_set_perf_entry_type(entry, brbinf);
>
> - if (!branch_sample_no_cycles(event))
> + if (!event || !branch_sample_no_cycles(event))
> entry->cycles = brbinf_get_cycles(brbinf);
>
> - if (!branch_sample_no_flags(event)) {
> + if (!event || !branch_sample_no_flags(event)) {
> /* Mispredict info is available for source only and complete branch records. */
> if (!brbe_record_is_target_only(brbinf)) {
> entry->mispred = brbinf_get_mispredict(brbinf);
> @@ -803,3 +804,68 @@ void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
> done:
> branch_stack->nr = nr_filtered;
> }
> +
> +/*
> + * Best-effort BRBE snapshot for BPF tracing. Pause BRBE to avoid
> + * self-recording and return 0 if the snapshot state appears disturbed.
> + */
> +int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int cnt)
> +{
> + unsigned long flags;
> + int nr_hw, nr_banks, nr_copied = 0;
> + u64 brbidr, brbfcr, brbcr;
> +
> + if (!cnt)
> + return 0;
> +
> + /* Pause BRBE first to avoid recording our own branches. */
> + brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
> + brbcr = read_sysreg_s(SYS_BRBCR_EL1);
> + write_sysreg_s(brbfcr | BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
> + isb();
> +
> + /* Block local exception delivery while reading the buffer. */
> + flags = local_daif_save();
> +
> + /*
> + * A PMU overflow before local_daif_save() could have re-enabled
> + * BRBE, clearing the PAUSED bit. Bail out.
> + */
> + if (!(read_sysreg_s(SYS_BRBFCR_EL1) & BRBFCR_EL1_PAUSED))
> + goto out;
> +
The code below doesn't implement filtering. I am currently trying to
figure out the best way to implement that by reusing
read_branch_records() somehow.
> + brbidr = read_sysreg_s(SYS_BRBIDR0_EL1);
> + if (!valid_brbidr(brbidr))
> + goto out;
> +
> + nr_hw = FIELD_GET(BRBIDR0_EL1_NUMREC_MASK, brbidr);
> + nr_banks = DIV_ROUND_UP(nr_hw, BRBE_BANK_MAX_ENTRIES);
> +
> + for (int bank = 0; bank < nr_banks; bank++) {
> + int nr_remaining = nr_hw - (bank * BRBE_BANK_MAX_ENTRIES);
> + int nr_this_bank = min(nr_remaining, BRBE_BANK_MAX_ENTRIES);
> +
> + select_brbe_bank(bank);
> +
> + for (int i = 0; i < nr_this_bank; i++) {
> + if (nr_copied >= cnt)
> + goto done;
> +
> + if (!perf_entry_from_brbe_regset(i, &entries[nr_copied], NULL))
> + goto done;
> +
> + nr_copied++;
> + }
> + }
> +
> +done:
> + brbe_invalidate();
> +out:
> + /* Restore BRBCR before unpausing via BRBFCR, matching brbe_enable(). */
> + write_sysreg_s(brbcr, SYS_BRBCR_EL1);
> + isb();
> + write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
> + local_daif_restore(flags);
> +
> + return nr_copied;
> +}
> diff --git a/drivers/perf/arm_brbe.h b/drivers/perf/arm_brbe.h
> index b7c7d8796c86..c2a1824437fb 100644
> --- a/drivers/perf/arm_brbe.h
> +++ b/drivers/perf/arm_brbe.h
> @@ -10,6 +10,7 @@
> struct arm_pmu;
> struct perf_branch_stack;
> struct perf_event;
> +struct perf_branch_entry;
>
> #ifdef CONFIG_ARM64_BRBE
> void brbe_probe(struct arm_pmu *arm_pmu);
> @@ -22,6 +23,8 @@ void brbe_disable(void);
> bool brbe_branch_attr_valid(struct perf_event *event);
> void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
> const struct perf_event *event);
> +int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries,
> + unsigned int cnt);
> #else
> static inline void brbe_probe(struct arm_pmu *arm_pmu) { }
> static inline unsigned int brbe_num_branch_records(const struct arm_pmu *armpmu)
> @@ -44,4 +47,10 @@ static void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
> const struct perf_event *event)
> {
> }
> +
> +static inline int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries,
> + unsigned int cnt)
> +{
> + return 0;
> +}
> #endif
> diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
> index 2d097fad9c10..e00c7c47a98d 100644
> --- a/drivers/perf/arm_pmuv3.c
> +++ b/drivers/perf/arm_pmuv3.c
> @@ -1456,8 +1456,11 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
> cpu_pmu->set_event_filter = armv8pmu_set_event_filter;
>
> cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
> - if (brbe_num_branch_records(cpu_pmu))
> + if (brbe_num_branch_records(cpu_pmu)) {
> cpu_pmu->pmu.sched_task = armv8pmu_sched_task;
> + static_call_update(perf_snapshot_branch_stack,
> + arm_brbe_snapshot_branch_stack);
> + }
>
> cpu_pmu->name = name;
> cpu_pmu->map_event = map_event;
> --
> 2.52.0
>
* Re: [PATCH bpf 2/3] perf/arm64: Add BRBE support for bpf_get_branch_snapshot()
2026-03-13 19:59 ` Puranjay Mohan
@ 2026-03-13 21:03 ` Puranjay Mohan
0 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2026-03-13 21:03 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Will Deacon, Mark Rutland, Catalin Marinas, Leo Yan, Rob Herring,
Breno Leitao, linux-arm-kernel, linux-perf-users, kernel-team
On Fri, Mar 13, 2026 at 7:59 PM Puranjay Mohan <puranjay12@gmail.com> wrote:
>
> On Fri, Mar 13, 2026 at 6:04 PM Puranjay Mohan <puranjay@kernel.org> wrote:
> >
> > Implement the perf_snapshot_branch_stack static call for ARM's Branch
> > Record Buffer Extension (BRBE), enabling the bpf_get_branch_snapshot()
> > BPF helper on ARM64.
> >
> > This is a best-effort snapshot helper intended for tracing and debugging
> > use. It favors non-invasive snapshotting over strong serialization, and
> > returns 0 whenever a clean snapshot cannot be obtained. Nested
> > invocations are not serialized; callers may observe a 0-length result
> > when a clean snapshot cannot be preserved.
> >
> > BRBE is paused before the helper does any other work to avoid recording
> > its own branches. The sysreg writes used to pause are branchless.
> > local_daif_save() blocks local exception delivery while reading the
> > buffer. If a PMU overflow raced before that point and re-enabled BRBE,
> > the helper detects the cleared PAUSED state and returns 0.
> >
> > Branch records are read using perf_entry_from_brbe_regset() without
> > event-specific filtering. The BPF program is responsible for applying
> > its own filter criteria. The BRBE buffer is invalidated after reading
> > to maintain contiguity for other consumers.
> >
> > Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
> > ---
> > drivers/perf/arm_brbe.c | 70 ++++++++++++++++++++++++++++++++++++++--
> > drivers/perf/arm_brbe.h | 9 ++++++
> > drivers/perf/arm_pmuv3.c | 5 ++-
> > 3 files changed, 81 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
> > index ba554e0c846c..db5e000b2575 100644
> > --- a/drivers/perf/arm_brbe.c
> > +++ b/drivers/perf/arm_brbe.c
> > @@ -9,6 +9,7 @@
> > #include <linux/types.h>
> > #include <linux/bitmap.h>
> > #include <linux/perf/arm_pmu.h>
> > +#include <asm/daifflags.h>
> > #include "arm_brbe.h"
> >
> > #define BRBFCR_EL1_BRANCH_FILTERS (BRBFCR_EL1_DIRECT | \
> > @@ -618,10 +619,10 @@ static bool perf_entry_from_brbe_regset(int index, struct perf_branch_entry *ent
> >
> > brbe_set_perf_entry_type(entry, brbinf);
> >
> > - if (!branch_sample_no_cycles(event))
> > + if (!event || !branch_sample_no_cycles(event))
> > entry->cycles = brbinf_get_cycles(brbinf);
> >
> > - if (!branch_sample_no_flags(event)) {
> > + if (!event || !branch_sample_no_flags(event)) {
> > /* Mispredict info is available for source only and complete branch records. */
> > if (!brbe_record_is_target_only(brbinf)) {
> > entry->mispred = brbinf_get_mispredict(brbinf);
> > @@ -803,3 +804,68 @@ void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
> > done:
> > branch_stack->nr = nr_filtered;
> > }
> > +
> > +/*
> > + * Best-effort BRBE snapshot for BPF tracing. Pause BRBE to avoid
> > + * self-recording and return 0 if the snapshot state appears disturbed.
> > + */
> > +int arm_brbe_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int cnt)
> > +{
> > + unsigned long flags;
> > + int nr_hw, nr_banks, nr_copied = 0;
> > + u64 brbidr, brbfcr, brbcr;
> > +
> > + if (!cnt)
> > + return 0;
> > +
> > + /* Pause BRBE first to avoid recording our own branches. */
> > + brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
> > + brbcr = read_sysreg_s(SYS_BRBCR_EL1);
> > + write_sysreg_s(brbfcr | BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
> > + isb();
> > +
> > + /* Block local exception delivery while reading the buffer. */
> > + flags = local_daif_save();
> > +
> > + /*
> > + * A PMU overflow before local_daif_save() could have re-enabled
> > + * BRBE, clearing the PAUSED bit. Bail out.
> > + */
> > + if (!(read_sysreg_s(SYS_BRBFCR_EL1) & BRBFCR_EL1_PAUSED))
> > + goto out;
> > +
>
> The code below doesn't implement filtering, I am currently trying to
> figure out the best way to implement that by reusing
> read_branch_records() somehow.
>
So, I thought about this more and feel that we don't need to filter
for a specific event, because the BPF program is not associated with a
specific event; rather, it is associated with the CPU where it is
running. So bpf_get_branch_snapshot() should return the branch records
from the PMU of that CPU. Now, if there are two events on that CPU
with different branch filter types, let's say
PERF_SAMPLE_BRANCH_IND_CALL in one event and
PERF_SAMPLE_BRANCH_ANY_RETURN in another, the perf subsystem
configures BRBE to record the union and then does per-event filtering
in software (brbe_read_filtered_entries()). The BPF program, however,
should still see everything that was recorded on the CPU, which is
what this patch does.
* Re: [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot()
2026-03-13 18:03 [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
` (2 preceding siblings ...)
2026-03-13 18:03 ` [PATCH bpf 3/3] selftests/bpf: Adjust wasted entries threshold for ARM64 BRBE Puranjay Mohan
@ 2026-03-18 13:36 ` Puranjay Mohan
3 siblings, 0 replies; 7+ messages in thread
From: Puranjay Mohan @ 2026-03-18 13:36 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
Martin KaFai Lau, Eduard Zingerman, Kumar Kartikeya Dwivedi,
Will Deacon, Mark Rutland, Catalin Marinas, Leo Yan, Rob Herring,
Breno Leitao, linux-arm-kernel, linux-perf-users, kernel-team
I will send v2 of this set with minor changes that I discovered are
needed. I will remove the bpf tag and send it based on the arm64 tree,
as that is the correct place for this set to go.
On Fri, Mar 13, 2026 at 6:04 PM Puranjay Mohan <puranjay@kernel.org> wrote:
>
> RFC: https://lore.kernel.org/all/20260102214043.1410242-1-puranjay@kernel.org/
> Changes from RFC:
> - Fix pre-existing NULL pointer dereference in armv8pmu_sched_task()
> found by Leo Yan during testing (patch 1)
> - Pause BRBE before local_daif_save() to avoid branch pollution from
> trace_hardirqs_off()
> - Use local_daif_save() to prevent pNMI race from counter overflow
> (Mark Rutland)
> - Reuse perf_entry_from_brbe_regset() instead of duplicating register
> read logic, by making it accept NULL event (Mark Rutland)
> - Invalidate BRBE after reading to maintain record contiguity for
> other consumers (Mark Rutland)
> - Adjust selftest wasted_entries threshold for ARM64 (patch 3)
> - Tested on ARM FVP with BRBE enabled
>
> This series enables the bpf_get_branch_snapshot() BPF helper on ARM64
> by implementing the perf_snapshot_branch_stack static call for ARM's
> Branch Record Buffer Extension (BRBE).
>
> bpf_get_branch_snapshot() [1] allows BPF programs to capture hardware
> branch records on-demand from any BPF tracing context. It has been
> available only on x86 (Intel LBR) since v5.16. With BRBE available
> on ARMv9, this series closes the gap for ARM64.
>
> Usage model
> -----------
>
> The helper works in conjunction with perf events. The userspace
> component of the BPF application opens a perf event with
> PERF_SAMPLE_BRANCH_STACK on each CPU, which configures the hardware
> to continuously record branches into BRBE (on ARM64) or LBR (on x86).
> A BPF program attached to a tracepoint, kprobe, or fentry hook can
> then call bpf_get_branch_snapshot() to snapshot the branch buffer at
> any point. Without an active perf event, BRBE is not recording and
> the buffer is empty.
>
> On-demand branch snapshots from BPF are useful for diagnosing which
> specific code path was taken inside a function. Stack traces only show
> function boundaries, but branch records reveal the exact sequence of
> jumps, calls, and returns within a function -- making it possible to
> identify which specific error check triggered a failure, or which
> callback implementation was invoked through a function pointer.
>
> For example, retsnoop [2] is a BPF-based tool for non-intrusive
> mass-tracing of kernel internals. Its LBR mode (--lbr) creates per-CPU
> perf events with PERF_SAMPLE_BRANCH_STACK and then uses
> bpf_get_branch_snapshot() in its fentry/fexit BPF programs to capture
> branch records whenever a traced function returns an error.
>
> Consider debugging a bpf() syscall that returns -EINVAL when creating
> a BPF map with invalid parameters. Running retsnoop on an ARM64 FVP
> with BRBE to trace the bpf() syscall and array_map_alloc_check():
>
> $ retsnoop -e '*sys_bpf' -a 'array_map_alloc_check' --lbr=any \
> -F -k vmlinux --debug full-lbr
> $ simfail bpf-bad-map-max-entries-array # in another terminal
>
> Output of retsnoop:
>
> --- fentry BPF program (entries #63-#17) ---
>
> [#63-#59] __htab_map_lookup_elem: hash table walk with memcmp (hashtab.c)
> [#58] __htab_map_lookup_elem+0x98 -> dump_bpf_prog+0xc850 (hashtab.c:750)
> [#57-#55] ... dump_bpf_prog internal branches ...
> [#54] dump_bpf_prog+0xcab8 -> bpf_get_current_pid_tgid+0x0 (helpers.c:225)
> [#53] bpf_get_current_pid_tgid+0x1c -> dump_bpf_prog+0xcabc (helpers.c:225)
> [#52-#51] ... dump_bpf_prog -> __htab_map_lookup_elem ...
> [#50-#47] __htab_map_lookup_elem: htab_map_hash (jhash2), select_bucket
> [#46-#42] lookup_nulls_elem_raw: hash chain walk with memcmp (hashtab.c:717)
> [#41] __htab_map_lookup_elem+0x98 -> dump_bpf_prog+0xcaf8 (hashtab.c:750)
> [#40-#37] ... dump_bpf_prog -> bpf_ktime_get_ns ...
> [#36] bpf_ktime_get_ns+0x10 -> ktime_get_mono_fast_ns+0x0 (helpers.c:178)
> [#35-#32] ktime_get_mono_fast_ns: tk_clock_read -> arch_counter_get_cntpct
> [#31] ktime_get_mono_fast_ns+0x9c -> bpf_ktime_get_ns+0x14 (timekeeping.c:493)
> [#30] bpf_ktime_get_ns+0x18 -> dump_bpf_prog+0xcd50 (helpers.c:178)
> [#29-#25] ... dump_bpf_prog internal branches ...
> [#24] dump_bpf_prog+0x11b28 -> __bpf_prog_exit_recur+0x0 (trampoline.c:1190)
> [#23-#17] __bpf_prog_exit_recur: rcu_read_unlock, migrate_enable (trampoline.c:1195)
>
> --- array_map_alloc_check (entries #16-#12) ---
>
> [#16] dump_bpf_prog+0x11b38 -> array_map_alloc_check+0x8 (arraymap.c:55)
> [#15] array_map_alloc_check+0x18 -> array_map_alloc_check+0xb8 (arraymap.c:56)
> . bpf_map_attr_numa_node . bpf_map_attr_numa_node
> [#14] array_map_alloc_check+0xbc -> array_map_alloc_check+0x20 (arraymap.c:59)
> . bpf_map_attr_numa_node
> [#13] array_map_alloc_check+0x24 -> array_map_alloc_check+0x94 (arraymap.c:64)
> [#12] array_map_alloc_check+0x98 -> dump_bpf_prog+0x11b3c (arraymap.c:82)
>
> --- fexit trampoline overhead (entries #11-#00) ---
>
> [#11] dump_bpf_prog+0x11b5c -> __bpf_prog_enter_recur+0x0 (trampoline.c:1145)
> [#10-#03] __bpf_prog_enter_recur: rcu_read_lock, migrate_disable (trampoline.c:1146)
> [#02] __bpf_prog_enter_recur+0x114 -> dump_bpf_prog+0x11b60 (trampoline.c:1157)
> [#01] dump_bpf_prog+0x11b6c -> dump_bpf_prog+0xd230
> [#00] dump_bpf_prog+0xd340 -> arm_brbe_snapshot_branch_stack+0x0 (arm_brbe.c:814)
>
> el0t_64_sync+0x168
> el0t_64_sync_handler+0x98
> el0_svc+0x28
> do_el0_svc+0x4c
> invoke_syscall.constprop.0+0x54
> 373us [-EINVAL] __arm64_sys_bpf+0x8
> __sys_bpf+0x87c
> map_create+0x120
> 95us [-EINVAL] array_map_alloc_check+0x8
>
> The FVP's BRBE buffer has 64 entries (BRBE supports 8, 16, 32, or
> 64). Of these, entries #63-#17 (47) are consumed by the fentry BPF
> trampoline that ran before the function, and entries #11-#00 (12)
> are consumed by the fexit trampoline that runs after. Entry #00
> shows the very last branch recorded before BRBE is paused: the call
> into arm_brbe_snapshot_branch_stack().
>
> The 5 useful entries (#16-#12) show the exact path taken inside
> array_map_alloc_check(). Record #14 shows a jump from line 56
> (bpf_map_attr_numa_node) to line 59 (the if-condition), and #13
> shows an immediate jump from line 59 (attr->max_entries == 0) to
> line 64 (return -EINVAL), skipping lines 60-63. This pinpoints
> max_entries==0 as the cause -- a diagnosis impossible with stack
> traces alone.
>
> [1] 856c02dbce4f ("bpf: Introduce helper bpf_get_branch_snapshot")
> [2] https://github.com/anakryiko/retsnoop
>
> Puranjay Mohan (3):
> perf/arm_pmuv3: Fix NULL pointer dereference in armv8pmu_sched_task()
> perf/arm64: Add BRBE support for bpf_get_branch_snapshot()
> selftests/bpf: Adjust wasted entries threshold for ARM64 BRBE
>
> drivers/perf/arm_brbe.c | 70 ++++++++++++++++++-
> drivers/perf/arm_brbe.h | 9 +++
> drivers/perf/arm_pmuv3.c | 16 ++++-
> .../bpf/prog_tests/get_branch_snapshot.c | 9 +--
> 4 files changed, 95 insertions(+), 9 deletions(-)
>
>
> base-commit: ca0f39a369c5f927c3d004e63a5a778b08a9df94
> --
> 2.52.0
>
Thread overview: 7+ messages
2026-03-13 18:03 [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 1/3] perf/arm_pmuv3: Fix NULL pointer dereference in armv8pmu_sched_task() Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 2/3] perf/arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan
2026-03-13 19:59 ` Puranjay Mohan
2026-03-13 21:03 ` Puranjay Mohan
2026-03-13 18:03 ` [PATCH bpf 3/3] selftests/bpf: Adjust wasted entries threshold for ARM64 BRBE Puranjay Mohan
2026-03-18 13:36 ` [PATCH bpf 0/3] arm64: Add BRBE support for bpf_get_branch_snapshot() Puranjay Mohan