Linux Perf Users
* [PATCH 0/3] arm64: perf: Skip device memory during user callchain unwinding
@ 2026-04-28 20:48 Fredrik Markstrom
  2026-04-28 20:48 ` [PATCH 1/3] " Fredrik Markstrom
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Fredrik Markstrom @ 2026-04-28 20:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Santosh Shilimkar, Olof Johansson, Tony Lindgren
  Cc: linux-arm-kernel, linux-kernel, linux-kselftest, linux-perf-users,
	Nicolas Pitre, Fredrik Markstrom, Ivar Holmqvist, Malin Jonsson

Perf callchain unwinding follows userspace frame pointers via
__copy_from_user_inatomic(). A corrupted or malicious frame pointer can point
into device I/O memory mapped into the process (e.g. via UIO or
/dev/mem), causing the kernel to read from MMIO regions in PMU
interrupt context. Such reads can have side effects on hardware
(clearing status registers, advancing FIFOs, triggering DMA) and
on arm64 can produce a synchronous external abort that panics the
kernel.

This series adds a guard that detects device memory before each
frame pointer read and skips the frame.

Patch 1: Lockless page table walk checking the MAIR attribute index
          in the leaf PTE to identify device memory types
          (MT_DEVICE_nGnRnE, MT_DEVICE_nGnRE). Follows the same
          pattern as perf_get_pgtable_size() in kernel/events/core.c.

Patch 2: (DO NOT MERGE) Module parameter to disable the guard at
          runtime for regression testing.

Patch 3: (DO NOT MERGE) kselftest that exercises the attack vector:
          maps /dev/mem, points FP into it, and verifies the kernel
          survives perf sampling.

Alternatives considered:

 - VMA lookup (mmap_read_trylock + vma_lookup checking VM_IO):
   requires the mmap lock on every frame.
 - RCU maple tree lookup: lock-free but still a tree traversal
   per frame.
 - lock_vma_under_rcu: sleeping lock, unusable from IRQ context.

The page table walk takes no locks (interrupts are disabled for its
duration) and costs about four page-table entry reads per frame.
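
To make the per-frame cost concrete, here is a user-space sketch of how a
virtual address splits into one table index per level. It assumes a 4 KB
granule and 48-bit virtual addresses (four levels, nine index bits each);
other arm64 configurations differ. The names va_level_index, LEVEL_BITS and
NUM_LEVELS are illustrative and do not come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12 /* assumption: 4 KB granule */
#define LEVEL_BITS     9 /* 512 entries per table with a 4 KB granule */
#define NUM_LEVELS     4 /* assumption: 48-bit VA, levels 0..3 */

/*
 * Index into the table at `level` for virtual address `va`
 * (0 = top-level table, 3 = table holding the leaf PTE).
 * Each level costs one table-entry read during a walk.
 */
static unsigned int va_level_index(uint64_t va, int level)
{
	assert(level >= 0 && level < NUM_LEVELS);

	int shift = PAGE_SHIFT_4K + LEVEL_BITS * (NUM_LEVELS - 1 - level);

	return (unsigned int)((va >> shift) & ((1u << LEVEL_BITS) - 1));
}
```

Calling va_level_index() once for each of levels 0 to 3 yields the four
indices a walk would use, one entry read per level.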

Limitations:

 - The MAIR attribute check is arm64-specific. Other architectures
   use different mechanisms to identify device memory and would need
   their own PTE inspection logic.
 - The walk only detects memory types visible in the PTE. If a VM_IO
   region has not been faulted in, the walk sees no PTE and fails
   safe (skips the frame). This is conservative: it may skip frames
   that would not actually fault.

A QEMU-based reproducer is available at:
https://gitlab.com/frma71/qemu-kernel-tests/-/tree/vmio_perf_test?ref_type=tags

Signed-off-by: Fredrik Markstrom <fredrik.markstrom@est.tech>
---
Fredrik Markstrom (3):
      arm64: perf: Skip device memory during user callchain unwinding
      DO NOT MERGE: arm64: perf: Add skip_vmio parameter to control device memory callchain guard
      DO NOT MERGE: selftests: perf_events: Add device memory callchain unwinding test

 MAINTAINERS                                        |   1 +
 arch/arm64/kernel/stacktrace.c                     | 103 +++++++++++++++++++
 tools/testing/selftests/perf_events/Makefile       |   2 +-
 .../testing/selftests/perf_events/test_perf_vmio.c | 114 +++++++++++++++++++++
 4 files changed, 219 insertions(+), 1 deletion(-)
---
base-commit: dca922e019dd758b4c1b4bec8f1d509efddeaab4
change-id: 20260427-master-with-pfix-v3-ae7173f538ca

Best regards,
-- 
Fredrik Markstrom <fredrik.markstrom@est.tech>



* [PATCH 1/3] arm64: perf: Skip device memory during user callchain unwinding
  2026-04-28 20:48 [PATCH 0/3] arm64: perf: Skip device memory during user callchain unwinding Fredrik Markstrom
@ 2026-04-28 20:48 ` Fredrik Markstrom
  2026-04-29  2:24   ` sashiko-bot
  2026-04-28 20:48 ` [PATCH 2/3] DO NOT MERGE: arm64: perf: Add skip_vmio parameter to control device memory callchain guard Fredrik Markstrom
  2026-04-28 20:49 ` [PATCH 3/3] DO NOT MERGE: selftests: perf_events: Add device memory callchain unwinding test Fredrik Markstrom
  2 siblings, 1 reply; 7+ messages in thread
From: Fredrik Markstrom @ 2026-04-28 20:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Santosh Shilimkar, Olof Johansson, Tony Lindgren
  Cc: linux-arm-kernel, linux-kernel, linux-kselftest, linux-perf-users,
	Nicolas Pitre, Fredrik Markstrom, Ivar Holmqvist, Malin Jonsson

Perf callchain unwinding follows userspace frame pointers via
__copy_from_user_inatomic(). A corrupted frame pointer can point into device
I/O memory mapped into the process (e.g. via UIO), causing the
kernel to read from MMIO regions. Reads from device memory can
have side effects, trigger bus errors, or produce faults that
crash the kernel.

Add a lockless page table walk that inspects the MAIR attribute
index in the leaf PTE before reading. If the PTE indicates
device memory (MT_DEVICE_nGnRnE or MT_DEVICE_nGnRE), the frame
is skipped. The walk uses the same lockless accessors as
perf_get_pgtable_size() with local_irq_save/restore to ensure
page table pages are not freed during the walk, as
arch_stack_walk_user() can also be reached from process
context via ftrace (stack_trace_save_user).
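
The attribute check on the leaf entry can be sketched in user space as
follows. This is only an illustration: the shift, mask and MT_* values are
assumptions modelled on the arm64 headers and should be checked against the
tree being built, and pte_val_is_device is a hypothetical name rather than a
function in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modelled on the arm64 kernel headers; verify in your tree. */
#define PTE_ATTRINDX_SHIFT 2
#define PTE_ATTRINDX_MASK  (UINT64_C(7) << PTE_ATTRINDX_SHIFT)
#define MT_DEVICE_nGnRnE   3
#define MT_DEVICE_nGnRE    4

/* True if the AttrIndx field of a raw leaf entry selects a device type. */
static bool pte_val_is_device(uint64_t pteval)
{
	unsigned int idx = (pteval & PTE_ATTRINDX_MASK) >> PTE_ATTRINDX_SHIFT;

	return idx == MT_DEVICE_nGnRnE || idx == MT_DEVICE_nGnRE;
}

/* Example: AttrIndx 4 is treated as device memory, AttrIndx 0 is not. */
static void attrindx_examples(void)
{
	assert(pte_val_is_device(UINT64_C(4) << PTE_ATTRINDX_SHIFT));
	assert(!pte_val_is_device(0));
}
```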

The walk is guarded by #ifdef CONFIG_HAVE_GUP_FAST to match
perf_get_pgtable_size(), though the lockless helpers all have
generic fallbacks and the guard may not be strictly necessary.

Without this guard the kernel panics:

    Internal error: synchronous external abort: 0000000096000010 [#1]  SMP
    CPU: 1 UID: 0 PID: 33 Comm: test_perf_vmio Tainted: G   M                7.0.0+ #37 PREEMPTLAZY
    Tainted: [M]=MACHINE_CHECK
    Hardware name: linux,dummy-virt (DT)
    pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : __arch_copy_from_user+0xb8/0x23c
    lr : arch_stack_walk_user+0x218/0x258
    sp : ffff800080433ba0
    x29: ffff800080433ba0 x28: ffff00000097ed40 x27: 0000000000000000
    x26: 000000000000001f x25: ffffffffffffffff x24: ffff00000097ed40
    x23: 000ffffffffffff0 x22: ffff800080433c78 x21: ffff800080022db8
    x20: ffff80008032bc60 x19: 0000ffff9e575000 x18: 0000000000000000
    x17: ffff7fffbfac5000 x16: ffff800080430000 x15: 0000ffff9e575000
    x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
    x8 : 0000000000000001 x7 : 000000000000007f x6 : ffff800080433c10
    x5 : ffff800080433c20 x4 : 0000000000000000 x3 : 0000000000000010
    x2 : 0000000000000010 x1 : 0000ffff9e575000 x0 : ffff800080433c10
    Call trace:
     __arch_copy_from_user+0xb8/0x23c (P)
     perf_callchain_user+0x1c/0x24
     get_perf_callchain+0x130/0x138
     perf_callchain+0xac/0xc4
     perf_prepare_sample+0xac/0x5d8
     perf_event_output_forward+0x44/0xa0
     __perf_event_overflow+0x190/0x230
     perf_event_overflow+0x18/0x20
     armv8pmu_handle_irq+0x154/0x194
     armpmu_dispatch_irq+0x28/0x54
     handle_percpu_devid_irq+0xf0/0x11c
     handle_irq_desc+0x3c/0x50
     generic_handle_domain_irq+0x14/0x1c
     gic_handle_irq+0x80/0x98
     call_on_irq_stack+0x30/0x4c
     do_interrupt_handler+0x5c/0x84
     el0_interrupt+0x58/0x8c
     __el0_irq_handler_common+0x14/0x1c
     el0t_64_irq_handler+0xc/0x14
     el0t_64_irq+0x154/0x158
    Code: f8400827 f8408828 91004021 a88120c7 (f8400827)
    ---[ end trace 0000000000000000 ]---
    Kernel panic - not syncing: synchronous external abort: Fatal exception in interrupt
    SMP: stopping secondary CPUs
    Kernel Offset: disabled
    CPU features: 0x0000000,000d0000,00040000,0400400b
    Memory Limit: none

Assisted-by: Kiro:claude-opus-4.6 [kiro-cli]
Fixes: 030896885ade ("arm64: Performance counters support")
Signed-off-by: Fredrik Markstrom <fredrik.markstrom@est.tech>
Reviewed-by: Ivar Holmqvist <ivar.holmqvist@est.tech>
Reviewed-by: Malin Jonsson <malin.jonsson@est.tech>
---
 arch/arm64/kernel/stacktrace.c | 98 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 3ebcf8c53fb04050488ffc110ff2059028b6772d..6426a307b8f86ae756ea444247ae329591a89b4b 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -4,12 +4,14 @@
  *
  * Copyright (C) 2012 ARM Ltd.
  */
+#include <linux/bitfield.h>
 #include <linux/kernel.h>
 #include <linux/efi.h>
 #include <linux/export.h>
 #include <linux/filter.h>
 #include <linux/ftrace.h>
 #include <linux/kprobes.h>
+#include <linux/pgtable.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
@@ -17,9 +19,99 @@
 
 #include <asm/efi.h>
 #include <asm/irq.h>
+#include <asm/memory.h>
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+/*
+ * addr_is_device_mem - check if a userspace address maps device memory
+ *
+ * Walks the current task's page tables without taking the mmap lock,
+ * using the same lockless pattern as perf_get_pgtable_size() in
+ * kernel/events/core.c.  Inspects the MAIR attribute index in the
+ * leaf PTE to detect device memory types.
+ *
+ * Returns true for device memory (MT_DEVICE_nGnRnE, MT_DEVICE_nGnRE)
+ * or if the mapping cannot be determined.  Safe to call from IRQ/NMI
+ * context.
+ */
+static bool addr_is_device_mem(unsigned long addr)
+{
+#ifdef CONFIG_HAVE_GUP_FAST
+	struct mm_struct *mm = current->mm;
+	pgd_t *pgdp, pgd;
+	p4d_t *p4dp, p4d;
+	pud_t *pudp, pud;
+	pmd_t *pmdp, pmd;
+	pte_t *ptep, pte;
+	unsigned long flags;
+	unsigned int idx;
+	bool is_dev;
+
+	if (!mm)
+		return true;
+
+	local_irq_save(flags);
+
+	pgdp = pgd_offset(mm, addr);
+	pgd = pgdp_get(pgdp);
+	if (pgd_none(pgd))
+		goto err;
+
+	p4dp = p4d_offset_lockless(pgdp, pgd, addr);
+	p4d = p4dp_get(p4dp);
+	if (!p4d_present(p4d))
+		goto err;
+
+	pudp = pud_offset_lockless(p4dp, p4d, addr);
+	pud = pudp_get(pudp);
+	if (!pud_present(pud))
+		goto err;
+
+	if (pud_leaf(pud)) {
+		pte = pud_pte(pud);
+		goto check;
+	}
+
+	pmdp = pmd_offset_lockless(pudp, pud, addr);
+again:
+	pmd = pmdp_get_lockless(pmdp);
+	if (!pmd_present(pmd))
+		goto err;
+
+	if (pmd_leaf(pmd)) {
+		pte = pmd_pte(pmd);
+		goto check;
+	}
+
+	ptep = pte_offset_map(&pmd, addr);
+	if (!ptep)
+		goto again;
+
+	pte = ptep_get_lockless(ptep);
+	pte_unmap(ptep);
+
+	if (!pte_present(pte))
+		goto err;
+check:
+	idx = FIELD_GET(PTE_ATTRINDX_MASK, pte_val(pte));
+	is_dev = idx == MT_DEVICE_nGnRnE || idx == MT_DEVICE_nGnRE;
+	local_irq_restore(flags);
+	return is_dev;
+err:
+	local_irq_restore(flags);
+	return true;
+#else
+	/*
+	 * Without GUP-fast lockless page table helpers we cannot
+	 * inspect the PTE.  Preserve the existing behavior (no
+	 * device memory check) rather than unconditionally blocking
+	 * all unwinding.
+	 */
+	return false;
+#endif
+}
+
 enum kunwind_source {
 	KUNWIND_SOURCE_UNKNOWN,
 	KUNWIND_SOURCE_FRAME,
@@ -524,6 +616,9 @@ unwind_user_frame(struct frame_tail __user *tail, void *cookie,
 	if (!access_ok(tail, sizeof(buftail)))
 		return NULL;
 
+	if (addr_is_device_mem((unsigned long)tail))
+		return NULL;
+
 	pagefault_disable();
 	err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));
 	pagefault_enable();
@@ -572,6 +667,9 @@ unwind_compat_user_frame(struct compat_frame_tail __user *tail, void *cookie,
 	if (!access_ok(tail, sizeof(buftail)))
 		return NULL;
 
+	if (addr_is_device_mem((unsigned long)tail))
+		return NULL;
+
 	pagefault_disable();
 	err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));
 	pagefault_enable();

-- 
2.51.0



* [PATCH 2/3] DO NOT MERGE: arm64: perf: Add skip_vmio parameter to control device memory callchain guard
  2026-04-28 20:48 [PATCH 0/3] arm64: perf: Skip device memory during user callchain unwinding Fredrik Markstrom
  2026-04-28 20:48 ` [PATCH 1/3] " Fredrik Markstrom
@ 2026-04-28 20:48 ` Fredrik Markstrom
  2026-04-29  2:38   ` sashiko-bot
  2026-04-28 20:49 ` [PATCH 3/3] DO NOT MERGE: selftests: perf_events: Add device memory callchain unwinding test Fredrik Markstrom
  2 siblings, 1 reply; 7+ messages in thread
From: Fredrik Markstrom @ 2026-04-28 20:48 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Santosh Shilimkar, Olof Johansson, Tony Lindgren
  Cc: linux-arm-kernel, linux-kernel, linux-kselftest, linux-perf-users,
	Nicolas Pitre, Fredrik Markstrom, Ivar Holmqvist, Malin Jonsson

Reproducing the synchronous external abort that the device memory
guard prevents requires disabling the guard at runtime. Without
this, there is no way to verify the guard is actually needed or
to regression-test the crash path.

Add a module parameter (skip_vmio, default true) that controls
whether the guard is active. Set to 0 to disable it:
  Boot:    stacktrace.skip_vmio=0
  Runtime: echo 0 > /sys/module/stacktrace/parameters/skip_vmio

When disabled, perf follows frame pointers into device memory
regions, triggering a synchronous external abort and kernel panic
on arm64.

Assisted-by: Kiro:claude-opus-4.6 [kiro-cli]
Signed-off-by: Fredrik Markstrom <fredrik.markstrom@est.tech>
Reviewed-by: Ivar Holmqvist <ivar.holmqvist@est.tech>
Reviewed-by: Malin Jonsson <malin.jonsson@est.tech>
---
 arch/arm64/kernel/stacktrace.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 6426a307b8f86ae756ea444247ae329591a89b4b..ebe909012e4edee7bb1ddbebac2d3c49bdb91665 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -11,6 +11,7 @@
 #include <linux/filter.h>
 #include <linux/ftrace.h>
 #include <linux/kprobes.h>
+#include <linux/moduleparam.h>
 #include <linux/pgtable.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
@@ -112,6 +113,10 @@ static bool addr_is_device_mem(unsigned long addr)
 #endif
 }
 
+static bool skip_vmio = true;
+module_param(skip_vmio, bool, 0644);
+MODULE_PARM_DESC(skip_vmio, "Skip device memory during user callchain unwinding");
+
 enum kunwind_source {
 	KUNWIND_SOURCE_UNKNOWN,
 	KUNWIND_SOURCE_FRAME,
@@ -616,7 +621,7 @@ unwind_user_frame(struct frame_tail __user *tail, void *cookie,
 	if (!access_ok(tail, sizeof(buftail)))
 		return NULL;
 
-	if (addr_is_device_mem((unsigned long)tail))
+	if (READ_ONCE(skip_vmio) && addr_is_device_mem((unsigned long)tail))
 		return NULL;
 
 	pagefault_disable();
@@ -667,7 +672,7 @@ unwind_compat_user_frame(struct compat_frame_tail __user *tail, void *cookie,
 	if (!access_ok(tail, sizeof(buftail)))
 		return NULL;
 
-	if (addr_is_device_mem((unsigned long)tail))
+	if (READ_ONCE(skip_vmio) && addr_is_device_mem((unsigned long)tail))
 		return NULL;
 
 	pagefault_disable();

-- 
2.51.0



* [PATCH 3/3] DO NOT MERGE: selftests: perf_events: Add device memory callchain unwinding test
  2026-04-28 20:48 [PATCH 0/3] arm64: perf: Skip device memory during user callchain unwinding Fredrik Markstrom
  2026-04-28 20:48 ` [PATCH 1/3] " Fredrik Markstrom
  2026-04-28 20:48 ` [PATCH 2/3] DO NOT MERGE: arm64: perf: Add skip_vmio parameter to control device memory callchain guard Fredrik Markstrom
@ 2026-04-28 20:49 ` Fredrik Markstrom
  2026-04-29  2:53   ` sashiko-bot
  2 siblings, 1 reply; 7+ messages in thread
From: Fredrik Markstrom @ 2026-04-28 20:49 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Shuah Khan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	James Clark, Santosh Shilimkar, Olof Johansson, Tony Lindgren
  Cc: linux-arm-kernel, linux-kernel, linux-kselftest, linux-perf-users,
	Nicolas Pitre, Fredrik Markstrom, Ivar Holmqvist, Malin Jonsson

The device memory callchain guard introduced earlier in this series
needs a regression test to ensure future refactoring does not
silently reintroduce the vulnerability where perf follows frame
pointers into device memory.

Add a kselftest that exercises the exact attack vector: a process
mmaps /dev/mem (creating a device memory mapping), points its frame
pointer into it, and is sampled by perf with frame-pointer
callchains. The test passes if both the process and kernel survive.

The default MMIO address is 0xc0000000; override via the MMIO_ADDR
environment variable if that is unsuitable. The address must be a
physical address that is not backed by any responding device, so
that an access produces a synchronous external abort rather than
returning data. On QEMU's virt machine and many modern arm64
platforms, 0xc0000000 falls in an unused region of the MMIO
address space and works for this purpose.

Since the test only sets the frame pointer to the address (never
reads it directly), the only reads come from the perf unwinder —
which the guard blocks.

The /dev/mem mmap must happen in the child process after fork.
fork() does not copy PTEs for VM_PFNMAP regions, so mapping before
fork leaves the child with empty page tables — the unwinder gets a
translation fault (caught by extable) instead of a synchronous
external abort.

arm64-only; skipped on other architectures.

Assisted-by: Kiro:claude-opus-4.6 [kiro-cli]
Signed-off-by: Fredrik Markstrom <fredrik.markstrom@est.tech>
Reviewed-by: Ivar Holmqvist <ivar.holmqvist@est.tech>
Reviewed-by: Malin Jonsson <malin.jonsson@est.tech>
---
 MAINTAINERS                                        |   1 +
 tools/testing/selftests/perf_events/Makefile       |   2 +-
 .../testing/selftests/perf_events/test_perf_vmio.c | 114 +++++++++++++++++++++
 3 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2fb1c75afd16388f590a77c04e08d2d6d002f5cc..5416f80c4aac28a5f1d780c76bb23110283dcdc3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20881,6 +20881,7 @@ F:	include/uapi/linux/perf_event.h
 F:	kernel/events/*
 F:	tools/lib/perf/
 F:	tools/perf/
+F:	tools/testing/selftests/perf_events/
 
 PERFORMANCE EVENTS TOOLING ARM64
 R:	John Garry <john.g.garry@oracle.com>
diff --git a/tools/testing/selftests/perf_events/Makefile b/tools/testing/selftests/perf_events/Makefile
index 2e5d85770dfeadd909196dbf980fd334b9580477..a432e24d9e493f77951092571989249703d22351 100644
--- a/tools/testing/selftests/perf_events/Makefile
+++ b/tools/testing/selftests/perf_events/Makefile
@@ -2,5 +2,5 @@
 CFLAGS += -Wl,-no-as-needed -Wall $(KHDR_INCLUDES)
 LDFLAGS += -lpthread
 
-TEST_GEN_PROGS := sigtrap_threads remove_on_exec watermark_signal mmap
+TEST_GEN_PROGS := sigtrap_threads remove_on_exec watermark_signal mmap test_perf_vmio
 include ../lib.mk
diff --git a/tools/testing/selftests/perf_events/test_perf_vmio.c b/tools/testing/selftests/perf_events/test_perf_vmio.c
new file mode 100644
index 0000000000000000000000000000000000000000..780c5800dd6bd3b7a9d3813b490d4621da876da3
--- /dev/null
+++ b/tools/testing/selftests/perf_events/test_perf_vmio.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Device memory perf callchain unwinding test (arm64 only).
+ *
+ * Maps a physical address via /dev/mem (creating a device memory mapping),
+ * launches perf record to sample this process with frame-pointer
+ * callchains, then points FP (x29) into the mapping and spins.
+ * The test passes if the kernel survives without crashing.
+ *
+ * The default MMIO address is 0xc0000000; override via environment:
+ *   MMIO_ADDR=0x10000000 ./test_perf_vmio
+ */
+#include <fcntl.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "kselftest_harness.h"
+
+#define DEFAULT_MMIO_ADDR 0xc0000000UL
+
+TEST(device_memory_callchain)
+{
+#ifndef __aarch64__
+	SKIP(return, "arm64 only");
+#else
+	unsigned long pa = DEFAULT_MMIO_ADDR;
+	unsigned long page, off;
+	pid_t spin_pid, perf_pid;
+	char pid_str[16];
+	int fd, pst;
+	void *m, *fp;
+	char *env;
+
+	if (getuid() != 0)
+		SKIP(return, "need root");
+
+	env = getenv("MMIO_ADDR");
+	if (env)
+		pa = strtoul(env, NULL, 16);
+
+	page = pa & ~0xFFFUL;
+	off = pa - page;
+
+	fd = open("/dev/mem", O_RDWR | O_SYNC);
+	if (fd < 0)
+		SKIP(return, "cannot open /dev/mem");
+
+	/* Fork a spinner child with FP pointing into device memory */
+	spin_pid = fork();
+	ASSERT_GE(spin_pid, 0);
+	if (spin_pid == 0) {
+		/*
+		 * mmap /dev/mem in the child so remap_pfn_range populates
+		 * PTEs directly. fork() does not copy PTEs for VM_PFNMAP
+		 * regions, so mapping before fork leaves the child with
+		 * empty page tables — the unwinder would get a translation
+		 * fault instead of a synchronous external abort.
+		 */
+		m = mmap(NULL, off + 4096, PROT_READ | PROT_WRITE,
+			 MAP_SHARED, fd, page);
+		if (m == MAP_FAILED)
+			_exit(1);
+		fp = (char *)m + off;
+		__asm__ volatile(
+			"mov x29, %0\n"
+			"1: b 1b\n"
+			: : "r"(fp) : "x29", "memory");
+		_exit(0);
+	}
+
+	/* Launch perf to sample the spinner */
+	snprintf(pid_str, sizeof(pid_str), "%d", spin_pid);
+
+	perf_pid = fork();
+	if (perf_pid < 0) {
+		kill(spin_pid, SIGKILL);
+		waitpid(spin_pid, NULL, 0);
+		close(fd);
+		ASSERT_GE(perf_pid, 0);
+	}
+	if (perf_pid == 0) {
+		char *const perf_argv[] = {
+			"perf", "record", "-g", "--call-graph", "fp",
+			"-p", pid_str, "--", "sleep", "3", NULL
+		};
+
+		if (chdir("/tmp"))
+			_exit(1);
+		execvp(perf_argv[0], perf_argv);
+		_exit(1);
+	}
+
+	waitpid(perf_pid, &pst, 0);
+
+	kill(spin_pid, SIGKILL);
+	waitpid(spin_pid, NULL, 0);
+	close(fd);
+
+	if (WIFEXITED(pst) && WEXITSTATUS(pst) == 1)
+		SKIP(return, "perf not available");
+
+	/*
+	 * The real test is that the kernel survived. If we got here
+	 * without a synchronous external abort, the guard worked.
+	 */
+	TH_LOG("kernel survived perf sampling with FP in device memory");
+#endif /* __aarch64__ */
+}
+
+TEST_HARNESS_MAIN

-- 
2.51.0



* Re: [PATCH 1/3] arm64: perf: Skip device memory during user callchain unwinding
  2026-04-28 20:48 ` [PATCH 1/3] " Fredrik Markstrom
@ 2026-04-29  2:24   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-04-29  2:24 UTC (permalink / raw)
  To: Fredrik Markstrom; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] The device memory check only validates the starting address of the user frame, allowing a cross-page boundary bypass that still panics the kernel.
--

commit e9c0174fdde1a5c6f32621d5e911c732eed46d59
Author: Fredrik Markstrom <fredrik.markstrom@est.tech>

arm64: perf: Skip device memory during user callchain unwinding

This commit adds a lockless page table walk to inspect the MAIR attribute
index before reading a user frame pointer. This aims to prevent reading
from device memory regions during stack unwinding, which can trigger bus
errors and panic the kernel.

> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 3ebcf8c53fb0..6426a307b8f8 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
[ ... ]
> @@ -524,6 +616,9 @@ unwind_user_frame(struct frame_tail __user *tail, void *cookie,
>  	if (!access_ok(tail, sizeof(buftail)))
>  		return NULL;
>  
> +	if (addr_is_device_mem((unsigned long)tail))
> +		return NULL;
> +

Could this allow a read across a page boundary into device memory?

If tail is 8-byte aligned and resides 8 bytes before the end of a normal
memory page, addr_is_device_mem() will only check the first page and
return false.

Because struct frame_tail is 16 bytes, the subsequent copy will cross
the page boundary:

>  	pagefault_disable();
>  	err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));

If the adjacent page is mapped as device memory, wouldn't this still
trigger the Synchronous External Abort the patch is trying to fix?
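
As a rough illustration of the condition described above, the following
user-space sketch tests whether a read of a given size touches more than one
page. The helper name range_spans_pages is hypothetical and the page size is
passed in as a parameter; a fix in the patch would presumably also run the
device memory check on the last byte of the frame whenever this returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the byte range [addr, addr + size) touches more than one page. */
static bool range_spans_pages(uint64_t addr, size_t size, uint64_t page_size)
{
	/* page_size must be a non-zero power of two, size must be non-zero. */
	assert(size > 0 && page_size > 0 && (page_size & (page_size - 1)) == 0);

	uint64_t first_page = addr & ~(page_size - 1);
	uint64_t last_page = (addr + size - 1) & ~(page_size - 1);

	return first_page != last_page;
}
```

With 4 KB pages, a 16-byte frame starting 8 bytes before the end of a page
spans two pages, while the same frame starting 16 bytes before the end does
not.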

[ ... ]
> @@ -572,6 +667,9 @@ unwind_compat_user_frame(struct compat_frame_tail __user *tail, void *cookie,
>  	if (!access_ok(tail, sizeof(buftail)))
>  		return NULL;
>  
> +	if (addr_is_device_mem((unsigned long)tail))
> +		return NULL;
> +

Does this have the same page boundary issue?

Since struct compat_frame_tail is 8 bytes, an address aligned to 4 bytes at
the end of the page could similarly result in reading 4 bytes from an
adjacent device memory page.

>  	pagefault_disable();
>  	err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260428-master-with-pfix-v3-v1-0-c384d3e53092@est.tech?part=1


* Re: [PATCH 2/3] DO NOT MERGE: arm64: perf: Add skip_vmio parameter to control device memory callchain guard
  2026-04-28 20:48 ` [PATCH 2/3] DO NOT MERGE: arm64: perf: Add skip_vmio parameter to control device memory callchain guard Fredrik Markstrom
@ 2026-04-29  2:38   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-04-29  2:38 UTC (permalink / raw)
  To: Fredrik Markstrom; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] The patch embeds a test-only backdoor into core architectural code, exposing a system-wide Denial of Service (DoS) vulnerability.
- [High] The device memory guard is vulnerable to a Time-of-Check to Time-of-Use (TOCTOU) race condition that can lead to an unprivileged Denial of Service.
- [Medium] The `skip_vmio` parameter uses `module_param()` instead of `module_param_unsafe()`, failing to taint the kernel when system integrity is compromised.
--

commit 7350d20819d5cd55953c6253c2ca394a596ece60
Author: Fredrik Markstrom <fredrik.markstrom@est.tech>

DO NOT MERGE: arm64: perf: Add skip_vmio parameter to control device memory callchain guard

This commit introduces a skip_vmio module parameter to bypass the device
memory guard during user callchain unwinding. It allows administrators to
disable the protection dynamically for testing external abort paths.

> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 6426a307b8f8..ebe909012e4e 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c

[ ... ]

> @@ -112,6 +113,10 @@ static bool addr_is_device_mem(unsigned long addr)
>  #endif
>  }
>  
> +static bool skip_vmio = true;
> +module_param(skip_vmio, bool, 0644);
> +MODULE_PARM_DESC(skip_vmio, "Skip device memory during user callchain unwinding");

Does introducing this parameter expose the system to an unprivileged denial
of service? 

If an administrator disables this guard for testing, it globally removes the
device memory protection. An unprivileged user profiling their process could
then point their frame pointer into a mapped device memory region, triggering
a synchronous external abort and kernel panic. 

Test mechanisms that deliberately enable hardware panics are typically
restricted to isolated testing modules or conditionally compiled out of
production kernels.

Also, since disabling this guard deliberately introduces a known crash
vector, should this use module_param_unsafe() instead of module_param()?

Using the unsafe variant would taint the kernel, ensuring that subsequent
crash dumps clearly indicate the kernel was running in a compromised state
rather than experiencing a genuine hardware failure.

>  enum kunwind_source {
>  	KUNWIND_SOURCE_UNKNOWN,

[ ... ]

> @@ -616,7 +621,7 @@ unwind_user_frame(struct frame_tail __user *tail, void *cookie,
>  	if (!access_ok(tail, sizeof(buftail)))
>  		return NULL;
>  
> -	if (addr_is_device_mem((unsigned long)tail))
> +	if (READ_ONCE(skip_vmio) && addr_is_device_mem((unsigned long)tail))
>  		return NULL;
>  
>  	pagefault_disable();

This is an existing issue with the guard being modified here, but is there a
time-of-check to time-of-use race condition between the lockless check and
the memory read?

Because addr_is_device_mem() performs a lockless page table walk, a
concurrent thread could potentially remap the target virtual address to
device memory in the window before the memory is accessed:

Thread 1                               Thread 2
addr_is_device_mem(tail)
// returns false
                                       mmap(tail, ..., MAP_SHARED, fd, 0)
                                       // remaps to device memory
pagefault_disable()
__copy_from_user_inatomic(...)
// triggers synchronous external abort

Could this allow an attacker to bypass the guard and force an illegal device
memory read?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260428-master-with-pfix-v3-v1-0-c384d3e53092@est.tech?part=2


* Re: [PATCH 3/3] DO NOT MERGE: selftests: perf_events: Add device memory callchain unwinding test
  2026-04-28 20:49 ` [PATCH 3/3] DO NOT MERGE: selftests: perf_events: Add device memory callchain unwinding test Fredrik Markstrom
@ 2026-04-29  2:53   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-04-29  2:53 UTC (permalink / raw)
  To: Fredrik Markstrom; +Cc: linux-perf-users

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] Insecure temporary file usage in /tmp creates a privilege escalation or file overwrite vector
- [Medium] Test silently reports a false positive on initialization or perf execution failures
- [Medium] Hardcoded 4KB page alignment breaks on arm64 systems with 16KB or 64KB pages
- [Low] Uninitialized wait status variable `pst` can lead to undefined behavior
- [Low] File descriptor for `/dev/mem` is leaked to the `perf` child process
--

commit 72f5293ddcb8999284919045e0f80f532368b0f1
Author: Fredrik Markstrom <fredrik.markstrom@est.tech>

DO NOT MERGE: selftests: perf_events: Add device memory callchain unwinding test

This commit adds a regression test for device memory callchain unwinding by
mapping an MMIO address via /dev/mem and sampling the process with perf.

> diff --git a/tools/testing/selftests/perf_events/test_perf_vmio.c b/tools/testing/selftests/perf_events/test_perf_vmio.c
> new file mode 100644
> index 000000000000..780c5800dd6b
> --- /dev/null
> +++ b/tools/testing/selftests/perf_events/test_perf_vmio.c

[ ... ]

> +	env = getenv("MMIO_ADDR");
> +	if (env)
> +		pa = strtoul(env, NULL, 16);
> +
> +	page = pa & ~0xFFFUL;
> +	off = pa - page;

Is it possible for this to break on arm64 systems configured with 16KB or
64KB pages? The bitwise mask assumes a 4KB page size. Could this be changed
to use sysconf(_SC_PAGESIZE) instead?
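
A minimal sketch of what that could look like, with the page size supplied at
run time rather than hardcoded. The struct pa_split type and both helper names
are hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <unistd.h>

/* Page-aligned base and in-page offset of a physical address. */
struct pa_split {
	unsigned long page;
	unsigned long off;
};

/* Split `pa` for a given page size, which must be a power of two. */
static struct pa_split split_phys_addr(unsigned long pa, unsigned long page_size)
{
	assert(page_size > 0 && (page_size & (page_size - 1)) == 0);

	struct pa_split s = {
		.page = pa & ~(page_size - 1),
		.off = pa & (page_size - 1),
	};

	return s;
}

/* Page size reported by the running system; falls back to 4096 on error. */
static unsigned long runtime_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	return sz > 0 ? (unsigned long)sz : 4096UL;
}
```

The test would then call split_phys_addr(pa, runtime_page_size()) and use the
same page size for the mmap() length.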

> +
> +	fd = open("/dev/mem", O_RDWR | O_SYNC);

Does this leak the file descriptor to the perf child process? It is opened
without O_CLOEXEC, so the fd remains open across the execvp() call for perf.
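
One possible shape for this, shown on /dev/null so that it runs without
privileges. The helper names open_cloexec and fd_is_cloexec are hypothetical;
the selftest would pass "/dev/mem" and its own flags instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open `path` so that the descriptor is closed automatically on exec. */
static int open_cloexec(const char *path, int flags)
{
	return open(path, flags | O_CLOEXEC);
}

/* True if `fd` is valid and has the close-on-exec flag set. */
static bool fd_is_cloexec(int fd)
{
	int fl = fcntl(fd, F_GETFD);

	return fl >= 0 && (fl & FD_CLOEXEC) != 0;
}

/* Example on /dev/null; the descriptor would not survive an execvp(). */
static void cloexec_example(void)
{
	int fd = open_cloexec("/dev/null", O_RDONLY);

	assert(fd >= 0 && fd_is_cloexec(fd));
	close(fd);
}
```

Note that close-on-exec does not affect fork(): the spinner child still
inherits the descriptor and can mmap() it.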

> +	if (fd < 0)
> +		SKIP(return, "cannot open /dev/mem");
> +
> +	/* Fork a spinner child with FP pointing into device memory */
> +	spin_pid = fork();
> +	ASSERT_GE(spin_pid, 0);
> +	if (spin_pid == 0) {
> +		/*
> +		 * mmap /dev/mem in the child so remap_pfn_range populates
> +		 * PTEs directly. fork() does not copy PTEs for VM_PFNMAP
> +		 * regions, so mapping before fork leaves the child with
> +		 * empty page tables — the unwinder would get a translation
> +		 * fault instead of a synchronous external abort.
> +		 */
> +		m = mmap(NULL, off + 4096, PROT_READ | PROT_WRITE,
> +			 MAP_SHARED, fd, page);
> +		if (m == MAP_FAILED)
> +			_exit(1);

If the child process fails to mmap /dev/mem here (for instance, due to
CONFIG_STRICT_DEVMEM or an unaligned offset), it immediately exits. 

Because the parent does not verify the child's status before running perf,
perf will attempt to attach to a dead process and fail with a status other
than 1. The test then incorrectly passes. Can we verify the child process
is alive before running perf, or check its exit status?
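
A non-blocking liveness check could be sketched as below. All three helper
names are hypothetical, and the idle child only stands in for the spinner.
This check alone does not close the race: the spinner may still be on its way
to a failing mmap() when the parent looks, so a real test would likely also
need explicit synchronization such as a pipe.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that just sleeps; stands in for the selftest's spinner. */
static pid_t spawn_idle_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();
	}
	return pid;
}

/* 1 = still running, 0 = exited and now reaped, -1 = error (not our child). */
static int child_is_alive(pid_t pid)
{
	int status = 0;
	pid_t r = waitpid(pid, &status, WNOHANG);

	if (r == 0)
		return 1;
	return r == pid ? 0 : -1;
}

/* Kill and reap the child; pid must be a real child PID, never 0 or -1. */
static void stop_child(pid_t pid)
{
	assert(pid > 0);
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
}
```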

[ ... ]

> +	if (perf_pid == 0) {
> +		char *const perf_argv[] = {
> +			"perf", "record", "-g", "--call-graph", "fp",
> +			"-p", pid_str, "--", "sleep", "3", NULL
> +		};
> +
> +		if (chdir("/tmp"))
> +			_exit(1);
> +		execvp(perf_argv[0], perf_argv);

Could this overwrite unintended files by running a root-privileged perf
record in the world-writable /tmp directory without specifying an output
file?

perf record defaults to writing to perf.data in the current directory.
An unprivileged user could pre-create a symlink at /tmp/perf.data pointing
to a sensitive file.

Could the test use mkdtemp() to create a secure directory, or pipe the
output instead?
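
For the mkdtemp() variant, a minimal sketch follows. The helper name
enter_private_tmpdir and the template string are hypothetical. Passing an
explicit output path to perf record with -o would be a complementary option.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a fresh directory (mode 0700, unique name) under /tmp and chdir
 * into it. The chosen path is written to path_out. Returns 0 on success.
 */
static int enter_private_tmpdir(char *path_out, size_t len)
{
	static const char tmpl[] = "/tmp/perf_vmio.XXXXXX";

	assert(path_out && len >= sizeof(tmpl));
	memcpy(path_out, tmpl, sizeof(tmpl));

	if (!mkdtemp(path_out))
		return -1;
	return chdir(path_out);
}
```

Because mkdtemp() fails rather than reuse an existing name, a pre-created
symlink at a predictable path cannot redirect the output. The caller remains
responsible for removing the directory afterwards.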

> +		_exit(1);
> +	}
> +
> +	waitpid(perf_pid, &pst, 0);

Could pst be left uninitialized if waitpid() fails and returns an error
(such as -1 from being interrupted by a signal)? 

This would cause the subsequent WIFEXITED(pst) and WEXITSTATUS(pst) macros
to read uninitialized memory. Can we initialize pst to 0 or check the
waitpid() return value?
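
One way to do both is a small wrapper that initializes the status, retries on
EINTR, and reports failure explicitly. The names wait_for_child and
child_exit_code are hypothetical; the second function is only a usage example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for `pid`, retrying on EINTR. Returns 0 and fills *status on success. */
static int wait_for_child(pid_t pid, int *status)
{
	pid_t r;

	assert(status);
	*status = 0;
	do {
		r = waitpid(pid, status, 0);
	} while (r < 0 && errno == EINTR);

	return r == pid ? 0 : -1;
}

/* Usage example: fork a child that exits with `code`, return what we saw. */
static int child_exit_code(int code)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(code);
	if (wait_for_child(pid, &status) != 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```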

> +
> +	kill(spin_pid, SIGKILL);
> +	waitpid(spin_pid, NULL, 0);
> +	close(fd);
> +
> +	if (WIFEXITED(pst) && WEXITSTATUS(pst) == 1)
> +		SKIP(return, "perf not available");
> +
> +	/*
> +	 * The real test is that the kernel survived. If we got here
> +	 * without a synchronous external abort, the guard worked.
> +	 */
> +	TH_LOG("kernel survived perf sampling with FP in device memory");
> +#endif /* __aarch64__ */
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260428-master-with-pfix-v3-v1-0-c384d3e53092@est.tech?part=3

