From: Fredrik Markstrom <fredrik.markstrom@est.tech>
To: Catalin Marinas <catalin.marinas@arm.com>,
	 Will Deacon <will@kernel.org>, Shuah Khan <shuah@kernel.org>,
	 Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	 Arnaldo Carvalho de Melo <acme@kernel.org>,
	 Namhyung Kim <namhyung@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	 Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	 Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	 Adrian Hunter <adrian.hunter@intel.com>,
	 James Clark <james.clark@linaro.org>,
	 Santosh Shilimkar <santosh.shilimkar@ti.com>,
	 Olof Johansson <olof@lixom.net>,
	Tony Lindgren <tony@atomide.com>
Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,  linux-kselftest@vger.kernel.org,
	linux-perf-users@vger.kernel.org,
	 Nicolas Pitre <nico@fluxnic.net>,
	 Fredrik Markstrom <fredrik.markstrom@est.tech>,
	 Ivar Holmqvist <ivar.holmqvist@est.tech>,
	 Malin Jonsson <malin.jonsson@est.tech>
Subject: [PATCH 1/3] arm64: perf: Skip device memory during user callchain unwinding
Date: Tue, 28 Apr 2026 22:48:58 +0200
Message-ID: <20260428-master-with-pfix-v3-v1-1-c384d3e53092@est.tech>
In-Reply-To: <20260428-master-with-pfix-v3-v1-0-c384d3e53092@est.tech>

Perf callchain unwinding follows userspace frame pointers and reads
each frame record with __copy_from_user_inatomic(). A corrupted frame
pointer can point into device I/O memory mapped into the process
(e.g. via UIO), causing the kernel to read from an MMIO region.
Reads from device memory can have side effects, trigger bus errors,
or produce faults that crash the kernel.
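
The failure mode can be illustrated outside the kernel. The following
is only a userspace sketch, not the kernel unwinder: walk_frames(),
struct addr_range and addr_in_range() are hypothetical names, and the
blocked range merely stands in for a device mapping. It follows a
chain of frame records and stops before dereferencing a record that
lies inside the blocked range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* AArch64-style frame record: saved frame pointer, then return address. */
struct frame_record {
	const struct frame_record *fp;
	uintptr_t lr;
};

/* Hypothetical address range [start, end) standing in for a device mapping. */
struct addr_range {
	uintptr_t start;
	uintptr_t end;
};

/* Hypothetical predicate: true if addr lies inside the blocked range. */
static bool addr_in_range(uintptr_t addr, const struct addr_range *r)
{
	return addr >= r->start && addr < r->end;
}

/*
 * Follow the frame-pointer chain starting at tail and store up to max
 * return addresses in out. The walk stops at a NULL or misaligned
 * pointer, or before reading a record inside the blocked range.
 * Returns the number of return addresses stored.
 */
static size_t walk_frames(const struct frame_record *tail,
			  const struct addr_range *blocked,
			  uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (tail && n < max) {
		uintptr_t addr = (uintptr_t)tail;

		if (addr % _Alignof(struct frame_record))
			break;
		/* Check the address before the record is dereferenced. */
		if (addr_in_range(addr, blocked))
			break;

		out[n++] = tail->lr;
		tail = tail->fp;
	}

	return n;
}
```

The important detail is the ordering: the address is classified before
the record is read, so a frame pointer that leads into the blocked
range ends the walk without any load from that range.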

Add a lockless page table walk that inspects the MAIR attribute
index in the leaf entry before reading. If the entry indicates
device memory (MT_DEVICE_nGnRnE or MT_DEVICE_nGnRE), or if no
present mapping can be found, the frame is not read and
unwind_user_frame() returns NULL. The walk uses the same lockless
accessors as perf_get_pgtable_size() and runs under
local_irq_save()/local_irq_restore() so that page table pages are
not freed during the walk, as arch_stack_walk_user() can also be
reached from process context via ftrace (stack_trace_save_user()).
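
The attribute check itself is a small bitfield test. The following
standalone sketch shows the idea under stated assumptions: the
SKETCH_* constants are hypothetical stand-ins that are assumed to
mirror PTE_ATTRINDX_MASK (AttrIndx in descriptor bits [4:2]) and the
MT_* indices used by arm64 Linux, and should be checked against
asm/pgtable-hwdef.h and asm/memory.h rather than relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: AttrIndx occupies bits [4:2] of a leaf descriptor. */
#define SKETCH_ATTRINDX_SHIFT	2
#define SKETCH_ATTRINDX_MASK	(UINT64_C(7) << SKETCH_ATTRINDX_SHIFT)

/* Assumed MAIR index assignments; the values are a kernel convention. */
#define SKETCH_MT_NORMAL	0
#define SKETCH_MT_DEVICE_nGnRnE	3
#define SKETCH_MT_DEVICE_nGnRE	4

/* Extract the MAIR attribute index from a raw descriptor value. */
static unsigned int sketch_attr_index(uint64_t pteval)
{
	return (unsigned int)((pteval & SKETCH_ATTRINDX_MASK) >>
			      SKETCH_ATTRINDX_SHIFT);
}

/* True if the descriptor selects one of the two device memory types. */
static bool sketch_pte_is_device(uint64_t pteval)
{
	unsigned int idx = sketch_attr_index(pteval);

	return idx == SKETCH_MT_DEVICE_nGnRnE ||
	       idx == SKETCH_MT_DEVICE_nGnRE;
}
```

In the patch the same extraction is done with FIELD_GET() on
pte_val(pte); the sketch spells out the mask and shift only to make
the bit positions visible.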

The walk is guarded by #ifdef CONFIG_HAVE_GUP_FAST to match
perf_get_pgtable_size(), though the lockless helpers all have
generic fallbacks and the guard may not be strictly necessary.
Without CONFIG_HAVE_GUP_FAST the check is compiled out and the
existing behavior is preserved.

Without the device memory check, the kernel panics when the unwinder
reads from a device mapping:

    Internal error: synchronous external abort: 0000000096000010 [#1]  SMP
    CPU: 1 UID: 0 PID: 33 Comm: test_perf_vmio Tainted: G   M                7.0.0+ #37 PREEMPTLAZY
    Tainted: [M]=MACHINE_CHECK
    Hardware name: linux,dummy-virt (DT)
    pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : __arch_copy_from_user+0xb8/0x23c
    lr : arch_stack_walk_user+0x218/0x258
    sp : ffff800080433ba0
    x29: ffff800080433ba0 x28: ffff00000097ed40 x27: 0000000000000000
    x26: 000000000000001f x25: ffffffffffffffff x24: ffff00000097ed40
    x23: 000ffffffffffff0 x22: ffff800080433c78 x21: ffff800080022db8
    x20: ffff80008032bc60 x19: 0000ffff9e575000 x18: 0000000000000000
    x17: ffff7fffbfac5000 x16: ffff800080430000 x15: 0000ffff9e575000
    x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
    x8 : 0000000000000001 x7 : 000000000000007f x6 : ffff800080433c10
    x5 : ffff800080433c20 x4 : 0000000000000000 x3 : 0000000000000010
    x2 : 0000000000000010 x1 : 0000ffff9e575000 x0 : ffff800080433c10
    Call trace:
     __arch_copy_from_user+0xb8/0x23c (P)
     perf_callchain_user+0x1c/0x24
     get_perf_callchain+0x130/0x138
     perf_callchain+0xac/0xc4
     perf_prepare_sample+0xac/0x5d8
     perf_event_output_forward+0x44/0xa0
     __perf_event_overflow+0x190/0x230
     perf_event_overflow+0x18/0x20
     armv8pmu_handle_irq+0x154/0x194
     armpmu_dispatch_irq+0x28/0x54
     handle_percpu_devid_irq+0xf0/0x11c
     handle_irq_desc+0x3c/0x50
     generic_handle_domain_irq+0x14/0x1c
     gic_handle_irq+0x80/0x98
     call_on_irq_stack+0x30/0x4c
     do_interrupt_handler+0x5c/0x84
     el0_interrupt+0x58/0x8c
     __el0_irq_handler_common+0x14/0x1c
     el0t_64_irq_handler+0xc/0x14
     el0t_64_irq+0x154/0x158
    Code: f8400827 f8408828 91004021 a88120c7 (f8400827)
    ---[ end trace 0000000000000000 ]---
    Kernel panic - not syncing: synchronous external abort: Fatal exception in interrupt
    SMP: stopping secondary CPUs
    Kernel Offset: disabled
    CPU features: 0x0000000,000d0000,00040000,0400400b
    Memory Limit: none

Assisted-by: Kiro:claude-opus-4.6 [kiro-cli]
Fixes: 030896885ade ("arm64: Performance counters support")
Signed-off-by: Fredrik Markstrom <fredrik.markstrom@est.tech>
Reviewed-by: Ivar Holmqvist <ivar.holmqvist@est.tech>
Reviewed-by: Malin Jonsson <malin.jonsson@est.tech>
---
 arch/arm64/kernel/stacktrace.c | 98 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 3ebcf8c53fb04050488ffc110ff2059028b6772d..6426a307b8f86ae756ea444247ae329591a89b4b 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -4,12 +4,14 @@
  *
  * Copyright (C) 2012 ARM Ltd.
  */
+#include <linux/bitfield.h>
 #include <linux/kernel.h>
 #include <linux/efi.h>
 #include <linux/export.h>
 #include <linux/filter.h>
 #include <linux/ftrace.h>
 #include <linux/kprobes.h>
+#include <linux/pgtable.h>
 #include <linux/sched.h>
 #include <linux/sched/debug.h>
 #include <linux/sched/task_stack.h>
@@ -17,9 +19,99 @@
 
 #include <asm/efi.h>
 #include <asm/irq.h>
+#include <asm/memory.h>
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+/*
+ * addr_is_device_mem - check if a userspace address maps device memory
+ *
+ * Walks the current task's page tables without taking the mmap lock,
+ * using the same lockless pattern as perf_get_pgtable_size() in
+ * kernel/events/core.c.  Inspects the MAIR attribute index in the
+ * leaf PTE to detect device memory types.
+ *
+ * Returns true for device memory (MT_DEVICE_nGnRnE, MT_DEVICE_nGnRE)
+ * or if the mapping cannot be determined.  Safe to call from IRQ/NMI
+ * context.
+ */
+static bool addr_is_device_mem(unsigned long addr)
+{
+#ifdef CONFIG_HAVE_GUP_FAST
+	struct mm_struct *mm = current->mm;
+	pgd_t *pgdp, pgd;
+	p4d_t *p4dp, p4d;
+	pud_t *pudp, pud;
+	pmd_t *pmdp, pmd;
+	pte_t *ptep, pte;
+	unsigned long flags;
+	unsigned int idx;
+	bool is_dev;
+
+	if (!mm)
+		return true;
+
+	local_irq_save(flags);
+
+	pgdp = pgd_offset(mm, addr);
+	pgd = pgdp_get(pgdp);
+	if (pgd_none(pgd))
+		goto err;
+
+	p4dp = p4d_offset_lockless(pgdp, pgd, addr);
+	p4d = p4dp_get(p4dp);
+	if (!p4d_present(p4d))
+		goto err;
+
+	pudp = pud_offset_lockless(p4dp, p4d, addr);
+	pud = pudp_get(pudp);
+	if (!pud_present(pud))
+		goto err;
+
+	if (pud_leaf(pud)) {
+		pte = pud_pte(pud);
+		goto check;
+	}
+
+	pmdp = pmd_offset_lockless(pudp, pud, addr);
+again:
+	pmd = pmdp_get_lockless(pmdp);
+	if (!pmd_present(pmd))
+		goto err;
+
+	if (pmd_leaf(pmd)) {
+		pte = pmd_pte(pmd);
+		goto check;
+	}
+
+	ptep = pte_offset_map(&pmd, addr);
+	if (!ptep)
+		goto again;
+
+	pte = ptep_get_lockless(ptep);
+	pte_unmap(ptep);
+
+	if (!pte_present(pte))
+		goto err;
+check:
+	idx = FIELD_GET(PTE_ATTRINDX_MASK, pte_val(pte));
+	is_dev = idx == MT_DEVICE_nGnRnE || idx == MT_DEVICE_nGnRE;
+	local_irq_restore(flags);
+	return is_dev;
+err:
+	local_irq_restore(flags);
+	return true;
+#else
+	/*
+	 * Without GUP-fast lockless page table helpers we cannot
+	 * inspect the PTE.  Preserve the existing behavior (no
+	 * device memory check) rather than unconditionally blocking
+	 * all unwinding.
+	 */
+	return false;
+#endif
+}
+
 enum kunwind_source {
 	KUNWIND_SOURCE_UNKNOWN,
 	KUNWIND_SOURCE_FRAME,
@@ -524,6 +616,9 @@ unwind_user_frame(struct frame_tail __user *tail, void *cookie,
 	if (!access_ok(tail, sizeof(buftail)))
 		return NULL;
 
+	if (addr_is_device_mem((unsigned long)tail))
+		return NULL;
+
 	pagefault_disable();
 	err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));
 	pagefault_enable();
@@ -572,6 +667,9 @@ unwind_compat_user_frame(struct compat_frame_tail __user *tail, void *cookie,
 	if (!access_ok(tail, sizeof(buftail)))
 		return NULL;
 
+	if (addr_is_device_mem((unsigned long)tail))
+		return NULL;
+
 	pagefault_disable();
 	err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));
 	pagefault_enable();

-- 
2.51.0


