From: Alexandru Elisei <alexandru.elisei@arm.com>
To: James Clark <james.clark@linaro.org>
Cc: mark.rutland@arm.com, james.morse@arm.com, maz@kernel.org,
oliver.upton@linux.dev, joey.gouly@arm.com,
suzuki.poulose@arm.com, yuzenghui@huawei.com, will@kernel.org,
catalin.marinas@arm.com, linux-arm-kernel@lists.infradead.org,
kvmarm@lists.linux.dev
Subject: Re: [RFC PATCH v6 29/35] KVM: arm64: Pin the SPE buffer in the host and map it at stage 2
Date: Mon, 12 Jan 2026 12:01:44 +0000
Message-ID: <aWTiqNfMPQzGBmHk@raptor>
In-Reply-To: <38443801-af4e-4ce1-a1c2-603eca8d90da@linaro.org>

Hi James,

On Fri, Jan 09, 2026 at 04:29:33PM +0000, James Clark wrote:
>
>
> On 14/11/2025 4:07 pm, Alexandru Elisei wrote:
> > If the SPU encounters a translation fault when it attempts to write a
> > profiling record to memory, it stops profiling and asserts the PMBIRQ
> > interrupt. Interrupts are not delivered instantaneously to the CPU, and
> > this creates a profiling blackout window where the profiled CPU executes
> > instructions, but no samples are collected.
> >
> > This is not desirable, and the SPE driver avoids it by keeping the buffer
> > mapped for the entire profiling session.
> >
> > KVM maps memory at stage 2 when the guest accesses it, following a fault on
> > a missing stage 2 translation, which means that the problem is also present
> > in an SPE-enabled virtual machine. Worse yet, the blackout windows are
> > unpredictable: when profiling the same process, the guest might trigger no
> > stage 2 faults during one session (the entire buffer is already mapped at
> > stage 2), a stage 2 fault for every record it attempts to write during
> > another session (in the worst case, if KVM keeps removing the buffer pages
> > from stage 2), or something in between, where some records trigger a stage 2
> > fault and some don't.
> >
> > The solution is for KVM to follow what the SPE driver does: keep the buffer
> > mapped at stage 2 while ProfilingBufferEnabled() is true. To accomplish
>
> Hi Alex,
>
> The problem is that the driver enables and disables the buffer every time
> the target process is switched out unless you explicitly ask for per-CPU
> mode. Is there some kind of heuristic you can add to prevent pinning and
> unpinning unless something actually changes?
>
> Otherwise it's basically unusable with normal perf commands and larger
> buffer sizes. Take these basic examples where I've added a filter so no SPE
> data is even recorded:
>
> $ perf record -e arm_spe/min_latency=1000,event_filter=10/ -m,256M -- \
> true
>
> On a kernel with lockdep, kmemleak, etc., this takes 20s to complete. On a
> normal kernel build it still takes 4s.
>
> Much worse is anything more complicated than just 'true', which will have
> more context switching:
>
> $ perf record -e arm_spe/min_latency=1000,event_filter=10/ -m,256M -- \
> perf stat true
>
> This takes 3 minutes with kernel debugging features enabled and 50 seconds
> without them.
>
> For comparison, running these on the host takes less than half a second in
> each case. I measured each pin/unpin at about 0.2s, and the basic 'true'
> example results in 100 context switches, which adds up to the 20s.
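
The 20s total follows from multiplying the measured per-operation cost by the number of context switches, assuming, as the measurement implies, that every switch of the profiled task costs one 0.2s pin/unpin:

```latex
T \approx N_{\text{switch}} \times t_{\text{pin/unpin}} = 100 \times 0.2\,\text{s} = 20\,\text{s}
```

Under this model the overhead scales with the number of context switches rather than with any change to the buffer, which is why a heuristic that skips the pin when nothing has changed would help.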
>
> Another interesting stat is that the second example reports 'true' running
> at an average clock speed of 4 MHz:
>
> 12683357 cycles # 0.004 GHz
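
perf stat computes the GHz annotation by dividing the cycle count by the task-clock time, so the line above also gives a rough estimate of how long 'true' was on the CPU (rough because 0.004 GHz is a rounded value):

```latex
t_{\text{task}} \approx \frac{\text{cycles}}{f_{\text{avg}}}
  = \frac{12\,683\,357}{0.004 \times 10^{9}\,\text{Hz}} \approx 3.2\,\text{s}
```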
>
> You also get warnings like this:
>
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-0): P53/1:b..l
> rcu: (detected by 0, t=6503 jiffies, g=8461, q=43 ncpus=1)
> task:perf state:R running task stack:0 pid:53 tgid:53
> ppid:52 task_flags:0x400000 flags:0x00000008
> Call trace:
> __switch_to+0x1b8/0x2d8 (T)
> __schedule+0x8b4/0x1050
> preempt_schedule_common+0x2c/0xb8
> preempt_schedule+0x30/0x38
> _raw_spin_unlock+0x60/0x70
> finish_fault+0x330/0x408
> do_pte_missing+0x7d4/0x1188
> handle_mm_fault+0x244/0x568
> do_page_fault+0x21c/0x548
> do_translation_fault+0x44/0x68
> do_mem_abort+0x4c/0x100
> el0_da+0x58/0x200
> el0t_64_sync_handler+0xc0/0x130
> el0t_64_sync+0x198/0x1a0

This is awful. I was able to reproduce it.

>
> If we can't add a heuristic to keep the buffer pinned, it almost seems like
> the random blackouts would be preferable to pinning being so slow.

I guess I could keep the memory pinned when the buffer is disabled, and then
unpin it only when the guest enables a buffer that doesn't intersect with it.
There would also have to be a timer to unpin the memory so it doesn't stay
pinned forever, together with some sort of memory aging mechanism. This is
getting very complex.
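
A minimal sketch of the heuristic described above, written as a self-contained user-space model rather than KVM code. Every name in it (struct pinned_range, buffer_enable(), buffer_disable(), age_tick(), the tick-based idle limit) is hypothetical, and pinning is reduced to counters. It tracks guest VAs only and deliberately ignores the stage 1 translation issue raised in the following paragraph. It also assumes that a buffer which intersects the pinned range simply extends it, something the email does not specify:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model, not KVM code: one guest VA range stays pinned across
 * buffer disable/enable cycles. Pinning and unpinning are modelled as
 * counters; a real implementation would pin host pages and map them at
 * stage 2.
 */
struct pinned_range {
	uint64_t start;     /* first guest VA of the pinned range */
	uint64_t end;       /* one past the last guest VA */
	uint64_t last_used; /* tick of the last enable or disable */
	bool valid;         /* range is currently pinned */
	bool enabled;       /* guest buffer is currently enabled */
};

struct pin_stats {
	unsigned int pins;
	unsigned int unpins;
};

static bool ranges_intersect(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
	return s1 < e2 && s2 < e1;
}

static void unpin_range(struct pinned_range *r, struct pin_stats *st)
{
	if (r->valid) {
		r->valid = false;
		st->unpins++;
	}
}

/* Guest enables the buffer: reuse, extend, or replace the pinned range. */
static void buffer_enable(struct pinned_range *r, struct pin_stats *st,
			  uint64_t start, uint64_t end, uint64_t now)
{
	assert(start < end);

	if (r->valid && ranges_intersect(r->start, r->end, start, end)) {
		if (start < r->start || end > r->end) {
			/* Assumption: pin only the part outside the current range. */
			r->start = start < r->start ? start : r->start;
			r->end = end > r->end ? end : r->end;
			st->pins++;
		}
	} else {
		/* Disjoint buffer (or nothing pinned yet): drop old, pin new. */
		unpin_range(r, st);
		r->start = start;
		r->end = end;
		r->valid = true;
		st->pins++;
	}
	r->enabled = true;
	r->last_used = now;
}

/* Guest disables the buffer: keep the memory pinned, just note the time. */
static void buffer_disable(struct pinned_range *r, uint64_t now)
{
	r->enabled = false;
	r->last_used = now;
}

/* Periodic timer: unpin a range that has stayed disabled for too long. */
static void age_tick(struct pinned_range *r, struct pin_stats *st,
		     uint64_t now, uint64_t max_idle)
{
	if (r->valid && !r->enabled && now - r->last_used > max_idle)
		unpin_range(r, st);
}
```

In this model, 100 enable/disable cycles of the same buffer cost a single pin instead of 100, a buffer moved to a disjoint range gets the old pages unpinned immediately, and the aging timer never unpins a range while the guest buffer is enabled.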

And all of this still requires walking the guest's stage 1 tables each time
the buffer is enabled because, even though the VAs might be the same, the
VA->IPA mappings might have changed.
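
Even if the pinned memory is kept across enables, the point above means the guest's stage 1 tables still have to be walked on every enable. One possible mitigation, sketched here as a hypothetical user-space model, is to remember the IPA each buffer page translated to at the previous enable and repin only the pages whose translation changed. The walk itself is not modelled: the new_ipa[] array stands in for its result, and struct buf_page, revalidate_buffer() and SKETCH_PAGE_SIZE are illustrative names, not existing KVM interfaces:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4K guest pages */

/* Hypothetical per-page state for the guest's SPE buffer. */
struct buf_page {
	uint64_t ipa; /* IPA this page translated to at the previous enable */
	int pinned;   /* nonzero if the backing host page is pinned */
};

/*
 * Called on each buffer enable. new_ipa[i] stands in for the result of
 * walking the guest stage 1 tables for buffer page i; the walk itself still
 * has to happen every time. Returns the number of pages that were repinned.
 */
static size_t revalidate_buffer(struct buf_page pages[], const uint64_t new_ipa[],
				size_t npages)
{
	size_t repinned = 0;

	for (size_t i = 0; i < npages; i++) {
		assert(new_ipa[i] % SKETCH_PAGE_SIZE == 0);

		/* Same translation as last time: keep the existing pin. */
		if (pages[i].pinned && pages[i].ipa == new_ipa[i])
			continue;

		/* New or changed translation: unpin the old page, pin the new one. */
		pages[i].ipa = new_ipa[i];
		pages[i].pinned = 1;
		repinned++;
	}
	return repinned;
}
```

This only reduces the pinning cost; the stage 1 walk, which the email identifies as unavoidable, is still paid on every enable.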

I'll try to prototype something and see if I can get an improvement.

Question: if having a large buffer is an issue, couldn't the VMM just restrict
the buffer size? Or is having a large buffer that important?

Thanks,
Alex