* Re: [PATCH v8 0/2] arm64/sve: Performance improvements with SVE state saving
[not found] <20260320-arm64-sve-trap-mitigation-v8-0-8bf116c8e360@kernel.org>
@ 2026-05-12 14:10 ` Will Deacon
2026-05-13 1:18 ` Mark Brown
0 siblings, 1 reply; 2+ messages in thread
From: Will Deacon @ 2026-05-12 14:10 UTC (permalink / raw)
To: Mark Brown
Cc: Catalin Marinas, Mark Rutland, Ryan Roberts, linux-arm-kernel,
linux-kernel
Hi Mark,
On Fri, Mar 20, 2026 at 03:44:13PM +0000, Mark Brown wrote:
> This series aims to improve our handling of SVE access traps and state
> clearing. As SVE deployment progresses, both hardware supporting SVE
> and software actively using it are becoming more common. When a task is
> using SVE it faces additional costs: the floating point state we must
> track is larger, and our syscall ABI requires that the extra state is
> cleared on every syscall. Users have measured these overheads and
> raised concerns about them.
>
> We can avoid these costs by re-enabling SVE access traps and falling
> back to FPSIMD only mode, but if we do this too often for tasks that
> are actively using SVE the cost of the access traps becomes prohibitive.
> Currently we attempt to balance the tradeoffs here by starting tasks
> with SVE disabled, enabling it on first use and then turning it off if
> we need to load state from memory while the task is in a syscall. This
> means that CPU bound tasks that do not regularly do blocking syscalls
> will rarely drop SVE, while tasks that use a lot of SVE but do block in
> syscalls (e.g., due to network or user interaction) will be much more
> likely to do so and hence incur SVE access traps.
>
> I did some instrumentation which counted the number of SVE access traps
> and the number of times we loaded FPSIMD only register state for each task.
> Testing with Debian Bookworm showed that during boot the overwhelming
> majority of tasks triggered another SVE access trap more than 50% of the
> time after loading FPSIMD only state, with a substantial number near 100%,
> though some programs had a very small number of SVE accesses most likely
> from the dynamic linker. There were few tasks in the 5-45% range; most
> tasks either used SVE frequently or used it only a tiny proportion of
> the time. As expected, older distributions which do not have the SVE
> performance work available showed no SVE usage in general applications.
>
> For tasks with minimal SVE usage benchmarking with fp-pidbench on a
> system with 128 bit SVE shows an approximately 6% overhead on syscalls
> from having used SVE in the task; the overhead should be greater on a
> system with 256 bit SVE since the Z registers must be flushed as well as
> the P and FFR registers.
>
> The two patches here move to using a time-based heuristic to decide when
> to re-enable the SVE access trap, doing so after a second. This means
> that tasks actively using SVE which block in syscalls should see reduced
> or similar numbers of access traps, while CPU bound tasks that rarely
> use SVE will see the SVE syscall overhead removed after running for
> approximately a second, confirmed via fp-pidbench.
Have you looked at all at applying this heuristic to SME? I wonder if it
would help with the recent DVMSync erratum workaround, where tasks that
use SME once or infrequently end up causing IPIs for TLB invalidation
every time they run on an affected core.
Will