* Re: [PATCH v8 0/2] arm64/sve: Performance improvements with SVE state saving
[not found] <20260320-arm64-sve-trap-mitigation-v8-0-8bf116c8e360@kernel.org>
@ 2026-05-12 14:10 ` Will Deacon
2026-05-13 1:18 ` Mark Brown
From: Will Deacon @ 2026-05-12 14:10 UTC (permalink / raw)
To: Mark Brown
Cc: Catalin Marinas, Mark Rutland, Ryan Roberts, linux-arm-kernel,
linux-kernel
Hi Mark,
On Fri, Mar 20, 2026 at 03:44:13PM +0000, Mark Brown wrote:
> This series aims to improve our handling of SVE access traps and state
> clearing. As SVE deployment progresses both hardware and software
> actively using SVE is becoming more common. When a task is using SVE it
> faces additional costs, the floating point state we must track is larger
> and our syscall ABI requires that the extra state is cleared on every
> syscall. Users have measured these overheads and raised concerns about
> them.
>
> We can avoid these costs by reenabling SVE access traps and falling back
> to FPSIMD only mode but if we do this too often for tasks that are
> actively using SVE the cost of the access traps becomes prohibitive.
> Currently we attempt to balance the tradeoffs here by starting tasks
> with SVE disabled, enabling it on first use and then turning it off if
> we need to load state from memory while the task is in a syscall. This
> means that CPU-bound tasks that do not regularly make blocking syscalls
> will rarely drop SVE, while tasks that use a lot of SVE but do block in
> syscalls (e.g., due to network or user interaction) will be much more
> likely to do so and hence incur SVE access traps.
>
> I did some instrumentation which counted the number of SVE access traps
> and the number of times we loaded FPSIMD only register state for each task.
> Testing with Debian Bookworm showed that during boot the overwhelming
> majority of tasks triggered another SVE access trap more than 50% of the
> time after loading FPSIMD-only state, with a substantial number near 100%,
> though some programs had a very small number of SVE accesses, most likely
> from the dynamic linker. Few tasks fell in the 5-45% range; most
> tasks either used SVE frequently or used it only a tiny proportion of
> the time. As expected, older distributions which do not have the SVE
> performance work available showed no SVE usage in general applications.
>
> For tasks with minimal SVE usage, benchmarking with fp-pidbench on a
> system with 128-bit SVE shows an approximately 6% overhead on syscalls
> once the task has used SVE; the overhead should be greater on a
> system with 256-bit SVE since the Z registers must be flushed as well as
> the P and FFR registers.
>
> The two patches here move to using a time-based heuristic to decide when
> to reenable the SVE access trap, doing so after a second. This means
> that tasks actively using SVE which block in syscalls should see reduced
> or similar numbers of access traps, while CPU bound tasks that rarely
> use SVE will see the SVE syscall overhead removed after running for
> approximately a second, confirmed via fp-pidbench.
Have you looked at all at applying this heuristic to SME? I wonder if it
would help with the recent DVMSync erratum workaround, where tasks that
use SME once/infrequently end up causing IPIs for TLB invalidation every
time they run on an affected core.
Will
* Re: [PATCH v8 0/2] arm64/sve: Performance improvements with SVE state saving
2026-05-12 14:10 ` [PATCH v8 0/2] arm64/sve: Performance improvements with SVE state saving Will Deacon
@ 2026-05-13 1:18 ` Mark Brown
From: Mark Brown @ 2026-05-13 1:18 UTC (permalink / raw)
To: Will Deacon
Cc: Catalin Marinas, Mark Rutland, Ryan Roberts, linux-arm-kernel,
linux-kernel
On Tue, May 12, 2026 at 03:10:36PM +0100, Will Deacon wrote:
> On Fri, Mar 20, 2026 at 03:44:13PM +0000, Mark Brown wrote:
> > The two patches here move to using a time-based heuristic to decide when
> > to reenable the SVE access trap, doing so after a second. This means
> > that tasks actively using SVE which block in syscalls should see reduced
> > or similar numbers of access traps, while CPU bound tasks that rarely
> > use SVE will see the SVE syscall overhead removed after running for
> > approximately a second, confirmed via fp-pidbench.
> Have you looked at all at applying this heuristic to SME? I wonder if it
> would help with the recent DVMSync erratum workaround, where tasks that
> use SME once/infrequently end up causing IPIs for TLB invalidation every
> time they run on an affected core.
I have not done so myself, though I did discuss this with Catalin while
he was working on those workarounds. IIRC he wanted to get a better
picture of the system level costs with actual usage before deciding
whether to do this at all, or whether we could skip the timeout-based
approach and just reenable SME traps whenever we see SME is not in use.
With SME it's easier because you can directly tell if the task currently
has SME enabled, so I'd expect the checks on state load to be enough and
we wouldn't need to put anything into the syscall path; it's likely that
the tradeoffs for doing that wouldn't work out so well.