From: Mark Brown <broonie@kernel.org>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Basant Kumar Dwivedi <Basant.KumarDwivedi@arm.com>,
Will Deacon <will@kernel.org>,
Luis Machado <luis.machado@arm.com>,
Szabolcs Nagy <szabolcs.nagy@arm.com>,
Marc Zyngier <maz@kernel.org>,
Shuah Khan <skhan@linuxfoundation.org>,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org,
Alan Hayward <alan.hayward@arm.com>,
Shuah Khan <shuah@kernel.org>,
kvmarm@lists.cs.columbia.edu,
Salil Akerkar <Salil.Akerkar@arm.com>
Subject: Re: [PATCH v11 06/40] arm64/sme: Provide ABI documentation for SME
Date: Mon, 14 Feb 2022 19:40:52 +0000
Message-ID: <YgqwRIIi7UZzOOR2@sirena.org.uk>
In-Reply-To: <YgqdTv3Hq+H76Ml7@arm.com>
On Mon, Feb 14, 2022 at 06:19:58PM +0000, Catalin Marinas wrote:
> On Fri, Feb 11, 2022 at 06:13:58PM +0000, Mark Brown wrote:
> > We could preserve PSTATE.SM, though since all the other register state
> > for streaming mode is shared with SVE I would expect that we should be
> > applying the SVE discard rules to it and there is therefore no other
> > state that should be retained.
> So when clearing PSTATE.SM, the streaming SVE regs become unknown (well,
> the wording is a bit more verbose). I think this fits well with the
> proposal to drop the streaming SVE state entirely on syscalls.
They're preserved or zeroed, yes.
> The ZA state I think is not affected by the PSTATE.SM change (early
> internal SME specs were listing this as unknown after SM clearing but I
> can't find it in the latest spec). However, after the syscall, the user
> won't be able to execute SME instruction until turning on PSTATE.SM
> again.
Yes, ZA is preserved unless PSTATE.ZA is disabled. There are some
instructions that can be used to interact with it outside of streaming
mode, a subset of the instructions for loading and storing values in ZA.
> Would the libc wrappers preserve PSTATE.SM? What I find a bit confusing
> is that we only partially preserve some state while in streaming mode -
> the ZA registers but not the SVE ones.
I would expect libc wrappers to be called with streaming mode already
disabled - that's what default functions in the PCS expect. Without FA64
enabled a huge proportion of FPSIMD instructions and some SVE
instructions become undefined in streaming mode, so standard code could
easily generate traps if it uses those instructions for anything. I
wouldn't expect libc to explicitly disable SME itself in standard
configurations.
> Is the user more likely to turn
> PSTATE.SM on for ZA processing or for SVE? If the former, we don't want
> to unnecessarily save/restore some SVE state that the user doesn't care
It's expected that any active work with ZA will require enabling
streaming mode: you can't do any actual computation with it without
doing so, and most of the work with ZA will involve using the streaming
mode SVE registers as part of the computation (eg, collecting results in
a Z register, or doing an operation on a ZA tile using the contents of a
Z register as an operand).
It is also expected that some applications may prefer to execute what is
mainly an SVE workload in streaming mode. As well as any performance
relevant differences in the implementation choices the hardware makes,
it is likely that some systems will have vector lengths available in
streaming mode that are otherwise unavailable (eg, you might have PEs
with 128 bit FPSIMD/SVE units and a 512 bit SMCU).
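To put rough numbers on the 128 bit versus 512 bit example, here is a small sketch of the register state sizes involved. It relies only on the architectural layout (32 Z registers of one vector each, 16 P registers and FFR of one bit per vector byte, and ZA as a square array of SVL bytes by SVL bytes). The function names are made up for illustration and are not kernel helpers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative helpers only, not kernel code. Vector lengths are given
 * in bytes and must be a multiple of 16 (128 bits).
 */
static size_t z_regs_bytes(size_t vl)
{
	assert(vl != 0 && vl % 16 == 0);
	return 32 * vl;			/* Z0-Z31, one vector each */
}

static size_t p_regs_bytes(size_t vl)
{
	assert(vl != 0 && vl % 16 == 0);
	return 16 * (vl / 8);		/* P0-P15, one bit per vector byte */
}

static size_t ffr_bytes(size_t vl)
{
	assert(vl != 0 && vl % 16 == 0);
	return vl / 8;			/* FFR has the same size as a P register */
}

static size_t za_bytes(size_t svl)
{
	assert(svl != 0 && svl % 16 == 0);
	return svl * svl;		/* ZA is SVL bytes by SVL bytes */
}
```

With a 16 byte (128 bit) vector length the Z registers take 512 bytes, while a 64 byte (512 bit) streaming vector length gives 2048 bytes of Z registers plus 4096 bytes of ZA, which is why the cost of saving and restoring this state depends so much on the system.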
I don't have a good handle on which sort of usage is going to be more
common, and I expect that the answer is going to be very system
dependent, varying based on both the mix of applications running on the
system at any given moment and the capabilities of the standard and
streaming mode floating point implementations that the system has.
However the existing syscall ABI for the Z and P registers (which is all
the SVE register state, FFR is a magic P register) means that unless we
treat streaming mode differently to non-streaming mode we'll be
discarding whatever state is there anyway, so userspace by definition
shouldn't have anything in there it expects to be preserved when it does
a syscall. I'd rather not introduce an ABI that guarantees that we
preserve the streaming mode SVE register state in cases where we discard
(or can discard) the non-streaming SVE register state: that's both going
to be more complicated to implement and more likely to cause unexpected
differences that trip userspace up.
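The rule being argued for can be written down as a toy model of what userspace may rely on across a syscall. This is a sketch only, not the kernel implementation: the struct, its fields and the preserve_sm switch are hypothetical, with preserve_sm standing for the two options discussed in this thread (clear PSTATE.SM on syscall, or preserve it). In both cases the Z/P contents are treated the same way in streaming and non-streaming mode, and ZA is kept for as long as PSTATE.ZA is set.

```c
#include <stdbool.h>

/* Hypothetical model of one task's SME-related state, not a kernel type. */
struct task_model {
	bool sm;	/* PSTATE.SM: streaming mode enabled */
	bool za;	/* PSTATE.ZA: ZA storage enabled */
	bool zp_valid;	/* userspace may rely on the Z/P register contents */
	bool za_valid;	/* userspace may rely on the ZA contents */
};

/*
 * Model the state userspace can depend on after a syscall returns.
 * preserve_sm selects between the two behaviours discussed in the
 * thread; it is an assumption of this sketch, not an existing flag.
 */
static struct task_model model_syscall(struct task_model t, bool preserve_sm)
{
	/* Same discard rule for streaming and non-streaming Z/P state. */
	t.zp_valid = false;

	if (!preserve_sm)
		t.sm = false;

	/* ZA contents survive only while PSTATE.ZA is enabled. */
	t.za_valid = t.za && t.za_valid;

	return t;
}
```

Either way userspace cannot depend on the Z/P contents after the call, so the only visible difference between the two options is whether it has to re-enter streaming mode itself.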
> about (can we even trap SVE instructions independently of SME while in
> streaming mode?).
I'd need to check through but I don't believe so.
> I'd find it clearer if we preserved PSTATE.SM and, w.r.t. the streaming
> SVE state, we somewhat follow the PCS and not restore the regs (input
> from the libc people welcomed).
Like I say we can do that easily enough, it's not something I expect to
ever come up in practical usage though.
> > Having said that as with ZA userspace can just exit streaming mode to
> > avoid any overhead having it enabled introduces and the common case is
> > expected to be that it will have done so due to the PCS, it should be an
> > extremely rare case - unlike keeping ZA active there doesn't seem to be
> > any case where it would be sensible to want to do this and the PCS means
> > you'd have to actively try to do so.
> IIUC, the PCS introduced the notion of streaming-compatible functions
> that preserve the SM bit. If they are non-streaming, SM should be 0 on
Yes, it isn't the default though.
> entry. It would be nice if we put the syscalls in one of these
> categories, so either mandate SM == 0 on entry or preserve (the latter
> being easier, I think, I haven't looked at what it takes to save/restore
> the streaming SVE state; I may change my mind after reviewing at the
> other patches).
The streaming SVE state is identical to the SVE state, with two
exceptions: the FFR predicate register, which is not present unless FA64
is available in the system and enabled, and the separately configured
vector length.
It's sounding like we may as well just preserve SM: it shouldn't come up
that often anyway, and if it causes performance problems we can probably
optimise it and/or userspace can simply not do that. Like I say I
don't have particularly strong feelings, the current behaviour was just
the easiest thing to implement and it doesn't seem like there is a use
case. This is fine by me, I can do that for the next version.
[fork()/clone() behaviour]
> (few hours later) I think instead of singling out fork() (clone3()
> actually), we can just say that new tasks (process/thread) always start
> with PSTATE.ZA == 0, PSTATE.SM == 0 (tbd for this) and TPIDR2_EL0 == 0
> irrespective of any clone3() flags (even CLONE_SETTLS). The C library
> will have to implement the lazy ZA saving in the parent before the
> syscall and the child will automatically recover the state if it follows
> the PCS.
Works for me, I think forcing userspace to consider this is going to
work out more robust.
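The proposed rule for new tasks is simple enough to state as a toy model. The struct and function below are hypothetical and only illustrate the proposal quoted above, in which a new process or thread starts with PSTATE.ZA, PSTATE.SM and TPIDR2_EL0 all zero regardless of the clone3() flags (PSTATE.SM is still marked as to be decided in that proposal).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the SME state inherited at clone time. */
struct sme_task_model {
	bool sm;		/* PSTATE.SM */
	bool za;		/* PSTATE.ZA */
	uint64_t tpidr2_el0;	/* TPIDR2_EL0, used by the PCS lazy ZA save scheme */
};

/*
 * Model of the proposal: nothing SME related is inherited from the
 * parent, whatever flags were passed to clone3().
 */
static struct sme_task_model model_new_task(struct sme_task_model parent)
{
	struct sme_task_model child = parent;

	child.za = false;
	child.sm = false;	/* still "tbd" in the quoted proposal */
	child.tpidr2_el0 = 0;

	return child;
}
```

Under this model the parent's own state is untouched, so it is up to the C library to do the lazy ZA save in the parent before making the syscall.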