From: Catalin Marinas <catalin.marinas@arm.com>
To: Mark Brown <broonie@kernel.org>
Cc: Basant Kumar Dwivedi <Basant.KumarDwivedi@arm.com>,
Will Deacon <will@kernel.org>,
Luis Machado <luis.machado@arm.com>,
Szabolcs Nagy <szabolcs.nagy@arm.com>,
Marc Zyngier <maz@kernel.org>,
Shuah Khan <skhan@linuxfoundation.org>,
linux-arm-kernel@lists.infradead.org,
linux-kselftest@vger.kernel.org,
Alan Hayward <alan.hayward@arm.com>,
Shuah Khan <shuah@kernel.org>,
kvmarm@lists.cs.columbia.edu,
Salil Akerkar <Salil.Akerkar@arm.com>
Subject: Re: [PATCH v11 06/40] arm64/sme: Provide ABI documentation for SME
Date: Mon, 14 Feb 2022 18:19:58 +0000
Message-ID: <YgqdTv3Hq+H76Ml7@arm.com>
In-Reply-To: <YganZni933HbRTmO@sirena.org.uk>

On Fri, Feb 11, 2022 at 06:13:58PM +0000, Mark Brown wrote:
> On Fri, Feb 11, 2022 at 05:02:16PM +0000, Catalin Marinas wrote:
> > So in this case we consider the syscall interface as non-streaming (as
> > per the PCS terminology). Should we require that the PSTATE.SM is
> > cleared by the user as well? Alternatively, we could make it
> > streaming-compatible and just preserve it. Are there any drawbacks?
> > kernel_neon_begin() could clear SM if needed.
>
> In fact kernel_neon_begin() already disables PSTATE.SM since we need to
> account for the case where userspace was preempted rather than issued a
> syscall. We could require that PSTATE.SM is disabled by the user,
> though it's questionable what we could usefully and helpfully do about
> it if they forget other than disable it anyway or generate a signal.
>
> We could preserve PSTATE.SM, though since all the other register state
> for streaming mode is shared with SVE I would expect that we should be
> applying the SVE discard rules to it and there is therefore no other
> state that should be retained.

So when clearing PSTATE.SM, the streaming SVE regs become unknown (well,
the wording is a bit more verbose). I think this fits well with the
proposal to drop the streaming SVE state entirely on syscalls.

The ZA state I think is not affected by the PSTATE.SM change (early
internal SME specs were listing this as unknown after SM clearing but I
can't find it in the latest spec). However, after the syscall, the user
won't be able to execute SME instructions until turning on PSTATE.SM
again.
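
To make the above concrete, here is a rough userspace model of what
clearing PSTATE.SM does to the state being discussed. It is only a sketch
and not kernel code: struct sme_state, its fields and sme_clear_sm() are
made-up names, and "known" just means "userspace may rely on the
contents".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state under discussion; not kernel code. */
struct sme_state {
	bool sm;                  /* PSTATE.SM: streaming mode enabled */
	bool za;                  /* PSTATE.ZA: ZA storage enabled */
	bool streaming_sve_known; /* streaming SVE regs may be relied upon */
	bool za_contents_known;   /* ZA contents may be relied upon */
};

/* Model of leaving streaming mode, e.g. on syscall entry. */
static void sme_clear_sm(struct sme_state *s)
{
	assert(s != NULL);

	/* Leaving streaming mode invalidates the streaming SVE regs. */
	if (s->sm)
		s->streaming_sve_known = false;
	s->sm = false;

	/* PSTATE.ZA and the ZA contents are deliberately left untouched. */
}
```

Calling sme_clear_sm() on a state with both SM and ZA enabled leaves ZA
enabled and its contents intact, while the streaming SVE registers can no
longer be relied upon.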

Would the libc wrappers preserve PSTATE.SM? What I find a bit confusing
is that we only partially preserve some state while in streaming mode -
the ZA registers but not the SVE ones. Is the user more likely to turn
PSTATE.SM on for ZA processing or for SVE? If the former, we don't want
to unnecessarily save/restore some SVE state that the user doesn't care
about (can we even trap SVE instructions independently of SME while in
streaming mode?).

I'd find it clearer if we preserved PSTATE.SM and, w.r.t. the streaming
SVE state, we somewhat follow the PCS and not restore the regs (input
from the libc people welcomed).

> As things stand this would either result
> in more overhead or complicate the register save and restore a bit since
> if we're in streaming mode we currently assume that we should save and
> restore the full SVE register contents but normally in a syscall we only
> need to save and restore the FPSIMD subset. The overhead might go away
> anyway as a result of general work on syscall optimisation for SVE,
> though that work isn't done yet and may not end up working out that way.
>
> Having said that as with ZA userspace can just exit streaming mode to
> avoid any overhead having it enabled introduces and the common case is
> expected to be that it will have done so due to the PCS, it should be an
> extremely rare case - unlike keeping ZA active there doesn't seem to be
> any case where it would be sensible to want to do this and the PCS means
> you'd have to actively try to do so.

IIUC, the PCS introduced the notion of streaming-compatible functions
that preserve the SM bit. If they are non-streaming, SM should be 0 on
entry. It would be nice if we put the syscalls in one of these
categories, so either mandate SM == 0 on entry or preserve it (the latter
being easier, I think, though I haven't looked at what it takes to
save/restore the streaming SVE state; I may change my mind after
reviewing the other patches).

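
The two categories can be sketched as a small userspace model. This is
illustrative only: the enum, struct sme_user_state and
sme_syscall_entry() are hypothetical names, not anything from the series,
and what happens when the non-streaming contract is violated (clear SM
anyway, raise a signal) is intentionally left out since that is still
open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two PCS-style categories; not kernel code. */
enum sme_syscall_abi {
	SME_SYSCALL_NON_STREAMING,        /* caller must enter with SM == 0 */
	SME_SYSCALL_STREAMING_COMPATIBLE, /* SM is preserved across the call */
};

struct sme_user_state {
	bool sm;                  /* PSTATE.SM */
	bool streaming_sve_known; /* streaming SVE regs may be relied upon */
};

/*
 * Returns true if the caller honoured the contract for the chosen ABI.
 * The state is only modified in the streaming-compatible case.
 */
static bool sme_syscall_entry(struct sme_user_state *u, enum sme_syscall_abi abi)
{
	assert(u != NULL);

	if (abi == SME_SYSCALL_NON_STREAMING)
		return !u->sm;

	/* Streaming-compatible: keep SM, but do not promise the regs back. */
	if (u->sm)
		u->streaming_sve_known = false;
	return true;
}
```

With the streaming-compatible option a caller entering with SM == 1 comes
back with SM == 1 and unknown streaming SVE registers; with the
non-streaming option the same caller is simply outside the contract.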
> > If PSTATE.ZA is valid and the user does a fork() (well, implemented as
> > clone()), normally it expects a nearly identical state in the child.
> > With clone() if a new thread is created, we likely don't need the
> > additional ZA state. We got away with having to think about this for
> > SVE as the state is lost on syscall. Here we risk having a vaguely
> > defined ABI - fork() is disabled on arm64 for example but we do have
> > clone() and clone3().
>
> > Still thinking about this but maybe we could do something like always
> > copy the ZA state unless CLONE_VM is passed for example. It is
> > marginally more precise.
>
> We should definitely write this up a bit more explicitly whatever we do,
> like I say I don't really have strong opinions here.
>
> There's also the interaction with the lazy save state to consider -
> TPIDR2 is cleared if CLONE_SETTLS is specified which would interfere
> with any lazy state saving that had already happened, though hopefully
> userspace is taking care of that as part of setting up the new thread so
> I think it's fine.

TPIDR2_EL0 should indeed be cleared in the child, it doesn't make sense
to start a thread with this reg pointing to a buffer in another thread
(not sure whether it needs to be tied to SETTLS but that works as well).

In fork()+execve() cases, it doesn't make sense to preserve ZA in the
child but we can't tell at fork/clone3() time. OTOH, it probably doesn't
make much sense to call clone3() with PSTATE.ZA set either, so such a
copy would rarely/never happen in the kernel. We'd just carry some code
for the classic fork() case.

(A few hours later) I think instead of singling out fork() (clone3()
actually), we can just say that new tasks (process/thread) always start
with PSTATE.ZA == 0, PSTATE.SM == 0 (tbd for this) and TPIDR2_EL0 == 0
irrespective of any clone3() flags (even CLONE_SETTLS). The C library
will have to implement the lazy ZA saving in the parent before the
syscall and the child will automatically recover the state if it follows
the PCS.

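
As a rough sketch of that rule (a userspace model, not a patch: struct
sme_task_state and sme_new_task_state() are made-up names, and the
PSTATE.SM part is still tbd as noted above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task SME state; not the kernel's data structures. */
struct sme_task_state {
	bool sm;             /* PSTATE.SM */
	bool za;             /* PSTATE.ZA */
	uint64_t tpidr2_el0; /* lazy ZA save pointer, 0 if none */
};

/*
 * Proposed rule: a new task always starts with SM, ZA and TPIDR2_EL0
 * cleared, whatever the parent had and whatever flags were passed.
 */
static struct sme_task_state
sme_new_task_state(const struct sme_task_state *parent, unsigned long clone_flags)
{
	struct sme_task_state child = { .sm = false, .za = false, .tpidr2_el0 = 0 };

	assert(parent != NULL);
	(void)clone_flags; /* deliberately ignored, CLONE_SETTLS included */
	return child;
}
```

The parent's state is not modified here; any lazy ZA save it needs is
assumed to have been set up by the C library before the syscall.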
--
Catalin