public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* ARM SVE ABI: kernel dropping SVE/SME state on syscalls
@ 2024-03-28  0:30 Vineet Gupta
  2024-04-02 18:11 ` Mark Rutland
  0 siblings, 1 reply; 5+ messages in thread
From: Vineet Gupta @ 2024-03-28  0:30 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland; +Cc: Andrew Waterman, Palmer Dabbelt, lkml

Hi Will, Marc,

In the RISC-V land we are hitting an issue and need some help
understanding the SVE ABI about dropping the state on syscalls (and its
implications etc - in hindsight)

If I'm reading the arm64 code correctly, SVE state is unconditionally
(for any syscall whatsoever) dropped in following code path:

el0_svc
    fp_user_discard

The RISC-V Vector ABI mandates something similar and kernel implements
something similar.

    2023-06-29 9657e9b7d253 riscv: Discard vector state on syscalls  

However in recent testing with RISC-V vector builds we are running into
an issue when this just doesn't work.

Just for some background, RISC-V vector instructions relies on
additional state in a VTYPE register which is setup using an apriori
VSETVLI insn.
So consider the following piece of code:

   3ff80:    cc787057              vsetivli    zero,16,e8,mf2,ta,ma    
<-- sets up VTYPE
   3ff84:    44d8                    lw    a4,12(s1)
   3ff86:    449c                    lw    a5,8(s1)
   3ff88:    06f75563              bge    a4,a5,3fff2
   3ff8c:    02010087              vle8.v    v1,(sp)
   3ff90:    020980a7              vse8.v    v1,(s3)   <-- Vector store
instruction
Here's the sequence of events that's causing the issue 1. The vector
store instruction (in say bash) takes a page fault, enters kernel.
2. In PF return path, a SIGCHLD signal is pending (a bash sub-shell
which exited, likely on different cpu).
3. kernel resumes in userspace signal handler which ends up making an
rt_sigreturn syscall - and which as specified discards the V state (and
makes VTYPE reg invalid).
4. When sigreturn finally returns to original Vector store instruction,
invalid VTYPE triggers an Illegal instruction which causes a SIGILL (as
state was discarded above).

So there is no way dropping syscall state would work here.

How do you guys handle this for SVE/SME ? One way would be to not do the
discard in rt_sigreturn codepath, but I don't see that - granted I'm not
too familiar with arch/arm64/*/**

Other thing I wanted to ask is, have there been any perf implications of
this ABI decision: as in if this was other way around, userspace (and/or
compilers) could potentially leverage the fact that SVE/SME state would
still be valid past a syscall - and won't have to reload/resetup etc.

Thanks,
-Vineet

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ARM SVE ABI: kernel dropping SVE/SME state on syscalls
  2024-03-28  0:30 ARM SVE ABI: kernel dropping SVE/SME state on syscalls Vineet Gupta
@ 2024-04-02 18:11 ` Mark Rutland
  2024-04-02 19:00   ` Mark Brown
  2024-04-02 20:38   ` Vineet Gupta
  0 siblings, 2 replies; 5+ messages in thread
From: Mark Rutland @ 2024-04-02 18:11 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Will Deacon, Andrew Waterman, Palmer Dabbelt, lkml, Mark Brown,
	Dave Martin

On Wed, Mar 27, 2024 at 05:30:00PM -0700, Vineet Gupta wrote:
> Hi Will, Marc,
> 
> In the RISC-V land we are hitting an issue and need some help
> understanding the SVE ABI about dropping the state on syscalls (and its
> implications etc - in hindsight)
> 
> If I'm reading the arm64 code correctly, SVE state is unconditionally
> (for any syscall whatsoever) dropped in following code path:
> 
> el0_svc
>     fp_user_discard
> 
> The RISC-V Vector ABI mandates something similar and kernel implements
> something similar.
> 
>     2023-06-29 9657e9b7d253 riscv: Discard vector state on syscalls  
> 
> However in recent testing with RISC-V vector builds we are running into
> an issue when this just doesn't work.
> 
> Just for some background, RISC-V vector instructions relies on
> additional state in a VTYPE register which is setup using an apriori
> VSETVLI insn.
> So consider the following piece of code:
> 
>    3ff80:    cc787057              vsetivli    zero,16,e8,mf2,ta,ma    
> <-- sets up VTYPE
>    3ff84:    44d8                    lw    a4,12(s1)
>    3ff86:    449c                    lw    a5,8(s1)
>    3ff88:    06f75563              bge    a4,a5,3fff2
>    3ff8c:    02010087              vle8.v    v1,(sp)
>    3ff90:    020980a7              vse8.v    v1,(s3)   <-- Vector store
> instruction
> Here's the sequence of events that's causing the issue 
>
> 1. The vector store instruction (in say bash) takes a page fault, enters
> kernel.
> 2. In PF return path, a SIGCHLD signal is pending (a bash sub-shell
> which exited, likely on different cpu).

At this point, surely you need to save the VTYPE into a sigframe before
delivering the signal?

> 3. kernel resumes in userspace signal handler which ends up making an
> rt_sigreturn syscall - and which as specified discards the V state (and
> makes VTYPE reg invalid).

The state is discarded at syscall entry, but rt_sigreturn() runs *after* the
discard. If you saved the original VTYPE prior to delivering the signal, it
should be able to restore it regardless of whether it'd be clobbered at syscall
entry.

Surely you *must* save/restore VTYPE in the signal frame? Otherwise the signal
handler can't make any syscall whatsoever, or it's responsible for saving and
restoring VTYPE in userspace, which doesn't seem right.

> 4. When sigreturn finally returns to original Vector store instruction,
> invalid VTYPE triggers an Illegal instruction which causes a SIGILL (as
> state was discarded above).
> 
> So there is no way dropping syscall state would work here.

As above, I don't think that's quite true. It sounds to me like that the actual
bug is that you don't save+restore VTYPE in the signal frame?

> How do you guys handle this for SVE/SME ? One way would be to not do the
> discard in rt_sigreturn codepath, but I don't see that - granted I'm not
> too familiar with arch/arm64/*/**

IIUC this works on arm64 because we'll save all the original state when we
deliver the signal, then restore that state *after* entry to the rt_sigreturn()
syscall.

I can go dig into that tomorrow, but I don't see how this can work unless we
save *all* state prior to delivering the signal, and restoring *all* that state
from the sigframe.

> Other thing I wanted to ask is, have there been any perf implications of
> this ABI decision: as in if this was other way around, userspace (and/or
> compilers) could potentially leverage the fact that SVE/SME state would
> still be valid past a syscall - and won't have to reload/resetup etc.

I believe Mark Brown has made some changes recently to try to avoid some of
that impact. He might be able to comment on that.

Mark.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ARM SVE ABI: kernel dropping SVE/SME state on syscalls
  2024-04-02 18:11 ` Mark Rutland
@ 2024-04-02 19:00   ` Mark Brown
  2024-04-02 20:38   ` Vineet Gupta
  1 sibling, 0 replies; 5+ messages in thread
From: Mark Brown @ 2024-04-02 19:00 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Vineet Gupta, Will Deacon, Andrew Waterman, Palmer Dabbelt, lkml,
	Dave Martin

[-- Attachment #1: Type: text/plain, Size: 3250 bytes --]

On Tue, Apr 02, 2024 at 07:11:43PM +0100, Mark Rutland wrote:
> On Wed, Mar 27, 2024 at 05:30:00PM -0700, Vineet Gupta wrote:

> > If I'm reading the arm64 code correctly, SVE state is unconditionally
> > (for any syscall whatsoever) dropped in following code path:

> > el0_svc
> >     fp_user_discard

Yes.

> Surely you *must* save/restore VTYPE in the signal frame? Otherwise the signal
> handler can't make any syscall whatsoever, or it's responsible for saving and
> restoring VTYPE in userspace, which doesn't seem right.

The signal handler would also be unable to use any instructions that
affect the vector state (we've got SVE based memcpy() implementations on
arm64...), and there would be issues with anything that returns to a
context other than the one where the signal was originally deliverered.

> > How do you guys handle this for SVE/SME ? One way would be to not do the
> > discard in rt_sigreturn codepath, but I don't see that - granted I'm not
> > too familiar with arch/arm64/*/**

> IIUC this works on arm64 because we'll save all the original state when we
> deliver the signal, then restore that state *after* entry to the rt_sigreturn()
> syscall.

Yes, that's it exactly.  All the process state including the SVE and SME
state is written into the signal frame along when the signal is
delivered, then on signal return we restore everything from the signal
frame overwriting any changes that happened while the signal handler was
running including the effects of entering the kernel to do the signal
return.

> > Other thing I wanted to ask is, have there been any perf implications of
> > this ABI decision: as in if this was other way around, userspace (and/or
> > compilers) could potentially leverage the fact that SVE/SME state would
> > still be valid past a syscall - and won't have to reload/resetup etc.

> I believe Mark Brown has made some changes recently to try to avoid some of
> that impact. He might be able to comment on that.

The optimisation work I've been doing has all been around avoiding
enabling traps when we discard the SVE state rather than to do with the
discarding of SVE state, the initial implementation for SVE disabled
traps very eagerly which wasn't ideal.

For arm64 our PCS specifies that the SVE specific state becomes unknown
over any standard function call, this means that any optimisation in
userspace that relied on state being preserved would need to be outside
the PCS which would be *relatively* niche though it's definitely
something that could be done especially for non-C code.  The advantage
is that we can stop saving and restoring the full SVE state for
processes that have stopped using SVE, avoiding overhead there.  The
kernel ABI basically models syscalls as C function calls, though it is
stricter in that it specifies the values after a syscall rather than
allowing them to be undefined which does impose a small performance
overhead.

Our SME ABI similarly follows the PCS for standard C functions.  Our
standard PCS for SME requires that functions be called with SME disabled
so when we exit streaming mode over syscall we're just enforcing that,
we will preserve ZA (the matrix register) since it is controlled
separately and that's broadly how the PCS works.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ARM SVE ABI: kernel dropping SVE/SME state on syscalls
  2024-04-02 18:11 ` Mark Rutland
  2024-04-02 19:00   ` Mark Brown
@ 2024-04-02 20:38   ` Vineet Gupta
  2024-04-03  7:53     ` Björn Töpel
  1 sibling, 1 reply; 5+ messages in thread
From: Vineet Gupta @ 2024-04-02 20:38 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, Andrew Waterman, Palmer Dabbelt, lkml, Mark Brown,
	Dave Martin, Björn Töpel

+CC Bjorn

On 4/2/24 11:11, Mark Rutland wrote:
> On Wed, Mar 27, 2024 at 05:30:00PM -0700, Vineet Gupta wrote:
>> Hi Will, Marc,

Thx for the reply and apologies for fat-fingering your name above.

>> 1. The vector store instruction (in say bash) takes a page fault, enters
>> kernel.
>> 2. In PF return path, a SIGCHLD signal is pending (a bash sub-shell
>> which exited, likely on different cpu).
> At this point, surely you need to save the VTYPE into a sigframe before
> delivering the signal?

Yes we do.

>> 3. kernel resumes in userspace signal handler which ends up making an
>> rt_sigreturn syscall - and which as specified discards the V state (and
>> makes VTYPE reg invalid).
> The state is discarded at syscall entry, but rt_sigreturn() runs *after* the
> discard. If you saved the original VTYPE prior to delivering the signal, it
> should be able to restore it regardless of whether it'd be clobbered at syscall
> entry.
>
> Surely you *must* save/restore VTYPE in the signal frame? Otherwise the signal
> handler can't make any syscall whatsoever, or it's responsible for saving and
> restoring VTYPE in userspace, which doesn't seem right.

Indeed I later realized that sigreturn is special as it has its own
state to restore. The discard prior drops the state during signal
handler which is anyways transient / throw-away so doesn't hurt this
specific case.

>> 4. When sigreturn finally returns to original Vector store instruction,
>> invalid VTYPE triggers an Illegal instruction which causes a SIGILL (as
>> state was discarded above).
>>
>> So there is no way dropping syscall state would work here.
> As above, I don't think that's quite true. It sounds to me like that the actual
> bug is that you don't save+restore VTYPE in the signal frame?

We do, but there was indeed a different bug which Bjorn found, in
sigreturn V state restore where we were (re)clobbering the V state by
using V-regs in copy-from-user and returning back with that ill restored
state.

>> How do you guys handle this for SVE/SME ? One way would be to not do the
>> discard in rt_sigreturn codepath, but I don't see that - granted I'm not
>> too familiar with arch/arm64/*/**
> IIUC this works on arm64 because we'll save all the original state when we
> deliver the signal, then restore that state *after* entry to the rt_sigreturn()
> syscall.
>
> I can go dig into that tomorrow, but I don't see how this can work unless we
> save *all* state prior to delivering the signal, and restoring *all* that state
> from the sigframe.

You don't have to, Bjorn found the bug and he'll post a fix to lists soon.

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: ARM SVE ABI: kernel dropping SVE/SME state on syscalls
  2024-04-02 20:38   ` Vineet Gupta
@ 2024-04-03  7:53     ` Björn Töpel
  0 siblings, 0 replies; 5+ messages in thread
From: Björn Töpel @ 2024-04-03  7:53 UTC (permalink / raw)
  To: Vineet Gupta, Mark Rutland
  Cc: Will Deacon, Andrew Waterman, Palmer Dabbelt, lkml, Mark Brown,
	Dave Martin, Björn Töpel

Vineet Gupta <vineetg@rivosinc.com> writes:

> You don't have to, Bjorn found the bug and he'll post a fix to lists soon.

FWIW, my patch is here:
https://lore.kernel.org/linux-riscv/20240403072638.567446-1-bjorn@kernel.org/


Björn

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-04-03  7:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-28  0:30 ARM SVE ABI: kernel dropping SVE/SME state on syscalls Vineet Gupta
2024-04-02 18:11 ` Mark Rutland
2024-04-02 19:00   ` Mark Brown
2024-04-02 20:38   ` Vineet Gupta
2024-04-03  7:53     ` Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox