linux-arm-kernel.lists.infradead.org archive mirror
* Feature request for enabling SCTLR_ELx.nAA
@ 2023-02-22 23:17 Richard Henderson
  2023-02-23 12:15 ` Catalin Marinas
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Henderson @ 2023-02-22 23:17 UTC
  To: linux-arm-kernel; +Cc: Alex Bennée

Hi guys,

It would be helpful to have a prctl for enabling nAA.  Since we already have 
task->thread.sctlr_user, it would seem that this would not require any additional overhead 
during __switch_to().

My use case is the QEMU JIT, and being able to make use of LDAR/STLR instead of explicit 
DMB in some cases.  At the moment, I can only make this replacement when the address is 
provably aligned, which is tricky to do with the time budget of a JIT, so the replacement 
rarely triggers.  This ought to make a difference when emulating strongly ordered guests 
like x86.
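
The choice described above can be sketched roughly as follows. This is not QEMU's code: the enum, the function name, and the `naa_enabled` flag (standing in for the requested prctl having succeeded) are all hypothetical, and the sketch assumes that with nAA set an unaligned LDAR no longer takes an alignment fault.

```c
#include <stdbool.h>

/* Hypothetical ways to lower a guest load that needs acquire ordering. */
enum load_lowering {
    LOWER_LDAR,         /* a single LDAR */
    LOWER_LDR_PLUS_DMB  /* a plain LDR followed by an explicit DMB */
};

/*
 * size_log2:        log2 of the access size in bytes.
 * align_known_log2: log2 of the alignment the JIT managed to prove
 *                   (0 means nothing is known).
 * naa_enabled:      assumption: the requested prctl exists and was used
 *                   to set nAA for this process.
 */
static enum load_lowering choose_load_lowering(unsigned size_log2,
                                               unsigned align_known_log2,
                                               bool naa_enabled)
{
    /* LDAR is usable when alignment is proven or no longer required. */
    if (naa_enabled || align_known_log2 >= size_log2)
        return LOWER_LDAR;
    return LOWER_LDR_PLUS_DMB;
}
```

With `naa_enabled` false, an 8-byte load of unknown alignment (`choose_load_lowering(3, 0, false)`) falls back to LDR plus DMB, which is the common case today.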

Thanks,


r~

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: Feature request for enabling SCTLR_ELx.nAA
  2023-02-22 23:17 Feature request for enabling SCTLR_ELx.nAA Richard Henderson
@ 2023-02-23 12:15 ` Catalin Marinas
  2023-02-23 15:36   ` Richard Henderson
  2023-02-24 17:09   ` Will Deacon
  0 siblings, 2 replies; 4+ messages in thread
From: Catalin Marinas @ 2023-02-23 12:15 UTC
  To: Richard Henderson
  Cc: linux-arm-kernel, Alex Bennée, Will Deacon, Mark Rutland

On Wed, Feb 22, 2023 at 01:17:55PM -1000, Richard Henderson wrote:
> It would be helpful to have a prctl for enabling nAA.  Since we already have
> task->thread.sctlr_user, it would seem that this would not require any
> additional overhead during __switch_to().

This shouldn't be difficult to add.

> My use case is the QEMU JIT, and being able to make use of LDAR/STLR instead
> of explicit DMB in some cases.  At the moment, I can only make this
> replacement when the address is provably aligned, which is tricky to do with
> the time budget of a JIT, so the replacement rarely triggers.  This ought to
> make a difference when emulating strongly ordered guests like x86.

It looks like in 4.17 (commit 7206dc93a58f, "arm64: Expose Arm v8.4
features") we exposed the LSE2 features as HWCAP_USCAT (unaligned
single-copy atomicity) but that still restricts LDAR/STLR to a 16-byte
boundary as there is no control for SCTLR_EL1.nAA.
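
As an illustration of that restriction: as I understand the architecture, with LSE2 present and nAA clear, an unaligned LDAR/STLR is only permitted when every byte accessed lies within one 16-byte-aligned block. A minimal check, with a made-up function name and assuming `size` is at least 1 and the access does not wrap the address space, might look like:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the access [addr, addr + size) stays inside a single
 * 16-byte-aligned block, i.e. its first and last bytes share the same
 * block index. Assumes size >= 1 and no address wraparound.
 */
static bool within_16_byte_block(uint64_t addr, unsigned size)
{
    uint64_t first_block = addr >> 4;
    uint64_t last_block = (addr + size - 1) >> 4;

    return first_block == last_block;
}
```

An 8-byte access at 0x1008 stays inside one block, while the same access at 0x100c crosses into the next block and is the case that would need nAA.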

Given that allowing unaligned accesses could break atomicity, I wouldn't
set this bit to 1 permanently; leaving it clear helps catch tricky software bugs.
So a prctl() makes more sense. If your intended use is just preserving
the acquire/release semantics, I don't think these are affected by the
atomicity rules even if they go across a 16-byte boundary.

Adding Will and Mark for their view on this.

-- 
Catalin


* Re: Feature request for enabling SCTLR_ELx.nAA
  2023-02-23 12:15 ` Catalin Marinas
@ 2023-02-23 15:36   ` Richard Henderson
  2023-02-24 17:09   ` Will Deacon
  1 sibling, 0 replies; 4+ messages in thread
From: Richard Henderson @ 2023-02-23 15:36 UTC
  To: Catalin Marinas
  Cc: linux-arm-kernel, Alex Bennée, Will Deacon, Mark Rutland

On 2/23/23 02:15, Catalin Marinas wrote:
> Given that allowing unaligned accesses could break atomicity, I wouldn't
> set this bit to 1 permanently; leaving it clear helps catch tricky software bugs.
> So a prctl() makes more sense. If your intended use is just preserving
> the acquire/release semantics, I don't think these are affected by the
> atomicity rules even if they go across a 16-byte boundary.

Yes, my intended use is just the acquire/release.

As I read the Arm ARM pseudo-code for aarch64/functions/memory/Mem, the !aligned case 
devolves to a series of bytes, but with the same acctype, so each byte is AccType_ORDERED.

Which is just fine for my use case.


r~


* Re: Feature request for enabling SCTLR_ELx.nAA
  2023-02-23 12:15 ` Catalin Marinas
  2023-02-23 15:36   ` Richard Henderson
@ 2023-02-24 17:09   ` Will Deacon
  1 sibling, 0 replies; 4+ messages in thread
From: Will Deacon @ 2023-02-24 17:09 UTC
  To: Catalin Marinas
  Cc: Richard Henderson, linux-arm-kernel, Alex Bennée,
	Mark Rutland

On Thu, Feb 23, 2023 at 12:15:43PM +0000, Catalin Marinas wrote:
> On Wed, Feb 22, 2023 at 01:17:55PM -1000, Richard Henderson wrote:
> > It would be helpful to have a prctl for enabling nAA.  Since we already have
> > task->thread.sctlr_user, it would seem that this would not require any
> > additional overhead during __switch_to().
> 
> This shouldn't be difficult to add.
> 
> > My use case is the QEMU JIT, and being able to make use of LDAR/STLR instead
> > of explicit DMB in some cases.  At the moment, I can only make this
> > replacement when the address is provably aligned, which is tricky to do with
> > the time budget of a JIT, so the replacement rarely triggers.  This ought to
> > make a difference when emulating strongly ordered guests like x86.
> 
> It looks like in 4.17 (commit 7206dc93a58f, "arm64: Expose Arm v8.4
> features") we exposed the LSE2 features as HWCAP_USCAT (unaligned
> single-copy atomicity) but that still restricts LDAR/STLR to a 16-byte
> boundary as there is no control for SCTLR_EL1.nAA.
> 
> > Given that allowing unaligned accesses could break atomicity, I wouldn't
> > set this bit to 1 permanently; leaving it clear helps catch tricky software bugs.
> So a prctl() makes more sense. If your intended use is just preserving
> the acquire/release semantics, I don't think these are affected by the
> atomicity rules even if they go across a 16-byte boundary.
> 
> Adding Will and Mark for their view on this.

I'd definitely want to see some numbers to justify the complexity of a new
prctl(), but otherwise it sounds fine as long as it's opt-in and cleared on
exec().
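
The "opt-in and cleared on exec()" behaviour can be pictured with a small userspace toy model. None of this is kernel code: the struct, the function names, and the macro name are made up for illustration, and no such prctl() exists in this thread. The only architectural detail assumed is that nAA is bit 6 of SCTLR_EL1, as I read the Arm ARM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: nAA is bit 6 of SCTLR_EL1; the macro name is illustrative. */
#define TOY_SCTLR_NAA (UINT64_C(1) << 6)

/* Toy stand-in for per-task state such as task->thread.sctlr_user. */
struct toy_task {
    uint64_t sctlr_user;
};

/* Opt-in: what a hypothetical prctl() handler might do for the caller. */
static void toy_set_naa(struct toy_task *t, bool enable)
{
    if (enable)
        t->sctlr_user |= TOY_SCTLR_NAA;
    else
        t->sctlr_user &= ~TOY_SCTLR_NAA;
}

/* exec(): the new program image starts with the control cleared again. */
static void toy_exec(struct toy_task *t)
{
    t->sctlr_user &= ~TOY_SCTLR_NAA;
}

/* Query helper: is nAA currently set for this task? */
static bool toy_naa_enabled(const struct toy_task *t)
{
    return (t->sctlr_user & TOY_SCTLR_NAA) != 0;
}
```

In this model a task starts with the bit clear, sets it only by explicit request, and loses it across `toy_exec()`, while the other bits of `sctlr_user` are left untouched.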

Will

