From: Marc Zyngier <maz@kernel.org>
To: Sergio Lopez Pascual <slp@redhat.com>
Cc: Eric Curtin <ecurtin@redhat.com>, Will Deacon <will@kernel.org>,
Hector Martin <marcan@marcan.st>,
Catalin Marinas <catalin.marinas@arm.com>,
Mark Rutland <mark.rutland@arm.com>,
Zayd Qumsieh <zayd_qumsieh@apple.com>,
Justin Lu <ih_justin@apple.com>,
Ryan Houdek <Houdek.Ryan@fex-emu.org>,
Mark Brown <broonie@kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
Mateusz Guzik <mjguzik@gmail.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Oliver Upton <oliver.upton@linux.dev>,
Miguel Luis <miguel.luis@oracle.com>,
Joey Gouly <joey.gouly@arm.com>,
Christoph Paasch <cpaasch@apple.com>,
Kees Cook <keescook@chromium.org>,
Sami Tolvanen <samitolvanen@google.com>,
Baoquan He <bhe@redhat.com>,
Joel Granados <j.granados@samsung.com>,
Dawei Li <dawei.li@shingroup.cn>,
Andrew Morton <akpm@linux-foundation.org>,
Florent Revest <revest@chromium.org>,
David Hildenbrand <david@redhat.com>,
Stefan Roesch <shr@devkernel.io>,
Andy Chiu <andy.chiu@sifive.com>,
Josh Triplett <josh@joshtriplett.org>,
Oleg Nesterov <oleg@redhat.com>, Helge Deller <deller@gmx.de>,
Zev Weiss <zev@bewilderbeest.net>,
Ondrej Mosnacek <omosnace@redhat.com>,
Miguel Ojeda <ojeda@kernel.org>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, Asahi Linux <asahi@lists.linux.dev>
Subject: Re: [PATCH 0/4] arm64: Support the TSO memory model
Date: Mon, 06 May 2024 17:12:53 +0100
Message-ID: <86y18mq5q2.wl-maz@kernel.org>
In-Reply-To: <CAAiTLFW8DWH-ejNgcXgr2tQxxF4pp7BNUFGyUq99BfrYx1kScQ@mail.gmail.com>

On Mon, 06 May 2024 12:21:40 +0100,
Sergio Lopez Pascual <slp@redhat.com> wrote:
>
> Eric Curtin <ecurtin@redhat.com> writes:
>
> > On Fri, 19 Apr 2024 at 18:08, Will Deacon <will@kernel.org> wrote:
> >>
> >> On Thu, Apr 11, 2024 at 11:19:13PM +0900, Hector Martin wrote:
> >> > On 2024/04/11 22:28, Will Deacon wrote:
> >> > > * Some binaries in a distribution exhibit instability which goes away
> >> > > in TSO mode, so a taskset-like program is used to run them with TSO
> >> > > enabled.
> >> >
> >> > Since the flag is cleared on execve, this third one isn't generally
> >> > possible as far as I know.
> >>
> >> Ah ok, I'd missed that. Thanks.
> >>
> >> > > In all these cases, we end up with native arm64 applications that will
> >> > > either fail to load or will crash in subtle ways on CPUs without the TSO
> >> > > feature. Assuming that the application cannot be fixed, a better
> >> > > approach would be to recompile using stronger instructions (e.g.
> >> > > LDAR/STLR) so that at least the resulting binary is portable. Now, it's
> >> > > true that some existing CPUs are TSO by design (this is a perfectly
> >> > > valid implementation of the arm64 memory model), but I think there's a
> >> > > big difference between quietly providing more ordering guarantees than
> >> > > software may be relying on and providing a mechanism to discover,
> >> > > request and ultimately rely upon the stronger behaviour.
> >> >
> >> > The problem is "just" using stronger instructions is much more
> >> > expensive, as emulators have demonstrated. If TSO didn't serve a
> >> > practical purpose I wouldn't be submitting this, but it does. This is
> >> > basically non-negotiable for x86 emulation; if this is rejected
> >> > upstream, it will forever live as a downstream patch used by the entire
> >> > gaming-on-Mac-Linux ecosystem (and this is an ecosystem we are very
> >> > explicitly targeting, given our efforts with microVMs for 4K page size
> >> > support and the upcoming Vulkan drivers).
>
> In addition to the use case Hector exposed here, there's another,
> potentially larger one, which is running x86_64 containers on aarch64
> systems, using a combination of both Virtualization and emulation.
>
> In this scenario, both not being able to use TSO for emulation
> and having to enable it all the time for the whole VM have a very large
> impact on performance (~25% on some workloads).

Well, there is always a price to pay somewhere, and this is the usual
trade-off between performance and maintainability.
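
For reference, the portable alternative Will mentions in the quoted
text (stronger instructions such as LDAR/STLR) is roughly what a
compiler emits for C11 release/acquire atomics on arm64. What follows
is only a minimal sketch of that pattern, not code from the series: the
names payload, ready, producer, consume and run_message_passing are
made up for illustration, and the exact instructions chosen depend on
the compiler and target options.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical example data: 'payload' is published through 'ready'. */
static int payload;
static atomic_int ready;

static void *producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* Release store: orders the payload write before the flag write.
	 * On arm64 this is typically emitted as STLR. */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

static int consume(void)
{
	/* Acquire load: orders the flag read before the payload read.
	 * On arm64 this is typically emitted as LDAR (or LDAPR). */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;
	return payload;
}

/* Runs one producer/consumer round; returns the observed payload,
 * or -1 if the thread could not be created. */
int run_message_passing(void)
{
	pthread_t t;
	int v;

	payload = 0;
	atomic_store(&ready, 0);
	if (pthread_create(&t, NULL, producer, NULL) != 0)
		return -1;
	v = consume();
	pthread_join(t, NULL);
	return v;
}
```

An emulator cannot know which guest accesses are the "flag" and which
are the "payload", so it has to treat every guest load and store this
way, which is where the cost discussed in this thread comes from.
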
> I understand the concern about the risk of userspace fragmentation, but
> I was wondering if we could minimize it to an acceptable level by
> narrowing down the context. For instance, since both use cases we're
> bringing to the table imply the use of Virtualization, we should be able
> to restrict PR_SET_MEM_MODEL to only be accepted when running on EL1
> (and not in nVHE nor pKVM), returning EINVAL otherwise. This would
> heavily discourage users from relying on this feature for native
> applications that can run on arbitrary contexts, hence drastically
> reducing the fragmentation risk.

As I explained in another sub-thread[1], I am not prepared to allow
non-architectural state to be exposed to a guest. I'm also not
prepared to make significant ABI differences between VHE, nVHE, hVHE,
with or without pKVM, because the job of the kernel is to abstract
those differences.
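
To make the userspace side of this ABI question concrete, here is a
rough sketch of how an emulator might probe for the proposed prctl and
fall back when it is refused. Everything in it is an assumption for
illustration: the PR_*_MEM_MODEL constants are not in mainline uapi
headers and the values below are placeholders to be checked against
the series, while try_enable_tso(), pick_codegen() and enum codegen are
hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* ASSUMPTION: placeholder definitions for the proposed (unmerged)
 * prctl; verify names and values against the actual patch series. */
#ifndef PR_SET_MEM_MODEL
#define PR_GET_MEM_MODEL		0x6d4d444c
#define PR_SET_MEM_MODEL		0x4d4d444c
#define PR_SET_MEM_MODEL_DEFAULT	0
#define PR_SET_MEM_MODEL_TSO		1
#endif

/* Hypothetical helper: returns 0 if the kernel accepted the request,
 * or a negative errno value (e.g. -EINVAL on a kernel or in a context
 * that does not support it). */
static int try_enable_tso(void)
{
	if (prctl(PR_SET_MEM_MODEL, (unsigned long)PR_SET_MEM_MODEL_TSO,
		  0UL, 0UL, 0UL) == 0)
		return 0;
	return -errno;
}

/* Hypothetical code-generation strategies for an x86 emulator. */
enum codegen {
	CODEGEN_PLAIN_LDST,	/* plain loads/stores, relies on hardware TSO */
	CODEGEN_ACQ_REL,	/* portable: acquire/release accesses or barriers */
};

static enum codegen pick_codegen(void)
{
	return try_enable_tso() == 0 ? CODEGEN_PLAIN_LDST : CODEGEN_ACQ_REL;
}
```

Any rule of the form "accepted here, EINVAL there" becomes one more
case that every caller of such a helper has to handle, which is the
kind of ABI difference the paragraph above objects to.
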
> We would still need a way to ensure the trap gets to the VMM and for
> the VMM to operate on the impdef ACTLR_EL12, but that should be dealt
> with in a different series.

The VMM can't use ACTLR_EL12, by the very definition of this register
(the clue is in the name). You'd have to proxy the write in the
kernel and context-switch it, which means adding non-architectural
state to KVM, breaking VM migration and adding more kludges to the
existing Apple-specific host crap.

Also, let's realise that we are talking about making significant
changes to the arm64 ABI for a platform that is still not fully
supported in the upstream kernel. I have the feeling that changing the
memory model dynamically may not be of the utmost priority until then.

Thanks,

M.

[1] https://lore.kernel.org/all/867cgcqrb9.wl-maz@kernel.org

--
Without deviation from the norm, progress is not possible.