* [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert()
@ 2024-12-11 16:01 Mikołaj Lenczewski
From: Mikołaj Lenczewski @ 2024-12-11 16:01 UTC
  To: ryan.roberts, catalin.marinas, will, corbet, maz, oliver.upton,
	joey.gouly, suzuki.poulose, yuzenghui
  Cc: Mikołaj Lenczewski, linux-arm-kernel, linux-doc,
	linux-kernel, kvmarm

Resending as I had the wrong addresses for linux-doc and kvmarm.
Apologies for the spam.

Hi All,

This patch series seeks to gather feedback on adding initial support
for level 2 of the Break-Before-Make arm64 architectural feature,
specifically to contpte_convert().

This support reorders a TLB invalidation in contpte_convert(), and
optionally elides said invalidation completely, which leads to a 12%
improvement when executing a microbenchmark designed to force the
pathological path where contpte_convert() gets called. This
represents an 80% reduction in the cost of calling contpte_convert().

However, the elision of the invalidation is still pending review to
ensure it is architecturally valid. Without it, the reordering also
represents a performance improvement due to reducing thread contention,
as there is a smaller time window for racing threads to see an invalid
pagetable entry (especially if they already have a cached entry in their
TLB that they are working off of).

This series is based on v6.13-rc2 (fac04efc5c79).

Break-Before-Make Level 2
=========================

Break-Before-Make (BBM) sequences ensure a consistent view of the
page tables. They avoid TLB multi-hits and ensure atomicity and
ordering guarantees. BBM level 0 simply defines the current use
of page tables. When you want to change certain bits in a pte,
you need to:

- clear the pte
- dsb()
- issue a tlbi for the pte
- dsb()
- repaint the pte
- dsb()
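
As a rough illustration of the sequence above, here is a minimal
user-space sketch in C. It is not kernel code: model_pte_t,
model_dsb(), model_tlbi() and model_bbm0_update() are hypothetical
stand-ins that only record the order of operations, so the ordering
can be inspected without touching real page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
typedef uint64_t model_pte_t;

#define MODEL_MAX_STEPS 8

struct model_trace {
	const char *steps[MODEL_MAX_STEPS];
	size_t count;
};

/* Append one step name to the trace (silently drops overflow). */
static void model_record(struct model_trace *t, const char *step)
{
	assert(t != NULL);
	if (t->count < MODEL_MAX_STEPS)
		t->steps[t->count++] = step;
}

/* Stand-ins for the barrier and the invalidation; they only log. */
static void model_dsb(struct model_trace *t)  { model_record(t, "dsb"); }
static void model_tlbi(struct model_trace *t) { model_record(t, "tlbi"); }

/* BBM level 0: break (clear + invalidate), then make (write new value). */
static void model_bbm0_update(model_pte_t *ptep, model_pte_t new_pte,
			      struct model_trace *t)
{
	*ptep = 0;			/* clear the pte */
	model_record(t, "clear");
	model_dsb(t);
	model_tlbi(t);			/* issue a tlbi for the pte */
	model_dsb(t);
	*ptep = new_pte;		/* repaint the pte */
	model_record(t, "repaint");
	model_dsb(t);
}
```

In this model the pte is invalid from the "clear" step until the
"repaint" step, so the tlbi and both surrounding barriers sit inside
the window where a racing thread could observe an invalid entry.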

When changing block size, or toggling the contiguous bit, we
currently use this BBM level 0 sequence. With BBM level 2 support,
however, we can relax the BBM sequence and benefit from a performance
improvement. The hardware would then either automatically handle the
TLB invalidations, or would take a TLB Conflict Abort Exception.

This exception can either be a stage 1 or stage 2 exception, depending
on whether stage 1 or stage 2 translations are in use. The architecture
currently mandates a worst-case invalidation of vmalle1 or vmalls12e1,
when stage 2 translation is not in use and in use, respectively.
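
To make the scope selection concrete, here is a small user-space
sketch in C. It is an illustration only and not the handler from this
series: the enum and both function names are hypothetical, and the
only facts taken from the text above are the two invalidation scopes
and which stage 2 state each applies to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the worst-case invalidation scope. */
enum model_tlbi_scope {
	MODEL_TLBI_VMALLE1,	/* stage 2 translation not in use */
	MODEL_TLBI_VMALLS12E1,	/* stage 2 translation in use */
};

/* Pick the worst-case scope for a TLB conflict abort. */
static enum model_tlbi_scope model_conflict_scope(bool stage2_in_use)
{
	return stage2_in_use ? MODEL_TLBI_VMALLS12E1 : MODEL_TLBI_VMALLE1;
}

/* Map a scope to the operation name used in the text above. */
static const char *model_scope_name(enum model_tlbi_scope scope)
{
	switch (scope) {
	case MODEL_TLBI_VMALLE1:
		return "vmalle1";
	case MODEL_TLBI_VMALLS12E1:
		return "vmalls12e1";
	}
	assert(0 && "unknown scope");
	return "";
}
```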

Outstanding Questions and Remaining TODOs
=========================================

Patch 4 moves the tlbi so that the window where the pte is invalid is
significantly smaller. This reduces the chances of racing threads
accessing the memory during the window and taking a fault. This is
confirmed to be architecturally sound.

Patch 5 removes the tlbi entirely. This has the benefit of
significantly reducing the cost of contpte_convert(). While testing
has demonstrated that this works as expected on Arm-designed CPUs, we
are still in the process of confirming whether it is architecturally
correct. I am requesting review while that process is ongoing. Patch 5
would be dropped if it turns out to be architecturally unsound.
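
For reviewers who want to compare the three orderings side by side,
here is a simplified user-space model in C. It is a sketch under
assumptions, not the code in patches 4 and 5: it assumes the delayed
variant issues the tlbi after the pte has been repainted, it omits
barriers entirely, and all names (model_op, model_mode,
model_convert_ops(), model_tlbis_in_invalid_window()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; barriers are deliberately left out. */
enum model_op { MODEL_CLEAR, MODEL_TLBI, MODEL_REPAINT };

enum model_mode {
	MODEL_BBML0,		/* current sequence: tlbi between clear and repaint */
	MODEL_BBML2_DELAYED,	/* assumed patch 4 shape: tlbi after repaint */
	MODEL_BBML2_ELIDED,	/* assumed patch 5 shape: no tlbi at all */
};

/* Fill ops[] with the modelled order; returns the number of ops. */
static size_t model_convert_ops(enum model_mode mode, enum model_op ops[3])
{
	size_t n = 0;

	ops[n++] = MODEL_CLEAR;
	if (mode == MODEL_BBML0)
		ops[n++] = MODEL_TLBI;
	ops[n++] = MODEL_REPAINT;
	if (mode == MODEL_BBML2_DELAYED)
		ops[n++] = MODEL_TLBI;

	assert(n <= 3);
	return n;
}

/* Count tlbi operations issued while the pte is invalid. */
static size_t model_tlbis_in_invalid_window(const enum model_op *ops, size_t n)
{
	bool invalid = false;
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (ops[i] == MODEL_CLEAR)
			invalid = true;
		else if (ops[i] == MODEL_REPAINT)
			invalid = false;
		else if (invalid)
			count++;
	}
	return count;
}
```

In this model only the level 0 ordering has a tlbi inside the invalid
window. The delayed and elided variants both leave the window empty,
and they differ only in whether a tlbi is issued afterwards.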

Another note is that the stage 2 TLB conflict handling is included as
patch 1 of this series. This patch could (and probably should) be sent
separately as it may be useful outside this series, but is included for
reference.

Thanks,
Miko

Mikołaj Lenczewski (5):
  arm64: Add TLB Conflict Abort Exception handler to KVM
  arm64: Add BBM Level 2 cpu feature
  arm64: Add errata and workarounds for systems with broken BBML2
  arm64/mm: Delay tlbi in contpte_convert() under BBML2
  arm64/mm: Elide tlbi in contpte_convert() under BBML2

 Documentation/arch/arm64/silicon-errata.rst |  32 ++++
 arch/arm64/Kconfig                          | 164 ++++++++++++++++++++
 arch/arm64/include/asm/cpufeature.h         |  14 ++
 arch/arm64/include/asm/esr.h                |   8 +
 arch/arm64/kernel/cpufeature.c              |  37 +++++
 arch/arm64/kvm/mmu.c                        |   6 +
 arch/arm64/mm/contpte.c                     |   3 +-
 arch/arm64/mm/fault.c                       |  27 +++-
 arch/arm64/tools/cpucaps                    |   1 +
 9 files changed, 290 insertions(+), 2 deletions(-)

-- 
2.45.2




Thread overview: 27+ messages
2024-12-11 16:01 [RESEND RFC PATCH v1 0/5] Initial BBML2 support for contpte_convert() Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 1/5] arm64: Add TLB Conflict Abort Exception handler to KVM Mikołaj Lenczewski
2024-12-11 17:40   ` Marc Zyngier
2024-12-12  9:23     ` Ryan Roberts
2024-12-12  9:57       ` Marc Zyngier
2024-12-12 10:37         ` Ryan Roberts
2024-12-13 16:24     ` Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature Mikołaj Lenczewski
2024-12-12  8:25   ` Marc Zyngier
2024-12-12 10:55     ` Ryan Roberts
2024-12-12 14:26       ` Marc Zyngier
2024-12-12 15:05         ` Ryan Roberts
2024-12-12 15:48           ` Marc Zyngier
2024-12-12 16:03             ` Ryan Roberts
2024-12-19 16:45               ` Will Deacon
2025-01-02 12:07                 ` Jonathan Cameron
2025-01-02 12:30                   ` Marc Zyngier
2025-01-03 15:35                     ` Will Deacon
2025-01-03 16:00                       ` Ryan Roberts
2025-01-03 18:18                         ` Jonathan Cameron
2024-12-13 16:53             ` Mikołaj Lenczewski
2024-12-13 16:49     ` Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 3/5] arm64: Add errata and workarounds for systems with broken BBML2 Mikołaj Lenczewski
2024-12-11 16:01 ` [RESEND RFC PATCH v1 4/5] arm64/mm: Delay tlbi in contpte_convert() under BBML2 Mikołaj Lenczewski
2024-12-19 16:36   ` Will Deacon
2024-12-11 16:01 ` [RESEND RFC PATCH v1 5/5] arm64/mm: Elide " Mikołaj Lenczewski
2024-12-19 16:37   ` Will Deacon
