Linux-ARM-Kernel Archive on lore.kernel.org
* [RFC PATCH 0/6] arm64: mm: Introducing ROX CACHE to ARM64 systems with bbml2 no abort
@ 2026-06-11 13:01 Adrian Barnaś
From: Adrian Barnaś @ 2026-06-11 13:01 UTC
  To: linux-arm-kernel
  Cc: linux-mm, Adrian Barnaś, Catalin Marinas, Will Deacon,
	Ryan Roberts, David Hildenbrand, Mike Rapoport (Microsoft),
	Ard Biesheuvel, Christoph Lameter, Yang Shi, Brendan Jackman

Hi All,

I would like to propose the introduction of the EXECMEM_ROX_CACHE feature
for ARM64.

When the `rodata=on` kernel parameter is set, all executable and read-only
aliases in the linear map are forced to be read-only. Originally, this
forced PTE-level mappings across the entire linear map, causing a
significant performance regression (greater than 10%) for some workloads
due to increased TLB miss rates and deeper page table walks.
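
To put rough numbers on the cost of PTE-level mappings, here is a small
sketch of the arithmetic. It assumes a 4 KiB translation granule, where
one PTE maps 4 KiB and one PMD block entry maps 2 MiB; other granules
give different sizes. The names are illustrative and not taken from the
series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB granule, so a PTE maps 4 KiB and a PMD block
 * entry maps 2 MiB. These macros are illustrative, not kernel macros. */
#define SKETCH_PTE_SIZE (UINT64_C(1) << 12)
#define SKETCH_PMD_SIZE (UINT64_C(1) << 21)

/* Number of leaf entries needed to map `bytes` with leaves of `leaf_size`. */
uint64_t leaf_entries(uint64_t bytes, uint64_t leaf_size)
{
	assert(leaf_size != 0);
	return (bytes + leaf_size - 1) / leaf_size;
}
```

Under these assumptions, 1 GiB of linear map needs 262144 leaf entries
at PTE level but only 512 at PMD level, and each PMD block translation
ends one level earlier in the walk.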

The `feat_bbml2_no_abort` (FEAT_BBML3 in ARMv9.7) feature can be
utilized to mitigate this regression. Because we can split memory
mappings without triggering TLB conflict aborts, kernel memory permission
adjustments become possible after early boot without forcing PTE-level
mappings across the entire linear map.

However, when applying read-only permissions to kernel module sections,
the linear map can still become fragmented due to the scattered physical
layout of the underlying pages. To address this, EXECMEM_ROX_CACHE, which
was initially enabled on the x86 architecture [1] for this purpose, can be
used on ARM64 as well.

EXECMEM_ROX_CACHE works by preallocating PMD-sized contiguous blocks to
act as a cache for .module.text memory. These blocks are initially
poisoned and made read-only-execute (which simultaneously makes the
linear alias of this region read-only). When loading a .module.text
section into memory, the requested cache region is made RW, the bytes
are copied, and ROX permissions are restored.
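
The flow above can be imitated in user space. The following is only an
analogy, not the kernel implementation: it uses mmap()/mprotect() in
place of the execmem and set_memory APIs, a 2 MiB block as a stand-in
for a PMD-sized block (4 KiB granule assumed), and "brk #0" as an
assumed poison pattern. All function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BLOCK_SIZE (2UL << 20)  /* stand-in for a PMD-sized block */
#define TRAP_INSN  0xd4200000u  /* AArch64 "brk #0", assumed poison value */

/* Allocate one block, poison it, then drop write permission.
 * PROT_EXEC is left out so the sketch also runs under strict W^X policies. */
void *rox_block_alloc(void)
{
	uint32_t *p = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	for (size_t i = 0; i < BLOCK_SIZE / sizeof(*p); i++)
		p[i] = TRAP_INSN;
	if (mprotect(p, BLOCK_SIZE, PROT_READ) != 0) {
		munmap(p, BLOCK_SIZE);
		return NULL;
	}
	return p;
}

/* Copy `len` bytes to offset `off`: open the window RW, copy, restore RO. */
int rox_block_load(void *block, size_t off, const void *src, size_t len)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	size_t start, end;

	assert(block != NULL);
	if (off > BLOCK_SIZE || len > BLOCK_SIZE - off)
		return -1;
	if (len == 0)
		return 0;
	/* mprotect() works on whole pages, so round the window outwards. */
	start = off & ~(pg - 1);
	end = (off + len + pg - 1) & ~(pg - 1);
	if (mprotect((char *)block + start, end - start,
		     PROT_READ | PROT_WRITE) != 0)
		return -1;
	memcpy((char *)block + off, src, len);
	return mprotect((char *)block + start, end - start, PROT_READ);
}
```

Only the pages covering the requested range are opened for writing, so
the rest of the block stays read-only during the copy.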

To take full advantage of this approach, after restoring RO permissions
on the PMD-sized linear alias block, the PTE mappings are coalesced back
into a single PMD entry.
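
As a rough model of the coalescing condition, here is a sketch using a
toy page table rather than real arm64 descriptors. It assumes a 4 KiB
granule (512 PTEs per 2 MiB PMD), reduces attributes to a single word,
and ignores TLB maintenance entirely. The types and function names are
hypothetical and do not come from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PTRS_PER_PTE 512  /* entries per table, 4 KiB granule assumed */
#define SK_PAGE_SHIFT   12
#define SK_PMD_SHIFT    21

/* Toy leaf descriptor: output address plus one permission/attribute word. */
struct leaf {
	uint64_t pa;
	uint32_t prot;
};

/* Toy PMD slot: either one 2 MiB block or a table of 512 page entries. */
struct pmd_slot {
	bool is_block;
	struct leaf block;                 /* meaningful when is_block */
	struct leaf pte[SK_PTRS_PER_PTE];  /* meaningful when !is_block */
};

/* True when the 512 PTEs describe exactly what one block entry could. */
bool pmd_can_coalesce(const struct pmd_slot *s)
{
	uint64_t base;

	assert(s != NULL);
	if (s->is_block)
		return false;
	base = s->pte[0].pa;
	if (base & ((UINT64_C(1) << SK_PMD_SHIFT) - 1))
		return false;  /* first page must be 2 MiB aligned */
	for (int i = 0; i < SK_PTRS_PER_PTE; i++) {
		if (s->pte[i].pa != base + ((uint64_t)i << SK_PAGE_SHIFT))
			return false;  /* physically contiguous pages only */
		if (s->pte[i].prot != s->pte[0].prot)
			return false;  /* uniform permissions only */
	}
	return true;
}

/* Replace the table with a single block entry when that is lossless. */
bool pmd_coalesce(struct pmd_slot *s)
{
	if (!pmd_can_coalesce(s))
		return false;
	s->block = s->pte[0];
	s->is_block = true;
	return true;
}
```

In this model a single PTE with different permissions blocks the merge,
which is why the coalescing step comes after RO has been restored on the
whole block.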

Testing on an Android device running a 6.18-based kernel with rodata=on
shows an average 20% reduction in level 3 page table entries for the
linear mapping.

This implementation currently works around some limitations of the
`set_memory_xx` API, which might be relevant when considering the
refactoring proposed here [2]:

* Because the execmem_cache operates outside the linear map, its
  permissions could theoretically remain untouched (poisoned and RO)
  until the cache block is fully emptied and freed. However, we currently
  lack an API to interact exclusively with the vmalloc area (e.g., setting
  it to RW) without simultaneously setting the linear alias to RW.

* Additionally, set_direct_map_valid() has somewhat confusing semantics.
  It is used while "cleaning" a cache block (after confirming it is
  entirely empty). The linear map region should be returned to its default
  state to restore writability, but set_direct_map_valid() might only set
  the "valid" attribute (as is the case for ARM64, which I have
  temporarily addressed here with a workaround).

I would be glad to hear your feedback on these changes.

Best regards,
Adrian

[1]: https://lore.kernel.org/all/20241023162711.2579610-1-rppt@kernel.org/
[2]: https://lore.kernel.org/all/20260219175113.618562-1-jackmanb@google.com/

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Brendan Jackman <jackmanb@google.com>

--->8

Adrian Barnaś (6):
  arm64: mm: explicitly declare module and ftrace execmem regions
  arm64: mm: allow huge vmap permission adjustments with bbml2_no_abort
  arm64: mm: fix restoring linear map permissions on execmem cache clean
  arm64: mm: add helper to fill execmem with trapping instructions
  arm64: execmem: enable EXECMEM_ROX_CACHE on supported CPUs
  arm64: mm: support PMD page coalescing in the linear map

 arch/arm64/Kconfig           |  1 +
 arch/arm64/include/asm/mmu.h |  1 +
 arch/arm64/mm/init.c         | 54 +++++++++++++++++++-
 arch/arm64/mm/mmu.c          | 95 ++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/pageattr.c     | 46 +++++++++++++----
 5 files changed, 186 insertions(+), 11 deletions(-)

--
2.54.0.1136.gdb2ca164c4-goog




Thread overview: 9+ messages
2026-06-11 13:01 [RFC PATCH 0/6] arm64: mm: Introducing ROX CACHE to ARM64 systems with bbml2 no abort Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 1/6] arm64: mm: explicitly declare module and ftrace execmem regions Adrian Barnaś
2026-06-11 13:36   ` Brendan Jackman
2026-06-11 13:01 ` [RFC PATCH 2/6] arm64: mm: allow huge vmap permission adjustments with bbml2_no_abort Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 3/6] arm64: mm: fix restoring linear map permissions on execmem cache clean Adrian Barnaś
2026-06-11 13:54   ` Brendan Jackman
2026-06-11 13:01 ` [RFC PATCH 4/6] arm64: mm: add helper to fill execmem with trapping instructions Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 5/6] arm64: execmem: enable EXECMEM_ROX_CACHE on supported CPUs Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 6/6] arm64: mm: support PMD page coalescing in the linear map Adrian Barnaś
