* [RFC PATCH 1/6] arm64: mm: explicitly declare module and ftrace execmem regions
2026-06-11 13:01 [RFC PATCH 0/6] arm64: mm: Introducing ROX CACHE to ARM64 systems with bbml2 no abort Adrian Barnaś
@ 2026-06-11 13:01 ` Adrian Barnaś
2026-06-11 13:36 ` Brendan Jackman
2026-06-11 13:01 ` [RFC PATCH 2/6] arm64: mm: allow huge vmap permission adjustments with bbml2_no_abort Adrian Barnaś
` (4 subsequent siblings)
5 siblings, 1 reply; 9+ messages in thread
From: Adrian Barnaś @ 2026-06-11 13:01 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-mm, Adrian Barnaś, Catalin Marinas, Will Deacon,
Ryan Roberts, David Hildenbrand, Mike Rapoport (Microsoft),
Ard Biesheuvel, Christoph Lameter, Yang Shi, Brendan Jackman
Replace the reliance on the EXECMEM_DEFAULT fallback by explicitly defining
the execution memory (execmem) regions for MODULE_TEXT, MODULE_DATA, and
FTRACE in execmem_arch_setup().
Signed-off-by: Adrian Barnaś <abarnas@google.com>
---
arch/arm64/mm/init.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 96711b8578fd..c673a9a839dd 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -519,7 +519,7 @@ struct execmem_info __init *execmem_arch_setup(void)
execmem_info = (struct execmem_info){
.ranges = {
- [EXECMEM_DEFAULT] = {
+ [EXECMEM_MODULE_TEXT] = {
.start = start,
.end = end,
.pgprot = PAGE_KERNEL,
@@ -533,12 +533,28 @@ struct execmem_info __init *execmem_arch_setup(void)
.pgprot = PAGE_KERNEL_ROX,
.alignment = 1,
},
+ [EXECMEM_FTRACE] = {
+ .start = VMALLOC_START,
+ .end = VMALLOC_END,
+ .pgprot = PAGE_KERNEL,
+ .alignment = 1,
+ .fallback_start = fallback_start,
+ .fallback_end = fallback_end,
+ },
[EXECMEM_BPF] = {
.start = VMALLOC_START,
.end = VMALLOC_END,
.pgprot = PAGE_KERNEL,
.alignment = 1,
},
+ [EXECMEM_MODULE_DATA] = {
+ .start = start,
+ .end = end,
+ .pgprot = PAGE_KERNEL,
+ .alignment = 1,
+ .fallback_start = fallback_start,
+ .fallback_end = fallback_end,
+ },
},
};
--
2.54.0.1136.gdb2ca164c4-goog
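For readers unfamiliar with what "relying on the EXECMEM_DEFAULT fallback" means in the commit message above, here is a minimal userspace sketch. It assumes that the generic execmem code copies the default range into any range type the architecture leaves undeclared, and that the module text type shares the default slot. All `model_*` and `MODEL_*` names are hypothetical stand-ins and not the kernel's definitions.

```c
#include <stddef.h>

/*
 * Simplified stand-in for the execmem range types. The assumption that
 * module text aliases the default slot is modelled by giving both the
 * same enum value.
 */
enum model_execmem_type {
	MODEL_DEFAULT,
	MODEL_MODULE_TEXT = MODEL_DEFAULT,
	MODEL_KPROBES,
	MODEL_FTRACE,
	MODEL_BPF,
	MODEL_MODULE_DATA,
	MODEL_TYPE_MAX,
};

/* Only the fields needed to show the fallback behaviour. */
struct model_range {
	unsigned long start;
	unsigned long end;
	unsigned int alignment;
};

/*
 * Copy the default range into every slot that was left empty
 * (start == 0). Explicitly declared slots are not touched.
 */
void model_init_missing(struct model_range ranges[MODEL_TYPE_MAX])
{
	const struct model_range def = ranges[MODEL_DEFAULT];

	for (int i = MODEL_DEFAULT + 1; i < MODEL_TYPE_MAX; i++) {
		if (!ranges[i].start)
			ranges[i] = def;
	}
}
```

In this model, a type that is declared explicitly keeps its own bounds, while an undeclared type silently inherits the module text range. Declaring MODULE_DATA and FTRACE by hand, as the patch does, removes that implicit inheritance.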
* Re: [RFC PATCH 1/6] arm64: mm: explicitly declare module and ftrace execmem regions
2026-06-11 13:01 ` [RFC PATCH 1/6] arm64: mm: explicitly declare module and ftrace execmem regions Adrian Barnaś
@ 2026-06-11 13:36 ` Brendan Jackman
0 siblings, 0 replies; 9+ messages in thread
From: Brendan Jackman @ 2026-06-11 13:36 UTC (permalink / raw)
To: Adrian Barnaś, linux-arm-kernel
Cc: linux-mm, Catalin Marinas, Will Deacon, Ryan Roberts,
David Hildenbrand, Mike Rapoport (Microsoft), Ard Biesheuvel,
Christoph Lameter, Yang Shi, Brendan Jackman, owner-linux-mm
On Thu Jun 11, 2026 at 1:01 PM UTC, Adrian Barnaś wrote:
> Replace the reliance on the EXECMEM_DEFAULT fallback by explicitly defining
> the execution memory (execmem) regions for MODULE_TEXT, MODULE_DATA, and
> FTRACE in execmem_arch_setup().
Please can you explain in the commit message _why_ you do this. This way
reviewers don't have to make a mental note and come back later after
reading the rest of the patchset. And once it's committed, it will save
readers from having to chase down the contextual commits to figure out
what's going on.
* [RFC PATCH 2/6] arm64: mm: allow huge vmap permission adjustments with bbml2_no_abort
2026-06-11 13:01 [RFC PATCH 0/6] arm64: mm: Introducing ROX CACHE to ARM64 systems with bbml2 no abort Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 1/6] arm64: mm: explicitly declare module and ftrace execmem regions Adrian Barnaś
@ 2026-06-11 13:01 ` Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 3/6] arm64: mm: fix restoring linear map permissions on execmem cache clean Adrian Barnaś
` (3 subsequent siblings)
5 siblings, 0 replies; 9+ messages in thread
From: Adrian Barnaś @ 2026-06-11 13:01 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-mm, Adrian Barnaś, Catalin Marinas, Will Deacon,
Ryan Roberts, David Hildenbrand, Mike Rapoport (Microsoft),
Ard Biesheuvel, Christoph Lameter, Yang Shi, Brendan Jackman
Remove the protection against huge vmap permission adjustments on
systems that support the bbml2_no_abort CPU feature.
Splitting live kernel VA section mappings into page mappings was
restricted because it could cause TLB Conflict Aborts. This forced
permission adjustments on memory allocated with VM_ALLOW_HUGE_VMAP to be
rejected, resulting in performance drops (e.g., when enforcing rodata=on
disables huge mappings).
The bbml2_no_abort feature (which mirrors the architectural guarantees of
FEAT_BBML3) ensures that changing between table and block sizes without
following a break-before-make sequence will not generate a TLB Conflict
Abort. This hardware guarantee makes it safe to allow dynamic permission
adjustments on huge vmap regions.
Signed-off-by: Adrian Barnaś <abarnas@google.com>
---
arch/arm64/mm/pageattr.c | 22 ++++++++++++++--------
1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 358d1dc9a576..88720bbba892 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -157,23 +157,29 @@ static int change_memory_common(unsigned long addr, int numpages,
}
/*
- * Kernel VA mappings are always live, and splitting live section
- * mappings into page mappings may cause TLB conflicts. This means
- * we have to ensure that changing the permission bits of the range
- * we are operating on does not result in such splitting.
- *
* Let's restrict ourselves to mappings created by vmalloc (or vmap).
- * Disallow VM_ALLOW_HUGE_VMAP mappings to guarantee that only page
- * mappings are updated and splitting is never needed.
*
* So check whether the [addr, addr + size) interval is entirely
* covered by precisely one VM area that has the VM_ALLOC flag set.
*/
area = find_vm_area((void *)addr);
+
if (!area ||
((unsigned long)kasan_reset_tag((void *)end) >
(unsigned long)kasan_reset_tag(area->addr) + area->size) ||
- ((area->flags & (VM_ALLOC | VM_ALLOW_HUGE_VMAP)) != VM_ALLOC))
+ !(area->flags & VM_ALLOC))
+ return -EINVAL;
+
+ /*
+ * Kernel VA mappings are always live, and splitting live section
+ * mappings into page mappings may cause TLB conflicts if bbml2_noabort
+ * is not present.
+ *
+ * When bbml2_noabort is not present, disallow VM_ALLOW_HUGE_VMAP mappings
+ * to guarantee that only page mappings are updated and splitting is not
+ * needed.
+ */
+ if (!system_supports_bbml2_noabort() && (area->flags & (VM_ALLOW_HUGE_VMAP)))
return -EINVAL;
if (!numpages)
--
2.54.0.1136.gdb2ca164c4-goog
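The check that this patch leaves in change_memory_common() can be summarised with a small userspace model. This is only a sketch: the flag values are illustrative and do not match the kernel's VM_* values, and the boolean parameter stands in for system_supports_bbml2_noabort().

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits only; the kernel's VM_* values differ. */
#define MODEL_VM_ALLOC           0x1u
#define MODEL_VM_ALLOW_HUGE_VMAP 0x2u

/*
 * Return 0 if a permission change on the area may proceed, -EINVAL
 * otherwise. The range-containment test from the patch is left out.
 */
int model_check_area(unsigned int area_flags, bool bbml2_noabort)
{
	/* Only vmalloc-created areas are eligible at all. */
	if (!(area_flags & MODEL_VM_ALLOC))
		return -EINVAL;

	/*
	 * Huge vmap areas may need a block mapping to be split, which is
	 * only allowed when the hardware guarantees no TLB Conflict Abort.
	 */
	if (!bbml2_noabort && (area_flags & MODEL_VM_ALLOW_HUGE_VMAP))
		return -EINVAL;

	return 0;
}
```

Compared with the old combined mask test, the huge-vmap rejection becomes conditional on the CPU feature, while the VM_ALLOC requirement stays unconditional.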
* [RFC PATCH 3/6] arm64: mm: fix restoring linear map permissions on execmem cache clean
2026-06-11 13:01 [RFC PATCH 0/6] arm64: mm: Introducing ROX CACHE to ARM64 systems with bbml2 no abort Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 1/6] arm64: mm: explicitly declare module and ftrace execmem regions Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 2/6] arm64: mm: allow huge vmap permission adjustments with bbml2_no_abort Adrian Barnaś
@ 2026-06-11 13:01 ` Adrian Barnaś
2026-06-11 13:54 ` Brendan Jackman
2026-06-11 13:01 ` [RFC PATCH 4/6] arm64: mm: add helper to fill execmem with trapping instructions Adrian Barnaś
` (2 subsequent siblings)
5 siblings, 1 reply; 9+ messages in thread
From: Adrian Barnaś @ 2026-06-11 13:01 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-mm, Adrian Barnaś, Catalin Marinas, Will Deacon,
Ryan Roberts, David Hildenbrand, Mike Rapoport (Microsoft),
Ard Biesheuvel, Christoph Lameter, Yang Shi, Brendan Jackman
Strip the read-only attribute from the selected memory range when
restoring the linear map after an execmem cache clean.
An execmem cache clean is performed when a cache block becomes empty
after unloading a module. When making the memory valid again, the linear
memory alias must also have its read-only attribute cleared.
Without this change, the linear memory alias remains read-only even
after the execmem cache block itself is freed, which prevents subsequent
allocations from writing to that memory.
Signed-off-by: Adrian Barnaś <abarnas@google.com>
---
arch/arm64/mm/pageattr.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 88720bbba892..eaefdf90b0d5 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -239,6 +239,13 @@ int set_memory_x(unsigned long addr, int numpages)
__pgprot(PTE_PXN));
}
+static int set_memory_default(unsigned long addr, int numpages)
+{
+ return __change_memory_common(addr, PAGE_SIZE * numpages,
+ __pgprot(PTE_VALID),
+ __pgprot(PTE_RDONLY));
+}
+
int set_memory_valid(unsigned long addr, int numpages, int enable)
{
if (enable)
@@ -362,7 +369,15 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
if (!can_set_direct_map())
return 0;
- return set_memory_valid(addr, nr, valid);
+ /*
+ * Execmem cache uses this function to reset permissions on the linear mapping
+ * when freeing an unused cache block. On x86 it makes memory RW, which is
+ * desirable. On ARM64 set_memory_valid() just changes the valid bit, which
+ * leaves the direct mapping read-only, so use set_memory_default() instead.
+ */
+
+ return valid ? set_memory_default(addr, nr) :
+ set_memory_valid(addr, nr, false);
}
#ifdef CONFIG_DEBUG_PAGEALLOC
--
2.54.0.1136.gdb2ca164c4-goog
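The difference between set_memory_valid() and the new set_memory_default() comes down to which bits are set and which are cleared. The userspace sketch below models a PTE as a plain 64-bit value. The bit positions are assumptions chosen for illustration, and the `model_*` helpers are hypothetical names, not kernel functions.

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define MODEL_PTE_VALID  (UINT64_C(1) << 0)
#define MODEL_PTE_RDONLY (UINT64_C(1) << 7)

/* Clear the bits in clear_mask, then set the bits in set_mask. */
uint64_t model_change(uint64_t pte, uint64_t set_mask, uint64_t clear_mask)
{
	return (pte & ~clear_mask) | set_mask;
}

/* Models set_memory_valid(..., 1): only the valid bit is touched. */
uint64_t model_set_valid(uint64_t pte)
{
	return model_change(pte, MODEL_PTE_VALID, 0);
}

/* Models set_memory_default(): valid is set and read-only is cleared. */
uint64_t model_set_default(uint64_t pte)
{
	return model_change(pte, MODEL_PTE_VALID, MODEL_PTE_RDONLY);
}
```

Starting from an invalid, read-only entry, model_set_valid() yields a valid but still read-only entry, which is the situation the commit message describes. model_set_default() yields a valid, writable one.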
* Re: [RFC PATCH 3/6] arm64: mm: fix restoring linear map permissions on execmem cache clean
2026-06-11 13:01 ` [RFC PATCH 3/6] arm64: mm: fix restoring linear map permissions on execmem cache clean Adrian Barnaś
@ 2026-06-11 13:54 ` Brendan Jackman
0 siblings, 0 replies; 9+ messages in thread
From: Brendan Jackman @ 2026-06-11 13:54 UTC (permalink / raw)
To: Adrian Barnaś, linux-arm-kernel
Cc: linux-mm, Catalin Marinas, Will Deacon, Ryan Roberts,
David Hildenbrand, Mike Rapoport (Microsoft), Ard Biesheuvel,
Christoph Lameter, Yang Shi, Brendan Jackman, owner-linux-mm
On Thu Jun 11, 2026 at 1:01 PM UTC, Adrian Barnaś wrote:
> Strip the read-only attribute from the selected memory range when
> restoring the linear map after an execmem cache clean.
>
> An execmem cache clean is performed when a cache block becomes empty
> after unloading a module. When making the memory valid again, the linear
> memory alias must also have its read-only attribute cleared.
>
> Without this change, the linear memory alias remains read-only even
> after the execmem cache block itself is freed, which prevents subsequent
> allocations from writing to that memory.
>
> Signed-off-by: Adrian Barnaś <abarnas@google.com>
> ---
> arch/arm64/mm/pageattr.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 88720bbba892..eaefdf90b0d5 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -239,6 +239,13 @@ int set_memory_x(unsigned long addr, int numpages)
> __pgprot(PTE_PXN));
> }
>
> +static int set_memory_default(unsigned long addr, int numpages)
> +{
> + return __change_memory_common(addr, PAGE_SIZE * numpages,
> + __pgprot(PTE_VALID),
> + __pgprot(PTE_RDONLY));
> +}
> +
> int set_memory_valid(unsigned long addr, int numpages, int enable)
> {
> if (enable)
> @@ -362,7 +369,15 @@ int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool valid)
> if (!can_set_direct_map())
> return 0;
>
> - return set_memory_valid(addr, nr, valid);
> + /*
> + * Execmem cache uses this function to reset permissions on the linear mapping
> + * when freeing an unused cache block. On x86 it makes memory RW, which is
> + * desirable.
Hm, maybe desirable for execmem but that doesn't really mean the x86
behaviour is correct. Maybe it makes more sense to change x86 to align
with the arm64 behaviour here?
BTW we should probably document this API a little bit, I never thought
about what "valid" actually means until now. I had thought of it as "I
can access this memory" but that's an unclear concept and now I realise
"valid" is a technical concept in Arm that's confusing. And it's extra
confusing if the kernel API uses "valid" to mean a _different_ thing.
Also, shouldn't execmem be using set_memory_default_noflush() before
freeing anyway? I guess that should even be documented as "if you change
anything you need to call this before you free the page".
> On ARM64 set_memory_valid() just changes the valid bit, which
> + * leaves the direct mapping read-only, so use set_memory_default() instead.
> + */
> +
> + return valid ? set_memory_default(addr, nr) :
> + set_memory_valid(addr, nr, false);
> }
>
> #ifdef CONFIG_DEBUG_PAGEALLOC
* [RFC PATCH 4/6] arm64: mm: add helper to fill execmem with trapping instructions
2026-06-11 13:01 [RFC PATCH 0/6] arm64: mm: Introducing ROX CACHE to ARM64 systems with bbml2 no abort Adrian Barnaś
` (2 preceding siblings ...)
2026-06-11 13:01 ` [RFC PATCH 3/6] arm64: mm: fix restoring linear map permissions on execmem cache clean Adrian Barnaś
@ 2026-06-11 13:01 ` Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 5/6] arm64: execmem: enable EXECMEM_ROX_CACHE on supported CPUs Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 6/6] arm64: mm: support PMD page coalescing in the linear map Adrian Barnaś
5 siblings, 0 replies; 9+ messages in thread
From: Adrian Barnaś @ 2026-06-11 13:01 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-mm, Adrian Barnaś, Catalin Marinas, Will Deacon,
Ryan Roberts, David Hildenbrand, Mike Rapoport (Microsoft),
Ard Biesheuvel, Christoph Lameter, Yang Shi, Brendan Jackman
Implement the architecture-specific execmem_fill_trapping_insns() helper
to poison executable memory regions.
When CONFIG_ARCH_HAS_EXECMEM_ROX is enabled, the execmem subsystem
requires a way to fill unused or freed executable memory with
architecture-specific trapping instructions. This implementation fills
the specified region with AARCH64_BREAK_FAULT instructions and flushes
the icache to ensure the traps are immediately visible to execution.
Signed-off-by: Adrian Barnaś <abarnas@google.com>
---
arch/arm64/mm/init.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index c673a9a839dd..71aa745e0bef 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -408,6 +408,20 @@ void dump_mem_limit(void)
}
#ifdef CONFIG_EXECMEM
+
+#ifdef CONFIG_ARCH_HAS_EXECMEM_ROX
+void execmem_fill_trapping_insns(void *ptr, size_t size)
+{
+ int nr_inst = size / AARCH64_INSN_SIZE;
+ __le32 *updptr = ptr;
+
+ for (int i = 0; i < nr_inst; i++)
+ updptr[i] = cpu_to_le32(AARCH64_BREAK_FAULT);
+
+ flush_icache_range((unsigned long)ptr, (unsigned long)ptr + size);
+}
+#endif
+
static u64 module_direct_base __ro_after_init = 0;
static u64 module_plt_base __ro_after_init = 0;
--
2.54.0.1136.gdb2ca164c4-goog
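The fill loop in execmem_fill_trapping_insns() can be tried out in userspace with the sketch below. It assumes that AARCH64_BREAK_FAULT encodes BRK #0x100 (0xd4202000). Treat that constant as an assumption rather than a quotation from the kernel headers. The function name is hypothetical, and the icache flush has no userspace equivalent here, so it is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_INSN_SIZE   4u
/* Assumed encoding of the trapping instruction: BRK #0x100. */
#define MODEL_BREAK_FAULT UINT32_C(0xd4202000)

/*
 * Fill [ptr, ptr + size) with little-endian trapping instructions.
 * As in the patch, only whole instructions are written: any trailing
 * bytes beyond the last multiple of 4 are left untouched.
 * Returns the number of instructions written.
 */
size_t model_fill_trapping_insns(void *ptr, size_t size)
{
	uint8_t *bytes = ptr;
	size_t nr_inst = size / MODEL_INSN_SIZE;

	for (size_t i = 0; i < nr_inst; i++) {
		uint8_t *insn = bytes + i * MODEL_INSN_SIZE;

		/* Store byte by byte so the result is little-endian on any host. */
		insn[0] = (uint8_t)(MODEL_BREAK_FAULT & 0xff);
		insn[1] = (uint8_t)((MODEL_BREAK_FAULT >> 8) & 0xff);
		insn[2] = (uint8_t)((MODEL_BREAK_FAULT >> 16) & 0xff);
		insn[3] = (uint8_t)((MODEL_BREAK_FAULT >> 24) & 0xff);
	}

	return nr_inst;
}
```

The byte-wise store plays the role of cpu_to_le32() in the patch, so the model behaves the same on big-endian and little-endian hosts.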
* [RFC PATCH 5/6] arm64: execmem: enable EXECMEM_ROX_CACHE on supported CPUs
2026-06-11 13:01 [RFC PATCH 0/6] arm64: mm: Introducing ROX CACHE to ARM64 systems with bbml2 no abort Adrian Barnaś
` (3 preceding siblings ...)
2026-06-11 13:01 ` [RFC PATCH 4/6] arm64: mm: add helper to fill execmem with trapping instructions Adrian Barnaś
@ 2026-06-11 13:01 ` Adrian Barnaś
2026-06-11 13:01 ` [RFC PATCH 6/6] arm64: mm: support PMD page coalescing in the linear map Adrian Barnaś
5 siblings, 0 replies; 9+ messages in thread
From: Adrian Barnaś @ 2026-06-11 13:01 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-mm, Adrian Barnaś, Catalin Marinas, Will Deacon,
Ryan Roberts, David Hildenbrand, Mike Rapoport (Microsoft),
Ard Biesheuvel, Christoph Lameter, Yang Shi, Brendan Jackman
Enable EXECMEM_ROX_CACHE support for ARM64 systems that implement
the bbml2_no_abort CPU feature.
Using the ROX cache brings a performance boost by reducing linear region
fragmentation caused by strict memory permissions (e.g., W^X enforcement).
Grouping executable code (which is read-only in the linear region alias)
into PMD-sized block mappings reduces TLB pressure and page table size.
This is only enabled on systems with bbml2_no_abort, as splitting
these large blocks to make pages writable during module loading would
otherwise risk triggering TLB Conflict Aborts.
Signed-off-by: Adrian Barnaś <abarnas@google.com>
---
arch/arm64/Kconfig | 1 +
arch/arm64/mm/init.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 38dba5f7e4d2..79c347ab841e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -285,6 +285,7 @@ config ARM64
select USER_STACKTRACE_SUPPORT
select VDSO_GETRANDOM
select VMAP_STACK
+ select ARCH_HAS_EXECMEM_ROX
help
ARM 64-bit (AArch64) Linux support.
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 71aa745e0bef..8269d7747b84 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -420,6 +420,12 @@ void execmem_fill_trapping_insns(void *ptr, size_t size)
flush_icache_range((unsigned long)ptr, (unsigned long)ptr + size);
}
+
+#define MODULE_TEXT_FLAG EXECMEM_ROX_CACHE
+#define MODULE_TEXT_PGPROT PAGE_KERNEL_ROX
+#else
+#define MODULE_TEXT_FLAG (0)
+#define MODULE_TEXT_PGPROT PAGE_KERNEL
#endif
static u64 module_direct_base __ro_after_init = 0;
@@ -511,6 +517,8 @@ struct execmem_info __init *execmem_arch_setup(void)
{
unsigned long fallback_start = 0, fallback_end = 0;
unsigned long start = 0, end = 0;
+ enum execmem_range_flags module_text_flags = 0;
+ pgprot_t module_text_pgprot = PAGE_KERNEL;
module_init_limits();
@@ -531,12 +539,24 @@ struct execmem_info __init *execmem_arch_setup(void)
end = module_plt_base + SZ_2G;
}
+ /*
+ * The ROX Cache requires bbml2_no_abort because it uses large block
+ * mappings. On systems without this guarantee, splitting these blocks
+ * to make pages writable for module loading can trigger TLB Conflict
+ * Aborts.
+ */
+ if (system_supports_bbml2_noabort()) {
+ module_text_flags = MODULE_TEXT_FLAG;
+ module_text_pgprot = MODULE_TEXT_PGPROT;
+ }
+
execmem_info = (struct execmem_info){
.ranges = {
[EXECMEM_MODULE_TEXT] = {
.start = start,
.end = end,
- .pgprot = PAGE_KERNEL,
+ .flags = module_text_flags,
+ .pgprot = module_text_pgprot,
.alignment = 1,
.fallback_start = fallback_start,
.fallback_end = fallback_end,
--
2.54.0.1136.gdb2ca164c4-goog
* [RFC PATCH 6/6] arm64: mm: support PMD page coalescing in the linear map
2026-06-11 13:01 [RFC PATCH 0/6] arm64: mm: Introducing ROX CACHE to ARM64 systems with bbml2 no abort Adrian Barnaś
` (4 preceding siblings ...)
2026-06-11 13:01 ` [RFC PATCH 5/6] arm64: execmem: enable EXECMEM_ROX_CACHE on supported CPUs Adrian Barnaś
@ 2026-06-11 13:01 ` Adrian Barnaś
5 siblings, 0 replies; 9+ messages in thread
From: Adrian Barnaś @ 2026-06-11 13:01 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-mm, Adrian Barnaś, Catalin Marinas, Will Deacon,
Ryan Roberts, David Hildenbrand, Mike Rapoport (Microsoft),
Ard Biesheuvel, Christoph Lameter, Yang Shi, Brendan Jackman
Implement PMD block coalescing to merge fragmented linear mapping regions
back into huge pages when restoring the read-only attribute.
When memory allocated with VM_ALLOW_HUGE_VMAP (such as for the execmem
ROX cache) has its permissions modified, the PMD block mapping is split
into individual PTEs. Without this change, when that memory later has its
RO attribute cleared and then set again, the mapping remains permanently
fragmented into 4K pages.
Signed-off-by: Adrian Barnaś <abarnas@google.com>
---
arch/arm64/include/asm/mmu.h | 1 +
arch/arm64/mm/mmu.c | 95 ++++++++++++++++++++++++++++++++++++
arch/arm64/mm/pageattr.c | 7 +++
3 files changed, 103 insertions(+)
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 137a173df1ff..19158bacb2df 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -80,6 +80,7 @@ extern void *fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot);
extern void mark_linear_text_alias_ro(void);
extern int split_kernel_leaf_mapping(unsigned long start, unsigned long end);
extern void linear_map_maybe_split_to_ptes(void);
+void try_collapse_kernel_pmd(unsigned long addr);
/*
* This check is triggered during the early boot before the cpufeature
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a6a00accf4f9..d74226fa1c9b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -769,6 +769,101 @@ static inline bool force_pte_mapping(void)
static DEFINE_MUTEX(pgtable_split_lock);
+static inline bool __pte_can_be_collapsed(pte_t pte, unsigned long pfn, pgprot_t prot)
+{
+ if (!pte_valid(pte))
+ return false;
+ if (pte_pfn(pte) != pfn)
+ return false;
+ if ((pgprot_val(pte_pgprot(pte)) & ~PTE_CONT) != pgprot_val(prot))
+ return false;
+
+ return true;
+}
+
+static void __try_collapse_pmd(pmd_t *pmdp, pmd_t pmd, unsigned long addr)
+{
+ pte_t *ptep;
+ pte_t first_pte;
+ unsigned long pfn;
+ pgprot_t prot;
+ int i;
+
+ ptep = (pte_t *)pmd_page_vaddr(pmd);
+ first_pte = __ptep_get(ptep);
+
+ if (!pte_valid(first_pte))
+ return;
+
+ prot = pte_pgprot(first_pte);
+ prot = __pgprot(pgprot_val(prot) & ~PTE_CONT);
+ pfn = pte_pfn(first_pte);
+
+ if (!IS_ALIGNED(pfn, PMD_SIZE >> PAGE_SHIFT))
+ return;
+
+ for (i = 1; i < PTRS_PER_PTE; i++) {
+ if (!__pte_can_be_collapsed(__ptep_get(ptep + i), pfn + i, prot))
+ return;
+ }
+
+ set_pmd(pmdp, pmd_mkhuge(pfn_pmd(pfn, prot)));
+
+ __flush_tlb_kernel_pgtable(addr);
+
+ if (static_branch_unlikely(&arm64_ptdump_lock_key)) {
+ mmap_read_lock(&init_mm);
+ mmap_read_unlock(&init_mm);
+ }
+
+ pte_free_kernel(NULL, ptep);
+}
+
+void try_collapse_kernel_pmd(unsigned long addr)
+{
+ pgd_t *pgdp;
+ p4d_t *p4dp;
+ pud_t *pudp;
+ pmd_t *pmdp;
+ pmd_t pmd;
+
+ /*
+ * collapse_pmd expects exact address of block to be collapsed
+ */
+ if (WARN_ON(ALIGN_DOWN(addr, PMD_SIZE) != addr))
+ return;
+
+ mutex_lock(&pgtable_split_lock);
+
+ pgdp = pgd_offset_k(addr);
+ if (pgd_none(READ_ONCE(*pgdp)))
+ goto out;
+
+ p4dp = p4d_offset(pgdp, addr);
+ if (p4d_none(READ_ONCE(*p4dp)))
+ goto out;
+
+ pudp = pud_offset(p4dp, addr);
+ if (pud_none(READ_ONCE(*pudp)))
+ goto out;
+
+ if (pud_leaf(READ_ONCE(*pudp)))
+ goto out;
+
+ pmdp = pmd_offset(pudp, addr);
+ pmd = pmdp_get(pmdp);
+
+ if (!pmd_table(pmd))
+ goto out;
+
+ lazy_mmu_mode_enable();
+ __try_collapse_pmd(pmdp, pmd, addr);
+ lazy_mmu_mode_disable();
+
+out:
+ mutex_unlock(&pgtable_split_lock);
+}
+
int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
{
int ret;
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index eaefdf90b0d5..11e0b60264c3 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -200,6 +200,13 @@ static int change_memory_common(unsigned long addr, int numpages,
if (ret)
return ret;
}
+ /*
+ * When setting a read-only flag on the linear region, the memory
+ * may have been backed by a PMD before being split. Try to
+ * collapse it back into a PMD to restore huge page performance.
+ */
+ if (pgprot_val(set_mask) == PTE_RDONLY && area->flags & VM_ALLOW_HUGE_VMAP)
+ try_collapse_kernel_pmd((u64)page_address(area->pages[0]));
}
/*
--
2.54.0.1136.gdb2ca164c4-goog
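The eligibility test performed by __try_collapse_pmd() and __pte_can_be_collapsed() can be modelled in userspace as follows. The sketch assumes 4K pages (512 entries per PTE table) and an illustrative position for the contiguous-hint bit. The struct and function names are hypothetical, and the actual page-table update, TLB flush and locking are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PTRS_PER_PTE 512u                 /* assumes 4K pages */
#define MODEL_PTE_VALID    (UINT64_C(1) << 0)   /* assumed bit position */
#define MODEL_PTE_CONT     (UINT64_C(1) << 52)  /* assumed bit position */

/* A PTE reduced to the two properties the check looks at. */
struct model_pte {
	uint64_t pfn;
	uint64_t prot;
};

/*
 * A PTE table can be replaced by one PMD block mapping only if:
 *  - the first entry is valid and its pfn is aligned to the block size,
 *  - every other entry is valid and maps the next consecutive pfn,
 *  - all entries share the same protection bits, ignoring the
 *    contiguous hint.
 */
bool model_can_collapse(const struct model_pte tbl[MODEL_PTRS_PER_PTE])
{
	uint64_t prot, pfn;

	if (!(tbl[0].prot & MODEL_PTE_VALID))
		return false;

	prot = tbl[0].prot & ~MODEL_PTE_CONT;
	pfn = tbl[0].pfn;

	if (pfn % MODEL_PTRS_PER_PTE)
		return false;

	for (size_t i = 1; i < MODEL_PTRS_PER_PTE; i++) {
		if (!(tbl[i].prot & MODEL_PTE_VALID))
			return false;
		if (tbl[i].pfn != pfn + i)
			return false;
		if ((tbl[i].prot & ~MODEL_PTE_CONT) != prot)
			return false;
	}

	return true;
}
```

A single entry with a different pfn or protection is enough to keep the range mapped with PTEs, which matches the early returns in the patch.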