* [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss
@ 2026-05-26 17:58 Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 01/15] arm64: mm: Remove bogus stop condition from map_mem() loop Ard Biesheuvel
` (14 more replies)
0 siblings, 15 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
One of the reasons the lack of randomization of the linear map on arm64
is considered problematic is the fact that bootloaders adhering to the
original arm64 boot protocol (i.e., a substantial fraction of all
Android phones) may place the kernel at the base of DRAM, and therefore
at the base of the non-randomized linear map. This puts a writable alias
of the kernel's data and bss regions at a predictable location, removing
the need for an attacker to guess where KASLR mapped the kernel.
Let's unmap this linear, writable alias entirely, so that knowing the
location of the linear alias does not give write access to the kernel's
data and bss regions.
Changes since v5:
- Reorder series in ascending order of impact, so that the first few can
be merged earlier if desired. This also makes the patch that remaps
the data/bss linear alias as tagged redundant, which is therefore
dropped.
- Add patch #3 to address an existing issue spotted by Sashiko
- Fix thinko in contiguous region check (#5), where the whole region
needs to be considered and not only the first entry (dropped Rb as
well) - this addresses the kfence issue Sashiko reported on v5 [0]
- Update commit log on #6 to clarify that changing permission bits on
PTE_CONT entries is safe as long as PTE_CONT itself does not change
- Likewise, drop hunk that adds the PTE_CONT bit to the 'permitted' mask
in pgattr_change_is_safe(), as changing it is not safe. (#8)
- Move kasan's additional page table to pgdir BSS as well
- Use (NOLOAD) on the .pgdir.bss section so it does not get emitted into
vmlinux
- Add powerpc and SuperH patches to deal with empty_zero_page[] being
made const
Changes since v4:
- Update the correct [early] mapping in patch #1
- Make empty_zero_page[] const instead of __ro_after_init
- Drop patches that remap the fixmap page tables r/o for now
- Don't force page mappings for the data/bss linear alias, as it is no
longer needed for set_memory_valid()
- Add acks
Changes since v3:
- Drop bogus patch adding hierarchical PXN to the fixmap mapping, which
breaks the KPTI trampoline (thanks to Sashiko)
- Add generic patch to move the empty_zero_page to __ro_after_init, as
it now lives in generic code.
- Add patches to remap the linear aliases of the fixmap page tables
read-only too - these live at an a priori known offset in the linear
map if physical KASLR was omitted, and control a priori known
addresses in the virtual kernel space.
- Rebase onto v7.1-rc1
Changes since v2:
- Keep bm_pte[] in the region that is remapped r/o or unmapped, as it is
only manipulated via its kernel alias
- Drop check that prohibits any manipulation of descriptors with the
CONT bit set
- Add Ryan's ack to a couple of patches
- Rebase onto v7.0-rc4
Changes since v1:
- Put zero page patch at the start of the series
- Tweak __map_memblock() API to respect existing table and contiguous
mappings, so that the logic to map the kernel alias can be simplified
- Stop abusing the MEMBLOCK_NOMAP flag to initially omit the kernel
linear alias from the linear map
- Some additional cleanup patches
- Use proper API [set_memory_valid()] to (un)map the linear alias of
data/bss.
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc; Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liz Prucka <lizprucka@google.com>
Cc: Seth Jenkins <sethjenkins@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jann Horn <jannh@google.com>
Cc: linux-mm@kvack.org
Cc: linux-hardening@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
[0] https://sashiko.dev/#/patchset/20260519151616.2557018-15-ardb%2Bgit%40google.com
Ard Biesheuvel (15):
arm64: mm: Remove bogus stop condition from map_mem() loop
arm64: mm: Drop redundant pgd_t* argument from map_mem()
arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
arm64: mm: Preserve existing table mappings when mapping DRAM
arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
arm64: mm: Permit contiguous descriptors to be manipulated
arm64: kfence: Avoid NOMAP tricks when mapping the early pool
arm64: mm: Permit contiguous attribute for preliminary mappings
arm64: Move fixmap and kasan page tables to end of kernel image
arm64: mm: Don't abuse memblock NOMAP to check for overlaps
arm64: mm: Map the kernel data/bss read-only in the linear map
powerpc/code-patching: Avoid r/w mapping of the zero page
sh: cast away constness from the zero page when flushing it from the
cache
mm: Make empty_zero_page[] const
arm64: mm: Unmap kernel data/bss entirely from the linear map
arch/arm64/include/asm/mmu.h | 2 +
arch/arm64/include/asm/pgtable.h | 4 +
arch/arm64/kernel/vmlinux.lds.S | 8 +-
arch/arm64/mm/fixmap.c | 6 +-
arch/arm64/mm/kasan_init.c | 2 +-
arch/arm64/mm/mmu.c | 153 ++++++++++++--------
arch/powerpc/lib/code-patching.c | 52 +------
arch/sh/mm/init.c | 2 +-
include/linux/pgtable.h | 2 +-
mm/mm_init.c | 2 +-
10 files changed, 111 insertions(+), 122 deletions(-)
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v6 01/15] arm64: mm: Remove bogus stop condition from map_mem() loop
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 02/15] arm64: mm: Drop redundant pgd_t* argument from map_mem() Ard Biesheuvel
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh,
Kevin Brodsky
From: Ard Biesheuvel <ardb@kernel.org>
The memblock API guarantees that start is not greater than or equal to
end, so there is no need to test it. And if it were, it is doubtful that
breaking out of the loop would be a reasonable course of action here
(rather than attempting to map the remaining regions)
So let's drop this check.
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dd85e093ffdb..112fa4a3b0eb 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1173,8 +1173,6 @@ static void __init map_mem(pgd_t *pgdp)
/* map all the memory banks */
for_each_mem_range(i, &start, &end) {
- if (start >= end)
- break;
/*
* The linear map must allow allocation tags reading/writing
* if MTE is present. Otherwise, it has the same attributes as
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 02/15] arm64: mm: Drop redundant pgd_t* argument from map_mem()
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 01/15] arm64: mm: Remove bogus stop condition from map_mem() loop Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 03/15] arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings Ard Biesheuvel
` (12 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh,
Kevin Brodsky
From: Ard Biesheuvel <ardb@kernel.org>
__map_memblock() and map_mem() always operate on swapper_pg_dir, so
there is no need to pass around a pgd_t pointer between them.
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 25 ++++++++++----------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 112fa4a3b0eb..aa0e2c6435f7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1035,11 +1035,11 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
flush_tlb_kernel_range(virt, virt + size);
}
-static void __init __map_memblock(pgd_t *pgdp, phys_addr_t start,
- phys_addr_t end, pgprot_t prot, int flags)
+static void __init __map_memblock(phys_addr_t start, phys_addr_t end,
+ pgprot_t prot, int flags)
{
- early_create_pgd_mapping(pgdp, start, __phys_to_virt(start), end - start,
- prot, early_pgtable_alloc, flags);
+ early_create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
+ end - start, prot, early_pgtable_alloc, flags);
}
void __init mark_linear_text_alias_ro(void)
@@ -1087,13 +1087,13 @@ static phys_addr_t __init arm64_kfence_alloc_pool(void)
return kfence_pool;
}
-static void __init arm64_kfence_map_pool(phys_addr_t kfence_pool, pgd_t *pgdp)
+static void __init arm64_kfence_map_pool(phys_addr_t kfence_pool)
{
if (!kfence_pool)
return;
/* KFENCE pool needs page-level mapping. */
- __map_memblock(pgdp, kfence_pool, kfence_pool + KFENCE_POOL_SIZE,
+ __map_memblock(kfence_pool, kfence_pool + KFENCE_POOL_SIZE,
pgprot_tagged(PAGE_KERNEL),
NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
@@ -1129,11 +1129,11 @@ bool arch_kfence_init_pool(void)
#else /* CONFIG_KFENCE */
static inline phys_addr_t arm64_kfence_alloc_pool(void) { return 0; }
-static inline void arm64_kfence_map_pool(phys_addr_t kfence_pool, pgd_t *pgdp) { }
+static inline void arm64_kfence_map_pool(phys_addr_t kfence_pool) { }
#endif /* CONFIG_KFENCE */
-static void __init map_mem(pgd_t *pgdp)
+static void __init map_mem(void)
{
static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);
phys_addr_t kernel_start = __pa_symbol(_text);
@@ -1178,7 +1178,7 @@ static void __init map_mem(pgd_t *pgdp)
* if MTE is present. Otherwise, it has the same attributes as
* PAGE_KERNEL.
*/
- __map_memblock(pgdp, start, end, pgprot_tagged(PAGE_KERNEL),
+ __map_memblock(start, end, pgprot_tagged(PAGE_KERNEL),
flags);
}
@@ -1192,10 +1192,9 @@ static void __init map_mem(pgd_t *pgdp)
* Note that contiguous mappings cannot be remapped in this way,
* so we should avoid them here.
*/
- __map_memblock(pgdp, kernel_start, kernel_end,
- PAGE_KERNEL, NO_CONT_MAPPINGS);
+ __map_memblock(kernel_start, kernel_end, PAGE_KERNEL, NO_CONT_MAPPINGS);
memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
- arm64_kfence_map_pool(early_kfence_pool, pgdp);
+ arm64_kfence_map_pool(early_kfence_pool);
}
void mark_rodata_ro(void)
@@ -1417,7 +1416,7 @@ static void __init create_idmap(void)
void __init paging_init(void)
{
- map_mem(swapper_pg_dir);
+ map_mem();
memblock_allow_resize();
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 03/15] arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 01/15] arm64: mm: Remove bogus stop condition from map_mem() loop Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 02/15] arm64: mm: Drop redundant pgd_t* argument from map_mem() Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 04/15] arm64: mm: Preserve existing table mappings when mapping DRAM Ard Biesheuvel
` (11 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
Sashiko reports:
| If pmd_set_huge() rejects an unsafe page table transition (such as
| mapping a different physical address over an existing block mapping),
| it returns 0 and leaves the page table entry unmodified.
|
| Because *pmdp remains unmodified, READ_ONCE(pmd_val(*pmdp)) will equal
| pmd_val(old_pmd). The transition from old_pmd to old_pmd is evaluated
| as safe by pgattr_change_is_safe(), so the BUG_ON never triggers.
|
| This allows invalid and unsafe mapping updates to be silently dropped
| instead of panicking, leaving stale memory mappings active while the
| caller assumes the update was successful.
The same applies to pud_set_huge() in alloc_init_pud().
Given how it is generally preferred to limp on rather than blow up the
system if an unexpected condition such as this one occurs, and the fact
that there are no known cases where this disparity results in real
problems, let's WARN on these failures rather than BUG, allowing the
system to survive to the point where it can actually report them.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index aa0e2c6435f7..b2ba5b35c35f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -257,7 +257,7 @@ static int init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
/* try section mapping first */
if (((addr | next | phys) & ~PMD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
- pmd_set_huge(pmdp, phys, prot);
+ WARN_ON(!pmd_set_huge(pmdp, phys, prot));
/*
* After the PMD entry has been populated once, we
@@ -380,7 +380,7 @@ static int alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
if (pud_sect_supported() &&
((addr | next | phys) & ~PUD_MASK) == 0 &&
(flags & NO_BLOCK_MAPPINGS) == 0) {
- pud_set_huge(pudp, phys, prot);
+ WARN_ON(!pud_set_huge(pudp, phys, prot));
/*
* After the PUD entry has been populated once, we
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 04/15] arm64: mm: Preserve existing table mappings when mapping DRAM
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (2 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 03/15] arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 05/15] arm64: mm: Preserve non-contiguous descriptors " Ard Biesheuvel
` (10 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
Instead of blindly overwriting an existing table entry when mapping DRAM
regions, take care not to replace a pre-existing table entry with a
block entry. This permits the logic of mapping the kernel's linear alias
to be simplified in a subsequent patch.
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b2ba5b35c35f..5c827fa3cd38 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -256,7 +256,8 @@ static int init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
/* try section mapping first */
if (((addr | next | phys) & ~PMD_MASK) == 0 &&
- (flags & NO_BLOCK_MAPPINGS) == 0) {
+ (flags & NO_BLOCK_MAPPINGS) == 0 &&
+ !pmd_table(old_pmd)) {
WARN_ON(!pmd_set_huge(pmdp, phys, prot));
/*
@@ -379,7 +380,8 @@ static int alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
*/
if (pud_sect_supported() &&
((addr | next | phys) & ~PUD_MASK) == 0 &&
- (flags & NO_BLOCK_MAPPINGS) == 0) {
+ (flags & NO_BLOCK_MAPPINGS) == 0 &&
+ !pud_table(old_pud)) {
WARN_ON(!pud_set_huge(pudp, phys, prot));
/*
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 05/15] arm64: mm: Preserve non-contiguous descriptors when mapping DRAM
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (3 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 04/15] arm64: mm: Preserve existing table mappings when mapping DRAM Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 06/15] arm64: mm: Permit contiguous descriptors to be manipulated Ard Biesheuvel
` (9 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
Instead of blindly overwriting existing live entries regardless of the
value of their contiguous bit when mapping DRAM regions at
contiguous-hint granularity, check whether the contiguous region in
question contains any valid descriptors that have the contiguous bit
cleared, and in that case, leave the contiguous bit unset on the entire
region. This permits the logic of mapping the kernel's linear alias to
be simplified in a subsequent patch.
Note that this can only result in a misprogrammed contiguous bit (as per
ARM ARM RNGLXZ) if the region in question already contains a mix of
valid contiguous and valid non-contiguous descriptors, in which case it
was already misprogrammed to begin with.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/pgtable.h | 4 ++++
arch/arm64/mm/mmu.c | 22 ++++++++++++++++++--
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 4dfa42b7d053..a1c5894332d9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -181,6 +181,10 @@ static inline pteval_t __phys_to_pte_val(phys_addr_t phys)
* Returns true if the pte is valid and has the contiguous bit set.
*/
#define pte_valid_cont(pte) (pte_valid(pte) && pte_cont(pte))
+/*
+ * Returns true if the pte is valid and has the contiguous bit cleared.
+ */
+#define pte_valid_noncont(pte) (pte_valid(pte) && !pte_cont(pte))
/*
* Could the pte be present in the TLB? We must check mm_tlb_flush_pending
* so that we don't erroneously return false for pages that have been
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5c827fa3cd38..6b42d724bd1b 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -187,6 +187,14 @@ static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
} while (ptep++, addr += PAGE_SIZE, addr != end);
}
+static bool pte_range_has_valid_noncont(pte_t *ptep)
+{
+ for (int i = 0; i < CONT_PTES; i++)
+ if (pte_valid_noncont(__ptep_get(&ptep[i])))
+ return true;
+ return false;
+}
+
static int alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
unsigned long end, phys_addr_t phys,
pgprot_t prot,
@@ -224,7 +232,8 @@ static int alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
/* use a contiguous mapping if the range is suitably aligned */
if ((((addr | next | phys) & ~CONT_PTE_MASK) == 0) &&
- (flags & NO_CONT_MAPPINGS) == 0)
+ (flags & NO_CONT_MAPPINGS) == 0 &&
+ !pte_range_has_valid_noncont(ptep))
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
init_pte(ptep, addr, next, phys, __prot);
@@ -283,6 +292,14 @@ static int init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
return 0;
}
+static bool pmd_range_has_valid_noncont(pmd_t *pmdp)
+{
+ for (int i = 0; i < CONT_PMDS; i++)
+ if (pte_valid_noncont(pmd_pte(READ_ONCE(pmdp[i]))))
+ return true;
+ return false;
+}
+
static int alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
unsigned long end, phys_addr_t phys,
pgprot_t prot,
@@ -324,7 +341,8 @@ static int alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
/* use a contiguous mapping if the range is suitably aligned */
if ((((addr | next | phys) & ~CONT_PMD_MASK) == 0) &&
- (flags & NO_CONT_MAPPINGS) == 0)
+ (flags & NO_CONT_MAPPINGS) == 0 &&
+ !pmd_range_has_valid_noncont(pmdp))
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
ret = init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc, flags);
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 06/15] arm64: mm: Permit contiguous descriptors to be manipulated
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (4 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 05/15] arm64: mm: Preserve non-contiguous descriptors " Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 07/15] arm64: kfence: Avoid NOMAP tricks when mapping the early pool Ard Biesheuvel
` (8 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
Currently, pgattr_change_is_safe() is overly pedantic when it comes to
descriptors with the contiguous hint attribute set, as it rejects
assignments even if the old and the new value are the same.
In fact, as per ARM ARM RJQQTC, manipulating descriptors with the
contiguous bit set is safe as long as the bit itself does not change
value, in the sense that no TLB conflict aborts or other exceptions may
be raised as a result. Inconsistent permission attributes within the
contiguous region may result in any of the alternatives to be taken to
apply to the entire region, which might be a programming error, but it
does not constitute an unsafe manipulation in terms of what
pgattr_change_is_safe() is intended to detect.
So drop the special PTE_CONT check, but still omit PTE_CONT from 'mask'
so that modifying the bit is still regarded as unsafe.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6b42d724bd1b..d7a6991e1844 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -134,10 +134,6 @@ bool pgattr_change_is_safe(pteval_t old, pteval_t new)
if (pte_pfn(__pte(old)) != pte_pfn(__pte(new)))
return false;
- /* live contiguous mappings may not be manipulated at all */
- if ((old | new) & PTE_CONT)
- return false;
-
/* Transitioning from Non-Global to Global is unsafe */
if (old & ~new & PTE_NG)
return false;
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 07/15] arm64: kfence: Avoid NOMAP tricks when mapping the early pool
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (5 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 06/15] arm64: mm: Permit contiguous descriptors to be manipulated Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 08/15] arm64: mm: Permit contiguous attribute for preliminary mappings Ard Biesheuvel
` (7 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
Now that the map_mem() routines respect existing page mappings and
contiguous granule sized blocks with the contiguous bit cleared, there
is no longer a reason to play tricks with the memblock NOMAP attribute.
Instead, the kfence pool can be allocated and mapped with page
granularity first, and this granularity will be respected when the rest
of DRAM is mapped later, even if block and contiguous mappings are
allowed for the remainder of those mappings.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 25 ++++----------------
1 file changed, 5 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index d7a6991e1844..55bb40348a47 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1083,36 +1083,24 @@ static int __init parse_kfence_early_init(char *arg)
}
early_param("kfence.sample_interval", parse_kfence_early_init);
-static phys_addr_t __init arm64_kfence_alloc_pool(void)
+static void __init arm64_kfence_map_pool(void)
{
phys_addr_t kfence_pool;
if (!kfence_early_init)
- return 0;
+ return;
kfence_pool = memblock_phys_alloc(KFENCE_POOL_SIZE, PAGE_SIZE);
if (!kfence_pool) {
pr_err("failed to allocate kfence pool\n");
kfence_early_init = false;
- return 0;
- }
-
- /* Temporarily mark as NOMAP. */
- memblock_mark_nomap(kfence_pool, KFENCE_POOL_SIZE);
-
- return kfence_pool;
-}
-
-static void __init arm64_kfence_map_pool(phys_addr_t kfence_pool)
-{
- if (!kfence_pool)
return;
+ }
/* KFENCE pool needs page-level mapping. */
__map_memblock(kfence_pool, kfence_pool + KFENCE_POOL_SIZE,
pgprot_tagged(PAGE_KERNEL),
NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
- memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
__kfence_pool = phys_to_virt(kfence_pool);
}
@@ -1144,8 +1132,7 @@ bool arch_kfence_init_pool(void)
}
#else /* CONFIG_KFENCE */
-static inline phys_addr_t arm64_kfence_alloc_pool(void) { return 0; }
-static inline void arm64_kfence_map_pool(phys_addr_t kfence_pool) { }
+static inline void arm64_kfence_map_pool(void) { }
#endif /* CONFIG_KFENCE */
@@ -1155,7 +1142,6 @@ static void __init map_mem(void)
phys_addr_t kernel_start = __pa_symbol(_text);
phys_addr_t kernel_end = __pa_symbol(__init_begin);
phys_addr_t start, end;
- phys_addr_t early_kfence_pool;
int flags = NO_EXEC_MAPPINGS;
u64 i;
@@ -1172,7 +1158,7 @@ static void __init map_mem(void)
BUILD_BUG_ON(pgd_index(direct_map_end - 1) == pgd_index(direct_map_end) &&
pgd_index(_PAGE_OFFSET(VA_BITS_MIN)) != PTRS_PER_PGD - 1);
- early_kfence_pool = arm64_kfence_alloc_pool();
+ arm64_kfence_map_pool();
linear_map_requires_bbml2 = !force_pte_mapping() && can_set_direct_map();
@@ -1210,7 +1196,6 @@ static void __init map_mem(void)
*/
__map_memblock(kernel_start, kernel_end, PAGE_KERNEL, NO_CONT_MAPPINGS);
memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
- arm64_kfence_map_pool(early_kfence_pool);
}
void mark_rodata_ro(void)
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 08/15] arm64: mm: Permit contiguous attribute for preliminary mappings
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (6 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 07/15] arm64: kfence: Avoid NOMAP tricks when mapping the early pool Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 09/15] arm64: Move fixmap and kasan page tables to end of kernel image Ard Biesheuvel
` (6 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
There are a few cases where we omit the contiguous hint for mappings
that start out as read-write and are remapped read-only later, on the
basis that manipulating live descriptors with the PTE_CONT attribute set
is unsafe. When support for the contiguous hint was added to the code,
the ARM ARM was ambiguous about this, and so we erred on the side of
caution.
In the meantime, this has been clarified [0], and regions that will be
remapped in their entirety, retaining the contiguous bit on all entries,
can use the contiguous hint both in the initial mapping as well as the
one that replaces it. Note that this requires that the logic that may be
called to remap overlapping regions respects existing valid descriptors
that have the contiguous bit cleared.
So omit the NO_CONT_MAPPINGS flag in places where it is unneeded.
Thanks to Ryan for the reference.
[0] RJQQTC
For a TLB lookup in a contiguous region mapped by translation table entries that
have consistent values for the Contiguous bit, but have the OA, attributes, or
permissions misprogrammed, that TLB lookup is permitted to produce an OA, access
permissions, and memory attributes that are consistent with any one of the
programmed translation table values.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 55bb40348a47..04cc579c7a15 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1016,8 +1016,7 @@ void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
&phys, virt);
return;
}
- early_create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
- NO_CONT_MAPPINGS);
+ early_create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, 0);
}
void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
@@ -1044,8 +1043,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
return;
}
- early_create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
- NO_CONT_MAPPINGS);
+ early_create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL, 0);
/* flush the TLBs after updating live kernel mappings */
flush_tlb_kernel_range(virt, virt + size);
@@ -1191,10 +1189,8 @@ static void __init map_mem(void)
* alternative patching has completed). This makes the contents
* of the region accessible to subsystems such as hibernate,
* but protects it from inadvertent modification or execution.
- * Note that contiguous mappings cannot be remapped in this way,
- * so we should avoid them here.
*/
- __map_memblock(kernel_start, kernel_end, PAGE_KERNEL, NO_CONT_MAPPINGS);
+ __map_memblock(kernel_start, kernel_end, PAGE_KERNEL, 0);
memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
}
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 09/15] arm64: Move fixmap and kasan page tables to end of kernel image
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (7 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 08/15] arm64: mm: Permit contiguous attribute for preliminary mappings Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 10/15] arm64: mm: Don't abuse memblock NOMAP to check for overlaps Ard Biesheuvel
` (5 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh,
Kevin Brodsky
From: Ard Biesheuvel <ardb@kernel.org>
Move the fixmap and kasan page tables out of the BSS section, and place
them at the end of the image, right before the init_pg_dir section where
some of the other statically allocated page tables live.
These page tables are currently the only data objects in vmlinux that
are meant to be accessed via the kernel image's linear alias, and so
placing them together allows the remainder of the data/bss section to be
remapped read-only or unmapped entirely.
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/mmu.h | 2 ++
arch/arm64/kernel/vmlinux.lds.S | 8 +++++++-
arch/arm64/mm/fixmap.c | 6 +++---
arch/arm64/mm/kasan_init.c | 2 +-
4 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 5e1211c540ab..fb95754f2876 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -13,6 +13,8 @@
#ifndef __ASSEMBLER__
+#define __pgtbl_bss __section(".pgdir.bss") __aligned(PAGE_SIZE)
+
#include <linux/refcount.h>
#include <asm/cpufeature.h>
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index e1ac876200a3..2b0ebfb30c63 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -349,9 +349,15 @@ SECTIONS
_edata = .;
/* start of zero-init region */
- BSS_SECTION(SBSS_ALIGN, 0, 0)
+ BSS_SECTION(SBSS_ALIGN, 0, PAGE_SIZE)
__pi___bss_start = __bss_start;
+ /* fixmap BSS starts here - preceding data/BSS is omitted from the linear map */
+ .pgdir.bss (NOLOAD) : ALIGN(PAGE_SIZE) {
+ *(.pgdir.bss)
+ }
+ ASSERT(ADDR(.pgdir.bss) == __bss_stop, ".pgdir.bss must follow BSS")
+
. = ALIGN(PAGE_SIZE);
__pi_init_pg_dir = .;
. += INIT_DIR_SIZE;
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index c5c5425791da..1a3bbd67dd76 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -31,9 +31,9 @@ static_assert(NR_BM_PMD_TABLES == 1);
#define BM_PTE_TABLE_IDX(addr) __BM_TABLE_IDX(addr, PMD_SHIFT)
-static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
-static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
-static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
+static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __pgtbl_bss;
+static pmd_t bm_pmd[PTRS_PER_PMD] __pgtbl_bss __maybe_unused;
+static pud_t bm_pud[PTRS_PER_PUD] __pgtbl_bss __maybe_unused;
static inline pte_t *fixmap_pte(unsigned long addr)
{
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index abeb81bf6ebd..dbf22cae82ee 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -214,7 +214,7 @@ asmlinkage void __init kasan_early_init(void)
* shadow pud_t[]/p4d_t[], which could end up getting corrupted
* when the linear region is mapped.
*/
- static pte_t tbl[PTRS_PER_PTE] __page_aligned_bss;
+ static pte_t tbl[PTRS_PER_PTE] __pgtbl_bss;
pgd_t *pgdp = pgd_offset_k(KASAN_SHADOW_START);
set_pgd(pgdp, __pgd(__pa_symbol(tbl) | PGD_TYPE_TABLE));
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 10/15] arm64: mm: Don't abuse memblock NOMAP to check for overlaps
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (8 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 09/15] arm64: Move fixmap and kasan page tables to end of kernel image Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 11/15] arm64: mm: Map the kernel data/bss read-only in the linear map Ard Biesheuvel
` (4 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
Now that the linear region mapping routines respect existing table
mappings and contiguous block and page mappings, it is no longer needed
to fiddle with the memblock tables to set and clear the NOMAP attribute
in order to omit text and rodata when creating the linear map.
Instead, map the kernel text and rodata alias first with the desired
initial attributes and granularity, so that the loop iterating over the
memblocks will not remap it in a manner that prevents it from being
remapped with updated attributes later.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 23 ++++++--------------
1 file changed, 7 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 04cc579c7a15..b20c76b8381d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1164,12 +1164,14 @@ static void __init map_mem(void)
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
/*
- * Take care not to create a writable alias for the
- * read-only text and rodata sections of the kernel image.
- * So temporarily mark them as NOMAP to skip mappings in
- * the following for-loop
+ * Map the linear alias of the [_text, __init_begin) interval
+ * as non-executable now, and remove the write permission in
+ * mark_linear_text_alias_ro() above (which will be called after
+ * alternative patching has completed). This makes the contents
+ * of the region accessible to subsystems such as hibernate,
+ * but protects it from inadvertent modification or execution.
*/
- memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
+ __map_memblock(kernel_start, kernel_end, PAGE_KERNEL, flags);
/* map all the memory banks */
for_each_mem_range(i, &start, &end) {
@@ -1181,17 +1183,6 @@ static void __init map_mem(void)
__map_memblock(start, end, pgprot_tagged(PAGE_KERNEL),
flags);
}
-
- /*
- * Map the linear alias of the [_text, __init_begin) interval
- * as non-executable now, and remove the write permission in
- * mark_linear_text_alias_ro() below (which will be called after
- * alternative patching has completed). This makes the contents
- * of the region accessible to subsystems such as hibernate,
- * but protects it from inadvertent modification or execution.
- */
- __map_memblock(kernel_start, kernel_end, PAGE_KERNEL, 0);
- memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
}
void mark_rodata_ro(void)
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 11/15] arm64: mm: Map the kernel data/bss read-only in the linear map
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (9 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 10/15] arm64: mm: Don't abuse memblock NOMAP to check for overlaps Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 12/15] powerpc/code-patching: Avoid r/w mapping of the zero page Ard Biesheuvel
` (3 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
On systems where the bootloader adheres to the original arm64 boot
protocol, the placement of the kernel in the physical address space is
highly predictable, and this makes the placement of its linear alias in
the kernel virtual address space equally predictable, given the lack of
randomization of the linear map.
The linear aliases of the kernel text and rodata regions are already
mapped read-only, but the kernel data and bss are mapped read-write in
this region. This is not needed, so map them read-only as well.
Note that the statically allocated kernel page tables do need to be
modifiable via the linear map, so leave these mapped read-write.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b20c76b8381d..e7ca53d20b87 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1138,7 +1138,9 @@ static void __init map_mem(void)
{
static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);
phys_addr_t kernel_start = __pa_symbol(_text);
- phys_addr_t kernel_end = __pa_symbol(__init_begin);
+ phys_addr_t init_begin = __pa_symbol(__init_begin);
+ phys_addr_t init_end = __pa_symbol(__init_end);
+ phys_addr_t kernel_end = __pa_symbol(__bss_stop);
phys_addr_t start, end;
int flags = NO_EXEC_MAPPINGS;
u64 i;
@@ -1171,7 +1173,11 @@ static void __init map_mem(void)
* of the region accessible to subsystems such as hibernate,
* but protects it from inadvertent modification or execution.
*/
- __map_memblock(kernel_start, kernel_end, PAGE_KERNEL, flags);
+ __map_memblock(kernel_start, init_begin, PAGE_KERNEL, flags);
+
+ /* Map the kernel data/bss so it can be remapped later */
+ __map_memblock(init_end, kernel_end, pgprot_tagged(PAGE_KERNEL),
+ flags);
/* map all the memory banks */
for_each_mem_range(i, &start, &end) {
@@ -1183,6 +1189,11 @@ static void __init map_mem(void)
__map_memblock(start, end, pgprot_tagged(PAGE_KERNEL),
flags);
}
+
+ /* Map the kernel data/bss read-only in the linear map */
+ __map_memblock(init_end, kernel_end, PAGE_KERNEL_RO, flags);
+ flush_tlb_kernel_range((unsigned long)lm_alias(__init_end),
+ (unsigned long)lm_alias(__bss_stop));
}
void mark_rodata_ro(void)
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 12/15] powerpc/code-patching: Avoid r/w mapping of the zero page
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (10 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 11/15] arm64: mm: Map the kernel data/bss read-only in the linear map Ard Biesheuvel
@ 2026-05-26 17:58 ` Ard Biesheuvel
2026-05-26 17:59 ` [PATCH v6 13/15] sh: cast away constness from the zero page when flushing it from the cache Ard Biesheuvel
` (2 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:58 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh,
Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy (CS GROUP)
From: Ard Biesheuvel <ardb@kernel.org>
The only remaining use of map_patch_area() is mapping the zero page, and
immediately unmapping it again so that the intermediate page table
levels are all guaranteed to be populated.
The use of the zero page here is completely arbitrary, and not harmful
per se, but currently, it creates a writable mapping, and does so in a
manner that requires that the empty_zero_page[] symbol is not
const-qualified.
Given that this is about to change, and that map_patch_area() now never
maps anything other than the zero page, let's simplify the code and
- remove the helpers and call [un]map_kernel_page() directly
- take the PA of empty_zero_page directly
- create a read-only temporary mapping.
This allows empty_zero_page[] to be repainted as const u8[] in a
subsequent patch, without making substantial changes to this code
patching logic.
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: "Christophe Leroy (CS GROUP)" <chleroy@kernel.org>
Link: https://lore.kernel.org/all/20260520085423.485402-1-ardb@kernel.org/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/powerpc/lib/code-patching.c | 52 +-------------------
1 file changed, 2 insertions(+), 50 deletions(-)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index f84e0337cc02..44ff9f684bef 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -60,9 +60,6 @@ struct patch_context {
static DEFINE_PER_CPU(struct patch_context, cpu_patching_context);
-static int map_patch_area(void *addr, unsigned long text_poke_addr);
-static void unmap_patch_area(unsigned long addr);
-
static bool mm_patch_enabled(void)
{
return IS_ENABLED(CONFIG_SMP) && radix_enabled();
@@ -117,11 +114,11 @@ static int text_area_cpu_up(unsigned int cpu)
// Map/unmap the area to ensure all page tables are pre-allocated
addr = (unsigned long)area->addr;
- err = map_patch_area(empty_zero_page, addr);
+ err = map_kernel_page(addr, __pa_symbol(empty_zero_page), PAGE_KERNEL_RO);
if (err)
return err;
- unmap_patch_area(addr);
+ unmap_kernel_page(addr);
this_cpu_write(cpu_patching_context.area, area);
this_cpu_write(cpu_patching_context.addr, addr);
@@ -233,51 +230,6 @@ static unsigned long get_patch_pfn(void *addr)
return __pa_symbol(addr) >> PAGE_SHIFT;
}
-/*
- * This can be called for kernel text or a module.
- */
-static int map_patch_area(void *addr, unsigned long text_poke_addr)
-{
- unsigned long pfn = get_patch_pfn(addr);
-
- return map_kernel_page(text_poke_addr, (pfn << PAGE_SHIFT), PAGE_KERNEL);
-}
-
-static void unmap_patch_area(unsigned long addr)
-{
- pte_t *ptep;
- pmd_t *pmdp;
- pud_t *pudp;
- p4d_t *p4dp;
- pgd_t *pgdp;
-
- pgdp = pgd_offset_k(addr);
- if (WARN_ON(pgd_none(*pgdp)))
- return;
-
- p4dp = p4d_offset(pgdp, addr);
- if (WARN_ON(p4d_none(*p4dp)))
- return;
-
- pudp = pud_offset(p4dp, addr);
- if (WARN_ON(pud_none(*pudp)))
- return;
-
- pmdp = pmd_offset(pudp, addr);
- if (WARN_ON(pmd_none(*pmdp)))
- return;
-
- ptep = pte_offset_kernel(pmdp, addr);
- if (WARN_ON(pte_none(*ptep)))
- return;
-
- /*
- * In hash, pte_clear flushes the tlb, in radix, we have to
- */
- pte_clear(&init_mm, addr, ptep);
- flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
-}
-
static int __do_patch_mem_mm(void *addr, unsigned long val, bool is_dword)
{
int err;
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 13/15] sh: cast away constness from the zero page when flushing it from the cache
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (11 preceding siblings ...)
2026-05-26 17:58 ` [PATCH v6 12/15] powerpc/code-patching: Avoid r/w mapping of the zero page Ard Biesheuvel
@ 2026-05-26 17:59 ` Ard Biesheuvel
2026-05-26 17:59 ` [PATCH v6 14/15] mm: Make empty_zero_page[] const Ard Biesheuvel
2026-05-26 17:59 ` [PATCH v6 15/15] arm64: mm: Unmap kernel data/bss entirely from the linear map Ard Biesheuvel
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:59 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh,
Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz
From: Ard Biesheuvel <ardb@kernel.org>
SH performs cache maintenance on the zero page during boot, presumably
to ensure that any clearing of BSS that has occurred at startup is
visible to other CPUs and DMA devices.
The __flush_wback_region() function takes a void* argument, which is
conceptually sound, but given that empty_zero_page[] must never be
modified, it is being repainted as const, making it incompatible with a
void* formal parameter.
Given the above, and the fact that __flush_wback_region() is in fact a
function pointer variable with multiple implementations, take the easy
way out, and cast away the constness in this particular invocation.
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/sh/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 4e40d5e96be9..acbb481cdbfe 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -332,7 +332,7 @@ void __init mem_init(void)
cpu_cache_init();
/* clear the zero-page */
- __flush_wback_region(empty_zero_page, PAGE_SIZE);
+ __flush_wback_region((void *)empty_zero_page, PAGE_SIZE);
vsyscall_init();
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 14/15] mm: Make empty_zero_page[] const
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (12 preceding siblings ...)
2026-05-26 17:59 ` [PATCH v6 13/15] sh: cast away constness from the zero page when flushing it from the cache Ard Biesheuvel
@ 2026-05-26 17:59 ` Ard Biesheuvel
2026-05-26 17:59 ` [PATCH v6 15/15] arm64: mm: Unmap kernel data/bss entirely from the linear map Ard Biesheuvel
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:59 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh,
Kevin Brodsky, Feng Tang
From: Ard Biesheuvel <ardb@kernel.org>
The empty zero page is used to back any kernel or user space mapping
that is supposed to remain cleared, and so the page itself is never
supposed to be modified.
So mark it as const, which moves it into .rodata rather than .bss: on
most architectures, this ensures that both the kernel's mapping of it
and any aliases that are accessible via the kernel direct (linear) map
are mapped read-only, and cannot be used (inadvertently or maliciously)
to corrupt the contents of the zero page.
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
include/linux/pgtable.h | 2 +-
mm/mm_init.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index cdd68ed3ae1a..67aa23814010 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1993,7 +1993,7 @@ static inline unsigned long zero_pfn(unsigned long addr)
return zero_page_pfn;
}
-extern uint8_t empty_zero_page[PAGE_SIZE];
+extern const uint8_t empty_zero_page[PAGE_SIZE];
extern struct page *__zero_page;
static inline struct page *_zero_page(unsigned long addr)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index f9f8e1af921c..46cf001238c5 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -57,7 +57,7 @@ unsigned long zero_page_pfn __ro_after_init;
EXPORT_SYMBOL(zero_page_pfn);
#ifndef __HAVE_COLOR_ZERO_PAGE
-uint8_t empty_zero_page[PAGE_SIZE] __page_aligned_bss;
+const uint8_t empty_zero_page[PAGE_SIZE] __aligned(PAGE_SIZE);
EXPORT_SYMBOL(empty_zero_page);
struct page *__zero_page __ro_after_init;
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v6 15/15] arm64: mm: Unmap kernel data/bss entirely from the linear map
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
` (13 preceding siblings ...)
2026-05-26 17:59 ` [PATCH v6 14/15] mm: Make empty_zero_page[] const Ard Biesheuvel
@ 2026-05-26 17:59 ` Ard Biesheuvel
14 siblings, 0 replies; 16+ messages in thread
From: Ard Biesheuvel @ 2026-05-26 17:59 UTC (permalink / raw)
To: linux-arm-kernel
Cc: linux-kernel, will, catalin.marinas, mark.rutland, Ard Biesheuvel,
Ryan Roberts, Anshuman Khandual, Liz Prucka, Seth Jenkins,
Kees Cook, Mike Rapoport, David Hildenbrand, Andrew Morton,
Jann Horn, linux-mm, linux-hardening, linuxppc-dev, linux-sh
From: Ard Biesheuvel <ardb@kernel.org>
The linear aliases of the kernel text and rodata are mapped read-only in
the linear map as well. Given that the contents of these regions are
mostly identical to the version in the loadable image, mapping them
read-only and leaving their contents visible is a reasonable hardening
measure.
Data and bss, however, are now also mapped read-only but the contents of
these regions are more likely to contain data that we'd rather not leak.
So let's unmap these entirely in the linear map when the kernel is
running normally.
When going into hibernation or waking up from it, these regions need to
be mapped, so map the region initially, and toggle the valid bit so
map/unmap the region as needed. (While the hibernation snapshot logic
seems able to map inaccessible pages as needed, it currently disregards
non-present pages entirely.)
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/mm/mmu.c | 39 +++++++++++++++++---
1 file changed, 34 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index e7ca53d20b87..cb00e42abbe1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -24,6 +24,7 @@
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <linux/set_memory.h>
+#include <linux/suspend.h>
#include <linux/kfence.h>
#include <linux/pkeys.h>
#include <linux/mm_inline.h>
@@ -1056,6 +1057,29 @@ static void __init __map_memblock(phys_addr_t start, phys_addr_t end,
end - start, prot, early_pgtable_alloc, flags);
}
+static void remap_linear_data_alias(bool unmap)
+{
+ set_memory_valid((unsigned long)lm_alias(__init_end),
+ (unsigned long)(__bss_stop - __init_end) / PAGE_SIZE,
+ !unmap);
+}
+
+static int arm64_hibernate_pm_notify(struct notifier_block *nb,
+ unsigned long mode, void *unused)
+{
+ switch (mode) {
+ default:
+ break;
+ case PM_POST_HIBERNATION:
+ remap_linear_data_alias(true);
+ break;
+ case PM_HIBERNATION_PREPARE:
+ remap_linear_data_alias(false);
+ break;
+ }
+ return 0;
+}
+
void __init mark_linear_text_alias_ro(void)
{
/*
@@ -1064,6 +1088,16 @@ void __init mark_linear_text_alias_ro(void)
update_mapping_prot(__pa_symbol(_text), (unsigned long)lm_alias(_text),
(unsigned long)__init_begin - (unsigned long)_text,
PAGE_KERNEL_RO);
+
+ remap_linear_data_alias(true);
+
+ if (IS_ENABLED(CONFIG_HIBERNATION)) {
+ static struct notifier_block nb = {
+ .notifier_call = arm64_hibernate_pm_notify
+ };
+
+ register_pm_notifier(&nb);
+ }
}
#ifdef CONFIG_KFENCE
@@ -1189,11 +1223,6 @@ static void __init map_mem(void)
__map_memblock(start, end, pgprot_tagged(PAGE_KERNEL),
flags);
}
-
- /* Map the kernel data/bss read-only in the linear map */
- __map_memblock(init_end, kernel_end, PAGE_KERNEL_RO, flags);
- flush_tlb_kernel_range((unsigned long)lm_alias(__init_end),
- (unsigned long)lm_alias(__bss_stop));
}
void mark_rodata_ro(void)
--
2.54.0.794.g4f17f83d09-goog
^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2026-05-26 18:24 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-26 17:58 [PATCH v6 00/15] arm64: Unmap linear alias of kernel data/bss Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 01/15] arm64: mm: Remove bogus stop condition from map_mem() loop Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 02/15] arm64: mm: Drop redundant pgd_t* argument from map_mem() Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 03/15] arm64: mm: Check for pud_/pmd_set_huge() failures on kernel mappings Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 04/15] arm64: mm: Preserve existing table mappings when mapping DRAM Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 05/15] arm64: mm: Preserve non-contiguous descriptors " Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 06/15] arm64: mm: Permit contiguous descriptors to be manipulated Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 07/15] arm64: kfence: Avoid NOMAP tricks when mapping the early pool Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 08/15] arm64: mm: Permit contiguous attribute for preliminary mappings Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 09/15] arm64: Move fixmap and kasan page tables to end of kernel image Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 10/15] arm64: mm: Don't abuse memblock NOMAP to check for overlaps Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 11/15] arm64: mm: Map the kernel data/bss read-only in the linear map Ard Biesheuvel
2026-05-26 17:58 ` [PATCH v6 12/15] powerpc/code-patching: Avoid r/w mapping of the zero page Ard Biesheuvel
2026-05-26 17:59 ` [PATCH v6 13/15] sh: cast away constness from the zero page when flushing it from the cache Ard Biesheuvel
2026-05-26 17:59 ` [PATCH v6 14/15] mm: Make empty_zero_page[] const Ard Biesheuvel
2026-05-26 17:59 ` [PATCH v6 15/15] arm64: mm: Unmap kernel data/bss entirely from the linear map Ard Biesheuvel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox