* [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT
@ 2026-03-30 16:17 Ryan Roberts
2026-03-30 16:17 ` [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests Ryan Roberts
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Ryan Roberts @ 2026-03-30 16:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, David Hildenbrand (Arm), Dev Jain,
Yang Shi, Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel
Hi All,
This fixes a couple of bugs in the "large block mappings for linear map when we
have BBML2_NOABORT" feature when used in conjunction with a CCA realm guest.
While investigating I found and fixed some more general issues too. See commit
logs for full explanations.
Applies on top of v7.0-rc4.
Changes since v1 [1]
====================
Patch 1:
- Moved page_alloc_available declaration to asm/mmu.h (per Kevin)
- Added PTE_PRESENT_VALID_KERNEL macro to hide VALID|NG confusion (per Kevin)
- Improved logic in split_kernel_leaf_mapping() to avoid warning for
DEBUG_PAGEALLOC (per Sashiko)
Patch 2:
- Fixed transitional pgtables to handle present-invalid large leaves (per
Sashiko)
- Hardened split_pXd() for present-invalid leaves (per Sashiko)
Patch 3:
- Converted pXd_leaf() to function to avoid multi-eval of READ_ONCE() (per
Sashiko)
[1] https://lore.kernel.org/all/20260323130317.1737522-1-ryan.roberts@arm.com/
Thanks,
Ryan
Ryan Roberts (3):
arm64: mm: Fix rodata=full block mapping support for realm guests
arm64: mm: Handle invalid large leaf mappings correctly
arm64: mm: Remove pmd_sect() and pud_sect()
arch/arm64/include/asm/mmu.h | 2 +
arch/arm64/include/asm/pgtable-prot.h | 2 +
arch/arm64/include/asm/pgtable.h | 28 +++++++----
arch/arm64/mm/init.c | 9 +++-
arch/arm64/mm/mmu.c | 67 ++++++++++++++++++---------
arch/arm64/mm/pageattr.c | 50 +++++++++++---------
arch/arm64/mm/trans_pgd.c | 42 +++--------------
7 files changed, 111 insertions(+), 89 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-03-30 16:17 [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT Ryan Roberts
@ 2026-03-30 16:17 ` Ryan Roberts
2026-03-31 14:35 ` Suzuki K Poulose
2026-04-02 20:43 ` Catalin Marinas
2026-03-30 16:17 ` [PATCH v2 2/3] arm64: mm: Handle invalid large leaf mappings correctly Ryan Roberts
` (2 subsequent siblings)
3 siblings, 2 replies; 18+ messages in thread
From: Ryan Roberts @ 2026-03-30 16:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, David Hildenbrand (Arm), Dev Jain,
Yang Shi, Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel, stable
Commit a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") enabled the linear map to be mapped by block/cont while
still allowing granular permission changes on BBML2_NOABORT systems by
lazily splitting the live mappings. This mechanism was intended to be
usable by realm guests since they need to dynamically share dma buffers
with the host by "decrypting" them - which for Arm CCA, means marking
them as shared in the page tables.
However, it turns out that the mechanism was failing for realm guests
because realms need to share their dma buffers (via
__set_memory_enc_dec()) much earlier during boot than
split_kernel_leaf_mapping() was able to handle. The report linked below
showed that GIC's ITS was one such user. But during the investigation I
found other callsites that could not meet the
split_kernel_leaf_mapping() constraints.
The problem is that we block map the linear map based on the boot CPU
supporting BBML2_NOABORT, then check that all the other CPUs support it
too when finalizing the caps. If they don't, then we stop_machine() and
split to ptes. For safety, split_kernel_leaf_mapping() previously
wouldn't permit splitting until after the caps were finalized. That
ensured that if any secondary cpus were running that didn't support
BBML2_NOABORT, we wouldn't risk breaking them.
I've fixed this problem by reducing the black-out window where we refuse
to split; there are now 2 windows. The first is from T0 until the page
allocator is initialized. Splitting allocates memory from the page
allocator so it must be available. The second covers the period from
starting to online the secondary cpus until the system caps are
finalized (this is a very small window).
All of the problematic callers are calling __set_memory_enc_dec() before
the secondary cpus come online, so this solves the problem. However, one
of these callers, swiotlb_update_mem_attributes(), was trying to split
before the page allocator was initialized. So I have moved this call
from arch_mm_preinit() to mem_init(), which solves the ordering issue.
I've added warnings and now return an error if any attempt is made to
split during either black-out window.
Note there are other issues which prevent booting all the way to user
space, which will be fixed in subsequent patches.
Reported-by: Jinjiang Tu <tujinjiang@huawei.com>
Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/
Fixes: a166563e7ec37 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/arm64/include/asm/mmu.h | 2 ++
arch/arm64/mm/init.c | 9 +++++++-
arch/arm64/mm/mmu.c | 45 +++++++++++++++++++++++++-----------
3 files changed, 42 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 137a173df1ff8..472610433aaea 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -112,5 +112,7 @@ void kpti_install_ng_mappings(void);
static inline void kpti_install_ng_mappings(void) {}
#endif
+extern bool page_alloc_available;
+
#endif /* !__ASSEMBLER__ */
#endif
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 96711b8578fd0..b9b248d24fd10 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -350,7 +350,6 @@ void __init arch_mm_preinit(void)
}
swiotlb_init(swiotlb, flags);
- swiotlb_update_mem_attributes();
/*
* Check boundaries twice: Some fundamental inconsistencies can be
@@ -377,6 +376,14 @@ void __init arch_mm_preinit(void)
}
}
+bool page_alloc_available __ro_after_init;
+
+void __init mem_init(void)
+{
+ page_alloc_available = true;
+ swiotlb_update_mem_attributes();
+}
+
void free_initmem(void)
{
void *lm_init_begin = lm_alias(__init_begin);
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a6a00accf4f93..223947487a223 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -768,30 +768,51 @@ static inline bool force_pte_mapping(void)
}
static DEFINE_MUTEX(pgtable_split_lock);
+static bool linear_map_requires_bbml2;
int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
{
int ret;
- /*
- * !BBML2_NOABORT systems should not be trying to change permissions on
- * anything that is not pte-mapped in the first place. Just return early
- * and let the permission change code raise a warning if not already
- * pte-mapped.
- */
- if (!system_supports_bbml2_noabort())
- return 0;
-
/*
* If the region is within a pte-mapped area, there is no need to try to
* split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
* change permissions from atomic context so for those cases (which are
* always pte-mapped), we must not go any further because taking the
- * mutex below may sleep.
+ * mutex below may sleep. Do not call force_pte_mapping() here because
+ * it could return a confusing result if called from a secondary cpu
+ * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
+ * what we need.
*/
- if (force_pte_mapping() || is_kfence_address((void *)start))
+ if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
return 0;
+ if (!system_supports_bbml2_noabort()) {
+ /*
+ * !BBML2_NOABORT systems should not be trying to change
+ * permissions on anything that is not pte-mapped in the first
+ * place. Just return early and let the permission change code
+ * raise a warning if not already pte-mapped.
+ */
+ if (system_capabilities_finalized())
+ return 0;
+
+ /*
+ * Boot-time: split_kernel_leaf_mapping_locked() allocates from
+ * page allocator. Can't split until it's available.
+ */
+ if (WARN_ON(!page_alloc_available))
+ return -EBUSY;
+
+ /*
+ * Boot-time: Started secondary cpus but don't know if they
+ * support BBML2_NOABORT yet. Can't allow splitting in this
+ * window in case they don't.
+ */
+ if (WARN_ON(num_online_cpus() > 1))
+ return -EBUSY;
+ }
+
/*
* Ensure start and end are at least page-aligned since this is the
* finest granularity we can split to.
@@ -891,8 +912,6 @@ static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp
return ret;
}
-static bool linear_map_requires_bbml2 __initdata;
-
u32 idmap_kpti_bbml2_flag;
static void __init init_idmap_kpti_bbml2_flag(void)
--
2.43.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 2/3] arm64: mm: Handle invalid large leaf mappings correctly
2026-03-30 16:17 [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT Ryan Roberts
2026-03-30 16:17 ` [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests Ryan Roberts
@ 2026-03-30 16:17 ` Ryan Roberts
2026-03-30 16:17 ` [PATCH v2 3/3] arm64: mm: Remove pmd_sect() and pud_sect() Ryan Roberts
2026-04-02 21:11 ` [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT Catalin Marinas
3 siblings, 0 replies; 18+ messages in thread
From: Ryan Roberts @ 2026-03-30 16:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, David Hildenbrand (Arm), Dev Jain,
Yang Shi, Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel, stable
It has been possible for a long time to mark ptes in the linear map as
invalid. This is done for secretmem, kfence, realm dma memory un/share,
and others, by simply clearing the PTE_VALID bit. But until commit
a166563e7ec37 ("arm64: mm: support large block mapping when
rodata=full") large leaf mappings were never made invalid in this way.
It turns out various parts of the code base are not equipped to handle
invalid large leaf mappings (in the way they are currently encoded) and
I've observed a kernel panic while booting a realm guest on a
BBML2_NOABORT system as a result:
[ 15.432706] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
[ 15.476896] Unable to handle kernel paging request at virtual address ffff000019600000
[ 15.513762] Mem abort info:
[ 15.527245] ESR = 0x0000000096000046
[ 15.548553] EC = 0x25: DABT (current EL), IL = 32 bits
[ 15.572146] SET = 0, FnV = 0
[ 15.592141] EA = 0, S1PTW = 0
[ 15.612694] FSC = 0x06: level 2 translation fault
[ 15.640644] Data abort info:
[ 15.661983] ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000
[ 15.694875] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 15.723740] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 15.755776] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081f3f000
[ 15.800410] [ffff000019600000] pgd=0000000000000000, p4d=180000009ffff403, pud=180000009fffe403, pmd=00e8000199600704
[ 15.855046] Internal error: Oops: 0000000096000046 [#1] SMP
[ 15.886394] Modules linked in:
[ 15.900029] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc4-dirty #4 PREEMPT
[ 15.935258] Hardware name: linux,dummy-virt (DT)
[ 15.955612] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 15.986009] pc : __pi_memcpy_generic+0x128/0x22c
[ 16.006163] lr : swiotlb_bounce+0xf4/0x158
[ 16.024145] sp : ffff80008000b8f0
[ 16.038896] x29: ffff80008000b8f0 x28: 0000000000000000 x27: 0000000000000000
[ 16.069953] x26: ffffb3976d261ba8 x25: 0000000000000000 x24: ffff000019600000
[ 16.100876] x23: 0000000000000001 x22: ffff0000043430d0 x21: 0000000000007ff0
[ 16.131946] x20: 0000000084570010 x19: 0000000000000000 x18: ffff00001ffe3fcc
[ 16.163073] x17: 0000000000000000 x16: 00000000003fffff x15: 646e612065766974
[ 16.194131] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 16.225059] x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000018
[ 16.256113] x8 : 0000000000000018 x7 : 0000000000000000 x6 : 0000000000000000
[ 16.287203] x5 : ffff000019607ff0 x4 : ffff000004578000 x3 : ffff000019600000
[ 16.318145] x2 : 0000000000007ff0 x1 : ffff000004570010 x0 : ffff000019600000
[ 16.349071] Call trace:
[ 16.360143] __pi_memcpy_generic+0x128/0x22c (P)
[ 16.380310] swiotlb_tbl_map_single+0x154/0x2b4
[ 16.400282] swiotlb_map+0x5c/0x228
[ 16.415984] dma_map_phys+0x244/0x2b8
[ 16.432199] dma_map_page_attrs+0x44/0x58
[ 16.449782] virtqueue_map_page_attrs+0x38/0x44
[ 16.469596] virtqueue_map_single_attrs+0xc0/0x130
[ 16.490509] virtnet_rq_alloc.isra.0+0xa4/0x1fc
[ 16.510355] try_fill_recv+0x2a4/0x584
[ 16.526989] virtnet_open+0xd4/0x238
[ 16.542775] __dev_open+0x110/0x24c
[ 16.558280] __dev_change_flags+0x194/0x20c
[ 16.576879] netif_change_flags+0x24/0x6c
[ 16.594489] dev_change_flags+0x48/0x7c
[ 16.611462] ip_auto_config+0x258/0x1114
[ 16.628727] do_one_initcall+0x80/0x1c8
[ 16.645590] kernel_init_freeable+0x208/0x2f0
[ 16.664917] kernel_init+0x24/0x1e0
[ 16.680295] ret_from_fork+0x10/0x20
[ 16.696369] Code: 927cec03 cb0e0021 8b0e0042 a9411c26 (a900340c)
[ 16.723106] ---[ end trace 0000000000000000 ]---
[ 16.752866] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 16.792556] Kernel Offset: 0x3396ea200000 from 0xffff800080000000
[ 16.818966] PHYS_OFFSET: 0xfff1000080000000
[ 16.837237] CPU features: 0x0000000,00060005,13e38581,957e772f
[ 16.862904] Memory Limit: none
[ 16.876526] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
This panic occurs because the swiotlb memory was previously shared to
the host (__set_memory_enc_dec()), which involves transitioning the
(large) leaf mappings to invalid, sharing to the host, then marking the
mappings valid again. But pageattr_p[mu]d_entry() would only update the
entry if it is a section mapping, since otherwise it concluded it must
be a table entry so shouldn't be modified. But p[mu]d_sect() only
returns true if the entry is valid. So the result was that the large
leaf entry was made invalid in the first pass then ignored in the second
pass. It remains invalid until the above code tries to access it and
blows up.
The simple fix would be to update pageattr_pmd_entry() to use
!pmd_table() instead of pmd_sect(). That would solve this problem.
But the ptdump code also suffers from a similar issue. It checks
pmd_leaf() and doesn't call into the arch-specific note_page() machinery
if it returns false. As a result of this, ptdump wasn't even able to
show the invalid large leaf mappings; it looked like they were valid,
which made this super fun to debug. The ptdump code is core-mm and
pmd_table() is arm64-specific so we can't use the same trick to solve
that.
But we already support the concept of "present-invalid" for user space
entries. And even better, pmd_leaf() will return true for a leaf mapping
that is marked present-invalid. So let's just use that encoding for
present-invalid kernel mappings too. Then we can use pmd_leaf() where we
previously used pmd_sect() and everything is magically fixed.
Additionally, from inspection, kernel_page_present() was broken in a
similar way, so I'm also updating that to use pmd_leaf().
The transitional page tables component was also similarly broken; it
creates a copy of the kernel page tables, making RO leaf mappings RW in
the process. It also makes invalid (but-not-none) pte mappings valid.
But it was not doing this for large leaf mappings. This could have
resulted in crashes at kexec- or hibernate-time. This code is fixed to
flip "present-invalid" mappings back to "present-valid" at all levels.
Finally, I have hardened split_pmd()/split_pud() so that if it is passed
a "present-invalid" leaf, it will maintain that property in the split
leaves, since I wasn't able to convince myself that it would only ever
be called for "present-valid" leaves.
Fixes: a166563e7ec37 ("arm64: mm: support large block mapping when rodata=full")
Cc: stable@vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/arm64/include/asm/pgtable-prot.h | 2 ++
arch/arm64/include/asm/pgtable.h | 9 +++--
arch/arm64/mm/mmu.c | 4 +++
arch/arm64/mm/pageattr.c | 50 +++++++++++++++------------
arch/arm64/mm/trans_pgd.c | 42 ++++------------------
5 files changed, 48 insertions(+), 59 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index f560e64202674..212ce1b02e15e 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -25,6 +25,8 @@
*/
#define PTE_PRESENT_INVALID (PTE_NG) /* only when !PTE_VALID */
+#define PTE_PRESENT_VALID_KERNEL (PTE_VALID | PTE_MAYBE_NG)
+
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
#define PTE_UFFD_WP (_AT(pteval_t, 1) << 58) /* uffd-wp tracking */
#define PTE_SWP_UFFD_WP (_AT(pteval_t, 1) << 3) /* only for swp ptes */
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3e58735c49bd..dd062179b9b66 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -322,9 +322,11 @@ static inline pte_t pte_mknoncont(pte_t pte)
return clear_pte_bit(pte, __pgprot(PTE_CONT));
}
-static inline pte_t pte_mkvalid(pte_t pte)
+static inline pte_t pte_mkvalid_k(pte_t pte)
{
- return set_pte_bit(pte, __pgprot(PTE_VALID));
+ pte = clear_pte_bit(pte, __pgprot(PTE_PRESENT_INVALID));
+ pte = set_pte_bit(pte, __pgprot(PTE_PRESENT_VALID_KERNEL));
+ return pte;
}
static inline pte_t pte_mkinvalid(pte_t pte)
@@ -594,6 +596,7 @@ static inline int pmd_protnone(pmd_t pmd)
#define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd)))
#define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
+#define pmd_mkvalid_k(pmd) pte_pmd(pte_mkvalid_k(pmd_pte(pmd)))
#define pmd_mkinvalid(pmd) pte_pmd(pte_mkinvalid(pmd_pte(pmd)))
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
#define pmd_uffd_wp(pmd) pte_uffd_wp(pmd_pte(pmd))
@@ -635,6 +638,8 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
#define pud_young(pud) pte_young(pud_pte(pud))
#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
+#define pud_mkwrite_novma(pud) pte_pud(pte_mkwrite_novma(pud_pte(pud)))
+#define pud_mkvalid_k(pud) pte_pud(pte_mkvalid_k(pud_pte(pud)))
#define pud_write(pud) pte_write(pud_pte(pud))
static inline pud_t pud_mkhuge(pud_t pud)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 223947487a223..1575680675d8d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -602,6 +602,8 @@ static int split_pmd(pmd_t *pmdp, pmd_t pmd, gfp_t gfp, bool to_cont)
tableprot |= PMD_TABLE_PXN;
prot = __pgprot((pgprot_val(prot) & ~PTE_TYPE_MASK) | PTE_TYPE_PAGE);
+ if (!pmd_valid(pmd))
+ prot = pte_pgprot(pte_mkinvalid(pfn_pte(0, prot)));
prot = __pgprot(pgprot_val(prot) & ~PTE_CONT);
if (to_cont)
prot = __pgprot(pgprot_val(prot) | PTE_CONT);
@@ -647,6 +649,8 @@ static int split_pud(pud_t *pudp, pud_t pud, gfp_t gfp, bool to_cont)
tableprot |= PUD_TABLE_PXN;
prot = __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
+ if (!pud_valid(pud))
+ prot = pmd_pgprot(pmd_mkinvalid(pfn_pmd(0, prot)));
prot = __pgprot(pgprot_val(prot) & ~PTE_CONT);
if (to_cont)
prot = __pgprot(pgprot_val(prot) | PTE_CONT);
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 358d1dc9a576f..ce035e1b4eaf6 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -25,6 +25,11 @@ static ptdesc_t set_pageattr_masks(ptdesc_t val, struct mm_walk *walk)
{
struct page_change_data *masks = walk->private;
+ /*
+ * Some users clear and set bits which alias each other (e.g. PTE_NG and
+ * PTE_PRESENT_INVALID). It is therefore important that we always clear
+ * first then set.
+ */
val &= ~(pgprot_val(masks->clear_mask));
val |= (pgprot_val(masks->set_mask));
@@ -36,7 +41,7 @@ static int pageattr_pud_entry(pud_t *pud, unsigned long addr,
{
pud_t val = pudp_get(pud);
- if (pud_sect(val)) {
+ if (pud_leaf(val)) {
if (WARN_ON_ONCE((next - addr) != PUD_SIZE))
return -EINVAL;
val = __pud(set_pageattr_masks(pud_val(val), walk));
@@ -52,7 +57,7 @@ static int pageattr_pmd_entry(pmd_t *pmd, unsigned long addr,
{
pmd_t val = pmdp_get(pmd);
- if (pmd_sect(val)) {
+ if (pmd_leaf(val)) {
if (WARN_ON_ONCE((next - addr) != PMD_SIZE))
return -EINVAL;
val = __pmd(set_pageattr_masks(pmd_val(val), walk));
@@ -132,11 +137,12 @@ static int __change_memory_common(unsigned long start, unsigned long size,
ret = update_range_prot(start, size, set_mask, clear_mask);
/*
- * If the memory is being made valid without changing any other bits
- * then a TLBI isn't required as a non-valid entry cannot be cached in
- * the TLB.
+ * If the memory is being switched from present-invalid to valid without
+ * changing any other bits then a TLBI isn't required as a non-valid
+ * entry cannot be cached in the TLB.
*/
- if (pgprot_val(set_mask) != PTE_VALID || pgprot_val(clear_mask))
+ if (pgprot_val(set_mask) != PTE_PRESENT_VALID_KERNEL ||
+ pgprot_val(clear_mask) != PTE_PRESENT_INVALID)
flush_tlb_kernel_range(start, start + size);
return ret;
}
@@ -237,18 +243,18 @@ int set_memory_valid(unsigned long addr, int numpages, int enable)
{
if (enable)
return __change_memory_common(addr, PAGE_SIZE * numpages,
- __pgprot(PTE_VALID),
- __pgprot(0));
+ __pgprot(PTE_PRESENT_VALID_KERNEL),
+ __pgprot(PTE_PRESENT_INVALID));
else
return __change_memory_common(addr, PAGE_SIZE * numpages,
- __pgprot(0),
- __pgprot(PTE_VALID));
+ __pgprot(PTE_PRESENT_INVALID),
+ __pgprot(PTE_PRESENT_VALID_KERNEL));
}
int set_direct_map_invalid_noflush(struct page *page)
{
- pgprot_t clear_mask = __pgprot(PTE_VALID);
- pgprot_t set_mask = __pgprot(0);
+ pgprot_t clear_mask = __pgprot(PTE_PRESENT_VALID_KERNEL);
+ pgprot_t set_mask = __pgprot(PTE_PRESENT_INVALID);
if (!can_set_direct_map())
return 0;
@@ -259,8 +265,8 @@ int set_direct_map_invalid_noflush(struct page *page)
int set_direct_map_default_noflush(struct page *page)
{
- pgprot_t set_mask = __pgprot(PTE_VALID | PTE_WRITE);
- pgprot_t clear_mask = __pgprot(PTE_RDONLY);
+ pgprot_t set_mask = __pgprot(PTE_PRESENT_VALID_KERNEL | PTE_WRITE);
+ pgprot_t clear_mask = __pgprot(PTE_PRESENT_INVALID | PTE_RDONLY);
if (!can_set_direct_map())
return 0;
@@ -296,8 +302,8 @@ static int __set_memory_enc_dec(unsigned long addr,
* entries or Synchronous External Aborts caused by RIPAS_EMPTY
*/
ret = __change_memory_common(addr, PAGE_SIZE * numpages,
- __pgprot(set_prot),
- __pgprot(clear_prot | PTE_VALID));
+ __pgprot(set_prot | PTE_PRESENT_INVALID),
+ __pgprot(clear_prot | PTE_PRESENT_VALID_KERNEL));
if (ret)
return ret;
@@ -311,8 +317,8 @@ static int __set_memory_enc_dec(unsigned long addr,
return ret;
return __change_memory_common(addr, PAGE_SIZE * numpages,
- __pgprot(PTE_VALID),
- __pgprot(0));
+ __pgprot(PTE_PRESENT_VALID_KERNEL),
+ __pgprot(PTE_PRESENT_INVALID));
}
static int realm_set_memory_encrypted(unsigned long addr, int numpages)
@@ -404,15 +410,15 @@ bool kernel_page_present(struct page *page)
pud = READ_ONCE(*pudp);
if (pud_none(pud))
return false;
- if (pud_sect(pud))
- return true;
+ if (pud_leaf(pud))
+ return pud_valid(pud);
pmdp = pmd_offset(pudp, addr);
pmd = READ_ONCE(*pmdp);
if (pmd_none(pmd))
return false;
- if (pmd_sect(pmd))
- return true;
+ if (pmd_leaf(pmd))
+ return pmd_valid(pmd);
ptep = pte_offset_kernel(pmdp, addr);
return pte_valid(__ptep_get(ptep));
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 18543b603c77b..cca9706a875c3 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -31,36 +31,6 @@ static void *trans_alloc(struct trans_pgd_info *info)
return info->trans_alloc_page(info->trans_alloc_arg);
}
-static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
-{
- pte_t pte = __ptep_get(src_ptep);
-
- if (pte_valid(pte)) {
- /*
- * Resume will overwrite areas that may be marked
- * read only (code, rodata). Clear the RDONLY bit from
- * the temporary mappings we use during restore.
- */
- __set_pte(dst_ptep, pte_mkwrite_novma(pte));
- } else if (!pte_none(pte)) {
- /*
- * debug_pagealloc will removed the PTE_VALID bit if
- * the page isn't in use by the resume kernel. It may have
- * been in use by the original kernel, in which case we need
- * to put it back in our copy to do the restore.
- *
- * Other cases include kfence / vmalloc / memfd_secret which
- * may call `set_direct_map_invalid_noflush()`.
- *
- * Before marking this entry valid, check the pfn should
- * be mapped.
- */
- BUG_ON(!pfn_valid(pte_pfn(pte)));
-
- __set_pte(dst_ptep, pte_mkvalid(pte_mkwrite_novma(pte)));
- }
-}
-
static int copy_pte(struct trans_pgd_info *info, pmd_t *dst_pmdp,
pmd_t *src_pmdp, unsigned long start, unsigned long end)
{
@@ -76,7 +46,11 @@ static int copy_pte(struct trans_pgd_info *info, pmd_t *dst_pmdp,
src_ptep = pte_offset_kernel(src_pmdp, start);
do {
- _copy_pte(dst_ptep, src_ptep, addr);
+ pte_t pte = __ptep_get(src_ptep);
+
+ if (pte_none(pte))
+ continue;
+ __set_pte(dst_ptep, pte_mkvalid_k(pte_mkwrite_novma(pte)));
} while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr != end);
return 0;
@@ -109,8 +83,7 @@ static int copy_pmd(struct trans_pgd_info *info, pud_t *dst_pudp,
if (copy_pte(info, dst_pmdp, src_pmdp, addr, next))
return -ENOMEM;
} else {
- set_pmd(dst_pmdp,
- __pmd(pmd_val(pmd) & ~PMD_SECT_RDONLY));
+ set_pmd(dst_pmdp, pmd_mkvalid_k(pmd_mkwrite_novma(pmd)));
}
} while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
@@ -145,8 +118,7 @@ static int copy_pud(struct trans_pgd_info *info, p4d_t *dst_p4dp,
if (copy_pmd(info, dst_pudp, src_pudp, addr, next))
return -ENOMEM;
} else {
- set_pud(dst_pudp,
- __pud(pud_val(pud) & ~PUD_SECT_RDONLY));
+ set_pud(dst_pudp, pud_mkvalid_k(pud_mkwrite_novma(pud)));
}
} while (dst_pudp++, src_pudp++, addr = next, addr != end);
--
2.43.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 3/3] arm64: mm: Remove pmd_sect() and pud_sect()
2026-03-30 16:17 [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT Ryan Roberts
2026-03-30 16:17 ` [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests Ryan Roberts
2026-03-30 16:17 ` [PATCH v2 2/3] arm64: mm: Handle invalid large leaf mappings correctly Ryan Roberts
@ 2026-03-30 16:17 ` Ryan Roberts
2026-04-02 21:11 ` [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT Catalin Marinas
3 siblings, 0 replies; 18+ messages in thread
From: Ryan Roberts @ 2026-03-30 16:17 UTC (permalink / raw)
To: Catalin Marinas, Will Deacon, David Hildenbrand (Arm), Dev Jain,
Yang Shi, Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel, linux-kernel
The semantics of pXd_leaf() are very similar to pXd_sect(). The only
difference is that pXd_sect() only considers it a section if PTE_VALID
is set, whereas pXd_leaf() permits both "valid" and "present-invalid"
types.
Using pXd_sect() has caused issues now that large leaf entries can be
present-invalid since commit a166563e7ec37 ("arm64: mm: support large
block mapping when rodata=full"), so let's just remove the API and
standardize on pXd_leaf().
There are a few callsites of the form pXd_leaf(READ_ONCE(*pXdp)). This
was previously fine for the pXd_sect() macro because it only evaluated
its argument once. But pXd_leaf() evaluates its argument multiple times.
So let's avoid unintended side effects by reimplementing pXd_leaf() as
an inline function.
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
arch/arm64/include/asm/pgtable.h | 19 ++++++++++++-------
arch/arm64/mm/mmu.c | 18 +++++++++---------
2 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index dd062179b9b66..5bc42b85acfc0 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -784,9 +784,13 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
#define pmd_table(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
PMD_TYPE_TABLE)
-#define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
- PMD_TYPE_SECT)
-#define pmd_leaf(pmd) (pmd_present(pmd) && !pmd_table(pmd))
+
+#define pmd_leaf pmd_leaf
+static inline bool pmd_leaf(pmd_t pmd)
+{
+ return pmd_present(pmd) && !pmd_table(pmd);
+}
+
#define pmd_bad(pmd) (!pmd_table(pmd))
#define pmd_leaf_size(pmd) (pmd_cont(pmd) ? CONT_PMD_SIZE : PMD_SIZE)
@@ -804,11 +808,8 @@ static inline int pmd_trans_huge(pmd_t pmd)
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
#if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
-static inline bool pud_sect(pud_t pud) { return false; }
static inline bool pud_table(pud_t pud) { return true; }
#else
-#define pud_sect(pud) ((pud_val(pud) & PUD_TYPE_MASK) == \
- PUD_TYPE_SECT)
#define pud_table(pud) ((pud_val(pud) & PUD_TYPE_MASK) == \
PUD_TYPE_TABLE)
#endif
@@ -878,7 +879,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
PUD_TYPE_TABLE)
#define pud_present(pud) pte_present(pud_pte(pud))
#ifndef __PAGETABLE_PMD_FOLDED
-#define pud_leaf(pud) (pud_present(pud) && !pud_table(pud))
+#define pud_leaf pud_leaf
+static inline bool pud_leaf(pud_t pud)
+{
+ return pud_present(pud) && !pud_table(pud);
+}
#else
#define pud_leaf(pud) false
#endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1575680675d8d..dcee56bb622ad 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -204,7 +204,7 @@ static int alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
pmd_t pmd = READ_ONCE(*pmdp);
pte_t *ptep;
- BUG_ON(pmd_sect(pmd));
+ BUG_ON(pmd_leaf(pmd));
if (pmd_none(pmd)) {
pmdval_t pmdval = PMD_TYPE_TABLE | PMD_TABLE_UXN | PMD_TABLE_AF;
phys_addr_t pte_phys;
@@ -303,7 +303,7 @@ static int alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
/*
* Check for initial section mappings in the pgd/pud.
*/
- BUG_ON(pud_sect(pud));
+ BUG_ON(pud_leaf(pud));
if (pud_none(pud)) {
pudval_t pudval = PUD_TYPE_TABLE | PUD_TABLE_UXN | PUD_TABLE_AF;
phys_addr_t pmd_phys;
@@ -1503,7 +1503,7 @@ static void unmap_hotplug_pmd_range(pud_t *pudp, unsigned long addr,
continue;
WARN_ON(!pmd_present(pmd));
- if (pmd_sect(pmd)) {
+ if (pmd_leaf(pmd)) {
pmd_clear(pmdp);
/*
@@ -1536,7 +1536,7 @@ static void unmap_hotplug_pud_range(p4d_t *p4dp, unsigned long addr,
continue;
WARN_ON(!pud_present(pud));
- if (pud_sect(pud)) {
+ if (pud_leaf(pud)) {
pud_clear(pudp);
/*
@@ -1650,7 +1650,7 @@ static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
if (pmd_none(pmd))
continue;
- WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
+ WARN_ON(!pmd_present(pmd) || !pmd_table(pmd));
free_empty_pte_table(pmdp, addr, next, floor, ceiling);
} while (addr = next, addr < end);
@@ -1690,7 +1690,7 @@ static void free_empty_pud_table(p4d_t *p4dp, unsigned long addr,
if (pud_none(pud))
continue;
- WARN_ON(!pud_present(pud) || !pud_table(pud) || pud_sect(pud));
+ WARN_ON(!pud_present(pud) || !pud_table(pud));
free_empty_pmd_table(pudp, addr, next, floor, ceiling);
} while (addr = next, addr < end);
@@ -1786,7 +1786,7 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
{
vmemmap_verify((pte_t *)pmdp, node, addr, next);
- return pmd_sect(READ_ONCE(*pmdp));
+ return pmd_leaf(READ_ONCE(*pmdp));
}
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
@@ -1850,7 +1850,7 @@ void p4d_clear_huge(p4d_t *p4dp)
int pud_clear_huge(pud_t *pudp)
{
- if (!pud_sect(READ_ONCE(*pudp)))
+ if (!pud_leaf(READ_ONCE(*pudp)))
return 0;
pud_clear(pudp);
return 1;
@@ -1858,7 +1858,7 @@ int pud_clear_huge(pud_t *pudp)
int pmd_clear_huge(pmd_t *pmdp)
{
- if (!pmd_sect(READ_ONCE(*pmdp)))
+ if (!pmd_leaf(READ_ONCE(*pmdp)))
return 0;
pmd_clear(pmdp);
return 1;
--
2.43.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-03-30 16:17 ` [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests Ryan Roberts
@ 2026-03-31 14:35 ` Suzuki K Poulose
2026-04-02 20:43 ` Catalin Marinas
1 sibling, 0 replies; 18+ messages in thread
From: Suzuki K Poulose @ 2026-03-31 14:35 UTC (permalink / raw)
To: Ryan Roberts, Catalin Marinas, Will Deacon,
David Hildenbrand (Arm), Dev Jain, Yang Shi, Jinjiang Tu,
Kevin Brodsky
Cc: linux-arm-kernel, linux-kernel, stable
On 30/03/2026 17:17, Ryan Roberts wrote:
> Commit a166563e7ec37 ("arm64: mm: support large block mapping when
> rodata=full") enabled the linear map to be mapped by block/cont while
> still allowing granular permission changes on BBML2_NOABORT systems by
> lazily splitting the live mappings. This mechanism was intended to be
> usable by realm guests since they need to dynamically share dma buffers
> with the host by "decrypting" them - which for Arm CCA, means marking
> them as shared in the page tables.
>
> However, it turns out that the mechanism was failing for realm guests
> because realms need to share their dma buffers (via
> __set_memory_enc_dec()) much earlier during boot than
> split_kernel_leaf_mapping() was able to handle. The report linked below
> showed that GIC's ITS was one such user. But during the investigation I
> found other callsites that could not meet the
> split_kernel_leaf_mapping() constraints.
>
> The problem is that we block map the linear map based on the boot CPU
> supporting BBML2_NOABORT, then check that all the other CPUs support it
> too when finalizing the caps. If they don't, then we stop_machine() and
> split to ptes. For safety, split_kernel_leaf_mapping() previously
> wouldn't permit splitting until after the caps were finalized. That
> ensured that if any secondary cpus were running that didn't support
> BBML2_NOABORT, we wouldn't risk breaking them.
>
> I've fixed this problem by reducing the black-out window where we refuse
> to split; there are now 2 windows. The first is from T0 until the page
> allocator is initialized. Splitting allocates memory from the page
> allocator, so it must be available. The second covers the period between
> starting to online the secondary cpus until the system caps are
> finalized (this is a very small window).
>
> All of the problematic callers are calling __set_memory_enc_dec() before
> the secondary cpus come online, so this solves the problem. However, one
> of these callers, swiotlb_update_mem_attributes(), was trying to split
> before the page allocator was initialized. So I have moved this call
> from arch_mm_preinit() to mem_init(), which solves the ordering issue.
>
> I've added warnings and return an error if any attempt is made to split
> in the black-out windows.
>
> Note there are other issues which prevent booting all the way to user
> space, which will be fixed in subsequent patches.
>
> Reported-by: Jinjiang Tu <tujinjiang@huawei.com>
> Closes: https://lore.kernel.org/all/0b2a4ae5-fc51-4d77-b177-b2e9db74f11d@huawei.com/
> Fixes: a166563e7ec37 ("arm64: mm: support large block mapping when rodata=full")
> Cc: stable@vger.kernel.org
> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
I have tested with a hacked cpufeature code to enable BBML2_NOABORT
for FVP MIDRs.
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Suzuki
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> arch/arm64/include/asm/mmu.h | 2 ++
> arch/arm64/mm/init.c | 9 +++++++-
> arch/arm64/mm/mmu.c | 45 +++++++++++++++++++++++++-----------
> 3 files changed, 42 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 137a173df1ff8..472610433aaea 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -112,5 +112,7 @@ void kpti_install_ng_mappings(void);
> static inline void kpti_install_ng_mappings(void) {}
> #endif
>
> +extern bool page_alloc_available;
> +
> #endif /* !__ASSEMBLER__ */
> #endif
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 96711b8578fd0..b9b248d24fd10 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -350,7 +350,6 @@ void __init arch_mm_preinit(void)
> }
>
> swiotlb_init(swiotlb, flags);
> - swiotlb_update_mem_attributes();
>
> /*
> * Check boundaries twice: Some fundamental inconsistencies can be
> @@ -377,6 +376,14 @@ void __init arch_mm_preinit(void)
> }
> }
>
> +bool page_alloc_available __ro_after_init;
> +
> +void __init mem_init(void)
> +{
> + page_alloc_available = true;
> + swiotlb_update_mem_attributes();
> +}
> +
> void free_initmem(void)
> {
> void *lm_init_begin = lm_alias(__init_begin);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index a6a00accf4f93..223947487a223 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -768,30 +768,51 @@ static inline bool force_pte_mapping(void)
> }
>
> static DEFINE_MUTEX(pgtable_split_lock);
> +static bool linear_map_requires_bbml2;
>
> int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
> {
> int ret;
>
> - /*
> - * !BBML2_NOABORT systems should not be trying to change permissions on
> - * anything that is not pte-mapped in the first place. Just return early
> - * and let the permission change code raise a warning if not already
> - * pte-mapped.
> - */
> - if (!system_supports_bbml2_noabort())
> - return 0;
> -
> /*
> * If the region is within a pte-mapped area, there is no need to try to
> * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
> * change permissions from atomic context so for those cases (which are
> * always pte-mapped), we must not go any further because taking the
> - * mutex below may sleep.
> + * mutex below may sleep. Do not call force_pte_mapping() here because
> + * it could return a confusing result if called from a secondary cpu
> + * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
> + * what we need.
> */
> - if (force_pte_mapping() || is_kfence_address((void *)start))
> + if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
> return 0;
>
> + if (!system_supports_bbml2_noabort()) {
> + /*
> + * !BBML2_NOABORT systems should not be trying to change
> + * permissions on anything that is not pte-mapped in the first
> + * place. Just return early and let the permission change code
> + * raise a warning if not already pte-mapped.
> + */
> + if (system_capabilities_finalized())
> + return 0;
> +
> + /*
> + * Boot-time: split_kernel_leaf_mapping_locked() allocates from
> + * page allocator. Can't split until it's available.
> + */
> + if (WARN_ON(!page_alloc_available))
> + return -EBUSY;
> +
> + /*
> + * Boot-time: Started secondary cpus but don't know if they
> + * support BBML2_NOABORT yet. Can't allow splitting in this
> + * window in case they don't.
> + */
> + if (WARN_ON(num_online_cpus() > 1))
> + return -EBUSY;
> + }
> +
> /*
> * Ensure start and end are at least page-aligned since this is the
> * finest granularity we can split to.
> @@ -891,8 +912,6 @@ static int range_split_to_ptes(unsigned long start, unsigned long end, gfp_t gfp
> return ret;
> }
>
> -static bool linear_map_requires_bbml2 __initdata;
> -
> u32 idmap_kpti_bbml2_flag;
>
> static void __init init_idmap_kpti_bbml2_flag(void)
^ permalink raw reply [flat|nested] 18+ messages in thread
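[Editorial note: the black-out-window checks that this patch adds to split_kernel_leaf_mapping() can be modelled as a small standalone sketch. The kernel predicates are stubbed as plain globals here, so this illustrates only the decision flow of the patch, not the in-tree implementation.]

```c
/*
 * Standalone model of the guard logic patch 1 adds to
 * split_kernel_leaf_mapping(). The kernel predicates are stubbed as
 * plain globals; the names mirror the kernel's, but this is an
 * illustration of the control flow, not the in-tree code.
 */
#include <errno.h>
#include <stdbool.h>

static bool linear_map_requires_bbml2; /* block-mapped linear map at boot */
static bool bbml2_noabort;             /* system_supports_bbml2_noabort() */
static bool caps_finalized;            /* system_capabilities_finalized() */
static bool page_alloc_available;      /* set in mem_init() */
static int online_cpus = 1;            /* num_online_cpus() */

/*
 * Returns 0 when there is nothing to split, -EBUSY inside one of the
 * two boot-time black-out windows, and 1 (here) to mean "go ahead and
 * split".
 */
static int split_guard(void)
{
	/* Linear map is already pte-mapped; nothing to do. */
	if (!linear_map_requires_bbml2)
		return 0;

	if (!bbml2_noabort) {
		/*
		 * Caps finalized and still no BBML2_NOABORT: the linear
		 * map was already split to ptes, so just return early.
		 */
		if (caps_finalized)
			return 0;

		/* Window 1: page allocator not up yet. */
		if (!page_alloc_available)
			return -EBUSY;

		/* Window 2: secondaries online, caps not yet finalized. */
		if (online_cpus > 1)
			return -EBUSY;
	}
	return 1;
}
```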
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-03-30 16:17 ` [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests Ryan Roberts
2026-03-31 14:35 ` Suzuki K Poulose
@ 2026-04-02 20:43 ` Catalin Marinas
2026-04-03 10:31 ` Catalin Marinas
` (2 more replies)
1 sibling, 3 replies; 18+ messages in thread
From: Catalin Marinas @ 2026-04-02 20:43 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On Mon, Mar 30, 2026 at 05:17:02PM +0100, Ryan Roberts wrote:
> int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
> {
> int ret;
>
> - /*
> - * !BBML2_NOABORT systems should not be trying to change permissions on
> - * anything that is not pte-mapped in the first place. Just return early
> - * and let the permission change code raise a warning if not already
> - * pte-mapped.
> - */
> - if (!system_supports_bbml2_noabort())
> - return 0;
> -
> /*
> * If the region is within a pte-mapped area, there is no need to try to
> * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
> * change permissions from atomic context so for those cases (which are
> * always pte-mapped), we must not go any further because taking the
> - * mutex below may sleep.
> + * mutex below may sleep. Do not call force_pte_mapping() here because
> + * it could return a confusing result if called from a secondary cpu
> + * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
> + * what we need.
> */
> - if (force_pte_mapping() || is_kfence_address((void *)start))
> + if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
> return 0;
>
> + if (!system_supports_bbml2_noabort()) {
> + /*
> + * !BBML2_NOABORT systems should not be trying to change
> + * permissions on anything that is not pte-mapped in the first
> + * place. Just return early and let the permission change code
> + * raise a warning if not already pte-mapped.
> + */
> + if (system_capabilities_finalized())
> + return 0;
> +
> + /*
> + * Boot-time: split_kernel_leaf_mapping_locked() allocates from
> + * page allocator. Can't split until it's available.
> + */
> + if (WARN_ON(!page_alloc_available))
> + return -EBUSY;
> +
> + /*
> + * Boot-time: Started secondary cpus but don't know if they
> + * support BBML2_NOABORT yet. Can't allow splitting in this
> + * window in case they don't.
> + */
> + if (WARN_ON(num_online_cpus() > 1))
> + return -EBUSY;
> + }
I think sashiko is over-cautious here
(https://sashiko.dev/#/patchset/20260330161705.3349825-1-ryan.roberts@arm.com)
but it has a somewhat valid point from the perspective of
num_online_cpus() semantics. We can have num_online_cpus() == 1 while
a secondary CPU has just booted and has its MMU enabled. I don't
think we can have any asynchronous tasks running at that point to
trigger a split though. Even async_init() is called after smp_init().
An option may be to attempt cpus_read_trylock() as this lock is taken by
_cpu_up(). If it fails, return -EBUSY, otherwise check num_online_cpus()
and unlock (and return -EBUSY if secondaries already started).
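[Editorial note: a minimal sketch of this trylock approach, with the hotplug lock and CPU count stubbed as plain globals so the flow can be checked outside the kernel; in-tree this would use cpus_read_trylock(), num_online_cpus() and cpus_read_unlock().]

```c
/*
 * Sketch of the cpus_read_trylock() variant suggested above. The
 * hotplug lock and online-CPU count are stand-ins; the real calls
 * would be cpus_read_trylock()/cpus_read_unlock()/num_online_cpus().
 */
#include <errno.h>
#include <stdbool.h>

static bool hotplug_write_locked; /* _cpu_up() holds cpu_hotplug_lock */
static int online_cpus = 1;

static bool cpus_read_trylock_stub(void)
{
	return !hotplug_write_locked;
}

static void cpus_read_unlock_stub(void)
{
}

/* 0: safe to split; -EBUSY: a secondary is up or being brought up. */
static int check_split_allowed(void)
{
	if (!cpus_read_trylock_stub())
		return -EBUSY;

	if (online_cpus > 1) {
		cpus_read_unlock_stub();
		return -EBUSY;
	}

	cpus_read_unlock_stub();
	return 0;
}
```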
Another thing I couldn't get my head around - IIUC is_realm_world()
won't return true for map_mem() yet (if in a realm). Can we have realms
on hardware that does not support BBML2_NOABORT? We may not have a
configuration with rodata_full set (it should be complementary to realm
support).
I'll add the patches to for-next/core to give them a bit of time in
-next but let's see next week if we ignore this (with an updated
comment) or we try to avoid the issue altogether.
--
Catalin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT
2026-03-30 16:17 [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT Ryan Roberts
` (2 preceding siblings ...)
2026-03-30 16:17 ` [PATCH v2 3/3] arm64: mm: Remove pmd_sect() and pud_sect() Ryan Roberts
@ 2026-04-02 21:11 ` Catalin Marinas
3 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2026-04-02 21:11 UTC (permalink / raw)
To: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, Ryan Roberts
Cc: linux-arm-kernel, linux-kernel
On Mon, 30 Mar 2026 17:17:01 +0100, Ryan Roberts wrote:
> This fixes a couple of bugs in the "large block mappings for linear map when we
> have BBML2_NOABORT" feature when used in conjunction with a CCA realm guest.
> While investigating I found and fixed some more general issues too. See commit
> logs for full explanations.
>
> Applies on top of v7.0-rc4.
>
> [...]
Applied to arm64 (for-next/bbml2-fixes), thanks! I had some comments on
the first patch, so I may rebase it or add something on top.
[1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
https://git.kernel.org/arm64/c/f12b435de2f2
[2/3] arm64: mm: Handle invalid large leaf mappings correctly
https://git.kernel.org/arm64/c/15bfba1ad77f
[3/3] arm64: mm: Remove pmd_sect() and pud_sect()
https://git.kernel.org/arm64/c/1d37713fa837
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-02 20:43 ` Catalin Marinas
@ 2026-04-03 10:31 ` Catalin Marinas
2026-04-07 8:43 ` Ryan Roberts
2026-04-07 8:33 ` Ryan Roberts
2026-04-07 9:57 ` Suzuki K Poulose
2 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2026-04-03 10:31 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On Thu, Apr 02, 2026 at 09:43:59PM +0100, Catalin Marinas wrote:
> Another thing I couldn't get my head around - IIUC is_realm_world()
> won't return true for map_mem() yet (if in a realm). Can we have realms
> on hardware that does not support BBML2_NOABORT? We may not have
> configuration with rodata_full set (it should be complementary to realm
> support).
With rodata_full==false, can_set_direct_map() returns false initially
but after arm64_rsi_init() it starts returning true if is_realm_world().
The side-effect is that map_mem() goes for block mappings with
linear_map_requires_bbml2 set to false. Later on,
linear_map_maybe_split_to_ptes() will skip the splitting.
Unless I'm missing something, is_realm_world() calls in
force_pte_mapping() and can_set_direct_map() are useless. I'd remove
them and either require BBML2_NOABORT with CCA or get the user to force
rodata_full when running in realms. Or move arm64_rsi_init() even
earlier?
--
Catalin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-02 20:43 ` Catalin Marinas
2026-04-03 10:31 ` Catalin Marinas
@ 2026-04-07 8:33 ` Ryan Roberts
2026-04-07 9:19 ` Catalin Marinas
2026-04-07 9:57 ` Suzuki K Poulose
2 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2026-04-07 8:33 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On 02/04/2026 21:43, Catalin Marinas wrote:
> On Mon, Mar 30, 2026 at 05:17:02PM +0100, Ryan Roberts wrote:
>> int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>> {
>> int ret;
>>
>> - /*
>> - * !BBML2_NOABORT systems should not be trying to change permissions on
>> - * anything that is not pte-mapped in the first place. Just return early
>> - * and let the permission change code raise a warning if not already
>> - * pte-mapped.
>> - */
>> - if (!system_supports_bbml2_noabort())
>> - return 0;
>> -
>> /*
>> * If the region is within a pte-mapped area, there is no need to try to
>> * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
>> * change permissions from atomic context so for those cases (which are
>> * always pte-mapped), we must not go any further because taking the
>> - * mutex below may sleep.
>> + * mutex below may sleep. Do not call force_pte_mapping() here because
>> + * it could return a confusing result if called from a secondary cpu
>> + * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
>> + * what we need.
>> */
>> - if (force_pte_mapping() || is_kfence_address((void *)start))
>> + if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
>> return 0;
>>
>> + if (!system_supports_bbml2_noabort()) {
>> + /*
>> + * !BBML2_NOABORT systems should not be trying to change
>> + * permissions on anything that is not pte-mapped in the first
>> + * place. Just return early and let the permission change code
>> + * raise a warning if not already pte-mapped.
>> + */
>> + if (system_capabilities_finalized())
>> + return 0;
>> +
>> + /*
>> + * Boot-time: split_kernel_leaf_mapping_locked() allocates from
>> + * page allocator. Can't split until it's available.
>> + */
>> + if (WARN_ON(!page_alloc_available))
>> + return -EBUSY;
>> +
>> + /*
>> + * Boot-time: Started secondary cpus but don't know if they
>> + * support BBML2_NOABORT yet. Can't allow splitting in this
>> + * window in case they don't.
>> + */
>> + if (WARN_ON(num_online_cpus() > 1))
>> + return -EBUSY;
>> + }
>
> I think sashiko is over-cautious here
> (https://sashiko.dev/#/patchset/20260330161705.3349825-1-ryan.roberts@arm.com)
> but it has a somewhat valid point from the perspective of
> num_online_cpus() semantics. We can have num_online_cpus() == 1 while
> a secondary CPU has just booted and has its MMU enabled. I don't
> think we can have any asynchronous tasks running at that point to
> trigger a split though. Even async_init() is called after smp_init().
Yes, I saw the Sashiko report, but we previously had a (private) discussion
where I thought we had already concluded that this approach is safe in practice
due to the way the boot cpu brings the secondaries online.
>
> An option may be to attempt cpus_read_trylock() as this lock is taken by
> _cpu_up(). If it fails, return -EBUSY, otherwise check num_online_cpus()
> and unlock (and return -EBUSY if secondaries already started).
That sounds neat; I could dig deeper and have a go at something like this if you
want?
>
> Another thing I couldn't get my head around - IIUC is_realm_world()
> won't return true for map_mem() yet (if in a realm). Can we have realms
> on hardware that does not support BBML2_NOABORT? We may not have
> configuration with rodata_full set (it should be complementary to realm
> support).
My understanding is that this is a pre-existing (and known) bug. It's not
related to the "map linear map by large leaves and split dynamically" feature,
so I wasn't attempting to fix it.
I had heard that in practice all FEAT_RME systems should support
BBML2_NOABORT, which would solve the problem. Not sure how true that is though.
>
> I'll add the patches to for-next/core to give them a bit of time in
> -next but let's see next week if we ignore this (with an updated
> comment) or we try to avoid the issue altogether.
>
Thanks,
Ryan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-03 10:31 ` Catalin Marinas
@ 2026-04-07 8:43 ` Ryan Roberts
2026-04-07 9:32 ` Catalin Marinas
0 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2026-04-07 8:43 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On 03/04/2026 11:31, Catalin Marinas wrote:
> On Thu, Apr 02, 2026 at 09:43:59PM +0100, Catalin Marinas wrote:
>> Another thing I couldn't get my head around - IIUC is_realm_world()
>> won't return true for map_mem() yet (if in a realm). Can we have realms
>> on hardware that does not support BBML2_NOABORT? We may not have
>> configuration with rodata_full set (it should be complementary to realm
>> support).
>
> With rodata_full==false, can_set_direct_map() returns false initially
> but after arm64_rsi_init() it starts returning true if is_realm_world().
> The side-effect is that map_mem() goes for block mappings with
> linear_map_requires_bbml2 set to false. Later on,
> linear_map_maybe_split_to_ptes() will skip the splitting.
>
> Unless I'm missing something, is_realm_world() calls in
> force_pte_mapping() and can_set_direct_map() are useless. I'd remove
> them and either require BBML2_NOABORT with CCA or get the user to force
> rodata_full when running in realms. Or move arm64_rsi_init() even
> earlier?
I'd need Suzuki to comment on this. As I said in the other mail, I was treating
this like a pre-existing bug. But I guess linear_map_requires_bbml2 ending up
wrong is a problem here. I'm not sure it's quite as simple as requiring
BBML2_NOABORT with CCA as we still need can_set_direct_map() to return true if
we are in a realm.
I don't know what it would take to run arm64_rsi_init() even earlier, but that
would be the best option from my point of view.
Thanks,
Ryan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-07 8:33 ` Ryan Roberts
@ 2026-04-07 9:19 ` Catalin Marinas
0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2026-04-07 9:19 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On Tue, Apr 07, 2026 at 09:33:50AM +0100, Ryan Roberts wrote:
> On 02/04/2026 21:43, Catalin Marinas wrote:
> > On Mon, Mar 30, 2026 at 05:17:02PM +0100, Ryan Roberts wrote:
> >> int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
> >> {
> >> int ret;
> >>
> >> - /*
> >> - * !BBML2_NOABORT systems should not be trying to change permissions on
> >> - * anything that is not pte-mapped in the first place. Just return early
> >> - * and let the permission change code raise a warning if not already
> >> - * pte-mapped.
> >> - */
> >> - if (!system_supports_bbml2_noabort())
> >> - return 0;
> >> -
> >> /*
> >> * If the region is within a pte-mapped area, there is no need to try to
> >> * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
> >> * change permissions from atomic context so for those cases (which are
> >> * always pte-mapped), we must not go any further because taking the
> >> - * mutex below may sleep.
> >> + * mutex below may sleep. Do not call force_pte_mapping() here because
> >> + * it could return a confusing result if called from a secondary cpu
> >> + * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
> >> + * what we need.
> >> */
> >> - if (force_pte_mapping() || is_kfence_address((void *)start))
> >> + if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
> >> return 0;
> >>
> >> + if (!system_supports_bbml2_noabort()) {
> >> + /*
> >> + * !BBML2_NOABORT systems should not be trying to change
> >> + * permissions on anything that is not pte-mapped in the first
> >> + * place. Just return early and let the permission change code
> >> + * raise a warning if not already pte-mapped.
> >> + */
> >> + if (system_capabilities_finalized())
> >> + return 0;
> >> +
> >> + /*
> >> + * Boot-time: split_kernel_leaf_mapping_locked() allocates from
> >> + * page allocator. Can't split until it's available.
> >> + */
> >> + if (WARN_ON(!page_alloc_available))
> >> + return -EBUSY;
> >> +
> >> + /*
> >> + * Boot-time: Started secondary cpus but don't know if they
> >> + * support BBML2_NOABORT yet. Can't allow splitting in this
> >> + * window in case they don't.
> >> + */
> >> + if (WARN_ON(num_online_cpus() > 1))
> >> + return -EBUSY;
> >> + }
> >
> > I think sashiko is over cautions here
> > (https://sashiko.dev/#/patchset/20260330161705.3349825-1-ryan.roberts@arm.com)
> > but it has a somewhat valid point from the perspective of
> > num_online_cpus() semantics. We have have num_online_cpus() == 1 while
> > having a secondary CPU just booted and with its MMU enabled. I don't
> > think we can have any asynchronous tasks running at that point to
> > trigger a spit though. Even async_init() is called after smp_init().
>
> Yes I saw the Sashiko report, but we had previously had a (private) discussion
> where I thought we had already concluded that this approach is safe in practice
> due to the way that the boot cpu brings the secondaries online.
Yes, I thought I'd mention it. I don't see how this could go wrong in
practice as we don't expect any page splitting on the primary CPU while
it waits for the secondaries to come up.
> > An option may be to attempt cpus_read_trylock() as this lock is taken by
> > _cpu_up(). If it fails, return -EBUSY, otherwise check num_online_cpus()
> > and unlock (and return -EBUSY if secondaries already started).
>
> That sounds neat; I could dig deeper and have a go at something like this if you
> want?
I don't think it's worth it. We'll probably need to keep the lock for
the duration of the splitting (though that's only before the cpu
features are fully initialised). It's something we can look into if we
ever see an issue here. For now, the num_online_cpus() test will do.
> > Another thing I couldn't get my head around - IIUC is_realm_world()
> > won't return true for map_mem() yet (if in a realm). Can we have realms
> > on hardware that does not support BBML2_NOABORT? We may not have
> > configuration with rodata_full set (it should be complementary to realm
> > support).
>
> My understanding is that this is a pre-existing (and known) bug. It's not
> related to the "map linear map by large leaves and split dynamically" feature,
> so I wasn't attempting to fix it.
Yes, it's an existing bug I noticed while trying to understand the code
paths. It doesn't need any action on this series but we should try to
fix it separately.
--
Catalin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-07 8:43 ` Ryan Roberts
@ 2026-04-07 9:32 ` Catalin Marinas
2026-04-07 10:13 ` Ryan Roberts
0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2026-04-07 9:32 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On Tue, Apr 07, 2026 at 09:43:42AM +0100, Ryan Roberts wrote:
> On 03/04/2026 11:31, Catalin Marinas wrote:
> > On Thu, Apr 02, 2026 at 09:43:59PM +0100, Catalin Marinas wrote:
> >> Another thing I couldn't get my head around - IIUC is_realm_world()
> >> won't return true for map_mem() yet (if in a realm). Can we have realms
> >> on hardware that does not support BBML2_NOABORT? We may not have
> >> configuration with rodata_full set (it should be complementary to realm
> >> support).
> >
> > With rodata_full==false, can_set_direct_map() returns false initially
> > but after arm64_rsi_init() it starts returning true if is_realm_world().
> > The side-effect is that map_mem() goes for block mappings with
> > linear_map_requires_bbml2 set to false. Later on,
> > linear_map_maybe_split_to_ptes() will skip the splitting.
> >
> > Unless I'm missing something, is_realm_world() calls in
> > force_pte_mapping() and can_set_direct_map() are useless. I'd remove
> > them and either require BBML2_NOABORT with CCA or get the user to force
> > rodata_full when running in realms. Or move arm64_rsi_init() even
> > earlier?
>
> I'd need Suzuki to comment on this. As I said in the other mail, I was treating
> this like a pre-existing bug. But I guess linear_map_requires_bbml2 ending up
> wrong is a problem here. I'm not sure it's quite as simple as requiring
> BBML2_NOABORT with CCA as we still need can_set_direct_map() to return true if
> we are in a realm.
can_set_direct_map() == true is not a property of the realm but rather a
requirement. In the absence of BBML2_NOABORT, I guess the test was added
under the assumption that force_pte_mapping() also returns true if
is_realm_world(). We might as well add a variable or static label to
track whether can_set_direct_map() is possible and avoid tests that
duplicate force_pte_mapping().
This won't solve the is_realm_world() changing polarity during boot but
at least we know it won't suddenly make can_set_direct_map() return
true when it shouldn't.
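[Editorial note: the "add a variable to track it" suggestion could be sketched roughly as below. The names (init_direct_map_policy(), direct_map_can_change) are hypothetical stand-ins; in-tree this would more likely be a static key set once the RSI state is known.]

```c
/*
 * Latch the can_set_direct_map() decision once at boot instead of
 * re-deriving it (and re-testing is_realm_world()) on every call.
 * All names here are hypothetical illustrations.
 */
#include <stdbool.h>

static bool direct_map_can_change; /* latched boot-time decision */

/* Called once, after the arm64_rsi_init()-equivalent setup has run. */
static void init_direct_map_policy(bool rodata_full, bool debug_pagealloc,
				   bool realm_world)
{
	direct_map_can_change = rodata_full || debug_pagealloc || realm_world;
}

static bool can_set_direct_map_cached(void)
{
	return direct_map_can_change;
}
```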
--
Catalin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-02 20:43 ` Catalin Marinas
2026-04-03 10:31 ` Catalin Marinas
2026-04-07 8:33 ` Ryan Roberts
@ 2026-04-07 9:57 ` Suzuki K Poulose
2026-04-07 17:21 ` Catalin Marinas
2 siblings, 1 reply; 18+ messages in thread
From: Suzuki K Poulose @ 2026-04-07 9:57 UTC (permalink / raw)
To: Catalin Marinas, Ryan Roberts
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Jinjiang Tu, Kevin Brodsky, linux-arm-kernel, linux-kernel,
stable
On 02/04/2026 21:43, Catalin Marinas wrote:
> On Mon, Mar 30, 2026 at 05:17:02PM +0100, Ryan Roberts wrote:
>> int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>> {
>> int ret;
>>
>> - /*
>> - * !BBML2_NOABORT systems should not be trying to change permissions on
>> - * anything that is not pte-mapped in the first place. Just return early
>> - * and let the permission change code raise a warning if not already
>> - * pte-mapped.
>> - */
>> - if (!system_supports_bbml2_noabort())
>> - return 0;
>> -
>> /*
>> * If the region is within a pte-mapped area, there is no need to try to
>> * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
>> * change permissions from atomic context so for those cases (which are
>> * always pte-mapped), we must not go any further because taking the
>> - * mutex below may sleep.
>> + * mutex below may sleep. Do not call force_pte_mapping() here because
>> + * it could return a confusing result if called from a secondary cpu
>> + * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
>> + * what we need.
>> */
>> - if (force_pte_mapping() || is_kfence_address((void *)start))
>> + if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
>> return 0;
>>
>> + if (!system_supports_bbml2_noabort()) {
>> + /*
>> + * !BBML2_NOABORT systems should not be trying to change
>> + * permissions on anything that is not pte-mapped in the first
>> + * place. Just return early and let the permission change code
>> + * raise a warning if not already pte-mapped.
>> + */
>> + if (system_capabilities_finalized())
>> + return 0;
>> +
>> + /*
>> + * Boot-time: split_kernel_leaf_mapping_locked() allocates from
>> + * page allocator. Can't split until it's available.
>> + */
>> + if (WARN_ON(!page_alloc_available))
>> + return -EBUSY;
>> +
>> + /*
>> + * Boot-time: Started secondary cpus but don't know if they
>> + * support BBML2_NOABORT yet. Can't allow splitting in this
>> + * window in case they don't.
>> + */
>> + if (WARN_ON(num_online_cpus() > 1))
>> + return -EBUSY;
>> + }
>
> I think sashiko is over-cautious here
> (https://sashiko.dev/#/patchset/20260330161705.3349825-1-ryan.roberts@arm.com)
> but it has a somewhat valid point from the perspective of
> num_online_cpus() semantics. We can have num_online_cpus() == 1 while
> a secondary CPU has just booted and has its MMU enabled. I don't
> think we can have any asynchronous tasks running at that point to
> trigger a split though. Even async_init() is called after smp_init().
>
> An option may be to attempt cpus_read_trylock() as this lock is taken by
> _cpu_up(). If it fails, return -EBUSY, otherwise check num_online_cpus()
> and unlock (and return -EBUSY if secondaries already started).
>
> Another thing I couldn't get my head around - IIUC is_realm_world()
> won't return true for map_mem() yet (if in a realm).
That is correct. map_mem() comes from paging_init(), which gets called
before arm64_rsi_init(). The realm check was delayed until psci_xx_init().
We had a version which parsed the DT for PSCI conduit early enough
to be able to make the SMC calls to detect the Realm. But there
were concerns around it.
> Can we have realms on hardware that does not support BBML2_NOABORT?
I can get this checked. I expect that they all will have BBML2 (v8.4
extension). But NOABORT is something that may need to be checked.
> We may not have
> configuration with rodata_full set (it should be complementary to realm
> support).
>
> I'll add the patches to for-next/core to give them a bit of time in
> -next but let's see next week if we ignore this (with an updated
> comment) or we try to avoid the issue altogether.
>
Thanks
Suzuki
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-07 9:32 ` Catalin Marinas
@ 2026-04-07 10:13 ` Ryan Roberts
2026-04-07 10:52 ` Catalin Marinas
0 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2026-04-07 10:13 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On 07/04/2026 10:32, Catalin Marinas wrote:
> On Tue, Apr 07, 2026 at 09:43:42AM +0100, Ryan Roberts wrote:
>> On 03/04/2026 11:31, Catalin Marinas wrote:
>>> On Thu, Apr 02, 2026 at 09:43:59PM +0100, Catalin Marinas wrote:
>>>> Another thing I couldn't get my head around - IIUC is_realm_world()
>>>> won't return true for map_mem() yet (if in a realm). Can we have realms
>>>> on hardware that does not support BBML2_NOABORT? We may not have
>>>> configuration with rodata_full set (it should be complementary to realm
>>>> support).
>>>
>>> With rodata_full==false, can_set_direct_map() returns false initially
>>> but after arm64_rsi_init() it starts returning true if is_realm_world().
>>> The side-effect is that map_mem() goes for block mappings and
>>> linear_map_requires_bbml2 set to false. Later on,
>>> linear_map_maybe_split_to_ptes() will skip the splitting.
>>>
>>> Unless I'm missing something, is_realm_world() calls in
>>> force_pte_mapping() and can_set_direct_map() are useless. I'd remove
>>> them and either require BBML2_NOABORT with CCA or get the user to force
>>> rodata_full when running in realms. Or move arm64_rsi_init() even
>>> earlier?
>>
>> I'd need Suzuki to comment on this. As I said in the other mail, I was treating
>> this like a pre-existing bug. But I guess linear_map_requires_bbml2 ending up
>> wrong is a problem here. I'm not sure it's quite as simple as requiring
>> BBML2_NOABORT with CCA as we still need can_set_direct_map() to return true if
>> we are in a realm.
>
> can_set_direct_map() == true is not a property of the realm but rather a
> requirement.
Yes indeed. It would be better to call it might_set_direct_map() or something
like that...
> In the absence of BBML2_NOABORT, I guess the test was added
> under the assumption that force_pte_mapping() also returns true if
> is_realm_world(). We might as well add a variable or static label to
> track whether can_set_direct_map() is possible and avoid tests that
> duplicate force_pte_mapping().
I'm not sure I follow. We have linear_map_requires_bbml2 which is intended to
track this shape of thing; if we have forced pte mapping then the value of
can_set_direct_map() is irrelevant - we will never need to split because we are
already pte-mapped. But if can_set_direct_map() initially returns false because
is_realm_world() incorrectly returns false in the early boot environment, then
linear_map_requires_bbml2 will be set to false, and we will incorrectly
short-circuit splitting any block mappings in split_kernel_leaf_mapping().
I think we are agreed on the problem. But I don't understand how tracking
can_set_direct_map() in a cached variable helps with that.
>
> This won't solve the is_realm_world() changing polarity during boot but
> at least we know it won't suddenly make can_set_direct_map() return
> true when it shouldn't.
But is_realm_world() _should_ make can_set_direct_map() return true, shouldn't
it? If we are in realm-world, we need to be able to flip the NS_SHARED bit on
parts of the linear map. So if we are in realm-world, we _might_ need to update
the direct map and that's what can_set_direct_map() is supposed to tell us.
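For reference, the predicate being argued over has roughly this shape (my reading of arch/arm64/mm/pageattr.c; the stubs below are placeholders for the kernel state, not the real implementations):

```c
#include <assert.h>
#include <stdbool.h>

/* Stubs standing in for kernel state. is_realm_world() is the
 * interesting one: early in boot it returns false even inside a
 * realm, because RSI detection runs after paging_init(). */
static bool rodata_full;
static bool debug_pagealloc;
static bool kfence_pte_mapped;
static bool realm_detected;

static bool debug_pagealloc_enabled(void) { return debug_pagealloc; }
static bool arm64_kfence_can_set_direct_map(void) { return kfence_pte_mapped; }
static bool is_realm_world(void) { return realm_detected; }

/* Any one condition is enough to allow direct-map changes. */
static bool can_set_direct_map(void)
{
	return rodata_full || debug_pagealloc_enabled() ||
	       arm64_kfence_can_set_direct_map() || is_realm_world();
}
```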
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-07 10:13 ` Ryan Roberts
@ 2026-04-07 10:52 ` Catalin Marinas
2026-04-07 13:06 ` Ryan Roberts
0 siblings, 1 reply; 18+ messages in thread
From: Catalin Marinas @ 2026-04-07 10:52 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On Tue, Apr 07, 2026 at 11:13:07AM +0100, Ryan Roberts wrote:
> On 07/04/2026 10:32, Catalin Marinas wrote:
> > On Tue, Apr 07, 2026 at 09:43:42AM +0100, Ryan Roberts wrote:
> >> On 03/04/2026 11:31, Catalin Marinas wrote:
> >>> On Thu, Apr 02, 2026 at 09:43:59PM +0100, Catalin Marinas wrote:
> >>>> Another thing I couldn't get my head around - IIUC is_realm_world()
> >>>> won't return true for map_mem() yet (if in a realm). Can we have realms
> >>>> on hardware that does not support BBML2_NOABORT? We may not have
> >>>> configuration with rodata_full set (it should be complementary to realm
> >>>> support).
> >>>
> >>> With rodata_full==false, can_set_direct_map() returns false initially
> >>> but after arm64_rsi_init() it starts returning true if is_realm_world().
> >>> The side-effect is that map_mem() goes for block mappings and
> >>> linear_map_requires_bbml2 set to false. Later on,
> >>> linear_map_maybe_split_to_ptes() will skip the splitting.
> >>>
> >>> Unless I'm missing something, is_realm_world() calls in
> >>> force_pte_mapping() and can_set_direct_map() are useless. I'd remove
> >>> them and either require BBML2_NOABORT with CCA or get the user to force
> >>> rodata_full when running in realms. Or move arm64_rsi_init() even
> >>> earlier?
> >>
> >> I'd need Suzuki to comment on this. As I said in the other mail, I was treating
> >> this like a pre-existing bug. But I guess linear_map_requires_bbml2 ending up
> >> wrong is a problem here. I'm not sure it's quite as simple as requiring
> >> BBML2_NOABORT with CCA as we still need can_set_direct_map() to return true if
> >> we are in a realm.
> >
> > can_set_direct_map() == true is not a property of the realm but rather a
> > requirement.
>
> Yes indeed. It would be better to call it might_set_direct_map() or something
> like that...
The way it is used means "is allowed to set the direct map". I guess
"may set..." works as well. My reading of "might" is more like in
might_sleep(), more of a hint than a permission check.
If you only look at the linear_map_requires_bbml2 setting in map_mem(),
yes, something like might_set_direct_map() makes sense but that's not
how this function is used in the rest of the kernel (to reject the
direct map change if not supported).
> > In the absence of BBML2_NOABORT, I guess the test was added
> > under the assumption that force_pte_mapping() also returns true if
> > is_realm_world(). We might as well add a variable or static label to
> > track whether can_set_direct_map() is possible and avoid tests that
> > duplicate force_pte_mapping().
>
> > I'm not sure I follow. We have linear_map_requires_bbml2 which is intended to
> track this shape of thing;
As the name implies, linear_map_requires_bbml2 tracks only this -
BBML2_NOABORT is required because the linear map uses large blocks.
Prior to your patches, that's only used as far as
linear_map_maybe_split_to_ptes() and if splitting took place, this
variable is no longer relevant (should be turned to false but since it's
not used, it doesn't matter).
With your patches, its use was extended to runtime and I think it
remains true even if linear_map_maybe_split_to_ptes() changed the block
mappings. Do we need this:
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index dcee56bb622a..595d35fdd8c3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -988,6 +988,7 @@ void __init linear_map_maybe_split_to_ptes(void)
if (linear_map_requires_bbml2 && !system_supports_bbml2_noabort()) {
init_idmap_kpti_bbml2_flag();
stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
+ linear_map_requires_bbml2 = false;
}
}
> if we have forced pte mapping then the value of
> can_set_direct_map() is irrelevant - we will never need to split because we are
> already pte-mapped.
can_set_direct_map() is used in other places, so its value is relevant,
e.g. sys_memfd_secret() is rejected if this function returns false.
> But if can_set_direct_map() initially returns false because
> is_realm_world() incorrectly returns false in the early boot environment, then
> linear_map_requires_bbml2 will be set to false, and we will incorrectly
> short-circuit splitting any block mappings in split_kernel_leaf_mapping().
>
> I think we are agreed on the problem. But I don't understand how tracking
> can_set_direct_map() in a cached variable helps with that.
It's not about the map_mem() decision and linear_map_requires_bbml2
setting but rather its other uses like sys_memfd_secret().
> > This won't solve the is_realm_world() changing polarity during boot but
> > at least we know it won't suddenly make can_set_direct_map() return
> > true when it shouldn't.
>
> >> But is_realm_world() _should_ make can_set_direct_map() return true, shouldn't
> it?
Yes but not directly. If is_realm_world() is true, we either have
(linear_map_requires_bbml2 && system_supports_bbml2_noabort()) or
linear_map_requires_bbml2 is false and we have pte mappings. Adding
is_realm_world() to can_set_direct_map() does not imply any of these.
It's just a hope that something before actually ensured the conditions
are true.
It might be better if we rename the current function to
might_set_direct_map() and introduce a new can_set_direct_map() that
actually tells the truth if all the conditions are met. I suggested a
variable or static label but checking some conditions related to the
actual linear map would work as well, just not is_realm_world() directly.
--
Catalin
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-07 10:52 ` Catalin Marinas
@ 2026-04-07 13:06 ` Ryan Roberts
2026-04-07 17:37 ` Catalin Marinas
0 siblings, 1 reply; 18+ messages in thread
From: Ryan Roberts @ 2026-04-07 13:06 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On 07/04/2026 11:52, Catalin Marinas wrote:
> On Tue, Apr 07, 2026 at 11:13:07AM +0100, Ryan Roberts wrote:
>> On 07/04/2026 10:32, Catalin Marinas wrote:
>>> On Tue, Apr 07, 2026 at 09:43:42AM +0100, Ryan Roberts wrote:
>>>> On 03/04/2026 11:31, Catalin Marinas wrote:
>>>>> On Thu, Apr 02, 2026 at 09:43:59PM +0100, Catalin Marinas wrote:
>>>>>> Another thing I couldn't get my head around - IIUC is_realm_world()
>>>>>> won't return true for map_mem() yet (if in a realm). Can we have realms
>>>>>> on hardware that does not support BBML2_NOABORT? We may not have
>>>>>> configuration with rodata_full set (it should be complementary to realm
>>>>>> support).
>>>>>
>>>>> With rodata_full==false, can_set_direct_map() returns false initially
>>>>> but after arm64_rsi_init() it starts returning true if is_realm_world().
>>>>> The side-effect is that map_mem() goes for block mappings and
>>>>> linear_map_requires_bbml2 set to false. Later on,
>>>>> linear_map_maybe_split_to_ptes() will skip the splitting.
>>>>>
>>>>> Unless I'm missing something, is_realm_world() calls in
>>>>> force_pte_mapping() and can_set_direct_map() are useless. I'd remove
>>>>> them and either require BBML2_NOABORT with CCA or get the user to force
>>>>> rodata_full when running in realms. Or move arm64_rsi_init() even
>>>>> earlier?
>>>>
>>>> I'd need Suzuki to comment on this. As I said in the other mail, I was treating
>>>> this like a pre-existing bug. But I guess linear_map_requires_bbml2 ending up
>>>> wrong is a problem here. I'm not sure it's quite as simple as requiring
>>>> BBML2_NOABORT with CCA as we still need can_set_direct_map() to return true if
>>>> we are in a realm.
>>>
>>> can_set_direct_map() == true is not a property of the realm but rather a
>>> requirement.
>>
>> Yes indeed. It would be better to call it might_set_direct_map() or something
>> like that...
>
> The way it is used means "is allowed to set the direct map". I guess
> "may set..." works as well. My reading of "might" is more like in
> might_sleep(), more of hint than a permission check.
OK, I read it as "might" as in a hint that we might want to change the direct
map permissions.
>
> If you only look at the linear_map_requires_bbml2 setting in map_mem(),
> yes, something like might_set_direct_map() makes sense but that's not
> how this function is used in the rest of the kernel (to reject the
> direct map change if not supported).
ACK.
>
>>> In the absence of BBML2_NOABORT, I guess the test was added
>>> under the assumption that force_pte_mapping() also returns true if
>>> is_realm_world(). We might as well add a variable or static label to
>>> track whether can_set_direct_map() is possible and avoid tests that
>>> duplicate force_pte_mapping().
>>
>> I'm not sure I follow. We have linear_map_requires_bbml2 which is inteded to
>> track this shape of thing;
>
> As the name implies, linear_map_requires_bbml2 tracks only this -
> BBML2_NOABORT is required because the linear map uses large blocks.
> Prior to your patches, that's only used as far as
> linear_map_maybe_split_to_ptes() and if splitting took place, this
> variable is no longer relevant (should be turned to false but since it's
> not used, it doesn't matter).
>
> With your patches, its use was extended to runtime and I think it
> remains true even if linear_map_maybe_split_to_ptes() changed the block
> mappings. Do we need this:
I'll admit it is ugly but it's not a bug; the system capabilities are finalized
by the time we call linear_map_maybe_split_to_ptes().
The "if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))" check
in split_kernel_leaf_mapping() would ideally be "if (!force_pte_mapping() ||
is_kfence_address((void *)start))", but it is not safe to call
force_pte_mapping() from a secondary cpu prior to finalizing the system caps.
I'm reusing the flag that I already had available to work around that.
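The ordering being defended can be mocked up like this (a standalone sketch with stubs for the kernel predicates; it models the guard at the top of split_kernel_leaf_mapping() as quoted earlier in the thread, not the real function):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stubs for the kernel state referenced in the discussion. */
static bool linear_map_requires_bbml2;
static bool bbml2_noabort;
static bool caps_finalized;
static bool page_alloc_up;
static int online_cpus = 1;

static bool system_supports_bbml2_noabort(void) { return bbml2_noabort; }
static bool system_capabilities_finalized(void) { return caps_finalized; }
static int num_online_cpus(void) { return online_cpus; }

/*
 * Shape of the guard: the flag is checked first precisely because
 * force_pte_mapping() is not safe to call from a secondary CPU
 * before the system caps are finalized.
 * Returns 0 (nothing to do), -EBUSY, or 1 (proceed to split).
 */
static int split_guard(void)
{
	if (!linear_map_requires_bbml2)
		return 0;		/* already pte-mapped */

	if (!system_supports_bbml2_noabort()) {
		if (system_capabilities_finalized())
			return 0;	/* permission code will warn */
		if (!page_alloc_up)
			return -EBUSY;	/* splitting needs the page allocator */
		if (num_online_cpus() > 1)
			return -EBUSY;	/* secondaries may lack BBML2_NOABORT */
	}
	return 1;
}
```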
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index dcee56bb622a..595d35fdd8c3 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -988,6 +988,7 @@ void __init linear_map_maybe_split_to_ptes(void)
> if (linear_map_requires_bbml2 && !system_supports_bbml2_noabort()) {
> init_idmap_kpti_bbml2_flag();
> stop_machine(linear_map_split_to_ptes, NULL, cpu_online_mask);
> + linear_map_requires_bbml2 = false;
> }
> }
>
>
>> if we have forced pte mapping then the value of
>> can_set_direct_map() is irrelevant - we will never need to split because we are
>> already pte-mapped.
>
> can_set_direct_map() is used in other places, so its value is relevant,
> e.g. sys_memfd_secret() is rejected if this function returns false.
>
>> But if can_set_direct_map() initially returns false because
>> is_realm_world() incorrectly returns false in the early boot environment, then
>> linear_map_requires_bbml2 will be set to false, and we will incorrectly
>> short-circuit splitting any block mappings in split_kernel_leaf_mapping().
>>
>> I think we are agreed on the problem. But I don't understand how tracking
>> can_set_direct_map() in a cached variable helps with that.
>
> It's not about the map_mem() decision and linear_map_requires_bbml2
> setting but rather its other uses like sys_memfd_secret().
>
>>> This won't solve the is_realm_world() changing polarity during boot but
>>> at least we know it won't suddenly make can_set_direct_map() return
>>> true when it shouldn't.
>>
>> But is_realm_world() _should_ make can_set_direct_map() return true, shouldn't
>> it?
>
> Yes but not directly. If is_realm_world() is true, we either have
> (linear_map_requires_bbml2 && system_supports_bbml2_noabort()) or
> linear_map_requires_bbml2 is false and we have pte mappings. Adding
> is_realm_world() to can_set_direct_map() does not imply any of these.
> It's just a hope that something before actually ensured the conditions
> are true.
>
> It might be better if we rename the current function to
> might_set_direct_map() and introduce a new can_set_direct_map() that
> actually tells the truth if all the conditions are met. I suggested a
> variable or static label but checking some conditions related to the
> actual linear map would work as well, just not is_realm_world() directly.
I'm not sure I see the distinction between "might" and "can" with your
definition. But regardless, I think we are talking about the pre-existing
is_realm_world() bug, so I'm not personally planning to do anything further here
unless you shout.
Thanks,
Ryan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-07 9:57 ` Suzuki K Poulose
@ 2026-04-07 17:21 ` Catalin Marinas
0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2026-04-07 17:21 UTC (permalink / raw)
To: Suzuki K Poulose
Cc: Ryan Roberts, Will Deacon, David Hildenbrand (Arm), Dev Jain,
Yang Shi, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On Tue, Apr 07, 2026 at 10:57:35AM +0100, Suzuki K Poulose wrote:
> On 02/04/2026 21:43, Catalin Marinas wrote:
> > On Mon, Mar 30, 2026 at 05:17:02PM +0100, Ryan Roberts wrote:
> > > int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
> > > {
> > > int ret;
> > > - /*
> > > - * !BBML2_NOABORT systems should not be trying to change permissions on
> > > - * anything that is not pte-mapped in the first place. Just return early
> > > - * and let the permission change code raise a warning if not already
> > > - * pte-mapped.
> > > - */
> > > - if (!system_supports_bbml2_noabort())
> > > - return 0;
> > > -
> > > /*
> > > * If the region is within a pte-mapped area, there is no need to try to
> > > * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
> > > * change permissions from atomic context so for those cases (which are
> > > * always pte-mapped), we must not go any further because taking the
> > > - * mutex below may sleep.
> > > + * mutex below may sleep. Do not call force_pte_mapping() here because
> > > + * it could return a confusing result if called from a secondary cpu
> > > + * prior to finalizing caps. Instead, linear_map_requires_bbml2 gives us
> > > + * what we need.
> > > */
> > > - if (force_pte_mapping() || is_kfence_address((void *)start))
> > > + if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))
> > > return 0;
> > > + if (!system_supports_bbml2_noabort()) {
> > > + /*
> > > + * !BBML2_NOABORT systems should not be trying to change
> > > + * permissions on anything that is not pte-mapped in the first
> > > + * place. Just return early and let the permission change code
> > > + * raise a warning if not already pte-mapped.
> > > + */
> > > + if (system_capabilities_finalized())
> > > + return 0;
> > > +
> > > + /*
> > > + * Boot-time: split_kernel_leaf_mapping_locked() allocates from
> > > + * page allocator. Can't split until it's available.
> > > + */
> > > + if (WARN_ON(!page_alloc_available))
> > > + return -EBUSY;
> > > +
> > > + /*
> > > + * Boot-time: Started secondary cpus but don't know if they
> > > + * support BBML2_NOABORT yet. Can't allow splitting in this
> > > + * window in case they don't.
> > > + */
> > > + if (WARN_ON(num_online_cpus() > 1))
> > > + return -EBUSY;
> > > + }
> >
> > I think sashiko is over cautious here
> > (https://sashiko.dev/#/patchset/20260330161705.3349825-1-ryan.roberts@arm.com)
> > but it has a somewhat valid point from the perspective of
> > num_online_cpus() semantics. We can have num_online_cpus() == 1 while
> > having a secondary CPU just booted and with its MMU enabled. I don't
> > think we can have any asynchronous tasks running at that point to
> > trigger a split though. Even async_init() is called after smp_init().
> >
> > An option may be to attempt cpus_read_trylock() as this lock is taken by
> > _cpu_up(). If it fails, return -EBUSY, otherwise check num_online_cpus()
> > and unlock (and return -EBUSY if secondaries already started).
> >
> > Another thing I couldn't get my head around - IIUC is_realm_world()
> > won't return true for map_mem() yet (if in a realm).
>
> That is correct. map_mem() comes from paging_init(), which gets called
> before arm64_rsi_init(). Realm check was delayed until psci_xx_init().
> We had a version which parsed the DT for PSCI conduit early enough
> to be able to make the SMC calls to detect the Realm. But there
> were concerns around it.
Ah, yes, I remember.
Does it mean that commit 42be24a4178f ("arm64: Enable memory encrypt for
Realms") was broken without rodata=full w.r.t. the linear map? Commit
a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
introduced force_pte_mapping() but it just copied the logic in the
existing can_set_direct_map(). Looking at the linear_map_requires_bbml2
assignment, we get (!is_realm_world() && is_realm_world()) and it
cancels out, no effect on it but we don't get pte mappings either (even
if we don't have BBML2).
I think we need at least some safety checks:
1. BBML2_NOABORT support on the boot CPU - continue with the existing
logic (as per Ryan's series)
2. !system_supports_bbml2_noabort() - split in
linear_map_maybe_split_to_ptes(). This does not currently happen
because linear_map_requires_bbml2 may be false in the absence of
rodata=full. Not sure how to fix this without some variable telling
us how the linear map was mapped. The requires_bbml2 flag doesn't tell us this.
3. Panic in arm64_rsi_init() if !BBML2_NOABORT on the boot CPU _and_ we
have block mappings already. People can avoid it with rodata=full
4. If (3) is a common case, a better alternative is to rewrite the
linear map sometime after arm64_rsi_init() but before we call
split_kernel_leaf_mapping().
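Point 3 could look something like the sketch below (hypothetical: `linear_map_has_block_mappings` is an invented tracking variable that map_mem() would have to record, and the panic is stubbed so the logic can be exercised standalone):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state; the kernel does not currently track
 * linear_map_has_block_mappings, map_mem() would need to set it. */
static bool realm;
static bool bbml2_noabort;
static bool linear_map_has_block_mappings;
static bool panicked;

static void panic_stub(void) { panicked = true; }

/*
 * Safety check 3: a realm guest must flip linear-map pages between
 * shared and protected, which requires splitting block mappings,
 * which in turn requires BBML2_NOABORT once blocks exist.
 */
static void rsi_init_check(void)
{
	if (realm && !bbml2_noabort && linear_map_has_block_mappings)
		panic_stub();	/* avoidable by booting with rodata=full */
}
```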
--
Catalin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests
2026-04-07 13:06 ` Ryan Roberts
@ 2026-04-07 17:37 ` Catalin Marinas
0 siblings, 0 replies; 18+ messages in thread
From: Catalin Marinas @ 2026-04-07 17:37 UTC (permalink / raw)
To: Ryan Roberts
Cc: Will Deacon, David Hildenbrand (Arm), Dev Jain, Yang Shi,
Suzuki K Poulose, Jinjiang Tu, Kevin Brodsky, linux-arm-kernel,
linux-kernel, stable
On Tue, Apr 07, 2026 at 02:06:10PM +0100, Ryan Roberts wrote:
> On 07/04/2026 11:52, Catalin Marinas wrote:
> > As the name implies, linear_map_requires_bbml2 tracks only this -
> > BBML2_NOABORT is required because the linear map uses large blocks.
> > Prior to your patches, that's only used as far as
> > linear_map_maybe_split_to_ptes() and if splitting took place, this
> > variable is no longer relevant (should be turned to false but since it's
> > not used, it doesn't matter).
> >
> > With your patches, its use was extended to runtime and I think it
> > remains true even if linear_map_maybe_split_to_ptes() changed the block
> > mappings. Do we need this:
>
> I'll admit it is ugly but it's not a bug; the system capabilities are finalized
> by the time we call linear_map_maybe_split_to_ptes().
>
> The "if (!linear_map_requires_bbml2 || is_kfence_address((void *)start))" check
> in split_kernel_leaf_mapping() would ideally be "if (!force_pte_mapping() ||
> is_kfence_address((void *)start))", but it is not safe to call
> force_pte_mapping() from a secondary cpu prior to finalizing the system caps.
> I'm reusing the flag that I already had available to work around that.
The confusing part is that the flag may be false incorrectly due to the
is_realm_world() evaluation. Nothing to do with this patch though and
the subject even mentions the rodata=full case. We should fix it
separately.
We could have set the flag to zero in linear_map_maybe_split_to_ptes(),
though split_kernel_leaf_mapping() already exits early due to the
!system_supports_bbml2_noabort() && system_capabilities_finalized() check,
so it's not a correctness issue.
> But regardless, I think we are talking about the pre-existing
> is_realm_world() bug, so I'm not personally planning to do anything further here
> unless you shout.
Not for this series. Steven or Suzuki should address the other problem
with is_realm_world().
--
Catalin
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-04-07 17:37 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-30 16:17 [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT Ryan Roberts
2026-03-30 16:17 ` [PATCH v2 1/3] arm64: mm: Fix rodata=full block mapping support for realm guests Ryan Roberts
2026-03-31 14:35 ` Suzuki K Poulose
2026-04-02 20:43 ` Catalin Marinas
2026-04-03 10:31 ` Catalin Marinas
2026-04-07 8:43 ` Ryan Roberts
2026-04-07 9:32 ` Catalin Marinas
2026-04-07 10:13 ` Ryan Roberts
2026-04-07 10:52 ` Catalin Marinas
2026-04-07 13:06 ` Ryan Roberts
2026-04-07 17:37 ` Catalin Marinas
2026-04-07 8:33 ` Ryan Roberts
2026-04-07 9:19 ` Catalin Marinas
2026-04-07 9:57 ` Suzuki K Poulose
2026-04-07 17:21 ` Catalin Marinas
2026-03-30 16:17 ` [PATCH v2 2/3] arm64: mm: Handle invalid large leaf mappings correctly Ryan Roberts
2026-03-30 16:17 ` [PATCH v2 3/3] arm64: mm: Remove pmd_sect() and pud_sect() Ryan Roberts
2026-04-02 21:11 ` [PATCH v2 0/3] Fix bugs for realm guest plus BBML2_NOABORT Catalin Marinas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox