* [PATCH 0/2] arm64/mm: prevent panic on -ENOMEM in arch_add_memory()
@ 2025-08-13 14:56 Chaitanya S Prakash
2025-08-13 14:56 ` [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors Chaitanya S Prakash
2025-08-13 14:56 ` [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure Chaitanya S Prakash
0 siblings, 2 replies; 9+ messages in thread
From: Chaitanya S Prakash @ 2025-08-13 14:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-mm, linux-kernel
Cc: Chaitanya S Prakash, Ryan Roberts, Yang Shi, Catalin Marinas,
Will Deacon, Kevin Brodsky, Anshuman Khandual, Andrew Morton,
Zhenhua Huang, Joey Gouly
arch_add_memory() acts as a means to hotplug memory into a system. It
invokes __create_pgd_mapping() which further unwinds to call
pgtable_alloc(). Initially, this path was only invoked during early boot
and therefore it made sense to BUG_ON() in case pgtable_alloc() failed.
Now however, we risk running into a kernel crash if we try to hotplug
memory into a system that is already extremely tight on available
memory. This is undesirable and hence __create_pgd_mapping() and it's
helpers are reworked to be able to propagate the error from
pgtable_alloc() allowing the system to fail gracefully.
Keeping in mind that it is still essential to BUG_ON() if
pgtable_alloc() encounters failure at the time of boot, a wrapper is
created around __create_pgd_mapping() which is designed to BUG_ON() if
it encounters a non-zero return value. This wrapper is then invoked from
the init functions instead of __create_pgd_mapping(), thereby keeping the
original functionality intact.
Lastly, create_kpti_ng_temp_pgd() which originally acted as an alias for
the void returning __create_pgd_mapping_locked() has now been updated
accordingly to handle the return value and BUG_ON() if needed.
This theoretical bug was identified by Ryan Roberts<ryan.roberts@arm.com>
as a part of code review of the following series[1].
[1] https://lore.kernel.org/linux-arm-kernel/20250304222018.615808-4-yang@os.amperecomputing.com/
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zhenhua Huang <quic_zhenhuah@quicinc.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Chaitanya S Prakash (2):
arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc()
errors
arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc
failure
arch/arm64/mm/mmu.c | 174 +++++++++++++++++++++++++++++++++-----------
1 file changed, 133 insertions(+), 41 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
2025-08-13 14:56 [PATCH 0/2] arm64/mm: prevent panic on -ENOMEM in arch_add_memory() Chaitanya S Prakash
@ 2025-08-13 14:56 ` Chaitanya S Prakash
2025-08-14 16:22 ` Jonathan Cameron
2025-08-15 7:30 ` Dev Jain
2025-08-13 14:56 ` [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure Chaitanya S Prakash
1 sibling, 2 replies; 9+ messages in thread
From: Chaitanya S Prakash @ 2025-08-13 14:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-mm, linux-kernel
Cc: Chaitanya S Prakash, Ryan Roberts, Yang Shi, Catalin Marinas,
Will Deacon, Kevin Brodsky, Anshuman Khandual, Andrew Morton,
Zhenhua Huang, Joey Gouly
arch_add_memory() is used to hotplug memory into a system but as a part
of its implementation it calls __create_pgd_mapping(), which uses
pgtable_alloc() in order to build intermediate page tables. As this path
was initally only used during early boot pgtable_alloc() is designed to
BUG_ON() on failure. However, in the event that memory hotplug is
attempted when the system's memory is extremely tight and the allocation
were to fail, it would lead to panicking the system, which is not
desirable. Hence update __create_pgd_mapping and all it's callers to be
non void and propagate -ENOMEM on allocation failure to allow system to
fail gracefully.
But during early boot if there is an allocation failure, we want the
system to panic, hence create a wrapper around __create_pgd_mapping()
called ___create_pgd_mapping() which is designed to BUG_ON(ret), if ret
is non zero value. All the init calls are updated to use the wrapper
rather than the modified __create_pgd_mapping() to restore
functionality.
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
---
arch/arm64/mm/mmu.c | 156 +++++++++++++++++++++++++++++++++-----------
1 file changed, 117 insertions(+), 39 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 00ab1d648db62..db7f45ef16574 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -196,12 +196,13 @@ static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
} while (ptep++, addr += PAGE_SIZE, addr != end);
}
-static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
+static int alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
unsigned long end, phys_addr_t phys,
pgprot_t prot,
phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags)
{
+ int ret = 0;
unsigned long next;
pmd_t pmd = READ_ONCE(*pmdp);
pte_t *ptep;
@@ -215,6 +216,10 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
pmdval |= PMD_TABLE_PXN;
BUG_ON(!pgtable_alloc);
pte_phys = pgtable_alloc(TABLE_PTE);
+ if (!pte_phys) {
+ ret = -ENOMEM;
+ goto out;
+ }
ptep = pte_set_fixmap(pte_phys);
init_clear_pgtable(ptep);
ptep += pte_index(addr);
@@ -246,12 +251,16 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
* walker.
*/
pte_clear_fixmap();
+
+out:
+ return ret;
}
-static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
+static int init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(enum pgtable_type), int flags)
{
+ int ret = 0;
unsigned long next;
do {
@@ -271,22 +280,27 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
BUG_ON(!pgattr_change_is_safe(pmd_val(old_pmd),
READ_ONCE(pmd_val(*pmdp))));
} else {
- alloc_init_cont_pte(pmdp, addr, next, phys, prot,
- pgtable_alloc, flags);
+ ret = alloc_init_cont_pte(pmdp, addr, next, phys, prot,
+ pgtable_alloc, flags);
+ if (ret)
+ break;
BUG_ON(pmd_val(old_pmd) != 0 &&
pmd_val(old_pmd) != READ_ONCE(pmd_val(*pmdp)));
}
phys += next - addr;
} while (pmdp++, addr = next, addr != end);
+
+ return ret;
}
-static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
+static int alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
unsigned long end, phys_addr_t phys,
pgprot_t prot,
phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags)
{
+ int ret = 0;
unsigned long next;
pud_t pud = READ_ONCE(*pudp);
pmd_t *pmdp;
@@ -303,6 +317,10 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
pudval |= PUD_TABLE_PXN;
BUG_ON(!pgtable_alloc);
pmd_phys = pgtable_alloc(TABLE_PMD);
+ if (!pmd_phys) {
+ ret = -ENOMEM;
+ goto out;
+ }
pmdp = pmd_set_fixmap(pmd_phys);
init_clear_pgtable(pmdp);
pmdp += pmd_index(addr);
@@ -322,20 +340,26 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
(flags & NO_CONT_MAPPINGS) == 0)
__prot = __pgprot(pgprot_val(prot) | PTE_CONT);
- init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc, flags);
+ ret = init_pmd(pmdp, addr, next, phys, __prot, pgtable_alloc, flags);
+ if (ret)
+ break;
pmdp += pmd_index(next) - pmd_index(addr);
phys += next - addr;
} while (addr = next, addr != end);
pmd_clear_fixmap();
+
+out:
+ return ret;
}
-static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
+static int alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags)
{
+ int ret = 0;
unsigned long next;
p4d_t p4d = READ_ONCE(*p4dp);
pud_t *pudp;
@@ -348,6 +372,10 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
p4dval |= P4D_TABLE_PXN;
BUG_ON(!pgtable_alloc);
pud_phys = pgtable_alloc(TABLE_PUD);
+ if (!pud_phys) {
+ ret = -ENOMEM;
+ goto out;
+ }
pudp = pud_set_fixmap(pud_phys);
init_clear_pgtable(pudp);
pudp += pud_index(addr);
@@ -377,8 +405,10 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
BUG_ON(!pgattr_change_is_safe(pud_val(old_pud),
READ_ONCE(pud_val(*pudp))));
} else {
- alloc_init_cont_pmd(pudp, addr, next, phys, prot,
- pgtable_alloc, flags);
+ ret = alloc_init_cont_pmd(pudp, addr, next, phys, prot,
+ pgtable_alloc, flags);
+ if (ret)
+ break;
BUG_ON(pud_val(old_pud) != 0 &&
pud_val(old_pud) != READ_ONCE(pud_val(*pudp)));
@@ -387,13 +417,17 @@ static void alloc_init_pud(p4d_t *p4dp, unsigned long addr, unsigned long end,
} while (pudp++, addr = next, addr != end);
pud_clear_fixmap();
+
+out:
+ return ret;
}
-static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
+static int alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
phys_addr_t phys, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags)
{
+ int ret = 0;
unsigned long next;
pgd_t pgd = READ_ONCE(*pgdp);
p4d_t *p4dp;
@@ -406,6 +440,10 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
pgdval |= PGD_TABLE_PXN;
BUG_ON(!pgtable_alloc);
p4d_phys = pgtable_alloc(TABLE_P4D);
+ if (!p4d_phys) {
+ ret = -ENOMEM;
+ goto out;
+ }
p4dp = p4d_set_fixmap(p4d_phys);
init_clear_pgtable(p4dp);
p4dp += p4d_index(addr);
@@ -420,8 +458,10 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
next = p4d_addr_end(addr, end);
- alloc_init_pud(p4dp, addr, next, phys, prot,
- pgtable_alloc, flags);
+ ret = alloc_init_pud(p4dp, addr, next, phys, prot,
+ pgtable_alloc, flags);
+ if (ret)
+ break;
BUG_ON(p4d_val(old_p4d) != 0 &&
p4d_val(old_p4d) != READ_ONCE(p4d_val(*p4dp)));
@@ -430,14 +470,18 @@ static void alloc_init_p4d(pgd_t *pgdp, unsigned long addr, unsigned long end,
} while (p4dp++, addr = next, addr != end);
p4d_clear_fixmap();
+
+out:
+ return ret;
}
-static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
+static int __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot,
phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags)
{
+ int ret = 0;
unsigned long addr, end, next;
pgd_t *pgdp = pgd_offset_pgd(pgdir, virt);
@@ -445,8 +489,10 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
* If the virtual and physical address don't have the same offset
* within a page, we cannot map the region as the caller expects.
*/
- if (WARN_ON((phys ^ virt) & ~PAGE_MASK))
- return;
+ if (WARN_ON((phys ^ virt) & ~PAGE_MASK)) {
+ ret = -EINVAL;
+ goto out;
+ }
phys &= PAGE_MASK;
addr = virt & PAGE_MASK;
@@ -454,22 +500,44 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
do {
next = pgd_addr_end(addr, end);
- alloc_init_p4d(pgdp, addr, next, phys, prot, pgtable_alloc,
- flags);
+ ret = alloc_init_p4d(pgdp, addr, next, phys, prot, pgtable_alloc,
+ flags);
+ if (ret)
+ break;
phys += next - addr;
} while (pgdp++, addr = next, addr != end);
+
+out:
+ return ret;
}
-static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
+static int __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
unsigned long virt, phys_addr_t size,
pgprot_t prot,
phys_addr_t (*pgtable_alloc)(enum pgtable_type),
int flags)
{
+ int ret = 0;
+
mutex_lock(&fixmap_lock);
- __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
- pgtable_alloc, flags);
+ ret = __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
+ pgtable_alloc, flags);
mutex_unlock(&fixmap_lock);
+
+ return ret;
+}
+
+static void ___create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
+ unsigned long virt, phys_addr_t size,
+ pgprot_t prot,
+ phys_addr_t (*pgtable_alloc)(enum pgtable_type),
+ int flags)
+{
+ int ret = 0;
+
+ ret = __create_pgd_mapping(pgdir, phys, virt, size, prot, pgtable_alloc,
+ flags);
+ BUG_ON(ret);
}
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
@@ -485,9 +553,11 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
{
/* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_ZERO, 0);
- phys_addr_t pa;
+ phys_addr_t pa = 0;
+
+ if (!ptdesc)
+ goto out;
- BUG_ON(!ptdesc);
pa = page_to_phys(ptdesc_page(ptdesc));
switch (pgtable_type) {
@@ -505,6 +575,7 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
break;
}
+out:
return pa;
}
@@ -533,8 +604,8 @@ void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
&phys, virt);
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
- NO_CONT_MAPPINGS);
+ ___create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
+ NO_CONT_MAPPINGS);
}
void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
@@ -548,8 +619,8 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
if (page_mappings_only)
flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
- __create_pgd_mapping(mm->pgd, phys, virt, size, prot,
- pgd_pgtable_alloc_special_mm, flags);
+ ___create_pgd_mapping(mm->pgd, phys, virt, size, prot,
+ pgd_pgtable_alloc_special_mm, flags);
}
static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
@@ -561,7 +632,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
return;
}
- __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
+ ___create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
NO_CONT_MAPPINGS);
/* flush the TLBs after updating live kernel mappings */
@@ -571,8 +642,8 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
static void __init __map_memblock(pgd_t *pgdp, phys_addr_t start,
phys_addr_t end, pgprot_t prot, int flags)
{
- __create_pgd_mapping(pgdp, start, __phys_to_virt(start), end - start,
- prot, early_pgtable_alloc, flags);
+ ___create_pgd_mapping(pgdp, start, __phys_to_virt(start), end - start,
+ prot, early_pgtable_alloc, flags);
}
void __init mark_linear_text_alias_ro(void)
@@ -761,9 +832,9 @@ static int __init map_entry_trampoline(void)
/* Map only the text into the trampoline page table */
memset(tramp_pg_dir, 0, PGD_SIZE);
- __create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS,
- entry_tramp_text_size(), prot,
- pgd_pgtable_alloc_init_mm, NO_BLOCK_MAPPINGS);
+ ___create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS,
+ entry_tramp_text_size(), prot,
+ pgd_pgtable_alloc_init_mm, NO_BLOCK_MAPPINGS);
/* Map both the text and data into the kernel page table */
for (i = 0; i < DIV_ROUND_UP(entry_tramp_text_size(), PAGE_SIZE); i++)
@@ -1369,23 +1440,30 @@ int arch_add_memory(int nid, u64 start, u64 size,
if (can_set_direct_map())
flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
- __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
- size, params->pgprot, pgd_pgtable_alloc_init_mm,
- flags);
+ ret = __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
+ size, params->pgprot, pgd_pgtable_alloc_init_mm,
+ flags);
+
+ if (ret)
+ goto out;
memblock_clear_nomap(start, size);
ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
params);
- if (ret)
- __remove_pgd_mapping(swapper_pg_dir,
- __phys_to_virt(start), size);
- else {
+ if (ret) {
+ goto out;
+ } else {
/* Address of hotplugged memory can be smaller */
max_pfn = max(max_pfn, PFN_UP(start + size));
max_low_pfn = max_pfn;
}
+ return 0;
+
+out:
+ __remove_pgd_mapping(swapper_pg_dir,
+ __phys_to_virt(start), size);
return ret;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure
2025-08-13 14:56 [PATCH 0/2] arm64/mm: prevent panic on -ENOMEM in arch_add_memory() Chaitanya S Prakash
2025-08-13 14:56 ` [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors Chaitanya S Prakash
@ 2025-08-13 14:56 ` Chaitanya S Prakash
2025-08-15 12:00 ` Kevin Brodsky
2025-08-19 7:41 ` David Hildenbrand
1 sibling, 2 replies; 9+ messages in thread
From: Chaitanya S Prakash @ 2025-08-13 14:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-mm, linux-kernel
Cc: Chaitanya S Prakash, Ryan Roberts, Yang Shi, Catalin Marinas,
Will Deacon, Kevin Brodsky, Anshuman Khandual, Andrew Morton,
Zhenhua Huang, Joey Gouly
create_kpti_ng_temp_pgd() was created as an alias for void returning
__create_pgd_mapping_locked() and relied on pgtable_alloc() to BUG_ON()
if an allocation failure occurred. But as __create_pgd_mapping_locked()
has been updated as a part of the error propagation patch to return a
non-void value, update create_kpti_ng_temp_pgd() to act as a wrapper
around __create_pgd_mapping_locked() and BUG_ON() on ret being a non
zero value.
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
---
arch/arm64/mm/mmu.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index db7f45ef16574..19cbabceb38bd 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -76,6 +76,14 @@ EXPORT_SYMBOL(empty_zero_page);
static DEFINE_SPINLOCK(swapper_pgdir_lock);
static DEFINE_MUTEX(fixmap_lock);
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys,
+ unsigned long virt, phys_addr_t size,
+ pgprot_t prot,
+ phys_addr_t (*pgtable_alloc)(enum pgtable_type),
+ int flags);
+#endif
+
void noinstr set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
{
pgd_t *fixmap_pgdp;
@@ -541,11 +549,17 @@ static void ___create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
}
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
-extern __alias(__create_pgd_mapping_locked)
void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
phys_addr_t size, pgprot_t prot,
phys_addr_t (*pgtable_alloc)(enum pgtable_type),
- int flags);
+ int flags)
+{
+ int ret = 0;
+
+ ret = __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
+ pgtable_alloc, flags);
+ BUG_ON(ret);
+}
#endif
static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
2025-08-13 14:56 ` [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors Chaitanya S Prakash
@ 2025-08-14 16:22 ` Jonathan Cameron
2025-08-15 7:30 ` Dev Jain
1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Cameron @ 2025-08-14 16:22 UTC (permalink / raw)
To: Chaitanya S Prakash
Cc: linux-arm-kernel, linux-mm, linux-kernel, Ryan Roberts, Yang Shi,
Catalin Marinas, Will Deacon, Kevin Brodsky, Anshuman Khandual,
Andrew Morton, Zhenhua Huang, Joey Gouly
On Wed, 13 Aug 2025 20:26:06 +0530
Chaitanya S Prakash <chaitanyas.prakash@arm.com> wrote:
> arch_add_memory() is used to hotplug memory into a system but as a part
> of its implementation it calls __create_pgd_mapping(), which uses
> pgtable_alloc() in order to build intermediate page tables. As this path
> was initally only used during early boot pgtable_alloc() is designed to
> BUG_ON() on failure. However, in the event that memory hotplug is
> attempted when the system's memory is extremely tight and the allocation
> were to fail, it would lead to panicking the system, which is not
> desirable. Hence update __create_pgd_mapping and all it's callers to be
> non void and propagate -ENOMEM on allocation failure to allow system to
> fail gracefully.
>
> But during early boot if there is an allocation failure, we want the
> system to panic, hence create a wrapper around __create_pgd_mapping()
> called ___create_pgd_mapping() which is designed to BUG_ON(ret), if ret
> is non zero value. All the init calls are updated to use the wrapper
> rather than the modified __create_pgd_mapping() to restore
> functionality.
>
> Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
> ---
> arch/arm64/mm/mmu.c | 156 +++++++++++++++++++++++++++++++++-----------
> 1 file changed, 117 insertions(+), 39 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 00ab1d648db62..db7f45ef16574 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -196,12 +196,13 @@ static void init_pte(pte_t *ptep, unsigned long addr, unsigned long end,
> } while (ptep++, addr += PAGE_SIZE, addr != end);
> }
>
> -static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> +static int alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> unsigned long end, phys_addr_t phys,
> pgprot_t prot,
> phys_addr_t (*pgtable_alloc)(enum pgtable_type),
> int flags)
> {
> + int ret = 0;
> unsigned long next;
> pmd_t pmd = READ_ONCE(*pmdp);
> pte_t *ptep;
> @@ -215,6 +216,10 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> pmdval |= PMD_TABLE_PXN;
> BUG_ON(!pgtable_alloc);
> pte_phys = pgtable_alloc(TABLE_PTE);
> + if (!pte_phys) {
> + ret = -ENOMEM;
return -ENOMEM;
When I saw this I wondered if local style was to always exit at the end of
functions, but it isn't so early returns should be fine and simplify this
a fair bit.
> + goto out;
> + }
> ptep = pte_set_fixmap(pte_phys);
> init_clear_pgtable(ptep);
> ptep += pte_index(addr);
> @@ -246,12 +251,16 @@ static void alloc_init_cont_pte(pmd_t *pmdp, unsigned long addr,
> * walker.
> */
> pte_clear_fixmap();
> +
> +out:
> + return ret;
return 0;
> }
>
> -static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
> +static int init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
> phys_addr_t phys, pgprot_t prot,
> phys_addr_t (*pgtable_alloc)(enum pgtable_type), int flags)
> {
> + int ret = 0;
> unsigned long next;
>
> do {
> @@ -271,22 +280,27 @@ static void init_pmd(pmd_t *pmdp, unsigned long addr, unsigned long end,
> BUG_ON(!pgattr_change_is_safe(pmd_val(old_pmd),
> READ_ONCE(pmd_val(*pmdp))));
> } else {
> - alloc_init_cont_pte(pmdp, addr, next, phys, prot,
> - pgtable_alloc, flags);
> + ret = alloc_init_cont_pte(pmdp, addr, next, phys, prot,
> + pgtable_alloc, flags);
> + if (ret)
> + break;
return ret;
>
> BUG_ON(pmd_val(old_pmd) != 0 &&
> pmd_val(old_pmd) != READ_ONCE(pmd_val(*pmdp)));
> }
> phys += next - addr;
> } while (pmdp++, addr = next, addr != end);
> +
> + return ret;
return 0;
Same in the other cases where there is nothing else to do and they are simply
early error returns.
> }
>
> -static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> +static int __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> unsigned long virt, phys_addr_t size,
> pgprot_t prot,
> phys_addr_t (*pgtable_alloc)(enum pgtable_type),
> int flags)
> {
> + int ret = 0;
> +
> mutex_lock(&fixmap_lock);
> - __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
> - pgtable_alloc, flags);
> + ret = __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
> + pgtable_alloc, flags);
> mutex_unlock(&fixmap_lock);
Could use guard(mutex)(&fixmap_lock);
but maybe not worth introducing that stuff just to simplify this.
> +
> + return ret;
> +}
> +
> +static void ___create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> + unsigned long virt, phys_addr_t size,
> + pgprot_t prot,
> + phys_addr_t (*pgtable_alloc)(enum pgtable_type),
> + int flags)
> +{
> + int ret = 0;
> +
> + ret = __create_pgd_mapping(pgdir, phys, virt, size, prot, pgtable_alloc,
> + flags);
> + BUG_ON(ret);
> }
>
> #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> @@ -485,9 +553,11 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
> {
> /* Page is zeroed by init_clear_pgtable() so don't duplicate effort. */
> struct ptdesc *ptdesc = pagetable_alloc(GFP_PGTABLE_KERNEL & ~__GFP_ZERO, 0);
> - phys_addr_t pa;
> + phys_addr_t pa = 0;
> +
> + if (!ptdesc)
> + goto out;
I'd return 0 here.
>
> - BUG_ON(!ptdesc);
> pa = page_to_phys(ptdesc_page(ptdesc));
>
> switch (pgtable_type) {
> @@ -505,6 +575,7 @@ static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
> break;
> }
>
> +out:
> return pa;
> }
>
> @@ -533,8 +604,8 @@ void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
> &phys, virt);
> return;
> }
> - __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
> - NO_CONT_MAPPINGS);
> + ___create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
> + NO_CONT_MAPPINGS);
> }
>
> @@ -1369,23 +1440,30 @@ int arch_add_memory(int nid, u64 start, u64 size,
> if (can_set_direct_map())
> flags |= NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>
> - __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> - size, params->pgprot, pgd_pgtable_alloc_init_mm,
> - flags);
> + ret = __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
> + size, params->pgprot, pgd_pgtable_alloc_init_mm,
> + flags);
> +
> + if (ret)
> + goto out;
>
> memblock_clear_nomap(start, size);
>
> ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
> params);
> - if (ret)
> - __remove_pgd_mapping(swapper_pg_dir,
> - __phys_to_virt(start), size);
> - else {
> + if (ret) {
> + goto out;
> + } else {
> /* Address of hotplugged memory can be smaller */
> max_pfn = max(max_pfn, PFN_UP(start + size));
> max_low_pfn = max_pfn;
> }
>
> + return 0;
> +
> +out:
> + __remove_pgd_mapping(swapper_pg_dir,
> + __phys_to_virt(start), size);
This one is fine as common cleanup to do.
Jonathan
> return ret;
> }
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
2025-08-13 14:56 ` [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors Chaitanya S Prakash
2025-08-14 16:22 ` Jonathan Cameron
@ 2025-08-15 7:30 ` Dev Jain
2025-08-15 12:12 ` Kevin Brodsky
1 sibling, 1 reply; 9+ messages in thread
From: Dev Jain @ 2025-08-15 7:30 UTC (permalink / raw)
To: Chaitanya S Prakash, linux-arm-kernel, linux-mm, linux-kernel
Cc: Ryan Roberts, Yang Shi, Catalin Marinas, Will Deacon,
Kevin Brodsky, Anshuman Khandual, Andrew Morton, Zhenhua Huang,
Joey Gouly
On 13/08/25 8:26 pm, Chaitanya S Prakash wrote:
> [-------snip-------------]
>
> -static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> +static int __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> unsigned long virt, phys_addr_t size,
> pgprot_t prot,
> phys_addr_t (*pgtable_alloc)(enum pgtable_type),
> int flags)
> {
> + int ret = 0;
> +
> mutex_lock(&fixmap_lock);
> - __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
> - pgtable_alloc, flags);
> + ret = __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
> + pgtable_alloc, flags);
> mutex_unlock(&fixmap_lock);
> +
> + return ret;
> +}
> +
> +static void ___create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> + unsigned long virt, phys_addr_t size,
> + pgprot_t prot,
> + phys_addr_t (*pgtable_alloc)(enum pgtable_type),
> + int flags)
> +{
> + int ret = 0;
> +
> + ret = __create_pgd_mapping(pgdir, phys, virt, size, prot, pgtable_alloc,
> + flags);
> + BUG_ON(ret);
> }
A triple underscore calling a double underscore isn't natural to reason about.
Since this is the function which must succeed, how does "must_create_pgd_mapping()"
sound?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure
2025-08-13 14:56 ` [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure Chaitanya S Prakash
@ 2025-08-15 12:00 ` Kevin Brodsky
2025-08-19 7:41 ` David Hildenbrand
1 sibling, 0 replies; 9+ messages in thread
From: Kevin Brodsky @ 2025-08-15 12:00 UTC (permalink / raw)
To: Chaitanya S Prakash, linux-arm-kernel, linux-mm, linux-kernel
Cc: Ryan Roberts, Yang Shi, Catalin Marinas, Will Deacon,
Anshuman Khandual, Andrew Morton, Zhenhua Huang, Joey Gouly
On 13/08/2025 16:56, Chaitanya S Prakash wrote:
> create_kpti_ng_temp_pgd() was created as an alias for void returning
> __create_pgd_mapping_locked() and relied on pgtable_alloc() to BUG_ON()
> if an allocation failure occurred. But as __create_pgd_mapping_locked()
> has been updated as a part of the error propagation patch to return a
> non-void value, update create_kpti_ng_temp_pgd() to act as a wrapper
> around __create_pgd_mapping_locked() and BUG_ON() on ret being a non
> zero value.
>
> Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
> ---
> arch/arm64/mm/mmu.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index db7f45ef16574..19cbabceb38bd 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -76,6 +76,14 @@ EXPORT_SYMBOL(empty_zero_page);
> static DEFINE_SPINLOCK(swapper_pgdir_lock);
> static DEFINE_MUTEX(fixmap_lock);
>
> +#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> +void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys,
> + unsigned long virt, phys_addr_t size,
> + pgprot_t prot,
> + phys_addr_t (*pgtable_alloc)(enum pgtable_type),
> + int flags);
I'm not sure I understand why we'd now need this declaration?
That function should really be declared in some header instead of the
strange declaration in cpufeature.c, but that's unrelated to this patch.
- Kevin
> +#endif
> +
> void noinstr set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
> {
> pgd_t *fixmap_pgdp;
> @@ -541,11 +549,17 @@ static void ___create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
> }
>
> #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> -extern __alias(__create_pgd_mapping_locked)
> void create_kpti_ng_temp_pgd(pgd_t *pgdir, phys_addr_t phys, unsigned long virt,
> phys_addr_t size, pgprot_t prot,
> phys_addr_t (*pgtable_alloc)(enum pgtable_type),
> - int flags);
> + int flags)
> +{
> + int ret = 0;
> +
> + ret = __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
> + pgtable_alloc, flags);
> + BUG_ON(ret);
> +}
> #endif
>
> static phys_addr_t __pgd_pgtable_alloc(struct mm_struct *mm,
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors
2025-08-15 7:30 ` Dev Jain
@ 2025-08-15 12:12 ` Kevin Brodsky
0 siblings, 0 replies; 9+ messages in thread
From: Kevin Brodsky @ 2025-08-15 12:12 UTC (permalink / raw)
To: Dev Jain, Chaitanya S Prakash, linux-arm-kernel, linux-mm,
linux-kernel
Cc: Ryan Roberts, Yang Shi, Catalin Marinas, Will Deacon,
Anshuman Khandual, Andrew Morton, Zhenhua Huang
On 15/08/2025 09:30, Dev Jain wrote:
>
> On 13/08/25 8:26 pm, Chaitanya S Prakash wrote:
>> [-------snip-------------]
>> -static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>> +static int __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>> unsigned long virt, phys_addr_t size,
>> pgprot_t prot,
>> phys_addr_t (*pgtable_alloc)(enum pgtable_type),
>> int flags)
>> {
>> + int ret = 0;
>> +
>> mutex_lock(&fixmap_lock);
>> - __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
>> - pgtable_alloc, flags);
>> + ret = __create_pgd_mapping_locked(pgdir, phys, virt, size, prot,
>> + pgtable_alloc, flags);
>> mutex_unlock(&fixmap_lock);
>> +
>> + return ret;
>> +}
>> +
>> +static void ___create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>> + unsigned long virt, phys_addr_t size,
>> + pgprot_t prot,
>> + phys_addr_t (*pgtable_alloc)(enum pgtable_type),
>> + int flags)
>> +{
>> + int ret = 0;
>> +
>> + ret = __create_pgd_mapping(pgdir, phys, virt, size, prot,
>> pgtable_alloc,
>> + flags);
>> + BUG_ON(ret);
>> }
>
> A triple underscore calling a double underscore isn't natural to
> reason about.
Also not the most readable (easy to confuse the two).
> Since this is the function which must succeed, how does
> "must_create_pgd_mapping()"
> sound?
"must" isn't a prefix that is commonly used in that sense, not sure this
is very clear.
Another idea that comes to mind is early_create_pgd_mapping() - the
BUG_ON() being justified by the fact that early errors are not recoverable.
On a related note, it would be possible to return an error from
create_pgd_mapping() and create_mapping_noalloc() as their callers
already have error paths. That would be a bit cleaner but I don't know
if it's worth the hassle.
- Kevin
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure
2025-08-13 14:56 ` [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure Chaitanya S Prakash
2025-08-15 12:00 ` Kevin Brodsky
@ 2025-08-19 7:41 ` David Hildenbrand
2025-08-19 8:44 ` Giorgi Tchankvetadze
1 sibling, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-08-19 7:41 UTC (permalink / raw)
To: Chaitanya S Prakash, linux-arm-kernel, linux-mm, linux-kernel
Cc: Ryan Roberts, Yang Shi, Catalin Marinas, Will Deacon,
Kevin Brodsky, Anshuman Khandual, Andrew Morton, Zhenhua Huang,
Joey Gouly
On 13.08.25 16:56, Chaitanya S Prakash wrote:
> create_kpti_ng_temp_pgd() was created as an alias for void returning
> __create_pgd_mapping_locked() and relied on pgtable_alloc() to BUG_ON()
> if an allocation failure occurred. But as __create_pgd_mapping_locked()
> has been updated as a part of the error propagation patch to return a
> non-void value, update create_kpti_ng_temp_pgd() to act as a wrapper
> around __create_pgd_mapping_locked() and BUG_ON() on ret being a non
> zero value.
If my memory serves me right, panic() is preferred in such unexpected
early-boot scenarios (BUG_ON is frowned upon), where you can actually
print what is going wrong.
Which raises the question: could create_kpti_ng_temp_pgd() be __init?
__kpti_install_ng_mappings(), the only caller, seems to be.
--
Cheers
David / dhildenb
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure
2025-08-19 7:41 ` David Hildenbrand
@ 2025-08-19 8:44 ` Giorgi Tchankvetadze
0 siblings, 0 replies; 9+ messages in thread
From: Giorgi Tchankvetadze @ 2025-08-19 8:44 UTC (permalink / raw)
To: David Hildenbrand, Chaitanya S Prakash, linux-arm-kernel,
linux-mm, linux-kernel
Cc: Ryan Roberts, Yang Shi, Catalin Marinas, Will Deacon,
Kevin Brodsky, Anshuman Khandual, Andrew Morton, Zhenhua Huang,
Joey Gouly
Smart! BUG_ON() ends up in the generic “kernel bug” path, which on many
distros is configured to continue after printing the back-trace (e.g.
panic_on_oops=0).
Since a memory-allocation failure in early boot is unrecoverable , we
must force a halt.
On 8/19/2025 11:41 AM, David Hildenbrand wrote:
> On 13.08.25 16:56, Chaitanya S Prakash wrote:
>> create_kpti_ng_temp_pgd() was created as an alias for void returning
>> __create_pgd_mapping_locked() and relied on pgtable_alloc() to BUG_ON()
>> if an allocation failure occurred. But as __create_pgd_mapping_locked()
>> has been updated as a part of the error propagation patch to return a
>> non-void value, update create_kpti_ng_temp_pgd() to act as a wrapper
>> around __create_pgd_mapping_locked() and BUG_ON() on ret being a non
>> zero value.
>
> If my memory serves me right, panic() is preferred in such unexpected
> early-boot scenarios (BUG_ON is frowned upon), where you can actually
> print what is going wrong.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-08-19 8:44 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-13 14:56 [PATCH 0/2] arm64/mm: prevent panic on -ENOMEM in arch_add_memory() Chaitanya S Prakash
2025-08-13 14:56 ` [PATCH 1/2] arm64/mm: Allow __create_pgd_mapping() to propagate pgtable_alloc() errors Chaitanya S Prakash
2025-08-14 16:22 ` Jonathan Cameron
2025-08-15 7:30 ` Dev Jain
2025-08-15 12:12 ` Kevin Brodsky
2025-08-13 14:56 ` [PATCH 2/2] arm64/mm: Update create_kpti_ng_temp_pgd() to handle pgtable_alloc failure Chaitanya S Prakash
2025-08-15 12:00 ` Kevin Brodsky
2025-08-19 7:41 ` David Hildenbrand
2025-08-19 8:44 ` Giorgi Tchankvetadze
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).