* [PATCH v2 01/11] powerpc/64s/slb: Fix SLB multihit issue during SLB preload
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, stable, Ritesh Harjani (IBM)
From: Donet Tom <donettom@linux.ibm.com>
On systems using the hash MMU, there is a software SLB preload cache that
mirrors the entries loaded into the hardware SLB buffer. This preload
cache is subject to periodic eviction — typically after every 256 context
switches — to remove old entries.
To optimize performance, the kernel skips switch_mmu_context() in
switch_mm_irqs_off() when the prev and next mm_struct are the same.
However, on hash MMU systems, this can lead to inconsistencies between
the hardware SLB and the software preload cache.
If an SLB entry for a process is evicted from the software cache on one
CPU, and the same process later runs on another CPU without executing
switch_mmu_context(), the hardware SLB may retain stale entries. If the
kernel then attempts to reload that entry, it can trigger an SLB
multi-hit error.
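For reference, a minimal sketch of the preload cache aging scheme,
modeled on preload_age() and switch_slb() in
arch/powerpc/mm/book3s64/slb.c (simplified standalone C, not the exact
kernel code):

	/* The preload cache is a small per-thread ring buffer of ESIDs. */
	#define SLB_PRELOAD_NR	16

	struct slb_preload_cache {
		unsigned char	nr;		/* entries currently cached */
		unsigned char	tail;		/* index of the oldest entry */
		unsigned int	esid[SLB_PRELOAD_NR];
	};

	/* Drop the oldest entry from the preload cache. */
	static void preload_age(struct slb_preload_cache *pc)
	{
		if (!pc->nr)
			return;
		pc->nr--;
		pc->tail = (pc->tail + 1) % SLB_PRELOAD_NR;
	}

	/*
	 * Called on context switch: load_slb is a u8, so incrementing it
	 * wraps to zero every 256 switches, which is when one entry is
	 * aged out of the preload cache.
	 */
	static void maybe_age_preloads(struct slb_preload_cache *pc,
				       unsigned char *load_slb)
	{
		if (!++(*load_slb))
			preload_age(pc);
	}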
The following timeline shows how stale SLB entries are created and can
cause a multi-hit error when a process moves between CPUs without an
MMU context switch.
CPU 0                                        CPU 1
-----                                        -----
Process P
 exec                                        swapper/1
  load_elf_binary
   begin_new_exec
    activate_mm
     switch_mm_irqs_off
      switch_mmu_context
       switch_slb
       /*
        * This invalidates all
        * the entries in the HW
        * and sets up the new HW
        * SLB entries as per the
        * preload cache.
        */
 context_switch
 sched_migrate_task migrates process P to cpu-1

Process swapper/0                            context switch (to process P)
(uses mm_struct of Process P)                switch_mm_irqs_off()
                                              switch_slb
                                               load_slb++
                                               /*
                                                * load_slb becomes 0 here
                                                * and we evict an entry from
                                                * the preload cache with
                                                * preload_age(). We still
                                                * keep HW SLB and preload
                                                * cache in sync, that is
                                                * because all HW SLB entries
                                                * anyway get evicted in
                                                * switch_slb during SLBIA.
                                                * We then only add those
                                                * entries back in HW SLB,
                                                * which are currently
                                                * present in preload_cache
                                                * (after eviction).
                                                */
                                             load_elf_binary continues...
                                              setup_new_exec()
                                               slb_setup_new_exec()

                                             sched_switch event
                                             sched_migrate_task migrates
                                             process P to cpu-0

context_switch from swapper/0 to Process P
 switch_mm_irqs_off()
 /*
  * Since both prev and next mm_struct are the same we don't call
  * switch_mmu_context(). This will cause the HW SLB and SW preload
  * cache to go out of sync in preload_new_slb_context, because there
  * was an SLB entry which was evicted from both the HW SLB and the
  * preload cache on cpu-1. Later, in preload_new_slb_context(), when
  * we try to add the same preload entry again, we will add it to the
  * SW preload cache and then to the HW SLB. Since on cpu-0 this entry
  * was never invalidated, adding it to the HW SLB causes an SLB
  * multi-hit error.
  */
load_elf_binary continues...
 START_THREAD
  start_thread
   preload_new_slb_context
   /*
    * This tries to add a new EA to the preload cache which was earlier
    * evicted from both the cpu-1 HW SLB and preload cache. This caused
    * the HW SLB of cpu-0 to go out of sync with the SW preload cache.
    * The reason is that when we context switched back on cpu-0, we
    * should ideally have called switch_mmu_context(), which would have
    * brought the HW SLB entries on cpu-0 in sync with the SW preload
    * cache entries by setting up the mmu context properly. But we
    * didn't, since the prev mm_struct running on cpu-0 was the same as
    * the next mm_struct (which is true for swapper / kernel threads).
    * So when we now try to add this new entry into the HW SLB of
    * cpu-0, we hit an SLB multi-hit error.
    */
WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62
assert_slb_presence+0x2c/0x50
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12
VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected)
0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty)
MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
0x7fffceb5ffff (unreliable)
preload_new_slb_context+0x100/0x1a0
start_thread+0x26c/0x420
load_elf_binary+0x1b04/0x1c40
bprm_execve+0x358/0x680
do_execveat_common+0x1f8/0x240
sys_execve+0x58/0x70
system_call_exception+0x114/0x300
system_call_common+0x160/0x2c4
From the above analysis: during early exec the hardware SLB is cleared,
and entries from the software preload cache are reloaded into hardware
by switch_slb(). However, preload_new_slb_context() and
slb_setup_new_exec() also attempt to load some of the same entries,
which can trigger a multi-hit. In most cases these additional preloads
simply hit existing entries and add nothing new. Removing these
functions avoids the redundant preloads and eliminates the multi-hit
issue. This patch removes these two functions.
We tested process switching performance using the context_switch
benchmark on POWER9/hash, and observed no regression.
Without this patch: 129041 ops/sec
With this patch: 129341 ops/sec
We also measured SLB faults during boot, and the counts are essentially
the same with and without this patch.
SLB faults without this patch: 19727
SLB faults with this patch: 19786
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
Cc: <stable@vger.kernel.org>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 -
arch/powerpc/kernel/process.c | 5 --
arch/powerpc/mm/book3s64/internal.h | 2 -
arch/powerpc/mm/book3s64/mmu_context.c | 2 -
arch/powerpc/mm/book3s64/slb.c | 88 -------------------
5 files changed, 98 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 346351423207..af12e2ba8eb8 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -524,7 +524,6 @@ void slb_save_contents(struct slb_entry *slb_ptr);
void slb_dump_contents(struct slb_entry *slb_ptr);
extern void slb_vmalloc_update(void);
-void preload_new_slb_context(unsigned long start, unsigned long sp);
#ifdef CONFIG_PPC_64S_HASH_MMU
void slb_set_size(u16 size);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index eb23966ac0a9..a45fe147868b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1897,8 +1897,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
return 0;
}
-void preload_new_slb_context(unsigned long start, unsigned long sp);
-
/*
* Set up a thread for executing a new program
*/
@@ -1906,9 +1904,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
{
#ifdef CONFIG_PPC64
unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */
-
- if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
- preload_new_slb_context(start, sp);
#endif
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
index a57a25f06a21..c26a6f0c90fc 100644
--- a/arch/powerpc/mm/book3s64/internal.h
+++ b/arch/powerpc/mm/book3s64/internal.h
@@ -24,8 +24,6 @@ static inline bool stress_hpt(void)
void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
-void slb_setup_new_exec(void);
-
void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
#endif /* ARCH_POWERPC_MM_BOOK3S64_INTERNAL_H */
diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
index 4e1e45420bd4..fb9dcf9ca599 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -150,8 +150,6 @@ static int hash__init_new_context(struct mm_struct *mm)
void hash__setup_new_exec(void)
{
slice_setup_new_exec();
-
- slb_setup_new_exec();
}
#else
static inline int hash__init_new_context(struct mm_struct *mm)
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 6b783552403c..7e053c561a09 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -328,94 +328,6 @@ static void preload_age(struct thread_info *ti)
ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR;
}
-void slb_setup_new_exec(void)
-{
- struct thread_info *ti = current_thread_info();
- struct mm_struct *mm = current->mm;
- unsigned long exec = 0x10000000;
-
- WARN_ON(irqs_disabled());
-
- /*
- * preload cache can only be used to determine whether a SLB
- * entry exists if it does not start to overflow.
- */
- if (ti->slb_preload_nr + 2 > SLB_PRELOAD_NR)
- return;
-
- hard_irq_disable();
-
- /*
- * We have no good place to clear the slb preload cache on exec,
- * flush_thread is about the earliest arch hook but that happens
- * after we switch to the mm and have already preloaded the SLBEs.
- *
- * For the most part that's probably okay to use entries from the
- * previous exec, they will age out if unused. It may turn out to
- * be an advantage to clear the cache before switching to it,
- * however.
- */
-
- /*
- * preload some userspace segments into the SLB.
- * Almost all 32 and 64bit PowerPC executables are linked at
- * 0x10000000 so it makes sense to preload this segment.
- */
- if (!is_kernel_addr(exec)) {
- if (preload_add(ti, exec))
- slb_allocate_user(mm, exec);
- }
-
- /* Libraries and mmaps. */
- if (!is_kernel_addr(mm->mmap_base)) {
- if (preload_add(ti, mm->mmap_base))
- slb_allocate_user(mm, mm->mmap_base);
- }
-
- /* see switch_slb */
- asm volatile("isync" : : : "memory");
-
- local_irq_enable();
-}
-
-void preload_new_slb_context(unsigned long start, unsigned long sp)
-{
- struct thread_info *ti = current_thread_info();
- struct mm_struct *mm = current->mm;
- unsigned long heap = mm->start_brk;
-
- WARN_ON(irqs_disabled());
-
- /* see above */
- if (ti->slb_preload_nr + 3 > SLB_PRELOAD_NR)
- return;
-
- hard_irq_disable();
-
- /* Userspace entry address. */
- if (!is_kernel_addr(start)) {
- if (preload_add(ti, start))
- slb_allocate_user(mm, start);
- }
-
- /* Top of stack, grows down. */
- if (!is_kernel_addr(sp)) {
- if (preload_add(ti, sp))
- slb_allocate_user(mm, sp);
- }
-
- /* Bottom of heap, grows up. */
- if (heap && !is_kernel_addr(heap)) {
- if (preload_add(ti, heap))
- slb_allocate_user(mm, heap);
- }
-
- /* see switch_slb */
- asm volatile("isync" : : : "memory");
-
- local_irq_enable();
-}
-
static void slb_cache_slbie_kernel(unsigned int index)
{
unsigned long slbie_data = get_paca()->slb_cache[index];
--
2.51.0
* [PATCH v2 02/11] powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
When HV=0 & IR/DR=0, the Hash MMU is said to be in Virtual Real
Addressing Mode during early boot. While in this mode, we should ensure
that the memory region allocation for stress_hpt_struct happens from
within the RMA region, as otherwise the boot might get stuck while
doing a memset of this region.
The history behind why we have the RMA region limitation is better
explained in these two patches [1] & [2]. This patch ensures that the
memset to stress_hpt_struct during early boot does not cross the
ppc64_rma_size boundary.
[1]: https://lore.kernel.org/all/20190710052018.14628-1-sjitindarsingh@gmail.com/
[2]: https://lore.kernel.org/all/87wp54usvj.fsf@linux.vnet.ibm.com/
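A minimal sketch of the bounding pattern the diff below applies (same
memblock API as in the patch; size/align stand in for the actual
allocation parameters):

	phys_addr_t limit = MEMBLOCK_ALLOC_ANYWHERE;
	phys_addr_t tmp;

	/*
	 * In virtual real addressing mode (LPAR early boot) the memset
	 * below runs with translation off, so the region must sit below
	 * ppc64_rma_size.
	 */
	if (firmware_has_feature(FW_FEATURE_LPAR))
		limit = ppc64_rma_size;

	tmp = memblock_phys_alloc_range(size, align, MEMBLOCK_LOW_LIMIT, limit);
	memset((void *)tmp, 0xff, size);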
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Fixes: 6b34a099faa12 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 3aee3af614af..c99be1286d51 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1302,11 +1302,14 @@ static void __init htab_initialize(void)
unsigned long table;
unsigned long pteg_count;
unsigned long prot;
- phys_addr_t base = 0, size = 0, end;
+ phys_addr_t base = 0, size = 0, end, limit = MEMBLOCK_ALLOC_ANYWHERE;
u64 i;
DBG(" -> htab_initialize()\n");
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ limit = ppc64_rma_size;
+
if (mmu_has_feature(MMU_FTR_1T_SEGMENT)) {
mmu_kernel_ssize = MMU_SEGSIZE_1T;
mmu_highuser_ssize = MMU_SEGSIZE_1T;
@@ -1322,7 +1325,7 @@ static void __init htab_initialize(void)
// Too early to use nr_cpu_ids, so use NR_CPUS
tmp = memblock_phys_alloc_range(sizeof(struct stress_hpt_struct) * NR_CPUS,
__alignof__(struct stress_hpt_struct),
- 0, MEMBLOCK_ALLOC_ANYWHERE);
+ MEMBLOCK_LOW_LIMIT, limit);
memset((void *)tmp, 0xff, sizeof(struct stress_hpt_struct) * NR_CPUS);
stress_hpt_struct = __va(tmp);
@@ -1356,11 +1359,10 @@ static void __init htab_initialize(void)
mmu_hash_ops.hpte_clear_all();
#endif
} else {
- unsigned long limit = MEMBLOCK_ALLOC_ANYWHERE;
table = memblock_phys_alloc_range(htab_size_bytes,
htab_size_bytes,
- 0, limit);
+ MEMBLOCK_LOW_LIMIT, limit);
if (!table)
panic("ERROR: Failed to allocate %pa bytes below %pa\n",
&htab_size_bytes, &limit);
--
2.51.0
* [PATCH v2 03/11] powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
The HPTE format changed from Power9 (ISA v3.00) onwards. While dumping
kernel hash page tables, nothing gets printed on powernv P9+. This
patch utilizes the helpers added by the commit tagged in Fixes to
convert the new format to the old format and dump the HPTEs. This fix
is only needed for native_find() (powernv), since pseries continues to
work fine with the old format.
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Fixes: 6b243fcfb5f1e ("powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/ptdump/hashpagetable.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/mm/ptdump/hashpagetable.c b/arch/powerpc/mm/ptdump/hashpagetable.c
index a6baa6166d94..671d0dc00c6d 100644
--- a/arch/powerpc/mm/ptdump/hashpagetable.c
+++ b/arch/powerpc/mm/ptdump/hashpagetable.c
@@ -216,6 +216,8 @@ static int native_find(unsigned long ea, int psize, bool primary, u64 *v, u64
vpn = hpt_vpn(ea, vsid, ssize);
hash = hpt_hash(vpn, shift, ssize);
want_v = hpte_encode_avpn(vpn, psize, ssize);
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ want_v = hpte_old_to_new_v(want_v);
/* to check in the secondary hash table, we invert the hash */
if (!primary)
@@ -229,6 +231,10 @@ static int native_find(unsigned long ea, int psize, bool primary, u64 *v, u64
/* HPTE matches */
*v = be64_to_cpu(hptep->v);
*r = be64_to_cpu(hptep->r);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ *v = hpte_new_to_old_v(*v, *r);
+ *r = hpte_new_to_old_r(*r);
+ }
return 0;
}
++hpte_group;
--
2.51.0
* [PATCH v2 04/11] powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize()
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
We get the below errors when we try to enable debug logging in
book3s64/hash_utils.c. This patch fixes these errors related to the
phys_addr_t printf format.
arch/powerpc/mm/book3s64/hash_utils.c: In function ‘htab_initialize’:
arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
1401 | DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
cc1: all warnings being treated as errors
make[6]: *** [../scripts/Makefile.build:287: arch/powerpc/mm/book3s64/hash_utils.o] Error 1
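For reference, %pa is the printk format for phys_addr_t (see
Documentation/core-api/printk-formats.rst) and it takes a pointer to
the value rather than the value itself, which is what the hunk below
switches to. A minimal sketch:

	static void __init demo(void)
	{
		phys_addr_t base = 0, size = 0;

		/* %pa dereferences its argument, so pass &base / &size: */
		pr_debug("creating mapping for region: 0x%pa..0x%pa\n",
			 &base, &size);
	}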
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index c99be1286d51..0509c0a436d2 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1394,8 +1394,8 @@ static void __init htab_initialize(void)
size = end - base;
base = (unsigned long)__va(base);
- DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
- base, size, prot);
+ pr_debug("creating mapping for region: 0x%pa..0x%pa (prot: %lx)\n",
+ &base, &size, prot);
if ((base + size) >= H_VMALLOC_START) {
pr_warn("Outside the supported range\n");
--
2.51.0
* [PATCH v2 05/11] powerpc/64s/hash: Improve hash mmu printk messages
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
Let's use pr_info() instead of printk() in order to utilize the pr_fmt
prefix set to "hash-mmu:". This improves the debug messages printed
during kernel boot.
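The mechanism, for reference: pr_info() and friends expand pr_fmt()
around the format string, so a file-level prefix is applied
automatically, while a bare printk(KERN_INFO ...) bypasses it. A
minimal sketch (hash_utils.c sets the "hash-mmu: " prefix):

	#define pr_fmt(fmt) "hash-mmu: " fmt	/* before any #include */
	#include <linux/printk.h>

	static int __init demo_init(void)
	{
		pr_info("Using 1TB segments\n");
		/* logged as: "hash-mmu: Using 1TB segments" */
		return 0;
	}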
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 0509c0a436d2..2fa98d26876a 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -952,7 +952,7 @@ static int __init htab_dt_scan_hugepage_blocks(unsigned long node,
block_size = be64_to_cpu(addr_prop[1]);
if (block_size != (16 * GB))
return 0;
- printk(KERN_INFO "Huge page(16GB) memory: "
+ pr_info("Huge page(16GB) memory: "
"addr = 0x%lX size = 0x%lX pages = %d\n",
phys_addr, block_size, expected_pages);
if (phys_addr + block_size * expected_pages <= memblock_end_of_DRAM()) {
@@ -1135,7 +1135,7 @@ static void __init htab_init_page_sizes(void)
mmu_vmemmap_psize = mmu_virtual_psize;
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
- printk(KERN_DEBUG "Page orders: linear mapping = %d, "
+ pr_info("Page orders: linear mapping = %d, "
"virtual = %d, io = %d"
#ifdef CONFIG_SPARSEMEM_VMEMMAP
", vmemmap = %d"
@@ -1313,7 +1313,7 @@ static void __init htab_initialize(void)
if (mmu_has_feature(MMU_FTR_1T_SEGMENT)) {
mmu_kernel_ssize = MMU_SEGSIZE_1T;
mmu_highuser_ssize = MMU_SEGSIZE_1T;
- printk(KERN_INFO "Using 1TB segments\n");
+ pr_info("Using 1TB segments\n");
}
if (stress_slb_enabled)
@@ -1869,7 +1869,7 @@ int hash_page_mm(struct mm_struct *mm, unsigned long ea,
* in vmalloc space, so switch vmalloc
* to 4k pages
*/
- printk(KERN_ALERT "Reducing vmalloc segment "
+ pr_alert("Reducing vmalloc segment "
"to 4kB pages because of "
"non-cacheable mapping\n");
psize = mmu_vmalloc_psize = MMU_PAGE_4K;
--
2.51.0
* [PATCH v2 06/11] powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
This disables creating the hpt_order debugfs entry when running in
radix mode.
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 2fa98d26876a..e63befc96708 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -2434,6 +2434,8 @@ DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, "%llu\n")
static int __init hash64_debugfs(void)
{
+ if (radix_enabled())
+ return 0;
debugfs_create_file("hpt_order", 0600, arch_debugfs_dir, NULL,
&fops_hpt_order);
return 0;
--
2.51.0
* [PATCH v2 07/11] powerpc/64s/hash: Update directMap page counters for Hash
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
Update the directMap page counters for Hash. Hash by default always
uses mmu_linear_psize for its directMap. However, once the kernel has
booted and the dmesg log has wrapped, there is no way of knowing the
kernel linear pagesize with the Hash MMU. Features like debug_pagealloc
can force mmu_linear_psize to be PAGE_SIZE instead of PMD / PUD
mappings. It would be easier if we had this info printed in
/proc/meminfo, similar to Radix, for debugging purposes.
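The counters being updated are the per-page-size atomics behind the
DirectMap lines in /proc/meminfo; a sketch of the bookkeeping this
patch hooks into (modeled on update_page_count() in book3s64/pgtable.c,
simplified):

	/* One counter per supported MMU page size (in units of pages). */
	atomic_long_t direct_pages_count[MMU_PAGE_COUNT];

	void update_page_count(int psize, long count)
	{
		if (IS_ENABLED(CONFIG_PROC_FS))
			atomic_long_add(count, &direct_pages_count[psize]);
	}

	/*
	 * Callers pass the number of linear-psize pages mapped, or a
	 * negative count when unmapping, e.g.:
	 */
	update_page_count(mmu_linear_psize,
			  (end - start) >> mmu_psize_defs[mmu_linear_psize].shift);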
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index e63befc96708..31162dbad05c 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -47,6 +47,7 @@
#include <asm/mmu.h>
#include <asm/mmu_context.h>
#include <asm/page.h>
+#include <asm/pgalloc.h>
#include <asm/types.h>
#include <linux/uaccess.h>
#include <asm/machdep.h>
@@ -449,6 +450,7 @@ static __init void hash_kfence_map_pool(void)
{
unsigned long kfence_pool_start, kfence_pool_end;
unsigned long prot = pgprot_val(PAGE_KERNEL);
+ unsigned int pshift = mmu_psize_defs[mmu_linear_psize].shift;
if (!kfence_pool)
return;
@@ -459,6 +461,7 @@ static __init void hash_kfence_map_pool(void)
BUG_ON(htab_bolt_mapping(kfence_pool_start, kfence_pool_end,
kfence_pool, prot, mmu_linear_psize,
mmu_kernel_ssize));
+ update_page_count(mmu_linear_psize, KFENCE_POOL_SIZE >> pshift);
memblock_clear_nomap(kfence_pool, KFENCE_POOL_SIZE);
}
@@ -1234,6 +1237,7 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
int nid, pgprot_t prot)
{
int rc;
+ unsigned int pshift = mmu_psize_defs[mmu_linear_psize].shift;
if (end >= H_VMALLOC_START) {
pr_warn("Outside the supported range\n");
@@ -1251,17 +1255,22 @@ int hash__create_section_mapping(unsigned long start, unsigned long end,
mmu_kernel_ssize);
BUG_ON(rc2 && (rc2 != -ENOENT));
}
+ update_page_count(mmu_linear_psize, (end - start) >> pshift);
return rc;
}
int hash__remove_section_mapping(unsigned long start, unsigned long end)
{
+ unsigned int pshift = mmu_psize_defs[mmu_linear_psize].shift;
+
int rc = htab_remove_mapping(start, end, mmu_linear_psize,
mmu_kernel_ssize);
if (resize_hpt_for_hotplug(memblock_phys_mem_size()) == -ENOSPC)
pr_warn("Hash collision while resizing HPT\n");
+ if (!rc)
+ update_page_count(mmu_linear_psize, -((end - start) >> pshift));
return rc;
}
#endif /* CONFIG_MEMORY_HOTPLUG */
@@ -1304,6 +1313,7 @@ static void __init htab_initialize(void)
unsigned long prot;
phys_addr_t base = 0, size = 0, end, limit = MEMBLOCK_ALLOC_ANYWHERE;
u64 i;
+ unsigned int pshift = mmu_psize_defs[mmu_linear_psize].shift;
DBG(" -> htab_initialize()\n");
@@ -1404,6 +1414,8 @@ static void __init htab_initialize(void)
BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
prot, mmu_linear_psize, mmu_kernel_ssize));
+
+ update_page_count(mmu_linear_psize, size >> pshift);
}
hash_kfence_map_pool();
memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
@@ -1425,6 +1437,8 @@ static void __init htab_initialize(void)
BUG_ON(htab_bolt_mapping(tce_alloc_start, tce_alloc_end,
__pa(tce_alloc_start), prot,
mmu_linear_psize, mmu_kernel_ssize));
+ update_page_count(mmu_linear_psize,
+ (tce_alloc_end - tce_alloc_start) >> pshift);
}
--
2.51.0
* [PATCH v2 08/11] powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
This patch enables the directMap counters to be printed in
/proc/meminfo for the Hash MMU. With this patch, on a system with 8G of
DRAM we can see the entire RAM mapped with a 16M pagesize:
cat /proc/meminfo |grep -i direct
DirectMap4k: 0 kB
DirectMap64k: 0 kB
DirectMap16M: 8388608 kB
DirectMap16G: 0 kB
Tested with devdax too:
root@buildroot:/# ndctl create-namespace -r region0 -m devdax -s 2G
{
"dev":"namespace0.0",
"mode":"devdax",
"map":"dev",
"size":"2032.00 MiB (2130.71 MB)",
"uuid":"aa383ded-cd99-43a0-979f-5225467cfb40",
"daxregion":{
"id":0,
"size":"2032.00 MiB (2130.71 MB)",
"align":16777216,
"devices":[
{
"chardev":"dax0.0",
"size":"2032.00 MiB (2130.71 MB)",
"target_node":0,
"align":"16.00 MiB (16.78 MB)",
"mode":"devdax"
}
]
},
"align":16777216
}
root@buildroot:/# cat /proc/meminfo |grep -i direct
DirectMap4k: 0 kB
DirectMap64k: 0 kB
DirectMap16M: 10485760 kB
DirectMap16G: 0 kB
root@buildroot:/# ndctl destroy-namespace -f all
destroyed 1 namespace
root@buildroot:/# cat /proc/meminfo |grep -i direct
DirectMap4k: 0 kB
DirectMap64k: 0 kB
DirectMap16M: 8388608 kB
DirectMap16G: 0 kB
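For reference, the shift factors in the hunk below convert a page count
into kB: kB = count << (page_shift - 10). For 16M pages page_shift is
24, hence << 14; for 16G pages page_shift is 34, hence << 24 (and
likewise << 2 and << 6 for the existing 4k and 64k lines).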
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/pgtable.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index c9431ae7f78a..e3485db7de02 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -510,20 +510,21 @@ atomic_long_t direct_pages_count[MMU_PAGE_COUNT];
void arch_report_meminfo(struct seq_file *m)
{
- /*
- * Hash maps the memory with one size mmu_linear_psize.
- * So don't bother to print these on hash
- */
- if (!radix_enabled())
- return;
seq_printf(m, "DirectMap4k: %8lu kB\n",
atomic_long_read(&direct_pages_count[MMU_PAGE_4K]) << 2);
- seq_printf(m, "DirectMap64k: %8lu kB\n",
+ seq_printf(m, "DirectMap64k: %8lu kB\n",
atomic_long_read(&direct_pages_count[MMU_PAGE_64K]) << 6);
- seq_printf(m, "DirectMap2M: %8lu kB\n",
- atomic_long_read(&direct_pages_count[MMU_PAGE_2M]) << 11);
- seq_printf(m, "DirectMap1G: %8lu kB\n",
- atomic_long_read(&direct_pages_count[MMU_PAGE_1G]) << 20);
+ if (radix_enabled()) {
+ seq_printf(m, "DirectMap2M: %8lu kB\n",
+ atomic_long_read(&direct_pages_count[MMU_PAGE_2M]) << 11);
+ seq_printf(m, "DirectMap1G: %8lu kB\n",
+ atomic_long_read(&direct_pages_count[MMU_PAGE_1G]) << 20);
+ } else {
+ seq_printf(m, "DirectMap16M: %8lu kB\n",
+ atomic_long_read(&direct_pages_count[MMU_PAGE_16M]) << 14);
+ seq_printf(m, "DirectMap16G: %8lu kB\n",
+ atomic_long_read(&direct_pages_count[MMU_PAGE_16G]) << 24);
+ }
}
#endif /* CONFIG_PROC_FS */
--
2.51.0
* [PATCH v2 09/11] powerpc/ptdump: Dump PXX level info for kernel_page_tables
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
This patch adds PGD/PUD/PMD/PTE level information while dumping kernel
page tables. Before this patch it was hard to identify which entries
belong to which page table level, e.g.:
~ # dmesg |grep -i radix
[0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000005400000 with 2.00 MiB pages (exec)
[0.000000] radix-mmu: Mapped 0x0000000005400000-0x0000000040000000 with 2.00 MiB pages
[0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
[0.000000] radix-mmu: Initializing Radix MMU
Before:
---[ Start of kernel VM ]---
0xc000000000000000-0xc000000003ffffff XXX 64M r X pte valid present dirty accessed
0xc000000004000000-0xc00000003fffffff XXX 960M r w pte valid present dirty accessed
0xc000000040000000-0xc0000000ffffffff XXX 3G r w pte valid present dirty accessed
...
---[ vmemmap start ]---
0xc00c000000000000-0xc00c0000003fffff XXX 4M r w pte valid present dirty accessed
After:
---[ Start of kernel VM ]---
0xc000000000000000-0xc000000003ffffff XXX 64M PMD r X pte valid present dirty accessed
0xc000000004000000-0xc00000003fffffff XXX 960M PMD r w pte valid present dirty accessed
0xc000000040000000-0xc0000000ffffffff XXX 3G PUD r w pte valid present dirty accessed
...
---[ vmemmap start ]---
0xc00c000000000000-0xc00c0000003fffff XXX 4M PMD r w pte valid present dirty accessed
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/ptdump/8xx.c | 5 +++++
arch/powerpc/mm/ptdump/book3s64.c | 5 +++++
arch/powerpc/mm/ptdump/ptdump.c | 1 +
arch/powerpc/mm/ptdump/ptdump.h | 1 +
arch/powerpc/mm/ptdump/shared.c | 5 +++++
5 files changed, 17 insertions(+)
diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c
index 4ca9cf7a90c9..ff845f251724 100644
--- a/arch/powerpc/mm/ptdump/8xx.c
+++ b/arch/powerpc/mm/ptdump/8xx.c
@@ -71,18 +71,23 @@ static const struct flag_info flag_array[] = {
struct ptdump_pg_level pg_level[5] = {
{ /* pgd */
+ .name = "PGD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* p4d */
+ .name = "P4D",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pud */
+ .name = "PUD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pmd */
+ .name = "PMD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pte */
+ .name = "PTE",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
},
diff --git a/arch/powerpc/mm/ptdump/book3s64.c b/arch/powerpc/mm/ptdump/book3s64.c
index 6b2da9241d4c..e8a21c6dc32e 100644
--- a/arch/powerpc/mm/ptdump/book3s64.c
+++ b/arch/powerpc/mm/ptdump/book3s64.c
@@ -104,18 +104,23 @@ static const struct flag_info flag_array[] = {
struct ptdump_pg_level pg_level[5] = {
{ /* pgd */
+ .name = "PGD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* p4d */
+ .name = "P4D",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pud */
+ .name = "PUD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pmd */
+ .name = "PMD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pte */
+ .name = "PTE",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
},
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index b2358d794855..0d499aebee72 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -178,6 +178,7 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
pt_dump_seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
pt_dump_size(st->seq, addr - st->start_address);
+ pt_dump_seq_printf(st->seq, "%s ", pg_level[st->level].name);
}
static void note_prot_wx(struct pg_state *st, unsigned long addr)
diff --git a/arch/powerpc/mm/ptdump/ptdump.h b/arch/powerpc/mm/ptdump/ptdump.h
index 4232aa4b57ea..12aa9eca8b0c 100644
--- a/arch/powerpc/mm/ptdump/ptdump.h
+++ b/arch/powerpc/mm/ptdump/ptdump.h
@@ -13,6 +13,7 @@ struct flag_info {
struct ptdump_pg_level {
const struct flag_info *flag;
+ char name[4];
size_t num;
u64 mask;
};
diff --git a/arch/powerpc/mm/ptdump/shared.c b/arch/powerpc/mm/ptdump/shared.c
index 58998960eb9a..edc69da19b85 100644
--- a/arch/powerpc/mm/ptdump/shared.c
+++ b/arch/powerpc/mm/ptdump/shared.c
@@ -69,18 +69,23 @@ static const struct flag_info flag_array[] = {
struct ptdump_pg_level pg_level[5] = {
{ /* pgd */
+ .name = "PGD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* p4d */
+ .name = "P4D",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pud */
+ .name = "PUD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pmd */
+ .name = "PMD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pte */
+ .name = "PTE",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
},
--
2.51.0
* [PATCH v2 10/11] powerpc/64s/slb: Make preload_add return type as void
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
We dropped preload_new_slb_context() & slb_setup_new_exec() in a
previous patch. That means we no longer need preload_add()'s return
value, so let's make its return type void.
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/slb.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 7e053c561a09..042b762fc0d2 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -294,7 +294,7 @@ static bool preload_hit(struct thread_info *ti, unsigned long esid)
return false;
}
-static bool preload_add(struct thread_info *ti, unsigned long ea)
+static void preload_add(struct thread_info *ti, unsigned long ea)
{
unsigned char idx;
unsigned long esid;
@@ -308,7 +308,7 @@ static bool preload_add(struct thread_info *ti, unsigned long ea)
esid = ea >> SID_SHIFT;
if (preload_hit(ti, esid))
- return false;
+ return;
idx = (ti->slb_preload_tail + ti->slb_preload_nr) % SLB_PRELOAD_NR;
ti->slb_preload_esid[idx] = esid;
@@ -316,8 +316,6 @@ static bool preload_add(struct thread_info *ti, unsigned long ea)
ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR;
else
ti->slb_preload_nr++;
-
- return true;
}
static void preload_age(struct thread_info *ti)
--
2.51.0
* [PATCH v2 11/11] powerpc/64s/slb: Add no_slb_preload early cmdline param
From: Ritesh Harjani (IBM) @ 2025-10-30 14:57 UTC
To: linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K . V, Donet Tom,
Pavithra Prakash, Ritesh Harjani (IBM)
The no_slb_preload cmdline parameter can come in useful for quickly
disabling and/or testing the performance impact of userspace SLB
preloads. Recently there was an SLB multi-hit issue due to the SLB
preload cache which was very difficult to triage. This cmdline option
allows one to quickly disable preloads and verify whether an issue lies
in the preload cache or somewhere else. It can also be a useful option
to see the effect of SLB preloads for any application workload, e.g.
the number of SLB faults with or without SLB preloads.
with slb_preload:
slb_faults (minimal initrd boot): 15
slb_faults (full systemd boot): 300
with no_slb_preload:
slb_faults (minimal initrd boot): 33
slb_faults (full systemd boot): 138180
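Usage is just the bare parameter on the kernel command line; for
example (hypothetical qemu invocation, shown only for illustration):

	qemu-system-ppc64 -machine pseries \
		-append "root=/dev/vda no_slb_preload" ...

Since it is an early_param, it is parsed before htab_initialize()
enables the no_slb_preload_key static branch.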
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
Documentation/admin-guide/kernel-parameters.txt | 3 +++
arch/powerpc/mm/book3s64/hash_utils.c | 3 +++
arch/powerpc/mm/book3s64/internal.h | 7 +++++++
arch/powerpc/mm/book3s64/slb.c | 15 +++++++++++++++
4 files changed, 28 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6c42061ca20e..0b0bb73d1cc1 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7192,6 +7192,9 @@
them frequently to increase the rate of SLB faults
on kernel addresses.
+ no_slb_preload [PPC,EARLY]
+ Disables slb preloading for userspace.
+
sunrpc.min_resvport=
sunrpc.max_resvport=
[NFS,SUNRPC]
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 31162dbad05c..9dc5889d6ecb 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1329,6 +1329,9 @@ static void __init htab_initialize(void)
if (stress_slb_enabled)
static_branch_enable(&stress_slb_key);
+ if (no_slb_preload)
+ static_branch_enable(&no_slb_preload_key);
+
if (stress_hpt_enabled) {
unsigned long tmp;
static_branch_enable(&stress_hpt_key);
diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
index c26a6f0c90fc..cad08d83369c 100644
--- a/arch/powerpc/mm/book3s64/internal.h
+++ b/arch/powerpc/mm/book3s64/internal.h
@@ -22,6 +22,13 @@ static inline bool stress_hpt(void)
return static_branch_unlikely(&stress_hpt_key);
}
+extern bool no_slb_preload;
+DECLARE_STATIC_KEY_FALSE(no_slb_preload_key);
+static inline bool slb_preload_disabled(void)
+{
+ return static_branch_unlikely(&no_slb_preload_key);
+}
+
void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 042b762fc0d2..15f73abd1506 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -42,6 +42,15 @@ early_param("stress_slb", parse_stress_slb);
__ro_after_init DEFINE_STATIC_KEY_FALSE(stress_slb_key);
+bool no_slb_preload __initdata;
+static int __init parse_no_slb_preload(char *p)
+{
+ no_slb_preload = true;
+ return 0;
+}
+early_param("no_slb_preload", parse_no_slb_preload);
+__ro_after_init DEFINE_STATIC_KEY_FALSE(no_slb_preload_key);
+
static void assert_slb_presence(bool present, unsigned long ea)
{
#ifdef CONFIG_DEBUG_VM
@@ -299,6 +308,9 @@ static void preload_add(struct thread_info *ti, unsigned long ea)
unsigned char idx;
unsigned long esid;
+ if (slb_preload_disabled())
+ return;
+
if (mmu_has_feature(MMU_FTR_1T_SEGMENT)) {
/* EAs are stored >> 28 so 256MB segments don't need clearing */
if (ea & ESID_MASK_1T)
@@ -412,6 +424,9 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
copy_mm_to_paca(mm);
+ if (slb_preload_disabled())
+ return;
+
/*
* We gradually age out SLBs after a number of context switches to
* reduce reload overhead of unused entries (like we do with FP/VEC
--
2.51.0
^ permalink raw reply related [flat|nested] 14+ messages in thread* Re: [PATCH v2 00/11] powerpc/book3s64: Hash / SLB fixes & improvements
From: Madhavan Srinivasan @ 2025-11-23 7:08 UTC
To: linuxppc-dev, Ritesh Harjani (IBM)
Cc: Michael Ellerman, Nicholas Piggin, Christophe Leroy,
Paul Mackerras, Aneesh Kumar K . V, Donet Tom, Pavithra Prakash
On Thu, 30 Oct 2025 20:27:25 +0530, Ritesh Harjani (IBM) wrote:
> While working on the slb multi-hit issue we identified a few more fixes and
> improvements required for Hash / SLB code. This patch series is a result of
> that.
>
> RFC -> v2:
> ==========
> 1. Addressed review comments from Christophe.
> 2. Added PATCH [5-8] as improvements patches.
> 3. Dropped the last patch which adds slb faults to vmstat counter.
> I'd like to do some more testing of this internally first, and if it proves to
> be really useful, I will send that patch separately later.
>
> [...]
Applied to powerpc/next.
[01/11] powerpc/64s/slb: Fix SLB multihit issue during SLB preload
https://git.kernel.org/powerpc/c/00312419f0863964625d6dcda8183f96849412c6
[02/11] powerpc/64s/hash: Restrict stress_hpt_struct memblock region to within RMA limit
https://git.kernel.org/powerpc/c/17b45ccf09882e0c808ad2cf62acdc90ad968746
[03/11] powerpc/64s/ptdump: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format
https://git.kernel.org/powerpc/c/eae40a6da63faa9fb63ff61f8fa2b3b57da78a84
[04/11] powerpc/64s/hash: Fix phys_addr_t printf format in htab_initialize()
https://git.kernel.org/powerpc/c/178dd2ee2b72817a67a8814c35a65fd901b325ba
[05/11] powerpc/64s/hash: Improve hash mmu printk messages
https://git.kernel.org/powerpc/c/fec40fe7e6dc08c97370420301377ee031199a6d
[06/11] powerpc/64s/hash: Hash hpt_order should be only available with Hash MMU
https://git.kernel.org/powerpc/c/b80691e25ec632d020b90eb9de3af0f956dff0a0
[07/11] powerpc/64s/hash: Update directMap page counters for Hash
https://git.kernel.org/powerpc/c/b296fda58d1d095c95c8207b09856b2ceafa1397
[08/11] powerpc/64s/pgtable: Enable directMap counters in meminfo for Hash
https://git.kernel.org/powerpc/c/6394f0e8abe7ca3132faa1321c97c53d0994aecc
[09/11] powerpc/ptdump: Dump PXX level info for kernel_page_tables
https://git.kernel.org/powerpc/c/3d44be297e7e01357b95dd13d2b335e6550ccfcd
[10/11] powerpc/64s/slb: Make preload_add return type as void
https://git.kernel.org/powerpc/c/2a492d6b38c2943c9d2f9008f31a8bb3afc3a40b
[11/11] powerpc/64s/slb: Add no_slb_preload early cmdline param
https://git.kernel.org/powerpc/c/5b3a426affbd30a4293d284ab0d37164a4064531
cheers