* [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements
@ 2025-08-30 3:51 Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 1/8] powerpc/mm: Fix SLB multihit issue during SLB preload Ritesh Harjani (IBM)
` (7 more replies)
0 siblings, 8 replies; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Ritesh Harjani (IBM)
Hello all,
While working on an SLB-related multi-hit issue, we identified a few more Hash / SLB
related issues, which this patch series fixes.
Patches 1-4 are various fixes related to SLB / HASH MMU.
Patches 5-8 are improvements to ptdump and SLB preload, and the last patch adds
SLB user and kernel vmstat counters.
Patch-1 was posted earlier as a standalone fix here [1]. It has no changes
in this version. But since patch-6 depends on it, we have clubbed all
Hash / SLB related patches into a common series.
Other than patch-1 (which was posted earlier), the rest of the patches have
mostly been tested only in QEMU TCG for powernv and pseries. I will test these
on real HW too before the next revision. Meanwhile, any reviews/feedback
would be welcome.
[1]: https://lore.kernel.org/linuxppc-dev/20250814092532.116762-1-donettom@linux.ibm.com/
Donet Tom (1):
powerpc/mm: Fix SLB multihit issue during SLB preload
Ritesh Harjani (IBM) (7):
book3s64/hash: Restrict stress_hpt_struct memblock region to within RMA limit
book3s64/hash: Fix phys_addr_t printf format in htab_initialize()
powerpc/ptdump/64: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format
powerpc/ptdump: Dump PXX level info for kernel_page_tables
powerpc/book3s64/slb: Make preload_add return type as void
powerpc/book3s64/slb: Add no_slb_preload early cmdline param
powerpc/book3s64/slb: Add slb faults to vmstat
.../admin-guide/kernel-parameters.txt | 3 +
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 -
arch/powerpc/kernel/process.c | 5 -
arch/powerpc/mm/book3s64/hash_utils.c | 15 ++-
arch/powerpc/mm/book3s64/internal.h | 9 +-
arch/powerpc/mm/book3s64/mmu_context.c | 2 -
arch/powerpc/mm/book3s64/slb.c | 112 ++++--------------
arch/powerpc/mm/ptdump/8xx.c | 5 +
arch/powerpc/mm/ptdump/book3s64.c | 5 +
arch/powerpc/mm/ptdump/hashpagetable.c | 6 +
arch/powerpc/mm/ptdump/ptdump.c | 1 +
arch/powerpc/mm/ptdump/ptdump.h | 1 +
arch/powerpc/mm/ptdump/shared.c | 5 +
include/linux/vm_event_item.h | 4 +
mm/vmstat.c | 5 +
15 files changed, 73 insertions(+), 106 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 19+ messages in thread
* [RFC 1/8] powerpc/mm: Fix SLB multihit issue during SLB preload
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
@ 2025-08-30 3:51 ` Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 2/8] book3s64/hash: Restrict stress_hpt_struct memblock region to within RMA limit Ritesh Harjani (IBM)
` (6 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev
Cc: Donet Tom, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Christophe Leroy, Paul Mackerras, Aneesh Kumar K.V, stable
From: Donet Tom <donettom@linux.ibm.com>
On systems using the hash MMU, there is a software SLB preload cache that
mirrors the entries loaded into the hardware SLB buffer. This preload
cache is subject to periodic eviction, typically after every 256 context
switches, to remove old entries.
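For reference, the preload cache is just a small per-thread ring of segment IDs.
A simplified sketch of that bookkeeping follows (not the exact kernel code; the
field names mirror the thread_info fields touched in the diff below, and the
ring size is assumed for illustration):

	#define SLB_PRELOAD_NR	16			/* assumed size, for illustration */

	struct slb_preload {
		unsigned char nr;			/* number of valid entries   */
		unsigned char tail;			/* index of the oldest entry */
		unsigned long esid[SLB_PRELOAD_NR];	/* preloaded segment IDs     */
	};

	/* Periodic ageing: drop the oldest entry from the software cache. */
	static void preload_age(struct slb_preload *pc)
	{
		if (!pc->nr)
			return;
		pc->nr--;
		pc->tail = (pc->tail + 1) % SLB_PRELOAD_NR;
	}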
To optimize performance, the kernel skips switch_mmu_context() in
switch_mm_irqs_off() when the prev and next mm_struct are the same.
However, on hash MMU systems, this can lead to inconsistencies between
the hardware SLB and the software preload cache.
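The relevant fast path looks roughly like this (simplified sketch, not the
exact kernel code):

	/* switch_mm_irqs_off(), simplified */
	if (prev == next) {
		/*
		 * Same address space: switch_mmu_context() is skipped, so
		 * the hardware SLB on this CPU is left untouched.
		 */
		return;
	}
	switch_mmu_context(prev, next, tsk);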
If an SLB entry for a process is evicted from the software cache on one
CPU, and the same process later runs on another CPU without executing
switch_mmu_context(), the hardware SLB may retain stale entries. If the
kernel then attempts to reload that entry, it can trigger an SLB
multi-hit error.
The following timeline shows how stale SLB entries are created and can
cause a multi-hit error when a process moves between CPUs without an
MMU context switch.
CPU 0                                        CPU 1
-----                                        -----
Process P
 exec                                        swapper/1
  load_elf_binary
   begin_new_exc
    activate_mm
     switch_mm_irqs_off
      switch_mmu_context
       switch_slb
       /*
        * This invalidates all
        * the entries in the HW
        * and setup the new HW
        * SLB entries as per the
        * preload cache.
        */
  context_switch
  sched_migrate_task migrates process P to cpu-1

Process swapper/0                            context switch (to process P)
(uses mm_struct of Process P)                 switch_mm_irqs_off()
                                               switch_slb
                                                load_slb++
                                                /*
                                                 * load_slb becomes 0 here
                                                 * and we evict an entry from
                                                 * the preload cache with
                                                 * preload_age(). We still
                                                 * keep HW SLB and preload
                                                 * cache in sync, that is
                                                 * because all HW SLB entries
                                                 * anyways gets evicted in
                                                 * switch_slb during SLBIA.
                                                 * We then only add those
                                                 * entries back in HW SLB,
                                                 * which are currently
                                                 * present in preload_cache
                                                 * (after eviction).
                                                 */
                                              load_elf_binary continues...
                                               setup_new_exec()
                                                slb_setup_new_exec()

                                              sched_switch event
                                              sched_migrate_task migrates
                                              process P to cpu-0

context_switch from swapper/0 to Process P
 switch_mm_irqs_off()
 /*
  * Since both prev and next mm struct are same we don't call
  * switch_mmu_context(). This will cause the HW SLB and SW preload
  * cache to go out of sync in preload_new_slb_context. Because there
  * was an SLB entry which was evicted from both HW and preload cache
  * on cpu-1. Now later in preload_new_slb_context(), when we will try
  * to add the same preload entry again, we will add this to the SW
  * preload cache and then will add it to the HW SLB. Since on cpu-0
  * this entry was never invalidated, hence adding this entry to the HW
  * SLB will cause a SLB multi-hit error.
  */
 load_elf_binary continues...
  START_THREAD
   start_thread
    preload_new_slb_context
    /*
     * This tries to add a new EA to preload cache which was earlier
     * evicted from both cpu-1 HW SLB and preload cache. This caused the
     * HW SLB of cpu-0 to go out of sync with the SW preload cache. The
     * reason for this was, that when we context switched back on CPU-0,
     * we should have ideally called switch_mmu_context() which will
     * bring the HW SLB entries on CPU-0 in sync with SW preload cache
     * entries by setting up the mmu context properly. But we didn't do
     * that since the prev mm_struct running on cpu-0 was same as the
     * next mm_struct (which is true for swapper / kernel threads). So
     * now when we try to add this new entry into the HW SLB of cpu-0,
     * we hit a SLB multi-hit error.
     */
WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62
assert_slb_presence+0x2c/0x50
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12
VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected)
0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty)
MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
0x7fffceb5ffff (unreliable)
preload_new_slb_context+0x100/0x1a0
start_thread+0x26c/0x420
load_elf_binary+0x1b04/0x1c40
bprm_execve+0x358/0x680
do_execveat_common+0x1f8/0x240
sys_execve+0x58/0x70
system_call_exception+0x114/0x300
system_call_common+0x160/0x2c4
From the above analysis, during early exec the hardware SLB is cleared,
and entries from the software preload cache are reloaded into hardware
by switch_slb. However, preload_new_slb_context and slb_setup_new_exec
also attempt to load some of the same entries, which can trigger a
multi-hit. In most cases, these additional preloads simply hit existing
entries and add nothing new. Removing these functions avoids redundant
preloads and eliminates the multi-hit issue. This patch removes these
two functions.
We tested process switching performance using the context_switch
benchmark on POWER9/hash, and observed no regression.
Without this patch: 129041 ops/sec
With this patch: 129341 ops/sec
We also measured SLB faults during boot, and the counts are essentially
the same with and without this patch.
SLB faults without this patch: 19727
SLB faults with this patch: 19786
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
cc: <stable@vger.kernel.org>
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 -
arch/powerpc/kernel/process.c | 5 --
arch/powerpc/mm/book3s64/internal.h | 2 -
arch/powerpc/mm/book3s64/mmu_context.c | 2 -
arch/powerpc/mm/book3s64/slb.c | 88 -------------------
5 files changed, 98 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 1c4eebbc69c9..e1f77e2eead4 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -524,7 +524,6 @@ void slb_save_contents(struct slb_entry *slb_ptr);
void slb_dump_contents(struct slb_entry *slb_ptr);
extern void slb_vmalloc_update(void);
-void preload_new_slb_context(unsigned long start, unsigned long sp);
#ifdef CONFIG_PPC_64S_HASH_MMU
void slb_set_size(u16 size);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 855e09886503..2b9799157eb4 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1897,8 +1897,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
return 0;
}
-void preload_new_slb_context(unsigned long start, unsigned long sp);
-
/*
* Set up a thread for executing a new program
*/
@@ -1906,9 +1904,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
{
#ifdef CONFIG_PPC64
unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */
-
- if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
- preload_new_slb_context(start, sp);
#endif
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
index a57a25f06a21..c26a6f0c90fc 100644
--- a/arch/powerpc/mm/book3s64/internal.h
+++ b/arch/powerpc/mm/book3s64/internal.h
@@ -24,8 +24,6 @@ static inline bool stress_hpt(void)
void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
-void slb_setup_new_exec(void);
-
void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
#endif /* ARCH_POWERPC_MM_BOOK3S64_INTERNAL_H */
diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
index 4e1e45420bd4..fb9dcf9ca599 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -150,8 +150,6 @@ static int hash__init_new_context(struct mm_struct *mm)
void hash__setup_new_exec(void)
{
slice_setup_new_exec();
-
- slb_setup_new_exec();
}
#else
static inline int hash__init_new_context(struct mm_struct *mm)
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 6b783552403c..7e053c561a09 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -328,94 +328,6 @@ static void preload_age(struct thread_info *ti)
ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR;
}
-void slb_setup_new_exec(void)
-{
- struct thread_info *ti = current_thread_info();
- struct mm_struct *mm = current->mm;
- unsigned long exec = 0x10000000;
-
- WARN_ON(irqs_disabled());
-
- /*
- * preload cache can only be used to determine whether a SLB
- * entry exists if it does not start to overflow.
- */
- if (ti->slb_preload_nr + 2 > SLB_PRELOAD_NR)
- return;
-
- hard_irq_disable();
-
- /*
- * We have no good place to clear the slb preload cache on exec,
- * flush_thread is about the earliest arch hook but that happens
- * after we switch to the mm and have already preloaded the SLBEs.
- *
- * For the most part that's probably okay to use entries from the
- * previous exec, they will age out if unused. It may turn out to
- * be an advantage to clear the cache before switching to it,
- * however.
- */
-
- /*
- * preload some userspace segments into the SLB.
- * Almost all 32 and 64bit PowerPC executables are linked at
- * 0x10000000 so it makes sense to preload this segment.
- */
- if (!is_kernel_addr(exec)) {
- if (preload_add(ti, exec))
- slb_allocate_user(mm, exec);
- }
-
- /* Libraries and mmaps. */
- if (!is_kernel_addr(mm->mmap_base)) {
- if (preload_add(ti, mm->mmap_base))
- slb_allocate_user(mm, mm->mmap_base);
- }
-
- /* see switch_slb */
- asm volatile("isync" : : : "memory");
-
- local_irq_enable();
-}
-
-void preload_new_slb_context(unsigned long start, unsigned long sp)
-{
- struct thread_info *ti = current_thread_info();
- struct mm_struct *mm = current->mm;
- unsigned long heap = mm->start_brk;
-
- WARN_ON(irqs_disabled());
-
- /* see above */
- if (ti->slb_preload_nr + 3 > SLB_PRELOAD_NR)
- return;
-
- hard_irq_disable();
-
- /* Userspace entry address. */
- if (!is_kernel_addr(start)) {
- if (preload_add(ti, start))
- slb_allocate_user(mm, start);
- }
-
- /* Top of stack, grows down. */
- if (!is_kernel_addr(sp)) {
- if (preload_add(ti, sp))
- slb_allocate_user(mm, sp);
- }
-
- /* Bottom of heap, grows up. */
- if (heap && !is_kernel_addr(heap)) {
- if (preload_add(ti, heap))
- slb_allocate_user(mm, heap);
- }
-
- /* see switch_slb */
- asm volatile("isync" : : : "memory");
-
- local_irq_enable();
-}
-
static void slb_cache_slbie_kernel(unsigned int index)
{
unsigned long slbie_data = get_paca()->slb_cache[index];
--
2.50.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC 2/8] book3s64/hash: Restrict stress_hpt_struct memblock region to within RMA limit
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 1/8] powerpc/mm: Fix SLB multihit issue during SLB preload Ritesh Harjani (IBM)
@ 2025-08-30 3:51 ` Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 3/8] book3s64/hash: Fix phys_addr_t printf format in htab_initialize() Ritesh Harjani (IBM)
` (5 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev
Cc: Ritesh Harjani (IBM), Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Paul Mackerras,
Aneesh Kumar K.V, Donet Tom
When HV=0 & IR/DR=0, the Hash MMU is said to be in Virtual Real
Addressing Mode during early boot. During this phase, we should ensure that
memory region allocations for stress_hpt_struct happen from
within the RMA region, as otherwise the boot might get stuck while doing the
memset of this region.
The history behind why we have the RMA region limitation is better explained
in these 2 patches [1] & [2]. This patch ensures that the memset of
stress_hpt_struct during early boot does not cross the ppc64_rma_size
boundary.
[1]: https://lore.kernel.org/all/20190710052018.14628-1-sjitindarsingh@gmail.com/
[2]: https://lore.kernel.org/all/87wp54usvj.fsf@linux.vnet.ibm.com/
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6b34a099faa12 ("powerpc/64s/hash: add stress_hpt kernel boot option to increase hash faults")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 4693c464fc5a..1e062056cfb8 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1302,11 +1302,14 @@ static void __init htab_initialize(void)
unsigned long table;
unsigned long pteg_count;
unsigned long prot;
- phys_addr_t base = 0, size = 0, end;
+ phys_addr_t base = 0, size = 0, end, limit = MEMBLOCK_ALLOC_ANYWHERE;
u64 i;
DBG(" -> htab_initialize()\n");
+ if (firmware_has_feature(FW_FEATURE_LPAR))
+ limit = ppc64_rma_size;
+
if (mmu_has_feature(MMU_FTR_1T_SEGMENT)) {
mmu_kernel_ssize = MMU_SEGSIZE_1T;
mmu_highuser_ssize = MMU_SEGSIZE_1T;
@@ -1322,7 +1325,7 @@ static void __init htab_initialize(void)
// Too early to use nr_cpu_ids, so use NR_CPUS
tmp = memblock_phys_alloc_range(sizeof(struct stress_hpt_struct) * NR_CPUS,
__alignof__(struct stress_hpt_struct),
- 0, MEMBLOCK_ALLOC_ANYWHERE);
+ MEMBLOCK_LOW_LIMIT, limit);
memset((void *)tmp, 0xff, sizeof(struct stress_hpt_struct) * NR_CPUS);
stress_hpt_struct = __va(tmp);
@@ -1356,11 +1359,10 @@ static void __init htab_initialize(void)
mmu_hash_ops.hpte_clear_all();
#endif
} else {
- unsigned long limit = MEMBLOCK_ALLOC_ANYWHERE;
table = memblock_phys_alloc_range(htab_size_bytes,
htab_size_bytes,
- 0, limit);
+ MEMBLOCK_LOW_LIMIT, limit);
if (!table)
panic("ERROR: Failed to allocate %pa bytes below %pa\n",
&htab_size_bytes, &limit);
--
2.50.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC 3/8] book3s64/hash: Fix phys_addr_t printf format in htab_initialize()
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 1/8] powerpc/mm: Fix SLB multihit issue during SLB preload Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 2/8] book3s64/hash: Restrict stress_hpt_struct memblock region to within RMA limit Ritesh Harjani (IBM)
@ 2025-08-30 3:51 ` Ritesh Harjani (IBM)
2025-08-30 6:26 ` Christophe Leroy
2025-08-30 3:51 ` [RFC 4/8] powerpc/ptdump/64: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format Ritesh Harjani (IBM)
` (4 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev
Cc: Ritesh Harjani (IBM), Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Paul Mackerras,
Aneesh Kumar K.V, Donet Tom
We get the below errors when we try to enable debug logs in book3s64/hash_utils.c.
This patch fixes these errors related to the phys_addr_t printf format.
arch/powerpc/mm/book3s64/hash_utils.c: In function ‘htab_initialize’:
arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
1401 | DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
cc1: all warnings being treated as errors
make[6]: *** [../scripts/Makefile.build:287: arch/powerpc/mm/book3s64/hash_utils.o] Error 1
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/hash_utils.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 1e062056cfb8..495b6da6f5d4 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1394,7 +1394,7 @@ static void __init htab_initialize(void)
size = end - base;
base = (unsigned long)__va(base);
- DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
+ DBG("creating mapping for region: %llx..%llx (prot: %lx)\n",
base, size, prot);
if ((base + size) >= H_VMALLOC_START) {
--
2.50.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC 4/8] powerpc/ptdump/64: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
` (2 preceding siblings ...)
2025-08-30 3:51 ` [RFC 3/8] book3s64/hash: Fix phys_addr_t printf format in htab_initialize() Ritesh Harjani (IBM)
@ 2025-08-30 3:51 ` Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 5/8] powerpc/ptdump: Dump PXX level info for kernel_page_tables Ritesh Harjani (IBM)
` (3 subsequent siblings)
7 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev
Cc: Ritesh Harjani (IBM), Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Paul Mackerras,
Aneesh Kumar K.V, Donet Tom
The HPTE format changed from Power9 (ISA 3.0) onwards. While dumping
kernel hash page tables, nothing gets printed on powernv P9+. This patch
utilizes the helpers added in the patch tagged as Fixes to convert the new
format to the old format and dump the HPTEs. This fix is only needed for
native_find() (powernv), since pseries continues to work fine with the
old format.
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Fixes: 6b243fcfb5f1e ("powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/ptdump/hashpagetable.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/powerpc/mm/ptdump/hashpagetable.c b/arch/powerpc/mm/ptdump/hashpagetable.c
index a6baa6166d94..671d0dc00c6d 100644
--- a/arch/powerpc/mm/ptdump/hashpagetable.c
+++ b/arch/powerpc/mm/ptdump/hashpagetable.c
@@ -216,6 +216,8 @@ static int native_find(unsigned long ea, int psize, bool primary, u64 *v, u64
vpn = hpt_vpn(ea, vsid, ssize);
hash = hpt_hash(vpn, shift, ssize);
want_v = hpte_encode_avpn(vpn, psize, ssize);
+ if (cpu_has_feature(CPU_FTR_ARCH_300))
+ want_v = hpte_old_to_new_v(want_v);
/* to check in the secondary hash table, we invert the hash */
if (!primary)
@@ -229,6 +231,10 @@ static int native_find(unsigned long ea, int psize, bool primary, u64 *v, u64
/* HPTE matches */
*v = be64_to_cpu(hptep->v);
*r = be64_to_cpu(hptep->r);
+ if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ *v = hpte_new_to_old_v(*v, *r);
+ *r = hpte_new_to_old_r(*r);
+ }
return 0;
}
++hpte_group;
--
2.50.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC 5/8] powerpc/ptdump: Dump PXX level info for kernel_page_tables
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
` (3 preceding siblings ...)
2025-08-30 3:51 ` [RFC 4/8] powerpc/ptdump/64: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format Ritesh Harjani (IBM)
@ 2025-08-30 3:51 ` Ritesh Harjani (IBM)
2025-08-30 6:31 ` Christophe Leroy
2025-08-30 3:51 ` [RFC 6/8] powerpc/book3s64/slb: Make preload_add return type as void Ritesh Harjani (IBM)
` (2 subsequent siblings)
7 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev
Cc: Ritesh Harjani (IBM), Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Paul Mackerras,
Aneesh Kumar K.V, Donet Tom
This patch adds PGD/PUD/PMD/PTE level information while dumping kernel
page tables. Before this patch it was hard to identify which entries
belong to which page table level, e.g.
~ # dmesg |grep -i radix
[0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000005400000 with 2.00 MiB pages (exec)
[0.000000] radix-mmu: Mapped 0x0000000005400000-0x0000000040000000 with 2.00 MiB pages
[0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
[0.000000] radix-mmu: Initializing Radix MMU
Before:
---[ Start of kernel VM ]---
0xc000000000000000-0xc000000003ffffff XXX 64M r X pte valid present dirty accessed
0xc000000004000000-0xc00000003fffffff XXX 960M r w pte valid present dirty accessed
0xc000000040000000-0xc0000000ffffffff XXX 3G r w pte valid present dirty accessed
...
---[ vmemmap start ]---
0xc00c000000000000-0xc00c0000003fffff XXX 4M r w pte valid present dirty accessed
After:
---[ Start of kernel VM ]---
0xc000000000000000-0xc000000003ffffff XXX 64M PMD r X pte valid present dirty accessed
0xc000000004000000-0xc00000003fffffff XXX 960M PMD r w pte valid present dirty accessed
0xc000000040000000-0xc0000000ffffffff XXX 3G PUD r w pte valid present dirty accessed
...
---[ vmemmap start ]---
0xc00c000000000000-0xc00c0000003fffff XXX 4M PMD r w pte valid present dirty accessed
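The dumps above correspond to the kernel_page_tables debugfs file; assuming
ptdump debugfs support is enabled and debugfs is mounted at the usual
location, they can be read with e.g.:

	~ # cat /sys/kernel/debug/kernel_page_tables
	~ # cat /sys/kernel/debug/kernel_hash_pagetable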
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/ptdump/8xx.c | 5 +++++
arch/powerpc/mm/ptdump/book3s64.c | 5 +++++
arch/powerpc/mm/ptdump/ptdump.c | 1 +
arch/powerpc/mm/ptdump/ptdump.h | 1 +
arch/powerpc/mm/ptdump/shared.c | 5 +++++
5 files changed, 17 insertions(+)
diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c
index b5c79b11ea3c..1dc0f2438a73 100644
--- a/arch/powerpc/mm/ptdump/8xx.c
+++ b/arch/powerpc/mm/ptdump/8xx.c
@@ -71,18 +71,23 @@ static const struct flag_info flag_array[] = {
struct pgtable_level pg_level[5] = {
{ /* pgd */
+ .name = "PGD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* p4d */
+ .name = "P4D",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pud */
+ .name = "PUD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pmd */
+ .name = "PMD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pte */
+ .name = "PTX",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
},
diff --git a/arch/powerpc/mm/ptdump/book3s64.c b/arch/powerpc/mm/ptdump/book3s64.c
index 5ad92d9dc5d1..79c9a8391042 100644
--- a/arch/powerpc/mm/ptdump/book3s64.c
+++ b/arch/powerpc/mm/ptdump/book3s64.c
@@ -104,18 +104,23 @@ static const struct flag_info flag_array[] = {
struct pgtable_level pg_level[5] = {
{ /* pgd */
+ .name = "PGD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* p4d */
+ .name = "P4D",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pud */
+ .name = "PUD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pmd */
+ .name = "PMD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pte */
+ .name = "PTE",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
},
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index b2358d794855..0d499aebee72 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -178,6 +178,7 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
pt_dump_seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
pt_dump_size(st->seq, addr - st->start_address);
+ pt_dump_seq_printf(st->seq, "%s ", pg_level[st->level].name);
}
static void note_prot_wx(struct pg_state *st, unsigned long addr)
diff --git a/arch/powerpc/mm/ptdump/ptdump.h b/arch/powerpc/mm/ptdump/ptdump.h
index 154efae96ae0..88cf28c4138e 100644
--- a/arch/powerpc/mm/ptdump/ptdump.h
+++ b/arch/powerpc/mm/ptdump/ptdump.h
@@ -13,6 +13,7 @@ struct flag_info {
struct pgtable_level {
const struct flag_info *flag;
+ char name[4];
size_t num;
u64 mask;
};
diff --git a/arch/powerpc/mm/ptdump/shared.c b/arch/powerpc/mm/ptdump/shared.c
index 39c30c62b7ea..92d77f3e5155 100644
--- a/arch/powerpc/mm/ptdump/shared.c
+++ b/arch/powerpc/mm/ptdump/shared.c
@@ -69,18 +69,23 @@ static const struct flag_info flag_array[] = {
struct pgtable_level pg_level[5] = {
{ /* pgd */
+ .name = "PGD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* p4d */
+ .name = "P4D",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pud */
+ .name = "PUD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pmd */
+ .name = "PMD",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
}, { /* pte */
+ .name = "PTE",
.flag = flag_array,
.num = ARRAY_SIZE(flag_array),
},
--
2.50.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC 6/8] powerpc/book3s64/slb: Make preload_add return type as void
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
` (4 preceding siblings ...)
2025-08-30 3:51 ` [RFC 5/8] powerpc/ptdump: Dump PXX level info for kernel_page_tables Ritesh Harjani (IBM)
@ 2025-08-30 3:51 ` Ritesh Harjani (IBM)
2025-08-30 6:36 ` Christophe Leroy
2025-08-30 3:51 ` [RFC 7/8] powerpc/book3s64/slb: Add no_slb_preload early cmdline param Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 8/8] powerpc/book3s64/slb: Add slb faults to vmstat Ritesh Harjani (IBM)
7 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev
Cc: Ritesh Harjani (IBM), Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Paul Mackerras,
Aneesh Kumar K.V, Donet Tom
We dropped preload_new_slb_context() in the previous patch. That means
we don't really need preload_add()'s return value anymore. So let's make
its return type void.
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/slb.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 7e053c561a09..780792b9a1e5 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -294,7 +294,7 @@ static bool preload_hit(struct thread_info *ti, unsigned long esid)
return false;
}
-static bool preload_add(struct thread_info *ti, unsigned long ea)
+static void preload_add(struct thread_info *ti, unsigned long ea)
{
unsigned char idx;
unsigned long esid;
@@ -308,7 +308,7 @@ static bool preload_add(struct thread_info *ti, unsigned long ea)
esid = ea >> SID_SHIFT;
if (preload_hit(ti, esid))
- return false;
+ return;
idx = (ti->slb_preload_tail + ti->slb_preload_nr) % SLB_PRELOAD_NR;
ti->slb_preload_esid[idx] = esid;
@@ -317,7 +317,7 @@ static bool preload_add(struct thread_info *ti, unsigned long ea)
else
ti->slb_preload_nr++;
- return true;
+ return;
}
static void preload_age(struct thread_info *ti)
--
2.50.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC 7/8] powerpc/book3s64/slb: Add no_slb_preload early cmdline param
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
` (5 preceding siblings ...)
2025-08-30 3:51 ` [RFC 6/8] powerpc/book3s64/slb: Make preload_add return type as void Ritesh Harjani (IBM)
@ 2025-08-30 3:51 ` Ritesh Harjani (IBM)
2025-08-30 6:42 ` Christophe Leroy
2025-08-30 3:51 ` [RFC 8/8] powerpc/book3s64/slb: Add slb faults to vmstat Ritesh Harjani (IBM)
7 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev
Cc: Ritesh Harjani (IBM), Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Paul Mackerras,
Aneesh Kumar K.V, Donet Tom
The no_slb_preload cmdline param can come in useful for quickly disabling
and/or testing the performance impact of userspace SLB preloads. Recently
there was an SLB multi-hit issue due to the SLB preload cache which was very
difficult to triage. This cmdline option allows one to quickly disable
preloads and verify whether the issue lies in the preload cache or somewhere
else. It can also be a useful option for seeing the effect of SLB preloads
on any application workload, e.g. the number of SLB faults with or without
SLB preloads.
For example, with the next patch where we add slb_faults counters to /proc/vmstat:
with slb_preload:
slb_faults (minimal initrd boot): 15
slb_faults (full systemd boot): 300
with no_slb_preload:
slb_faults (minimal initrd boot): 33
slb_faults (full systemd boot): 138180
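For illustration, the option is enabled by appending it to the kernel command
line, e.g. (everything here other than no_slb_preload is just an assumed
example command line):

	root=/dev/vda2 console=hvc0 no_slb_preload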
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
Documentation/admin-guide/kernel-parameters.txt | 3 +++
arch/powerpc/mm/book3s64/hash_utils.c | 3 +++
arch/powerpc/mm/book3s64/internal.h | 7 +++++++
arch/powerpc/mm/book3s64/slb.c | 15 +++++++++++++++
4 files changed, 28 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 747a55abf494..9a66f255b659 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7135,6 +7135,9 @@
them frequently to increase the rate of SLB faults
on kernel addresses.
+ no_slb_preload [PPC,EARLY]
+ Disables slb preloading for userspace.
+
sunrpc.min_resvport=
sunrpc.max_resvport=
[NFS,SUNRPC]
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 495b6da6f5d4..abf703563ea3 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1319,6 +1319,9 @@ static void __init htab_initialize(void)
if (stress_slb_enabled)
static_branch_enable(&stress_slb_key);
+ if (no_slb_preload)
+ static_branch_enable(&no_slb_preload_key);
+
if (stress_hpt_enabled) {
unsigned long tmp;
static_branch_enable(&stress_hpt_key);
diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
index c26a6f0c90fc..cad08d83369c 100644
--- a/arch/powerpc/mm/book3s64/internal.h
+++ b/arch/powerpc/mm/book3s64/internal.h
@@ -22,6 +22,13 @@ static inline bool stress_hpt(void)
return static_branch_unlikely(&stress_hpt_key);
}
+extern bool no_slb_preload;
+DECLARE_STATIC_KEY_FALSE(no_slb_preload_key);
+static inline bool slb_preload_disabled(void)
+{
+ return static_branch_unlikely(&no_slb_preload_key);
+}
+
void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 780792b9a1e5..297ab0e93c1e 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -42,6 +42,15 @@ early_param("stress_slb", parse_stress_slb);
__ro_after_init DEFINE_STATIC_KEY_FALSE(stress_slb_key);
+bool no_slb_preload __initdata;
+static int __init parse_no_slb_preload(char *p)
+{
+ no_slb_preload = true;
+ return 0;
+}
+early_param("no_slb_preload", parse_no_slb_preload);
+__ro_after_init DEFINE_STATIC_KEY_FALSE(no_slb_preload_key);
+
static void assert_slb_presence(bool present, unsigned long ea)
{
#ifdef CONFIG_DEBUG_VM
@@ -299,6 +308,9 @@ static void preload_add(struct thread_info *ti, unsigned long ea)
unsigned char idx;
unsigned long esid;
+ if (slb_preload_disabled())
+ return;
+
if (mmu_has_feature(MMU_FTR_1T_SEGMENT)) {
/* EAs are stored >> 28 so 256MB segments don't need clearing */
if (ea & ESID_MASK_1T)
@@ -414,6 +426,9 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
copy_mm_to_paca(mm);
+ if (slb_preload_disabled())
+ return;
+
/*
* We gradually age out SLBs after a number of context switches to
* reduce reload overhead of unused entries (like we do with FP/VEC
--
2.50.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [RFC 8/8] powerpc/book3s64/slb: Add slb faults to vmstat
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
` (6 preceding siblings ...)
2025-08-30 3:51 ` [RFC 7/8] powerpc/book3s64/slb: Add no_slb_preload early cmdline param Ritesh Harjani (IBM)
@ 2025-08-30 3:51 ` Ritesh Harjani (IBM)
2025-08-30 4:45 ` Stephen Rothwell
7 siblings, 1 reply; 19+ messages in thread
From: Ritesh Harjani (IBM) @ 2025-08-30 3:51 UTC (permalink / raw)
To: linuxppc-dev
Cc: Ritesh Harjani (IBM), Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Donet Tom, Andrew Morton,
David Hildenbrand, Lorenzo Stoakes, linux-kernel, linux-mm
There were good optimizations written in the past which reduce the number of
SLB faults, e.g. during context switches [1]. However, if one wants to
measure the total number of SLB faults, there is no easy way of doing so,
e.g. the number of SLB faults during bootup.
This adds SLB faults as vmstat counters to easily measure the total
number of SLB faults for book3s64.
Note: SLB fault handling is defined as a raw interrupt handler, whose documentation says:
* raw interrupt handlers must not enable or disable interrupts, or
* schedule, tracing and instrumentation (ftrace, lockdep, etc) would
* not be advisable either, although may be possible in a pinch, the
* trace will look odd at least.
Hence adding a vmstat counter looks like a plausible and safe option, to at
least measure the number of SLB kernel & user faults in the system.
[1]: https://lore.kernel.org/all/20181013131836.26764-4-mpe@ellerman.id.au/
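Once applied, the counters show up in /proc/vmstat and can be read directly,
e.g. (values below are illustrative only):

	~ # grep slb /proc/vmstat
	slb_kernel_faults 1234
	slb_user_faults 56789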
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
arch/powerpc/mm/book3s64/slb.c | 3 +++
include/linux/vm_event_item.h | 4 ++++
mm/vmstat.c | 5 +++++
3 files changed, 12 insertions(+)
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 297ab0e93c1e..064427af63f7 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -22,6 +22,7 @@
#include <linux/context_tracking.h>
#include <linux/mm_types.h>
#include <linux/pgtable.h>
+#include <linux/vmstat.h>
#include <asm/udbg.h>
#include <asm/text-patching.h>
@@ -780,6 +781,7 @@ DEFINE_INTERRUPT_HANDLER_RAW(do_slb_fault)
#ifdef CONFIG_DEBUG_VM
local_paca->in_kernel_slb_handler = 0;
#endif
+ count_vm_event(SLB_KERNEL_FAULTS);
return err;
} else {
struct mm_struct *mm = current->mm;
@@ -792,6 +794,7 @@ DEFINE_INTERRUPT_HANDLER_RAW(do_slb_fault)
if (!err)
preload_add(current_thread_info(), ea);
+ count_vm_event(SLB_USER_FAULTS);
return err;
}
}
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 9e15a088ba38..8aa34d0eee3b 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -156,6 +156,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
DIRECT_MAP_LEVEL2_COLLAPSE,
DIRECT_MAP_LEVEL3_COLLAPSE,
#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+ SLB_KERNEL_FAULTS,
+ SLB_USER_FAULTS,
+#endif
#ifdef CONFIG_PER_VMA_LOCK_STATS
VMA_LOCK_SUCCESS,
VMA_LOCK_ABORT,
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 71cd1ceba191..8cd17a5fc72b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1464,6 +1464,11 @@ const char * const vmstat_text[] = {
[I(DIRECT_MAP_LEVEL2_COLLAPSE)] = "direct_map_level2_collapses",
[I(DIRECT_MAP_LEVEL3_COLLAPSE)] = "direct_map_level3_collapses",
#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+ "slb_kernel_faults",
+ "slb_user_faults",
+#endif
+
#ifdef CONFIG_PER_VMA_LOCK_STATS
[I(VMA_LOCK_SUCCESS)] = "vma_lock_success",
[I(VMA_LOCK_ABORT)] = "vma_lock_abort",
--
2.50.1
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [RFC 8/8] powerpc/book3s64/slb: Add slb faults to vmstat
2025-08-30 3:51 ` [RFC 8/8] powerpc/book3s64/slb: Add slb faults to vmstat Ritesh Harjani (IBM)
@ 2025-08-30 4:45 ` Stephen Rothwell
2025-08-30 4:56 ` Ritesh Harjani
0 siblings, 1 reply; 19+ messages in thread
From: Stephen Rothwell @ 2025-08-30 4:45 UTC (permalink / raw)
To: Ritesh Harjani (IBM)
Cc: linuxppc-dev, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Donet Tom, Andrew Morton,
David Hildenbrand, Lorenzo Stoakes, linux-kernel, linux-mm
Hi Ritesh,
On Sat, 30 Aug 2025 09:21:47 +0530 "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> wrote:
>
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 71cd1ceba191..8cd17a5fc72b 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -1464,6 +1464,11 @@ const char * const vmstat_text[] = {
> [I(DIRECT_MAP_LEVEL2_COLLAPSE)] = "direct_map_level2_collapses",
> [I(DIRECT_MAP_LEVEL3_COLLAPSE)] = "direct_map_level3_collapses",
> #endif
> +#ifdef CONFIG_PPC_BOOK3S_64
> + "slb_kernel_faults",
> + "slb_user_faults",
> +#endif
> +
> #ifdef CONFIG_PER_VMA_LOCK_STATS
> [I(VMA_LOCK_SUCCESS)] = "vma_lock_success",
> [I(VMA_LOCK_ABORT)] = "vma_lock_abort",
Should you be using explicit indexes and the I() macro?
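Presumably something along the lines of the neighbouring entries (untested
sketch):

	[I(SLB_KERNEL_FAULTS)]	= "slb_kernel_faults",
	[I(SLB_USER_FAULTS)]	= "slb_user_faults",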
--
Cheers,
Stephen Rothwell
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 8/8] powerpc/book3s64/slb: Add slb faults to vmstat
2025-08-30 4:45 ` Stephen Rothwell
@ 2025-08-30 4:56 ` Ritesh Harjani
0 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2025-08-30 4:56 UTC (permalink / raw)
To: Stephen Rothwell
Cc: linuxppc-dev, Madhavan Srinivasan, Michael Ellerman,
Nicholas Piggin, Christophe Leroy, Donet Tom, Andrew Morton,
David Hildenbrand, Lorenzo Stoakes, linux-kernel, linux-mm
Stephen Rothwell <sfr@canb.auug.org.au> writes:
> Hi Ritesh,
>
> On Sat, 30 Aug 2025 09:21:47 +0530 "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> wrote:
>>
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 71cd1ceba191..8cd17a5fc72b 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1464,6 +1464,11 @@ const char * const vmstat_text[] = {
>> [I(DIRECT_MAP_LEVEL2_COLLAPSE)] = "direct_map_level2_collapses",
>> [I(DIRECT_MAP_LEVEL3_COLLAPSE)] = "direct_map_level3_collapses",
>> #endif
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> + "slb_kernel_faults",
>> + "slb_user_faults",
>> +#endif
>> +
>> #ifdef CONFIG_PER_VMA_LOCK_STATS
>> [I(VMA_LOCK_SUCCESS)] = "vma_lock_success",
>> [I(VMA_LOCK_ABORT)] = "vma_lock_abort",
>
> Should you be using explicit indexes and the I() macro?
Aah yes, I guess the branch where I developed the patches was not the
upstream tip, and when I rebased and tested, I missed the I() macro
change in mm/vmstat.
Thanks Stephen for pointing it out. I will fix that in the next revision.
-ritesh
>
> --
> Cheers,
> Stephen Rothwell
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 3/8] book3s64/hash: Fix phys_addr_t printf format in htab_initialize()
2025-08-30 3:51 ` [RFC 3/8] book3s64/hash: Fix phys_addr_t printf format in htab_initialize() Ritesh Harjani (IBM)
@ 2025-08-30 6:26 ` Christophe Leroy
2025-08-30 7:30 ` Ritesh Harjani
0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2025-08-30 6:26 UTC (permalink / raw)
To: Ritesh Harjani (IBM), linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Paul Mackerras, Aneesh Kumar K.V, Donet Tom
On 30/08/2025 at 05:51, Ritesh Harjani (IBM) wrote:
> We get below errors when we try to enable debug logs in book3s64/hash_utils.c
> This patch fixes these errors related to phys_addr_t printf format.
>
> arch/powerpc/mm/book3s64/hash_utils.c: In function ‘htab_initialize’:
> arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
> 1401 | DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
> arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
> cc1: all warnings being treated as errors
> make[6]: *** [../scripts/Makefile.build:287: arch/powerpc/mm/book3s64/hash_utils.o] Error 1
>
> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> Cc: Donet Tom <donettom@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
> arch/powerpc/mm/book3s64/hash_utils.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 1e062056cfb8..495b6da6f5d4 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1394,7 +1394,7 @@ static void __init htab_initialize(void)
> size = end - base;
> base = (unsigned long)__va(base);
>
> - DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
> + DBG("creating mapping for region: %llx..%llx (prot: %lx)\n",
Use %pa
See
https://docs.kernel.org/core-api/printk-formats.html#physical-address-types-phys-addr-t
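i.e. pass the phys_addr_t values by reference, roughly (untested sketch):

	DBG("creating mapping for region: %pa..%pa (prot: %lx)\n",
	    &base, &size, prot);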
> base, size, prot);
>
> if ((base + size) >= H_VMALLOC_START) {
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 5/8] powerpc/ptdump: Dump PXX level info for kernel_page_tables
2025-08-30 3:51 ` [RFC 5/8] powerpc/ptdump: Dump PXX level info for kernel_page_tables Ritesh Harjani (IBM)
@ 2025-08-30 6:31 ` Christophe Leroy
2025-08-30 7:25 ` Ritesh Harjani
0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2025-08-30 6:31 UTC (permalink / raw)
To: Ritesh Harjani (IBM), linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Paul Mackerras, Aneesh Kumar K.V, Donet Tom
On 30/08/2025 at 05:51, Ritesh Harjani (IBM) wrote:
> This patch adds PGD/PUD/PMD/PTE level information while dumping kernel
> page tables. Before this patch it was hard to identify which entries
> belongs to which page table level e.g.
>
> ~ # dmesg |grep -i radix
> [0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000005400000 with 2.00 MiB pages (exec)
> [0.000000] radix-mmu: Mapped 0x0000000005400000-0x0000000040000000 with 2.00 MiB pages
> [0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
> [0.000000] radix-mmu: Initializing Radix MMU
>
> Before:
> ---[ Start of kernel VM ]---
> 0xc000000000000000-0xc000000003ffffff XXX 64M r X pte valid present dirty accessed
> 0xc000000004000000-0xc00000003fffffff XXX 960M r w pte valid present dirty accessed
> 0xc000000040000000-0xc0000000ffffffff XXX 3G r w pte valid present dirty accessed
> ...
> ---[ vmemmap start ]---
> 0xc00c000000000000-0xc00c0000003fffff XXX 4M r w pte valid present dirty accessed
>
> After:
> ---[ Start of kernel VM ]---
> 0xc000000000000000-0xc000000003ffffff XXX 64M PMD r X pte valid present dirty accessed
> 0xc000000004000000-0xc00000003fffffff XXX 960M PMD r w pte valid present dirty accessed
> 0xc000000040000000-0xc0000000ffffffff XXX 3G PUD r w pte valid present dirty accessed
> ...
> ---[ vmemmap start ]---
> 0xc00c000000000000-0xc00c0000003fffff XXX 4M PMD r w pte valid present dirty accessed
>
> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> Cc: Donet Tom <donettom@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
> arch/powerpc/mm/ptdump/8xx.c | 5 +++++
> arch/powerpc/mm/ptdump/book3s64.c | 5 +++++
> arch/powerpc/mm/ptdump/ptdump.c | 1 +
> arch/powerpc/mm/ptdump/ptdump.h | 1 +
> arch/powerpc/mm/ptdump/shared.c | 5 +++++
> 5 files changed, 17 insertions(+)
>
> diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c
> index b5c79b11ea3c..1dc0f2438a73 100644
> --- a/arch/powerpc/mm/ptdump/8xx.c
> +++ b/arch/powerpc/mm/ptdump/8xx.c
> @@ -71,18 +71,23 @@ static const struct flag_info flag_array[] = {
>
> struct pgtable_level pg_level[5] = {
> { /* pgd */
> + .name = "PGD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* p4d */
> + .name = "P4D",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pud */
> + .name = "PUD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pmd */
> + .name = "PMD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pte */
> + .name = "PTX",
Why PTX not PTE ?
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> },
> diff --git a/arch/powerpc/mm/ptdump/book3s64.c b/arch/powerpc/mm/ptdump/book3s64.c
> index 5ad92d9dc5d1..79c9a8391042 100644
> --- a/arch/powerpc/mm/ptdump/book3s64.c
> +++ b/arch/powerpc/mm/ptdump/book3s64.c
> @@ -104,18 +104,23 @@ static const struct flag_info flag_array[] = {
>
> struct pgtable_level pg_level[5] = {
> { /* pgd */
> + .name = "PGD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* p4d */
> + .name = "P4D",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pud */
> + .name = "PUD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pmd */
> + .name = "PMD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pte */
> + .name = "PTE",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> },
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index b2358d794855..0d499aebee72 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -178,6 +178,7 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
> pt_dump_seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
> pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
> pt_dump_size(st->seq, addr - st->start_address);
> + pt_dump_seq_printf(st->seq, "%s ", pg_level[st->level].name);
> }
>
> static void note_prot_wx(struct pg_state *st, unsigned long addr)
> diff --git a/arch/powerpc/mm/ptdump/ptdump.h b/arch/powerpc/mm/ptdump/ptdump.h
> index 154efae96ae0..88cf28c4138e 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.h
> +++ b/arch/powerpc/mm/ptdump/ptdump.h
> @@ -13,6 +13,7 @@ struct flag_info {
>
> struct pgtable_level {
> const struct flag_info *flag;
> + char name[4];
> size_t num;
> u64 mask;
> };
> diff --git a/arch/powerpc/mm/ptdump/shared.c b/arch/powerpc/mm/ptdump/shared.c
> index 39c30c62b7ea..92d77f3e5155 100644
> --- a/arch/powerpc/mm/ptdump/shared.c
> +++ b/arch/powerpc/mm/ptdump/shared.c
> @@ -69,18 +69,23 @@ static const struct flag_info flag_array[] = {
>
> struct pgtable_level pg_level[5] = {
> { /* pgd */
> + .name = "PGD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* p4d */
> + .name = "P4D",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pud */
> + .name = "PUD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pmd */
> + .name = "PMD",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> }, { /* pte */
> + .name = "PTE",
> .flag = flag_array,
> .num = ARRAY_SIZE(flag_array),
> },
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 6/8] powerpc/book3s64/slb: Make preload_add return type as void
2025-08-30 3:51 ` [RFC 6/8] powerpc/book3s64/slb: Make preload_add return type as void Ritesh Harjani (IBM)
@ 2025-08-30 6:36 ` Christophe Leroy
2025-08-30 7:27 ` Ritesh Harjani
0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2025-08-30 6:36 UTC (permalink / raw)
To: Ritesh Harjani (IBM), linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Paul Mackerras, Aneesh Kumar K.V, Donet Tom
On 30/08/2025 at 05:51, Ritesh Harjani (IBM) wrote:
> We dropped preload_new_slb_context() in the previous patch. That means
slb_setup_new_exec() was also checking preload_add()'s return value, but
it is also gone.
> we don't really need preload_add() return type anymore. So let's make
> it's return type to void.
s/it's/its
>
> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> Cc: Donet Tom <donettom@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
> arch/powerpc/mm/book3s64/slb.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index 7e053c561a09..780792b9a1e5 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -294,7 +294,7 @@ static bool preload_hit(struct thread_info *ti, unsigned long esid)
> return false;
> }
>
> -static bool preload_add(struct thread_info *ti, unsigned long ea)
> +static void preload_add(struct thread_info *ti, unsigned long ea)
> {
> unsigned char idx;
> unsigned long esid;
> @@ -308,7 +308,7 @@ static bool preload_add(struct thread_info *ti, unsigned long ea)
> esid = ea >> SID_SHIFT;
>
> if (preload_hit(ti, esid))
> - return false;
> + return;
>
> idx = (ti->slb_preload_tail + ti->slb_preload_nr) % SLB_PRELOAD_NR;
> ti->slb_preload_esid[idx] = esid;
> @@ -317,7 +317,7 @@ static bool preload_add(struct thread_info *ti, unsigned long ea)
> else
> ti->slb_preload_nr++;
>
> - return true;
> + return;
You don't need a valueless return at the end of a function
> }
>
> static void preload_age(struct thread_info *ti)
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 7/8] powerpc/book3s64/slb: Add no_slb_preload early cmdline param
2025-08-30 3:51 ` [RFC 7/8] powerpc/book3s64/slb: Add no_slb_preload early cmdline param Ritesh Harjani (IBM)
@ 2025-08-30 6:42 ` Christophe Leroy
2025-08-30 10:11 ` Ritesh Harjani
0 siblings, 1 reply; 19+ messages in thread
From: Christophe Leroy @ 2025-08-30 6:42 UTC (permalink / raw)
To: Ritesh Harjani (IBM), linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Paul Mackerras, Aneesh Kumar K.V, Donet Tom
On 30/08/2025 at 05:51, Ritesh Harjani (IBM) wrote:
> no_slb_preload cmdline can come useful in quickly disabling and/or
> testing the performance impact of userspace slb preloads. Recently there
> was a slb multi-hit issue due to slb preload cache which was very
> difficult to triage. This cmdline option allows to quickly disable
> preloads and verify if the issue exists in preload cache or somewhere
> else. This can also be a useful option to see the effect of slb preloads
> for any application workload e.g. number of slb faults with or w/o slb
> preloads.
>
> For e.g. with the next patch where we added slb_faults counter to /proc/vmstat:
>
> with slb_preload:
> slb_faults (minimal initrd boot): 15
> slb_faults (full systemd boot): 300
>
> with no_slb_preload:
> slb_faults (minimal initrd boot): 33
> slb_faults (full systemd boot): 138180
>
> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
> Cc: Donet Tom <donettom@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 3 +++
> arch/powerpc/mm/book3s64/hash_utils.c | 3 +++
> arch/powerpc/mm/book3s64/internal.h | 7 +++++++
> arch/powerpc/mm/book3s64/slb.c | 15 +++++++++++++++
> 4 files changed, 28 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 747a55abf494..9a66f255b659 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -7135,6 +7135,9 @@
> them frequently to increase the rate of SLB faults
> on kernel addresses.
>
> + no_slb_preload [PPC,EARLY]
> + Disables slb preloading for userspace.
> +
> sunrpc.min_resvport=
> sunrpc.max_resvport=
> [NFS,SUNRPC]
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 495b6da6f5d4..abf703563ea3 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1319,6 +1319,9 @@ static void __init htab_initialize(void)
> if (stress_slb_enabled)
> static_branch_enable(&stress_slb_key);
>
> + if (no_slb_preload)
> + static_branch_enable(&no_slb_preload_key);
> +
> if (stress_hpt_enabled) {
> unsigned long tmp;
> static_branch_enable(&stress_hpt_key);
> diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
> index c26a6f0c90fc..cad08d83369c 100644
> --- a/arch/powerpc/mm/book3s64/internal.h
> +++ b/arch/powerpc/mm/book3s64/internal.h
> @@ -22,6 +22,13 @@ static inline bool stress_hpt(void)
> return static_branch_unlikely(&stress_hpt_key);
> }
>
> +extern bool no_slb_preload;
> +DECLARE_STATIC_KEY_FALSE(no_slb_preload_key);
> +static inline bool slb_preload_disabled(void)
> +{
> + return static_branch_unlikely(&no_slb_preload_key);
> +}
> +
> void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
>
> void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index 780792b9a1e5..297ab0e93c1e 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -42,6 +42,15 @@ early_param("stress_slb", parse_stress_slb);
>
> __ro_after_init DEFINE_STATIC_KEY_FALSE(stress_slb_key);
>
> +bool no_slb_preload __initdata;
> +static int __init parse_no_slb_preload(char *p)
> +{
> + no_slb_preload = true;
> + return 0;
Can't you call static_branch_disable() directly from here and avoid
doing it in hash_utils.c ?
> +}
> +early_param("no_slb_preload", parse_no_slb_preload);
> +__ro_after_init DEFINE_STATIC_KEY_FALSE(no_slb_preload_key);
> +
> static void assert_slb_presence(bool present, unsigned long ea)
> {
> #ifdef CONFIG_DEBUG_VM
> @@ -299,6 +308,9 @@ static void preload_add(struct thread_info *ti, unsigned long ea)
> unsigned char idx;
> unsigned long esid;
>
> + if (slb_preload_disabled())
> + return;
> +
> if (mmu_has_feature(MMU_FTR_1T_SEGMENT)) {
> /* EAs are stored >> 28 so 256MB segments don't need clearing */
> if (ea & ESID_MASK_1T)
> @@ -414,6 +426,9 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>
> copy_mm_to_paca(mm);
>
> + if (slb_preload_disabled())
> + return;
> +
> /*
> * We gradually age out SLBs after a number of context switches to
> * reduce reload overhead of unused entries (like we do with FP/VEC
> --
> 2.50.1
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 5/8] powerpc/ptdump: Dump PXX level info for kernel_page_tables
2025-08-30 6:31 ` Christophe Leroy
@ 2025-08-30 7:25 ` Ritesh Harjani
0 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2025-08-30 7:25 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Paul Mackerras, Aneesh Kumar K.V, Donet Tom
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> On 30/08/2025 at 05:51, Ritesh Harjani (IBM) wrote:
>> This patch adds PGD/PUD/PMD/PTE level information while dumping kernel
>> page tables. Before this patch it was hard to identify which entries
>> belong to which page table level, e.g.:
>>
>> ~ # dmesg |grep -i radix
>> [0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000005400000 with 2.00 MiB pages (exec)
>> [0.000000] radix-mmu: Mapped 0x0000000005400000-0x0000000040000000 with 2.00 MiB pages
>> [0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000100000000 with 1.00 GiB pages
>> [0.000000] radix-mmu: Initializing Radix MMU
>>
>> Before:
>> ---[ Start of kernel VM ]---
>> 0xc000000000000000-0xc000000003ffffff XXX 64M r X pte valid present dirty accessed
>> 0xc000000004000000-0xc00000003fffffff XXX 960M r w pte valid present dirty accessed
>> 0xc000000040000000-0xc0000000ffffffff XXX 3G r w pte valid present dirty accessed
>> ...
>> ---[ vmemmap start ]---
>> 0xc00c000000000000-0xc00c0000003fffff XXX 4M r w pte valid present dirty accessed
>>
>> After:
>> ---[ Start of kernel VM ]---
>> 0xc000000000000000-0xc000000003ffffff XXX 64M PMD r X pte valid present dirty accessed
>> 0xc000000004000000-0xc00000003fffffff XXX 960M PMD r w pte valid present dirty accessed
>> 0xc000000040000000-0xc0000000ffffffff XXX 3G PUD r w pte valid present dirty accessed
>> ...
>> ---[ vmemmap start ]---
>> 0xc00c000000000000-0xc00c0000003fffff XXX 4M PMD r w pte valid present dirty accessed
>>
>> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
>> Cc: Donet Tom <donettom@linux.ibm.com>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> arch/powerpc/mm/ptdump/8xx.c | 5 +++++
>> arch/powerpc/mm/ptdump/book3s64.c | 5 +++++
>> arch/powerpc/mm/ptdump/ptdump.c | 1 +
>> arch/powerpc/mm/ptdump/ptdump.h | 1 +
>> arch/powerpc/mm/ptdump/shared.c | 5 +++++
>> 5 files changed, 17 insertions(+)
>>
>> diff --git a/arch/powerpc/mm/ptdump/8xx.c b/arch/powerpc/mm/ptdump/8xx.c
>> index b5c79b11ea3c..1dc0f2438a73 100644
>> --- a/arch/powerpc/mm/ptdump/8xx.c
>> +++ b/arch/powerpc/mm/ptdump/8xx.c
>> @@ -71,18 +71,23 @@ static const struct flag_info flag_array[] = {
>>
>> struct pgtable_level pg_level[5] = {
>> { /* pgd */
>> + .name = "PGD",
>> .flag = flag_array,
>> .num = ARRAY_SIZE(flag_array),
>> }, { /* p4d */
>> + .name = "P4D",
>> .flag = flag_array,
>> .num = ARRAY_SIZE(flag_array),
>> }, { /* pud */
>> + .name = "PUD",
>> .flag = flag_array,
>> .num = ARRAY_SIZE(flag_array),
>> }, { /* pmd */
>> + .name = "PMD",
>> .flag = flag_array,
>> .num = ARRAY_SIZE(flag_array),
>> }, { /* pte */
>> + .name = "PTX",
>
> Why PTX not PTE ?
>
My bad, I was checking something on 8xx and missed reverting that back.
Thanks for pointing it out. Will fix it in the next revision.
-ritesh
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 6/8] powerpc/book3s64/slb: Make preload_add return type as void
2025-08-30 6:36 ` Christophe Leroy
@ 2025-08-30 7:27 ` Ritesh Harjani
0 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2025-08-30 7:27 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Paul Mackerras, Aneesh Kumar K.V, Donet Tom
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> On 30/08/2025 at 05:51, Ritesh Harjani (IBM) wrote:
>> We dropped preload_new_slb_context() in the previous patch. That means
>
> slb_setup_new_exec() was also checking preload_add()'s return value, but it
> is also gone.
>
Right. Will add that.
>> we don't really need preload_add() return type anymore. So let's make
>> it's return type to void.
>
> s/it's/its
>
Sure.
>>
>> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
>> Cc: Donet Tom <donettom@linux.ibm.com>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> arch/powerpc/mm/book3s64/slb.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
>> index 7e053c561a09..780792b9a1e5 100644
>> --- a/arch/powerpc/mm/book3s64/slb.c
>> +++ b/arch/powerpc/mm/book3s64/slb.c
>> @@ -294,7 +294,7 @@ static bool preload_hit(struct thread_info *ti, unsigned long esid)
>> return false;
>> }
>>
>> -static bool preload_add(struct thread_info *ti, unsigned long ea)
>> +static void preload_add(struct thread_info *ti, unsigned long ea)
>> {
>> unsigned char idx;
>> unsigned long esid;
>> @@ -308,7 +308,7 @@ static bool preload_add(struct thread_info *ti, unsigned long ea)
>> esid = ea >> SID_SHIFT;
>>
>> if (preload_hit(ti, esid))
>> - return false;
>> + return;
>>
>> idx = (ti->slb_preload_tail + ti->slb_preload_nr) % SLB_PRELOAD_NR;
>> ti->slb_preload_esid[idx] = esid;
>> @@ -317,7 +317,7 @@ static bool preload_add(struct thread_info *ti, unsigned long ea)
>> else
>> ti->slb_preload_nr++;
>>
>> - return true;
>> + return;
>
> You don't need a valueless return at the end of a function
>
Right, makes sense. I will fix these in the next revision.
Thanks for the review!
-ritesh
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 3/8] book3s64/hash: Fix phys_addr_t printf format in htab_initialize()
2025-08-30 6:26 ` Christophe Leroy
@ 2025-08-30 7:30 ` Ritesh Harjani
0 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2025-08-30 7:30 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Paul Mackerras, Aneesh Kumar K.V, Donet Tom
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> On 30/08/2025 at 05:51, Ritesh Harjani (IBM) wrote:
>> We get the below errors when trying to enable debug logs in book3s64/hash_utils.c.
>> This patch fixes these errors related to the phys_addr_t printf format.
>>
>> arch/powerpc/mm/book3s64/hash_utils.c: In function ‘htab_initialize’:
>> arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
>> 1401 | DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
>> arch/powerpc/mm/book3s64/hash_utils.c:1401:21: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘phys_addr_t’ {aka ‘long long unsigned int’} [-Werror=format=]
>> cc1: all warnings being treated as errors
>> make[6]: *** [../scripts/Makefile.build:287: arch/powerpc/mm/book3s64/hash_utils.o] Error 1
>>
>> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
>> Cc: Donet Tom <donettom@linux.ibm.com>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> arch/powerpc/mm/book3s64/hash_utils.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
>> index 1e062056cfb8..495b6da6f5d4 100644
>> --- a/arch/powerpc/mm/book3s64/hash_utils.c
>> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
>> @@ -1394,7 +1394,7 @@ static void __init htab_initialize(void)
>> size = end - base;
>> base = (unsigned long)__va(base);
>>
>> - DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
>> + DBG("creating mapping for region: %llx..%llx (prot: %lx)\n",
>
> Use %pa
>
> See
> https://docs.kernel.org/core-api/printk-formats.html#physical-address-types-phys-addr-t
>
Right. Makes sense. Will change it in the next spin.
-ritesh
>> base, size, prot);
>>
>> if ((base + size) >= H_VMALLOC_START) {
>> --
>> 2.50.1
>>
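As an aside, here is a minimal sketch of the %pa usage suggested above; the
helper name is purely illustrative and this is not the code that will land in
the next spin. %pa takes a pointer to the phys_addr_t rather than its value,
so the format stays correct whether phys_addr_t is 32 or 64 bits wide on a
given config.

#include <linux/printk.h>
#include <linux/types.h>

/* Illustrative only: print a physical address range using %pa */
static void dbg_print_region(phys_addr_t base, phys_addr_t size)
{
	/* %pa expects a pointer to the phys_addr_t, not the value itself */
	pr_debug("creating mapping for region: %pa..%pa\n", &base, &size);
}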
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC 7/8] powerpc/book3s64/slb: Add no_slb_preload early cmdline param
2025-08-30 6:42 ` Christophe Leroy
@ 2025-08-30 10:11 ` Ritesh Harjani
0 siblings, 0 replies; 19+ messages in thread
From: Ritesh Harjani @ 2025-08-30 10:11 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev
Cc: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
Paul Mackerras, Aneesh Kumar K.V, Donet Tom
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> On 30/08/2025 at 05:51, Ritesh Harjani (IBM) wrote:
>> The no_slb_preload cmdline parameter can be useful for quickly disabling
>> and/or testing the performance impact of userspace slb preloads. Recently
>> there was an slb multi-hit issue caused by the slb preload cache which was
>> very difficult to triage. This cmdline option makes it possible to quickly
>> disable preloads and verify whether the issue lies in the preload cache or
>> somewhere else. It is also a useful option for seeing the effect of slb
>> preloads on any application workload, e.g. the number of slb faults with or
>> without slb preloads.
>>
>> For example, with the next patch which adds an slb_faults counter to /proc/vmstat:
>>
>> with slb_preload:
>> slb_faults (minimal initrd boot): 15
>> slb_faults (full systemd boot): 300
>>
>> with no_slb_preload:
>> slb_faults (minimal initrd boot): 33
>> slb_faults (full systemd boot): 138180
>>
>> Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
>> Cc: Paul Mackerras <paulus@ozlabs.org>
>> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
>> Cc: Donet Tom <donettom@linux.ibm.com>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 3 +++
>> arch/powerpc/mm/book3s64/hash_utils.c | 3 +++
>> arch/powerpc/mm/book3s64/internal.h | 7 +++++++
>> arch/powerpc/mm/book3s64/slb.c | 15 +++++++++++++++
>> 4 files changed, 28 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 747a55abf494..9a66f255b659 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -7135,6 +7135,9 @@
>> them frequently to increase the rate of SLB faults
>> on kernel addresses.
>>
>> + no_slb_preload [PPC,EARLY]
>> + Disables slb preloading for userspace.
>> +
>> sunrpc.min_resvport=
>> sunrpc.max_resvport=
>> [NFS,SUNRPC]
>> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
>> index 495b6da6f5d4..abf703563ea3 100644
>> --- a/arch/powerpc/mm/book3s64/hash_utils.c
>> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
>> @@ -1319,6 +1319,9 @@ static void __init htab_initialize(void)
>> if (stress_slb_enabled)
>> static_branch_enable(&stress_slb_key);
>>
>> + if (no_slb_preload)
>> + static_branch_enable(&no_slb_preload_key);
>> +
>> if (stress_hpt_enabled) {
>> unsigned long tmp;
>> static_branch_enable(&stress_hpt_key);
>> diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
>> index c26a6f0c90fc..cad08d83369c 100644
>> --- a/arch/powerpc/mm/book3s64/internal.h
>> +++ b/arch/powerpc/mm/book3s64/internal.h
>> @@ -22,6 +22,13 @@ static inline bool stress_hpt(void)
>> return static_branch_unlikely(&stress_hpt_key);
>> }
>>
>> +extern bool no_slb_preload;
>> +DECLARE_STATIC_KEY_FALSE(no_slb_preload_key);
>> +static inline bool slb_preload_disabled(void)
>> +{
>> + return static_branch_unlikely(&no_slb_preload_key);
>> +}
>> +
>> void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
>>
>> void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
>> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
>> index 780792b9a1e5..297ab0e93c1e 100644
>> --- a/arch/powerpc/mm/book3s64/slb.c
>> +++ b/arch/powerpc/mm/book3s64/slb.c
>> @@ -42,6 +42,15 @@ early_param("stress_slb", parse_stress_slb);
>>
>> __ro_after_init DEFINE_STATIC_KEY_FALSE(stress_slb_key);
>>
>> +bool no_slb_preload __initdata;
>> +static int __init parse_no_slb_preload(char *p)
>> +{
>> + no_slb_preload = true;
>> + return 0;
>
> Can't you call static_branch_disable() directly from here and avoid
> doing it in hash_utils.c ?
>
parse_early_param() for cmdline options happens before setup_feature_keys(),
hence we cannot call static_branch_disable() here in parse_no_slb_preload().
-ritesh
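As an aside, here is a minimal sketch of the two-phase pattern this refers to,
based on the ordering described above (parse_early_param() running before
setup_feature_keys()): the early_param() handler only records a flag, and the
static key is flipped later from __init code such as htab_initialize(). The
flag and key names follow the quoted patch; the apply helper name is purely
illustrative and not part of the series.

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/jump_label.h>
#include <linux/types.h>

static bool no_slb_preload __initdata;
__ro_after_init DEFINE_STATIC_KEY_FALSE(no_slb_preload_key);

/* Runs from parse_early_param(): only record the request here. */
static int __init parse_no_slb_preload(char *p)
{
	no_slb_preload = true;
	return 0;
}
early_param("no_slb_preload", parse_no_slb_preload);

/* Called later, e.g. from htab_initialize(), once static keys can be flipped. */
static void __init apply_no_slb_preload(void)
{
	if (no_slb_preload)
		static_branch_enable(&no_slb_preload_key);
}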
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2025-08-30 10:50 UTC | newest]
Thread overview: 19+ messages
2025-08-30 3:51 [PATCH 0/8] powerpc/book3s64: Hash / SLB fixes & improvements Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 1/8] powerpc/mm: Fix SLB multihit issue during SLB preload Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 2/8] book3s64/hash: Restrict stress_hpt_struct memblock region to within RMA limit Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 3/8] book3s64/hash: Fix phys_addr_t printf format in htab_initialize() Ritesh Harjani (IBM)
2025-08-30 6:26 ` Christophe Leroy
2025-08-30 7:30 ` Ritesh Harjani
2025-08-30 3:51 ` [RFC 4/8] powerpc/ptdump/64: Fix kernel_hash_pagetable dump for ISA v3.00 HPTE format Ritesh Harjani (IBM)
2025-08-30 3:51 ` [RFC 5/8] powerpc/ptdump: Dump PXX level info for kernel_page_tables Ritesh Harjani (IBM)
2025-08-30 6:31 ` Christophe Leroy
2025-08-30 7:25 ` Ritesh Harjani
2025-08-30 3:51 ` [RFC 6/8] powerpc/book3s64/slb: Make preload_add return type as void Ritesh Harjani (IBM)
2025-08-30 6:36 ` Christophe Leroy
2025-08-30 7:27 ` Ritesh Harjani
2025-08-30 3:51 ` [RFC 7/8] powerpc/book3s64/slb: Add no_slb_preload early cmdline param Ritesh Harjani (IBM)
2025-08-30 6:42 ` Christophe Leroy
2025-08-30 10:11 ` Ritesh Harjani
2025-08-30 3:51 ` [RFC 8/8] powerpc/book3s64/slb: Add slb faults to vmstat Ritesh Harjani (IBM)
2025-08-30 4:45 ` Stephen Rothwell
2025-08-30 4:56 ` Ritesh Harjani