From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Donet Tom <donettom@linux.ibm.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Vishal Chourasia <vishalc@linux.ibm.com>,
Donet Tom <donettom@linux.ibm.com>,
stable@vger.kernel.org
Subject: Re: [PATCH v2] powerpc/mm: Fix SLB multihit issue during SLB preload
Date: Fri, 01 Aug 2025 22:56:37 +0530
Message-ID: <87qzxvq7g2.fsf@gmail.com>
In-Reply-To: <20250801103747.21864-1-donettom@linux.ibm.com>
Donet Tom <donettom@linux.ibm.com> writes:
> On systems using the hash MMU, there is a software SLB preload cache that
> mirrors the entries loaded into the hardware SLB. This preload cache is
> subject to periodic eviction (typically after every 256 context switches)
> to remove old entries.
>
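
For anyone following along who is less familiar with the aging step, here
is a minimal user-space sketch of how an 8-bit counter gives one eviction
per 256 context switches. The names (preload_model, model_switch, and so
on), the cache size and the drop-the-oldest policy are assumptions made
for illustration; this is a model, not the code in slb.c.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CACHE_NR 16	/* assumed cache size, illustration only */

/* Hypothetical stand-in for the per-thread preload state. */
struct preload_model {
	uint8_t load_count;	/* 8-bit, so it wraps to 0 on the 256th increment */
	unsigned int nr;	/* number of valid entries */
	unsigned long esid[MODEL_CACHE_NR];
};

/* Age the cache: drop the oldest entry (slot 0) and shift the rest down. */
static void model_age(struct preload_model *p)
{
	assert(p->nr <= MODEL_CACHE_NR);
	if (p->nr == 0)
		return;
	for (unsigned int i = 1; i < p->nr; i++)
		p->esid[i - 1] = p->esid[i];
	p->nr--;
}

/* One modelled context switch; returns 1 if an entry was evicted. */
static int model_switch(struct preload_model *p)
{
	unsigned int before = p->nr;

	p->load_count++;
	if (p->load_count == 0)
		model_age(p);
	return before != p->nr;
}

/* Run `switches` modelled context switches and count the evictions. */
unsigned int model_count_evictions(struct preload_model *p,
				   unsigned int switches)
{
	unsigned int evicted = 0;

	for (unsigned int i = 0; i < switches; i++)
		evicted += model_switch(p);
	return evicted;
}
```

Starting from a zeroed counter, the first 255 calls to model_switch()
evict nothing and the 256th drops the oldest entry.
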
> To optimize performance, the kernel skips switch_mmu_context() in
> switch_mm_irqs_off() when the prev and next mm_struct are the same.
> However, on hash MMU systems, this can lead to inconsistencies between
> the hardware SLB and the software preload cache.
>
> If an SLB entry for a process is evicted from the software cache on one
> CPU, and the same process later runs on another CPU without executing
> switch_mmu_context(), the hardware SLB may retain stale entries. If the
> kernel then attempts to reload that entry, it can trigger an SLB
> multi-hit error.
>
> The following timeline shows how stale SLB entries are created and can
> cause a multi-hit error when a process moves between CPUs without an
> MMU context switch.
>
> CPU 0                                   CPU 1
> -----                                   -----
> Process P
> exec                                    swapper/1
>  load_elf_binary
>   begin_new_exec
>    activate_mm
>     switch_mm_irqs_off
>      switch_mmu_context
>       switch_slb
>       /*
>        * This invalidates all
>        * the entries in the HW
>        * and sets up the new HW
>        * SLB entries as per the
>        * preload cache.
>        */
> context_switch
> sched_migrate_task migrates process P to cpu-1
>
> Process swapper/0                       context switch (to process P)
> (uses mm_struct of Process P)           switch_mm_irqs_off()
>                                          switch_slb
>                                           load_slb++
>                                           /*
>                                            * load_slb becomes 0 here
>                                            * and we evict an entry from
>                                            * the preload cache with
>                                            * preload_age(). We still
>                                            * keep HW SLB and preload
>                                            * cache in sync, that is
>                                            * because all HW SLB entries
>                                            * anyway get evicted in
>                                            * switch_slb during SLBIA.
>                                            * We then only add those
>                                            * entries back in HW SLB,
>                                            * which are currently
>                                            * present in preload_cache
>                                            * (after eviction).
>                                            */
>                                         load_elf_binary continues...
>                                          setup_new_exec()
>                                           slb_setup_new_exec()
>
>                                         sched_switch event
>                                         sched_migrate_task migrates
>                                         process P to cpu-0
>
> context_switch from swapper/0 to Process P
> switch_mm_irqs_off()
> /*
>  * Since the prev and next mm_struct are the same, we don't call
>  * switch_mmu_context(). This will cause the HW SLB and SW preload
>  * cache to go out of sync in preload_new_slb_context(), because there
>  * was an SLB entry which was evicted from both the HW SLB and the
>  * preload cache on cpu-1. Later, in preload_new_slb_context(), when we
>  * try to add the same preload entry again, we add it to the SW preload
>  * cache and then to the HW SLB. Since this entry was never invalidated
>  * on cpu-0, adding it to the HW SLB causes an SLB multi-hit error.
>  */
> load_elf_binary continues...
> START_THREAD
> start_thread
> preload_new_slb_context
> /*
>  * This tries to add a new EA to the preload cache which was earlier
>  * evicted from both the cpu-1 HW SLB and the preload cache. This caused
>  * the HW SLB of cpu-0 to go out of sync with the SW preload cache. The
>  * reason is that when we context switched back on cpu-0, we should
>  * ideally have called switch_mmu_context(), which would bring the HW
>  * SLB entries on cpu-0 in sync with the SW preload cache entries by
>  * setting up the MMU context properly. But we didn't do that since the
>  * prev mm_struct running on cpu-0 was the same as the next mm_struct
>  * (which is true for swapper / kernel threads). So now when we try to
>  * add this new entry into the HW SLB of cpu-0, we hit an SLB multi-hit
>  * error.
>  */
>
> WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50
> Modules linked in:
> CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY
> Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
> NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
> REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty)
> MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000
> CFAR: c0000000001543b0 IRQMASK: 3
> <...>
> NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
> LR [c0000000001543b4] slb_insert_entry+0x124/0x390
> Call Trace:
> 0x7fffceb5ffff (unreliable)
> preload_new_slb_context+0x100/0x1a0
> start_thread+0x26c/0x420
> load_elf_binary+0x1b04/0x1c40
> bprm_execve+0x358/0x680
> do_execveat_common+0x1f8/0x240
> sys_execve+0x58/0x70
> system_call_exception+0x114/0x300
> system_call_common+0x160/0x2c4
>
> To fix this issue, always switch the MMU context on hash MMU if the SLB
> preload cache has aged. With this change, the SLB multi-hit error no
> longer occurs.
>
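
To make the sequence above easier to replay, here is a small user-space
model of the race, written as a sketch only. It keeps one software cache
that travels with the task and one "HW SLB" array per CPU that stays
behind on migration. All names (sw_cache, hw_slb, model_replay, the
resync_on_return flag) and the sizes are hypothetical; the flag merely
stands in for "switch the MMU context when coming back" and is not the
code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS_MODEL	2
#define NR_SLOTS	8	/* assumed sizes, illustration only */

/* Software preload cache: travels with the task. */
struct sw_cache {
	unsigned int nr;
	unsigned long esid[NR_SLOTS];
};

/* Modelled hardware SLB: one per CPU, left behind when the task migrates. */
struct hw_slb {
	unsigned int nr;
	unsigned long esid[NR_SLOTS];
};

static bool sw_has(const struct sw_cache *c, unsigned long esid)
{
	for (unsigned int i = 0; i < c->nr; i++)
		if (c->esid[i] == esid)
			return true;
	return false;
}

static unsigned int hw_count(const struct hw_slb *s, unsigned long esid)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < s->nr; i++)
		n += (s->esid[i] == esid);
	return n;
}

/* Model of switch_slb(): optionally age the cache, flush the HW, reload. */
static void model_switch_slb(struct hw_slb *hw, struct sw_cache *c, bool age)
{
	if (age && c->nr) {
		/* drop the oldest software entry */
		memmove(&c->esid[0], &c->esid[1],
			(c->nr - 1) * sizeof(c->esid[0]));
		c->nr--;
	}
	hw->nr = 0;	/* stands in for SLBIA */
	for (unsigned int i = 0; i < c->nr; i++)
		hw->esid[hw->nr++] = c->esid[i];
}

/*
 * Model of a preload: only the software cache is consulted before the
 * entry is inserted. Returns true if the HW SLB now holds a duplicate.
 */
static bool model_preload(struct hw_slb *hw, struct sw_cache *c,
			  unsigned long esid)
{
	if (sw_has(c, esid))
		return false;
	assert(c->nr < NR_SLOTS && hw->nr < NR_SLOTS);
	c->esid[c->nr++] = esid;
	hw->esid[hw->nr++] = esid;
	return hw_count(hw, esid) > 1;
}

/* Replay the timeline; returns true on a modelled multi-hit. */
bool model_replay(bool resync_on_return)
{
	struct hw_slb hw[NR_CPUS_MODEL] = { { 0 } };
	struct sw_cache c = { .nr = 2, .esid = { 0xA, 0xB } };

	model_switch_slb(&hw[0], &c, false);	/* exec starts on cpu-0 */
	model_switch_slb(&hw[1], &c, true);	/* cpu-1: 0xA is aged out */
	if (resync_on_return)
		model_switch_slb(&hw[0], &c, false);	/* back on cpu-0 */
	/* without the resync, cpu-0 still holds 0xA in its HW SLB */
	return model_preload(&hw[0], &c, 0xA);
}
```

In this model, model_replay(false) reports a duplicate entry on cpu-0,
while model_replay(true) does not, which matches the behaviour described
in the commit message.
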
> cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> cc: Michael Ellerman <mpe@ellerman.id.au>
> cc: Nicholas Piggin <npiggin@gmail.com>
> Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
> cc: stable@vger.kernel.org
> Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
>
> v1 -> v2 : Changed commit message and added a comment in
> switch_mm_irqs_off()
>
> v1 - https://lore.kernel.org/all/20250731161027.966196-1-donettom@linux.ibm.com/
Thanks for adding the details in the commit msg. The change looks good
to me. Please feel free to add -
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>