Re: [PATCH v2] powerpc/mm: Fix SLB multihit issue during SLB preload

From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Donet Tom <donettom@linux.ibm.com>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Vishal Chourasia <vishalc@linux.ibm.com>,
	Donet Tom <donettom@linux.ibm.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] powerpc/mm: Fix SLB multihit issue during SLB preload
Date: Fri, 01 Aug 2025 22:56:37 +0530
Message-ID: <87qzxvq7g2.fsf@gmail.com>
In-Reply-To: <20250801103747.21864-1-donettom@linux.ibm.com>

Donet Tom <donettom@linux.ibm.com> writes:

> On systems using the hash MMU, there is a software SLB preload cache that
> mirrors the entries loaded into the hardware SLB buffer. This preload
> cache is subject to periodic eviction (typically after every 256 context
> switches) to remove old entries.
>
> To optimize performance, the kernel skips switch_mmu_context() in
> switch_mm_irqs_off() when the prev and next mm_struct are the same.
> However, on hash MMU systems, this can lead to inconsistencies between
> the hardware SLB and the software preload cache.
>
> If an SLB entry for a process is evicted from the software cache on one
> CPU, and the same process later runs on another CPU without executing
> switch_mmu_context(), the hardware SLB may retain stale entries. If the
> kernel then attempts to reload that entry, it can trigger an SLB
> multi-hit error.
>
> The following timeline shows how stale SLB entries are created and can
> cause a multi-hit error when a process moves between CPUs without an
> MMU context switch.
>
> CPU 0                                   CPU 1
> -----                                    -----
> Process P
> exec                                    swapper/1
>  load_elf_binary
>   begin_new_exec
>     activate_mm
>      switch_mm_irqs_off
>       switch_mmu_context
>        switch_slb
>        /*
>         * This invalidates all
>         * the entries in the HW
>         * SLB and sets up the new
>         * HW SLB entries as per
>         * the preload cache.
>         */
> context_switch
> sched_migrate_task migrates process P to cpu-1
>
> Process swapper/0                       context switch (to process P)
> (uses mm_struct of Process P)           switch_mm_irqs_off()
>                                          switch_slb
>                                            load_slb++
>                                             /*
>                                             * load_slb becomes 0 here
>                                             * and we evict an entry from
>                                             * the preload cache with
>                                             * preload_age(). The HW SLB
>                                             * and preload cache still
>                                             * stay in sync, because all
>                                             * HW SLB entries get evicted
>                                             * anyway by the SLBIA in
>                                             * switch_slb.
>                                             * We then only add those
>                                             * entries back in HW SLB,
>                                             * which are currently
>                                             * present in preload_cache
>                                             * (after eviction).
>                                             */
>                                         load_elf_binary continues...
>                                          setup_new_exec()
>                                           slb_setup_new_exec()
>
>                                         sched_switch event
>                                         sched_migrate_task migrates
>                                         process P to cpu-0
>
> context_switch from swapper/0 to Process P
>  switch_mm_irqs_off()
>   /*
>    * Since prev and next mm_struct are the same, we don't call
>    * switch_mmu_context(). This leaves the HW SLB on cpu-0 out of sync
>    * with the SW preload cache, because an SLB entry was evicted from
>    * both the HW SLB and the preload cache on cpu-1 but was never
>    * invalidated on cpu-0. Later, preload_new_slb_context() tries to
>    * add the same entry again: it is added to the SW preload cache
>    * and then to the HW SLB. Since cpu-0 still holds the stale entry,
>    * adding it to the HW SLB causes an SLB multi-hit error.
>    */
> load_elf_binary continues...
>  START_THREAD
>   start_thread
>    preload_new_slb_context
>    /*
>     * This tries to add a new EA to the preload cache which was earlier
>     * evicted from both the cpu-1 HW SLB and the preload cache. The HW
>     * SLB of cpu-0 is out of sync with the SW preload cache because,
>     * when we context switched back on cpu-0, we should ideally have
>     * called switch_mmu_context(), which would have brought the HW SLB
>     * entries on cpu-0 in sync with the SW preload cache entries by
>     * setting up the MMU context properly. We didn't do that because
>     * the prev mm_struct running on cpu-0 was the same as the next
>     * mm_struct (which is true for swapper / kernel threads). So when
>     * we now try to add this new entry into the HW SLB of cpu-0, we
>     * hit an SLB multi-hit error.
>     */
>
> WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62
> assert_slb_presence+0x2c/0x50
> Modules linked in:
> CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12
> VOLUNTARY
> Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected)
> 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
> NIP:  c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
> REGS: c0000000497c77e0 TRAP: 0700   Not tainted  (6.16.0-rc3-dirty)
> MSR:  8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 28888482  XER: 00000000
> CFAR: c0000000001543b0 IRQMASK: 3
> <...>
> NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
> LR [c0000000001543b4] slb_insert_entry+0x124/0x390
> Call Trace:
>   0x7fffceb5ffff (unreliable)
>   preload_new_slb_context+0x100/0x1a0
>   start_thread+0x26c/0x420
>   load_elf_binary+0x1b04/0x1c40
>   bprm_execve+0x358/0x680
>   do_execveat_common+0x1f8/0x240
>   sys_execve+0x58/0x70
>   system_call_exception+0x114/0x300
>   system_call_common+0x160/0x2c4
>
> To fix this issue, always switch the MMU context on hash MMU if the SLB
> preload cache has aged. With this change, the SLB multi-hit error no
> longer occurs.
>
> cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> cc: Michael Ellerman <mpe@ellerman.id.au>
> cc: Nicholas Piggin <npiggin@gmail.com>
> Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
> cc: stable@vger.kernel.org
> Suggested-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> ---
>
> v1 -> v2 : Changed commit message and added a comment in
> switch_mm_irqs_off()
>
> v1 - https://lore.kernel.org/all/20250731161027.966196-1-donettom@linux.ibm.com/

Thanks for adding the details in the commit message. The change looks good
to me. Please feel free to add:

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
