From: Ingo Molnar <mingo@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, "levi . yun" <yeoreum.yun@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	stable@vger.kernel.org, Steven Rostedt <rostedt@goodmis.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Will Deacon <will@kernel.org>, Aaron Lu <aaron.lu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Subject: Re: [PATCH] sched: Add missing memory barrier in switch_mm_cid
Date: Fri, 12 Apr 2024 12:22:56 +0200
Message-ID: <ZhkLgJ2ZkI3JO0m/@gmail.com>
In-Reply-To: <20240411174302.353889-1-mathieu.desnoyers@efficios.com>


* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
> which the core scheduler code has depended upon since commit:
> 
>     commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
> 
> If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
> unset the actively used cid when it fails to observe the active task
> after it sets lazy_put.
> 
> There *is* a memory barrier between storing to rq->curr and _return to
> userspace_ (as required by membarrier), but the rseq mm_cid has stricter
> requirements: the barrier needs to be issued between store to rq->curr
> and switch_mm_cid(), which happens earlier than:
> 
> - spin_unlock(),
> - switch_to().
> 
> So it's fine when the architecture switch_mm() happens to have that
> barrier already, but less so when the architecture only provides the
> full barrier in switch_to() or spin_unlock().
> 
> It is a bug in the rseq switch_mm_cid() implementation. All architectures
> that don't have memory barriers in switch_mm(), but rather have the full
> barrier either in finish_lock_switch() or switch_to() have them too late
> for the needs of switch_mm_cid().
> 
> Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
> generic barrier.h header, and use it in switch_mm_cid() for scheduler
> transitions where switch_mm() is expected to provide a memory barrier.
> 
> Architectures can override smp_mb__after_switch_mm() if their
> switch_mm() implementation provides an implicit memory barrier.
> Override it with a no-op on x86, which implicitly provides this memory
> barrier by writing to CR3.
> 
> Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
> Reported-by: levi.yun <yeoreum.yun@arm.com>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> # for arm64
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # for x86
> Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
> Cc: <stable@vger.kernel.org> # 6.4.x
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: levi.yun <yeoreum.yun@arm.com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Aaron Lu <aaron.lu@intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-arch@vger.kernel.org
> Cc: linux-mm@kvack.org
> Cc: x86@kernel.org
> ---
>  arch/x86/include/asm/barrier.h |  3 +++
>  include/asm-generic/barrier.h  |  8 ++++++++
>  kernel/sched/sched.h           | 20 ++++++++++++++------
>  3 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 0216f63a366b..d0795b5fab46 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -79,6 +79,9 @@ do {									\
>  #define __smp_mb__before_atomic()	do { } while (0)
>  #define __smp_mb__after_atomic()	do { } while (0)
>  
> +/* Writing to CR3 provides a full memory barrier in switch_mm(). */
> +#define smp_mb__after_switch_mm()	do { } while (0)
> +
>  #include <asm-generic/barrier.h>
>  
>  #endif /* _ASM_X86_BARRIER_H */
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 961f4d88f9ef..5a6c94d7a598 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -296,5 +296,13 @@ do {									\
>  #define io_stop_wc() do { } while (0)
>  #endif
>  
> +/*
> + * Architectures that guarantee an implicit smp_mb() in switch_mm()
> + * can override smp_mb__after_switch_mm.
> + */
> +#ifndef smp_mb__after_switch_mm
> +#define smp_mb__after_switch_mm()	smp_mb()
> +#endif
> +
>  #endif /* !__ASSEMBLY__ */
>  #endif /* __ASM_GENERIC_BARRIER_H */
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 001fe047bd5d..35717359d3ca 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -79,6 +79,8 @@
>  # include <asm/paravirt_api_clock.h>
>  #endif
>  
> +#include <asm/barrier.h>
> +
>  #include "cpupri.h"
>  #include "cpudeadline.h"
>  
> @@ -3445,13 +3447,19 @@ static inline void switch_mm_cid(struct rq *rq,
>  		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
>  		 * Provide it here.
>  		 */
> -		if (!prev->mm)                          // from kernel
> +		if (!prev->mm) {                        // from kernel
>  			smp_mb();
> -		/*
> -		 * user -> user transition guarantees a memory barrier through
> -		 * switch_mm() when current->mm changes. If current->mm is
> -		 * unchanged, no barrier is needed.
> -		 */
> +		} else {				// from user
> +			/*
> +			 * user -> user transition relies on an implicit
> +			 * memory barrier in switch_mm() when
> +			 * current->mm changes. If the architecture
> +			 * switch_mm() does not have an implicit memory
> +			 * barrier, it is emitted here.  If current->mm
> +			 * is unchanged, no barrier is needed.
> +			 */
> +			smp_mb__after_switch_mm();
> +		}
>  	}
>  	if (prev->mm_cid_active) {
>  		mm_cid_snapshot_time(rq, prev->mm);

Please move switch_mm_cid() from sched.h to core.c, where its only user 
resides.

Thanks,

	Ingo
