From: Greg Kroah-Hartman
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman, patches@lists.linux.dev, "levi.yun",
 Mathieu Desnoyers, Ingo Molnar, Linus Torvalds, Catalin Marinas,
 Dave Hansen
Subject: [PATCH 6.8 125/158] sched: Add missing memory barrier in switch_mm_cid
Date: Tue, 23 Apr 2024 14:39:07 -0700
Message-ID: <20240423213859.966625981@linuxfoundation.org>
In-Reply-To: <20240423213855.824778126@linuxfoundation.org>
References: <20240423213855.824778126@linuxfoundation.org>
User-Agent: quilt/0.67
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

6.8-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

commit fe90f3967bdb3e13f133e5f44025e15f943a99c5 upstream.

Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:

  commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")

If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset the actively used cid when it fails to observe the active task
after it sets lazy_put. (A user-space sketch of this race appears after
the patch below.)

There *is* a memory barrier between storing to rq->curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between the store to
rq->curr and switch_mm_cid(), which happens earlier than:

- spin_unlock(),

- switch_to().
So it's fine when the architecture switch_mm() happens to have that
barrier already, but less so when the architecture only provides the
full barrier in switch_to() or spin_unlock().

It is a bug in the rseq switch_mm_cid() implementation. All architectures
that don't have memory barriers in switch_mm(), but rather have the full
barrier either in finish_lock_switch() or switch_to(), have them too late
for the needs of switch_mm_cid().

Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.

Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86, which implicitly provides this memory
barrier by writing to CR3. (A hypothetical example of such an override
appears after the patch below.)

Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Reported-by: levi.yun
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Ingo Molnar
Reviewed-by: Catalin Marinas # for arm64
Acked-by: Dave Hansen # for x86
Cc: <stable@vger.kernel.org> # 6.4.x
Cc: Linus Torvalds
Link: https://lore.kernel.org/r/20240415152114.59122-2-mathieu.desnoyers@efficios.com
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/include/asm/barrier.h |    3 +++
 include/asm-generic/barrier.h  |    8 ++++++++
 kernel/sched/sched.h           |   20 ++++++++++++++------
 3 files changed, 25 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do {									\
 #define __smp_mb__before_atomic()	do { } while (0)
 #define __smp_mb__after_atomic()	do { } while (0)
 
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm()	do { } while (0)
+
 #include <asm-generic/barrier.h>
 
 #endif /* _ASM_X86_BARRIER_H */
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -296,5 +296,13 @@ do {									\
 #define io_stop_wc()	do { } while (0)
 #endif
 
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+# define smp_mb__after_switch_mm()	smp_mb()
+#endif
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __ASM_GENERIC_BARRIER_H */
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
 # include <asm/paravirt_api_clock.h>
 #endif
 
+#include <asm-generic/barrier.h>
+
 #include "cpupri.h"
 #include "cpudeadline.h"
 
@@ -3445,13 +3447,19 @@ static inline void switch_mm_cid(struct
 		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
 		 * Provide it here.
 		 */
-		if (!prev->mm)				// from kernel
+		if (!prev->mm) {			// from kernel
 			smp_mb();
-		/*
-		 * user -> user transition guarantees a memory barrier through
-		 * switch_mm() when current->mm changes. If current->mm is
-		 * unchanged, no barrier is needed.
-		 */
+		} else {				// from user
+			/*
+			 * user->user transition relies on an implicit
+			 * memory barrier in switch_mm() when
+			 * current->mm changes. If the architecture
+			 * switch_mm() does not have an implicit memory
+			 * barrier, it is emitted here. If current->mm
+			 * is unchanged, no barrier is needed.
+			 */
+			smp_mb__after_switch_mm();
+		}
 	}
 	if (prev->mm_cid_active) {
 		mm_cid_snapshot_time(rq, prev->mm);
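
To make the ordering requirement concrete, here is a minimal user-space
sketch of the race described in the changelog. It is illustrative only,
not kernel code: rq_curr and lazy_put stand in for rq->curr and the
mm_cid LAZY_PUT flag, and the two threads model the scheduler path and
sched_mm_cid_remote_clear(). The race has the classic store-buffering
shape: each side stores one variable and then loads the other, and
without a full barrier between the store and the load on *both* sides,
both loads can miss both stores, which is how the remote clearer ends up
freeing an actively used cid.

/* Hypothetical file sb_sketch.c; build with: cc -pthread sb_sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int rq_curr;	/* models the store to rq->curr */
static atomic_int lazy_put;	/* models setting the LAZY_PUT flag */
static int sched_saw_lazy_put, clearer_saw_curr;

/* Scheduler side: publish rq->curr, then read the per-mm cid state. */
static void *scheduler_side(void *arg)
{
	atomic_store_explicit(&rq_curr, 1, memory_order_relaxed);
	/* The fence that smp_mb()/smp_mb__after_switch_mm() provides. */
	atomic_thread_fence(memory_order_seq_cst);
	sched_saw_lazy_put = atomic_load_explicit(&lazy_put, memory_order_relaxed);
	return NULL;
}

/* Remote clear side: set lazy_put, then look for an active task. */
static void *remote_clear_side(void *arg)
{
	atomic_store_explicit(&lazy_put, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	clearer_saw_curr = atomic_load_explicit(&rq_curr, memory_order_relaxed);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, scheduler_side, NULL);
	pthread_create(&b, NULL, remote_clear_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("scheduler saw lazy_put=%d, clearer saw rq_curr=%d\n",
	       sched_saw_lazy_put, clearer_saw_curr);
	return 0;
}

With both fences present, a run in which both printed values are 0 is
forbidden. Remove the fence in scheduler_side(), the analogue of an
architecture whose switch_mm() has no barrier, and that outcome becomes
reachable on weakly ordered hardware when run repeatedly; that is
exactly the window this patch closes.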
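
The override mechanism follows a simple pattern: an architecture whose
switch_mm() already implies a full barrier defines the macro away in its
asm/barrier.h before pulling in the generic header, exactly as the x86
hunk above does. A hypothetical sketch for an imaginary architecture
"foo" (the name and the serializing-instruction claim are assumptions
for illustration, not part of the patch):

/* arch/foo/include/asm/barrier.h (hypothetical) */

/*
 * foo's switch_mm() ends in a serializing instruction, so it already
 * implies a full memory barrier; make the scheduler's extra barrier
 * a no-op, following the x86 CR3 precedent.
 */
#define smp_mb__after_switch_mm()	do { } while (0)

#include <asm-generic/barrier.h>

The ordering matters: the #define must precede the asm-generic include,
because the generic header only installs the smp_mb() fallback under
#ifndef smp_mb__after_switch_mm. An architecture such as arm64, whose
full barrier sits in switch_to() rather than switch_mm(), keeps the
default and now gets the required smp_mb() from switch_mm_cid().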