public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	Boqun Feng <boqun.feng@gmail.com>, Andrew Hunter <ahh@google.com>,
	maged michael <maged.michael@gmail.com>,
	gromer <gromer@google.com>, Avi Kivity <avi@scylladb.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Dave Watson <davejwatson@fb.com>,
	x86@kernel.org
Subject: Re: [PATCH v4 for 4.14 3/3] membarrier: Document scheduler barrier requirements
Date: Tue, 26 Sep 2017 20:46:30 +0000 (UTC)
Message-ID: <1068963628.19295.1506458790412.JavaMail.zimbra@efficios.com>
In-Reply-To: <20170926175151.14264-3-mathieu.desnoyers@efficios.com>

----- On Sep 26, 2017, at 1:51 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> Document the membarrier requirement on having a full memory barrier in
> __schedule() after coming from user-space, before storing to rq->curr.
> It is provided by smp_mb__after_spinlock() in __schedule().

I missed a few maintainers who should have been CC'd; adding them now.
This patch is intended to go through Paul E. McKenney's tree.

Thanks,

Mathieu

> 
> Document that membarrier requires a full barrier on transition from
> kernel thread to userspace thread. We currently have an implicit barrier
> from atomic_dec_and_test() in mmdrop() that ensures this.
> 
> The x86 switch_mm_irqs_off() full barrier is currently provided by many
> cpumask update operations as well as write_cr3(). Document that
> write_cr3() provides this barrier.
> 
> Changes since v1:
> - Update comments to match reality for code paths which are after
>  storing to rq->curr, before returning to user-space.
> Changes since v2:
> - Update changelog (smp_mb__before_spinlock -> smp_mb__after_spinlock).
> Changes since v3:
> - Clarify comments following feedback from Peter Zijlstra.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Maged Michael <maged.michael@gmail.com>
> CC: gromer@google.com
> CC: Avi Kivity <avi@scylladb.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: Dave Watson <davejwatson@fb.com>
> ---
> arch/x86/mm/tlb.c        |  5 +++++
> include/linux/sched/mm.h |  5 +++++
> kernel/sched/core.c      | 38 +++++++++++++++++++++++++++-----------
> 3 files changed, 37 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 93fe97cce581..5ba86b85953b 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -143,6 +143,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> 	}
> #endif
> 
> +	/*
> +	 * The membarrier system call requires a full memory barrier
> +	 * before returning to user-space, after storing to rq->curr.
> +	 * Writing to CR3 provides that full memory barrier.
> +	 */
> 	if (real_prev == next) {
> 		VM_BUG_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
> 			  next->context.ctx_id);
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index d3b81e48784d..1bd10c2c0893 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -38,6 +38,11 @@ static inline void mmgrab(struct mm_struct *mm)
> extern void __mmdrop(struct mm_struct *);
> static inline void mmdrop(struct mm_struct *mm)
> {
> +	/*
> +	 * The full barrier implied by atomic_dec_and_test() is
> +	 * required by the membarrier system call before returning to
> +	 * user-space, after storing to rq->curr.
> +	 */
> 	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
> 		__mmdrop(mm);
> }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b9d731283946..6254f87645de 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2649,6 +2649,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> 	finish_arch_post_lock_switch();
> 
> 	fire_sched_in_preempt_notifiers(current);
> +	/*
> +	 * When transitioning from a kernel thread to a userspace
> +	 * thread, mmdrop()'s implicit full barrier is required by the
> +	 * membarrier system call, because the current active_mm can
> +	 * become the current mm without going through switch_mm().
> +	 */
> 	if (mm)
> 		mmdrop(mm);
> 	if (unlikely(prev_state == TASK_DEAD)) {
> @@ -2754,6 +2760,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
> 	 */
> 	arch_start_context_switch(prev);
> 
> +	/*
> +	 * If mm is non-NULL, we pass through switch_mm(). If mm is
> +	 * NULL, we will pass through mmdrop() in finish_task_switch().
> +	 * Both of these contain the full memory barrier required by
> +	 * membarrier after storing to rq->curr, before returning to
> +	 * user-space.
> +	 */
> 	if (!mm) {
> 		next->active_mm = oldmm;
> 		mmgrab(oldmm);
> @@ -3290,6 +3303,9 @@ static void __sched notrace __schedule(bool preempt)
> 	 * Make sure that signal_pending_state()->signal_pending() below
> 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> 	 * done by the caller to avoid the race with signal_wake_up().
> +	 *
> +	 * The membarrier system call requires a full memory barrier
> +	 * after coming from user-space, before storing to rq->curr.
> 	 */
> 	rq_lock(rq, &rf);
> 	smp_mb__after_spinlock();
> @@ -3337,17 +3353,17 @@ static void __sched notrace __schedule(bool preempt)
> 		/*
> 		 * The membarrier system call requires each architecture
> 		 * to have a full memory barrier after updating
> -		 * rq->curr, before returning to user-space. For TSO
> -		 * (e.g. x86), the architecture must provide its own
> -		 * barrier in switch_mm(). For weakly ordered machines
> -		 * for which spin_unlock() acts as a full memory
> -		 * barrier, finish_lock_switch() in common code takes
> -		 * care of this barrier. For weakly ordered machines for
> -		 * which spin_unlock() acts as a RELEASE barrier (only
> -		 * arm64 and PowerPC), arm64 has a full barrier in
> -		 * switch_to(), and PowerPC has
> -		 * smp_mb__after_unlock_lock() before
> -		 * finish_lock_switch().
> +		 * rq->curr, before returning to user-space.
> +		 *
> +		 * Here are the schemes providing that barrier on the
> +		 * various architectures:
> +		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc,
> +		 * - finish_lock_switch() for weakly-ordered
> +		 *   architectures where spin_unlock is a full barrier,
> +		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
> +		 *   is a RELEASE barrier),
> +		 * - membarrier_arch_sched_in() for PowerPC,
> +		 *   (weakly-ordered, spin_unlock is a RELEASE barrier).
> 		 */
> 		++*switch_count;
> 
> --
> 2.11.0
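
For context, the scheduler barriers documented above exist to back the
user-space membarrier(2) call. Below is a minimal sketch of the calling
side, assuming kernel headers from 4.14 or later (for the private
expedited commands added by patch 1/3 of this series). The helper names
sys_membarrier() and private_expedited_barrier() are hypothetical and
are not part of this series.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Call through syscall(2) directly rather than relying on a libc wrapper. */
static int sys_membarrier(int cmd, int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags);
}

/*
 * Hypothetical helper: register the calling process, then issue one
 * private expedited barrier.
 * Returns 0 on success, -1 if the commands are unsupported or a call fails.
 */
static int private_expedited_barrier(void)
{
	/* MEMBARRIER_CMD_QUERY returns a bitmask of supported commands. */
	int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

	if (mask < 0)
		return -1;
	if (!(mask & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED) ||
	    !(mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED))
		return -1;

	/* Registration must precede the first private expedited barrier. */
	if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) != 0)
		return -1;

	return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0 ? 0 : -1;
}
```

When private_expedited_barrier() returns 0, the running threads of the
calling process are expected to have passed through a full memory
barrier. The barriers around the rq->curr store described in the patch
are what cover threads that are being switched in or out at that time.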

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
