From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Boqun Feng <boqun.feng@gmail.com>, Andrew Hunter <ahh@google.com>,
maged michael <maged.michael@gmail.com>,
gromer <gromer@google.com>, Avi Kivity <avi@scylladb.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Dave Watson <davejwatson@fb.com>,
x86@kernel.org
Subject: Re: [PATCH v4 for 4.14 3/3] membarrier: Document scheduler barrier requirements
Date: Tue, 26 Sep 2017 20:46:30 +0000 (UTC)
Message-ID: <1068963628.19295.1506458790412.JavaMail.zimbra@efficios.com>
In-Reply-To: <20170926175151.14264-3-mathieu.desnoyers@efficios.com>
----- On Sep 26, 2017, at 1:51 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
> Document the membarrier requirement on having a full memory barrier in
> __schedule() after coming from user-space, before storing to rq->curr.
> It is provided by smp_mb__after_spinlock() in __schedule().
I missed a few maintainers who should have been CC'd; adding them now.
This patch is intended to go through Paul E. McKenney's tree.
Thanks,
Mathieu
>
> Document that membarrier requires a full barrier on transition from
> kernel thread to userspace thread. We currently have an implicit barrier
> from atomic_dec_and_test() in mmdrop() that ensures this.
>
> The x86 switch_mm_irqs_off() full barrier is currently provided by many
> cpumask update operations as well as write_cr3(). Document that
> write_cr3() provides this barrier.
>
> Changes since v1:
> - Update comments to match reality for code paths which are after
> storing to rq->curr, before returning to user-space.
> Changes since v2:
> - Update changelog (smp_mb__before_spinlock -> smp_mb__after_spinlock).
> Changes since v3:
> - Clarify comments following feedback from Peter Zijlstra.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Maged Michael <maged.michael@gmail.com>
> CC: gromer@google.com
> CC: Avi Kivity <avi@scylladb.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: Dave Watson <davejwatson@fb.com>
> ---
> arch/x86/mm/tlb.c | 5 +++++
> include/linux/sched/mm.h | 5 +++++
> kernel/sched/core.c | 38 +++++++++++++++++++++++++++-----------
> 3 files changed, 37 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 93fe97cce581..5ba86b85953b 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -143,6 +143,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct
> mm_struct *next,
> }
> #endif
>
> + /*
> + * The membarrier system call requires a full memory barrier
> + * before returning to user-space, after storing to rq->curr.
> + * Writing to CR3 provides that full memory barrier.
> + */
> if (real_prev == next) {
> VM_BUG_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
> next->context.ctx_id);
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index d3b81e48784d..1bd10c2c0893 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -38,6 +38,11 @@ static inline void mmgrab(struct mm_struct *mm)
> extern void __mmdrop(struct mm_struct *);
> static inline void mmdrop(struct mm_struct *mm)
> {
> + /*
> + * The implicit full barrier implied by atomic_dec_and_test is
> + * required by the membarrier system call before returning to
> + * user-space, after storing to rq->curr.
> + */
> if (unlikely(atomic_dec_and_test(&mm->mm_count)))
> __mmdrop(mm);
> }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b9d731283946..6254f87645de 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2649,6 +2649,12 @@ static struct rq *finish_task_switch(struct task_struct
> *prev)
> finish_arch_post_lock_switch();
>
> fire_sched_in_preempt_notifiers(current);
> + /*
> + * When transitioning from a kernel thread to a userspace
> + * thread, mmdrop()'s implicit full barrier is required by the
> + * membarrier system call, because the current active_mm can
> + * become the current mm without going through switch_mm().
> + */
> if (mm)
> mmdrop(mm);
> if (unlikely(prev_state == TASK_DEAD)) {
> @@ -2754,6 +2760,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
> */
> arch_start_context_switch(prev);
>
> + /*
> + * If mm is non-NULL, we pass through switch_mm(). If mm is
> + * NULL, we will pass through mmdrop() in finish_task_switch().
> + * Both of these contain the full memory barrier required by
> + * membarrier after storing to rq->curr, before returning to
> + * user-space.
> + */
> if (!mm) {
> next->active_mm = oldmm;
> mmgrab(oldmm);
> @@ -3290,6 +3303,9 @@ static void __sched notrace __schedule(bool preempt)
> * Make sure that signal_pending_state()->signal_pending() below
> * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> * done by the caller to avoid the race with signal_wake_up().
> + *
> + * The membarrier system call requires a full memory barrier
> + * after coming from user-space, before storing to rq->curr.
> */
> rq_lock(rq, &rf);
> smp_mb__after_spinlock();
> @@ -3337,17 +3353,17 @@ static void __sched notrace __schedule(bool preempt)
> /*
> * The membarrier system call requires each architecture
> * to have a full memory barrier after updating
> - * rq->curr, before returning to user-space. For TSO
> - * (e.g. x86), the architecture must provide its own
> - * barrier in switch_mm(). For weakly ordered machines
> - * for which spin_unlock() acts as a full memory
> - * barrier, finish_lock_switch() in common code takes
> - * care of this barrier. For weakly ordered machines for
> - * which spin_unlock() acts as a RELEASE barrier (only
> - * arm64 and PowerPC), arm64 has a full barrier in
> - * switch_to(), and PowerPC has
> - * smp_mb__after_unlock_lock() before
> - * finish_lock_switch().
> + * rq->curr, before returning to user-space.
> + *
> + * Here are the schemes providing that barrier on the
> + * various architectures:
> + * - mm ? switch_mm() : mmdrop() for x86, s390, sparc,
> + * - finish_lock_switch() for weakly-ordered
> + * architectures where spin_unlock is a full barrier,
> + * - switch_to() for arm64 (weakly-ordered, spin_unlock
> + * is a RELEASE barrier),
> + * - membarrier_arch_sched_in() for PowerPC,
> + * (weakly-ordered, spin_unlock is a RELEASE barrier).
> */
> ++*switch_count;
>
> --
> 2.11.0
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com