From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: linux-kernel@vger.kernel.org,
Peter Zijlstra <peterz@infradead.org>,
Boqun Feng <boqun.feng@gmail.com>, Andrew Hunter <ahh@google.com>,
Maged Michael <maged.michael@gmail.com>,
gromer@google.com, Avi Kivity <avi@scylladb.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Dave Watson <davejwatson@fb.com>
Subject: Re: [PATCH v2] membarrier: Document scheduler barrier requirements
Date: Sat, 19 Aug 2017 22:05:46 -0700
Message-ID: <20170820050546.GJ11320@linux.vnet.ibm.com>
In-Reply-To: <20170819043916.18725-1-mathieu.desnoyers@efficios.com>

On Fri, Aug 18, 2017 at 09:39:16PM -0700, Mathieu Desnoyers wrote:
> Document the membarrier requirement on having a full memory barrier in
> __schedule() after coming from user-space, before storing to rq->curr.
> It is provided by smp_mb__before_spinlock() in __schedule().
>
> Document that membarrier requires a full barrier on transition from
> kernel thread to userspace thread, which skips the call to switch_mm(). We
> currently have an implicit barrier from atomic_dec_and_test() in mmdrop() that
> ensures this.
>
> The x86 switch_mm_irqs_off() full barrier is currently provided by many cpumask
> update operations as well as load_cr3(). Document that load_cr3() is providing
> this barrier.
>
> [ Rebased on top of linux-rcu for-mingo branch.
> Applies on top of "membarrier: Provide expedited private command". ]

I have queued this for review and testing, thank you!

Thanx, Paul

> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> CC: Boqun Feng <boqun.feng@gmail.com>
> CC: Andrew Hunter <ahh@google.com>
> CC: Maged Michael <maged.michael@gmail.com>
> CC: gromer@google.com
> CC: Avi Kivity <avi@scylladb.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: Dave Watson <davejwatson@fb.com>
> ---
> arch/x86/mm/tlb.c | 3 +++
> include/linux/sched/mm.h | 4 ++++
> kernel/sched/core.c | 9 +++++++++
> 3 files changed, 16 insertions(+)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 014d07a..cd815b6 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -133,6 +133,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> * and neither LOCK nor MFENCE orders them.
> * Fortunately, load_cr3() is serializing and gives the
> * ordering guarantee we need.
> + *
> + * This full barrier is also required by the membarrier
> + * system call.
> */
> load_cr3(next->pgd);
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 2b24a69..fe29d06 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -38,6 +38,10 @@ static inline void mmgrab(struct mm_struct *mm)
> extern void __mmdrop(struct mm_struct *);
> static inline void mmdrop(struct mm_struct *mm)
> {
> + /*
> + * The implicit full barrier implied by atomic_dec_and_test is
> + * required by the membarrier system call.
> + */
> if (unlikely(atomic_dec_and_test(&mm->mm_count)))
> __mmdrop(mm);
> }
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3f29c6a..b0f199f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2654,6 +2654,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> finish_arch_post_lock_switch();
>
> fire_sched_in_preempt_notifiers(current);
> + /*
> + * When transitioning from a kernel thread to a userspace
> + * thread, mmdrop()'s implicit full barrier is required by the
> + * membarrier system call, because the current active_mm can
> + * become the current mm without going through switch_mm().
> + */
> if (mm)
> mmdrop(mm);
> if (unlikely(prev_state == TASK_DEAD)) {
> @@ -3295,6 +3301,9 @@ static void __sched notrace __schedule(bool preempt)
> * Make sure that signal_pending_state()->signal_pending() below
> * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
> * done by the caller to avoid the race with signal_wake_up().
> + *
> + * The membarrier system call requires a full memory barrier
> + * after coming from user-space, before storing to rq->curr.
> */
> smp_mb__before_spinlock();
> rq_lock(rq, &rf);
> --
> 1.9.1
>
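
For readers following the thread, here is a rough user-space sketch of the ordering the comments in this patch document. It is a toy model in C11 atomics, not kernel code: every toy_* name is hypothetical, the seq_cst fence only stands in for smp_mb__before_spinlock(), and the seq_cst read-modify-write only stands in for the full ordering of atomic_dec_and_test(). The membarrier side assumes, as I understand the expedited private command this patch applies on top of, that the caller issues a full barrier and then reads each runqueue's curr to decide which CPUs need an IPI.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mm_struct, task_struct and rq. */
struct toy_mm {
	atomic_int mm_count;		/* reference count, as dropped by mmdrop() */
};

struct toy_task {
	struct toy_mm *mm;		/* NULL for a kernel thread */
};

struct toy_rq {
	_Atomic(struct toy_task *) curr;	/* task currently running on this CPU */
};

/*
 * Scheduler side, modeled on __schedule(): the seq_cst fence plays the
 * role of smp_mb__before_spinlock(). It orders the outgoing task's
 * earlier memory accesses before the store to rq->curr.
 */
static void toy_switch_to(struct toy_rq *rq, struct toy_task *next)
{
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&rq->curr, next, memory_order_relaxed);
}

/*
 * Modeled on mmdrop(): a seq_cst read-modify-write stands in for
 * atomic_dec_and_test(), which is fully ordered in the kernel.
 * Returns true when the last reference was dropped.
 */
static bool toy_mmdrop(struct toy_mm *mm)
{
	return atomic_fetch_sub_explicit(&mm->mm_count, 1,
					 memory_order_seq_cst) == 1;
}

/*
 * membarrier side (assumed shape): full fence, then read rq->curr to
 * decide whether this CPU runs a thread of the caller's mm and so
 * needs an IPI.
 */
static bool toy_cpu_needs_ipi(struct toy_rq *rq, const struct toy_mm *caller_mm)
{
	struct toy_task *p;

	atomic_thread_fence(memory_order_seq_cst);
	p = atomic_load_explicit(&rq->curr, memory_order_relaxed);
	return p != NULL && p->mm == caller_mm;
}
```

The point of the pairing is that a CPU skipped by toy_cpu_needs_ipi() has either not yet stored the user task to curr or has already switched away from it, and in both cases the fence on the scheduler side supplies the ordering that the IPI would otherwise have provided. The model is single-address-space and says nothing about switch_mm() or load_cr3().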