From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Steven Rostedt <rostedt@goodmis.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Nicholas Miell <nmiell@comcast.net>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>,
Alan Cox <gnomes@lxorguk.ukuu.org.uk>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
Stephen Hemminger <stephen@networkplumber.org>,
Andrew Morton <akpm@linux-foundation.org>,
Josh Triplett <josh@joshtriplett.org>,
Thomas Gleixner <tglx@linutronix.de>,
David Howells <dhowells@redhat.com>,
Nick Piggin <npiggin@kernel.dk>
Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
Date: Tue, 17 Mar 2015 01:45:25 +0000 (UTC)
Message-ID: <910572156.13900.1426556725438.JavaMail.zimbra@efficios.com>
In-Reply-To: <20150316205435.GJ21418@twins.programming.kicks-ass.net>
----- Original Message -----
> From: "Peter Zijlstra" <peterz@infradead.org>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> Cc: linux-kernel@vger.kernel.org, "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>, "Steven Rostedt"
> <rostedt@goodmis.org>, "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>, "Nicholas Miell" <nmiell@comcast.net>,
> "Linus Torvalds" <torvalds@linux-foundation.org>, "Ingo Molnar" <mingo@redhat.com>, "Alan Cox"
> <gnomes@lxorguk.ukuu.org.uk>, "Lai Jiangshan" <laijs@cn.fujitsu.com>, "Stephen Hemminger"
> <stephen@networkplumber.org>, "Andrew Morton" <akpm@linux-foundation.org>, "Josh Triplett" <josh@joshtriplett.org>,
> "Thomas Gleixner" <tglx@linutronix.de>, "David Howells" <dhowells@redhat.com>, "Nick Piggin" <npiggin@kernel.dk>
> Sent: Monday, March 16, 2015 4:54:35 PM
> Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
>
> On Mon, Mar 16, 2015 at 06:53:35PM +0000, Mathieu Desnoyers wrote:
> > > I'm not entirely awake atm but I'm not seeing why it would need to be
> > > that strict; I think the current single MB on task switch is sufficient
> > > because if we're in the middle of schedule, userspace isn't actually
> > > running.
> > >
> > > So from the point of userspace the task switch is atomic. Therefore even
> > > if we do not get a barrier before setting ->curr, the expedited thing
> > > missing us doesn't matter as userspace cannot observe the difference.
> >
> > AFAIU, atomicity is not what matters here. It's more about memory ordering.
> > What is guaranteeing that upon entry in kernel-space, all prior memory
> > accesses (loads and stores) are ordered prior to following loads/stores ?
> >
> > The same applies when returning to user-space: what is guaranteeing that
> > all prior loads/stores are ordered before the user-space loads/stores
> > performed after returning to user-space ?
>
> You're still one step ahead of me; why does this matter?
>
> Or put it another way; what can go wrong? By virtue of being in
> schedule() both tasks (prev and next) get an effective MB from the task
> switch.
>
> So even if we see the 'wrong' rq->curr, that CPU will still observe the
> MB by the time it gets to userspace.
>
> All of this is really only about userspace load/store ordering and the
> context switch already very much needs to guarantee userspace program
> order in the face of context switches.

Let's go through a memory ordering scenario to highlight my reasoning
there.

Let's consider the following memory barrier scenario performed in
user-space on an architecture with very relaxed ordering. PowerPC comes
to mind.

https://lwn.net/Articles/573436/
scenario 12:

CPU 0                           CPU 1

CAO(x) = 1;                     r3 = CAO(y);
cmm_smp_wmb();                  cmm_smp_rmb();
CAO(y) = 1;                     r4 = CAO(x);

BUG_ON(r3 == 1 && r4 == 0)

We tweak it to use sys_membarrier on CPU 1, and a simple compiler
barrier() on CPU 0:

CPU 0                           CPU 1

CAO(x) = 1;                     r3 = CAO(y);
barrier();                      sys_membarrier();
CAO(y) = 1;                     r4 = CAO(x);

BUG_ON(r3 == 1 && r4 == 0)

Now if CPU 1 executes sys_membarrier while CPU 0 is preempted after both
stores, we have:

CPU 0                           CPU 1

CAO(x) = 1;
  [1st store is slow to
   reach other cores]
CAO(y) = 1;
  [2nd store reaches other
   cores more quickly]
[preempted]
                                r3 = CAO(y)
                                  (may see y = 1)
                                sys_membarrier()
Scheduler changes rq->curr.
                                skips CPU 0, because rq->curr has
                                  been updated.
                                [return to userspace]
                                r4 = CAO(x)
                                  (may see x = 0)
                                BUG_ON(r3 == 1 && r4 == 0) -> fails.
load_cr3, with implied
  memory barrier, comes
  after CPU 1 has read "x".

The only way to make this scenario work is to add a memory barrier
before updating rq->curr. (A similar scenario can be constructed for
the barrier needed after the store to rq->curr.)
>
> > > > In order to be able to dereference rq->curr->mm without holding the
> > > > rq->lock, do you envision we should protect task reclaim with RCU-sched
> > > > ?
> > >
> > > A recent discussion had Linus suggest SLAB_DESTROY_BY_RCU, although I
> > > think Oleg did mention it would still be 'interesting'. I've not yet had
> > > time to really think about that.
> >
> > This might be an "interesting" modification. :) This could perhaps come
> > as an optimization later on ?
>
> Not really, again, take this for (;;) sys_membar(EXPEDITED) that'll
> generate horrendous rq lock contention, with or without the PRIVATE
> thing it'll pound a number of rq locks real bad.
>
> Typical scheduler syscalls only affect a single rq lock at a time -- the
> one the task is on. This one potentially pounds all of them.

Would you find it acceptable if we start by implementing
only the non-expedited sys_membarrier()? Then we can add
the expedited-private implementation once rq->curr becomes
available through RCU.

Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com