From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: akpm@linux-foundation.org, Ingo Molnar <mingo@elte.hu>,
linux-kernel@vger.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Steven Rostedt <rostedt@goodmis.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Nicholas Miell <nmiell@comcast.net>,
laijs@cn.fujitsu.com, dipankar@in.ibm.com, josh@joshtriplett.org,
dvhltc@us.ibm.com, niv@us.ibm.com, tglx@linutronix.de,
peterz@infradead.org, Valdis.Kletnieks@vt.edu,
dhowells@redhat.com
Subject: Re: [patch 2/3] scheduler: add full memory barriers upon task switch at runqueue lock/unlock
Date: Mon, 1 Feb 2010 12:45:00 -0500 [thread overview]
Message-ID: <20100201174500.GA13744@Krystal> (raw)
In-Reply-To: <alpine.LFD.2.00.1002010854110.4206@localhost.localdomain>
* Linus Torvalds (torvalds@linux-foundation.org) wrote:
>
>
> On Mon, 1 Feb 2010, Mathieu Desnoyers wrote:
> >
> > What we have to be careful about here is that it's not enough to just
> > rely on switch_mm() containing a memory barrier. What we really need to
> > enforce is that switch_mm() issues memory barriers both _before_ and
> > _after_ mm_cpumask modification. The "after" part is usually dealt with
> > by the TLB context switch, but the "before" part usually isn't.
>
> Ok, whatever. I vote for not doing anything at all, because this just all
> sounds like some totally crazy crap. You haven't explained the actual
> races, you haven't explained anything at all, you're apparently just
> randomly sprinkling smp_mb's around until you think it's all fine.
>
> Show the actual memory ordering constraints as it is related to the kernel
> data structures. I'm totally uninterested in general handwaving and "we
> must have smp_mb's here and here" without very explicit explanations of
> exactly WHAT the memory orderings are.
It's probably worthwhile to summarize here the mb-related discussions we
had in the previous RFC rounds.
Here is the detailed execution scenario showing the race. The race with
unlock showing that we need to order user-space loads before mm_cpumask
store. This is showing an execution sequence involving the userspace RCU
library and the Linux kernel with sys_membarrier(). As we see below, the
lack of ordering between "cpumask_clear" and user-space execution
creates a race with the scheduler where sys_membarrier() incorrectly
considers CPU 1 as not needing a membarrier_ipi.
CPU 0 CPU 1
-------------- --------------
<user space> <user space> (already in read-side C.S.)
obj = rcu_dereference(list->next);
-> load list->next
copy = obj->foo;
rcu_read_unlock();
-> store to urcu_reader.ctr
<urcu_reader.ctr store is globally visible>
list_del(obj);
<preempt>
<kernel space>
schedule();
cpumask_clear(mm_cpumask, cpu);
sys_membarrier();
<kernel space>
smp_mb()
iterates on mm_cpumask
smp_mb()
<user space>
set global g.p. (urcu_gp_ctr) phase to 1
wait for all urcu_reader.ctr in phase 0
set global g.p. (urcu_gp_ctr) phase to 0
wait for all urcu_reader.ctr in phase 1
sys_membarrier();
<kernel space>
smp_mb()
iterates on mm_cpumask
smp_mb()
<user space>
free(obj);
<list->next load hits memory>
<obj->foo load hits memory> <- corruption
There is a similar race scenario (symmetric to the one above) between
cpumask_set and a following user-space rcu_read_lock(), which I could
provide upon request.
This results in the following comments around smp_mb() in
sys_membarrier:
First sys_membarrier smp_mb():
/*
* Memory barrier on the caller thread before sending first
* IPI. Orders all memory accesses prior to sys_membarrier() before
* mm_cpumask read and membarrier_ipi executions.
* Matches memory barriers before and after mm_cpumask updates.
*/
Second sys_membarrier smp_mb():
/*
* Memory barrier on the caller thread after we finished
* waiting for the last IPI. Orders mm_cpumask read and membarrier_ipi
* executions before memory accesses following sys_membarrier()
* execution.
* Matches memory barriers before and after mm_cpumask updates.
*/
Thanks,
Mathieu
>
> Linus
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2010-02-01 17:45 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-31 20:52 [patch 0/3] introduce sys_membarrier(): process-wide memory barrier (v8) Mathieu Desnoyers
2010-01-31 20:52 ` [patch 1/3] Create spin lock/spin unlock with distinct memory barrier Mathieu Desnoyers
2010-02-01 7:25 ` Nick Piggin
2010-02-01 14:08 ` Mathieu Desnoyers
2010-02-01 7:28 ` Nick Piggin
2010-02-01 14:10 ` Mathieu Desnoyers
2010-02-01 15:22 ` Linus Torvalds
2010-02-01 15:41 ` Steven Rostedt
2010-01-31 20:52 ` [patch 2/3] scheduler: add full memory barriers upon task switch at runqueue lock/unlock Mathieu Desnoyers
2010-02-01 7:33 ` Nick Piggin
2010-02-01 9:42 ` Peter Zijlstra
2010-02-01 10:11 ` Nick Piggin
2010-02-01 10:36 ` Peter Zijlstra
2010-02-01 10:49 ` Nick Piggin
2010-02-01 14:47 ` Mathieu Desnoyers
2010-02-01 14:58 ` Nick Piggin
2010-02-01 15:23 ` Steven Rostedt
2010-02-01 15:44 ` Steven Rostedt
2010-02-01 16:00 ` Mike Galbraith
2010-02-01 15:27 ` Linus Torvalds
2010-02-01 16:09 ` Mathieu Desnoyers
2010-02-01 16:23 ` Linus Torvalds
2010-02-01 16:48 ` Mathieu Desnoyers
2010-02-01 16:56 ` Linus Torvalds
2010-02-01 17:45 ` Mathieu Desnoyers [this message]
2010-02-01 18:00 ` Steven Rostedt
2010-02-01 18:36 ` Linus Torvalds
2010-02-01 19:56 ` Mathieu Desnoyers
2010-02-01 20:42 ` Linus Torvalds
2010-02-01 22:42 ` Mathieu Desnoyers
2010-02-01 20:33 ` Steven Rostedt
2010-02-01 20:52 ` Linus Torvalds
2010-02-01 22:39 ` Steven Rostedt
2010-02-01 23:09 ` Steven Rostedt
2010-02-01 17:13 ` Steven Rostedt
2010-02-01 17:34 ` Linus Torvalds
2010-02-01 16:24 ` Steven Rostedt
2010-02-01 16:29 ` Peter Zijlstra
2010-02-01 16:46 ` Steven Rostedt
2010-02-01 16:11 ` Steven Rostedt
2010-01-31 20:52 ` [patch 3/3] introduce sys_membarrier(): process-wide memory barrier (v8) Mathieu Desnoyers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100201174500.GA13744@Krystal \
--to=mathieu.desnoyers@polymtl.ca \
--cc=Valdis.Kletnieks@vt.edu \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=dipankar@in.ibm.com \
--cc=dvhltc@us.ibm.com \
--cc=josh@joshtriplett.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=niv@us.ibm.com \
--cc=nmiell@comcast.net \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox