public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: akpm@linux-foundation.org, Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Nicholas Miell <nmiell@comcast.net>,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com, josh@joshtriplett.org,
	dvhltc@us.ibm.com, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com
Subject: Re: [patch 2/3] scheduler: add full memory barriers upon task switch at runqueue lock/unlock
Date: Mon, 1 Feb 2010 12:45:00 -0500	[thread overview]
Message-ID: <20100201174500.GA13744@Krystal> (raw)
In-Reply-To: <alpine.LFD.2.00.1002010854110.4206@localhost.localdomain>

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> 
> 
> On Mon, 1 Feb 2010, Mathieu Desnoyers wrote:
> > 
> > What we have to be careful about here is that it's not enough to just
> > rely on switch_mm() containing a memory barrier. What we really need to
> > enforce is that switch_mm() issues memory barriers both _before_ and
> > _after_ mm_cpumask modification. The "after" part is usually dealt with
> > by the TLB context switch, but the "before" part usually isn't.
> 
> Ok, whatever. I vote for not doing anything at all, because this just all 
> sounds like some totally crazy crap. You haven't explained the actual 
> races, you haven't explained anything at all, you're apparently just 
> randomly sprinkling smp_mb's around until you think it's all fine.
> 
> Show the actual memory ordering constraints as it is related to the kernel 
> data structures. I'm totally uninterested in general handwaving and "we 
> must have smp_mb's here and here" without very explicit explanations of 
> exactly WHAT the memory orderings are.

It's probably worthwhile to summarize here the mb-related discussions we
had in the previous RFC rounds.

Here is the detailed execution scenario showing the race. The race with
unlock showing that we need to order user-space loads before mm_cpumask
store. This is showing an execution sequence involving the userspace RCU
library and the Linux kernel with sys_membarrier(). As we see below, the
lack of ordering between "cpumask_clear" and user-space execution
creates a race with the scheduler where sys_membarrier() incorrectly
considers CPU 1 as not needing a membarrier_ipi.


        CPU 0                   CPU 1
        --------------          --------------
        <user space>            <user space> (already in read-side C.S.)
                                obj = rcu_dereference(list->next);
                                  -> load list->next
                                copy = obj->foo;
                                rcu_read_unlock();
                                  -> store to urcu_reader.ctr
                                <urcu_reader.ctr store is globally visible>
        list_del(obj);
                                <preempt>
                                <kernel space>

                                schedule();
                                cpumask_clear(mm_cpumask, cpu);

        sys_membarrier();
          <kernel space>
          smp_mb()
          iterates on mm_cpumask
          smp_mb()
        <user space>
        set global g.p. (urcu_gp_ctr) phase to 1
        wait for all urcu_reader.ctr in phase 0
        set global g.p. (urcu_gp_ctr) phase to 0
        wait for all urcu_reader.ctr in phase 1
        sys_membarrier();
          <kernel space>
          smp_mb()
          iterates on mm_cpumask
          smp_mb()
        <user space>
        free(obj);
                                <list->next load hits memory>
                                <obj->foo load hits memory> <- corruption


There is a similar race scenario (symmetric to the one above) between
cpumask_set and a following user-space rcu_read_lock(), which I could
provide upon request.

This results in the following comments around smp_mb() in
sys_membarrier:

First sys_membarrier smp_mb():

        /*
         * Memory barrier on the caller thread before sending first
         * IPI. Orders all memory accesses prior to sys_membarrier() before
         * mm_cpumask read and membarrier_ipi executions.
         * Matches memory barriers before and after mm_cpumask updates.
         */

Second sys_membarrier smp_mb():

        /*
         * Memory barrier on the caller thread after we finished
         * waiting for the last IPI. Orders mm_cpumask read and membarrier_ipi
         * executions before memory accesses following sys_membarrier()
         * execution.
         * Matches memory barriers before and after mm_cpumask updates.
         */

Thanks,

Mathieu


> 
> 		Linus

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2010-02-01 17:45 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-31 20:52 [patch 0/3] introduce sys_membarrier(): process-wide memory barrier (v8) Mathieu Desnoyers
2010-01-31 20:52 ` [patch 1/3] Create spin lock/spin unlock with distinct memory barrier Mathieu Desnoyers
2010-02-01  7:25   ` Nick Piggin
2010-02-01 14:08     ` Mathieu Desnoyers
2010-02-01  7:28   ` Nick Piggin
2010-02-01 14:10     ` Mathieu Desnoyers
2010-02-01 15:22   ` Linus Torvalds
2010-02-01 15:41     ` Steven Rostedt
2010-01-31 20:52 ` [patch 2/3] scheduler: add full memory barriers upon task switch at runqueue lock/unlock Mathieu Desnoyers
2010-02-01  7:33   ` Nick Piggin
2010-02-01  9:42     ` Peter Zijlstra
2010-02-01 10:11       ` Nick Piggin
2010-02-01 10:36         ` Peter Zijlstra
2010-02-01 10:49           ` Nick Piggin
2010-02-01 14:47             ` Mathieu Desnoyers
2010-02-01 14:58               ` Nick Piggin
2010-02-01 15:23                 ` Steven Rostedt
2010-02-01 15:44                   ` Steven Rostedt
2010-02-01 16:00                   ` Mike Galbraith
2010-02-01 15:27   ` Linus Torvalds
2010-02-01 16:09     ` Mathieu Desnoyers
2010-02-01 16:23       ` Linus Torvalds
2010-02-01 16:48         ` Mathieu Desnoyers
2010-02-01 16:56           ` Linus Torvalds
2010-02-01 17:45             ` Mathieu Desnoyers [this message]
2010-02-01 18:00               ` Steven Rostedt
2010-02-01 18:36               ` Linus Torvalds
2010-02-01 19:56                 ` Mathieu Desnoyers
2010-02-01 20:42                   ` Linus Torvalds
2010-02-01 22:42                     ` Mathieu Desnoyers
2010-02-01 20:33                 ` Steven Rostedt
2010-02-01 20:52                   ` Linus Torvalds
2010-02-01 22:39                     ` Steven Rostedt
2010-02-01 23:09                       ` Steven Rostedt
2010-02-01 17:13           ` Steven Rostedt
2010-02-01 17:34             ` Linus Torvalds
2010-02-01 16:24       ` Steven Rostedt
2010-02-01 16:29         ` Peter Zijlstra
2010-02-01 16:46           ` Steven Rostedt
2010-02-01 16:11     ` Steven Rostedt
2010-01-31 20:52 ` [patch 3/3] introduce sys_membarrier(): process-wide memory barrier (v8) Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100201174500.GA13744@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=dvhltc@us.ibm.com \
    --cc=josh@joshtriplett.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=niv@us.ibm.com \
    --cc=nmiell@comcast.net \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox