All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: akpm@linux-foundation.org, Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Nicholas Miell <nmiell@comcast.net>,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com, josh@joshtriplett.org,
	dvhltc@us.ibm.com, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com
Subject: Re: [patch 2/3] scheduler: add full memory barriers upon task switch at runqueue lock/unlock
Date: Mon, 1 Feb 2010 12:45:00 -0500	[thread overview]
Message-ID: <20100201174500.GA13744@Krystal> (raw)
In-Reply-To: <alpine.LFD.2.00.1002010854110.4206@localhost.localdomain>

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> 
> 
> On Mon, 1 Feb 2010, Mathieu Desnoyers wrote:
> > 
> > What we have to be careful about here is that it's not enough to just
> > rely on switch_mm() containing a memory barrier. What we really need to
> > enforce is that switch_mm() issues memory barriers both _before_ and
> > _after_ mm_cpumask modification. The "after" part is usually dealt with
> > by the TLB context switch, but the "before" part usually isn't.
> 
> Ok, whatever. I vote for not doing anything at all, because this just all 
> sounds like some totally crazy crap. You haven't explained the actual 
> races, you haven't explained anything at all, you're apparently just 
> randomly sprinkling smp_mb's around until you think it's all fine.
> 
> Show the actual memory ordering constraints as it is related to the kernel 
> data structures. I'm totally uninterested in general handwaving and "we 
> must have smp_mb's here and here" without very explicit explanations of 
> exactly WHAT the memory orderings are.

It's probably worthwhile to summarize here the mb-related discussions we
had in the previous RFC rounds.

Here is the detailed execution scenario showing the race. The race with
unlock showing that we need to order user-space loads before mm_cpumask
store. This is showing an execution sequence involving the userspace RCU
library and the Linux kernel with sys_membarrier(). As we see below, the
lack of ordering between "cpumask_clear" and user-space execution
creates a race with the scheduler where sys_membarrier() incorrectly
considers CPU 1 as not needing a membarrier_ipi.


        CPU 0                   CPU 1
        --------------          --------------
        <user space>            <user space> (already in read-side C.S.)
                                obj = rcu_dereference(list->next);
                                  -> load list->next
                                copy = obj->foo;
                                rcu_read_unlock();
                                  -> store to urcu_reader.ctr
                                <urcu_reader.ctr store is globally visible>
        list_del(obj);
                                <preempt>
                                <kernel space>

                                schedule();
                                cpumask_clear(mm_cpumask, cpu);

        sys_membarrier();
          <kernel space>
          smp_mb()
          iterates on mm_cpumask
          smp_mb()
        <user space>
        set global g.p. (urcu_gp_ctr) phase to 1
        wait for all urcu_reader.ctr in phase 0
        set global g.p. (urcu_gp_ctr) phase to 0
        wait for all urcu_reader.ctr in phase 1
        sys_membarrier();
          <kernel space>
          smp_mb()
          iterates on mm_cpumask
          smp_mb()
        <user space>
        free(obj);
                                <list->next load hits memory>
                                <obj->foo load hits memory> <- corruption


There is a similar race scenario (symmetric to the one above) between
cpumask_set and a following user-space rcu_read_lock(), which I could
provide upon request.

This results in the following comments around smp_mb() in
sys_membarrier:

First sys_membarrier smp_mb():

        /*
         * Memory barrier on the caller thread before sending first
         * IPI. Orders all memory accesses prior to sys_membarrier() before
         * mm_cpumask read and membarrier_ipi executions.
         * Matches memory barriers before and after mm_cpumask updates.
         */

Second sys_membarrier smp_mb():

        /*
         * Memory barrier on the caller thread after we finished
         * waiting for the last IPI. Orders mm_cpumask read and membarrier_ipi
         * executions before memory accesses following sys_membarrier()
         * execution.
         * Matches memory barriers before and after mm_cpumask updates.
         */

Thanks,

Mathieu


> 
> 		Linus

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2010-02-01 17:45 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-31 20:52 [patch 0/3] introduce sys_membarrier(): process-wide memory barrier (v8) Mathieu Desnoyers
2010-01-31 20:52 ` [patch 1/3] Create spin lock/spin unlock with distinct memory barrier Mathieu Desnoyers
2010-02-01  7:25   ` Nick Piggin
2010-02-01 14:08     ` Mathieu Desnoyers
2010-02-01  7:28   ` Nick Piggin
2010-02-01 14:10     ` Mathieu Desnoyers
2010-02-01 15:22   ` Linus Torvalds
2010-02-01 15:41     ` Steven Rostedt
2010-01-31 20:52 ` [patch 2/3] scheduler: add full memory barriers upon task switch at runqueue lock/unlock Mathieu Desnoyers
2010-02-01  7:33   ` Nick Piggin
2010-02-01  9:42     ` Peter Zijlstra
2010-02-01 10:11       ` Nick Piggin
2010-02-01 10:36         ` Peter Zijlstra
2010-02-01 10:49           ` Nick Piggin
2010-02-01 14:47             ` Mathieu Desnoyers
2010-02-01 14:58               ` Nick Piggin
2010-02-01 15:23                 ` Steven Rostedt
2010-02-01 15:44                   ` Steven Rostedt
2010-02-01 16:00                   ` Mike Galbraith
2010-02-01 15:27   ` Linus Torvalds
2010-02-01 16:09     ` Mathieu Desnoyers
2010-02-01 16:23       ` Linus Torvalds
2010-02-01 16:48         ` Mathieu Desnoyers
2010-02-01 16:56           ` Linus Torvalds
2010-02-01 17:45             ` Mathieu Desnoyers [this message]
2010-02-01 18:00               ` Steven Rostedt
2010-02-01 18:36               ` Linus Torvalds
2010-02-01 19:56                 ` Mathieu Desnoyers
2010-02-01 20:42                   ` Linus Torvalds
2010-02-01 22:42                     ` Mathieu Desnoyers
2010-02-01 20:33                 ` Steven Rostedt
2010-02-01 20:52                   ` Linus Torvalds
2010-02-01 22:39                     ` Steven Rostedt
2010-02-01 23:09                       ` Steven Rostedt
2010-02-01 17:13           ` Steven Rostedt
2010-02-01 17:34             ` Linus Torvalds
2010-02-01 16:24       ` Steven Rostedt
2010-02-01 16:29         ` Peter Zijlstra
2010-02-01 16:46           ` Steven Rostedt
2010-02-01 16:11     ` Steven Rostedt
2010-01-31 20:52 ` [patch 3/3] introduce sys_membarrier(): process-wide memory barrier (v8) Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100201174500.GA13744@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=dvhltc@us.ibm.com \
    --cc=josh@joshtriplett.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=niv@us.ibm.com \
    --cc=nmiell@comcast.net \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.