All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: linux-kernel@vger.kernel.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@elte.hu>,
	akpm@linux-foundation.org, josh@joshtriplett.org,
	tglx@linutronix.de, Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com
Subject: Re: [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v5)
Date: Tue, 19 Jan 2010 17:47:47 +0100	[thread overview]
Message-ID: <1263919667.4283.732.camel@laptop> (raw)
In-Reply-To: <20100114162609.GC3487@Krystal>

On Thu, 2010-01-14 at 11:26 -0500, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Wed, 2010-01-13 at 14:36 -0500, Mathieu Desnoyers wrote:
> > > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > > On Tue, 2010-01-12 at 20:37 -0500, Mathieu Desnoyers wrote:
> > > > > +       for_each_cpu(cpu, tmpmask) {
> > > > > +               spin_lock_irq(&cpu_rq(cpu)->lock);
> > > > > +               mm = cpu_curr(cpu)->mm;
> > > > > +               spin_unlock_irq(&cpu_rq(cpu)->lock);
> > > > > +               if (current->mm != mm)
> > > > > +                       cpumask_clear_cpu(cpu, tmpmask);
> > > > > +       } 
> > > > 
> > > > Why not:
> > > > 
> > > >   rcu_read_lock();
> > > >   if (current->mm != cpu_curr(cpu)->mm)
> > > >     cpumask_clear_cpu(cpu, tmpmask);
> > > >   rcu_read_unlock();
> > > > 
> > > > the RCU read lock ensures the task_struct obtained remains valid, and it
> > > > avoids taking the rq->lock.
> > > > 
> > > 
> > > If we go for a simple rcu_read_lock, I think that we need a smp_mb()
> > > after switch_to() updates the current task on the remote CPU, before it
> > > returns to user-space. Do we have this guarantee for all architectures ?
> > > 
> > > So what I'm looking for, overall, is:
> > > 
> > > schedule()
> > >   ...
> > >   switch_mm()
> > >     smp_mb()
> > >     clear mm_cpumask
> > >     set mm_cpumask
> > >   switch_to()
> > >     update current task
> > >     smp_mb()
> > > 
> > > If we have that, then the rcu_read_lock should work.
> > > 
> > > What the rq lock currently gives us is the guarantee that if the current
> > > thread changes on a remote CPU while we are not holding this lock, then
> > > a full scheduler execution is performed, which implies a memory barrier
> > > if we change the current thread (it does, right ?).
> > 
> > I'm not quite seeing it, we have 4 possibilities, switches between
> > threads with:
> > 
> >  a) our mm, another mm
> > 
> >    - if we observe the former, we'll send an IPI (redundant)
> >    - if we observe the latter, the switch_mm will have issued an mb
> > 
> >  b) another mm, our mm
> > 
> >    - if we observe the former, we're good because the cpu didn't run our
> >      thread when we called sys_membarrier()
> >    - if we observe the latter, we'll send an IPI (redundant)
> 
> It's this scenario that is causing problem. Let's consider this
> execution:
> 
>        CPU 0 (membarrier)                  CPU 1 (another mm -> our mm)
>        <kernel-space>                      <kernel-space>
>                                            switch_mm()
>                                              smp_mb()
>                                              clear_mm_cpumask()
>                                              set_mm_cpumask()
>                                              smp_mb() (by load_cr3() on x86)
>                                            switch_to()
>        mm_cpumask includes CPU 1
>        rcu_read_lock()
>        if (CPU 1 mm != our mm)
>          skip CPU 1.
>        rcu_read_unlock()
>                                              current = next (1)

OK, so on x86 current uses esp and will be flipped somewhere in the
switch_to() magic, cpu_curr(cpu) as used by CPU 0 uses rq->curr, which
will be set before context_switch() and that always implies a mb() for
non matching ->mm's [*]

>                                            <switch back to user-space>
>                                            read-lock()
>                                              read gp, store local gp
>                                              barrier()
>                                              access critical section (2)
> 
> So if we don't have any memory barrier between (1) and (2), the memory
> operations can be reordered in such a way that CPU 0 will not send IPI
> to a CPU that would need to have it's barrier() promoted into a
> smp_mb().

OK, so I'm utterly failing to make sense of the above, do you need more
than the 2 cpus discussed to make it go boom?

> Replacing these kernel rcu_read_lock/unlock() by rq locks ensures that
> when the scheduler runs concurrently on another CPU, _all_ the scheduling
> code is executed atomically wrt the spin lock taken on cpu 0.

Sure, but taking the rq->lock is fairly heavy handed.

> When x86 uses iret to return to user-space, then we have a serializing
> instruction. But if it uses sysexit, or if we are on a different
> architecture, are we sure that a memory barrier is issued before
> returning to user-space ?

[*] and possibly also for matching ->mm's, because:

OK, so I had a quick look at the switch_to() magic, and from what I can
make of it it implies an mb, if only because poking at the segment
registers implies LOCK semantics.



  parent reply	other threads:[~2010-01-19 16:49 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-13  1:37 [RFC PATCH] introduce sys_membarrier(): process-wide memory barrier (v5) Mathieu Desnoyers
2010-01-13  3:23 ` KOSAKI Motohiro
2010-01-13  3:58   ` Mathieu Desnoyers
2010-01-13  4:47     ` KOSAKI Motohiro
2010-01-13  5:33       ` Paul E. McKenney
2010-01-13 15:03       ` Mathieu Desnoyers
2010-01-14  0:15         ` KOSAKI Motohiro
2010-01-14  2:16           ` Mathieu Desnoyers
2010-01-14  2:25             ` KOSAKI Motohiro
2010-01-13  5:00 ` Nicholas Miell
2010-01-13  5:31   ` Paul E. McKenney
2010-01-13  5:39     ` Nicholas Miell
2010-01-13 14:38       ` Mathieu Desnoyers
2010-01-13 18:07         ` Nicholas Miell
2010-01-13 18:24           ` Mathieu Desnoyers
2010-01-13 18:41             ` Nicholas Miell
2010-01-13 19:17               ` Mathieu Desnoyers
2010-01-13 19:42                 ` David Daney
2010-01-13 19:53                   ` Nicholas Miell
2010-01-13 23:42                     ` Mathieu Desnoyers
2010-01-13 15:58       ` Paul E. McKenney
2010-01-13 11:07 ` Heiko Carstens
2010-01-13 14:46   ` Mathieu Desnoyers
2010-01-13 16:38 ` Peter Zijlstra
2010-01-13 19:36   ` Mathieu Desnoyers
2010-01-14  9:08     ` Peter Zijlstra
2010-01-14 16:26       ` Mathieu Desnoyers
2010-01-14 17:03         ` Peter Zijlstra
2010-01-14 17:54           ` Mathieu Desnoyers
2010-01-14 18:37             ` Mathieu Desnoyers
2010-01-14 18:52               ` Steven Rostedt
2010-01-14 19:33                 ` Mathieu Desnoyers
2010-01-14 21:26                   ` Steven Rostedt
2010-01-19 18:37                   ` Peter Zijlstra
2010-01-19 19:06                     ` Peter Zijlstra
2010-01-20  3:13                       ` Mathieu Desnoyers
2010-01-20  8:45                         ` Peter Zijlstra
2010-01-21 11:26                       ` Peter Zijlstra
2010-01-21 16:07                         ` Mathieu Desnoyers
2010-01-21 16:12                           ` Steven Rostedt
2010-01-21 16:22                             ` Mathieu Desnoyers
2010-01-21 16:32                               ` Steven Rostedt
2010-01-21 17:02                                 ` Mathieu Desnoyers
2010-01-21 16:17                           ` Peter Zijlstra
2010-01-21 17:01                             ` Mathieu Desnoyers
2010-01-19 19:43                     ` Steven Rostedt
2010-01-14 18:50             ` Steven Rostedt
2010-01-19 16:47         ` Peter Zijlstra [this message]
2010-01-19 17:11           ` Mathieu Desnoyers
2010-01-19 17:30           ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1263919667.4283.732.camel@laptop \
    --to=peterz@infradead.org \
    --cc=Valdis.Kletnieks@vt.edu \
    --cc=akpm@linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dipankar@in.ibm.com \
    --cc=josh@joshtriplett.org \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mingo@elte.hu \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.