public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Oleg Nesterov <oleg@tv-sign.ru>
Cc: linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	mingo@elte.hu, akpm@linux-foundation.org, dipankar@in.ibm.com,
	josht@linux.vnet.ibm.com, tytso@us.ibm.com, dvhltc@us.ibm.com,
	tglx@linutronix.de, a.p.zijlstra@chello.nl, bunk@kernel.org,
	ego@in.ibm.com, srostedt@redhat.com
Subject: Re: [PATCH RFC 3/9] RCU: Preemptible RCU
Date: Fri, 28 Sep 2007 11:57:59 -0700	[thread overview]
Message-ID: <20070928185759.GC9153@linux.vnet.ibm.com> (raw)
In-Reply-To: <20070928144714.GA397@tv-sign.ru>

On Fri, Sep 28, 2007 at 06:47:14PM +0400, Oleg Nesterov wrote:
> On 09/27, Paul E. McKenney wrote:
> >
> > On Wed, Sep 26, 2007 at 07:13:51PM +0400, Oleg Nesterov wrote:
> > > 
> > > Yes, yes, I see now. We really need this barriers, except I think
> > > rcu_try_flip_idle() can use wmb. However, I have a bit offtopic question,
> > > 
> > > 	// rcu_try_flip_waitzero()
> > > 	if (A == 0) {
> > > 		mb();
> > > 		B == 0;
> > > 	}
> > > 
> > > Do we really need the mb() in this case? How it is possible that STORE
> > > goes before LOAD? "Obviously", the LOAD should be completed first, no?
> > 
> > Suppose that A was most recently stored by a CPU that shares a store
> > buffer with this CPU.  Then it is possible that some other CPU sees
> > the store to B as happening before the store that "A==0" above is
> > loading from.  This other CPU would naturally conclude that the store
> > to B must have happened before the load from A.
> 
> Ah, I was confused by the comment,
> 
> 	smp_mb();  /* Don't call for memory barriers before we see zero. */
> 	                                             ^^^^^^^^^^^^^^^^^^
> So, in fact, we need this barrier to make sure that _other_ CPUs see these
> changes in order, thanks. Of course, _we_ already saw zero.

Fair point!

Perhaps: "Ensure that all CPUs see their rcu_mb_flag -after- the
rcu_flipctrs sum to zero" or some such?

> But in that particular case this doesn't matter, rcu_try_flip_waitzero()
> is the only function which reads the "non-local" per_cpu(rcu_flipctr), so
> it doesn't really need the barrier? (besides, it is always called under
> fliplock).

The final rcu_read_unlock() that zeroed the sum was -not- under fliplock,
so we cannot necessarily rely on locking to trivialize all of this.

> > In more detail, suppose that CPU 0 and 1 share a store buffer, and
> > that CPU 2 and 3 share a second store buffer.  This happens naturally
> > if CPUs 0 and 1 are really just different hardware threads within a
> > single core.
> > 
> > So, suppose the cacheline for A is initially owned by CPUs 2 and 3,
> > and that the cacheline for B is initially owned by CPUs 0 and 1.
> > Then consider the following sequence of events:
> > 
> > o	CPU 0 stores zero to A.  This is a cache miss, so the new value
> > 	for A is placed in CPU 0's and 1's store buffer.
> > 
> > o	CPU 1 executes the above code, first loading A.  It sees
> > 	the value of A==0 in the store buffer, and therefore
> > 	stores zero to B, which hits in the cache.  (I am assuming
> > 	that we left out the mb() above).
> > 
> > o	CPU 2 loads from B, which misses the cache, and gets the
> > 	value that CPU 1 stored.  Suppose it checks the value,
> > 	and based on this check, loads A.  The old value of A might
> > 	still be in cache, which would lead CPU 2 to conclude that
> > 	the store to B by CPU 1 must have happened before the store
> > 	to A by CPU 0.
> > 
> > Memory barriers would prevent this confusion.
> 
> Thanks a lot!!! This fills another gap in my understanding.

Glad it helped!  ;-)

> OK, the last (I promise :) off-topic question. When CPU 0 and 1 share a
> store buffer, the situation is simple, we can replace "CPU 0 stores" with
> "CPU 1 stores". But what if CPU 0 is equally "far" from CPUs 1 and 2?
> 
> Suppose that CPU 1 does
> 
> 	wmb();
> 	B = 0
> 
> Can we assume that CPU 2 doing
> 
> 	if (B == 0) {
> 		rmb();
> 
> must see all invalidations from CPU 0 which were seen by CPU 1 before wmb() ?

Yes.  CPU 2 saw something following CPU 1's wmb(), so any of CPU 2's
reads following its rmb() must therefore see all of CPU 1's stores
preceding the wmb().

> > > > > If this is possible, can't we move the code doing "s/rcu_flipped/rcu_flip_seen/"
> > > > > from __rcu_advance_callbacks() to rcu_check_mb() to unify the "acks" ?
> > > > 
> > > > I believe that we cannot safely do this.  The rcu_flipped-to-rcu_flip_seen
> > > > transition has to be synchronized to the moving of the callbacks --
> > > > either that or we need more GP_STAGES.
> > > 
> > > Hmm. Still can't understand.
> > 
> > Callbacks would be able to be injected into a grace period after it
> > started.
> 
> Yes, but this is _exactly_ what the current code does in the scenario below,
> 
> > Or are you arguing that as long as interrupts remain disabled between
> > the two events, no harm done?
> 
> no,
> 
> > > Suppose that we are doing call_rcu(), and __rcu_advance_callbacks() sees
> > > rdp->completed == rcu_ctrlblk.completed but rcu_flip_flag = rcu_flipped
> > > (say, another CPU does rcu_try_flip_idle() in between).
> > > 
> > > We ack the flip, call_rcu() enables irqs, the timer interrupt calls
> > > __rcu_advance_callbacks() again and moves the callbacks.
> > > 
> > > So, it is still possible that "move callbacks" and "ack the flip" happen
> > > out of order. But why this is bad?
> 
> Look, what happens is
> 
> 	// call_rcu()
> 	rcu_flip_flag = rcu_flipped
> 	insert the new callback
> 	// timer irq
> 	move the callbacks (the new one goes to wait[0])
> 
> But I still can't understand why this is bad,
> 
> > > This can't "speedup" the moving of our callbacks from next to done lists.
> > > Yes, RCU state machine can switch to the next state/stage, but this looks
> > > safe to me.
> 
> Before this callback will be flushed, we need 2 rdp->completed != rcu_ctrlblk.completed
> further events, we can't miss rcu_read_lock() section, no?
> 
> > > Help!
> 
> Please :)

Quite possibly my paranoia -- need to think about this some more.

Guess I need to expand the code-level portion of the document to cover
the grace-period side.  That would force me either to get my explanation
into shape or admit that you are correct, as the case might be.  ;-)

> > > 	if (rcu_ctrlblk.completed == rdp->completed)
> > > 		rcu_try_flip();
> > > 
> > > Could you clarify the check above? Afaics this is just optimization,
> > > technically it is correct to rcu_try_flip() at any time, even if ->completed
> > > are not synchronized. Most probably in that case rcu_try_flip_waitack() will
> > > fail, but nothing bad can happen, yes?
> > 
> > >From a conceptual viewpoint, if this CPU hasn't caught up with the
> > last grace-period stage, it has no business trying to push forward to
> > the next stage.  So this might (or might not) happen to work with this
> > particular implementation, it needs to stay as is.  We need this code
> > to be robust enough to optimize the grace-period latencies, right?
> 
> Yes, yes. I just wanted to be sure I didn't miss some other subtle reason.
> 
> > > 	void synchronize_sched(void)
> > > 	{
> > > 		struct migration_req req;
> > > 
> > > 		req->task = NULL;
> > > 		init_completion(&req.done);
> > > 
> > > 		for_each_online_cpu(cpu) {
> > > 			struct rq *rq = cpu_rq(cpu);
> > > 			int online;
> > > 
> > > 			spin_lock_irq(&rq->lock);
> > > 			online = cpu_online(cpu); // HOTPLUG_CPU
> > > 			if (online) {
> > > 				list_add(&req->list, &rq->migration_queue);
> > > 				req.done.done = 0;
> > > 			}
> > > 			spin_unlock_irq(&rq->lock);
> > > 
> > > 			if (online) {
> > > 				wake_up_process(rq->migration_thread);
> > > 				wait_for_completion(&req.done);
> > > 			}
> > > 		}
> > > 	}
> > > 
> > 
> > I need to think about your approach above.  It looks like you are
> > leveraging the migration tasks, but I am concerned about concurrent
> > hotplug events.
> 
> I hope this is OK, note that migration_call(CPU_DEAD) flushes ->migration_queue,
> if we take rq->lock after that we must see !cpu_online(cpu). CPU_UP event is not
> interesting for us, we can miss it.
> 
> Hmm... but wake_up_process() should be moved under spin_lock().

The other approach would be to simply have a separate thread for this
purpose.  Batching would amortize the overhead (a single trip around the
CPUs could satisfy an arbitrarily large number of synchronize_sched()
requests).

						Thanx, Paul

  reply	other threads:[~2007-09-28 18:58 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-10 18:30 [PATCH RFC 0/9] RCU: Preemptible RCU Paul E. McKenney
2007-09-10 18:32 ` [PATCH RFC 1/9] RCU: Split API to permit multiple RCU implementations Paul E. McKenney
2007-09-21  4:14   ` Steven Rostedt
2007-09-10 18:33 ` [PATCH RFC 2/9] RCU: Fix barriers Paul E. McKenney
2007-09-10 18:34 ` [PATCH RFC 3/9] RCU: Preemptible RCU Paul E. McKenney
2007-09-21  4:17   ` Steven Rostedt
2007-09-21  5:50     ` Paul E. McKenney
2007-09-21  5:56     ` Dipankar Sarma
2007-09-21 14:40   ` Steven Rostedt
2007-09-21 15:46     ` Peter Zijlstra
2007-09-21 22:06       ` Paul E. McKenney
2007-09-21 22:31       ` Steven Rostedt
2007-09-21 22:44         ` Paul E. McKenney
2007-09-21 23:23           ` Steven Rostedt
2007-09-21 23:44             ` Paul E. McKenney
2007-09-22  0:26     ` Paul E. McKenney
2007-09-22  1:15       ` Steven Rostedt
2007-09-22  1:53         ` Paul E. McKenney
2007-09-22  3:15           ` Steven Rostedt
2007-09-22  4:07             ` Paul E. McKenney
2007-09-21 15:20   ` Steven Rostedt
2007-09-21 23:03     ` Paul E. McKenney
2007-09-22  0:32       ` Paul E. McKenney
2007-09-22  1:19         ` Steven Rostedt
2007-09-22  1:43           ` Paul E. McKenney
2007-09-22  2:56             ` Steven Rostedt
2007-09-22  4:10               ` Paul E. McKenney
2007-09-23 17:38   ` Oleg Nesterov
2007-09-24  0:15     ` Paul E. McKenney
2007-09-26 15:13       ` Oleg Nesterov
2007-09-27 15:46         ` Paul E. McKenney
2007-09-28 14:47           ` Oleg Nesterov
2007-09-28 18:57             ` Paul E. McKenney [this message]
2007-09-30 16:31               ` Oleg Nesterov
2007-09-30 23:02                 ` Davide Libenzi
2007-10-01  1:37                   ` Paul E. McKenney
2007-10-01 18:44                     ` Davide Libenzi
2007-10-01 19:21                       ` Paul E. McKenney
2007-10-01 22:09                         ` Davide Libenzi
2007-10-01 22:24                           ` Paul E. McKenney
2007-10-02 18:02                     ` Oleg Nesterov
2007-10-01  1:20                 ` Paul E. McKenney
2007-09-10 18:35 ` [PATCH RFC 4/9] RCU: synchronize_sched() workaround for CPU hotplug Paul E. McKenney
2007-09-10 18:36 ` [PATCH RFC 5/9] RCU: CPU hotplug support for preemptible RCU Paul E. McKenney
2007-09-30 16:38   ` Oleg Nesterov
2007-10-01  1:41     ` Paul E. McKenney
2007-09-10 18:39 ` [PATCH RFC 6/9] RCU priority boosting " Paul E. McKenney
2007-09-28 22:56   ` Gautham R Shenoy
2007-09-28 23:05     ` Steven Rostedt
2007-09-30  3:11       ` Paul E. McKenney
2007-10-05 11:46   ` Gautham R Shenoy
2007-10-05 12:24     ` Steven Rostedt
2007-10-05 13:21       ` Gautham R Shenoy
2007-10-05 14:07         ` Paul E. McKenney
2007-09-10 18:39 ` [PATCH RFC 7/9] RCU: rcutorture testing for RCU priority boosting Paul E. McKenney
2007-09-10 18:41 ` [PATCH RFC 8/9] RCU: Make RCU priority boosting consume less power Paul E. McKenney
2007-09-10 18:42 ` [PATCH RFC 9/9] RCU: preemptible documentation and comment cleanups Paul E. McKenney
2007-09-10 18:44 ` [PATCH RFC 0/9] RCU: Preemptible RCU Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070928185759.GC9153@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=bunk@kernel.org \
    --cc=dipankar@in.ibm.com \
    --cc=dvhltc@us.ibm.com \
    --cc=ego@in.ibm.com \
    --cc=josht@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=oleg@tv-sign.ru \
    --cc=srostedt@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tytso@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox