From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Gautham R Shenoy <ego@in.ibm.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>,
Dipankar Sarma <dipankar@in.ibm.com>, Ingo Molnar <mingo@elte.hu>,
laijs@cn.fujitsu.com, Peter Zijlstra <a.p.zijlstra@chello.nl>,
lkml <linux-kernel@vger.kernel.org>,
Rusty Russel <rusty@rustcorp.com.au>
Subject: Re: [PATCH] fix rcu vs hotplug race
Date: Tue, 1 Jul 2008 12:46:13 -0700 [thread overview]
Message-ID: <20080701194613.GC7488@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080701053900.GB8205@in.ibm.com>
On Tue, Jul 01, 2008 at 11:09:00AM +0530, Gautham R Shenoy wrote:
> On Fri, Jun 27, 2008 at 07:58:45AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 27, 2008 at 11:19:59AM +0530, Dhaval Giani wrote:
> > > On Fri, Jun 27, 2008 at 10:48:55AM +0530, Dipankar Sarma wrote:
> > > > On Fri, Jun 27, 2008 at 10:17:38AM +0530, Gautham R Shenoy wrote:
> > > > > IMHO the warning is a spurious one.
> > > > > Here's the timeline.
> > > > > CPU_A CPU_B
> > > > > --------------------------------------------------------------
> > > > > cpu_down(): .
> > > > > . .
> > > > > . .
> > > > > stop_machine(): /* disables preemption, .
> > > > > * and irqs */ .
> > > > > . .
> > > > > . .
> > > > > take_cpu_down(); .
> > > > > . .
> > > > > . .
> > > > > . .
> > > > > cpu_disable(); /*this removes cpu .
> > > > > *from cpu_online_map .
> > > > > */ .
> > > > > . .
> > > > > . .
> > > > > restart_machine(); /* enables irqs */ .
> > > > > ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> > > > > . call_rcu();
> > > > > . /* disables irqs here */
> > > > > . .force_quiescent_state();
> > > > > .CPU_DEAD: .for_each_cpu(rcp->cpumask)
> > > > > . . smp_send_reschedule();
> > > > > . .
> > > > > . . WARN_ON() for offlined CPU!
> > > > > .
> > > >
> > > > Exactly. The call_rcu()s are coming from a different subsystem
> > > > and can happen anytime during the CPU hotplug path. So, RCU subsystem
> > > > doesn't have anything to do to keep rcu->cpumask consistent.
> > > > It is *safe* even if we miss poking a cpu or two while
> > > > forcing quiescent state in all CPUs. The worst that can happen
> > > > is a delay in grace period. No correctness problem here.
> > > >
> > >
> > > One question. What is preventing a CPU from clearing its mask after we
> > > have checked whether it is online but before we have called into
> > > smp_send_reschedule?
> >
> > This is my concern as well. Gautham, at which point in the above
> > timeline is the offlining CPU marked DYING? Before stop_machine(), right?
>
> No :) The offlining CPU is marked DYING after stop_machine(), inside
> take_cpu_down() which is the work we want to execute after stopping the
> machine.
>
> it's like
> _cpu_down()
> |
> |-> stop_machine_run();
> | |
> | |-> stop_machine(); /* All CPUs irqs disabled. */
> | |
> | |-> take_cpu_down() --> sets state to CPU_DYING. disables irqs on
> | | offlined cpu
> | |
> | |-> restart_machine(); /* All CPUs irqs reenabled */
> |
> |-> send_CPU_DEAD_notification.
>
> The very fact that a thread is running with irqs disabled means that
> stop_machine_run() thread cannot start executing the work it has been
> assinged to execute. Because for Machine to be stopped, stop_machine()
> needs to create n-1 high priority threads on n-1 online cpus, which will
> disable interrupts and preemption, and stop the machine. Then it will
> run the task assigned to it on the ith cpu, which in this case is the
> cpu to be offlined.
>
> So, it's the design of stop_machine() that's preventing someone
> from updating the cpu_online_map while
> force_quiescent_state() is performing the
> cpu_is_online() check. Becase we always call force_quiescent_state()
> with irqs disabled :)
Got it, so the patch looks good.
Thanx, Paul
> > If so, can't we just disable irqs, check for DYING or DEAD, and invoke
> > smp_send_reschedule() only if not DYING or DEAD?
>
>
>
> >
> > Thanx, Paul
>
> --
> Thanks and Regards
> gautham
next prev parent reply other threads:[~2008-07-01 19:46 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-23 10:37 [PATCH] fix rcu vs hotplug race Dhaval Giani
2008-06-23 10:58 ` Ingo Molnar
2008-06-23 11:49 ` Gautham R Shenoy
2008-06-24 11:01 ` Ingo Molnar
2008-06-26 15:27 ` Paul E. McKenney
2008-06-27 4:47 ` Gautham R Shenoy
2008-06-27 5:18 ` Dipankar Sarma
2008-06-27 5:49 ` Dhaval Giani
2008-06-27 14:58 ` Paul E. McKenney
2008-07-01 5:39 ` Gautham R Shenoy
2008-07-01 6:16 ` Ingo Molnar
2008-07-01 6:28 ` Dhaval Giani
2008-07-01 6:35 ` Ingo Molnar
2008-07-01 6:52 ` Ingo Molnar
2008-07-01 7:48 ` Ingo Molnar
2008-07-01 8:32 ` Ingo Molnar
2008-07-01 19:46 ` Paul E. McKenney [this message]
2008-08-01 21:01 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080701194613.GC7488@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=a.p.zijlstra@chello.nl \
--cc=dhaval@linux.vnet.ibm.com \
--cc=dipankar@in.ibm.com \
--cc=ego@in.ibm.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=rusty@rustcorp.com.au \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.