public inbox for linux-kernel@vger.kernel.org
* [PATCH] fix rcu vs hotplug race
@ 2008-06-23 10:37 Dhaval Giani
  2008-06-23 10:58 ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Dhaval Giani @ 2008-06-23 10:37 UTC (permalink / raw)
  To: paulmck, Dipankar Sarma, Gautham Shenoy, laijs, Ingo Molnar,
	Peter Zijlstra
  Cc: lkml

On running kernel compiles in parallel with cpu hotplug,

------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:118
native_smp_send_reschedule+0x21/0x36()
Modules linked in:
Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
 [<c01217d9>] warn_on_slowpath+0x41/0x5d
 [<c01515b7>] ? generic_file_aio_read+0x10f/0x137
 [<c0151340>] ? file_read_actor+0x0/0xf7
 [<c013ae4c>] ? validate_chain+0xaa/0x29c
 [<c013c854>] ? __lock_acquire+0x612/0x666
 [<c013c854>] ? __lock_acquire+0x612/0x666
 [<c013ae4c>] ? validate_chain+0xaa/0x29c
 [<c01715d3>] ? file_kill+0x2d/0x30
 [<c013cbd7>] ? __lock_release+0x4b/0x51
 [<c01715d3>] ? file_kill+0x2d/0x30
 [<c0110355>] native_smp_send_reschedule+0x21/0x36
 [<c014fe8f>] force_quiescent_state+0x47/0x57
 [<c014fef0>] call_rcu+0x51/0x6d
 [<c01713b3>] __fput+0x130/0x158
 [<c0171231>] fput+0x17/0x19
 [<c016fd99>] filp_close+0x4d/0x57
 [<c016fdff>] sys_close+0x5c/0x97
 [<c0103861>] sysenter_past_esp+0x6a/0xb1
 =======================
---[ end trace aa35f3913ddf2d06 ]---

This is because a reschedule is sent to a CPU which is offline.
Just ensure that the CPU we send the smp_send_reschedule() to is
actually online.

Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
---
 kernel/rcuclassic.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.26-rc7/kernel/rcuclassic.c
===================================================================
--- linux-2.6.26-rc7.orig/kernel/rcuclassic.c
+++ linux-2.6.26-rc7/kernel/rcuclassic.c
@@ -93,7 +93,8 @@ static void force_quiescent_state(struct
 		cpumask = rcp->cpumask;
 		cpu_clear(rdp->cpu, cpumask);
 		for_each_cpu_mask(cpu, cpumask)
-			smp_send_reschedule(cpu);
+			if (cpu_online(cpu))
+				smp_send_reschedule(cpu);
 	}
 }
 #else
-- 
regards,
Dhaval

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fix rcu vs hotplug race
  2008-06-23 10:37 [PATCH] fix rcu vs hotplug race Dhaval Giani
@ 2008-06-23 10:58 ` Ingo Molnar
  2008-06-23 11:49   ` Gautham R Shenoy
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-06-23 10:58 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: paulmck, Dipankar Sarma, Gautham Shenoy, laijs, Peter Zijlstra,
	lkml, Paul E. McKenney


* Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:

> On running kernel compiles in parallel with cpu hotplug,
> 
> ------------[ cut here ]------------
> WARNING: at arch/x86/kernel/smp.c:118
> native_smp_send_reschedule+0x21/0x36()
> Modules linked in:
> Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
>  [<c01217d9>] warn_on_slowpath+0x41/0x5d
>  [<c01515b7>] ? generic_file_aio_read+0x10f/0x137
>  [<c0151340>] ? file_read_actor+0x0/0xf7
>  [<c013ae4c>] ? validate_chain+0xaa/0x29c
>  [<c013c854>] ? __lock_acquire+0x612/0x666
>  [<c013c854>] ? __lock_acquire+0x612/0x666
>  [<c013ae4c>] ? validate_chain+0xaa/0x29c
>  [<c01715d3>] ? file_kill+0x2d/0x30
>  [<c013cbd7>] ? __lock_release+0x4b/0x51
>  [<c01715d3>] ? file_kill+0x2d/0x30
>  [<c0110355>] native_smp_send_reschedule+0x21/0x36
>  [<c014fe8f>] force_quiescent_state+0x47/0x57
>  [<c014fef0>] call_rcu+0x51/0x6d
>  [<c01713b3>] __fput+0x130/0x158
>  [<c0171231>] fput+0x17/0x19
>  [<c016fd99>] filp_close+0x4d/0x57
>  [<c016fdff>] sys_close+0x5c/0x97
>  [<c0103861>] sysenter_past_esp+0x6a/0xb1
>  =======================
> ---[ end trace aa35f3913ddf2d06 ]---
> 
> This is because a reschedule is sent to a CPU which is offline.
> Just ensure that the CPU we send the smp_send_reschedule is actually
> online.
> 
> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> ---
>  kernel/rcuclassic.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.26-rc7/kernel/rcuclassic.c
> ===================================================================
> --- linux-2.6.26-rc7.orig/kernel/rcuclassic.c
> +++ linux-2.6.26-rc7/kernel/rcuclassic.c
> @@ -93,7 +93,8 @@ static void force_quiescent_state(struct
>  		cpumask = rcp->cpumask;
>  		cpu_clear(rdp->cpu, cpumask);
>  		for_each_cpu_mask(cpu, cpumask)
> -			smp_send_reschedule(cpu);
> +			if (cpu_online(cpu))
> +				smp_send_reschedule(cpu);
>  	}

hm, not sure - we might just be fighting the symptom and we might now 
create a silent resource leak instead. Isn't a full RCU quiescent state 
forced (on all CPUs) before a CPU is cleared out of cpu_online_map? That 
way the to-be-offlined CPU should never actually show up in 
rcp->cpumask.

	Ingo


* Re: [PATCH] fix rcu vs hotplug race
  2008-06-23 10:58 ` Ingo Molnar
@ 2008-06-23 11:49   ` Gautham R Shenoy
  2008-06-24 11:01     ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Gautham R Shenoy @ 2008-06-23 11:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dhaval Giani, paulmck, Dipankar Sarma, laijs, Peter Zijlstra,
	lkml, Paul E. McKenney

On Mon, Jun 23, 2008 at 12:58:44PM +0200, Ingo Molnar wrote:
> 
> * Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:
> 
> > On running kernel compiles in parallel with cpu hotplug,
> > 
> > ------------[ cut here ]------------
> > WARNING: at arch/x86/kernel/smp.c:118
> > native_smp_send_reschedule+0x21/0x36()
> > Modules linked in:
> > Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
> >  [<c01217d9>] warn_on_slowpath+0x41/0x5d
> >  [<c01515b7>] ? generic_file_aio_read+0x10f/0x137
> >  [<c0151340>] ? file_read_actor+0x0/0xf7
> >  [<c013ae4c>] ? validate_chain+0xaa/0x29c
> >  [<c013c854>] ? __lock_acquire+0x612/0x666
> >  [<c013c854>] ? __lock_acquire+0x612/0x666
> >  [<c013ae4c>] ? validate_chain+0xaa/0x29c
> >  [<c01715d3>] ? file_kill+0x2d/0x30
> >  [<c013cbd7>] ? __lock_release+0x4b/0x51
> >  [<c01715d3>] ? file_kill+0x2d/0x30
> >  [<c0110355>] native_smp_send_reschedule+0x21/0x36
> >  [<c014fe8f>] force_quiescent_state+0x47/0x57
> >  [<c014fef0>] call_rcu+0x51/0x6d
> >  [<c01713b3>] __fput+0x130/0x158
> >  [<c0171231>] fput+0x17/0x19
> >  [<c016fd99>] filp_close+0x4d/0x57
> >  [<c016fdff>] sys_close+0x5c/0x97
> >  [<c0103861>] sysenter_past_esp+0x6a/0xb1
> >  =======================
> > ---[ end trace aa35f3913ddf2d06 ]---
> > 
> > This is because a reschedule is sent to a CPU which is offline.
> > Just ensure that the CPU we send the smp_send_reschedule is actually
> > online.
> > 
> > Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
> > ---
> >  kernel/rcuclassic.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > Index: linux-2.6.26-rc7/kernel/rcuclassic.c
> > ===================================================================
> > --- linux-2.6.26-rc7.orig/kernel/rcuclassic.c
> > +++ linux-2.6.26-rc7/kernel/rcuclassic.c
> > @@ -93,7 +93,8 @@ static void force_quiescent_state(struct
> >  		cpumask = rcp->cpumask;
> >  		cpu_clear(rdp->cpu, cpumask);
> >  		for_each_cpu_mask(cpu, cpumask)
> > -			smp_send_reschedule(cpu);
> > +			if (cpu_online(cpu))
> > +				smp_send_reschedule(cpu);
> >  	}
> 

Hi Ingo,

> hm, not sure - we might just be fighting the symptom and we might now 
> create a silent resource leak instead. Isnt a full RCU quiescent state 
> forced (on all CPUs) before a CPU is cleared out of cpu_online_map? That 
> way the to-be-offlined CPU should never actually show up in 
> rcp->cpumask.

No, this does not happen currently. The rcp->cpumask is always
initialized to cpu_online_map & ~nohz_cpu_mask when we start a new batch.
Hence, before the batch ends, if a cpu goes offline we _can_ have a
stale rcp->cpumask, till the RCU subsystem has handled its CPU_DEAD
notification.

Thus for a tiny interval, the rcp->cpumask would contain the offlined
CPU. One of the alternatives is probably to handle this using CPU_DYING
notifier instead of CPU_DEAD where we can call __rcu_offline_cpu().

The WARN_ON that Dhaval was hitting was because a cpu-offline
occurred just before we did the local_irq_save() inside
call_rcu(). At that point, the rcp->cpumask was still stale, and
hence we ended up sending a smp_send_reschedule() to an offlined cpu. So
the check may not create any resource leak.

But probably there's a better way to fix this.
> 
> 	Ingo

-- 
Thanks and Regards
gautham


* Re: [PATCH] fix rcu vs hotplug race
  2008-06-23 11:49   ` Gautham R Shenoy
@ 2008-06-24 11:01     ` Ingo Molnar
  2008-06-26 15:27       ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-06-24 11:01 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Dhaval Giani, paulmck, Dipankar Sarma, laijs, Peter Zijlstra,
	lkml, Paul E. McKenney


* Gautham R Shenoy <ego@in.ibm.com> wrote:

> > hm, not sure - we might just be fighting the symptom and we might 
> > now create a silent resource leak instead. Isnt a full RCU quiescent 
> > state forced (on all CPUs) before a CPU is cleared out of 
> > cpu_online_map? That way the to-be-offlined CPU should never 
> > actually show up in rcp->cpumask.
> 
> No, this does not happen currently. The rcp->cpumask is always 
> initialized to cpu_online_map&~nohz_cpu_mask when we start a new 
> batch. Hence, before the batch ends, if a cpu goes offline we _can_ 
> have a stale rcp->cpumask, till the RCU subsystem has handled it's 
> CPU_DEAD notification.
> 
> Thus for a tiny interval, the rcp->cpumask would contain the offlined 
> CPU. One of the alternatives is probably to handle this using 
> CPU_DYING notifier instead of CPU_DEAD where we can call 
> __rcu_offline_cpu().
> 
> The warn_on that dhaval was hitting was because of some cpu-offline 
> that was called just before we did a local_irq_save inside call_rcu(). 
> But at that time, the rcp->cpumask was still stale, and hence we ended 
> up sending a smp_reschedule() to an offlined cpu. So the check may not 
> create any resource leak.

the check may not - but the problem it highlights might and with the 
patch we'd end up hiding potential problems in this area.

Paul, what do you think about this mixed CPU hotplug plus RCU workload?

	Ingo


* Re: [PATCH] fix rcu vs hotplug race
  2008-06-24 11:01     ` Ingo Molnar
@ 2008-06-26 15:27       ` Paul E. McKenney
  2008-06-27  4:47         ` Gautham R Shenoy
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2008-06-26 15:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gautham R Shenoy, Dhaval Giani, Dipankar Sarma, laijs,
	Peter Zijlstra, lkml

On Tue, Jun 24, 2008 at 01:01:44PM +0200, Ingo Molnar wrote:
> 
> * Gautham R Shenoy <ego@in.ibm.com> wrote:
> 
> > > hm, not sure - we might just be fighting the symptom and we might 
> > > now create a silent resource leak instead. Isnt a full RCU quiescent 
> > > state forced (on all CPUs) before a CPU is cleared out of 
> > > cpu_online_map? That way the to-be-offlined CPU should never 
> > > actually show up in rcp->cpumask.
> > 
> > No, this does not happen currently. The rcp->cpumask is always 
> > initialized to cpu_online_map&~nohz_cpu_mask when we start a new 
> > batch. Hence, before the batch ends, if a cpu goes offline we _can_ 
> > have a stale rcp->cpumask, till the RCU subsystem has handled it's 
> > CPU_DEAD notification.
> > 
> > Thus for a tiny interval, the rcp->cpumask would contain the offlined 
> > CPU. One of the alternatives is probably to handle this using 
> > CPU_DYING notifier instead of CPU_DEAD where we can call 
> > __rcu_offline_cpu().
> > 
> > The warn_on that dhaval was hitting was because of some cpu-offline 
> > that was called just before we did a local_irq_save inside call_rcu(). 
> > But at that time, the rcp->cpumask was still stale, and hence we ended 
> > up sending a smp_reschedule() to an offlined cpu. So the check may not 
> > create any resource leak.
> 
> the check may not - but the problem it highlights might and with the 
> patch we'd end up hiding potential problems in this area.
> 
> Paul, what do you think about this mixed CPU hotplug plus RCU workload?

RCU most certainly needs to work correctly in face of arbitrary sequences
of CPU-hotplug events, and should therefore be tested with arbitrary
CPU-hotplug tests.  And RCU also most certainly needs to refrain from
issuing spurious warning messages that might over time be ignored,
possibly causing someone to miss a real bug.  My concern with this patch
is in the second spurious-warning area.

Not sure I answered the actual question, though...

							Thanx, Paul


* Re: [PATCH] fix rcu vs hotplug race
  2008-06-26 15:27       ` Paul E. McKenney
@ 2008-06-27  4:47         ` Gautham R Shenoy
  2008-06-27  5:18           ` Dipankar Sarma
  0 siblings, 1 reply; 18+ messages in thread
From: Gautham R Shenoy @ 2008-06-27  4:47 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Ingo Molnar, Dhaval Giani, Dipankar Sarma, laijs, Peter Zijlstra,
	lkml, Rusty Russel

On Thu, Jun 26, 2008 at 08:27:28AM -0700, Paul E. McKenney wrote:
> On Tue, Jun 24, 2008 at 01:01:44PM +0200, Ingo Molnar wrote:
> > 
> > * Gautham R Shenoy <ego@in.ibm.com> wrote:
> > 
> > > > hm, not sure - we might just be fighting the symptom and we might 
> > > > now create a silent resource leak instead. Isnt a full RCU quiescent 
> > > > state forced (on all CPUs) before a CPU is cleared out of 
> > > > cpu_online_map? That way the to-be-offlined CPU should never 
> > > > actually show up in rcp->cpumask.
> > > 
> > > No, this does not happen currently. The rcp->cpumask is always 
> > > initialized to cpu_online_map&~nohz_cpu_mask when we start a new 
> > > batch. Hence, before the batch ends, if a cpu goes offline we _can_ 
> > > have a stale rcp->cpumask, till the RCU subsystem has handled it's 
> > > CPU_DEAD notification.
> > > 
> > > Thus for a tiny interval, the rcp->cpumask would contain the offlined 
> > > CPU. One of the alternatives is probably to handle this using 
> > > CPU_DYING notifier instead of CPU_DEAD where we can call 
> > > __rcu_offline_cpu().
> > > 
> > > The warn_on that dhaval was hitting was because of some cpu-offline 
> > > that was called just before we did a local_irq_save inside call_rcu(). 
> > > But at that time, the rcp->cpumask was still stale, and hence we ended 
> > > up sending a smp_reschedule() to an offlined cpu. So the check may not 
> > > create any resource leak.
> > 
> > the check may not - but the problem it highlights might and with the 
> > patch we'd end up hiding potential problems in this area.
> > 
> > Paul, what do you think about this mixed CPU hotplug plus RCU workload?
> 
> RCU most certainly needs to work correctly in face of arbitrary sequences
> of CPU-hotplug events, and should therefore be tested with arbitrary
> CPU-hotplug tests.  And RCU also most certainly needs to refrain from
> issuing spurious warning messages that might over time be ignored,
> possibly causing someone to miss a real bug.  My concern with this patch
> is in the second spurious-warning area.

IMHO the warning is a spurious one.
Here's the timeline.
CPU_A						 CPU_B
--------------------------------------------------------------
cpu_down():					.
.					   	.
.						.
stop_machine(): /* disables preemption,		.
		 * and irqs */			.
.						.
.						.
take_cpu_down();				.
.						.
.						.
.						.
cpu_disable(); /*this removes cpu 		.
		*from cpu_online_map 		.
		*/				.
.						.
.						.
restart_machine(); /* enables irqs */		.
------WINDOW DURING WHICH rcp->cpumask is stale ---------------
.						call_rcu();
.						/* disables irqs here */
.						.force_quiescent_state();
.CPU_DEAD:					.for_each_cpu(rcp->cpumask)
.						.   smp_send_reschedule();
.						.
.						.   WARN_ON() for offlined CPU!
.
.
.
rcu_cpu_notify:
.
-------- WINDOW ENDS ------------------------------------------
rcu_offline_cpu() /* Which calls cpu_quiet()
		   * which removes
		   * cpu from rcp->cpumask.
		   */


If a new batch was started just before calling stop_machine_run(), the
to-be-offlined cpu is still present in rcp->cpumask.

During a cpu-offline, from take_cpu_down(), we queue an rt-prio idle
task as the next task to be picked by the scheduler. We also call
cpu_disable(), which will disable any further interrupts and remove
the cpu's bit from the cpu_online_map.

Once the stop_machine_run() successfully calls take_cpu_down(),
it calls schedule(). That's the last time a schedule is called
on the offlined cpu, and hence the last time when rdp->passed_quiesc
will be set to 1 through rcu_qsctr_inc().

But cpu_quiet() on this cpu will be called only when the
next RCU_SOFTIRQ occurs on this CPU. So at that time, the offlined CPU
is still set in rcp->cpumask.

Now coming back to the idle_task which truly offlines the CPU: it does
check for a pending RCU and raises the softirq, since it will find
rdp->passed_quiesc to be 0 in this case. However, since the cpu is offline
I am not sure if the softirq will trigger on the CPU.

Even if it doesn't, rcu_offline_cpu() will find that rcp->completed
is not the same as rcp->cur, which means that our cpu could be holding
up the grace period progression. Hence we call cpu_quiet() and move
ahead.

But because of the window explained in the timeline, we could still have
a call_rcu() before the RCU subsystem executes its CPU_DEAD
notification, and we send smp_send_reschedule() to an offlined cpu while
trying to force the quiescent states. The appended patch adds comments
and avoids checking for an offlined cpu on every iteration.

Author: Gautham R Shenoy <ego@in.ibm.com>
Date:   Fri Jun 27 09:33:55 2008 +0530

    cpu_online_map is updated by _cpu_down()
    using stop_machine_run(). Since force_quiescent_state() is invoked from
    an irqs-disabled section, stop_machine_run() won't be executing while
    a cpu is executing force_quiescent_state(). Hence the
    cpu_online_map is stable while we're in the irq-disabled section.

    However, a cpu might have been offlined _just_ before
    we disabled irqs while entering force_quiescent_state().
    And the rcu subsystem might not yet have handled the CPU_DEAD
    notification, leading to the offlined cpu's bit
    being set in the rcp->cpumask.

    Hence cpumask = (rcp->cpumask & cpu_online_map), to prevent
    sending smp_send_reschedule() to an offlined CPU.

    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>

diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
index f4ffbd0..a38895a 100644
--- a/kernel/rcuclassic.c
+++ b/kernel/rcuclassic.c
@@ -89,8 +89,22 @@ static void force_quiescent_state(struct rcu_data *rdp,
 		/*
 		 * Don't send IPI to itself. With irqs disabled,
 		 * rdp->cpu is the current cpu.
+		 *
 +		 * cpu_online_map is updated by the _cpu_down()
 +		 * using stop_machine_run(). Since we're in irqs disabled
 +		 * section, stop_machine_run() is not executing, hence
 +		 * the cpu_online_map is stable.
 +		 *
 +		 * However, a cpu might have been offlined _just_ before
 +		 * we disabled irqs while entering here.
 +		 * And rcu subsystem might not yet have handled the CPU_DEAD
 +		 * notification, leading to the offlined cpu's bit
 +		 * being set in the rcp->cpumask.
 +		 *
 +		 * Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent
 +		 * sending smp_send_reschedule() to an offlined CPU.
 		 */
-		cpumask = rcp->cpumask;
+		cpus_and(cpumask, rcp->cpumask, cpu_online_map);
 		cpu_clear(rdp->cpu, cpumask);
 		for_each_cpu_mask(cpu, cpumask)
 			smp_send_reschedule(cpu);



> 
> Not sure I answered the actual question, though...
> 
> 							Thanx, Paul

-- 
Thanks and Regards
gautham


* Re: [PATCH] fix rcu vs hotplug race
  2008-06-27  4:47         ` Gautham R Shenoy
@ 2008-06-27  5:18           ` Dipankar Sarma
  2008-06-27  5:49             ` Dhaval Giani
  0 siblings, 1 reply; 18+ messages in thread
From: Dipankar Sarma @ 2008-06-27  5:18 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Paul E. McKenney, Ingo Molnar, Dhaval Giani, laijs,
	Peter Zijlstra, lkml, Rusty Russel

On Fri, Jun 27, 2008 at 10:17:38AM +0530, Gautham R Shenoy wrote:
> IMHO the warning is a spurious one.
> Here's the timeline.
> CPU_A						 CPU_B
> --------------------------------------------------------------
> cpu_down():					.
> .					   	.
> .						.
> stop_machine(): /* disables preemption,		.
> 		 * and irqs */			.
> .						.
> .						.
> take_cpu_down();				.
> .						.
> .						.
> .						.
> cpu_disable(); /*this removes cpu 		.
> 		*from cpu_online_map 		.
> 		*/				.
> .						.
> .						.
> restart_machine(); /* enables irqs */		.
> ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> .						call_rcu();
> .						/* disables irqs here */
> .						.force_quiescent_state();
> .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> .						.   smp_send_reschedule();
> .						.
> .						.   WARN_ON() for offlined CPU!
> .

Exactly. The call_rcu()s are coming from a different subsystem
and can happen anytime during the CPU hotplug path. So, the RCU subsystem
cannot do anything to keep rcp->cpumask consistent.
It is *safe* even if we miss poking a cpu or two while
forcing quiescent state on all CPUs. The worst that can happen
is a delay in the grace period. No correctness problem here.

Thanks
Dipankar


* Re: [PATCH] fix rcu vs hotplug race
  2008-06-27  5:18           ` Dipankar Sarma
@ 2008-06-27  5:49             ` Dhaval Giani
  2008-06-27 14:58               ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Dhaval Giani @ 2008-06-27  5:49 UTC (permalink / raw)
  To: Dipankar Sarma
  Cc: Gautham R Shenoy, Paul E. McKenney, Ingo Molnar, laijs,
	Peter Zijlstra, lkml, Rusty Russel

On Fri, Jun 27, 2008 at 10:48:55AM +0530, Dipankar Sarma wrote:
> On Fri, Jun 27, 2008 at 10:17:38AM +0530, Gautham R Shenoy wrote:
> > IMHO the warning is a spurious one.
> > Here's the timeline.
> > CPU_A						 CPU_B
> > --------------------------------------------------------------
> > cpu_down():					.
> > .					   	.
> > .						.
> > stop_machine(): /* disables preemption,		.
> > 		 * and irqs */			.
> > .						.
> > .						.
> > take_cpu_down();				.
> > .						.
> > .						.
> > .						.
> > cpu_disable(); /*this removes cpu 		.
> > 		*from cpu_online_map 		.
> > 		*/				.
> > .						.
> > .						.
> > restart_machine(); /* enables irqs */		.
> > ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> > .						call_rcu();
> > .						/* disables irqs here */
> > .						.force_quiescent_state();
> > .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> > .						.   smp_send_reschedule();
> > .						.
> > .						.   WARN_ON() for offlined CPU!
> > .
> 
> Exactly. The call_rcu()s are coming from a different subsystem
> and can happen anytime during the CPU hotplug path. So, RCU subsystem
> doesn't have anything to do to keep rcu->cpumask consistent.
> It is *safe* even if we miss poking a cpu or two while
> forcing quiescent state in all CPUs. The worst that can happen
> is a delay in grace period. No correctness problem here.
> 

One question. What is preventing a CPU from clearing its mask after we
have checked whether it is online but before we have called into
smp_send_reschedule?

-- 
regards,
Dhaval


* Re: [PATCH] fix rcu vs hotplug race
  2008-06-27  5:49             ` Dhaval Giani
@ 2008-06-27 14:58               ` Paul E. McKenney
  2008-07-01  5:39                 ` Gautham R Shenoy
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2008-06-27 14:58 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: Dipankar Sarma, Gautham R Shenoy, Ingo Molnar, laijs,
	Peter Zijlstra, lkml, Rusty Russel

On Fri, Jun 27, 2008 at 11:19:59AM +0530, Dhaval Giani wrote:
> On Fri, Jun 27, 2008 at 10:48:55AM +0530, Dipankar Sarma wrote:
> > On Fri, Jun 27, 2008 at 10:17:38AM +0530, Gautham R Shenoy wrote:
> > > IMHO the warning is a spurious one.
> > > Here's the timeline.
> > > CPU_A						 CPU_B
> > > --------------------------------------------------------------
> > > cpu_down():					.
> > > .					   	.
> > > .						.
> > > stop_machine(): /* disables preemption,		.
> > > 		 * and irqs */			.
> > > .						.
> > > .						.
> > > take_cpu_down();				.
> > > .						.
> > > .						.
> > > .						.
> > > cpu_disable(); /*this removes cpu 		.
> > > 		*from cpu_online_map 		.
> > > 		*/				.
> > > .						.
> > > .						.
> > > restart_machine(); /* enables irqs */		.
> > > ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> > > .						call_rcu();
> > > .						/* disables irqs here */
> > > .						.force_quiescent_state();
> > > .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> > > .						.   smp_send_reschedule();
> > > .						.
> > > .						.   WARN_ON() for offlined CPU!
> > > .
> > 
> > Exactly. The call_rcu()s are coming from a different subsystem
> > and can happen anytime during the CPU hotplug path. So, RCU subsystem
> > doesn't have anything to do to keep rcu->cpumask consistent.
> > It is *safe* even if we miss poking a cpu or two while
> > forcing quiescent state in all CPUs. The worst that can happen
> > is a delay in grace period. No correctness problem here.
> > 
> 
> One question. What is preventing a CPU from clearing its mask after we
> have checked whether it is online but before we have called into
> smp_send_reschedule?

This is my concern as well.  Gautham, at which point in the above
timeline is the offlining CPU marked DYING?  Before stop_machine(), right?

If so, can't we just disable irqs, check for DYING or DEAD, and invoke
smp_send_reschedule() only if not DYING or DEAD?

							Thanx, Paul


* Re: [PATCH] fix rcu vs hotplug race
  2008-06-27 14:58               ` Paul E. McKenney
@ 2008-07-01  5:39                 ` Gautham R Shenoy
  2008-07-01  6:16                   ` Ingo Molnar
                                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Gautham R Shenoy @ 2008-07-01  5:39 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Dhaval Giani, Dipankar Sarma, Ingo Molnar, laijs, Peter Zijlstra,
	lkml, Rusty Russel

On Fri, Jun 27, 2008 at 07:58:45AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 27, 2008 at 11:19:59AM +0530, Dhaval Giani wrote:
> > On Fri, Jun 27, 2008 at 10:48:55AM +0530, Dipankar Sarma wrote:
> > > On Fri, Jun 27, 2008 at 10:17:38AM +0530, Gautham R Shenoy wrote:
> > > > IMHO the warning is a spurious one.
> > > > Here's the timeline.
> > > > CPU_A						 CPU_B
> > > > --------------------------------------------------------------
> > > > cpu_down():					.
> > > > .					   	.
> > > > .						.
> > > > stop_machine(): /* disables preemption,		.
> > > > 		 * and irqs */			.
> > > > .						.
> > > > .						.
> > > > take_cpu_down();				.
> > > > .						.
> > > > .						.
> > > > .						.
> > > > cpu_disable(); /*this removes cpu 		.
> > > > 		*from cpu_online_map 		.
> > > > 		*/				.
> > > > .						.
> > > > .						.
> > > > restart_machine(); /* enables irqs */		.
> > > > ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> > > > .						call_rcu();
> > > > .						/* disables irqs here */
> > > > .						.force_quiescent_state();
> > > > .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> > > > .						.   smp_send_reschedule();
> > > > .						.
> > > > .						.   WARN_ON() for offlined CPU!
> > > > .
> > > 
> > > Exactly. The call_rcu()s are coming from a different subsystem
> > > and can happen anytime during the CPU hotplug path. So, RCU subsystem
> > > doesn't have anything to do to keep rcu->cpumask consistent.
> > > It is *safe* even if we miss poking a cpu or two while
> > > forcing quiescent state in all CPUs. The worst that can happen
> > > is a delay in grace period. No correctness problem here.
> > > 
> > 
> > One question. What is preventing a CPU from clearing its mask after we
> > have checked whether it is online but before we have called into
> > smp_send_reschedule?
> 
> This is my concern as well.  Gautham, at which point in the above
> timeline is the offlining CPU marked DYING?  Before stop_machine(), right?

No :) The offlining CPU is marked DYING after stop_machine(), inside
take_cpu_down() which is the work we want to execute after stopping the
machine.

it's like
_cpu_down()
|
|-> stop_machine_run();
|   |
|   |-> stop_machine(); /* All CPUs irqs disabled. */
|   |
|   |-> take_cpu_down() --> sets state to CPU_DYING. disables irqs on
|   |				offlined cpu
|   |
|   |-> restart_machine(); /* All CPUs irqs reenabled */
|
|-> send_CPU_DEAD_notification.

The very fact that a thread is running with irqs disabled means that the
stop_machine_run() thread cannot start executing the work it has been
assigned to execute. Because for the machine to be stopped, stop_machine()
needs to create n-1 high-priority threads on the n-1 other online cpus,
which will disable interrupts and preemption, and stop the machine. Then
it will run the task assigned to it on the remaining cpu, which in this
case is the cpu to be offlined.

So, it's the design of stop_machine() that's preventing someone
from updating the cpu_online_map while
force_quiescent_state() is performing the
cpu_is_online() check. Because we always call force_quiescent_state()
with irqs disabled :)

> 
> If so, can't we just disable irqs, check for DYING or DEAD, and invoke
> smp_send_reschedule() only if not DYING or DEAD?



> 
> 							Thanx, Paul

-- 
Thanks and Regards
gautham


* Re: [PATCH] fix rcu vs hotplug race
  2008-07-01  5:39                 ` Gautham R Shenoy
@ 2008-07-01  6:16                   ` Ingo Molnar
  2008-07-01  6:28                     ` Dhaval Giani
  2008-07-01 19:46                   ` Paul E. McKenney
  2008-08-01 21:01                   ` Paul E. McKenney
  2 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-07-01  6:16 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Paul E. McKenney, Dhaval Giani, Dipankar Sarma, laijs,
	Peter Zijlstra, lkml, Rusty Russel


* Gautham R Shenoy <ego@in.ibm.com> wrote:

> So, it's the design of stop_machine() that's preventing someone from 
> updating the cpu_online_map while force_quiescent_state() is 
> performing the cpu_is_online() check. Becase we always call 
> force_quiescent_state() with irqs disabled :)

Paul, do you concur? I'll apply the commit in the form below to 
tip/core/urgent if Paul agrees.

	Ingo

----------------------------------->
Subject: cpu-hotplug + rcu: fix spurious warning
From: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Date: Mon, 23 Jun 2008 16:07:00 +0530

On running kernel compiles in parallel with cpu hotplug,

------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:118
native_smp_send_reschedule+0x21/0x36()
Modules linked in:
Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
 [<c01217d9>] warn_on_slowpath+0x41/0x5d
 [<c01515b7>] ? generic_file_aio_read+0x10f/0x137
 [<c0151340>] ? file_read_actor+0x0/0xf7
 [<c013ae4c>] ? validate_chain+0xaa/0x29c
 [<c013c854>] ? __lock_acquire+0x612/0x666
 [<c013c854>] ? __lock_acquire+0x612/0x666
 [<c013ae4c>] ? validate_chain+0xaa/0x29c
 [<c01715d3>] ? file_kill+0x2d/0x30
 [<c013cbd7>] ? __lock_release+0x4b/0x51
 [<c01715d3>] ? file_kill+0x2d/0x30
 [<c0110355>] native_smp_send_reschedule+0x21/0x36
 [<c014fe8f>] force_quiescent_state+0x47/0x57
 [<c014fef0>] call_rcu+0x51/0x6d
 [<c01713b3>] __fput+0x130/0x158
 [<c0171231>] fput+0x17/0x19
 [<c016fd99>] filp_close+0x4d/0x57
 [<c016fdff>] sys_close+0x5c/0x97
 [<c0103861>] sysenter_past_esp+0x6a/0xb1
 =======================
---[ end trace aa35f3913ddf2d06 ]---

This is because a reschedule IPI is sent to a CPU which is offline.
Just ensure that the CPU we send smp_send_reschedule() to is actually
online.

It's the design of stop_machine() that's preventing someone from
updating the cpu_online_map while force_quiescent_state() is
performing the cpu_is_online() check. Because we always call
force_quiescent_state() with irqs disabled :)

Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Acked-by: Gautham R Shenoy <ego@in.ibm.com>
Cc: paulmck@linux.vnet.ibm.com
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/rcuclassic.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: tip/kernel/rcuclassic.c
===================================================================
--- tip.orig/kernel/rcuclassic.c
+++ tip/kernel/rcuclassic.c
@@ -93,7 +93,8 @@ static void force_quiescent_state(struct
 		cpumask = rcp->cpumask;
 		cpu_clear(rdp->cpu, cpumask);
 		for_each_cpu_mask(cpu, cpumask)
-			smp_send_reschedule(cpu);
+			if (cpu_online(cpu))
+				smp_send_reschedule(cpu);
 	}
 }
 #else

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fix rcu vs hotplug race
  2008-07-01  6:16                   ` Ingo Molnar
@ 2008-07-01  6:28                     ` Dhaval Giani
  2008-07-01  6:35                       ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Dhaval Giani @ 2008-07-01  6:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Gautham R Shenoy, Paul E. McKenney, Dipankar Sarma, laijs,
	Peter Zijlstra, lkml, Rusty Russell

On Tue, Jul 01, 2008 at 08:16:01AM +0200, Ingo Molnar wrote:
> 
> * Gautham R Shenoy <ego@in.ibm.com> wrote:
> 
> > So, it's the design of stop_machine() that's preventing someone from 
> > updating the cpu_online_map while force_quiescent_state() is 
> > performing the cpu_is_online() check. Because we always call 
> > force_quiescent_state() with irqs disabled :)
> 
> Paul, do you concur? I'll apply the commit in the form below to 
> tip/core/urgent if Paul agrees.
> 
> 	Ingo

Ingo,

I believe Gautham's fix at http://lkml.org/lkml/2008/6/27/9 is better
and also explains it better.

Thanks.
-- 
regards,
Dhaval


* Re: [PATCH] fix rcu vs hotplug race
  2008-07-01  6:28                     ` Dhaval Giani
@ 2008-07-01  6:35                       ` Ingo Molnar
  2008-07-01  6:52                         ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-07-01  6:35 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: Gautham R Shenoy, Paul E. McKenney, Dipankar Sarma, laijs,
	Peter Zijlstra, lkml, Rusty Russell


* Dhaval Giani <dhaval@linux.vnet.ibm.com> wrote:

> Ingo,
> 
> I believe Gautham's fix at http://lkml.org/lkml/2008/6/27/9 is better 
> and also explains it better.

ah, indeed - picked that one up instead.

	Ingo


* Re: [PATCH] fix rcu vs hotplug race
  2008-07-01  6:35                       ` Ingo Molnar
@ 2008-07-01  6:52                         ` Ingo Molnar
  2008-07-01  7:48                           ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-07-01  6:52 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: Gautham R Shenoy, Paul E. McKenney, Dipankar Sarma, laijs,
	Peter Zijlstra, lkml, Rusty Russell


* Ingo Molnar <mingo@elte.hu> wrote:

> > Ingo,
> > 
> > I believe Gautham's fix at http://lkml.org/lkml/2008/6/27/9 is 
> > better and also explains it better.
> 
> ah, indeed - picked that one up instead.

this is the patch i picked up:

-------------------------->
Subject: rcu: fix hotplug vs rcu race
From: Gautham R Shenoy <ego@in.ibm.com>
Date: Fri, 27 Jun 2008 10:17:38 +0530

Dhaval Giani reported this warning during cpu hotplug stress-tests:

| On running kernel compiles in parallel with cpu hotplug:
|
| WARNING: at arch/x86/kernel/smp.c:118
| native_smp_send_reschedule+0x21/0x36()
| Modules linked in:
| Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
| [...]
|  [<c0110355>] native_smp_send_reschedule+0x21/0x36
|  [<c014fe8f>] force_quiescent_state+0x47/0x57
|  [<c014fef0>] call_rcu+0x51/0x6d
|  [<c01713b3>] __fput+0x130/0x158
|  [<c0171231>] fput+0x17/0x19
|  [<c016fd99>] filp_close+0x4d/0x57
|  [<c016fdff>] sys_close+0x5c/0x97

IMHO the warning is a spurious one.

cpu_online_map is updated by _cpu_down() using stop_machine_run().
Since force_quiescent_state() is invoked from an irqs-disabled section,
stop_machine_run() won't be executing while a cpu is executing
force_quiescent_state(). Hence the cpu_online_map is stable while we're
in the irq disabled section.

However, a cpu might have been offlined _just_ before we disabled irqs
while entering force_quiescent_state(). And rcu subsystem might not yet
have handled the CPU_DEAD notification, leading to the offlined cpu's
bit being set in the rcp->cpumask.

Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent sending
smp_send_reschedule() to an offlined CPU.

Here's the timeline:

CPU_A						 CPU_B
--------------------------------------------------------------
cpu_down():					.
.					   	.
.						.
stop_machine(): /* disables preemption,		.
		 * and irqs */			.
.						.
.						.
take_cpu_down();				.
.						.
.						.
.						.
cpu_disable(); /*this removes cpu 		.
		*from cpu_online_map 		.
		*/				.
.						.
.						.
restart_machine(); /* enables irqs */		.
------WINDOW DURING WHICH rcp->cpumask is stale ---------------
.						call_rcu();
.						/* disables irqs here */
.						.force_quiescent_state();
.CPU_DEAD:					.for_each_cpu(rcp->cpumask)
.						.   smp_send_reschedule();
.						.
.						.   WARN_ON() for offlined CPU!


* Re: [PATCH] fix rcu vs hotplug race
  2008-07-01  6:52                         ` Ingo Molnar
@ 2008-07-01  7:48                           ` Ingo Molnar
  2008-07-01  8:32                             ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2008-07-01  7:48 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: Gautham R Shenoy, Paul E. McKenney, Dipankar Sarma, laijs,
	Peter Zijlstra, lkml, Rusty Russell


* Ingo Molnar <mingo@elte.hu> wrote:

> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > > Ingo,
> > > 
> > > I believe Gautham's fix at http://lkml.org/lkml/2008/6/27/9 is 
> > > better and also explains it better.
> > 
> > ah, indeed - picked that one up instead.
> 
> this is the patch i picked up:

for some reason my mail to lkml was cut in half - here it is again:

* Ingo Molnar <mingo@elte.hu> wrote:

> > Ingo,
> > 
> > I believe Gautham's fix at http://lkml.org/lkml/2008/6/27/9 is 
> > better and also explains it better.
> 
> ah, indeed - picked that one up instead.

this is the patch i picked up:

-------------------------->
Subject: rcu: fix hotplug vs rcu race
From: Gautham R Shenoy <ego@in.ibm.com>
Date: Fri, 27 Jun 2008 10:17:38 +0530

Dhaval Giani reported this warning during cpu hotplug stress-tests:

| On running kernel compiles in parallel with cpu hotplug:
|
| WARNING: at arch/x86/kernel/smp.c:118
| native_smp_send_reschedule+0x21/0x36()
| Modules linked in:
| Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
| [...]
|  [<c0110355>] native_smp_send_reschedule+0x21/0x36
|  [<c014fe8f>] force_quiescent_state+0x47/0x57
|  [<c014fef0>] call_rcu+0x51/0x6d
|  [<c01713b3>] __fput+0x130/0x158
|  [<c0171231>] fput+0x17/0x19
|  [<c016fd99>] filp_close+0x4d/0x57
|  [<c016fdff>] sys_close+0x5c/0x97

IMHO the warning is a spurious one.

cpu_online_map is updated by _cpu_down() using stop_machine_run().
Since force_quiescent_state() is invoked from an irqs-disabled section,
stop_machine_run() won't be executing while a cpu is executing
force_quiescent_state(). Hence the cpu_online_map is stable while we're
in the irq disabled section.

However, a cpu might have been offlined _just_ before we disabled irqs
while entering force_quiescent_state(). And rcu subsystem might not yet
have handled the CPU_DEAD notification, leading to the offlined cpu's
bit being set in the rcp->cpumask.

Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent sending
smp_send_reschedule() to an offlined CPU.

Here's the timeline:

CPU_A						 CPU_B
--------------------------------------------------------------
cpu_down():					.
.					   	.
.						.
stop_machine(): /* disables preemption,		.
		 * and irqs */			.
.						.
.						.
take_cpu_down();				.
.						.
.						.
.						.
cpu_disable(); /*this removes cpu 		.
		*from cpu_online_map 		.
		*/				.
.						.
.						.
restart_machine(); /* enables irqs */		.
------WINDOW DURING WHICH rcp->cpumask is stale ---------------
.						call_rcu();
.						/* disables irqs here */
.						.force_quiescent_state();
.CPU_DEAD:					.for_each_cpu(rcp->cpumask)
.						.   smp_send_reschedule();
.						.
.						.   WARN_ON() for offlined CPU!


* Re: [PATCH] fix rcu vs hotplug race
  2008-07-01  7:48                           ` Ingo Molnar
@ 2008-07-01  8:32                             ` Ingo Molnar
  0 siblings, 0 replies; 18+ messages in thread
From: Ingo Molnar @ 2008-07-01  8:32 UTC (permalink / raw)
  To: Dhaval Giani
  Cc: Gautham R Shenoy, Paul E. McKenney, Dipankar Sarma, laijs,
	Peter Zijlstra, lkml, Rusty Russell


* Ingo Molnar <mingo@elte.hu> wrote:

> ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> .						call_rcu();
> .						/* disables irqs here */
> .						.force_quiescent_state();
> .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> .						.   smp_send_reschedule();
> .						.
> .						.   WARN_ON() for offlined CPU!
> --

hm, for some reason the "-- WINDOW ENDS" line has cut off processing 
somewhere along the mail route. Here it is again, with that line 
removed.

----------->
commit 8558f8f81680a43d383abd1b5f23d3501fedfa65
Author: Gautham R Shenoy <ego@in.ibm.com>
Date:   Fri Jun 27 10:17:38 2008 +0530

    rcu: fix hotplug vs rcu race
    
    Dhaval Giani reported this warning during cpu hotplug stress-tests:
    
    | On running kernel compiles in parallel with cpu hotplug:
    |
    | WARNING: at arch/x86/kernel/smp.c:118
    | native_smp_send_reschedule+0x21/0x36()
    | Modules linked in:
    | Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
    | [...]
    |  [<c0110355>] native_smp_send_reschedule+0x21/0x36
    |  [<c014fe8f>] force_quiescent_state+0x47/0x57
    |  [<c014fef0>] call_rcu+0x51/0x6d
    |  [<c01713b3>] __fput+0x130/0x158
    |  [<c0171231>] fput+0x17/0x19
    |  [<c016fd99>] filp_close+0x4d/0x57
    |  [<c016fdff>] sys_close+0x5c/0x97
    
    IMHO the warning is a spurious one.
    
    cpu_online_map is updated by _cpu_down() using stop_machine_run().
    Since force_quiescent_state() is invoked from an irqs-disabled section,
    stop_machine_run() won't be executing while a cpu is executing
    force_quiescent_state(). Hence the cpu_online_map is stable while we're
    in the irq disabled section.
    
    However, a cpu might have been offlined _just_ before we disabled irqs
    while entering force_quiescent_state(). And rcu subsystem might not yet
    have handled the CPU_DEAD notification, leading to the offlined cpu's
    bit being set in the rcp->cpumask.
    
    Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent sending
    smp_send_reschedule() to an offlined CPU.
    
    Here's the timeline:
    
    CPU_A						 CPU_B
    --------------------------------------------------------------
    cpu_down():					.
    .					   	.
    .						.
    stop_machine(): /* disables preemption,		.
    		 * and irqs */			.
    .						.
    .						.
    take_cpu_down();				.
    .						.
    .						.
    .						.
    cpu_disable(); /*this removes cpu 		.
    		*from cpu_online_map 		.
    		*/				.
    .						.
    .						.
    restart_machine(); /* enables irqs */		.
    ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
    .						call_rcu();
    .						/* disables irqs here */
    .						.force_quiescent_state();
    .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
    .						.   smp_send_reschedule();
    .						.
    .						.   WARN_ON() for offlined CPU!
    .
    .
    .
    rcu_cpu_notify:
    .
    [ -- line removed -- ]
    rcu_offline_cpu() /* Which calls cpu_quiet()
    		   * which removes
    		   * cpu from rcp->cpumask.
    		   */
    
    If a new batch was started just before calling stop_machine_run(), the
    "tobe-offlined" cpu is still present in rcp-cpumask.
    
    During a cpu-offline, from take_cpu_down(), we queue an rt-prio idle
    task as the next task to be picked by the scheduler. We also call
    cpu_disable() which will disable any further interrupts and remove the
    cpu's bit from the cpu_online_map.
    
    Once the stop_machine_run() successfully calls take_cpu_down(), it calls
    schedule(). That's the last time a schedule is called on the offlined
    cpu, and hence the last time when rdp->passed_quiesc will be set to 1
    through rcu_qsctr_inc().
    
    But cpu_quiet() on this cpu will be called only when the
    next RCU_SOFTIRQ occurs on this CPU. So at this time, the offlined CPU
    is still set in rcp->cpumask.
    
    Now coming back to the idle_task which truly offlines the CPU, it does
    check for a pending RCU and raises the softirq, since it will find
    rdp->passed_quiesc to be 0 in this case. However, since the cpu is
    offline, I am not sure if the softirq will trigger on the CPU.
    
    Even if it doesn't, rcu_offline_cpu() will find that rcp->completed
    is not the same as rcp->cur, which means that our cpu could be holding
    up the grace period progression. Hence we call cpu_quiet() and move
    ahead.
    
    But because of the window explained in the timeline, we could still have
    a call_rcu() before the RCU subsystem executes its CPU_DEAD
    notification, and we send smp_send_reschedule() to an offlined cpu while
    trying to force the quiescent states. The appended patch adds comments
    and masks rcp->cpumask with cpu_online_map once, instead of checking
    for an offlined cpu every time.
    
    Reported-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
    Cc: Dipankar Sarma <dipankar@in.ibm.com>
    Cc: laijs@cn.fujitsu.com
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
index f4ffbd0..a38895a 100644
--- a/kernel/rcuclassic.c
+++ b/kernel/rcuclassic.c
@@ -89,8 +89,22 @@ static void force_quiescent_state(struct rcu_data *rdp,
 		/*
 		 * Don't send IPI to itself. With irqs disabled,
 		 * rdp->cpu is the current cpu.
+		 *
+		 * cpu_online_map is updated by the _cpu_down()
+		 * using stop_machine_run(). Since we're in irqs disabled
+		 * section, stop_machine_run() is not executing, hence
+		 * the cpu_online_map is stable.
+		 *
+		 * However, a cpu might have been offlined _just_ before
+		 * we disabled irqs while entering here.
+		 * And rcu subsystem might not yet have handled the CPU_DEAD
+		 * notification, leading to the offlined cpu's bit
+		 * being set in the rcp->cpumask.
+		 *
+		 * Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent
+		 * sending smp_send_reschedule() to an offlined CPU.
 		 */
-		cpumask = rcp->cpumask;
+		cpus_and(cpumask, rcp->cpumask, cpu_online_map);
 		cpu_clear(rdp->cpu, cpumask);
 		for_each_cpu_mask(cpu, cpumask)
 			smp_send_reschedule(cpu);


* Re: [PATCH] fix rcu vs hotplug race
  2008-07-01  5:39                 ` Gautham R Shenoy
  2008-07-01  6:16                   ` Ingo Molnar
@ 2008-07-01 19:46                   ` Paul E. McKenney
  2008-08-01 21:01                   ` Paul E. McKenney
  2 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2008-07-01 19:46 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Dhaval Giani, Dipankar Sarma, Ingo Molnar, laijs, Peter Zijlstra,
	lkml, Rusty Russell

On Tue, Jul 01, 2008 at 11:09:00AM +0530, Gautham R Shenoy wrote:
> On Fri, Jun 27, 2008 at 07:58:45AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 27, 2008 at 11:19:59AM +0530, Dhaval Giani wrote:
> > > On Fri, Jun 27, 2008 at 10:48:55AM +0530, Dipankar Sarma wrote:
> > > > On Fri, Jun 27, 2008 at 10:17:38AM +0530, Gautham R Shenoy wrote:
> > > > > IMHO the warning is a spurious one.
> > > > > Here's the timeline.
> > > > > CPU_A						 CPU_B
> > > > > --------------------------------------------------------------
> > > > > cpu_down():					.
> > > > > .					   	.
> > > > > .						.
> > > > > stop_machine(): /* disables preemption,		.
> > > > > 		 * and irqs */			.
> > > > > .						.
> > > > > .						.
> > > > > take_cpu_down();				.
> > > > > .						.
> > > > > .						.
> > > > > .						.
> > > > > cpu_disable(); /*this removes cpu 		.
> > > > > 		*from cpu_online_map 		.
> > > > > 		*/				.
> > > > > .						.
> > > > > .						.
> > > > > restart_machine(); /* enables irqs */		.
> > > > > ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> > > > > .						call_rcu();
> > > > > .						/* disables irqs here */
> > > > > .						.force_quiescent_state();
> > > > > .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> > > > > .						.   smp_send_reschedule();
> > > > > .						.
> > > > > .						.   WARN_ON() for offlined CPU!
> > > > > .
> > > > 
> > > > Exactly. The call_rcu()s are coming from a different subsystem
> > > > and can happen anytime during the CPU hotplug path. So, RCU subsystem
> > > > doesn't have anything to do to keep rcu->cpumask consistent.
> > > > It is *safe* even if we miss poking a cpu or two while
> > > > forcing quiescent state in all CPUs. The worst that can happen
> > > > is a delay in grace period. No correctness problem here.
> > > > 
> > > 
> > > One question. What is preventing a CPU from clearing its mask after we
> > > have checked whether it is online but before we have called into
> > > smp_send_reschedule?
> > 
> > This is my concern as well.  Gautham, at which point in the above
> > timeline is the offlining CPU marked DYING?  Before stop_machine(), right?
> 
> No :) The offlining CPU is marked DYING after stop_machine(), inside
> take_cpu_down() which is the work we want to execute after stopping the
> machine.
> 
> it's like
> _cpu_down()
> |
> |-> stop_machine_run();
> |   |
> |   |-> stop_machine(); /* All CPUs irqs disabled. */
> |   |
> |   |-> take_cpu_down() --> sets state to CPU_DYING. disables irqs on
> |   |				offlined cpu
> |   |
> |   |-> restart_machine(); /* All CPUs irqs reenabled */
> |
> |-> send_CPU_DEAD_notification.
> 
> The very fact that a thread is running with irqs disabled means that
> stop_machine_run() thread cannot start executing the work it has been
> assigned to execute. Because for the machine to be stopped, stop_machine()
> needs to create n-1 high priority threads on n-1 online cpus, which will
> disable interrupts and preemption, and stop the machine. Then it will
> run the task assigned to it on the ith cpu, which in this case is the
> cpu to be offlined.
> 
> So, it's the design of stop_machine() that's preventing someone
> from updating the cpu_online_map while
> force_quiescent_state() is performing the
> cpu_is_online() check. Because we always call force_quiescent_state()
> with irqs disabled :)

Got it, so the patch looks good.

							Thanx, Paul

> > If so, can't we just disable irqs, check for DYING or DEAD, and invoke
> > smp_send_reschedule() only if not DYING or DEAD?
> 
> 
> 
> > 
> > 							Thanx, Paul
> 
> -- 
> Thanks and Regards
> gautham


* Re: [PATCH] fix rcu vs hotplug race
  2008-07-01  5:39                 ` Gautham R Shenoy
  2008-07-01  6:16                   ` Ingo Molnar
  2008-07-01 19:46                   ` Paul E. McKenney
@ 2008-08-01 21:01                   ` Paul E. McKenney
  2 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2008-08-01 21:01 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Dhaval Giani, Dipankar Sarma, Ingo Molnar, laijs, Peter Zijlstra,
	lkml, Rusty Russell

On Tue, Jul 01, 2008 at 11:09:00AM +0530, Gautham R Shenoy wrote:
> On Fri, Jun 27, 2008 at 07:58:45AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 27, 2008 at 11:19:59AM +0530, Dhaval Giani wrote:
> > > On Fri, Jun 27, 2008 at 10:48:55AM +0530, Dipankar Sarma wrote:
> > > > On Fri, Jun 27, 2008 at 10:17:38AM +0530, Gautham R Shenoy wrote:
> > > > > IMHO the warning is a spurious one.
> > > > > Here's the timeline.
> > > > > CPU_A						 CPU_B
> > > > > --------------------------------------------------------------
> > > > > cpu_down():					.
> > > > > .					   	.
> > > > > .						.
> > > > > stop_machine(): /* disables preemption,		.
> > > > > 		 * and irqs */			.
> > > > > .						.
> > > > > .						.
> > > > > take_cpu_down();				.
> > > > > .						.
> > > > > .						.
> > > > > .						.
> > > > > cpu_disable(); /*this removes cpu 		.
> > > > > 		*from cpu_online_map 		.
> > > > > 		*/				.
> > > > > .						.
> > > > > .						.
> > > > > restart_machine(); /* enables irqs */		.
> > > > > ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> > > > > .						call_rcu();
> > > > > .						/* disables irqs here */
> > > > > .						.force_quiescent_state();
> > > > > .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> > > > > .						.   smp_send_reschedule();
> > > > > .						.
> > > > > .						.   WARN_ON() for offlined CPU!
> > > > > .
> > > > 
> > > > Exactly. The call_rcu()s are coming from a different subsystem
> > > > and can happen anytime during the CPU hotplug path. So, RCU subsystem
> > > > doesn't have anything to do to keep rcu->cpumask consistent.
> > > > It is *safe* even if we miss poking a cpu or two while
> > > > forcing quiescent state in all CPUs. The worst that can happen
> > > > is a delay in grace period. No correctness problem here.
> > > > 
> > > 
> > > One question. What is preventing a CPU from clearing its mask after we
> > > have checked whether it is online but before we have called into
> > > smp_send_reschedule?
> > 
> > This is my concern as well.  Gautham, at which point in the above
> > timeline is the offlining CPU marked DYING?  Before stop_machine(), right?
> 
> No :) The offlining CPU is marked DYING after stop_machine(), inside
> take_cpu_down() which is the work we want to execute after stopping the
> machine.
> 
> it's like
> _cpu_down()
> |
> |-> stop_machine_run();
> |   |
> |   |-> stop_machine(); /* All CPUs irqs disabled. */
> |   |
> |   |-> take_cpu_down() --> sets state to CPU_DYING. disables irqs on
> |   |				offlined cpu
> |   |
> |   |-> restart_machine(); /* All CPUs irqs reenabled */
> |
> |-> send_CPU_DEAD_notification.
> 
> The very fact that a thread is running with irqs disabled means that
> stop_machine_run() thread cannot start executing the work it has been
> assigned to execute. Because for the machine to be stopped, stop_machine()
> needs to create n-1 high priority threads on n-1 online cpus, which will
> disable interrupts and preemption, and stop the machine. Then it will
> run the task assigned to it on the ith cpu, which in this case is the
> cpu to be offlined.
> 
> So, it's the design of stop_machine() that's preventing someone
> from updating the cpu_online_map while
> force_quiescent_state() is performing the
> cpu_is_online() check. Because we always call force_quiescent_state()
> with irqs disabled :)

Got it, so the patch looks good.

							Thanx, Paul

> > If so, can't we just disable irqs, check for DYING or DEAD, and invoke
> > smp_send_reschedule() only if not DYING or DEAD?
> 
> 
> 
> > 
> > 							Thanx, Paul
> 
> -- 
> Thanks and Regards
> gautham


end of thread, other threads:[~2008-08-01 21:01 UTC | newest]

Thread overview: 18+ messages
2008-06-23 10:37 [PATCH] fix rcu vs hotplug race Dhaval Giani
2008-06-23 10:58 ` Ingo Molnar
2008-06-23 11:49   ` Gautham R Shenoy
2008-06-24 11:01     ` Ingo Molnar
2008-06-26 15:27       ` Paul E. McKenney
2008-06-27  4:47         ` Gautham R Shenoy
2008-06-27  5:18           ` Dipankar Sarma
2008-06-27  5:49             ` Dhaval Giani
2008-06-27 14:58               ` Paul E. McKenney
2008-07-01  5:39                 ` Gautham R Shenoy
2008-07-01  6:16                   ` Ingo Molnar
2008-07-01  6:28                     ` Dhaval Giani
2008-07-01  6:35                       ` Ingo Molnar
2008-07-01  6:52                         ` Ingo Molnar
2008-07-01  7:48                           ` Ingo Molnar
2008-07-01  8:32                             ` Ingo Molnar
2008-07-01 19:46                   ` Paul E. McKenney
2008-08-01 21:01                   ` Paul E. McKenney
