All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@elte.hu>
To: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Dipankar Sarma <dipankar@in.ibm.com>,
	laijs@cn.fujitsu.com, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	lkml <linux-kernel@vger.kernel.org>,
	Rusty Russel <rusty@rustcorp.com.au>
Subject: Re: [PATCH] fix rcu vs hotplug race
Date: Tue, 1 Jul 2008 10:32:52 +0200	[thread overview]
Message-ID: <20080701083252.GA2796@elte.hu> (raw)
In-Reply-To: <20080701074824.GA19400@elte.hu>


* Ingo Molnar <mingo@elte.hu> wrote:

> ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> .						call_rcu();
> .						/* disables irqs here */
> .						.force_quiescent_state();
> .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> .						.   smp_send_reschedule();
> .						.
> .						.   WARN_ON() for offlined CPU!
> --

hm, for some reason the "-- WINDOW ENDS" line has cut off processing 
somewhere along the mail route. Here it is again, with that line 
removed.

----------->
commit 8558f8f81680a43d383abd1b5f23d3501fedfa65
Author: Gautham R Shenoy <ego@in.ibm.com>
Date:   Fri Jun 27 10:17:38 2008 +0530

    rcu: fix hotplug vs rcu race
    
    Dhaval Giani reported this warning during cpu hotplug stress-tests:
    
    | On running kernel compiles in parallel with cpu hotplug:
    |
    | WARNING: at arch/x86/kernel/smp.c:118
    | native_smp_send_reschedule+0x21/0x36()
    | Modules linked in:
    | Pid: 27483, comm: cc1 Not tainted 2.6.26-rc7 #1
    | [...]
    |  [<c0110355>] native_smp_send_reschedule+0x21/0x36
    |  [<c014fe8f>] force_quiescent_state+0x47/0x57
    |  [<c014fef0>] call_rcu+0x51/0x6d
    |  [<c01713b3>] __fput+0x130/0x158
    |  [<c0171231>] fput+0x17/0x19
    |  [<c016fd99>] filp_close+0x4d/0x57
    |  [<c016fdff>] sys_close+0x5c/0x97
    
    IMHO the warning is a spurious one.
    
    cpu_online_map is updated by the _cpu_down() using stop_machine_run().
    Since force_quiescent_state is invoked from irqs disabled section,
    stop_machine_run() won't be executing while a cpu is executing
    force_quiescent_state(). Hence the cpu_online_map is stable while we're
    in the irq disabled section.
    
    However, a cpu might have been offlined _just_ before we disabled irqs
    while entering force_quiescent_state(). And rcu subsystem might not yet
    have handled the CPU_DEAD notification, leading to the offlined cpu's
    bit being set in the rcp->cpumask.
    
    Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent sending
    smp_reschedule() to an offlined CPU.
    
    Here's the timeline:
    
    CPU_A						 CPU_B
    --------------------------------------------------------------
    cpu_down():					.
    .					   	.
    .						.
    stop_machine(): /* disables preemption,		.
    		 * and irqs */			.
    .						.
    .						.
    take_cpu_down();				.
    .						.
    .						.
    .						.
    cpu_disable(); /*this removes cpu 		.
    		*from cpu_online_map 		.
    		*/				.
    .						.
    .						.
    restart_machine(); /* enables irqs */		.
    ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
    .						call_rcu();
    .						/* disables irqs here */
    .						.force_quiescent_state();
    .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
    .						.   smp_send_reschedule();
    .						.
    .						.   WARN_ON() for offlined CPU!
    .
    .
    .
    rcu_cpu_notify:
    .
    [ -- line removed -- ]
    rcu_offline_cpu() /* Which calls cpu_quiet()
    		   * which removes
    		   * cpu from rcp->cpumask.
    		   */
    
    If a new batch was started just before calling stop_machine_run(), the
    "tobe-offlined" cpu is still present in rcp-cpumask.
    
    During a cpu-offline, from take_cpu_down(), we queue an rt-prio idle
    task as the next task to be picked by the scheduler. We also call
    cpu_disable() which will disable any further interrupts and remove the
    cpu's bit from the cpu_online_map.
    
    Once the stop_machine_run() successfully calls take_cpu_down(), it calls
    schedule(). That's the last time a schedule is called on the offlined
    cpu, and hence the last time when rdp->passed_quiesc will be set to 1
    through rcu_qsctr_inc().
    
    But the cpu_quiet() will be on this cpu will be called only when the
    next RCU_SOFTIRQ occurs on this CPU. So at this time, the offlined CPU
    is still set in rcp->cpumask.
    
    Now coming back to the idle_task which truely offlines the CPU, it does
    check for a pending RCU and raises the softirq, since it will find
    rdp->passed_quiesc to be 0 in this case. However, since the cpu is
    offline I am not sure if the softirq will trigger on the CPU.
    
    Even if it doesn't the rcu_offline_cpu() will find that rcp->completed
    is not the same as rcp->cur, which means that our cpu could be holding
    up the grace period progression. Hence we call cpu_quiet() and move
    ahead.
    
    But because of the window explained in the timeline, we could still have
    a call_rcu() before the RCU subsystem executes it's CPU_DEAD
    notification, and we send smp_send_reschedule() to offlined cpu while
    trying to force the quiescent states. The appended patch adds comments
    and prevents checking for offlined cpu everytime.
    
    cpu_online_map is updated by the _cpu_down() using stop_machine_run().
    Since force_quiescent_state is invoked from irqs disabled section,
    stop_machine_run() won't be executing while a cpu is executing
    force_quiescent_state(). Hence the cpu_online_map is stable while we're
    in the irq disabled section.
    
    Reported-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
    Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
    Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
    Cc: Dipankar Sarma <dipankar@in.ibm.com>
    Cc: laijs@cn.fujitsu.com
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Rusty Russel <rusty@rustcorp.com.au>
    Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
index f4ffbd0..a38895a 100644
--- a/kernel/rcuclassic.c
+++ b/kernel/rcuclassic.c
@@ -89,8 +89,22 @@ static void force_quiescent_state(struct rcu_data *rdp,
 		/*
 		 * Don't send IPI to itself. With irqs disabled,
 		 * rdp->cpu is the current cpu.
+		 *
+		 * cpu_online_map is updated by the _cpu_down()
+		 * using stop_machine_run(). Since we're in irqs disabled
+		 * section, stop_machine_run() is not exectuting, hence
+		 * the cpu_online_map is stable.
+		 *
+		 * However,  a cpu might have been offlined _just_ before
+		 * we disabled irqs while entering here.
+		 * And rcu subsystem might not yet have handled the CPU_DEAD
+		 * notification, leading to the offlined cpu's bit
+		 * being set in the rcp->cpumask.
+		 *
+		 * Hence cpumask = (rcp->cpumask & cpu_online_map) to prevent
+		 * sending smp_reschedule() to an offlined CPU.
 		 */
-		cpumask = rcp->cpumask;
+		cpus_and(cpumask, rcp->cpumask, cpu_online_map);
 		cpu_clear(rdp->cpu, cpumask);
 		for_each_cpu_mask(cpu, cpumask)
 			smp_send_reschedule(cpu);

  reply	other threads:[~2008-07-01  8:34 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-23 10:37 [PATCH] fix rcu vs hotplug race Dhaval Giani
2008-06-23 10:58 ` Ingo Molnar
2008-06-23 11:49   ` Gautham R Shenoy
2008-06-24 11:01     ` Ingo Molnar
2008-06-26 15:27       ` Paul E. McKenney
2008-06-27  4:47         ` Gautham R Shenoy
2008-06-27  5:18           ` Dipankar Sarma
2008-06-27  5:49             ` Dhaval Giani
2008-06-27 14:58               ` Paul E. McKenney
2008-07-01  5:39                 ` Gautham R Shenoy
2008-07-01  6:16                   ` Ingo Molnar
2008-07-01  6:28                     ` Dhaval Giani
2008-07-01  6:35                       ` Ingo Molnar
2008-07-01  6:52                         ` Ingo Molnar
2008-07-01  7:48                           ` Ingo Molnar
2008-07-01  8:32                             ` Ingo Molnar [this message]
2008-07-01 19:46                   ` Paul E. McKenney
2008-08-01 21:01                   ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080701083252.GA2796@elte.hu \
    --to=mingo@elte.hu \
    --cc=a.p.zijlstra@chello.nl \
    --cc=dhaval@linux.vnet.ibm.com \
    --cc=dipankar@in.ibm.com \
    --cc=ego@in.ibm.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rusty@rustcorp.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.