All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: peterz@infradead.org, tglx@linutronix.de,
	linux-kernel@vger.kernel.org, rgkernel@gmail.com
Subject: Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline
Date: Fri, 1 Jul 2016 11:40:54 -0700	[thread overview]
Message-ID: <20160701184054.GK4650@linux.vnet.ibm.com> (raw)
In-Reply-To: <20160630232957.GB32568@lerouge>

On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 30, 2016 at 10:58:45AM -0700, Paul E. McKenney wrote:
> > Both timers and hrtimers are maintained on the outgoing CPU until
> > CPU_DEAD time, at which point they are migrated to a surviving CPU.  If a
> > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
> > will splat in native_smp_send_reschedule() when attempting to wake up
> > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:
> > 
> > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595] Modules linked in:
> > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
> > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 7976.741595]  0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
> > [ 7976.741595]  0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
> > [ 7976.741595]  0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
> > [ 7976.741595] Call Trace:
> > [ 7976.741595]  [<ffffffff8138ab2e>] dump_stack+0x67/0x99
> > [ 7976.741595]  [<ffffffff8105cabc>] __warn+0xcc/0xf0
> > [ 7976.741595]  [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
> > [ 7976.741595]  [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595]  [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
> > [ 7976.741595]  [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
> > [ 7976.741595]  [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
> > [ 7976.741595]  [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
> > [ 7976.741595]  [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
> > [ 7976.741595]  [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
> > [ 7976.741595]  [<ffffffff8108068f>] kthread+0xdf/0x100
> > [ 7976.741595]  [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
> > [ 7976.741595]  [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200
> > 
> > However, in this case, the wakeup is redundant, because the timer
> > migration will reprogram timer hardware as needed.  Note that the fact
> > that preemption is disabled does not avoid the splat, as the offline
> > operation has already passed both the synchronize_sched() and the
> > stop_machine() that would be blocked by disabled preemption.
> > 
> > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
> > to wake up offline CPUs.  It also adds a comment stating that the
> > caller must tolerate lost wakeups when the target CPU is going offline,
> > and suggesting the CPU_DEAD notifier as a recovery mechanism.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Frederic Weisbecker <fweisbec@gmail.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > ---
> >  core.c |    7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 7f2cae4620c7..08502966e7df 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu)
> >  	return false;
> >  }
> >  
> > +/*
> > + * Wake up the specified CPU.  If the CPU is going offline, it is the
> > + * caller's responsibility to deal with the lost wakeup, for example,
> > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
> > + */
> >  void wake_up_nohz_cpu(int cpu)
> >  {
> > -	if (!wake_up_full_nohz_cpu(cpu))
> > +	if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu))
> 
> So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains
> anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it,
> unless smp_processor_id() is the only alternative.

Right, but the timers have been posted long before even CPU_UP_PREPARE.
>From what I can see, they are left alone until CPU_DEAD.  Which means
that if you try to mod_timer() them between CPU_DYING and CPU_DEAD,
you can get the above splat.

Or am I missing somthing subtle here?

> Hence, that call to wake_up_nohz_cpu() can only happen to online CPUs or the current
> one (pinned). And wake_up_idle_cpu() on the current CPU is a no-op. So only
> wake_up_full_nohz_cpu() is concerned. Then perhaps it would be better to move that
> cpu_online() check to wake_up_full_nohz_cpu() ?

As in the patch shown below?  Either way works for me.

> BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that
> it queues timers at this stage?

Hmmm...  From what I can see, rcutorture cleans up its priority-boost
kthreads at CPU_DOWN_PREPARE time.  The other threads are allowed to
migrate wherever the scheduler wants, give or take the task shuffling.
The task shuffling only excludes one CPU at a time, and I have seen
this occur when multiple CPUs were running, e.g., 0, 2, and 3 while
offlining 1.

Besides which, doesn't the scheduler prevent anything but the idle
thread from running after CPU_DYING time?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4620c7..08502966e7df 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu)
 	return false;
 }
 
+/*
+ * Wake up the specified CPU.  If the CPU is going offline, it is the
+ * caller's responsibility to deal with the lost wakeup, for example,
+ * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
+ */
 void wake_up_nohz_cpu(int cpu)
 {
-	if (!wake_up_full_nohz_cpu(cpu))
+	if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu))
 		wake_up_idle_cpu(cpu);
 }
 

  reply	other threads:[~2016-07-01 18:40 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-30 17:58 [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline Paul E. McKenney
2016-06-30 23:29 ` Frederic Weisbecker
2016-07-01 18:40   ` Paul E. McKenney [this message]
2016-07-01 23:49     ` Frederic Weisbecker
2016-07-02  0:15       ` Paul E. McKenney
2016-07-04 12:21         ` Frederic Weisbecker
2016-07-04 16:55           ` Paul E. McKenney
2016-07-12 14:41             ` Paul E. McKenney
2016-07-12 14:19   ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160701184054.GK4650@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=fweisbec@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rgkernel@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.