* [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline @ 2016-06-30 17:58 Paul E. McKenney 2016-06-30 23:29 ` Frederic Weisbecker 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2016-06-30 17:58 UTC (permalink / raw) To: peterz, fweisbec, tglx; +Cc: linux-kernel, rgkernel Both timers and hrtimers are maintained on the outgoing CPU until CPU_DEAD time, at which point they are migrated to a surviving CPU. If a mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems will splat in native_smp_send_reschedule() when attempting to wake up the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel: [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40 [ 7976.741595] Modules linked in: [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1 [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 7976.741595] 0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000 [ 7976.741595] 0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18 [ 7976.741595] 0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00 [ 7976.741595] Call Trace: [ 7976.741595] [<ffffffff8138ab2e>] dump_stack+0x67/0x99 [ 7976.741595] [<ffffffff8105cabc>] __warn+0xcc/0xf0 [ 7976.741595] [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20 [ 7976.741595] [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40 [ 7976.741595] [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190 [ 7976.741595] [<ffffffff810d275a>] internal_add_timer+0x7a/0x80 [ 7976.741595] [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0 [ 7976.741595] [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380 [ 7976.741595] [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30 [ 7976.741595] [<ffffffff810c86a0>] ? 
rcu_bh_torture_read_lock+0x80/0x80 [ 7976.741595] [<ffffffff8108068f>] kthread+0xdf/0x100 [ 7976.741595] [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40 [ 7976.741595] [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200 However, in this case, the wakeup is redundant, because the timer migration will reprogram timer hardware as needed. Note that the fact that preemption is disabled does not avoid the splat, as the offline operation has already passed both the synchronize_sched() and the stop_machine() that would be blocked by disabled preemption. This commit therefore modifies wake_up_nohz_cpu() to avoid attempting to wake up offline CPUs. It also adds a comment stating that the caller must tolerate lost wakeups when the target CPU is going offline, and suggesting the CPU_DEAD notifier as a recovery mechanism. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> --- core.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7f2cae4620c7..08502966e7df 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu) return false; } +/* + * Wake up the specified CPU. If the CPU is going offline, it is the + * caller's responsibility to deal with the lost wakeup, for example, + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. + */ void wake_up_nohz_cpu(int cpu) { - if (!wake_up_full_nohz_cpu(cpu)) + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) wake_up_idle_cpu(cpu); } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline 2016-06-30 17:58 [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline Paul E. McKenney @ 2016-06-30 23:29 ` Frederic Weisbecker 2016-07-01 18:40 ` Paul E. McKenney 2016-07-12 14:19 ` Peter Zijlstra 0 siblings, 2 replies; 9+ messages in thread From: Frederic Weisbecker @ 2016-06-30 23:29 UTC (permalink / raw) To: Paul E. McKenney; +Cc: peterz, tglx, linux-kernel, rgkernel On Thu, Jun 30, 2016 at 10:58:45AM -0700, Paul E. McKenney wrote: > Both timers and hrtimers are maintained on the outgoing CPU until > CPU_DEAD time, at which point they are migrated to a surviving CPU. If a > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems > will splat in native_smp_send_reschedule() when attempting to wake up > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel: > > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40 > [ 7976.741595] Modules linked in: > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1 > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > [ 7976.741595] 0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000 > [ 7976.741595] 0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18 > [ 7976.741595] 0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00 > [ 7976.741595] Call Trace: > [ 7976.741595] [<ffffffff8138ab2e>] dump_stack+0x67/0x99 > [ 7976.741595] [<ffffffff8105cabc>] __warn+0xcc/0xf0 > [ 7976.741595] [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20 > [ 7976.741595] [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40 > [ 7976.741595] [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190 > [ 7976.741595] [<ffffffff810d275a>] internal_add_timer+0x7a/0x80 > [ 7976.741595] [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0 > [ 7976.741595] 
[<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380 > [ 7976.741595] [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30 > [ 7976.741595] [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80 > [ 7976.741595] [<ffffffff8108068f>] kthread+0xdf/0x100 > [ 7976.741595] [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40 > [ 7976.741595] [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200 > > However, in this case, the wakeup is redundant, because the timer > migration will reprogram timer hardware as needed. Note that the fact > that preemption is disabled does not avoid the splat, as the offline > operation has already passed both the synchronize_sched() and the > stop_machine() that would be blocked by disabled preemption. > > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting > to wake up offline CPUs. It also adds a comment stating that the > caller must tolerate lost wakeups when the target CPU is going offline, > and suggesting the CPU_DEAD notifier as a recovery mechanism. > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Frederic Weisbecker <fweisbec@gmail.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > --- > core.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 7f2cae4620c7..08502966e7df 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu) > return false; > } > > +/* > + * Wake up the specified CPU. If the CPU is going offline, it is the > + * caller's responsibility to deal with the lost wakeup, for example, > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. 
> + */ > void wake_up_nohz_cpu(int cpu) > { > - if (!wake_up_full_nohz_cpu(cpu)) > + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it, unless smp_processor_id() is the only alternative. Hence, that call to wake_up_nohz_cpu() can only happen to online CPUs or the current one (pinned). And wake_up_idle_cpu() on the current CPU is a no-op. So only wake_up_full_nohz_cpu() is concerned. Then perhaps it would be better to move that cpu_online() check to wake_up_full_nohz_cpu() ? BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that it queues timers at this stage? Thanks. > wake_up_idle_cpu(cpu); > } > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline 2016-06-30 23:29 ` Frederic Weisbecker @ 2016-07-01 18:40 ` Paul E. McKenney 2016-07-01 23:49 ` Frederic Weisbecker 2016-07-12 14:19 ` Peter Zijlstra 1 sibling, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2016-07-01 18:40 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: peterz, tglx, linux-kernel, rgkernel On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote: > On Thu, Jun 30, 2016 at 10:58:45AM -0700, Paul E. McKenney wrote: > > Both timers and hrtimers are maintained on the outgoing CPU until > > CPU_DEAD time, at which point they are migrated to a surviving CPU. If a > > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems > > will splat in native_smp_send_reschedule() when attempting to wake up > > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel: > > > > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40 > > [ 7976.741595] Modules linked in: > > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1 > > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > > [ 7976.741595] 0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000 > > [ 7976.741595] 0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18 > > [ 7976.741595] 0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00 > > [ 7976.741595] Call Trace: > > [ 7976.741595] [<ffffffff8138ab2e>] dump_stack+0x67/0x99 > > [ 7976.741595] [<ffffffff8105cabc>] __warn+0xcc/0xf0 > > [ 7976.741595] [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20 > > [ 7976.741595] [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40 > > [ 7976.741595] [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190 > > [ 7976.741595] [<ffffffff810d275a>] internal_add_timer+0x7a/0x80 > > [ 7976.741595] 
[<ffffffff810d3ee7>] mod_timer+0x187/0x2b0 > > [ 7976.741595] [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380 > > [ 7976.741595] [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30 > > [ 7976.741595] [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80 > > [ 7976.741595] [<ffffffff8108068f>] kthread+0xdf/0x100 > > [ 7976.741595] [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40 > > [ 7976.741595] [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200 > > > > However, in this case, the wakeup is redundant, because the timer > > migration will reprogram timer hardware as needed. Note that the fact > > that preemption is disabled does not avoid the splat, as the offline > > operation has already passed both the synchronize_sched() and the > > stop_machine() that would be blocked by disabled preemption. > > > > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting > > to wake up offline CPUs. It also adds a comment stating that the > > caller must tolerate lost wakeups when the target CPU is going offline, > > and suggesting the CPU_DEAD notifier as a recovery mechanism. > > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > Cc: Peter Zijlstra <peterz@infradead.org> > > Cc: Frederic Weisbecker <fweisbec@gmail.com> > > Cc: Thomas Gleixner <tglx@linutronix.de> > > --- > > core.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index 7f2cae4620c7..08502966e7df 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu) > > return false; > > } > > > > +/* > > + * Wake up the specified CPU. If the CPU is going offline, it is the > > + * caller's responsibility to deal with the lost wakeup, for example, > > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. 
> > + */ > > void wake_up_nohz_cpu(int cpu) > > { > > - if (!wake_up_full_nohz_cpu(cpu)) > > + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) > > So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains > anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it, > unless smp_processor_id() is the only alternative. Right, but the timers have been posted long before even CPU_UP_PREPARE. From what I can see, they are left alone until CPU_DEAD. Which means that if you try to mod_timer() them between CPU_DYING and CPU_DEAD, you can get the above splat. Or am I missing something subtle here? > Hence, that call to wake_up_nohz_cpu() can only happen to online CPUs or the current > one (pinned). And wake_up_idle_cpu() on the current CPU is a no-op. So only > wake_up_full_nohz_cpu() is concerned. Then perhaps it would be better to move that > cpu_online() check to wake_up_full_nohz_cpu() ? As in the patch shown below? Either way works for me. > BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that > it queues timers at this stage? Hmmm... From what I can see, rcutorture cleans up its priority-boost kthreads at CPU_DOWN_PREPARE time. The other threads are allowed to migrate wherever the scheduler wants, give or take the task shuffling. The task shuffling only excludes one CPU at a time, and I have seen this occur when multiple CPUs were running, e.g., 0, 2, and 3 while offlining 1. Besides which, doesn't the scheduler prevent anything but the idle thread from running after CPU_DYING time? Thanx, Paul ------------------------------------------------------------------------ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7f2cae4620c7..08502966e7df 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu) return false; } +/* + * Wake up the specified CPU. 
If the CPU is going offline, it is the + * caller's responsibility to deal with the lost wakeup, for example, + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. + */ void wake_up_nohz_cpu(int cpu) { - if (!wake_up_full_nohz_cpu(cpu)) + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) wake_up_idle_cpu(cpu); } ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline 2016-07-01 18:40 ` Paul E. McKenney @ 2016-07-01 23:49 ` Frederic Weisbecker 2016-07-02 0:15 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Frederic Weisbecker @ 2016-07-01 23:49 UTC (permalink / raw) To: Paul E. McKenney; +Cc: peterz, tglx, linux-kernel, rgkernel On Fri, Jul 01, 2016 at 11:40:54AM -0700, Paul E. McKenney wrote: > On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote: > > > +/* > > > + * Wake up the specified CPU. If the CPU is going offline, it is the > > > + * caller's responsibility to deal with the lost wakeup, for example, > > > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. > > > + */ > > > void wake_up_nohz_cpu(int cpu) > > > { > > > - if (!wake_up_full_nohz_cpu(cpu)) > > > + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) > > > > So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains > > anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it, > > unless smp_processor_id() is the only alternative. > > Right, but the timers have been posted long before even CPU_UP_PREPARE. > From what I can see, they are left alone until CPU_DEAD. Which means > that if you try to mod_timer() them between CPU_DYING and CPU_DEAD, > you can get the above splat. > > Or am I missing somthing subtle here? Yes that's exactly what I meant. It happens on mod_timer() calls between CPU_DYING and CPU_DEAD. I just wanted to clarify the conditions for it to happen: the fact that it shouldn't concern remote CPU targets, only local pinned timers. > > Hence, that call to wake_up_nohz_cpu() can only happen to online CPUs or the current > > one (pinned). And wake_up_idle_cpu() on the current CPU is a no-op. So only > > wake_up_full_nohz_cpu() is concerned. Then perhaps it would be better to move that > > cpu_online() check to wake_up_full_nohz_cpu() ? 
> > As in the patch shown below? Either way works for me. Hmm, the patch doesn't seem to be different than the previous one :-) > > > BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that > > it queues timers at this stage? > > Hmmm... From what I can see, rcutorture cleans up its priority-boost > kthreads at CPU_DOWN_PREPARE time. The other threads are allowed to > migrate wherever the scheduler wants, give or take the task shuffling. > The task shuffling only excludes one CPU at a time, and I have seen > this occur when multiple CPUs were running, e.g., 0, 2, and 3 while > offlining 1. But if rcutorture kthreads are cleaned up at CPU_DOWN_PREPARE, they shouldn't be calling mod_timer() on CPU_DYING time. Or there are other rcutorture threads? > > Besides which, doesn't the scheduler prevent anything but the idle > thread from running after CPU_DYING time? Indeed migrate_tasks() is called on CPU_DYING but pinned kthreads, outside smpboot, have their own way to deal with hotplug through notifiers. Thanks. > > Thanx, Paul > > ------------------------------------------------------------------------ > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 7f2cae4620c7..08502966e7df 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu) > return false; > } > > +/* > + * Wake up the specified CPU. If the CPU is going offline, it is the > + * caller's responsibility to deal with the lost wakeup, for example, > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. > + */ > void wake_up_nohz_cpu(int cpu) > { > - if (!wake_up_full_nohz_cpu(cpu)) > + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) > wake_up_idle_cpu(cpu); > } > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline 2016-07-01 23:49 ` Frederic Weisbecker @ 2016-07-02 0:15 ` Paul E. McKenney 2016-07-04 12:21 ` Frederic Weisbecker 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2016-07-02 0:15 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: peterz, tglx, linux-kernel, rgkernel On Sat, Jul 02, 2016 at 01:49:56AM +0200, Frederic Weisbecker wrote: > On Fri, Jul 01, 2016 at 11:40:54AM -0700, Paul E. McKenney wrote: > > On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote: > > > > +/* > > > > + * Wake up the specified CPU. If the CPU is going offline, it is the > > > > + * caller's responsibility to deal with the lost wakeup, for example, > > > > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. > > > > + */ > > > > void wake_up_nohz_cpu(int cpu) > > > > { > > > > - if (!wake_up_full_nohz_cpu(cpu)) > > > > + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) > > > > > > So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains > > > anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it, > > > unless smp_processor_id() is the only alternative. > > > > Right, but the timers have been posted long before even CPU_UP_PREPARE. > > From what I can see, they are left alone until CPU_DEAD. Which means > > that if you try to mod_timer() them between CPU_DYING and CPU_DEAD, > > you can get the above splat. > > > > Or am I missing somthing subtle here? > > Yes that's exactly what I meant. It happens on mod_timer() calls > between CPU_DYING and CPU_DEAD. I just wanted to clarify the > conditions for it to happen: the fact that it shouldn't concern > remote CPU targets, only local pinned timers. OK. What happens in the following sequence of events? o CPU 5 posts a timer, which might well be locally pinned. This is rcu_torture_reader() posting its on-stack timer creatively named "t". 
o CPU 5 starts going offline, so that rcu_torture_reader() gets migrated to CPU 6. o CPU 5 reaches CPU_DYING but has not yet reached CPU_DEAD. o CPU 6 invokes mod_timer() on its timer "t". Wouldn't that trigger the scenario that I am seeing? > > > Hence, that call to wake_up_nohz_cpu() can only happen to online CPUs or the current > > > one (pinned). And wake_up_idle_cpu() on the current CPU is a no-op. So only > > > wake_up_full_nohz_cpu() is concerned. Then perhaps it would be better to move that > > > cpu_online() check to wake_up_full_nohz_cpu() ? > > > > As in the patch shown below? Either way works for me. > > Hmm, the patch doesn't seem to be different than the previous one :-) Indeed it does not! How about the one shown below this time? > > > BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that > > > it queues timers at this stage? > > > > Hmmm... From what I can see, rcutorture cleans up its priority-boost > > kthreads at CPU_DOWN_PREPARE time. The other threads are allowed to > > migrate wherever the scheduler wants, give or take the task shuffling. > > The task shuffling only excludes one CPU at a time, and I have seen > > this occur when multiple CPUs were running, e.g., 0, 2, and 3 while > > offlining 1. > > But if rcutorture kthreads are cleaned up at CPU_DOWN_PREPARE, they > shouldn't be calling mod_timer() on CPU_DYING time. Or there are other > rcutorture threads? The rcu_torture_reader() kthreads aren't associated with any particular CPU, so when CPUs go offline, they just get migrated to other CPUs. This allows them to execute on those other CPUs between CPU_DYING and CPU_DEAD time, correct? Other rcutorture kthreads -are- bound to specific CPUs, but they are testing priority boosting, not simple reading. > > Besides which, doesn't the scheduler prevent anything but the idle > > thread from running after CPU_DYING time? 
> > Indeed migrate_tasks() is called on CPU_DYING but pinned kthreads, outside > smpboot, have their own way to deal with hotplug through notifiers. Agreed, but the rcu_torture_reader() kthreads aren't pinned, so they should migrate automatically at CPU_DYING time. Thanx, Paul ------------------------------------------------------------------------ diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7f2cae4620c7..1a91fc733a0f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -580,6 +580,8 @@ static bool wake_up_full_nohz_cpu(int cpu) * If needed we can still optimize that later with an * empty IRQ. */ + if (cpu_is_offline(cpu)) + return true; if (tick_nohz_full_cpu(cpu)) { if (cpu != smp_processor_id() || tick_nohz_tick_stopped()) @@ -590,6 +592,11 @@ static bool wake_up_full_nohz_cpu(int cpu) return false; } +/* + * Wake up the specified CPU. If the CPU is going offline, it is the + * caller's responsibility to deal with the lost wakeup, for example, + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. + */ void wake_up_nohz_cpu(int cpu) { if (!wake_up_full_nohz_cpu(cpu)) ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline 2016-07-02 0:15 ` Paul E. McKenney @ 2016-07-04 12:21 ` Frederic Weisbecker 2016-07-04 16:55 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Frederic Weisbecker @ 2016-07-04 12:21 UTC (permalink / raw) To: Paul E. McKenney; +Cc: peterz, tglx, linux-kernel, rgkernel On Fri, Jul 01, 2016 at 05:15:06PM -0700, Paul E. McKenney wrote: > On Sat, Jul 02, 2016 at 01:49:56AM +0200, Frederic Weisbecker wrote: > > On Fri, Jul 01, 2016 at 11:40:54AM -0700, Paul E. McKenney wrote: > > > On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote: > > > > > +/* > > > > > + * Wake up the specified CPU. If the CPU is going offline, it is the > > > > > + * caller's responsibility to deal with the lost wakeup, for example, > > > > > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. > > > > > + */ > > > > > void wake_up_nohz_cpu(int cpu) > > > > > { > > > > > - if (!wake_up_full_nohz_cpu(cpu)) > > > > > + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) > > > > > > > > So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains > > > > anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it, > > > > unless smp_processor_id() is the only alternative. > > > > > > Right, but the timers have been posted long before even CPU_UP_PREPARE. > > > From what I can see, they are left alone until CPU_DEAD. Which means > > > that if you try to mod_timer() them between CPU_DYING and CPU_DEAD, > > > you can get the above splat. > > > > > > Or am I missing somthing subtle here? > > > > Yes that's exactly what I meant. It happens on mod_timer() calls > > between CPU_DYING and CPU_DEAD. I just wanted to clarify the > > conditions for it to happen: the fact that it shouldn't concern > > remote CPU targets, only local pinned timers. > > OK. What happens in the following sequence of events? 
> > o CPU 5 posts a timer, which might well be locally pinned. > This is rcu_torture_reader() posting its on-stack timer > creatively named "t". > > o CPU 5 starts going offline, so that rcu_torture_reader() gets > migrated to CPU 6. > > o CPU 5 reaches CPU_DYING but has not yet reached CPU_DEAD. > > o CPU 6 invokes mod_timer() on its timer "t". > > Wouldn't that trigger the scenario that I am seeing? No I don't think so because "t" is then going to be enqueued to CPU 6. __mod_timer() -> get_target_base() uses local accessor on timer base. So the target of pinned timers is always the CPU of the caller. > > > > BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that > > > > it queues timers at this stage? > > > > > > Hmmm... From what I can see, rcutorture cleans up its priority-boost > > > kthreads at CPU_DOWN_PREPARE time. The other threads are allowed to > > > migrate wherever the scheduler wants, give or take the task shuffling. > > > The task shuffling only excludes one CPU at a time, and I have seen > > > this occur when multiple CPUs were running, e.g., 0, 2, and 3 while > > > offlining 1. > > > > But if rcutorture kthreads are cleaned up at CPU_DOWN_PREPARE, they > > shouldn't be calling mod_timer() on CPU_DYING time. Or there are other > > rcutorture threads? > > The rcu_torture_reader() kthreads aren't associated with any particular > CPU, so when CPUs go offline, they just get migrated to other CPUs. > This allows them to execute on those other CPUs between CPU_DYING and > CPU_DEAD time, correct? Indeed. Timers get migrated on CPU_DEAD only. So if rcu_torture_reader() enqueued a pinned timer to CPU 5, then get migrated to CPU 6, the timer may well fire after CPU_DYING on CPU 5 and even re-enqueue itself in CPU 5, provided the timer is self-enqueued. > > Other rcutorture kthreads -are- bound to specific CPUs, but they are > testing priority boosting, not simple reading. I see. 
> > > > Besides which, doesn't the scheduler prevent anything but the idle > > > thread from running after CPU_DYING time? > > > > Indeed migrate_tasks() is called on CPU_DYING but pinned kthreads, outside > > smpboot, have their own way to deal with hotplug through notifiers. > > Agreed, but the rcu_torture_reader() kthreads aren't pinned, so they > > should migrate automatically at CPU_DYING time. Yes indeed. > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 7f2cae4620c7..1a91fc733a0f 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -580,6 +580,8 @@ static bool wake_up_full_nohz_cpu(int cpu) > * If needed we can still optimize that later with an > * empty IRQ. > */ > + if (cpu_is_offline(cpu)) > + return true; Preferably put this under the tick_nohz_full_cpu() below because it has a static key optimization. Distros build NO_HZ_FULL but don't use it 99.99999% of the time. > if (tick_nohz_full_cpu(cpu)) { > if (cpu != smp_processor_id() || > tick_nohz_tick_stopped()) > @@ -590,6 +592,11 @@ static bool wake_up_full_nohz_cpu(int cpu) > return false; > } > > +/* > + * Wake up the specified CPU. If the CPU is going offline, it is the > + * caller's responsibility to deal with the lost wakeup, for example, > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. > + */ I think it's more transparent than that for the caller. migrate_timers() is called soon after and it takes care of waking up the destination of the migration if necessary. So the caller shouldn't care after all. But the cpu_is_offline() check above may need a comment about that. Thanks! ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline 2016-07-04 12:21 ` Frederic Weisbecker @ 2016-07-04 16:55 ` Paul E. McKenney 2016-07-12 14:41 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2016-07-04 16:55 UTC (permalink / raw) To: Frederic Weisbecker; +Cc: peterz, tglx, linux-kernel, rgkernel On Mon, Jul 04, 2016 at 02:21:43PM +0200, Frederic Weisbecker wrote: > On Fri, Jul 01, 2016 at 05:15:06PM -0700, Paul E. McKenney wrote: > > On Sat, Jul 02, 2016 at 01:49:56AM +0200, Frederic Weisbecker wrote: > > > On Fri, Jul 01, 2016 at 11:40:54AM -0700, Paul E. McKenney wrote: > > > > On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote: > > > > > > +/* > > > > > > + * Wake up the specified CPU. If the CPU is going offline, it is the > > > > > > + * caller's responsibility to deal with the lost wakeup, for example, > > > > > > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do. > > > > > > + */ > > > > > > void wake_up_nohz_cpu(int cpu) > > > > > > { > > > > > > - if (!wake_up_full_nohz_cpu(cpu)) > > > > > > + if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu)) > > > > > > > > > > So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains > > > > > anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it, > > > > > unless smp_processor_id() is the only alternative. > > > > > > > > Right, but the timers have been posted long before even CPU_UP_PREPARE. > > > > From what I can see, they are left alone until CPU_DEAD. Which means > > > > that if you try to mod_timer() them between CPU_DYING and CPU_DEAD, > > > > you can get the above splat. > > > > > > > > Or am I missing somthing subtle here? > > > > > > Yes that's exactly what I meant. It happens on mod_timer() calls > > > between CPU_DYING and CPU_DEAD. 
I just wanted to clarify the > > > conditions for it to happen: the fact that it shouldn't concern > > > remote CPU targets, only local pinned timers. > > > > OK. What happens in the following sequence of events? > > > > o CPU 5 posts a timer, which might well be locally pinned. > > This is rcu_torture_reader() posting its on-stack timer > > creatively named "t". > > > > o CPU 5 starts going offline, so that rcu_torture_reader() gets > > migrated to CPU 6. > > > > o CPU 5 reaches CPU_DYING but has not yet reached CPU_DEAD. > > > > o CPU 6 invokes mod_timer() on its timer "t". > > > > Wouldn't that trigger the scenario that I am seeing? > > No I don't think so because "t" is then going to be enqueued to CPU 6. > __mod_timer() -> get_target_base() uses local accessor on timer base. > > So the target of pinned timers is always the CPU of the caller. > > > > > > BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that > > > > > it queues timers at this stage? > > > > > > > > Hmmm... From what I can see, rcutorture cleans up its priority-boost > > > > kthreads at CPU_DOWN_PREPARE time. The other threads are allowed to > > > > migrate wherever the scheduler wants, give or take the task shuffling. > > > > The task shuffling only excludes one CPU at a time, and I have seen > > > > this occur when multiple CPUs were running, e.g., 0, 2, and 3 while > > > > offlining 1. > > > > > > But if rcutorture kthreads are cleaned up at CPU_DOWN_PREPARE, they > > > shouldn't be calling mod_timer() on CPU_DYING time. Or there are other > > > rcutorture threads? > > > > The rcu_torture_reader() kthreads aren't associated with any particular > > CPU, so when CPUs go offline, they just get migrated to other CPUs. > > This allows them to execute on those other CPUs between CPU_DYING and > > CPU_DEAD time, correct? > > Indeed. Timers get migrated on CPU_DEAD only. 
> So if rcu_torture_reader()
> enqueued a pinned timer to CPU 5, then gets migrated to CPU 6, the timer
> may well fire after CPU_DYING on CPU 5 and even re-enqueue itself on CPU 5,
> provided the timer is self-enqueued.

It is not self-enqueued, but rcu_torture_reader() will re-enqueue it any
time after it fires.  This means that it might re-enqueue it while it is
still executing.  If I understand correctly, at that point, the timer
would still be referencing the outgoing CPU.  But this is an odd scenario,
so I might well have missed something.

At this point, although I am sure we have a problem, I am not sure we
understand (to say nothing of agree on) exactly what the problem is.  ;-)

The timer is not self-enqueued.  It is posted like this, from
rcu_torture_reader() in kernel/rcu/rcutorture.c:

	mod_timer(&t, jiffies + 1);

That same function sets up the timer upon entry like this:

	setup_timer_on_stack(&t, rcu_torture_timer, 0);

The splat is quite rare, which is why I suspect that it is reasonable
to believe that it involves a race with the timer firing.

> > Other rcutorture kthreads -are- bound to specific CPUs, but they are
> > testing priority boosting, not simple reading.
>
> I see.
>
> > > > Besides which, doesn't the scheduler prevent anything but the idle
> > > > thread from running after CPU_DYING time?
> > >
> > > Indeed, migrate_tasks() is called on CPU_DYING, but pinned kthreads, outside
> > > smpboot, have their own way to deal with hotplug through notifiers.
> >
> > Agreed, but the rcu_torture_reader() kthreads aren't pinned, so they
> > should migrate automatically at CPU_DYING time.
>
> Yes indeed.
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 7f2cae4620c7..1a91fc733a0f 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -580,6 +580,8 @@ static bool wake_up_full_nohz_cpu(int cpu)
> >  	 * If needed we can still optimize that later with an
> >  	 * empty IRQ.
> >  	 */
> > +	if (cpu_is_offline(cpu))
> > +		return true;
>
> Preferably put this under the tick_nohz_full_cpu() below because
> it has a static-key optimization.  Distros build NO_HZ_FULL but
> don't use it 99.99999% of the time.

Except that I need to return "true" if the CPU is offline even
if it is not a tick_nohz_full_cpu(), don't I?

In fact, the stack trace shows native_smp_send_reschedule(), which leads
me to believe that the splat happened via the "return false" at the bottom
of wake_up_full_nohz_cpu().  So I don't see a way to completely hide
the check under tick_nohz_full_cpu().

Or am I missing something here?

> >  	if (tick_nohz_full_cpu(cpu)) {
> >  		if (cpu != smp_processor_id() ||
> >  		    tick_nohz_tick_stopped())
> > @@ -590,6 +592,11 @@ static bool wake_up_full_nohz_cpu(int cpu)
> >  	return false;
> >  }
> >
> > +/*
> > + * Wake up the specified CPU.  If the CPU is going offline, it is the
> > + * caller's responsibility to deal with the lost wakeup, for example,
> > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
> > + */
>
> I think it's more transparent than that for the caller. migrate_timers() is
> called soon after, and it takes care of waking up the destination of the
> migration if necessary. So the caller shouldn't care after all.

For the current two callers, agreed.  But is this function designed to
be only used by timers and hrtimers?  If so, shouldn't there be some
kind of comment to that effect?  If not, shouldn't people considering
use of this function be warned that it is not unconditional?

> But the cpu_is_offline() check above may need a comment about that.

Like this?

	if (cpu_is_offline(cpu))
		return true;  /* Don't try to wake offline CPUs. */

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline
  2016-07-04 16:55             ` Paul E. McKenney
@ 2016-07-12 14:41               ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2016-07-12 14:41 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: peterz, tglx, linux-kernel, rgkernel

On Mon, Jul 04, 2016 at 09:55:34AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 04, 2016 at 02:21:43PM +0200, Frederic Weisbecker wrote:
> > On Fri, Jul 01, 2016 at 05:15:06PM -0700, Paul E. McKenney wrote:
> > > On Sat, Jul 02, 2016 at 01:49:56AM +0200, Frederic Weisbecker wrote:
> > > > On Fri, Jul 01, 2016 at 11:40:54AM -0700, Paul E. McKenney wrote:

[ . . . ]

> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 7f2cae4620c7..1a91fc733a0f 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -580,6 +580,8 @@ static bool wake_up_full_nohz_cpu(int cpu)
> > >  	 * If needed we can still optimize that later with an
> > >  	 * empty IRQ.
> > >  	 */
> > > +	if (cpu_is_offline(cpu))
> > > +		return true;
> >
> > Preferably put this under the tick_nohz_full_cpu() below because
> > it has a static-key optimization.  Distros build NO_HZ_FULL but
> > don't use it 99.99999% of the time.
>
> Except that I need to return "true" if the CPU is offline even
> if it is not a tick_nohz_full_cpu(), don't I?
>
> In fact, the stack trace shows native_smp_send_reschedule(), which leads
> me to believe that the splat happened via the "return false" at the bottom
> of wake_up_full_nohz_cpu().  So I don't see a way to completely hide
> the check under tick_nohz_full_cpu().
>
> Or am I missing something here?
>
> > >  	if (tick_nohz_full_cpu(cpu)) {
> > >  		if (cpu != smp_processor_id() ||
> > >  		    tick_nohz_tick_stopped())
> > > @@ -590,6 +592,11 @@ static bool wake_up_full_nohz_cpu(int cpu)
> > >  	return false;
> > >  }
> > >
> > > +/*
> > > + * Wake up the specified CPU.
> > > + * If the CPU is going offline, it is the
> > > + * caller's responsibility to deal with the lost wakeup, for example,
> > > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
> > > + */
> >
> > I think it's more transparent than that for the caller. migrate_timers() is
> > called soon after, and it takes care of waking up the destination of the
> > migration if necessary. So the caller shouldn't care after all.
>
> For the current two callers, agreed.  But is this function designed to
> be only used by timers and hrtimers?  If so, shouldn't there be some
> kind of comment to that effect?  If not, shouldn't people considering
> use of this function be warned that it is not unconditional?
>
> > But the cpu_is_offline() check above may need a comment about that.
>
> Like this?
>
> 	if (cpu_is_offline(cpu))
> 		return true;  /* Don't try to wake offline CPUs. */

Hearing no objections, I am proceeding as shown below.

							Thanx, Paul

------------------------------------------------------------------------

commit b55692986a87f0cea92ddfebe7181598f774c2c2
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Thu Jun 30 10:37:20 2016 -0700

    sched: Make wake_up_nohz_cpu() handle CPUs going offline

    Both timers and hrtimers are maintained on the outgoing CPU until
    CPU_DEAD time, at which point they are migrated to a surviving CPU.
    If a mod_timer() executes between CPU_DYING and CPU_DEAD time, x86
    systems will splat in native_smp_send_reschedule() when attempting
    to wake up the just-now-offlined CPU, as shown below from a
    NO_HZ_FULL kernel:

    [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
    [ 7976.741595] Modules linked in:
    [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
    [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    [ 7976.741595]  0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
    [ 7976.741595]  0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
    [ 7976.741595]  0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
    [ 7976.741595] Call Trace:
    [ 7976.741595]  [<ffffffff8138ab2e>] dump_stack+0x67/0x99
    [ 7976.741595]  [<ffffffff8105cabc>] __warn+0xcc/0xf0
    [ 7976.741595]  [<ffffffff8105cb98>] warn_slowpath_null+0x18/0x20
    [ 7976.741595]  [<ffffffff8103cba9>] native_smp_send_reschedule+0x39/0x40
    [ 7976.741595]  [<ffffffff81089bc2>] wake_up_nohz_cpu+0x82/0x190
    [ 7976.741595]  [<ffffffff810d275a>] internal_add_timer+0x7a/0x80
    [ 7976.741595]  [<ffffffff810d3ee7>] mod_timer+0x187/0x2b0
    [ 7976.741595]  [<ffffffff810c89dd>] rcu_torture_reader+0x33d/0x380
    [ 7976.741595]  [<ffffffff810c66f0>] ? sched_torture_read_unlock+0x30/0x30
    [ 7976.741595]  [<ffffffff810c86a0>] ? rcu_bh_torture_read_lock+0x80/0x80
    [ 7976.741595]  [<ffffffff8108068f>] kthread+0xdf/0x100
    [ 7976.741595]  [<ffffffff819dd83f>] ret_from_fork+0x1f/0x40
    [ 7976.741595]  [<ffffffff810805b0>] ? kthread_create_on_node+0x200/0x200

    However, in this case, the wakeup is redundant, because the timer
    migration will reprogram timer hardware as needed.
    Note that the fact that preemption is disabled does not avoid the
    splat, as the offline operation has already passed both the
    synchronize_sched() and the stop_machine() that would be blocked by
    disabled preemption.

    This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
    to wake up offline CPUs.  It also adds a comment stating that the
    caller must tolerate lost wakeups when the target CPU is going
    offline, and suggesting the CPU_DEAD notifier as a recovery mechanism.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4620c7..26c01b2e9881 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -580,6 +580,8 @@ static bool wake_up_full_nohz_cpu(int cpu)
 	 * If needed we can still optimize that later with an
 	 * empty IRQ.
 	 */
+	if (cpu_is_offline(cpu))
+		return true;  /* Don't try to wake offline CPUs. */
 	if (tick_nohz_full_cpu(cpu)) {
 		if (cpu != smp_processor_id() ||
 		    tick_nohz_tick_stopped())
@@ -590,6 +592,11 @@ static bool wake_up_full_nohz_cpu(int cpu)
 	return false;
 }

+/*
+ * Wake up the specified CPU.  If the CPU is going offline, it is the
+ * caller's responsibility to deal with the lost wakeup, for example,
+ * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
+ */
 void wake_up_nohz_cpu(int cpu)
 {
 	if (!wake_up_full_nohz_cpu(cpu))

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline
  2016-06-30 23:29 ` Frederic Weisbecker
  2016-07-01 18:40   ` Paul E. McKenney
@ 2016-07-12 14:19   ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2016-07-12 14:19 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Paul E. McKenney, tglx, linux-kernel, rgkernel

On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote:
> >  void wake_up_nohz_cpu(int cpu)
> >  {
> > -	if (!wake_up_full_nohz_cpu(cpu))
> > +	if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu))
>
> So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains
> anymore (correct me if I'm wrong),

So rebuilding the domains is an utter trainwreck atm.  But I suspect
that's wrong.  Esp. with cpusets enabled we rebuild the domains very
late from a workqueue.  That is why the scheduler has cpu_active_mask
to constrain the domains during hotplug.

Now I need to go sort through that trainwreck because deadline needs it,
but I've not had the opportunity :/

> therefore get_nohz_timer_target() can't return it,
> unless smp_processor_id() is the only alternative.

With the below that should be true I think.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6c0cdb5a73f8..b35cacbe9b9e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -626,7 +626,7 @@ int get_nohz_timer_target(void)
 
 	rcu_read_lock();
 	for_each_domain(cpu, sd) {
-		for_each_cpu(i, sched_domain_span(sd)) {
+		for_each_cpu_and(i, sched_domain_span(sd), cpu_active_mask) {
 			if (!idle_cpu(i) && is_housekeeping_cpu(cpu)) {
 				cpu = i;
 				goto unlock;

^ permalink raw reply related	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2016-07-12 14:41 UTC | newest]

Thread overview: 9+ messages
2016-06-30 17:58 [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline Paul E. McKenney
2016-06-30 23:29 ` Frederic Weisbecker
2016-07-01 18:40   ` Paul E. McKenney
2016-07-01 23:49     ` Frederic Weisbecker
2016-07-02  0:15       ` Paul E. McKenney
2016-07-04 12:21         ` Frederic Weisbecker
2016-07-04 16:55           ` Paul E. McKenney
2016-07-12 14:41             ` Paul E. McKenney
2016-07-12 14:19 ` Peter Zijlstra