From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753710AbcGDMVu (ORCPT <rfc822;w@1wt.eu>);
	Mon, 4 Jul 2016 08:21:50 -0400
Received: from mail-wm0-f49.google.com ([74.125.82.49]:37230 "EHLO
	mail-wm0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753265AbcGDMVr (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Mon, 4 Jul 2016 08:21:47 -0400
Date: Mon, 4 Jul 2016 14:21:43 +0200
From: Frederic Weisbecker <fweisbec@gmail.com>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: peterz@infradead.org, tglx@linutronix.de, linux-kernel@vger.kernel.org,
        rgkernel@gmail.com
Subject: Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going
 offline
Message-ID: <20160704122141.GA28338@lerouge>
References: <20160630175845.GA10269@linux.vnet.ibm.com>
 <20160630232957.GB32568@lerouge>
 <20160701184054.GK4650@linux.vnet.ibm.com>
 <20160701234954.GB4147@lerouge>
 <20160702001506.GZ4650@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160702001506.GZ4650@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 01, 2016 at 05:15:06PM -0700, Paul E. McKenney wrote:
> On Sat, Jul 02, 2016 at 01:49:56AM +0200, Frederic Weisbecker wrote:
> > On Fri, Jul 01, 2016 at 11:40:54AM -0700, Paul E. McKenney wrote:
> > > On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote:
> > > > > +/*
> > > > > + * Wake up the specified CPU.  If the CPU is going offline, it is the
> > > > > + * caller's responsibility to deal with the lost wakeup, for example,
> > > > > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
> > > > > + */
> > > > >  void wake_up_nohz_cpu(int cpu)
> > > > >  {
> > > > > -	if (!wake_up_full_nohz_cpu(cpu))
> > > > > +	if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu))
> > > > 
> > > > So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains
> > > > anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it,
> > > > unless smp_processor_id() is the only alternative.
> > > 
> > > Right, but the timers have been posted long before even CPU_UP_PREPARE.
> > > From what I can see, they are left alone until CPU_DEAD.  Which means
> > > that if you try to mod_timer() them between CPU_DYING and CPU_DEAD,
> > > you can get the above splat.
> > > 
> > > Or am I missing somthing subtle here?
> > 
> > Yes that's exactly what I meant. It happens on mod_timer() calls
> > between CPU_DYING and CPU_DEAD. I just wanted to clarify the
> > conditions for it to happen: the fact that it shouldn't concern
> > remote CPU targets, only local pinned timers.
> 
> OK.  What happens in the following sequence of events?
> 
> o	CPU 5 posts a timer, which might well be locally pinned.
> 	This is rcu_torture_reader() posting its on-stack timer
> 	creatively named "t".
> 
> o	CPU 5 starts going offline, so that rcu_torture_reader() gets
> 	migrated to CPU 6.
> 
> o	CPU 5 reaches CPU_DYING but has not yet reached CPU_DEAD.
> 
> o	CPU 6 invokes mod_timer() on its timer "t".
> 
> Wouldn't that trigger the scenario that I am seeing?

No I don't think so because "t" is then going to be enqueued to CPU 6.
__mod_timer() -> get_target_base() uses local accessor on timer base.

So the target of pinned timers is always the CPU of the caller.

> > > > BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that
> > > > it queues timers at this stage?
> > > 
> > > Hmmm...  From what I can see, rcutorture cleans up its priority-boost
> > > kthreads at CPU_DOWN_PREPARE time.  The other threads are allowed to
> > > migrate wherever the scheduler wants, give or take the task shuffling.
> > > The task shuffling only excludes one CPU at a time, and I have seen
> > > this occur when multiple CPUs were running, e.g., 0, 2, and 3 while
> > > offlining 1.
> > 
> > But if rcutorture kthreads are cleaned up at CPU_DOWN_PREPARE, they
> > shouldn't be calling mod_timer() on CPU_DYING time. Or there are other
> > rcutorture threads?
> 
> The rcu_torture_reader() kthreads aren't associated with any particular
> CPU, so when CPUs go offline, they just get migrated to other CPUs.
> This allows them to execute on those other CPUs between CPU_DYING and
> CPU_DEAD time, correct?

Indeed. Timers get migrated on CPU_DEAD only. So if rcu_torture_reader()
enqueued a pinned timer to CPU 5, then get migrated to CPU 6, the timer
may well fire after CPU_DYING on CPU 5 and even re-enqueue itself in CPU 5,
provided the timer is self-enqueued.

> 
> Other rcutorture kthreads -are- bound to specific CPUs, but they are
> testing priority boosting, not simple reading.

I see.

> 
> > > Besides which, doesn't the scheduler prevent anything but the idle
> > > thread from running after CPU_DYING time?
> > 
> > Indeed migrate_tasks() is called on CPU_DYING but pinned kthreads, outside
> > smpboot, have their own way to deal with hotplug through notifiers.
> 
> Agreed, but the rcu_torture_reader() kthreads aren't pinned, so they
> should migrate automatically at CPU_DYING time.

Yes indeed.

> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f2cae4620c7..1a91fc733a0f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -580,6 +580,8 @@ static bool wake_up_full_nohz_cpu(int cpu)
>  	 * If needed we can still optimize that later with an
>  	 * empty IRQ.
>  	 */
> +	if (cpu_is_offline(cpu))
> +		return true;

Preferably put this under the tick_nohz_full_cpu() below because
it has a static key optimizations. Distros build NO_HZ_FULL but
don't use it 99.99999% of the time.

>  	if (tick_nohz_full_cpu(cpu)) {
>  		if (cpu != smp_processor_id() ||
>  		    tick_nohz_tick_stopped())
> @@ -590,6 +592,11 @@ static bool wake_up_full_nohz_cpu(int cpu)
>  	return false;
>  }
>  
> +/*
> + * Wake up the specified CPU.  If the CPU is going offline, it is the
> + * caller's responsibility to deal with the lost wakeup, for example,
> + * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
> + */

I think it's more transparent than that for the caller. migrate_timers() is
called soon after and it takes care of waking up the destination of the migration
if necessary. So the caller shouldn't care after all.

But the cpu_is_offline() check above may need a comment about that.

Thanks!