From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752523AbcGASk7 (ORCPT ); Fri, 1 Jul 2016 14:40:59 -0400
Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:34064 "EHLO
        mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752463AbcGASk5 (ORCPT ); Fri, 1 Jul 2016 14:40:57 -0400
X-IBM-Helo: d01dlp03.pok.ibm.com
X-IBM-MailFrom: paulmck@linux.vnet.ibm.com
Date: Fri, 1 Jul 2016 11:40:54 -0700
From: "Paul E. McKenney"
To: Frederic Weisbecker
Cc: peterz@infradead.org, tglx@linutronix.de, linux-kernel@vger.kernel.org,
        rgkernel@gmail.com
Subject: Re: [PATCH RFC] sched: Make wake_up_nohz_cpu() handle CPUs going offline
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160630175845.GA10269@linux.vnet.ibm.com> <20160630232957.GB32568@lerouge>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160630232957.GB32568@lerouge>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16070118-0044-0000-0000-0000008D516F
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 16070118-0045-0000-0000-000004A361B7
Message-Id: <20160701184054.GK4650@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-07-01_04:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1604210000 definitions=main-1607010182
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 01, 2016 at 01:29:59AM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 30, 2016 at 10:58:45AM -0700, Paul E. McKenney wrote:
> > Both timers and hrtimers are maintained on the outgoing CPU until
> > CPU_DEAD time, at which point they are migrated to a surviving CPU. If a
> > mod_timer() executes between CPU_DYING and CPU_DEAD time, x86 systems
> > will splat in native_smp_send_reschedule() when attempting to wake up
> > the just-now-offlined CPU, as shown below from a NO_HZ_FULL kernel:
> >
> > [ 7976.741556] WARNING: CPU: 0 PID: 661 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:125 native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595] Modules linked in:
> > [ 7976.741595] CPU: 0 PID: 661 Comm: rcu_torture_rea Not tainted 4.7.0-rc2+ #1
> > [ 7976.741595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 7976.741595] 0000000000000000 ffff88000002fcc8 ffffffff8138ab2e 0000000000000000
> > [ 7976.741595] 0000000000000000 ffff88000002fd08 ffffffff8105cabc 0000007d1fd0ee18
> > [ 7976.741595] 0000000000000001 ffff88001fd16d40 ffff88001fd0ee00 ffff88001fd0ee00
> > [ 7976.741595] Call Trace:
> > [ 7976.741595] [] dump_stack+0x67/0x99
> > [ 7976.741595] [] __warn+0xcc/0xf0
> > [ 7976.741595] [] warn_slowpath_null+0x18/0x20
> > [ 7976.741595] [] native_smp_send_reschedule+0x39/0x40
> > [ 7976.741595] [] wake_up_nohz_cpu+0x82/0x190
> > [ 7976.741595] [] internal_add_timer+0x7a/0x80
> > [ 7976.741595] [] mod_timer+0x187/0x2b0
> > [ 7976.741595] [] rcu_torture_reader+0x33d/0x380
> > [ 7976.741595] [] ? sched_torture_read_unlock+0x30/0x30
> > [ 7976.741595] [] ? rcu_bh_torture_read_lock+0x80/0x80
> > [ 7976.741595] [] kthread+0xdf/0x100
> > [ 7976.741595] [] ret_from_fork+0x1f/0x40
> > [ 7976.741595] [] ? kthread_create_on_node+0x200/0x200
> >
> > However, in this case, the wakeup is redundant, because the timer
> > migration will reprogram timer hardware as needed.
> > Note that the fact
> > that preemption is disabled does not avoid the splat, as the offline
> > operation has already passed both the synchronize_sched() and the
> > stop_machine() that would be blocked by disabled preemption.
> >
> > This commit therefore modifies wake_up_nohz_cpu() to avoid attempting
> > to wake up offline CPUs. It also adds a comment stating that the
> > caller must tolerate lost wakeups when the target CPU is going offline,
> > and suggesting the CPU_DEAD notifier as a recovery mechanism.
> >
> > Signed-off-by: Paul E. McKenney
> > Cc: Peter Zijlstra
> > Cc: Frederic Weisbecker
> > Cc: Thomas Gleixner
> > ---
> >  core.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 7f2cae4620c7..08502966e7df 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu)
> >  	return false;
> >  }
> >
> > +/*
> > + * Wake up the specified CPU. If the CPU is going offline, it is the
> > + * caller's responsibility to deal with the lost wakeup, for example,
> > + * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
> > + */
> >  void wake_up_nohz_cpu(int cpu)
> >  {
> > -	if (!wake_up_full_nohz_cpu(cpu))
> > +	if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu))
>
> So at this point, as we passed CPU_DYING, I believe the CPU isn't visible in the domains
> anymore (correct me if I'm wrong), therefore get_nohz_timer_target() can't return it,
> unless smp_processor_id() is the only alternative.

Right, but the timers have been posted long before even
CPU_UP_PREPARE. From what I can see, they are left alone until
CPU_DEAD. Which means that if you try to mod_timer() them between
CPU_DYING and CPU_DEAD, you can get the above splat.

Or am I missing something subtle here?

> Hence, that call to wake_up_nohz_cpu() can only happen to online CPUs or the current
> one (pinned).
> And wake_up_idle_cpu() on the current CPU is a no-op. So only
> wake_up_full_nohz_cpu() is concerned. Then perhaps it would be better to move that
> cpu_online() check to wake_up_full_nohz_cpu() ?

As in the patch shown below? Either way works for me.

> BTW, it seems that rcutorture stops its kthreads after CPU_DYING, is it expected that
> it queues timers at this stage?

Hmmm... From what I can see, rcutorture cleans up its priority-boost
kthreads at CPU_DOWN_PREPARE time. The other threads are allowed to
migrate wherever the scheduler wants, give or take the task shuffling.
The task shuffling only excludes one CPU at a time, and I have seen
this occur when multiple CPUs were running, e.g., 0, 2, and 3 while
offlining 1.

Besides which, doesn't the scheduler prevent anything but the idle
thread from running after CPU_DYING time?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4620c7..08502966e7df 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -590,9 +590,14 @@ static bool wake_up_full_nohz_cpu(int cpu)
 	return false;
 }

+/*
+ * Wake up the specified CPU. If the CPU is going offline, it is the
+ * caller's responsibility to deal with the lost wakeup, for example,
+ * by hooking into the CPU_DEAD notifier like timers and hrtimers do.
+ */
 void wake_up_nohz_cpu(int cpu)
 {
-	if (!wake_up_full_nohz_cpu(cpu))
+	if (cpu_online(cpu) && !wake_up_full_nohz_cpu(cpu))
 		wake_up_idle_cpu(cpu);
 }
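
For anyone wanting to poke at the short-circuit logic outside the kernel, the following is a minimal userspace sketch in C, not kernel code. It assumes a drastically simplified model: the model_* arrays, the two counters, and the stub bodies are hypothetical stand-ins invented for illustration, and the real wake_up_full_nohz_cpu() and wake_up_idle_cpu() do more than this (for example, checks against the current CPU). The only thing it tries to show is that the cpu_online() test added to wake_up_nohz_cpu() keeps either wakeup path from reaching the reschedule-IPI code for an offline CPU.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-ins for kernel state; NOT the real data structures. */
static bool model_online[MODEL_NR_CPUS];
static bool model_full_nohz[MODEL_NR_CPUS];
static int ipis_sent;	/* modeled reschedule IPIs actually sent */
static int splats;	/* modeled warnings for IPIs aimed at offline CPUs */

static void model_reset(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_online[cpu] = true;
		model_full_nohz[cpu] = false;
	}
	ipis_sent = 0;
	splats = 0;
}

static bool model_cpu_online(int cpu)
{
	return model_online[cpu];
}

/* Models the warning site: complain and bail if the target is offline. */
static void model_send_reschedule(int cpu)
{
	if (!model_cpu_online(cpu)) {
		splats++;
		return;
	}
	ipis_sent++;
}

/* Simplified: a full-nohz CPU is always kicked, anything else is declined. */
static bool model_wake_up_full_nohz_cpu(int cpu)
{
	if (model_full_nohz[cpu]) {
		model_send_reschedule(cpu);
		return true;
	}
	return false;
}

/* Simplified: always sends the reschedule IPI. */
static void model_wake_up_idle_cpu(int cpu)
{
	model_send_reschedule(cpu);
}

/* Shape of wake_up_nohz_cpu() before the patch. */
static void wake_up_nohz_cpu_unpatched(int cpu)
{
	if (!model_wake_up_full_nohz_cpu(cpu))
		model_wake_up_idle_cpu(cpu);
}

/* Shape of wake_up_nohz_cpu() after the patch: offline CPUs are skipped. */
static void wake_up_nohz_cpu_patched(int cpu)
{
	if (model_cpu_online(cpu) && !model_wake_up_full_nohz_cpu(cpu))
		model_wake_up_idle_cpu(cpu);
}
```

Calling model_reset(), marking a CPU offline in model_online[], and then invoking each variant in turn lets the two counters be compared. Because && short-circuits, the patched variant never calls either wakeup helper for an offline CPU, which is why the caller has to tolerate the lost wakeup.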