From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752214AbdLHBEm (ORCPT ); Thu, 7 Dec 2017 20:04:42 -0500 Received: from mail-wm0-f68.google.com ([74.125.82.68]:41323 "EHLO mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751010AbdLHBEk (ORCPT ); Thu, 7 Dec 2017 20:04:40 -0500 X-Google-Smtp-Source: AGs4zMalozdPM1UMzkBRGvWgMo+pniwLk9+7ZLkiLpLhXoB522/G47+Iys1kCP4rL+c//dF2iqT34Q== X-ME-Sender: Date: Fri, 8 Dec 2017 09:06:40 +0800 From: Boqun Feng To: "Paul E. McKenney" Cc: anna-maria@linutronix.de, tglx@linutronix.de, linux-kernel@vger.kernel.org Subject: Re: Timer refuses to expire Message-ID: <20171208010640.GD9516@tardis> References: <20171201182529.GA6073@linux.vnet.ibm.com> <20171204174208.GA17376@linux.vnet.ibm.com> <20171205233744.GA2453@linux.vnet.ibm.com> <20171206220421.GA12886@linux.vnet.ibm.com> <20171207070350.GC1044@tardis> <20171207145617.GL7829@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="3Gf/FFewwPeBMqCJ" Content-Disposition: inline In-Reply-To: <20171207145617.GL7829@linux.vnet.ibm.com> User-Agent: Mutt/1.9.1 (2017-09-22) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --3Gf/FFewwPeBMqCJ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Dec 07, 2017 at 06:56:17AM -0800, Paul E. McKenney wrote: > On Thu, Dec 07, 2017 at 03:03:50PM +0800, Boqun Feng wrote: > > Hi Paul, > >=20 > > On Wed, Dec 06, 2017 at 02:04:21PM -0800, Paul E. McKenney wrote: > > > On Tue, Dec 05, 2017 at 03:37:44PM -0800, Paul E. McKenney wrote: > > > > On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote: > > > > > On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote: > > > > > > Hello, Anna-Maria, > > > > > >=20 > > > > > > It turned out that one set of problems was due to NO_HZ_FULL is= sues, > > > > > > which are addressed with a pair of patches. However, there is = still one > > > > > > case of a timer refusing to expire in rcutorture's TREE01 test = scenario. > > > > > > The takes from two to four hours to trigger, and the most recen= t one > > > > > > gives the following diagnostic (see patch at the very end of th= is email): > > > > > >=20 > > > > > > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x10= 0c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0= x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 00= 000000000000100000000000108000000000000004000000000200000000000000000000000= 0000000000000040000000000000000000000000000000000000000000000000000 > > > > > >=20 > > > > > > The message appears any time a timer for less than five jiffies= takes > > > > > > more than eight seconds to expire. In all cases, this expiry d= id not > > > > > > happen naturally, but rather via an unsolicited wakeup from the= RCU CPU > > > > > > stall code. If I am interpreting this message correctly, base-= >clk > > > > > > has advanced past this timer, and the timer is correctly flagge= d in > > > > > > base->pending_map. This seems like part of the problem because= the > > > > > > timer's timeout was only three jiffies. However, base->clk has= not > > > > > > advanced for more than 20 seconds, which seems like another pro= blem. > > > > > >=20 > > > > > > What additional diagnostics would be helpful? I can capture da= ta > > > > > > at the beginning of the schedule_timeout in the timer_list stru= cture, > > > > > > and of course can print more information at the time of the wak= eup. > > > > >=20 > > > > > And on the off-chance that the following messages from this weeke= nd's > > > > > runs are at all helpful. One interesting point is that starting = at > > > > > time 4731.872311, there are repeated enqueues to CPU 5's timer wh= eel, > > > > > but CPU 5's ->clk does not advance. Things seem to advance at > > > > > time 4891.721190. > > > >=20 > > > > Another layer on the onion... For at least some of the failures, t= here > > > > is a stalled CPU-hotplug operation. This of course can mean that t= he > > > > timers are stuck on the partially offlined CPU. So I dumped the st= ack > > > > of the task that is taking the CPU offline. Please see below for c= onsole > > > > output and patch. > > > >=20 > > > > I am considering doing an unsolicited wakeup of the task doing the > > > > hotplug operation, but I am not convinced that the entirely of the = CPU > > > > hotplug code is willing to put up with that sort of thing. > > >=20 > > > What I did instead was to dump out the state of the task that > > > __cpuhp_kick_ap() waits on, please see the patch at the very end of t= his > > > email. This triggered as shown below, and you guessed it, that task = is > > > waiting on a grace period. Which I am guessing won't happen until the > > > outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cp= u(). > > > Which will prevent RCU's grace-period kthread from ever awakening, wh= ich > > > will prevent the task that __cpuhp_kick_ap() waits on from ever awake= ning, > > > which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD s= tate. > > >=20 > > > Deadlock. > >=20 > > There is one thing I'm confused here. Sure, this is a deadlock, but the > > timer should still work in such a deadlock, right? I mean, the timer of > > schedule_timeout() should be able to wake up rcu_gp_kthread() even in > > this case? And yes, the gp kthread will continue to wait due to the > > deadlock, but the deadlock can not explain the "Waylayed timer", right? >=20 > My belief is that the timer cannot fire because it is still on the > offlined CPU, and that CPU has not yet reached timers_dead_cpu(). > But I might be missing something subtle in either the timers code or the > CPU-hotplug code, so please do check my reasoning here. (I am relying on > the "timer->flags: 0x40000007" and the "cpuhp/7" below, which I believe > means that the timer is on CPU 7 and that it is CPU 7 that is in the > process of going offline.) >=20 So if CPU 7 is going offline and has not yet reached timers_dead_cpu(), the timer on CPU 7 can not be fired? I'm not sure this is true, because since timers_dead_cpu() has not been reached, the hrtimer and softirq on CPU 7 should still be working, so the timer base should still be working well on CPU 7, no? I didn't find a place where we "disable" timer base on the hotplug path. But maybe I'm missing something.. > The "Waylayed timer" happens because the RCU CPU stall warning code > wakes up the grace-period kthread. This is driven out of the > scheduling-clock tick, so is unaffected by timers, though it does > rely on the jiffies counter continuing to be incremented. >=20 I mean if the timer base works well, then there should be no "Waylayed timer", the timer of schedule_timeout() will wake up the gp kthread instead of RCU CPU stall code. Regards, Boqun > So what am I missing here? >=20 > Thanx, Paul >=20 > > Regards, > > Boqun > >=20 > > > Does that make sense, or am I missing a trick here? > > >=20 > > > I tried invoking timers_dead_cpu() from sched_cpu_deactivate(), but > > > that triggered the BUG_ON(cpu_online(cpu)). I removed this BUG_ON(), > > > and appear to have deadlocked on the timer ->lock. > > >=20 > > > Any thoughts on what else to try? > > >=20 > > > Thanx, Paul > > >=20 > > > ---------------------------------------------------------------------= --- > > >=20 > > > [ 2939.381210] schedule_timeout: Waylayed timer base->clk: 0x1002709c= 3 jiffies: 0x100284614 base->next_expiry: 0x1002709c3 timer->flags: 0x40000= 007 timer->expires 0x10027f3a2 idx: 100 idx_now: 105 base->pending_map 0000= 000000000008000000000000000000020000000000000200804000020008680000000000000= 10000000001000000000000000000000000000000000000000000000000000000 > > > [ 2939.381219] Torture onoff task state: > > > [ 2939.381221] torture_onoff D14584 842 2 0x80000000 > > > [ 2939.381239] Call Trace: > > > [ 2939.381245] ? __schedule+0x33c/0x6f0 > > > [ 2939.381248] ? preempt_count_add+0x51/0x90 > > > [ 2939.381250] schedule+0x37/0x90 > > > [ 2939.381270] schedule_timeout+0x20d/0x4c0 > > > [ 2939.381276] wait_for_completion+0x95/0x100 > > > [ 2939.381282] ? wake_up_q+0x60/0x60 > > > [ 2939.381299] __cpuhp_kick_ap+0x49/0x60 > > > [ 2939.381313] cpuhp_kick_ap+0x29/0x70 > > > [ 2939.381316] cpuhp_kick_ap_work+0x34/0x130 > > > [ 2939.381322] _cpu_down+0x17e/0x1f0 > > > [ 2939.381326] do_cpu_down+0x30/0x50 > > > [ 2939.381354] torture_offline+0x76/0x140 > > > [ 2939.381368] torture_onoff+0xef/0x1c0 > > > [ 2939.381371] kthread+0xf0/0x130 > > > [ 2939.381374] ? torture_kthread_stopping+0x70/0x70 > > > [ 2939.381376] ? kthread_destroy_worker+0x40/0x40 > > > [ 2939.381378] ret_from_fork+0x1f/0x30 > > > [ 2939.381381] __cpuhp_kick_ap task state: > > > [ 2939.381383] cpuhp/7 D14504 46 2 0x80000000 > > > [ 2939.381399] Call Trace: > > > [ 2939.381403] ? __schedule+0x33c/0x6f0 > > > [ 2939.381406] schedule+0x37/0x90 > > > [ 2939.381408] schedule_timeout+0x20d/0x4c0 > > > [ 2939.381414] ? enqueue_task_fair+0x556/0x12b0 > > > [ 2939.381417] wait_for_completion+0x95/0x100 > > > [ 2939.381419] ? wake_up_q+0x60/0x60 > > > [ 2939.381431] __wait_rcu_gp+0x10d/0x140 > > > [ 2939.381441] ? sched_cpu_activate+0xc0/0xc0 > > > [ 2939.381443] sched_cpu_deactivate+0x36/0xa0 > > > [ 2939.381453] ? kfree_call_rcu+0x20/0x20 > > > [ 2939.381454] ? __call_rcu+0x530/0x530 > > > [ 2939.381457] ? __rcu_read_unlock+0x50/0x50 > > > [ 2939.381459] ? __rcu_read_unlock+0x50/0x50 > > > [ 2939.381462] cpuhp_invoke_callback+0xa8/0x610 > > > [ 2939.381466] cpuhp_thread_fun+0xc5/0x150 > > > [ 2939.381469] smpboot_thread_fn+0x15c/0x220 > > > [ 2939.381476] kthread+0xf0/0x130 > > > [ 2939.381478] ? sort_range+0x20/0x20 > > > [ 2939.381480] ? kthread_destroy_worker+0x40/0x40 > > > [ 2939.381482] ret_from_fork+0x1f/0x30 > > >=20 > > > ---------------------------------------------------------------------= --- > > >=20 > > > commit aa2b802cb432a725589884d2ba47833de21e0871 > > > Author: Paul E. McKenney > > > Date: Wed Dec 6 09:46:30 2017 -0800 > > >=20 > > > EXP: timer: Dump __cpuhp_kick_ap task state > > > =20 > > > Signed-off-by: Paul E. McKenney > > >=20 > > > diff --git a/kernel/cpu.c b/kernel/cpu.c > > > index 04892a82f6ac..dc9098946b3a 100644 > > > --- a/kernel/cpu.c > > > +++ b/kernel/cpu.c > > > @@ -381,6 +381,8 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enu= m cpuhp_state prev_state) > > > st->bringup =3D !st->bringup; > > > } > > > =20 > > > +void schedule_timeout_set_task3dump(struct task_struct *t); > > > + > > > /* Regular hotplug invocation of the AP hotplug thread */ > > > static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st) > > > { > > > @@ -395,7 +397,9 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_stat= e *st) > > > smp_mb(); > > > st->should_run =3D true; > > > wake_up_process(st->thread); > > > + schedule_timeout_set_task3dump(st->thread); > > > wait_for_ap_thread(st, st->bringup); > > > + schedule_timeout_set_task3dump(NULL); > > > } > > > =20 > > > static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_stat= e target) > > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > > > index 0a67df4cdeb6..532b43d05219 100644 > > > --- a/kernel/time/timer.c > > > +++ b/kernel/time/timer.c > > > @@ -1725,6 +1725,7 @@ static void process_timeout(struct timer_list *= t) > > > } > > > =20 > > > static struct task_struct *schedule_timeout_task2dump; > > > +static struct task_struct *schedule_timeout_task3dump; > > > static DEFINE_MUTEX(schedule_timeout_task2dump_mutex); > > > void schedule_timeout_set_task2dump(struct task_struct *t) > > > { > > > @@ -1732,7 +1733,12 @@ void schedule_timeout_set_task2dump(struct tas= k_struct *t) > > > WRITE_ONCE(schedule_timeout_task2dump, t); > > > mutex_unlock(&schedule_timeout_task2dump_mutex); > > > } > > > -EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump); > > > +void schedule_timeout_set_task3dump(struct task_struct *t) > > > +{ > > > + mutex_lock(&schedule_timeout_task2dump_mutex); > > > + WRITE_ONCE(schedule_timeout_task3dump, t); > > > + mutex_unlock(&schedule_timeout_task2dump_mutex); > > > +} > > > =20 > > > /** > > > * schedule_timeout - sleep until timeout > > > @@ -1824,6 +1830,10 @@ signed long __sched schedule_timeout(signed lo= ng timeout) > > > pr_info("Torture onoff task state:\n"); > > > sched_show_task(schedule_timeout_task2dump); > > > } > > > + if (schedule_timeout_task3dump) { > > > + pr_info("__cpuhp_kick_ap task state:\n"); > > > + sched_show_task(schedule_timeout_task3dump); > > > + } > > > mutex_unlock(&schedule_timeout_task2dump_mutex); > > > } > > > } > > >=20 >=20 >=20 --3Gf/FFewwPeBMqCJ Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAlop5Z0ACgkQSXnow7UH +rhn7gf+J2QzVoqnEwSyiwNEIZOL51ede2OhAPYr4lbREoxJ2iMRuFVpNxU8BTh/ ztQLtiYb397l86DMJY8t8bvdicF+fQXOehixuF3qKLUu9dfV4eomK6DjOCwFgHrm K2gvXL8ymnDVHvSAtErOGOcRGCkEpzjWauFHNIa9Q2Jw+PXLqbI570rNASJKXQ17 ekKEeAUJtiaKrOwjO1ctLzf0dbPi4aRX89i9MW58gJhsWtz3StQ6xyLP1KPFyZOb C1XABiCIl0Y4NOjCv59k4rPtMZnhb8BRuBs36iRIensUPQo/sw3MA/JD0WLmd5ce 4E22BeP8ac6wX+sOXHDshtaLIW4Ohw== =90EM -----END PGP SIGNATURE----- --3Gf/FFewwPeBMqCJ--