From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1752276AbdLHAb5 (ORCPT );
 Thu, 7 Dec 2017 19:31:57 -0500
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:39416 "EHLO
 mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
 by vger.kernel.org with ESMTP id S1750993AbdLHAbz (ORCPT );
 Thu, 7 Dec 2017 19:31:55 -0500
Date: Thu, 7 Dec 2017 16:31:51 -0800
From: "Paul E. McKenney"
To: Boqun Feng
Cc: anna-maria@linutronix.de, tglx@linutronix.de, linux-kernel@vger.kernel.org
Subject: Re: Timer refuses to expire
Reply-To: paulmck@linux.vnet.ibm.com
References: <20171201182529.GA6073@linux.vnet.ibm.com>
 <20171204174208.GA17376@linux.vnet.ibm.com>
 <20171205233744.GA2453@linux.vnet.ibm.com>
 <20171206220421.GA12886@linux.vnet.ibm.com>
 <20171207070350.GC1044@tardis>
 <20171207145617.GL7829@linux.vnet.ibm.com>
 <20171207214514.GA23709@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171207214514.GA23709@linux.vnet.ibm.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
x-cbid: 17120800-0044-0000-0000-000003BB8F01
X-IBM-SpamModules-Scores:
X-IBM-SpamModules-Versions: BY=3.00008168; HX=3.00000241; KW=3.00000007;
 PH=3.00000004; SC=3.00000244; SDB=6.00956964; UDB=6.00483793; IPR=6.00736995;
 BA=6.00005729; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000;
 ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00018413; XFM=3.00000015;
 UTC=2017-12-08 00:31:53
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17120800-0045-0000-0000-000007EAC83E
Message-Id: <20171208003151.GA17264@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
 definitions=2017-12-07_12:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0
 spamscore=0 clxscore=1015 lowpriorityscore=0
 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam
 adjust=0 reason=mlx scancount=1 engine=8.0.1-1711220000
 definitions=main-1712080005
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Dec 07, 2017 at 01:45:14PM -0800, Paul E. McKenney wrote:
> On Thu, Dec 07, 2017 at 06:56:17AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 07, 2017 at 03:03:50PM +0800, Boqun Feng wrote:
> >
> > [ . . . ]
> >
> > > > What I did instead was to dump out the state of the task that
> > > > __cpuhp_kick_ap() waits on, please see the patch at the very end of this
> > > > email. This triggered as shown below, and you guessed it, that task is
> > > > waiting on a grace period. Which I am guessing won't happen until the
> > > > outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
> > > > Which will prevent RCU's grace-period kthread from ever awakening, which
> > > > will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
> > > > which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
> > > >
> > > > Deadlock.
> > >
> > > There is one thing I'm confused here. Sure, this is a deadlock, but the
> > > timer should still work in such a deadlock, right? I mean, the timer of
> > > schedule_timeout() should be able to wake up rcu_gp_kthread() even in
> > > this case? And yes, the gp kthread will continue to wait due to the
> > > deadlock, but the deadlock can not explain the "Waylayed timer", right?
> >
> > My belief is that the timer cannot fire because it is still on the
> > offlined CPU, and that CPU has not yet reached timers_dead_cpu().
> > But I might be missing something subtle in either the timers code or the
> > CPU-hotplug code, so please do check my reasoning here. (I am relying on
> > the "timer->flags: 0x40000007" and the "cpuhp/7" below, which I believe
> > means that the timer is on CPU 7 and that it is CPU 7 that is in the
> > process of going offline.)
> >
> > The "Waylayed timer" happens because the RCU CPU stall warning code
> > wakes up the grace-period kthread. This is driven out of the
> > scheduling-clock tick, so is unaffected by timers, though it does
> > rely on the jiffies counter continuing to be incremented.
> >
> > So what am I missing here?
>
> Well, last night's runs had situations where the ->flags CPU didn't
> match the CPU going offline, so I am clearly missing something or another.
>
> One thing I might have been missing was the CPU-online processing.
> What happens if a CPU goes offline, comes back online, but before ->clk
> gets adjusted there is a schedule_timeout()? Now, schedule_timeout()
> does compute the absolute ->expires time using jiffies, so the wakeup time
> should not be too far off of the desired time. Except that the timers
> now have something like 8% slop, and that slop will be calculated on the
> difference between the desired expiration time and the (way outdated)
> ->clk value. So the 8% might be a rather large number. For example,
> if the CPU was offline for 12 minutes (unlikely but entirely possible
> with rcutorture testing's random onlining and offlining), the slop on
> a 3-millisecond timer might be a full minute.
>
> To my timer-naive eyes, it looks like a simple fix is to set
> old_base->must_forward_clk to true in timers_dead_cpu() for each
> timer_base, as shown below. The other possibility that I considered
> was to instead set ->is_idle, but that looked like an engraved
> invitation to send IPIs to offline CPUs.
>
> I am giving it a spin. I still believe that the offline deadlock
> scenario can happen, but one thing at a time...
>
> Thoughts?

And it is hard to tell whether or not this is helping. Not too surprising,
given that most of the splats seem to be the deadlock case instead.

							Thanx, Paul
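
To put rough numbers on the slop estimate quoted above, here is a small
sketch of the arithmetic. It is only a back-of-the-envelope model: it treats
the slop as a flat fraction of the distance between the base's ->clk and the
requested ->expires, which simplifies the timer wheel's level-based
granularity and is not the kernel's code. HZ=1000, the starting jiffies
value, and the helper names are assumptions made for illustration; only the
8%, 12-minute, and 3-millisecond figures come from the message.

```python
HZ = 1000             # assumed tick rate (jiffies per second); not stated in the thread
SLOP_FRACTION = 0.08  # the "something like 8%" figure from the message


def worst_case_slop_jiffies(expires, base_clk, slop_fraction=SLOP_FRACTION):
    """Hypothetical helper: slop modeled as a fixed fraction of the
    distance from the base's clock to the requested expiry."""
    delta = max(expires - base_clk, 0)
    return delta * slop_fraction


def demo():
    """Compare an up-to-date base clock with one left 12 minutes behind."""
    now = 10_000_000                # arbitrary current jiffies value (assumed)
    timeout = 3 * HZ // 1000        # 3-millisecond timer, in jiffies
    expires = now + timeout         # absolute expiry is computed from jiffies

    fresh_clk = now                 # base clock that has kept up with jiffies
    stale_clk = now - 12 * 60 * HZ  # base clock not yet forwarded after 12 minutes

    fresh_s = worst_case_slop_jiffies(expires, fresh_clk) / HZ
    stale_s = worst_case_slop_jiffies(expires, stale_clk) / HZ
    return fresh_s, stale_s


if __name__ == "__main__":
    fresh_s, stale_s = demo()
    print(f"slop with current clk: {fresh_s:.6f} s")
    print(f"slop with stale clk:   {stale_s:.1f} s")
```

Under these assumptions the stale-clock case comes out at about 57.6 seconds,
which lines up with the "full minute" estimate, while the same timer against
an up-to-date clock has well under a millisecond of slop.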