From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752428AbdLDRey (ORCPT ); Mon, 4 Dec 2017 12:34:54 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:37376 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751619AbdLDRev (ORCPT ); Mon, 4 Dec 2017 12:34:51 -0500 Date: Mon, 4 Dec 2017 09:34:49 -0800 From: "Paul E. McKenney" To: anna-maria@linutronix.de Cc: tglx@linutronix.de, linux-kernel@vger.kernel.org Subject: Re: Timer refuses to expire Reply-To: paulmck@linux.vnet.ibm.com References: <20171201182529.GA6073@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20171201182529.GA6073@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 17120417-0008-0000-0000-000002AA9A0F X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00008152; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000243; SDB=6.00955385; UDB=6.00482854; IPR=6.00735422; BA=6.00005727; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00018342; XFM=3.00000015; UTC=2017-12-04 17:34:49 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17120417-0009-0000-0000-0000377F91AF Message-Id: <20171204173449.GA8327@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-12-04_06:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=1 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1712040253 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote: > Hello, Anna-Maria, > > It turned out that one set of problems was due to NO_HZ_FULL issues, > which are addressed with a pair of patches. However, there is still one > case of a timer refusing to expire in rcutorture's TREE01 test scenario. > The takes from two to four hours to trigger, and the most recent one > gives the following diagnostic (see patch at the very end of this email): > > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000 > > The message appears any time a timer for less than five jiffies takes > more than eight seconds to expire. In all cases, this expiry did not > happen naturally, but rather via an unsolicited wakeup from the RCU CPU > stall code. If I am interpreting this message correctly, base->clk > has advanced past this timer, and the timer is correctly flagged in > base->pending_map. This seems like part of the problem because the > timer's timeout was only three jiffies. However, base->clk has not > advanced for more than 20 seconds, which seems like another problem. > > What additional diagnostics would be helpful? I can capture data > at the beginning of the schedule_timeout in the timer_list structure, > and of course can print more information at the time of the wakeup. And a few more from weekend testing, on the off-chance that any of them are at all helpful. One interesting thing about the several messages starting at time 4731 is that CPU 5 doesn't seem to be expiring timers, even when new ones are posted. But at time 4891, base->clk has in fact advanced. Thanx, Paul [ 4710.845607] schedule_timeout: Waylayed timer base->clk: 0x10042fb85 jiffies: 0x100434ddc base->next_expiry: 0x10042fb85 timer->flags: 0x1400005 timer->expires 0x10042fb84 idx: 5 idx_now: e7 base->pending_map 000000000000002000000000000000000000000000000000000000100000000000000000000000000000000000010000000000000000000000000000000000000000000000000000 [ 4731.872311] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10043a000 base->next_expiry: 0x100434800 timer->flags: 0x2e000005 timer->expires 0x100434de0 idx: b8 idx_now: d1 base->pending_map 000000000000000000000000000000000100000000000000020000100000140000000000000000000000000000010000000000000000000000000000000000000000000000000000 [ 4750.334637] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10043e81e base->next_expiry: 0x100434800 timer->flags: 0x33000005 timer->expires 0x10043960b idx: cc idx_now: 13f base->pending_map 000000000000000000000000000000000000000000000000020000100014140000000000000000000000000000010000000000000000000000000000000000000000000000000000 [ 4771.443542] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100443a93 base->next_expiry: 0x100434800 timer->flags: 0x4fc00005 timer->expires 0x10043e821 idx: 13f idx_now: 104 base->pending_map 000000000000000000000000000000000000000000000000020000100014040080000000000000000000000000010000000000000000000000000000000000000000000000000000 [ 4813.336678] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10044de38 base->next_expiry: 0x100434800 timer->flags: 0x4fc00005 timer->expires 0x10043e821 idx: 13f idx_now: 10e base->pending_map 000000000000000000000000000000000000000000000000020000100014040080000000000000100000000000010000000000000000000000000000000000000000000000000000 [ 4834.345313] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100453049 base->next_expiry: 0x100434800 timer->flags: 0x41000005 timer->expires 0x100443a96 idx: 104 idx_now: 114 base->pending_map 000000000000000000000000000000000000000000000000020000100014040000000000000040100000000000010000000000000000000000000000000000000000000000000000 [ 4834.561159] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100453121 base->next_expiry: 0x100434800 timer->flags: 0x43800005 timer->expires 0x10044de3b idx: 10e idx_now: 114 base->pending_map 000000000000000000000000000000000000000000000000020000100014040000000000000040000000000000000000000000000000000000000000000000000000000000000000 [ 4891.721190] schedule_timeout: Waylayed timer base->clk: 0x10045be44 jiffies: 0x100461069 base->next_expiry: 0x10045be44 timer->flags: 0x1000005 timer->expires 0x10045be43 idx: 4 idx_now: c9 base->pending_map 000000000000001000000000000000000000000000002000008010000000000100000000000000000000000000200000000000000000000000000000000000000000000000000000 [ 4891.929320] schedule_timeout: Waylayed timer base->clk: 0x10045be44 jiffies: 0x100461139 base->next_expiry: 0x10045be44 timer->flags: 0x1000005 timer->expires 0x10045be43 idx: 4 idx_now: c9 base->pending_map 000000000000001000000000000000000000000000002000008010000000000100000000000000000000000000200000000000000000000000000000000000000000000000000000 [ 4929.131690] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10046a28b base->next_expiry: 0x10045c340 timer->flags: 0x49800005 timer->expires 0x100465039 idx: 126 idx_now: 12b base->pending_map 0000000000000000000000000000000000000000000020000080100010008401000000e0000000000000000000200000000000000000000000000000000000000000000000000000 [ 4934.785783] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10046b8a1 base->next_expiry: 0x10045c340 timer->flags: 0x49c00005 timer->expires 0x100466634 idx: 127 idx_now: 12c base->pending_map 0000000000000000000000000000000000000000000020000080100010008401000000e0000000000000000000200000000000000000000000000000000000000000000000000000 [ 4997.703037] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10047ae66 base->next_expiry: 0x10045c340 timer->flags: 0x4b000005 timer->expires 0x10046b8a4 idx: 12c idx_now: 13b base->pending_map 000000000000000000000000000000000000000000002000008010001000840100181060000000000000000000200000000000000000000000000000000000000000000000000000 [ 5018.820354] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004800e4 base->next_expiry: 0x10045c340 timer->flags: 0x4ec00005 timer->expires 0x10047ae69 idx: 13b idx_now: 101 base->pending_map 000000000000000000000000000000000000000000002000008010001000840108180060000000010000000000200000000000000000000000000000000000000000000000000000 [ 5079.106104] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10048ecde base->next_expiry: 0x10045c340 timer->flags: 0x42800005 timer->expires 0x100489a5b idx: 10a idx_now: 10f base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000084030000000000200000000000000000000000000000000000000000000000000000 [ 5142.123589] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10049e28b base->next_expiry: 0x10045c340 timer->flags: 0x43c00005 timer->expires 0x10048ece1 idx: 10f idx_now: 154 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000200000000000000000000000000000000000000000000000000000 [ 5163.235561] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004a3503 base->next_expiry: 0x10045c340 timer->flags: 0x55000005 timer->expires 0x10049e28e idx: 154 idx_now: 155 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000300000000000000000000000000000000000000000000000000000 [ 5226.136310] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004b2ab8 base->next_expiry: 0x10045c340 timer->flags: 0x55400005 timer->expires 0x1004a3506 idx: 155 idx_now: 157 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000200000000000000000000000000000000000000000000000000000 [18001.753494] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b79 base->next_expiry: 0x10045c340 timer->flags: 0x55400005 timer->expires 0x1004a5e11 idx: 155 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180020000180030000000000200000000000000000000000000000000000000000000000000000 [18001.763893] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b83 base->next_expiry: 0x10045c340 timer->flags: 0x49400005 timer->expires 0x100464e01 idx: 125 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180020000180030000000000200000000000000000000000000000000000000000000000000000 [18001.772807] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b8c base->next_expiry: 0x10045c340 timer->flags: 0x40400005 timer->expires 0x100480601 idx: 101 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180000000180030000000000200000000000000000000000000000000000000000000000000000 [18001.782538] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b96 base->next_expiry: 0x10045c340 timer->flags: 0x30000005 timer->expires 0x10045fe01 idx: c0 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180000000180000000000000200000000000000000000000000000000000000000000000000000 [18001.791954] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b9f base->next_expiry: 0x10045c340 timer->flags: 0x44000005 timer->expires 0x10048f601 idx: 110 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000180000000180000000000000200000000000000000000000000000000000000000000000000000 [18001.801725] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1ba9 base->next_expiry: 0x10045c340 timer->flags: 0x4d000005 timer->expires 0x100473e12 idx: 134 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000180000000000000000000000200000000000000000000000000000000000000000000000000000 [18001.811052] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1bb2 base->next_expiry: 0x10045c340 timer->flags: 0x3b000005 timer->expires 0x10045d601 idx: ec idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000000000000000000000000000200000000000000000000000000000000000000000000000000000 [18001.820594] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1bbc base->next_expiry: 0x10045c340 timer->flags: 0x37000005 timer->expires 0x1004636e1 idx: dc idx_now: 184 base->pending_map 000000000000000000000000000000000000000000000000000000001000800000000000000000000000000000200000000000000000000000000000000000000000000000000000