From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750876AbeAOX5i (ORCPT + 1 other); Mon, 15 Jan 2018 18:57:38 -0500 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:58028 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750716AbeAOX5g (ORCPT ); Mon, 15 Jan 2018 18:57:36 -0500 Date: Mon, 15 Jan 2018 15:57:37 -0800 From: "Paul E. McKenney" To: bigeasy@linutronix.de, fweisbec@gmail.com, linux-kernel@vger.kernel.org, hpa@zytor.com, peterz@infradead.org, mingo@kernel.org, tglx@linutronix.de, anna-maria@linutronix.de Cc: linux-tip-commits@vger.kernel.org Subject: Re: [tip:timers/urgent] timers: Unconditionally check deferrable base Reply-To: paulmck@linux.vnet.ibm.com References: <20180114224415.GX9671@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180114224415.GX9671@linux.vnet.ibm.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-GCONF: 00 x-cbid: 18011523-0036-0000-0000-000002ACE259 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00008385; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000245; SDB=6.00975584; UDB=6.00494473; IPR=6.00755512; BA=6.00005778; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00019057; XFM=3.00000015; UTC=2018-01-15 23:57:33 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18011523-0037-0000-0000-00004300D53E Message-Id: <20180115235737.GA31127@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2018-01-15_10:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1801150335 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Sun, Jan 14, 2018 at 02:44:15PM -0800, Paul E. McKenney wrote: > On Sun, Jan 14, 2018 at 02:31:07PM -0800, tip-bot for Thomas Gleixner wrote: > > Commit-ID: ed4bbf7910b28ce3c691aef28d245585eaabda06 > > Gitweb: https://git.kernel.org/tip/ed4bbf7910b28ce3c691aef28d245585eaabda06 > > Author: Thomas Gleixner > > AuthorDate: Sun, 14 Jan 2018 23:19:49 +0100 > > Committer: Thomas Gleixner > > CommitDate: Sun, 14 Jan 2018 23:25:33 +0100 > > > > timers: Unconditionally check deferrable base > > > > When the timer base is checked for expired timers then the deferrable base > > must be checked as well. This was missed when making the deferrable base > > independent of base::nohz_active. > > Very cool, thank you! I have fired up some of the dreaded TREE01 tests. And (tentatively) a change in behavior! In one case, I did not see stall warnings (at least not until the ftrace dump at test end), but just the occasional "Waylayed timer" message: [ 1316.628105] schedule_timeout: Waylayed timer base->clk: 0x1000eff23 jiffies: 0x1000f8335 base->next_expiry: 0x1000eff23 timer->flags: 0x3c800006 timer->expires 0x1000f62ca idx: f2 idx_now: 139 base->pending_map 000000080000000000000000000000000000000000000000100400002000000000000000000000000000010000000000000000000000000000000000000000000000000000000000 [ 1316.638372] Torture onoff task state: [ 1316.639774] rcu_preempt I14984 8 2 0x80000000 [ 1316.641766] Call Trace: [ 1316.642757] ? __schedule+0x33c/0x6f0 [ 1316.644109] ? trace_event_raw_event_timer_start+0x6d/0xb0 [ 1316.646042] schedule+0x37/0x90 [ 1316.647219] schedule_timeout+0x159/0x470 [ 1316.648825] ? __next_timer_interrupt+0xc0/0xc0 [ 1316.650459] rcu_gp_kthread+0x9ee/0x1090 [ 1316.652147] kthread+0xf0/0x130 [ 1316.653415] ? rcu_oom_notify+0xd0/0xd0 [ 1316.654790] ? kthread_destroy_worker+0x40/0x40 [ 1316.656470] ret_from_fork+0x1f/0x30 If I am reading the waylay message correctly, the timer was delayed by about 8.3 seconds instead of the 21 seconds required to cause an RCU CPU stall warning. I am going to rearrange the ftrace-dump code a bit and run more tests. The next one got the usual stall warning and: [ 633.772036] schedule_timeout: Waylayed timer base->clk: 0x10004c5cb jiffies: 0x10005181e base->next_expiry: 0x10004c5cb timer->flags: 0x2c00006 timer->expires 0x10004c5ca idx: b idx_now: cd base->pending_map 000000000000080000000000000000000000020000000000000020000000000000000000000000000000000000080000000000000000000000000000000000000000000000000000 [ 633.855300] Torture onoff task state: [ 633.855300] rcu_preempt I14984 8 2 0x80000000 [ 633.855300] Call Trace: [ 633.855300] ? __schedule+0x33c/0x6f0 [ 633.855300] ? _raw_spin_lock_irqsave+0x12/0x40 [ 633.855300] schedule+0x37/0x90 [ 633.855300] rcu_gp_kthread+0xad/0x1090 [ 633.855300] kthread+0xf0/0x130 [ 633.855300] ? rcu_oom_notify+0xd0/0xd0 [ 633.855300] ? kthread_destroy_worker+0x40/0x40 [ 633.855300] ret_from_fork+0x1f/0x30 And the third one also got the usual stall warning and: [ 382.482240] schedule_timeout: Waylayed timer base->clk: 0x10000322a jiffies: 0x100014232 base->next_expiry: 0x10000322a timer->flags: 0x44000001 timer->expires 0x10000f00f idx: 110 idx_now: 115 base->pending_map 000004000000000000000000000000000000000000000000000000000100000000000000000140000000000000000400000000000000000000000000000000000000000000000000 [ 382.482245] Torture onoff task state: [ 382.482246] rcu_preempt I14984 8 2 0x80000000 [ 382.482273] Call Trace: [ 382.482280] ? __schedule+0x33c/0x6f0 [ 382.482283] ? trace_event_raw_event_timer_start+0x6d/0xb0 [ 382.482285] schedule+0x37/0x90 [ 382.482287] schedule_timeout+0x159/0x470 [ 382.482289] ? __next_timer_interrupt+0xc0/0xc0 [ 382.482292] rcu_gp_kthread+0x9ee/0x1090 [ 382.482297] kthread+0xf0/0x130 [ 382.482299] ? rcu_oom_notify+0xd0/0xd0 [ 382.482301] ? kthread_destroy_worker+0x40/0x40 [ 382.482303] ret_from_fork+0x1f/0x30 I don't have enough runs to be sure, but my current (sad) guess is that the error rate is unchanged. Thanx, Paul