From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756451AbcBQFqF (ORCPT );
	Wed, 17 Feb 2016 00:46:05 -0500
Received: from e31.co.us.ibm.com ([32.97.110.149]:48877 "EHLO
	e31.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754085AbcBQFqC (ORCPT );
	Wed, 17 Feb 2016 00:46:02 -0500
X-IBM-Helo: d03dlp02.boulder.ibm.com
X-IBM-MailFrom: paulmck@linux.vnet.ibm.com
X-IBM-RcptTo: linux-kernel@vger.kernel.org
Date: Tue, 16 Feb 2016 21:45:49 -0800
From: "Paul E. McKenney"
To: Ross Green
Cc: linux-kernel@vger.kernel.org, mingo@kernel.org,
	jiangshanlai@gmail.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, Mathieu Desnoyers,
	josh@joshtriplett.org, tglx@linutronix.de, peterz@infradead.org,
	rostedt@goodmis.org, dhowells@redhat.com, Eric Dumazet,
	dvhart@linux.intel.com,
	=?iso-8859-1?Q?Fr=E9d=E9ric?= Weisbecker,
	oleg@redhat.com, pranith kumar
Subject: Re: rcu_preempt self-detected stall on CPU from 4.5-rc3, since 3.17
Message-ID: <20160217054549.GB6719@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16021705-8236-0000-0000-0000163071E5
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 09, 2016 at 09:11:55PM +1100, Ross Green wrote:
> Continued testing with the latest linux-4.5-rc3 release.
>
> Please find attached a copy of traces from dmesg:
>
> There is a lot more debug and trace data so hopefully this will shed
> some light on what might be happening here.
>
> My testing remains run a series of simple benchmarks, let that run to
> completion and then leave the system idle away with just a few daemons
> running.
>
> the self detected stalls in this instance turned up after a days run time.
> There were NO heavy artificial computational loads on the machine.

It does indeed look quiet on that dmesg for a good long time.

The following insanely crude not-for-mainline hack -might- be producing
good results in my testing.  It will take some time before I can claim
statistically different results.  But please feel free to give it a go
in the meantime.

(Thanks to Al Viro for pointing me in this direction.)

							Thanx, Paul

------------------------------------------------------------------------

commit 0c2c8d9fd1641809830a7a75f84dcad69936ef56
Author: Paul E. McKenney
Date:   Tue Feb 16 15:42:36 2016 -0800

    rcu: Crude exploratory hack

    Signed-off-by: Paul E. McKenney

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 507d0ed48b97..5928e084620d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2194,8 +2194,10 @@ static int __noreturn rcu_gp_kthread(void *arg)
 					       READ_ONCE(rsp->gpnum),
 					       TPS("fqswait"));
 			rsp->gp_state = RCU_GP_WAIT_FQS;
-			ret = wait_event_interruptible_timeout(rsp->gp_wq,
-					rcu_gp_fqs_check_wake(rsp, &gf), j);
+			ret = schedule_timeout_interruptible(j > 0 ? j : 1);
+			rcu_gp_fqs_check_wake(rsp, &gf);
+			// ret = wait_event_interruptible_timeout(rsp->gp_wq,
+			//		rcu_gp_fqs_check_wake(rsp, &gf), j);
 			rsp->gp_state = RCU_GP_DOING_FQS;
 			/* Locking provides needed memory barriers. */
 			/* If grace period done, leave loop. */