From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 2 Aug 2016 13:45:04 -0700
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Rich Felker
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, tglx@linutronix.de
Subject: Re: rcu_sched stalls in idle task introduced in pre-4.8?
Reply-To: paulmck@linux.vnet.ibm.com
References: <20160802170414.GA20083@brightrain.aerifal.cx> <20160802181636.GJ3482@linux.vnet.ibm.com> <20160802192036.GW15995@brightrain.aerifal.cx> <20160802194802.GK3482@linux.vnet.ibm.com> <20160802203217.GZ15995@brightrain.aerifal.cx>
In-Reply-To: <20160802203217.GZ15995@brightrain.aerifal.cx>
Message-Id: <20160802204504.GL3482@linux.vnet.ibm.com>

On Tue, Aug 02, 2016 at 04:32:17PM -0400, Rich Felker wrote:
> On Tue, Aug 02, 2016 at 12:48:02PM -0700, Paul E. McKenney wrote:
> > > > > The system I'm testing on is an SMP configuration of the J2 SoC on
> > > > > FPGA (see: j-core.org), and the SMP support and timer driver (most
> > > > > relevant, imo) are new, but never exhibited this problem on 4.6 or
> > > > > 4.7. A tree with all drivers merged is available in the development
> > > > > branch of git://git.libc.org/linux-sh; it's regularly rebased but
> > > > > there's a perma-tag devel-20160801 you can check out instead. Fetching
> > > > > into an existing clone of Linus's tree is recommended since the server
> > > > > has limited resources.
> > > > >
> > > > > I doubt the code from my devel branch will be useful without hardware
> > > > > to test on, though. Hopefully the issue is reproducible on other
> > > > > systems. Very low cpu speed (50 MHz) could be a factor. The problem
> > > > > seems to go away with dyntick disabled.
> > > >
> > > > Indeed, it is quite possible that RCU's self-tuning needs a bit of help
> > > > for CPUs below a few hundred MHz. As might other pieces of the kernel.
> > > > Should not be a big deal (famous last words...). However, given that
> > > > you are running HZ=100, it should be reasonably close. So first, let's
> > > > see what is going on here...
> > > >
> > > > The following dmesg line usually indicates that RCU's kernel thread is
> > > > stuck for some reason.
> > > >
> > > > [42039.188293] rcu_sched kthread starved for 2152 jiffies! g70249 c70248 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
> > > >
> > > > The RCU_GP_WAIT_FQS indicates that it is waiting for a timer to expire.
> > > > The task is in TASK_INTERRUPTIBLE state, but it should have been awakened
> > > > after only a few jiffies sleep time. The stack trace looks like there
> > > > might be trouble canceling the timer.
> > >
> > > Can you offer any advice on tracking down what it's waiting for, or
> > > what's going on with the timer? Is any arch- or clockevent-specific
> > > stuff that might be my fault involved here?
> >
> > It is in an swait_event_interruptible_timeout(), most likely waiting on
> > a one-jiffy timer, as in __mod_timer(). If I understood your earlier
> > email correctly, setting CONFIG_HZ_PERIODIC avoided this problem.
> > If my understanding is correct, it might be worth reviewing any timer
> > code that might be pushed along by the once-per-jiffy scheduler tick.
>
> Is it possible that something wacky is going on with broadcast timers?
> A dummy broadcast timer device gets initialized momentarily before the
> percpu timer for the secondary cpu is initialized, and in my
> experience these broadcast timers don't work when the real timer
> device has a cpumask limiting it to a specific cpu (which seems like a
> bug, but one I haven't had to care about since I'm not using it).

I need to defer to the timer guys on this one, adding them on CC.

> > Does the problem reproduce easily?
>
> Yes, it happens right after boot and repeats every 30-90 seconds or
> so.

Well, that at least makes it easier to test any patches!

> > A bisection might be very helpful.
>
> Bisection would require some manual work to set up because the whole
> reason I was rebasing on Linus's tree was to adapt the drivers to
> upstream infrastructure changes (the new cpuhp stuff replacing
> notifier for cpu starting). The unfortunate way it was done, each
> driver adds an enum to linux/cpuhotplug.h, so all the patches have
> gratuitous conflicts. In addition, for older revisions in Linus's
> tree, there's at least one show-stopping (hang during boot) bug that
> needs a cherry-pick to fix. There may be other small issues too. I
> don't think they're at all insurmountable, but it requires an annoying
> amount of scripting.

I had to ask! Might eventually be necessary, but let's see what we
can learn from what you currently have.

> > > > Is it possible that the timer
> > > > handler is being prevented from making progress due to heavy interrupt
> > > > load or some such?
> > >
> > > Actually the stall messages are only appearing when the system is idle
> > > and interrupts are infrequent. That's what leads me to believe it
> > > might be a situation where, on every (infrequent) wake, both the rcu
> > > task and something blocking its progress happen at roughly the same
> > > time, and the slowness of the cpu leads the rcu task to decide to wait
> > > for the next timer/schedule event and try again -- over and over. Of
> > > course this hypothesis is just based on a very high-level
> > > understanding of what's going on; I'm not sufficiently familiar with
> > > the implementation specifics to offer more.
> > >
> > > One observation I just made versus old kernels where this was not
> > > happening: I've got a lot more timer interrupts happening than before.
> > > In 6 minutes of uptime:
> > >
> > >            CPU0       CPU1
> > >  17:         38          0   AIC  17  abce0000.ethernet
> > >  18:        811          0   AIC  18  uartlite
> > >  20:        188        121   AIC  20  ipi
> > >  72:      11295       6499   AIC  72  jcore_pit
> > >
> > > That's 30+ timer interrupts per sec on cpu0. Previously it was more
> > > like 1-5 per second when idle (with dyntick).
> >
> > That is interesting, but I don't know what to make of it. RCU does depend
> > heavily on timed sleeps to make forward progress, so if timer interrupts
> > were lost, that would be a smoking gun. But you actually appear to have
> > -more- timer interrupts in the failure case rather than fewer!
>
> Could the extra interrupts be a symptom of the rcu task doing a
> very-short sleep to retry over and over?

Possibly.
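[Editor's note: a quick arithmetic check, not part of the original message. The jcore_pit counts quoted above do work out to the stated rate; given a saved copy of that /proc/interrupts line and roughly 6 minutes (~360 s) of uptime, the per-CPU rate can be computed like this:]

```shell
# Estimate per-second interrupt rates from a saved /proc/interrupts line.
# The sample line is the jcore_pit row quoted above; 6 min ~= 360 s.
uptime_secs=360
printf ' 72:      11295       6499   AIC  72  jcore_pit\n' |
awk -v up="$uptime_secs" '{ printf "%s: %.1f/s cpu0, %.1f/s cpu1\n", $NF, $2/up, $3/up }'
```

[That comes to about 31 interrupts per second on CPU0 and 18 on CPU1, matching the "30+ timer interrupts per sec on cpu0" observation.]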
One way to check for this is to set the kernel boot variables
rcutree.jiffies_till_sched_qs=10 and rcutree.jiffies_till_first_fqs=10.

> > > > > My .config and message log are below. Let me know if there are any
> > > > > other details I can provide that would be helpful.
> > > >
> > > > In case the timer issue I am suspecting is a red herring...
> > > >
> > > > RCU does tune itself for different values of HZ, but it might need more
> > > > help for 50MHz CPUs. (Although if this were the problem, I would expect
> > > > you to be seeing excessive CPU consumption by the rcu_sched kthread.)
> > > > But it might be worthwhile booting with rcutree.jiffies_till_sched_qs=10
> > > > and rcutree.jiffies_till_first_fqs=10 (or thereabouts). It might also
> > > > be necessary to use a longer RCU CPU stall warning timeout (assuming
> > > > you don't just suppress it as above), for example, by booting with (say)
> > > > rcupdate.rcu_cpu_stall_timeout=60.
> > >
> > > With that in mind, it seems plausible that the problem existed before
> > > and just went over the threshold for reporting the stall with a
> > > non-bug change going into 4.8. Would building an older known-good
> > > kernel with TREE_RCU_TRACE enabled and checking debugfs
> > > rcu/rcu_sched/rcugp for max=??? be informative?
> >
> > It might be. However, if ftrace is enabled on your platform, getting
> > a trace of the RCU-related events would be even more likely to be
> > helpful. And for that matter, the timer-related events.
>
> It looks like ftrace is supported on sh. Looks like I have some
> reading to do -- this could be very useful.

Sounds good!

> > One way to determine whether we were just on the edge on
> > the older systems would be to boot them with smaller values of
> > rcupdate.rcu_cpu_stall_timeout=60. The default values of the other two
> > boot parameters look to already be at the minimum useful value, though.
>
> CONFIG_RCU_CPU_STALL_TIMEOUT=21 seems to be what it's hitting.

Confused.
If you set CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values on older
kernels, do you see the RCU CPU stall warnings?

							Thanx, Paul
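[Editor's note, appended for context: the stall report and the configured timeout discussed above are at least self-consistent. At HZ=100, as stated earlier in the thread, the "starved for 2152 jiffies" message corresponds to roughly 21.5 seconds, just past CONFIG_RCU_CPU_STALL_TIMEOUT=21:]

```shell
# Convert the reported jiffies count to seconds at HZ=100;
# 21.52 s is just over the 21 s stall-warning timeout.
awk 'BEGIN { hz = 100; jiffies = 2152; printf "%.2f s\n", jiffies / hz }'
```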
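[Editor's note: for readers wanting to try the suggestions above, the RCU boot parameters mentioned in this thread collect into a single command-line fragment; how it gets appended to the kernel command line depends on the platform's bootloader:]

```shell
# Editorial sketch: the RCU boot parameters suggested in the thread,
# assembled into one kernel command-line fragment.
rcu_params="rcutree.jiffies_till_sched_qs=10 rcutree.jiffies_till_first_fqs=10"
rcu_params="$rcu_params rcupdate.rcu_cpu_stall_timeout=60"
echo "$rcu_params"
```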