From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Rich Felker <dalias@libc.org>
Cc: linux-kernel@vger.kernel.org, john.stultz@linaro.org, tglx@linutronix.de
Subject: Re: rcu_sched stalls in idle task introduced in pre-4.8?
Date: Tue, 2 Aug 2016 13:45:04 -0700 [thread overview]
Message-ID: <20160802204504.GL3482@linux.vnet.ibm.com> (raw)
In-Reply-To: <20160802203217.GZ15995@brightrain.aerifal.cx>
On Tue, Aug 02, 2016 at 04:32:17PM -0400, Rich Felker wrote:
> On Tue, Aug 02, 2016 at 12:48:02PM -0700, Paul E. McKenney wrote:
> > > > > The system I'm testing on is an SMP configuration of the J2 SoC on
> > > > > FPGA (see: j-core.org), and the SMP support and timer driver (most
> > > > > relevant, imo) are new, but never exhibited this problem on 4.6 or
> > > > > 4.7. A tree with all drivers merged is available in the development
> > > > > branch of git://git.libc.org/linux-sh; it's regularly rebased but
> > > > > there's a perma-tag devel-20160801 you can check out instead. Fetching
> > > > > into an existing clone of Linus's tree is recommended since the server
> > > > > has limited resources.
> > > > >
> > > > > I doubt the code from my devel branch will be useful without hardware
> > > > > to test on, though. Hopefully the issue is reproducible on other
> > > > > systems. Very low cpu speed (50 MHz) could be a factor. The problem
> > > > > seems to go away with dyntick disabled.
> > > >
> > > > Indeed, it is quite possible that RCU's self-tuning needs a bit of help
> > > > for CPUs below a few hundred MHz. As might other pieces of the kernel.
> > > > Should not be a big deal (famous last words...). However, given that
> > > > you are running HZ=100, it should be reasonably close. So first, let's
> > > > see what is going on here...
> > > >
> > > > The following dmesg line usually indicates that RCU's kernel thread is
> > > > stuck for some reason.
> > > >
> > > > [42039.188293] rcu_sched kthread starved for 2152 jiffies! g70249 c70248 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
> > > >
> > > > The RCU_GP_WAIT_FQS indicates that it is waiting for a timer to expire.
> > > > The task is in TASK_INTERRUPTIBLE state, but it should have been awakened
> > > > after only a few jiffies sleep time. The stack trace looks like there
> > > > might be trouble canceling the timer.
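As a quick sanity check on the numbers in that stall line (an editor's sketch, not part of the original mail): 2152 jiffies at the reporter's HZ=100 works out to about 21 seconds, which lines up with the default RCU CPU stall timeout of 21 seconds mentioned later in the thread.

```shell
# Rough arithmetic: convert the reported starvation time from jiffies
# to seconds, assuming HZ=100 as in the reporter's .config.
jiffies=2152
hz=100
echo $((jiffies / hz))   # whole seconds the rcu_sched kthread was starved
```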
> > >
> > > Can you offer any advice on tracking down what it's waiting for, or
> > > what's going on with the timer? Is any arch- or clockevent-specific
> > > stuff that might be my fault involved here?
> >
> > It is in an swait_event_interruptible_timeout(), most likely waiting on
> > a one-jiffy timer, as in __mod_timer(). If I understood your earlier
> > email correctly, setting CONFIG_HZ_PERIODIC avoided this problem.
> > If my understanding is correct, it might be worth reviewing any timer
> > code that might be pushed along by the once-per-jiffy scheduler tick.
>
> Is it possible that something wacky is going on with broadcast timers?

> A dummy broadcast timer device gets initialized momentarily before the
> percpu timer for the secondary cpu is initialized, and in my
> experience these broadcast timers don't work when the real timer
> device has a cpumask limiting it to a specific cpu (which seems like a
> bug, but one I haven't had to care about since I'm not using it).
I need to defer to the timer guys on this one, adding them on CC.
> > Does the problem reproduce easily?
>
> Yes, it happens right after boot and repeats every 30-90 seconds or
> so.
Well, that at least makes it easier to test any patches!
> > A bisection might be very helpful.
>
> Bisection would require some manual work to set up because the whole
> reason I was rebasing on Linus's tree was to adapt the drivers to
> upstream infrastructure changes (the new cpuhp stuff replacing
> notifier for cpu starting). The unfortunate way it was done, each
> driver adds an enum to linux/cpuhotplug.h so all the patches have
> gratuitous conflicts. In addition, for older revisions in Linus's
> tree, there's at least one show-stopping (hang during boot) bug that
> needs a cherry-pick to fix. There may be other small issues too. I
> don't think they're at all insurmountable, but it requires an annoying
> amount of scripting.
I had to ask! Might eventually be necessary, but let's see what we
can learn from what you currently have.
> > > > Is it possible that the timer
> > > > handler is being prevented from making progress due to heavy interrupt
> > > > load or some such?
> > >
> > > Actually the stall messages are only appearing when the system is idle
> > > and interrupts are infrequent. That's what leads me to believe it
> > > might be a situation where, on every (infrequent) wake, both the rcu
> > > task and something blocking its progress happen at roughly the same
> > > time, and the slowness of the cpu leads the rcu task to decide to wait
> > > for the next timer/schedule event and try again -- over and over. Of
> > > course this hypothesis is just based on a very high-level
> > > understanding of what's going on; I'm not sufficiently familiar with
> > > the implementation specifics to offer more.
> > >
> > > One observation I just made versus old kernels where this was not
> > > happening: I've got a lot more timer interrupts happening than before.
> > > In 6 minutes of uptime:
> > >
> > >            CPU0       CPU1
> > >  17:         38          0   AIC  17  abce0000.ethernet
> > >  18:        811          0   AIC  18  uartlite
> > >  20:        188        121   AIC  20  ipi
> > >  72:      11295       6499   AIC  72  jcore_pit
> > >
> > > That's 30+ timer interrupts per sec on cpu0. Previously it was more
> > > like 1-5 per second when idle (with dyntick).
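An editor's check of the quoted rate: 11295 jcore_pit interrupts on CPU0 over roughly 6 minutes of uptime does come out to just over 30 per second, so the "30+" figure is consistent with the counter.

```shell
# Interrupt rate on CPU0: 11295 jcore_pit interrupts over ~6 minutes.
awk 'BEGIN { printf "%.1f\n", 11295 / (6 * 60) }'   # interrupts/sec
```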
> >
> > That is interesting, but I don't know what to make of it. RCU does depend
> > heavily on timed sleeps to make forward progress, so if timer interrupts
> > were lost, that would be a smoking gun. But you actually appear to have
> > -more- timer interrupts in the failure case rather than fewer!
>
> Could the extra interrupts be a symptom of the rcu task doing a
> very-short sleep to retry over and over?
Possibly. One way to check for this is to set the kernel boot variables
rcutree.jiffies_till_sched_qs=10 and rcutree.jiffies_till_first_fqs=10.
> > > > > My .config and message log are below. Let me know if there are any
> > > > > other details I can provide that would be helpful.
> > > >
> > > > In case the timer issue I am suspecting is a red herring...
> > > >
> > > > RCU does tune itself for different values of HZ, but it might need more
> > > > help for 50MHz CPUs. (Although if this were the problem, I would expect
> > > > you to be seeing excessive CPU consumption by the rcu_sched kthread.)
> > > > But it might be worthwhile booting with rcutree.jiffies_till_sched_qs=10
> > > > and rcutree.jiffies_till_first_fqs=10 (or thereabouts). It might also
> > > > be necessary to use a longer RCU CPU stall warning timeout (assuming
> > > > you don't just suppress it as above), for example, by booting with (say)
> > > > rcupdate.rcu_cpu_stall_timeout=60.
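For concreteness, the three knobs Paul suggests would all go on the kernel command line together. This is only an illustrative sketch: the `console=` value is a placeholder, and the surrounding shell is just a convenient way to show the fragment; only the three `rcu*` parameters come from the mail.

```shell
# Hypothetical bootloader append line with the suggested RCU tuning knobs.
# console= is a placeholder; the rcu* parameters are the ones from the mail.
CMDLINE="console=ttySC0,115200 \
rcutree.jiffies_till_sched_qs=10 \
rcutree.jiffies_till_first_fqs=10 \
rcupdate.rcu_cpu_stall_timeout=60"

# List just the RCU-related parameters:
printf '%s\n' $CMDLINE | grep '^rcu'
```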
> > >
> > > With that in mind, it seems plausible that the problem existed before
> > > and just went over the threshold for reporting the stall with a
> > > non-bug change going into 4.8. Would building an older known-good
> > > kernel with TREE_RCU_TRACE enabled and checking debugfs
> > > rcu/rcu_sched/rcugp for max=??? be informative?
> >
> > It might be. However, if ftrace is enabled on your platform, getting
> > a trace of the RCU-related events would be even more likely to be
> > helpful. And for that matter, the timer-related events.
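The usual way to capture those events is through tracefs. The sketch below is a dry run that only prints the commands it would issue, so it can be inspected safely; the tracefs mount point is an assumption (it may instead be /sys/kernel/tracing on some systems), and on the target one would run the printed commands as root.

```shell
# Dry run: print the tracefs commands that would enable the RCU- and
# timer-related trace events. TRACEFS is the conventional debugfs mount
# point (assumption; adjust for your system).
TRACEFS=/sys/kernel/debug/tracing
for grp in rcu timer; do
    echo "echo 1 > $TRACEFS/events/$grp/enable"
done
echo "cat $TRACEFS/trace_pipe"
```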
>
> It looks like ftrace is supported on sh. Looks like I have some
> reading to do -- this could be very useful.
Sounds good!
> > One way to determine whether we were just on the edge on
> > the older systems would be to boot them with smaller values of
> > rcupdate.rcu_cpu_stall_timeout. The default values of the other two
> > boot parameters look to already be at the minimum useful value, though.
>
> CONFIG_RCU_CPU_STALL_TIMEOUT=21 seems to be what it's hitting.
Confused. If you set CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values on
older kernels, do you see the RCU CPU stall warnings?
Thanx, Paul
Thread overview: 11+ messages
2016-08-02 17:04 rcu_sched stalls in idle task introduced in pre-4.8? Rich Felker
2016-08-02 18:16 ` Paul E. McKenney
2016-08-02 19:20 ` Rich Felker
2016-08-02 19:48 ` Paul E. McKenney
2016-08-02 20:32 ` Rich Felker
2016-08-02 20:45 ` Paul E. McKenney [this message]
2016-08-03 16:16 ` Paul E. McKenney
2016-09-08 22:16 ` Rich Felker
2016-09-10 10:19 ` Paul E. McKenney
2016-09-12 15:13 ` Rich Felker
2016-09-12 15:36 ` Paul E. McKenney