linux-sh.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rich Felker <dalias@libc.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	devicetree@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-sh@vger.kernel.org, Rob Herring <robh+dt@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>
Subject: Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver
Date: Sun, 02 Oct 2016 03:59:25 +0000	[thread overview]
Message-ID: <20161002035925.GQ19318@brightrain.aerifal.cx> (raw)
In-Reply-To: <20161002000049.GP19318@brightrain.aerifal.cx>

On Sat, Oct 01, 2016 at 08:00:49PM -0400, Rich Felker wrote:
> > > > > > - During the whole sequence, hrtimer expiration times are being set to
> > > > > >   exact jiffies (@ 100 Hz), whereas before it they're quite arbitrary.
> > > > > 
> > > > > When a CPU goes into NOHZ idle and the next (timer/hrtimer) is farther out
> > > > > than the next tick, then tick_sched_timer is set to this next event which
> > > > > can be far out. So that's expected.
> > > > > 
> > > > > > - The CLOCK_MONOTONIC hrtimer times do not match up with the
> > > > > >   timestamps; they're off by about 0.087s. I assume this is just
> > > > > >   sched_clock vs clocksource time and not a big deal.
> > > > > 
> > > > > Yes. You can tell the tracer to use clock monotonic so then they should match.
> > > > > 
> > > > > > - The rcu_sched process is sleeping with timeout=1. This seems
> > > > > >   odd/excessive.
> > > > > 
> > > > > Why is that odd? That's one tick, i.e. 10ms in your case. And that's not
> > > > > the problem at all. The problem is your timer not firing, but the cpu is
> > > > > obviously either getting out of idle and then moves the tick ahead for some
> > > > > unknown reason.
> > > > 
> > > > And a one-jiffy timeout is in fact expected behavior when HZ\x100.
> > > > You have to be running HZ%0 or better to have two-jiffy timeouts,
> > > > and HZP0 or better for three-jiffy timeouts.
> > > 
> > > One possible theory I'm looking at is that the two cpus are both
> > > waking up (leaving cpu_idle_poll or cpuidle_idle_call) every jiffy
> > > with sufficient consistency that every time the rcu_gp_fqs_check_wake
> > > loop wakes up in rcu_gp_kthread, the other cpu is in cpu_idle_loop but
> > > outside the rcu_idle_enter/rcu_idle_exit range. Would this block
> > > forward process? I added an LED indicator in rcu_gp_fqs_check_wake
> > > that shows the low 2 bits of rnp->qsmask every time it's called, and
> > > under normal operation the LEDs just flash on momentarily or just one
> > > stays on for a few seconds then goes off. During a stall both are
> > > stuck on. I'm still trying to make sense of the code but my impression
> > > so far is that, on a 2-cpu machine, this is a leaf node and the 2 bits
> > > correspond directly to cpus; is that right? If so I'm a bit confused
> > > because I don't see how forward progress could ever happen if the cpu
> > > on which rcu_gp_kthread is blocking forward progress of
> > > rcu_gp_kthread.
> > 
> > No.  If the CPUs are entering and leaving idle, and if your timers
> > were waking up rcu_sched every few jiffies like it asks, then the
> > repeated idle entry/exit events would be noticed, courtesy of the atomic
> > increments of ->dynticks and the rcu_sched kthread's snapshotting and
> > checking of this value.
> 
> I don't see how rcu_sched could notice the changes if it's stuck in
> the wait loop I think it's stuck in. There is no check of ->dynticks
> in rcu_gp_fqs_check_wake. Just in case checking *gfp & RCU_GP_FLAG_FQS
> accomplishes this, I updated my LED hacks to clear the LEDs in that
> exit path (and killed the other place that could turn them back on
> from cpu_idle_loop) but I still get 2 LEDs on for 21s followed by a
> stall message.
> 
> > Even if the CPUs were always non-idle on every
> > time force_quiescent_state() is invoked, give or take the possibility
> > of counter wrap -- but even on a 32-bit system, that takes awhile.
> 
> Perhaps force_quiescent_state is not getting invoked? Does that sound
> plausible, and if so, how should I go about debugging it? I see it is
> called from the stall reporting code, so that's presumably what's
> breaking the stalls.

I can confirm that force_quiescent_state is not being called at all
except from the stall handler. Where is it supposed to be called from?
I can't find any code paths to it except the stall handler and
__call_rcu_core, but the latter seems to only be called when adding
new rcu callbacks, not as a response to a stalled rcu_sched thread.
Maybe I'm missing something but this seems like incorrect logic in the
rcu subsystem.

Rich

  reply	other threads:[~2016-10-02  3:59 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1474693319.git.dalias@libc.org>
     [not found] ` <22c1ee0f908fe3bf8b70f5e87d659ceb29af1434.1474693319.git.dalias@libc.org>
     [not found]   ` <22c1ee0f908fe3bf8b70f5e87d659ceb29af1434.1474693319.git.dalias-8zAoT0mYgF4@public.gmane.org>
2016-09-26 21:07     ` [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver Rich Felker
2016-09-26 21:27       ` Daniel Lezcano
     [not found]         ` <4b02ba7d-4a31-297a-bbbd-be26da615e7b-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
2016-09-26 22:23           ` Rich Felker
2016-09-26 23:55             ` Thomas Gleixner
2016-09-27  0:42               ` Rich Felker
2016-09-27 22:08                 ` Rich Felker
2016-09-30 13:15                   ` Thomas Gleixner
2016-09-30 13:48                     ` Paul E. McKenney
2016-10-01 17:05                       ` Rich Felker
2016-10-01 17:58                         ` Paul E. McKenney
     [not found]                           ` <20161001175837.GP14933-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-10-02  0:00                             ` Rich Felker
2016-10-02  3:59                               ` Rich Felker [this message]
2016-10-02  5:59                                 ` Paul E. McKenney
     [not found]                               ` <20161002000049.GP19318-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
2016-10-02  6:30                                 ` Paul E. McKenney
2016-10-08  1:32                     ` Rich Felker
2016-10-08 11:32                       ` Thomas Gleixner
2016-10-08 16:26                         ` Rich Felker
     [not found]                           ` <20161008162614.GY19318-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
2016-10-08 17:03                             ` Thomas Gleixner
2016-10-09  1:28                               ` Rich Felker
2016-10-09  9:14                                 ` Thomas Gleixner
2016-10-09 14:35                                   ` Rich Felker
2016-10-03 22:10         ` Rich Felker
     [not found]           ` <20161003221039.GR19318-C3MtFaGISjmo6RMmaWD+6Sb1p8zYI1N1@public.gmane.org>
2016-10-04  7:06             ` Paul E. McKenney
     [not found]               ` <20161004070623.GM14933-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2016-10-04 20:58                 ` Rich Felker
2016-10-04 21:14                   ` Paul E. McKenney
2016-10-04 21:48                     ` Rich Felker
2016-10-05 12:41                       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161002035925.GQ19318@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=daniel.lezcano@linaro.org \
    --cc=devicetree@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=robh+dt@kernel.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).