Re: RCU NOHZ, tsc, and clock_gettime

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: John Stultz <johnstul@us.ibm.com>
To: Prarit Bhargava <prarit@redhat.com>
Cc: Linux Kernel <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	paulmck@linux.vnet.ibm.com
Subject: Re: RCU NOHZ, tsc, and clock_gettime
Date: Thu, 11 Oct 2012 12:51:44 -0700	[thread overview]
Message-ID: <50772350.1070903@us.ibm.com> (raw)
In-Reply-To: <5077157A.7060401@redhat.com>

On 10/11/2012 11:52 AM, Prarit Bhargava wrote:
> I've been tracking an odd bug that may involve the RCU NOHZ code and
> just want to know if you have any ideas on debugging and/or what might be
> wrong.  Note the bug happens on *BOTH* upstream and the current RHEL6 tree.
> The data in this email is from running on RHEL6 because that's what I happen
> to be running ATM.  The result, however, is _identical_ to that of linux.git
> latest.
>
> The attached program compares userspace TSC reads to the time returned from
> the REALTIME_CLOCK[1].  The test does the following
>
> read tsc1
> get REALTIME_CLOCK value
> read tsc2
>
> and then does a comparison between the tsc read and the REALTIME_CLOCK value
> to see if they are in sync with each other.
>
> [I'm leaving out the guts of the analysis here.  It is sufficient to show
> examples of "good" data and "bad" data IMO.]
>
> On a good run, we see little variance in between the values:
>
>      0   144       0.1
>      1   138       1.8
>      2   147      -2.9
[snip]
>     29   144      -0.6
> n: 30, slope: 0.50 (1.99 GHz), dev: 1.1 ns, max: 2.9 ns
>
>
> On a bad run, there is a lot of variance between the values:
>
>      0   144    -346.0
>      1   138    1410.8
>      2   138    -806.9
>      3   141    4006.6
>      4   147   -3996.1
[snip]
>     29   141     -50.3
> n: 30, slope: 0.50 (1.99 GHz), dev: 1231.4 ns, max: 4006.6 ns

Do you see the same noisy variance when instead of doing:
     rdtsc()
     clock_gettime()
     rdtsc()

you do:
     clock_gettime()
     clock_gettime()

And calculate the delta of the timestamp results?

Also does this behavior change if you select different clocksources on 
the system?

> It was noted by the bug reporter that specifying "nohz=off" resolved the
> problem.  I tested with "nohz=off" and AFAICT it fixes the issue.  I started
> out debugging by assuming that delays in the c-state transitions were not being
> properly accounted for in the timing calculations.
>
> I ran a baseline test on an unmodified kernel (with no extra boot options) and
> confirmed that powertop shows the CPUs entering deep c-states while the test was
> running for 300 runs.
>
> I then instrumented the PM QoS and the power management code (specifically
> cpuidle).  I put in a large # of printk's to monitor the CPU transitions, and
> monitored the power states via powertop in order to verify that the system was
> behaving correctly wrt PM QoS.
>
> If you modify the tstsc script to run 300 times with this modified kernel, and
> run powertop in the middle of the script, you will see that the processors do
> NOT enter deep c-states.  **This means that PM QoS is doing its job correctly**.

So its not clear here,  do you see the same noisier latencies when using 
PM QoS to limit deep c-states?

Finally, how many cpus are on the machine you see this with?  Does the 
effect go away with maxcpus=1?

It looks like the TSC clocksource is being used, just because the 
latencies are so low, but Is the same clocksource being used in all of 
these cases?

One possibility is that if the cpu we're doing our timekeeping 
accumulation on is different then the one running the test, we might go 
into deeper idle for longer periods of time. Then when we accumulate 
time, we have more then a single tick to accumulate and that might 
require holding the timekeeper/xtime lock for longer times.

And the max 2.9ns variance seems particularly low, given that we do call 
update_vsyscall every so often, and that should block clock_gettime() 
callers while we update the vsyscall data.  Could it be that the test is 
too short to see the locking effect, so you're just getting lucky, and 
that adding nohz is jostling the regularity of the execution so you then 
see the lock wait times?  If you increase the samples and sample loops 
by 1000 does that change the behavior?

thanks
-john

next prev parent reply	other threads:[~2012-10-11 19:55 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-11 18:52 RCU NOHZ, tsc, and clock_gettime Prarit Bhargava
2012-10-11 19:51 ` John Stultz [this message]
2012-10-11 20:21   ` Paul E. McKenney
2012-10-12 15:40     ` Prarit Bhargava
2012-10-15 14:56       ` Paul E. McKenney
2012-11-12 23:27       ` John Stultz
2012-11-13 15:18         ` Prarit Bhargava
2012-10-12 18:27     ` Prarit Bhargava
2012-10-14 22:51       ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50772350.1070903@us.ibm.com \
    --to=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=prarit@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox