From: John Stultz <john.stultz@linaro.org>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Don Zickus <dzickus@redhat.com>
Subject: Re: [RFC PATCH 6/6] timekeeping: Debug missing timekeeping updates
Date: Wed, 21 Aug 2013 10:25:57 -0700
Message-ID: <5214F825.8010504@linaro.org>
In-Reply-To: <1377103341-15235-7-git-send-email-fweisbec@gmail.com>
On 08/21/2013 09:42 AM, Frederic Weisbecker wrote:
> With the full dynticks feature and the tricky full system idle
> detection code that is coming soon, it becomes necessary to have
> some debug code that makes sure that the timekeeping is always
> maintained and moving forward as expected.
>
> This provides a simple detection of missing timekeeping updates
> inspired by the lockup detector's use of CPU cycles clock.
>
> The jiffies are compared to the cpu clock after several snapshots taken
> from NMIs that trigger after arbitrary CPU cycles period overflow.
>
> If the jiffies progression appears to drift too far away from the CPU
> clock's, this triggers a warning.
>
> We just make sure not to account the tiny code on irq entry that
> may have stale jiffies values before tick_check_nohz() is called
> after the CPU is woken up while the system went full idle for some
> time.
>
> Same goes for idle exit in case the tick was stopped but idle
> was polling on need_resched().
So you're using sched_clock to try to detect timekeeping
inconsistencies. Hrm.. Do you have some examples of where this debug
infrastructure helped out?
A few thoughts:
1) Why are you using jiffies as the timekeeping reference instead of
reading some of the actual timekeeping values? Jiffies usage has been
intentionally on the decline, and since the dynticks infrastructure
landed, jiffies are just derived from the timekeeping core, so it's
sort of strange to see them used for this.
2) This seems very similar to the old lost-ticks compensation code we
had prior to the clocksource infrastructure, and seems like it might
suffer from some of the issues seen there. For instance, sched_clock has
historically been looser in its correctness requirements than the
timekeeping code, so using it to validate the stricter timekeeping
code makes me worry we might see false positives.
3) I'm also curious (maybe skeptical): if sched_clock is reliable
enough to use for validating time, then we are likely using that same
hardware as the timekeeping clocksource. Thus the cases where I'd
expect to see issues w/ nohz, like clocksource counter overflows
being missed on quick-wrapping clocksources, wouldn't really apply.
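
To make points 1) and 2) concrete, here is a minimal user-space sketch of a drift check that compares a timekeeping value (rather than jiffies) against a reference clock. This is not the code from the patch: struct clock_snapshot, timekeeping_lags() and the tolerance parameter are hypothetical names, and the caller is assumed to fill in both fields from whatever clocks it trusts. The tolerance is where the false-positive concern shows up: if the reference clock is looser than the timekeeping code, the threshold has to absorb that slack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of clock readings taken at (nearly) the same instant. */
struct clock_snapshot {
	uint64_t ref_ns;	/* reference clock, e.g. a sched_clock()-like value */
	uint64_t tk_ns;		/* timekeeping clock, e.g. a ktime_get()-like value */
};

/*
 * Return true if, between two snapshots, the timekeeping clock advanced
 * less than the reference clock by more than tolerance_ns.
 * Timekeeping running ahead of the reference is not reported here.
 */
static bool timekeeping_lags(const struct clock_snapshot *prev,
			     const struct clock_snapshot *now,
			     uint64_t tolerance_ns)
{
	uint64_t ref_delta = now->ref_ns - prev->ref_ns;
	uint64_t tk_delta = now->tk_ns - prev->tk_ns;

	if (tk_delta >= ref_delta)
		return false;

	return ref_delta - tk_delta > tolerance_ns;
}
```

With a 1 ms tolerance, a 10 ms reference interval in which timekeeping only advanced 4 ms would be flagged, while the same interval with a 6 ms tolerance would not.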
Personally, I've been thinking the timekeeping update code could use
some improvements/warnings around cases where the update delay is larger
than the clocksource max_deferment - possibly falling back to a slower
overflow-proof multiply as is done in the CLOCK_SOURCE_SUSPEND_NONSTOP
resume case. This would allow more robust behavior in cases like kvm
guests being paused for unreasonable lengths of time, and could also
provide very similar NOHZ debug warnings (assuming the clocksource
doesn't wrap quickly - but again, in those cases, I'm not confident we
can trust sched_clock either).
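
As a rough illustration of the overflow-proof multiply idea, the sketch below converts a cycle delta to nanoseconds with the usual mult/shift scheme and falls back to a chunked conversion when the delta is too large for the 64-bit product. It is a user-space sketch under stated assumptions, not the kernel's implementation: struct cs_params, cyc2ns_fast() and cyc2ns_safe() are hypothetical names, and max_cycles is assumed to be the largest delta for which delta * mult still fits in 64 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few clocksource fields used here. */
struct cs_params {
	uint32_t mult;		/* cycles -> ns multiplier */
	uint32_t shift;		/* cycles -> ns right shift */
	uint64_t max_cycles;	/* largest delta where delta * mult fits in 64 bits */
};

/* Fast path: only correct while delta <= max_cycles. */
static uint64_t cyc2ns_fast(const struct cs_params *cs, uint64_t delta)
{
	return (delta * cs->mult) >> cs->shift;
}

/*
 * Slow path: convert whole max_cycles-sized chunks first, then the
 * remainder, so no intermediate product exceeds 64 bits. Each chunk
 * truncates independently, so the result can be slightly lower than an
 * exact wide multiply would give.
 */
static uint64_t cyc2ns_safe(const struct cs_params *cs, uint64_t delta)
{
	uint64_t ns = 0;

	if (delta > cs->max_cycles) {
		uint64_t chunks = delta / cs->max_cycles;

		ns = cyc2ns_fast(cs, cs->max_cycles) * chunks;
		delta -= chunks * cs->max_cycles;
	}

	return ns + cyc2ns_fast(cs, delta);
}
```

A caller could also treat the `delta > max_cycles` branch as the place to emit a warning, since reaching it means an update was deferred longer than the fast path can represent.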
thanks
-john
Thread overview: 11+ messages
2013-08-21 16:42 [RFC PATCH 0/6] timekeeping: Missing timekeeping update detection Frederic Weisbecker
2013-08-21 16:42 ` [RFC PATCH 1/6] sched: Let arch tell us if sched clock is NMI-safe Frederic Weisbecker
2013-08-21 16:42 ` [RFC PATCH 2/6] x86: nsecs to cycles conversion Frederic Weisbecker
2013-08-21 18:26 ` Don Zickus
2013-08-30 10:35 ` Frederic Weisbecker
2013-08-21 16:42 ` [RFC PATCH 3/6] x86: Tell that sched clock is callable in nmi Frederic Weisbecker
2013-08-21 16:42 ` [RFC PATCH 4/6] seqlock: Add raw_seqbegin() for non-waiting readers Frederic Weisbecker
2013-08-21 16:42 ` [RFC PATCH 5/6] jiffies: Add jiffies_to_nsecs Frederic Weisbecker
2013-08-21 16:42 ` [RFC PATCH 6/6] timekeeping: Debug missing timekeeping updates Frederic Weisbecker
2013-08-21 17:25 ` John Stultz [this message]
2013-08-30 11:05 ` Frederic Weisbecker