From: Shaohua Li <shli@fb.com>
To: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
lkml <linux-kernel@vger.kernel.org>,
Prarit Bhargava <prarit@redhat.com>,
Richard Cochran <richardcochran@gmail.com>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Ingo Molnar <mingo@kernel.org>
Subject: Re: [PATCH 8/9] clocksource: Improve unstable clocksource detection
Date: Mon, 17 Aug 2015 19:57:06 -0700 [thread overview]
Message-ID: <20150818025704.GA1129225@devbig257.prn2.facebook.com> (raw)
In-Reply-To: <CALAqxLWOzkbbFeAUKrE9B=O786vuJVmWQdagbfxkpLJDm2mXwQ@mail.gmail.com>
On Mon, Aug 17, 2015 at 03:17:28PM -0700, John Stultz wrote:
> On Mon, Aug 17, 2015 at 3:04 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Mon, 17 Aug 2015, John Stultz wrote:
> >
> >> From: Shaohua Li <shli@fb.com>
> >>
> >> >From time to time we saw TSC is marked as unstable in our systems, while
> >
> > Stray '>'
> >
> >> the CPUs declare to have stable TSC. Looking at the clocksource unstable
> >> detection, there are two problems:
> >> - watchdog clock source wrap. HPET is the most common watchdog clock
> >> source. It's 32-bit and runs in 14.3Mhz. That means the hpet counter
> >> can wrap in about 5 minutes.
> >> - threshold isn't scaled against interval. The threshold is 0.0625s in
> >> 0.5s interval. What if the actual interval is bigger than 0.5s?
> >>
> >> The watchdog runs in a timer bh, so hard/soft irq can defer its running.
> >> Heavy network stack softirq can hog a cpu. IPMI driver can disable
> >> interrupt for a very long time.
> >
> > And they hold off the timer softirq for more than a second? Don't you
> > think that's the problem which needs to be fixed?
>
> Though this is an issue I've experienced (and tried unsuccessfully to
> fix in a more complicated way) with the RT kernel, where high priority
> tasks blocked the watchdog long enough that we'd disqualify the TSC.
>
> Ideally that sort of high-priority RT busyness would be avoided, but
> its also a pain to have false positive trigger when doing things like
> stress testing.
>
>
> >> The first problem is mostly we are suffering I think.
> >
> > So you think that's the root cause and because your patch makes it go
> > away it's not necessary to know for sure, right?
> >
> >> Here is a simple patch to fix the issues. If the waterdog doesn't run
> >
> > waterdog?
>
> Allergen-free. :)
>
>
> >> for a long time, we ignore the detection.
> >
> > What's 'long time'? Please explain the numbers chosen.
> >
> >> This should work for the two
> >
> > Emphasis on 'should'?
> >
> >> problems. For the second one, we probably doen't need to scale if the
> >> interval isn't very long.
> >
> > -ENOPARSE
> >
> >> @@ -122,9 +122,10 @@ static int clocksource_watchdog_kthread(void *data);
> >> static void __clocksource_change_rating(struct clocksource *cs, int rating);
> >>
> >> /*
> >> - * Interval: 0.5sec Threshold: 0.0625s
> >> + * Interval: 0.5sec MaxInterval: 1s Threshold: 0.0625s
> >> */
> >> #define WATCHDOG_INTERVAL (HZ >> 1)
> >> +#define WATCHDOG_MAX_INTERVAL_NS (NSEC_PER_SEC)
> >> #define WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4)
> >>
> >> static void clocksource_watchdog_work(struct work_struct *work)
> >> @@ -217,7 +218,9 @@ static void clocksource_watchdog(unsigned long data)
> >> continue;
> >>
> >> /* Check the deviation from the watchdog clocksource. */
> >> - if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
> >> + if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) &&
> >> + cs_nsec < WATCHDOG_MAX_INTERVAL_NS &&
> >> + wd_nsec < WATCHDOG_MAX_INTERVAL_NS) {
> >
> > So that adds a new opportunity for undiscovered wreckage:
> >
> > clocksource_watchdog();
> > .... <--- SMI skews TSC
> > looong_irq_disabled_region();
> > ....
> > clocksource_watchdog(); <--- Does not detect skew
> >
> > and it will not detect it later on if that SMI was a one time event.
> >
> > So 'fixing' the watchdog is the wrong approach. Fixing the stuff which
> > prevents the watchdog to run is the proper thing to do.
>
> I'm not sure here. I feel like these delay-caused false positives
> (I've seen similar reports w/ VMs being stalled) are more common then
> one-off SMI TSC skews.
>
> There are hard lines in the timekeeping code, where we do say: Don't
> delay us past X or we can't really handle it, but in this case, the
> main clocksource is fine and the limit is being caused by the
> watchdog. So I think some sort of a solution to remove this
> restriction would be good. We don't want to needlessly punish fine
> hardware because our checks for bad hardware add extra restrictions.
>
> That said, I agree the "should"s and other vague qualifiers in the
> commit description you point out should have more specifics to back
> things up. And I'm fine delaying this (and the follow-on) patch until
> those details are provided.
It's not something I guess. We do see the issue from time to time. The
IPMI driver accesses some IO ports in softirq and hog cpu for a very
long time, then the watchdog alert. The false alert on the other hand
has very worse effect. It forces to use HPET as clocksource, which has
very big performance penality. We can't even manually switch back to TSC
as current interface doesn't allow us to do it, then we can only reboot
the system. I agree the driver should be fixed, but the watchdog has
false alert, we definitively should fix it.
The 1s interval is arbitary. If you think there is better way to fix the
issue, please let me know.
Thanks,
Shaohua
next prev parent reply other threads:[~2015-08-18 2:57 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-08-17 20:40 [PATCH 0/9] Time items for 4.3 John Stultz
2015-08-17 20:40 ` [PATCH 1/9] timer_list: Add the base offset so remaining nsecs are accurate for non monotonic timers John Stultz
2015-08-17 21:01 ` Shuah Khan
2015-08-17 21:04 ` Shuah Khan
2015-08-17 21:05 ` John Stultz
2015-08-17 20:40 ` [PATCH 2/9] time: Fix nanosecond file time rounding in timespec_trunc() John Stultz
2015-08-17 22:14 ` Thomas Gleixner
2015-08-17 20:40 ` [PATCH 3/9] time: Always make sure wall_to_monotonic isn't positive John Stultz
2015-08-17 20:40 ` [PATCH 4/9] time: Add the common weak version of update_persistent_clock() John Stultz
2015-08-17 20:40 ` [PATCH 5/9] time: Introduce struct itimerspec64 John Stultz
2015-08-17 20:41 ` [PATCH 6/9] time: Introduce current_kernel_time64() John Stultz
2015-08-17 20:41 ` [PATCH 7/9] time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64() John Stultz
2015-08-17 20:41 ` [PATCH 8/9] clocksource: Improve unstable clocksource detection John Stultz
2015-08-17 22:04 ` Thomas Gleixner
2015-08-17 22:17 ` John Stultz
2015-08-18 2:57 ` Shaohua Li [this message]
2015-08-18 3:39 ` John Stultz
2015-08-18 8:57 ` Thomas Gleixner
2015-08-18 8:38 ` Thomas Gleixner
2015-08-18 17:49 ` John Stultz
2015-08-18 19:28 ` Thomas Gleixner
2015-08-18 20:11 ` John Stultz
2015-08-18 20:18 ` Thomas Gleixner
2015-08-26 17:15 ` Shaohua Li
2015-08-31 21:12 ` Shaohua Li
2015-08-31 21:47 ` Thomas Gleixner
2015-08-31 22:39 ` Shaohua Li
2015-09-01 17:13 ` Thomas Gleixner
2015-09-01 18:14 ` Shaohua Li
2015-09-01 18:55 ` Thomas Gleixner
2015-09-01 19:35 ` Steven Rostedt
2015-09-02 6:50 ` Peter Zijlstra
2015-08-17 20:41 ` [PATCH 9/9] clocksource: Sanity check watchdog clocksource John Stultz
2015-08-17 21:24 ` Thomas Gleixner
2015-08-17 22:03 ` John Stultz
2015-08-17 22:08 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150818025704.GA1129225@devbig257.prn2.facebook.com \
--to=shli@fb.com \
--cc=daniel.lezcano@linaro.org \
--cc=john.stultz@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=prarit@redhat.com \
--cc=richardcochran@gmail.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox