From: John Stultz <johnstul@us.ibm.com>
To: Frans Pop <elendil@planet.nl>, Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>,
linux-s390@vger.kernel.org, Roman Zippel <zippel@linux-m68k.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [BUG,2.6.28,s390] Fails to boot in Hercules S/390 emulator - hang traced
Date: Mon, 23 Mar 2009 15:19:00 -0700
Message-ID: <1237846740.7068.15.camel@jstultz-laptop>
In-Reply-To: <200903230111.08814.elendil@planet.nl>
On Mon, 2009-03-23 at 01:11 +0100, Frans Pop wrote:
> On Wednesday 18 March 2009, John Stultz wrote:
> > > Ever increasing error with tod on 2.6.28.8 (with Martin's patch
> > > applied):
> > > 0.672655! timekeeping: clock source changed from jiffies to tod (shift: 12)
> > > 0.676889! tod/12 (150): xtime.tv: 1237377507/55524946 -> 1237377507/55524947
> > > 0.677020! clock->xtime: 0 -> -4096, error: 0 -> -4294967296
> > > 0.680788! tod/12 (151): xtime.tv: 1237377507/55524947 -> 1237377507/55524948
> [...]
> > > 491.860765! tod/12 (37189): xtime.tv: 1237377998/55561985 -> 1237377998/55561986
> > > 491.860886! clock->xtime: -4096 -> -4096, error: -159081293676544 -> -159085588643840
> >
> > Hrm. Is the box otherwise working ok? The TOD clock should not be
> > affected by the second issue (one shot mode) discussed.
>
> Yes, the box^Wsystem works fine. I've now also seen the eventual correction
> of the error in action: after 35 mins of uptime clock->multi changed from
> 1000 to 999 (with tod).
>
> So the only issue left, though only indirectly related to the hang, is
> the initial behavior with clocksource jiffies where clocksource_bigadjust
> gets called every time update_wall_time is called (I've confirmed that).
>
> And possibly the cleanup change of clock->xtime_nsec to S64.
>
> I'll happily leave those to you as I readily admit my understanding of the
> whole timekeeping thing is still very limited. But if you'd like patches
> tested, feel free to CC me.
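
The figures in the quoted log can be checked by hand. The sketch below
assumes the error is kept with 32 fractional bits (NTP_SCALE_SHIFT) and
that the leftover clock->xtime value of -4096 is scaled by 2^(32 - shift)
on every update; error_step() and error_after() are hypothetical helper
names for illustration, not kernel functions.

```c
#include <stdint.h>

/* Assumption: error is stored in units of 2^-32 ns. */
#define NTP_SCALE_SHIFT 32

/* Error added by one update when xtime_nsec is left at this remainder. */
static int64_t error_step(int64_t xtime_nsec, int shift)
{
	/* Multiply instead of shifting, since the remainder is negative. */
	return xtime_nsec * ((int64_t)1 << (NTP_SCALE_SHIFT - shift));
}

/* Accumulated error after a number of identical updates. */
static int64_t error_after(int64_t updates, int64_t xtime_nsec, int shift)
{
	return updates * error_step(xtime_nsec, shift);
}
```

With shift 12 and a remainder of -4096, one update contributes
-4294967296, and the 37040 updates from number 150 through 37189 add up
to -159085588643840, which matches the last quoted line.
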
Here's the fix for tick_handle_periodic() tripping into an infinite
loop. Again, this was only triggered because the divide error skewed
jiffies enough that the clock-steering code increased the ns-per-jiffy
conversion value to the point where any slack we had in the loop was
lost.

Fixing the divide issue avoids the problem (and is pretty important to
get upstream), but the underlying issue, that we allow ONESHOT
clockevent mode to be used while the jiffies clocksource is in use, is
a concern.

Thomas had pointed out that ppc and other arches that do not have
PERIODIC mode clockevents don't trip over this, but I believe this has
been just luck so far. We do not enable clocksource switching until
bootup is almost finished (to avoid clocksource churn), so after
interrupts are enabled, but before clocksource switching is allowed,
there is a chance (albeit a very small one) that clock steering could
cause a similar problem on other arches.

Thomas, what do you think about this? With it s390 runs fine even
without the do_div() fix.

thanks
-john

The following patch avoids an endless loop by requiring that a
highres-valid clocksource be installed before we call tick_periodic()
in a loop when using ONESHOT mode. The result is that we will only
increment jiffies once per interrupt until a continuous hardware
clocksource is available.

Without this, we can run into an endless loop: on each pass through the
loop jiffies is updated, which increments time by tick_period or more
(due to clock steering). That can make the event programming think the
next event lies before the newly incremented time and fail, so
tick_periodic() is called again and the whole process loops forever.

Signed-off-by: John Stultz <johnstul@us.ibm.com>

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 21a5ca8..83c4417 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -93,7 +93,17 @@ void tick_handle_periodic(struct clock_event_device *dev)
 	for (;;) {
 		if (!clockevents_program_event(dev, next, ktime_get()))
 			return;
-		tick_periodic(cpu);
+		/*
+		 * Have to be careful here. If we're in oneshot mode,
+		 * before we call tick_periodic() in a loop, we need
+		 * to be sure we're using a real hardware clocksource.
+		 * Otherwise we could get trapped in an infinite
+		 * loop, as the tick_periodic() increments jiffies,
+		 * which will then increment time, possibly causing
+		 * the loop to trigger again and again.
+		 */
+		if (timekeeping_valid_for_hres())
+			tick_periodic(cpu);
 		next = ktime_add(next, tick_period);
 	}
 }
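
To see why the guard breaks the cycle, here is a small userspace model
of the loop. It is a sketch, not kernel code: struct sim,
program_event(), tick() and handle_periodic() are made-up names, and the
10 ms tick period is an assumed value. The model only captures the point
made above, that with jiffies as the clocksource, time moves forward
only when the tick itself runs.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICK_PERIOD_NS 10000000LL	/* assumed 100 Hz tick */

/* Hypothetical model state, not a kernel structure. */
struct sim {
	int64_t now_ns;		/* time as seen by event programming */
	int64_t ns_per_jiffy;	/* time added per jiffy (after steering) */
	bool hw_clocksource;	/* continuous hardware clocksource in use? */
};

/* Programming an event fails unless the expiry lies in the future. */
static int program_event(const struct sim *s, int64_t next_ns)
{
	return next_ns <= s->now_ns ? -1 : 0;
}

/* With jiffies as the clocksource, the tick is what advances time. */
static void tick(struct sim *s)
{
	if (!s->hw_clocksource)
		s->now_ns += s->ns_per_jiffy;
}

/*
 * Model of the retry loop. Returns the number of failed programming
 * attempts before success, or -1 if the limit is hit (a hang in the
 * real handler, which has no limit).
 */
static int handle_periodic(struct sim *s, int64_t next_ns, bool guarded,
			   int limit)
{
	for (int i = 0; i < limit; i++) {
		if (program_event(s, next_ns) == 0)
			return i;
		if (!guarded || s->hw_clocksource)
			tick(s);
		next_ns += TICK_PERIOD_NS;
	}
	return -1;
}
```

Starting from now_ns = 25000000, next_ns = 10000000 and a steered
ns_per_jiffy of 10000100, the unguarded loop never catches up, because
every retry moves time at least as far as it moves the next event. With
the guard, time stays put while next advances, and the third attempt
succeeds.
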