* [PATCH] Fix a NO_IDLE_HZ timer bug
@ 2006-05-18 20:48 Zachary Amsden
2006-05-19 9:26 ` Martin Schwidefsky
0 siblings, 1 reply; 2+ messages in thread
From: Zachary Amsden @ 2006-05-18 20:48 UTC (permalink / raw)
To: Linux Kernel Mailing List, Andrew Morton, schwidefsky, george,
Xen-devel
[-- Attachment #1: Type: text/plain, Size: 1221 bytes --]
This bug affects at least s390, Xen, and occurred during testing of
NO_IDLE_HZ with our VMI patches. The bug is rather subtle and rare.
Jiffies can be advanced by other CPUs, but if the current CPU has not
processed timer softirqs in a while, the timer wheel can be behind
jiffies, causing an overflow when comparing the next timer wheel
expiration with the default high resolution timeout (no timeout), which
is relative to jiffies.
It also looks like s390 has another bug. When compiling the 32-bit
kernel, doesn't this computation overflow:
arch/s390/kernel/time.c, stop_hz_timer:274
/*
* This cpu is going really idle. Set up the clock comparator
* for the next event.
*/
next = next_timer_interrupt();
do {
seq = read_seqbegin_irqsave(&xtime_lock, flags);
timer = (__u64)(next - jiffies) + jiffies_64;
} while (read_seqretry_irqrestore(&xtime_lock, seq, flags));
Since jiffies can advance between next_timer_interrupt and the read
under xtime lock, next-jiffies could be negative. I would think you
want to cast that to signed long instead of __u64, but I'm not totally
qualified to talk about s390.
Zach
[-- Attachment #2: no-idle-hz-timer-race --]
[-- Type: text/plain, Size: 1792 bytes --]
Under certain timing conditions, a race during boot occurs where timer ticks
are being processed on remote CPUs. The remote timer ticks can increment
jiffies, and if this happens during a window when a timeout is very close to
expiring but a local tick has not yet been delivered, you can end up with
1) No softirq pending
2) A local timer wheel which is not synced to jiffies
3) No high resolution timer active
4) A local timer which is supposed to fire before the current jiffies value.
In this circumstance, the comparison in next_timer_interrupt overflows, because
the base of the comparison for high resolution timers is jiffies, but for the
softirq timer wheel, it is relative the the current base of the wheel
(jiffies_base).
Signed-off-by: Zachary Amsden <zach@vmware.com>
Index: linux-2.6.17-rc/kernel/timer.c
===================================================================
--- linux-2.6.17-rc.orig/kernel/timer.c 2006-05-18 13:32:22.000000000 -0700
+++ linux-2.6.17-rc/kernel/timer.c 2006-05-18 13:34:59.000000000 -0700
@@ -541,6 +541,22 @@ found:
}
spin_unlock(&base->lock);
+ /*
+ * It can happen that other CPUs service timer IRQs and increment
+ * jiffies, but we have not yet got a local timer tick to process
+ * the timer wheels. In that case, the expiry time can be before
+ * jiffies, but since the high-resolution timer here is relative to
+ * jiffies, the default expression when high-resolution timers are
+ * not active,
+ *
+ * time_before(MAX_JIFFY_OFFSET + jiffies, expires)
+ *
+ * would falsely evaluate to true. If that is the case, just
+ * return jiffies so that we can immediately fire the local timer
+ */
+ if (time_before(expires, jiffies))
+ return jiffies;
+
if (time_before(hr_expires, expires))
return hr_expires;
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [PATCH] Fix a NO_IDLE_HZ timer bug
2006-05-18 20:48 [PATCH] Fix a NO_IDLE_HZ timer bug Zachary Amsden
@ 2006-05-19 9:26 ` Martin Schwidefsky
0 siblings, 0 replies; 2+ messages in thread
From: Martin Schwidefsky @ 2006-05-19 9:26 UTC (permalink / raw)
To: Zachary Amsden
Cc: Linux Kernel Mailing List, Andrew Morton, george, Xen-devel
On Thu, 2006-05-18 at 13:48 -0700, Zachary Amsden wrote:
> It also looks like s390 has another bug. When compiling the 32-bit
> kernel, doesn't this computation overflow:
>
> arch/s390/kernel/time.c, stop_hz_timer:274
>
> /*
> * This cpu is going really idle. Set up the clock comparator
> * for the next event.
> */
> next = next_timer_interrupt();
> do {
> seq = read_seqbegin_irqsave(&xtime_lock, flags);
> timer = (__u64)(next - jiffies) + jiffies_64;
> } while (read_seqretry_irqrestore(&xtime_lock, seq, flags));
>
>
> Since jiffies can advance between next_timer_interrupt and the read
> under xtime lock, next-jiffies could be negative. I would think you
> want to cast that to signed long instead of __u64, but I'm not totally
> qualified to talk about s390.
Seems like you are qualified to talk about s390 in this case. The
extension of (next - jiffies) to a 64 bit value needs to be done as a
signed extension, follow by a cast to u64. Blech. I think to cast next
and jiffies to u64 before subtracting them is cleaner. It takes a few
more cycles because we now do two 64 bit adds/subtracts but the code is
used for going idle so it doesn't matter. Patch attached, thanks Zach.
--
blue skies,
Martin.
Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH
"Reality continues to ruin my life." - Calvin.
--
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
[patch] s390: next_timer_interrupt overflow in stop_hz_timer.
The 32 bit unsigned substraction (next - jiffies) in stop_hz_timer
can overflow if jiffies gets advanced between next_timer_interrupt
and the read under the xtime lock. The cast to a u64 then results
in a large value which causes the cpu to wait too long.
Fix this by casting next and jiffies independently to u64 before
subtracting them.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---
arch/s390/kernel/time.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
diff -urpN linux-2.6/arch/s390/kernel/time.c linux-2.6-patched/arch/s390/kernel/time.c
--- linux-2.6/arch/s390/kernel/time.c 2006-05-16 09:44:29.000000000 +0200
+++ linux-2.6-patched/arch/s390/kernel/time.c 2006-05-19 11:04:04.000000000 +0200
@@ -272,7 +272,7 @@ static inline void stop_hz_timer(void)
next = next_timer_interrupt();
do {
seq = read_seqbegin_irqsave(&xtime_lock, flags);
- timer = (__u64)(next - jiffies) + jiffies_64;
+ timer = (__u64 next) - (__u64 jiffies) + jiffies_64;
} while (read_seqretry_irqrestore(&xtime_lock, seq, flags));
todval = -1ULL;
/* Be careful about overflows. */
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2006-05-19 9:26 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-18 20:48 [PATCH] Fix a NO_IDLE_HZ timer bug Zachary Amsden
2006-05-19 9:26 ` Martin Schwidefsky
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox