* [PATCH 0/3] [PATCH] more sched_clock updates
@ 2008-07-09 4:15 Steven Rostedt
2008-07-09 4:15 ` [PATCH 1/3] sched_clock: only update deltas with local reads Steven Rostedt
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Steven Rostedt @ 2008-07-09 4:15 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Andrew Morton,
john stultz, linux-kernel
ftrace is still seeing jumps in the sched_clock.c code that is not seen in
using sched_clock() function itself. The following patches fix some of
the issues that I've seen. There is still a slight drift but all my attempts
to remove it completely have failed. Patch 3 does the best sofar at
keeping the jumps to a minimum.
-- Steve
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 1/3] sched_clock: only update deltas with local reads.
2008-07-09 4:15 [PATCH 0/3] [PATCH] more sched_clock updates Steven Rostedt
@ 2008-07-09 4:15 ` Steven Rostedt
2008-07-09 4:15 ` [PATCH 2/3] sched_clock: record TSC after gtod Steven Rostedt
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2008-07-09 4:15 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Andrew Morton,
john stultz, linux-kernel
Cc: Steven Rostedt
[-- Attachment #1: sched_clock-update-only-local-cpu.patch --]
[-- Type: text/plain, Size: 2264 bytes --]
Reading the CPU clock should try to stay accurate within the CPU.
By reading the CPU clock from another CPU and updating the deltas can
cause unneeded jumps when reading from the local CPU.
This patch changes the code to update the last read TSC only when read
from the local CPU.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/sched_clock.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
Index: linux-tip.git/kernel/sched_clock.c
===================================================================
--- linux-tip.git.orig/kernel/sched_clock.c 2008-07-07 19:29:40.000000000 -0400
+++ linux-tip.git/kernel/sched_clock.c 2008-07-08 11:03:05.000000000 -0400
@@ -124,7 +124,7 @@ static int check_max(struct sched_clock_
* - filter out backward motion
* - use jiffies to generate a min,max window to clip the raw values
*/
-static void __update_sched_clock(struct sched_clock_data *scd, u64 now)
+static void __update_sched_clock(struct sched_clock_data *scd, u64 now, u64 *time)
{
unsigned long now_jiffies = jiffies;
long delta_jiffies = now_jiffies - scd->tick_jiffies;
@@ -162,8 +162,12 @@ static void __update_sched_clock(struct
if (unlikely(clock < min_clock))
clock = min_clock;
- scd->prev_raw = now;
- scd->clock = clock;
+ if (time)
+ *time = clock;
+ else {
+ scd->prev_raw = now;
+ scd->clock = clock;
+ }
}
static void lock_double_clock(struct sched_clock_data *data1,
@@ -207,15 +211,18 @@ u64 sched_clock_cpu(int cpu)
now -= scd->tick_gtod;
__raw_spin_unlock(&my_scd->lock);
+
+ __update_sched_clock(scd, now, &clock);
+
+ __raw_spin_unlock(&scd->lock);
+
} else {
__raw_spin_lock(&scd->lock);
+ __update_sched_clock(scd, now, NULL);
+ clock = scd->clock;
+ __raw_spin_unlock(&scd->lock);
}
- __update_sched_clock(scd, now);
- clock = scd->clock;
-
- __raw_spin_unlock(&scd->lock);
-
return clock;
}
@@ -234,7 +241,7 @@ void sched_clock_tick(void)
now_gtod = ktime_to_ns(ktime_get());
__raw_spin_lock(&scd->lock);
- __update_sched_clock(scd, now);
+ __update_sched_clock(scd, now, NULL);
/*
* update tick_gtod after __update_sched_clock() because that will
* already observe 1 new jiffy; adding a new tick_gtod to that would
--
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 2/3] sched_clock: record TSC after gtod
2008-07-09 4:15 [PATCH 0/3] [PATCH] more sched_clock updates Steven Rostedt
2008-07-09 4:15 ` [PATCH 1/3] sched_clock: only update deltas with local reads Steven Rostedt
@ 2008-07-09 4:15 ` Steven Rostedt
2008-07-09 4:15 ` [PATCH 3/3] sched_clock: and multiplier for TSC to gtod drift Steven Rostedt
2008-07-11 13:56 ` [PATCH 0/3] [PATCH] more sched_clock updates Ingo Molnar
3 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2008-07-09 4:15 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Andrew Morton,
john stultz, linux-kernel
Cc: Steven Rostedt
[-- Attachment #1: sched_clock-reverse-order-raw-gtod.patch --]
[-- Type: text/plain, Size: 953 bytes --]
To read the gtod we need to grab the xtime lock for read. Reading the gtod
before the TSC can cause a bigger gab if the xtime lock is contended.
This patch simply reverses the order to read the TSC after the gtod.
The locking in the reading of the gtod handles any barriers one might
think is needed.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/sched_clock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-tip.git/kernel/sched_clock.c
===================================================================
--- linux-tip.git.orig/kernel/sched_clock.c 2008-07-08 11:03:05.000000000 -0400
+++ linux-tip.git/kernel/sched_clock.c 2008-07-08 16:53:34.000000000 -0400
@@ -237,8 +237,8 @@ void sched_clock_tick(void)
WARN_ON_ONCE(!irqs_disabled());
- now = sched_clock();
now_gtod = ktime_to_ns(ktime_get());
+ now = sched_clock();
__raw_spin_lock(&scd->lock);
__update_sched_clock(scd, now, NULL);
--
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH 3/3] sched_clock: and multiplier for TSC to gtod drift
2008-07-09 4:15 [PATCH 0/3] [PATCH] more sched_clock updates Steven Rostedt
2008-07-09 4:15 ` [PATCH 1/3] sched_clock: only update deltas with local reads Steven Rostedt
2008-07-09 4:15 ` [PATCH 2/3] sched_clock: record TSC after gtod Steven Rostedt
@ 2008-07-09 4:15 ` Steven Rostedt
2008-07-11 13:56 ` [PATCH 0/3] [PATCH] more sched_clock updates Ingo Molnar
3 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2008-07-09 4:15 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Andrew Morton,
john stultz, linux-kernel
Cc: Steven Rostedt
[-- Attachment #1: sched-clock-drift-clock.patch --]
[-- Type: text/plain, Size: 4538 bytes --]
The sched_clock code currently tries to keep all CPU clocks of all CPUS
somewhat in sync. At every clock tick it records the gtod clock and
uses that and jiffies and the TSC to calculate a CPU clock that tries to
stay in sync with all the other CPUs.
ftrace depends heavily on this timer and it detects when this timer
"jumps". One problem is that the TSC and the gtod also drift.
When the TSC is 0.1% faster or slower than the gtod it is very noticeable
in ftrace. To help compensate for this, I've added a multiplier that
tries to keep the CPU clock updating at the same rate as the gtod.
I've tried various ways to get it to be in sync and this ended up being
the most reliable. At every scheduler tick we calculate the new multiplier:
multi = delta_gtod / delta_TSC
This means we perform a 64 bit divide at the tick (once a HZ). A shift
is used to handle the accuracy.
Other methods that failed due to dynamic HZ are:
(not used) multi += (gtod - tsc) / delta_gtod
(not used) multi += (gtod - (last_tsc + delta_tsc)) / delta_gtod
as well as other variants.
This code still allows for a slight drift between TSC and gtod, but
it keeps the damage down to a minimum.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
---
kernel/sched_clock.c | 40 +++++++++++++++++++++++++++++++++++++---
1 file changed, 37 insertions(+), 3 deletions(-)
Index: linux-tip.git/kernel/sched_clock.c
===================================================================
--- linux-tip.git.orig/kernel/sched_clock.c 2008-07-08 22:14:36.000000000 -0400
+++ linux-tip.git/kernel/sched_clock.c 2008-07-08 22:49:48.000000000 -0400
@@ -3,6 +3,9 @@
*
* Copyright (C) 2008 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
*
+ * Updates and enhancements:
+ * Copyright (C) 2008 Red Hat, Inc. Steven Rostedt <srostedt@redhat.com>
+ *
* Based on code by:
* Ingo Molnar <mingo@redhat.com>
* Guillaume Chazarain <guichaz@gmail.com>
@@ -32,6 +35,11 @@
#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
+#define MULTI_SHIFT 15
+/* Max is double, Min is 1/2 */
+#define MAX_MULTI (2LL << MULTI_SHIFT)
+#define MIN_MULTI (1LL << (MULTI_SHIFT-1))
+
struct sched_clock_data {
/*
* Raw spinlock - this is a special case: this might be called
@@ -45,6 +53,7 @@ struct sched_clock_data {
u64 tick_raw;
u64 tick_gtod;
u64 clock;
+ s64 multi;
#ifdef CONFIG_NO_HZ
int check_max;
#endif
@@ -79,6 +88,7 @@ void sched_clock_init(void)
scd->tick_raw = 0;
scd->tick_gtod = ktime_now;
scd->clock = ktime_now;
+ scd->multi = 1 << MULTI_SHIFT;
#ifdef CONFIG_NO_HZ
scd->check_max = 1;
#endif
@@ -134,8 +144,13 @@ static void __update_sched_clock(struct
WARN_ON_ONCE(!irqs_disabled());
- min_clock = scd->tick_gtod +
- (delta_jiffies ? delta_jiffies - 1 : 0) * TICK_NSEC;
+ /*
+ * At schedule tick the clock can be just under the gtod. We don't
+ * want to push it too prematurely.
+ */
+ min_clock = scd->tick_gtod + (delta_jiffies * TICK_NSEC);
+ if (min_clock > TICK_NSEC)
+ min_clock -= TICK_NSEC / 2;
if (unlikely(delta < 0)) {
clock++;
@@ -149,6 +164,9 @@ static void __update_sched_clock(struct
*/
max_clock = scd->tick_gtod + (2 + delta_jiffies) * TICK_NSEC;
+ delta *= scd->multi;
+ delta >>= MULTI_SHIFT;
+
if (unlikely(clock + delta > max_clock) && check_max(scd)) {
if (clock < max_clock)
clock = max_clock;
@@ -230,6 +248,7 @@ void sched_clock_tick(void)
{
struct sched_clock_data *scd = this_scd();
unsigned long now_jiffies = jiffies;
+ s64 mult, delta_gtod, delta_raw;
u64 now, now_gtod;
if (unlikely(!sched_clock_running))
@@ -247,9 +266,23 @@ void sched_clock_tick(void)
* already observe 1 new jiffy; adding a new tick_gtod to that would
* increase the clock 2 jiffies.
*/
- scd->tick_jiffies = now_jiffies;
+ delta_gtod = now_gtod - scd->tick_gtod;
+ delta_raw = now - scd->tick_raw;
+
+ if ((long)delta_raw > 0) {
+ mult = delta_gtod << MULTI_SHIFT;
+ do_div(mult, delta_raw);
+ scd->multi = mult;
+ if (scd->multi > MAX_MULTI)
+ scd->multi = MAX_MULTI;
+ else if (scd->multi < MIN_MULTI)
+ scd->multi = MIN_MULTI;
+ } else
+ scd->multi = 1 << MULTI_SHIFT;
+
scd->tick_raw = now;
scd->tick_gtod = now_gtod;
+ scd->tick_jiffies = now_jiffies;
__raw_spin_unlock(&scd->lock);
}
@@ -279,6 +312,7 @@ void sched_clock_idle_wakeup_event(u64 d
__raw_spin_lock(&scd->lock);
scd->prev_raw = now;
scd->clock += delta_ns;
+ scd->multi = 1 << MULTI_SHIFT;
__raw_spin_unlock(&scd->lock);
touch_softlockup_watchdog();
--
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 0/3] [PATCH] more sched_clock updates
2008-07-09 4:15 [PATCH 0/3] [PATCH] more sched_clock updates Steven Rostedt
` (2 preceding siblings ...)
2008-07-09 4:15 ` [PATCH 3/3] sched_clock: and multiplier for TSC to gtod drift Steven Rostedt
@ 2008-07-11 13:56 ` Ingo Molnar
3 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2008-07-11 13:56 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, Thomas Gleixner, Andrew Morton, john stultz,
linux-kernel, Dmitry Adamushko
* Steven Rostedt <rostedt@goodmis.org> wrote:
> ftrace is still seeing jumps in the sched_clock.c code that is not
> seen in using sched_clock() function itself. The following patches fix
> some of the issues that I've seen. There is still a slight drift but
> all my attempts to remove it completely have failed. Patch 3 does the
> best sofar at keeping the jumps to a minimum.
cool stuff! I've created a new -tip topic branch for your sched-clock
fixes and enhancements, under tip/sched/clock, and have picked up your
patches:
Steven Rostedt (7):
sched_clock: record from last tick
sched_clock: widen the max and min time
sched_clock: stop maximum check on NO HZ
sched_clock: fix calculation of other CPU
sched_clock: only update deltas with local reads.
sched_clock: record TSC after gtod
sched_clock: and multiplier for TSC to gtod drift
thanks,
Ingo
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2008-07-11 13:59 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-09 4:15 [PATCH 0/3] [PATCH] more sched_clock updates Steven Rostedt
2008-07-09 4:15 ` [PATCH 1/3] sched_clock: only update deltas with local reads Steven Rostedt
2008-07-09 4:15 ` [PATCH 2/3] sched_clock: record TSC after gtod Steven Rostedt
2008-07-09 4:15 ` [PATCH 3/3] sched_clock: and multiplier for TSC to gtod drift Steven Rostedt
2008-07-11 13:56 ` [PATCH 0/3] [PATCH] more sched_clock updates Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox