* [PATCH] Close small window for vsyscall time inconsistencies
@ 2008-04-04 20:17 john stultz
2008-04-07 0:55 ` Roman Zippel
2008-04-07 11:27 ` Andi Kleen
0 siblings, 2 replies; 7+ messages in thread
From: john stultz @ 2008-04-04 20:17 UTC (permalink / raw)
To: Thomas Gleixner, Paul Mackerras, Tony Luck, Ingo Molnar; +Cc: lkml
So Thomas and Ingo pointed out to me that they were occasionally seeing
small 1ns inconsistencies from clock_gettime() (and more rarely, 1us
inconsistencies from gettimeofday() when the 1ns inconsistency occurred
on a us boundary)
Looking over the code, the only possible reason I could find would be
from an interaction with the vsyscall code.
In update_wall_time(), if we read the hardware at time A and start
accumulating time, and adjusting the clocksource frequency, slowing it
for ntp.
Right before we call update_vsyscall(), another processor makes a
vsyscall gettimeofday call, reading the hardware at time B, but using
the clocksource frequency and offsets from pre-time A.
The update_vsyscall then runs, and updates the clocksource frequency
with a slower frequency.
Another processor immediately calls vsyscall gettimeofday, reading the
hardware (lets imagine its very slow hardware) at time B (or very
shortly there after), and then uses the post-time A clocksource
frequency which has been slowed.
Since we're using basically the same hardware value B, but using
different frequencies, its possible for a very small 1ns time
inconsistency to occur.
The solution, is to block the vsyscall reads prior to the hardware read
at time A. Basically synchronizing the vsyscall gettimeofday calls with
the entire update_wall_time function() rather then just the
update_vsyscall call.
Since each update_vsyscall is implemented in an arch specific way, we've
added a update_vsyscall_lock/unlock function pair that can be called
from arch generic code.
Paul, Tony: I'd appreciate a careful review for the non-x86_64 bits, as
I'm less familiar with the subtleties there.
Build tested for x86_64, powerpc, ia64, and alpha.
thanks
-john
Signed-off-by: John Stultz <johnstul@us.ibm.com>
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 17fda52..bc76c2e 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -349,11 +349,22 @@ void update_vsyscall_tz(void)
{
}
-void update_vsyscall(struct timespec *wall, struct clocksource *c)
+/* update_vsyscall_lock/unlock:
+ * methods for timekeeping core to block vsyscalls during update
+ */
+void update_vsyscall_lock(unsigned long *flags)
{
- unsigned long flags;
+ write_seqlock_irqsave(&fsyscall_gtod_data.lock, *flags);
+}
- write_seqlock_irqsave(&fsyscall_gtod_data.lock, flags);
+void update_vsyscall_unlock(unsigned long *flags)
+{
+ write_sequnlock_irqrestore(&fsyscall_gtod_data.lock, *flags);
+}
+
+/* Assumes fsyscall_gtod_data.lock has been taken via update_vsyscall_lock() */
+void update_vsyscall(struct timespec *wall, struct clocksource *c)
+{
/* copy fsyscall clock data */
fsyscall_gtod_data.clk_mask = c->mask;
@@ -375,7 +386,5 @@ void update_vsyscall(struct timespec *wall, struct clocksource *c)
fsyscall_gtod_data.monotonic_time.tv_nsec -= NSEC_PER_SEC;
fsyscall_gtod_data.monotonic_time.tv_sec++;
}
-
- write_sequnlock_irqrestore(&fsyscall_gtod_data.lock, flags);
}
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3b26fbd..c51d2f8 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -456,8 +456,6 @@ static inline void update_gtod(u64 new_tb_stamp, u64 new_stamp_xsec,
vdso_data->tb_to_xs = new_tb_to_xs;
vdso_data->wtom_clock_sec = wall_to_monotonic.tv_sec;
vdso_data->wtom_clock_nsec = wall_to_monotonic.tv_nsec;
- smp_wmb();
- ++(vdso_data->tb_update_count);
}
#ifdef CONFIG_SMP
@@ -801,6 +799,23 @@ static cycle_t timebase_read(void)
return (cycle_t)get_tb();
}
+/* update_vsyscall_lock/unlock:
+ * methods for timekeeping core to block vsyscalls during update
+ */
+void update_vsyscall_lock(unsigned long *flags)
+{
+ /* Make userspace gettimeofday spin until we're done. */
+ ++vdso_data->tb_update_count;
+ smp_mb();
+}
+
+void update_vsyscall_unlock(unsigned long *flags)
+{
+ smp_wmb();
+ ++(vdso_data->tb_update_count);
+}
+
+/* Assumes update_vsyscall_lock() has been called */
void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
{
u64 t2x, stamp_xsec;
@@ -808,10 +823,6 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
if (clock != &clocksource_timebase)
return;
- /* Make userspace gettimeofday spin until we're done. */
- ++vdso_data->tb_update_count;
- smp_mb();
-
/* XXX this assumes clock->shift == 22 */
/* 4611686018 ~= 2^(20+64-22) / 1e9 */
t2x = (u64) clock->mult * 4611686018ULL;
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index edff4c9..8a2eb77 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -69,11 +69,22 @@ void update_vsyscall_tz(void)
write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}
-void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
+/* update_vsyscall_lock/unlock:
+ * methods for timekeeping core to block vsyscalls during update
+ */
+void update_vsyscall_lock(unsigned long *flags)
{
- unsigned long flags;
+ write_seqlock_irqsave(&vsyscall_gtod_data.lock, *flags);
+}
- write_seqlock_irqsave(&vsyscall_gtod_data.lock, flags);
+void update_vsyscall_unlock(unsigned long *flags)
+{
+ write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, *flags);
+}
+
+/* Assumes vsyscall_gtod_data.lock has been taken via update_vsyscall_lock() */
+void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
+{
/* copy vsyscall data */
vsyscall_gtod_data.clock.vread = clock->vread;
vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
@@ -83,7 +94,6 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
vsyscall_gtod_data.wall_to_monotonic = wall_to_monotonic;
- write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}
/* RED-PEN may want to readd seq locking, but then the variable should be
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 85778a4..78d9bc0 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -221,9 +221,19 @@ extern void clocksource_change_rating(struct clocksource *cs, int rating);
extern void clocksource_resume(void);
#ifdef CONFIG_GENERIC_TIME_VSYSCALL
+void update_vsyscall_lock(unsigned long *flags);
+void update_vsyscall_unlock(unsigned long *flags);
extern void update_vsyscall(struct timespec *ts, struct clocksource *c);
extern void update_vsyscall_tz(void);
#else
+static inline void update_vsyscall_lock(unsigned long *flags)
+{
+}
+
+static inline void update_vsyscall_unlock(unsigned long *flags)
+{
+}
+
static inline void update_vsyscall(struct timespec *ts, struct clocksource *c)
{
}
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index a3fa587..47ca292 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -129,7 +129,7 @@ EXPORT_SYMBOL(do_gettimeofday);
*/
int do_settimeofday(struct timespec *tv)
{
- unsigned long flags;
+ unsigned long flags, vflags;
time_t wtm_sec, sec = tv->tv_sec;
long wtm_nsec, nsec = tv->tv_nsec;
@@ -137,6 +137,7 @@ int do_settimeofday(struct timespec *tv)
return -EINVAL;
write_seqlock_irqsave(&xtime_lock, flags);
+ update_vsyscall_lock(&vflags);
nsec -= __get_nsec_offset();
@@ -152,6 +153,7 @@ int do_settimeofday(struct timespec *tv)
update_vsyscall(&xtime, clock);
+ update_vsyscall_unlock(&vflags);
write_sequnlock_irqrestore(&xtime_lock, flags);
/* signal hrtimers about time change */
@@ -442,12 +444,15 @@ static void clocksource_adjust(s64 offset)
*/
void update_wall_time(void)
{
+ unsigned long flags;
cycle_t offset;
/* Make sure we're fully resumed: */
if (unlikely(timekeeping_suspended))
return;
+ /* grab the vsyscall lock to block vsyscalls during update */
+ update_vsyscall_lock(&flags);
#ifdef CONFIG_GENERIC_TIME
offset = (clocksource_read(clock) - clock->cycle_last) & clock->mask;
#else
@@ -487,6 +492,7 @@ void update_wall_time(void)
/* check to see if there is a new clocksource to use */
change_clocksource();
update_vsyscall(&xtime, clock);
+ update_vsyscall_unlock(&flags);
}
/**
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] Close small window for vsyscall time inconsistencies
2008-04-04 20:17 [PATCH] Close small window for vsyscall time inconsistencies john stultz
@ 2008-04-07 0:55 ` Roman Zippel
2008-04-07 6:27 ` Thomas Gleixner
2008-04-07 11:27 ` Andi Kleen
1 sibling, 1 reply; 7+ messages in thread
From: Roman Zippel @ 2008-04-07 0:55 UTC (permalink / raw)
To: john stultz; +Cc: Thomas Gleixner, Paul Mackerras, Tony Luck, Ingo Molnar, lkml
Hi,
On Friday 4. April 2008, john stultz wrote:
> So Thomas and Ingo pointed out to me that they were occasionally seeing
> small 1ns inconsistencies from clock_gettime() (and more rarely, 1us
> inconsistencies from gettimeofday() when the 1ns inconsistency occurred
> on a us boundary)
What does inconsistency mean?
> Looking over the code, the only possible reason I could find would be
> from an interaction with the vsyscall code.
>
> In update_wall_time(), if we read the hardware at time A and start
> accumulating time, and adjusting the clocksource frequency, slowing it
> for ntp.
>
> Right before we call update_vsyscall(), another processor makes a
> vsyscall gettimeofday call, reading the hardware at time B, but using
> the clocksource frequency and offsets from pre-time A.
>
> The update_vsyscall then runs, and updates the clocksource frequency
> with a slower frequency.
>
> Another processor immediately calls vsyscall gettimeofday, reading the
> hardware (lets imagine its very slow hardware) at time B (or very
> shortly there after), and then uses the post-time A clocksource
> frequency which has been slowed.
>
> Since we're using basically the same hardware value B, but using
> different frequencies, its possible for a very small 1ns time
> inconsistency to occur.
One thing to keep in mind here is that if update_wall_time() adjusts the
frequency at time A, the time is still the same after the frequency change at
this point.
This means on the same cpu the time keeps increasing, if the update on another
cpu is now delayed due to update_vsyscall() at time B, it's possible that
there is a small time jump at this time, but in the common case it should be
quite small to be even noticable, e.g. if the frequency is changed by 1us/s
and it takes 1ms for the update the jump is 1ns and IMO that is already a
lot.
I'm not saying that it's impossible that it results in a visible problem, but
I think it should be rather rare. NTP frequency should be quite rare, at most
every 16s and in standard configurations every 64s (over time even less).
Inbetween these updates NTP changes its frequency very slowly. That leaves
the clock frequency when it tries to match the NTP frequency, if you really
see that large frequency changes, it suggest that something else is quite
wrong, e.g. if the clock code has a problem to hold a halfway steady
frequency, this should be fixed first.
So instead of shooting in the dark, I'd suggest to collect some numbers first,
which support your theory. This starts with the NTP logs and then try add
some stats to the adjustment code to see by how much the clock frequency is
changed (e.g. the min/max/last mult values and the same for the number of
cycles until update_vsyscall() is called).
bye, Roman
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Close small window for vsyscall time inconsistencies
2008-04-07 0:55 ` Roman Zippel
@ 2008-04-07 6:27 ` Thomas Gleixner
2008-04-07 10:02 ` Roman Zippel
2008-04-07 11:29 ` Andi Kleen
0 siblings, 2 replies; 7+ messages in thread
From: Thomas Gleixner @ 2008-04-07 6:27 UTC (permalink / raw)
To: Roman Zippel; +Cc: john stultz, Paul Mackerras, Tony Luck, Ingo Molnar, lkml
On Mon, 7 Apr 2008, Roman Zippel wrote:
> Hi,
>
> On Friday 4. April 2008, john stultz wrote:
>
> > So Thomas and Ingo pointed out to me that they were occasionally seeing
> > small 1ns inconsistencies from clock_gettime() (and more rarely, 1us
> > inconsistencies from gettimeofday() when the 1ns inconsistency occurred
> > on a us boundary)
>
> What does inconsistency mean?
Time is going backwards by 1us/1nsec.
> So instead of shooting in the dark, I'd suggest to collect some
> numbers first,
That's what we did and John's analysis of the problem is pretty
correct.
Thanks,
tglx
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Close small window for vsyscall time inconsistencies
2008-04-07 6:27 ` Thomas Gleixner
@ 2008-04-07 10:02 ` Roman Zippel
2008-04-07 11:29 ` Andi Kleen
1 sibling, 0 replies; 7+ messages in thread
From: Roman Zippel @ 2008-04-07 10:02 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: john stultz, Paul Mackerras, Tony Luck, Ingo Molnar, lkml
Hi,
On Mon, 7 Apr 2008, Thomas Gleixner wrote:
> > So instead of shooting in the dark, I'd suggest to collect some
> > numbers first,
>
> That's what we did and John's analysis of the problem is pretty
> correct.
How about sharing this information?
I'm not saying that his analysis is incorrect, I just suspect it's
incomplete, so I would simply like to see a little more information.
Is that too much to ask?
bye, Roman
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Close small window for vsyscall time inconsistencies
2008-04-04 20:17 [PATCH] Close small window for vsyscall time inconsistencies john stultz
2008-04-07 0:55 ` Roman Zippel
@ 2008-04-07 11:27 ` Andi Kleen
2008-04-07 16:32 ` John Stultz
1 sibling, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2008-04-07 11:27 UTC (permalink / raw)
To: john stultz; +Cc: Thomas Gleixner, Paul Mackerras, Tony Luck, Ingo Molnar, lkml
john stultz <johnstul@us.ibm.com> writes:
>
> The update_vsyscall then runs, and updates the clocksource frequency
> with a slower frequency.
In the new code TSC should be only used on fixed-pstate TSC systems
(cpufreq always marks it unstable), so that cannot really happen.
Jiri had his patchkit to fix that, but it never got merged.
-Andi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Close small window for vsyscall time inconsistencies
2008-04-07 6:27 ` Thomas Gleixner
2008-04-07 10:02 ` Roman Zippel
@ 2008-04-07 11:29 ` Andi Kleen
1 sibling, 0 replies; 7+ messages in thread
From: Andi Kleen @ 2008-04-07 11:29 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Roman Zippel, john stultz, Paul Mackerras, Tony Luck, Ingo Molnar,
lkml
Thomas Gleixner <tglx@linutronix.de> writes:
>
>> So instead of shooting in the dark, I'd suggest to collect some
>> numbers first,
>
> That's what we did and John's analysis of the problem is pretty
> correct.
The source of the problem is that RDTSC is not always 100% sync
right?
We debugged a similar problem a long time ago and in that case
it was the CPU speculating around the RDTSC. That was stopped
by adding the CPUIDs to sync the core.
I would double check that the CPUIDs are still executed as needed
on the systems showing the issue.
(the code to turn that on and off is somewhat subtle and breaks occasionally)
Also it was assumed at some point it wasn't needed on P4, but that turned
out to be wrong later. Perhaps the enable logic is still not quite
right.
Or perhaps the CPUIDs need to be moved inside or outside the
seqlocks?
-Andi
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Close small window for vsyscall time inconsistencies
2008-04-07 11:27 ` Andi Kleen
@ 2008-04-07 16:32 ` John Stultz
0 siblings, 0 replies; 7+ messages in thread
From: John Stultz @ 2008-04-07 16:32 UTC (permalink / raw)
To: Andi Kleen; +Cc: Thomas Gleixner, Paul Mackerras, Tony Luck, Ingo Molnar, lkml
On Mon, 2008-04-07 at 13:27 +0200, Andi Kleen wrote:
> john stultz <johnstul@us.ibm.com> writes:
> >
> > The update_vsyscall then runs, and updates the clocksource frequency
> > with a slower frequency.
>
> In the new code TSC should be only used on fixed-pstate TSC systems
> (cpufreq always marks it unstable), so that cannot really happen.
Sorry for being confusing here. This is not about hardware (TSC or
other) frequency changes, but the modification of the software's
representation of the frequency (as done w/ NTP adjustments).
thanks
-john
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2008-04-07 16:32 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-04 20:17 [PATCH] Close small window for vsyscall time inconsistencies john stultz
2008-04-07 0:55 ` Roman Zippel
2008-04-07 6:27 ` Thomas Gleixner
2008-04-07 10:02 ` Roman Zippel
2008-04-07 11:29 ` Andi Kleen
2008-04-07 11:27 ` Andi Kleen
2008-04-07 16:32 ` John Stultz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox