* [PATCH 0/2] Make the TSC safe to be used by gettimeofday().
@ 2006-11-14 1:06 Suleiman Souhlal
2006-11-14 1:08 ` [PATCH 1/2] " Suleiman Souhlal
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Suleiman Souhlal @ 2006-11-14 1:06 UTC (permalink / raw)
To: Andi Kleen; +Cc: Linux Kernel ML
I've had a proof-of-concept for this since August, and finally got around to
somewhat cleaning it up.
It can certainly still be improved, namely by using vgetcpu() instead of CPUID
to find the cpu number (but I couldn't get it to work, when I tried).
Another possible improvement would be to use RDTSCP when available.
There's also a small race in do_gettimeofday(), vgettimeofday() and
vmonotonic_clock() but I've never seen it happen.
Suggestions are welcome.
-- Suleiman
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH 1/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 1:06 [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Suleiman Souhlal @ 2006-11-14 1:08 ` Suleiman Souhlal 2006-11-14 2:05 ` Andi Kleen 2006-11-14 1:09 ` [PATCH 2/2] Introduce a vmonotonic_clock() vsyscall Suleiman Souhlal 2006-11-14 1:50 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Andi Kleen 2 siblings, 1 reply; 18+ messages in thread From: Suleiman Souhlal @ 2006-11-14 1:08 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux Kernel ML This is done by a per-cpu vxtime structure that stores the last TSC and HPET values. Whenever we switch to a userland process after a HLT instruction has been executed or after the CPU frequency has changed, we force a new read of the TSC, HPET and xtime so that we know the correct frequency we have to deal with. With this, we can safely use RDTSC in gettimeofday() in CPUs where the TSCs are not synchronized, such as Opterons, instead of doing a very expensive HPET read. Signed-off-by: Suleiman Souhlal <suleiman@google.com> --- arch/x86_64/kernel/process.c | 23 ++++++- arch/x86_64/kernel/time.c | 38 ++++++++++-- arch/x86_64/kernel/vmlinux.lds.S | 6 +- arch/x86_64/kernel/vsyscall.c | 121 +++++++++++++++++++++++++++++++------- include/asm-x86_64/timex.h | 2 + include/asm-x86_64/vsyscall.h | 13 ++++ include/linux/hrtimer.h | 2 + 7 files changed, 170 insertions(+), 35 deletions(-) diff --git a/arch/x86_64/kernel/process.c b/arch/x86_64/kernel/process.c index 49f7fac..b80bc00 100644 --- a/arch/x86_64/kernel/process.c +++ b/arch/x86_64/kernel/process.c @@ -51,6 +51,7 @@ #include <asm/desc.h> #include <asm/proto.h> #include <asm/ia32.h> #include <asm/idle.h> +#include <asm/vsyscall.h> asmlinkage extern void ret_from_fork(void); @@ -109,15 +110,21 @@ void exit_idle(void) */ static void default_idle(void) { + int cpu; + + cpu = hard_smp_processor_id(); + local_irq_enable(); current_thread_info()->status &= ~TS_POLLING; smp_mb__after_clear_bit(); while (!need_resched()) { local_irq_disable(); - if (!need_resched()) + if (!need_resched()) { + if (vxtime.pcpu[cpu].need_update == 0) + vxtime.pcpu[cpu].need_update = 1; safe_halt(); - else + } else local_irq_enable(); } current_thread_info()->status |= TS_POLLING; @@ -560,7 +567,7 @@ __switch_to(struct task_struct *prev_p, { struct thread_struct *prev = &prev_p->thread, *next = &next_p->thread; - int cpu = smp_processor_id(); + int apicid, cpu = smp_processor_id(); struct tss_struct *tss = &per_cpu(init_tss, cpu); /* we're going to use this soon, after a few expensive things */ @@ -657,6 +664,16 @@ #endif */ if (next_p->fpu_counter>5) math_state_restore(); + + apicid = hard_smp_processor_id(); + + /* + * We will need to also do this when switching to kernel tasks if we + * want to use the per-cpu monotonic_clock in the kernel + */ + if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL) + vxtime_update_pcpu(); + return prev_p; } diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c index 88722f1..55fefa2 100644 --- a/arch/x86_64/kernel/time.c +++ b/arch/x86_64/kernel/time.c @@ -118,14 +118,17 @@ unsigned int (*do_gettimeoffset)(void) = void do_gettimeofday(struct timeval *tv) { - unsigned long seq; + unsigned long seq, t, x; unsigned int sec, usec; + int cpu; do { seq = read_seqbegin(&xtime_lock); + preempt_disable(); + cpu = hard_smp_processor_id(); - sec = xtime.tv_sec; - usec = xtime.tv_nsec / NSEC_PER_USEC; + sec = vxtime.pcpu[cpu].tv_sec; + usec = vxtime.pcpu[cpu].tv_usec; /* i386 does some correction here to keep the clock monotonous even when ntpd is fixing drift. @@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv) be found. Note when you fix it here you need to do the same in arch/x86_64/kernel/vsyscall.c and export all needed variables in vmlinux.lds. -AK */ - usec += do_gettimeoffset(); + t = get_cycles_sync(); + x = (((t - vxtime.pcpu[cpu].last_tsc) * + vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC; + usec += x; } while (read_seqretry(&xtime_lock, seq)); + preempt_enable(); tv->tv_sec = sec + usec / USEC_PER_SEC; tv->tv_usec = usec % USEC_PER_SEC; @@ -624,10 +631,20 @@ #endif cpufreq_scale(loops_per_jiffy_ref, ref_freq, freq->new); cpu_khz = cpufreq_scale(cpu_khz_ref, ref_freq, freq->new); - if (!(freq->flags & CPUFREQ_CONST_LOOPS)) + if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { + int cpu; + + cpu = cpu_physical_id(freq->cpu); vxtime.tsc_quot = (USEC_PER_MSEC << US_SCALE) / cpu_khz; + + vxtime.pcpu[cpu].tsc_nsquot = (NSEC_PER_SEC << NS_SCALE) + / (cpufreq_scale(cpu_khz_ref, ref_freq, freq->new) * + NSEC_PER_USEC); + if (vxtime.pcpu[cpu].need_update == 0) + vxtime.pcpu[cpu].need_update = 1; + } } - + set_cyc2ns_scale(cpu_khz_ref); return 0; @@ -887,6 +904,9 @@ time_cpu_notifier(struct notifier_block void __init time_init(void) { + char *timename; + int i; + if (nohpet) vxtime.hpet_address = 0; @@ -931,13 +951,17 @@ #endif #ifndef CONFIG_SMP time_init_gtod(); #endif + + for (i = 0; i < NR_CPUS; i++) + vxtime.pcpu[i].tsc_nsquot = (NSEC_PER_SEC << NS_SCALE) + / (cpu_khz * NSEC_PER_USEC); } /* * Make an educated guess if the TSC is trustworthy and synchronized * over all CPUs. */ -__cpuinit int unsynchronized_tsc(void) +int unsynchronized_tsc(void) { #ifdef CONFIG_SMP if (apic_is_clustered_box()) diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S index edb24aa..b1a39d1 100644 --- a/arch/x86_64/kernel/vmlinux.lds.S +++ b/arch/x86_64/kernel/vmlinux.lds.S @@ -96,9 +96,6 @@ #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET .xtime_lock : AT(VLOAD(.xtime_lock)) { *(.xtime_lock) } xtime_lock = VVIRT(.xtime_lock); - .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) } - vxtime = VVIRT(.vxtime); - .vgetcpu_mode : AT(VLOAD(.vgetcpu_mode)) { *(.vgetcpu_mode) } vgetcpu_mode = VVIRT(.vgetcpu_mode); @@ -119,6 +116,9 @@ #define VVIRT(x) (ADDR(x) - VVIRT_OFFSET .vsyscall_2 ADDR(.vsyscall_0) + 2048: AT(VLOAD(.vsyscall_2)) { *(.vsyscall_2) } .vsyscall_3 ADDR(.vsyscall_0) + 3072: AT(VLOAD(.vsyscall_3)) { *(.vsyscall_3) } + .vxtime : AT(VLOAD(.vxtime)) { *(.vxtime) } + vxtime = VVIRT(.vxtime); + . = VSYSCALL_VIRT_ADDR + 4096; #undef VSYSCALL_ADDR diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c index a98b460..9025699 100644 --- a/arch/x86_64/kernel/vsyscall.c +++ b/arch/x86_64/kernel/vsyscall.c @@ -27,6 +27,8 @@ #include <linux/seqlock.h> #include <linux/jiffies.h> #include <linux/sysctl.h> #include <linux/getcpu.h> +#include <linux/smp.h> +#include <linux/kthread.h> #include <asm/vsyscall.h> #include <asm/pgtable.h> @@ -37,6 +39,8 @@ #include <asm/io.h> #include <asm/segment.h> #include <asm/desc.h> #include <asm/topology.h> +#include <asm/smp.h> +#include <asm/proto.h> #define __vsyscall(nr) __attribute__ ((unused,__section__(".vsyscall_" #nr))) @@ -46,6 +50,12 @@ int __vgetcpu_mode __section_vgetcpu_mod #include <asm/unistd.h> +#define NS_SCALE 10 +#define NSEC_PER_TICK (NSEC_PER_SEC / HZ) +extern unsigned long hpet_tick; + +extern unsigned long vxtime_hz; + static __always_inline void timeval_normalize(struct timeval * tv) { time_t __sec; @@ -57,35 +67,107 @@ static __always_inline void timeval_norm } } +inline int apicid(void) +{ + int cpu; + + __asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx"); + return (cpu >> 24); +} + static __always_inline void do_vgettimeofday(struct timeval * tv) { long sequence, t; unsigned long sec, usec; + int cpu; do { sequence = read_seqbegin(&__xtime_lock); - - sec = __xtime.tv_sec; - usec = __xtime.tv_nsec / 1000; - - if (__vxtime.mode != VXTIME_HPET) { - t = get_cycles_sync(); - if (t < __vxtime.last_tsc) - t = __vxtime.last_tsc; - usec += ((t - __vxtime.last_tsc) * - __vxtime.tsc_quot) >> 32; - /* See comment in x86_64 do_gettimeofday. */ - } else { - usec += ((readl((void __iomem *) - fix_to_virt(VSYSCALL_HPET) + 0xf0) - - __vxtime.last) * __vxtime.quot) >> 32; - } - } while (read_seqretry(&__xtime_lock, sequence)); + cpu = apicid(); + + sec = __vxtime.pcpu[cpu].tv_sec; + usec = __vxtime.pcpu[cpu].tv_usec; + rdtscll(t); + + usec += (((t - __vxtime.pcpu[cpu].last_tsc) * + __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC; + } while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu); tv->tv_sec = sec + usec / 1000000; tv->tv_usec = usec % 1000000; } +void vxtime_update_pcpu(void) +{ + unsigned long flags, offset, seq; + int cpu; + + write_seqlock_irqsave(&vxtime.vx_seq, flags); + + cpu = hard_smp_processor_id(); + + do { + seq = read_seqbegin(&xtime_lock); + vxtime.pcpu[cpu].tv_sec = xtime.tv_sec; + vxtime.pcpu[cpu].tv_usec = xtime.tv_nsec / 1000; + offset = hpet_readl(HPET_COUNTER) - vxtime.last; + } while (read_seqretry(&xtime_lock, seq)); + + vxtime.pcpu[cpu].tv_usec += (offset * vxtime.quot) >> 32; + vxtime.pcpu[cpu].last_tsc = get_cycles_sync(); + + if (vxtime.pcpu[cpu].need_update == 1) + vxtime.pcpu[cpu].need_update = 0; + + write_sequnlock_irqrestore(&vxtime.vx_seq, flags); +} + +static void _vxtime_update_pcpu(void *arg) +{ + vxtime_update_pcpu(); +} + +static int vxtime_periodic(void *arg) +{ + while (1) { + set_current_state(TASK_INTERRUPTIBLE); + schedule_timeout(msecs_to_jiffies(1000)); + + smp_call_function(_vxtime_update_pcpu, NULL, 1, 1); + _vxtime_update_pcpu(NULL); + } + + return (0); /* NOTREACHED */ +} + +void clock_was_set(void) +{ + smp_call_function(_vxtime_update_pcpu, NULL, 1, 1); + _vxtime_update_pcpu(NULL); +} + + +static __init int vxtime_init_pcpu(void) +{ + int i; + + seqlock_init(&vxtime.vx_seq); + + /* + * Don't bother updating the per-cpu data after each HLT + * if we don't need to. + */ + if (!unsynchronized_tsc()) + for (i = 0; i < NR_CPUS; i++) + vxtime.pcpu[i].need_update = -1; + + kthread_create(vxtime_periodic, NULL, "vxtime_periodic"); + + return (0); +} + +core_initcall(vxtime_init_pcpu); + /* RED-PEN may want to readd seq locking, but then the variable should be write-once. */ static __always_inline void do_get_tz(struct timezone * tz) { @@ -174,11 +256,6 @@ vgetcpu(unsigned *cpu, unsigned *node, s return 0; } -long __vsyscall(3) venosys_1(void) -{ - return -ENOSYS; -} - #ifdef CONFIG_SYSCTL #define SYSCALL 0x050f diff --git a/include/asm-x86_64/timex.h b/include/asm-x86_64/timex.h index b9e5320..91bad25 100644 --- a/include/asm-x86_64/timex.h +++ b/include/asm-x86_64/timex.h @@ -46,4 +46,6 @@ #define ARCH_HAS_READ_CURRENT_TIMER 1 extern struct vxtime_data vxtime; +void clock_was_set(void); + #endif diff --git a/include/asm-x86_64/vsyscall.h b/include/asm-x86_64/vsyscall.h index fd452fc..707353b 100644 --- a/include/asm-x86_64/vsyscall.h +++ b/include/asm-x86_64/vsyscall.h @@ -30,6 +30,14 @@ #define VXTIME_PMTMR 3 #define VGETCPU_RDTSCP 1 #define VGETCPU_LSL 2 +struct vxtime_pcpu { + time_t tv_sec; + long tv_usec; + unsigned long tsc_nsquot; + unsigned long last_tsc; + int need_update; +}; + struct vxtime_data { long hpet_address; /* HPET base address */ int last; @@ -37,6 +45,8 @@ struct vxtime_data { long quot; long tsc_quot; int mode; + seqlock_t vx_seq; + struct vxtime_pcpu pcpu[NR_CPUS] ____cacheline_aligned; }; #define hpet_readl(a) readl((const void __iomem *)fix_to_virt(FIX_HPET_BASE) + a) @@ -61,7 +71,10 @@ extern int sysctl_vsyscall; extern void vsyscall_set_cpu(int cpu); +void vxtime_update_pcpu(void); + #define ARCH_HAVE_XTIME_LOCK 1 +#define ARCH_HAVE_CLOCK_WAS_SET 1 #endif /* __KERNEL__ */ diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index fca9302..7f59619 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -95,12 +95,14 @@ struct hrtimer_base { struct lock_class_key lock_key; }; +#ifndef ARCH_HAVE_CLOCK_WAS_SET /* * clock_was_set() is a NOP for non- high-resolution systems. The * time-sorted order guarantees that a timer does not expire early and * is expired in the next softirq when the clock was advanced. */ #define clock_was_set() do { } while (0) +#endif /* Exported timer functions: */ ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 1:08 ` [PATCH 1/2] " Suleiman Souhlal @ 2006-11-14 2:05 ` Andi Kleen 2006-11-14 2:25 ` Suleiman Souhlal 0 siblings, 1 reply; 18+ messages in thread From: Andi Kleen @ 2006-11-14 2:05 UTC (permalink / raw) To: Suleiman Souhlal; +Cc: Linux Kernel ML, vojtech, Jiri Bohac On Tuesday 14 November 2006 02:08, Suleiman Souhlal wrote: > This is done by a per-cpu vxtime structure that stores the last TSC and HPET > values. > > Whenever we switch to a userland process after a HLT instruction has been > executed or after the CPU frequency has changed, we force a new read of the > TSC, HPET and xtime so that we know the correct frequency we have to deal > with. Hmm, interesting approach. > > With this, we can safely use RDTSC in gettimeofday() in CPUs where the > TSCs are not synchronized, such as Opterons, instead of doing a very expensive > HPET read. > + cpu = hard_smp_processor_id(); Why not smp_processor_id ? > + > + apicid = hard_smp_processor_id(); The apicid <-> linux cpuid mapping should be 1:1 so i don't know why you do this separately. All the uses of hard_smp_processor_id are bogus. > + > + /* > + * We will need to also do this when switching to kernel tasks if we > + * want to use the per-cpu monotonic_clock in the kernel > + */ > + if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL) > + vxtime_update_pcpu(); Why? It should be really only needed on HLT/cpufreq change, or? Anyways, I'm not very fond of adding code to the context switch critical path. > seq = read_seqbegin(&xtime_lock); > + preempt_disable(); > + cpu = hard_smp_processor_id(); Again, shouldn't use hard > > - sec = xtime.tv_sec; > - usec = xtime.tv_nsec / NSEC_PER_USEC; > + sec = vxtime.pcpu[cpu].tv_sec; > + usec = vxtime.pcpu[cpu].tv_usec; > > /* i386 does some correction here to keep the clock > monotonous even when ntpd is fixing drift. > @@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv) > be found. Note when you fix it here you need to do the same > in arch/x86_64/kernel/vsyscall.c and export all needed > variables in vmlinux.lds. -AK */ > - usec += do_gettimeoffset(); > + t = get_cycles_sync(); > + x = (((t - vxtime.pcpu[cpu].last_tsc) * > + vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC; > + usec += x; > > } while (read_seqretry(&xtime_lock, seq)); > + preempt_enable(); If it can be implemented without races in user space, why does it need preempt disable in kernel space? The faster way to access all the vxtime code would be to stick a pointer to the per CPU vxtime data into the PDA > +#define NSEC_PER_TICK (NSEC_PER_SEC / HZ) > +extern unsigned long hpet_tick; > + > +extern unsigned long vxtime_hz; No externs in .c files. Happened earlier too I think > + > static __always_inline void timeval_normalize(struct timeval * tv) > { > time_t __sec; > @@ -57,35 +67,107 @@ static __always_inline void timeval_norm > } > } > > +inline int apicid(void) > +{ > + int cpu; > + > + __asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx"); > + return (cpu >> 24); The faster way to do this is to use LSL from a magic GDT entry. > +} > + > static __always_inline void do_vgettimeofday(struct timeval * tv) > { > long sequence, t; > unsigned long sec, usec; > + int cpu; > > do { > sequence = read_seqbegin(&__xtime_lock); > - > - sec = __xtime.tv_sec; > - usec = __xtime.tv_nsec / 1000; > - > - if (__vxtime.mode != VXTIME_HPET) { > - t = get_cycles_sync(); > - if (t < __vxtime.last_tsc) > - t = __vxtime.last_tsc; > - usec += ((t - __vxtime.last_tsc) * > - __vxtime.tsc_quot) >> 32; > - /* See comment in x86_64 do_gettimeofday. */ > - } else { > - usec += ((readl((void __iomem *) > - fix_to_virt(VSYSCALL_HPET) + 0xf0) - > - __vxtime.last) * __vxtime.quot) >> 32; > - } > - } while (read_seqretry(&__xtime_lock, sequence)); > + cpu = apicid(); > + > + sec = __vxtime.pcpu[cpu].tv_sec; > + usec = __vxtime.pcpu[cpu].tv_usec; > + rdtscll(t); > + > + usec += (((t - __vxtime.pcpu[cpu].last_tsc) * > + __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC; > + } while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu); Hmm, isn't there still a (unlikely, but possible) race: CPU #0 CPU #1 read cpu number switch to CPU #1 read TSC switch back to CPU #0 check cpu number check succeeds, but you got wrong TSC data How do you prevent that? I suppose you could force the seqlock to retry in this case in the context switch, but I don't think you do that? Also I'm not sure what your context switch code is good for. -Andi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 2:05 ` Andi Kleen @ 2006-11-14 2:25 ` Suleiman Souhlal 2006-11-14 2:44 ` Andi Kleen 0 siblings, 1 reply; 18+ messages in thread From: Suleiman Souhlal @ 2006-11-14 2:25 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux Kernel ML, vojtech, Jiri Bohac Andi Kleen wrote: > On Tuesday 14 November 2006 02:08, Suleiman Souhlal wrote: > >>This is done by a per-cpu vxtime structure that stores the last TSC and HPET >>values. >> >>Whenever we switch to a userland process after a HLT instruction has been >>executed or after the CPU frequency has changed, we force a new read of the >>TSC, HPET and xtime so that we know the correct frequency we have to deal >>with. > > > Hmm, interesting approach. > > >>With this, we can safely use RDTSC in gettimeofday() in CPUs where the >>TSCs are not synchronized, such as Opterons, instead of doing a very expensive >>HPET read. > > >>+ cpu = hard_smp_processor_id(); > > > Why not smp_processor_id ? Because CPUID returns the APIC ID, and I was under the impression that the cpu numbers smp_processor_id() were dynamically allocated and didn't necessarily match the APIC ID. >>+ >>+ apicid = hard_smp_processor_id(); > > > The apicid <-> linux cpuid mapping should be 1:1 so i don't know > why you do this separately. All the uses of hard_smp_processor_id > are bogus. > > >>+ >>+ /* >>+ * We will need to also do this when switching to kernel tasks if we >>+ * want to use the per-cpu monotonic_clock in the kernel >>+ */ >>+ if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL) >>+ vxtime_update_pcpu(); > > > Why? It should be really only needed on HLT/cpufreq change, or? > > Anyways, I'm not very fond of adding code to the context switch critical > path. Yes, it's only needed on HLT and cpufreq change. The code here is to force a "resynch" with the HPET if we've done a HLT. It has to be done before we switch to any userland thread that might use the per-cpu vxtime. switch_to() seemed like the most natural place to put this. > > >> seq = read_seqbegin(&xtime_lock); >>+ preempt_disable(); >>+ cpu = hard_smp_processor_id(); > > > Again, shouldn't use hard > > >> >>- sec = xtime.tv_sec; >>- usec = xtime.tv_nsec / NSEC_PER_USEC; >>+ sec = vxtime.pcpu[cpu].tv_sec; >>+ usec = vxtime.pcpu[cpu].tv_usec; >> >> /* i386 does some correction here to keep the clock >> monotonous even when ntpd is fixing drift. >>@@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv) >> be found. Note when you fix it here you need to do the same >> in arch/x86_64/kernel/vsyscall.c and export all needed >> variables in vmlinux.lds. -AK */ >>- usec += do_gettimeoffset(); >>+ t = get_cycles_sync(); >>+ x = (((t - vxtime.pcpu[cpu].last_tsc) * >>+ vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC; >>+ usec += x; >> >> } while (read_seqretry(&xtime_lock, seq)); >>+ preempt_enable(); > > > If it can be implemented without races in user space, why does > it need preempt disable in kernel space? Because there is indeed a small race in user space (as you noted below) that I still haven't found how to address. > The faster way to access all the vxtime code would be to stick > a pointer to the per CPU vxtime data into the PDA > > > >>+#define NSEC_PER_TICK (NSEC_PER_SEC / HZ) >>+extern unsigned long hpet_tick; >>+ >>+extern unsigned long vxtime_hz; > > > No externs in .c files. Happened earlier too I think > > >>+ >> static __always_inline void timeval_normalize(struct timeval * tv) >> { >> time_t __sec; >>@@ -57,35 +67,107 @@ static __always_inline void timeval_norm >> } >> } >> >>+inline int apicid(void) >>+{ >>+ int cpu; >>+ >>+ __asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx"); >>+ return (cpu >> 24); > > > The faster way to do this is to use LSL from a magic GDT entry. A cow-orker suggested that we use SIDT and encode the CPU number in the limit of the IDT, which should be even faster than LSL. > >>+} >>+ >> static __always_inline void do_vgettimeofday(struct timeval * tv) >> { >> long sequence, t; >> unsigned long sec, usec; >>+ int cpu; >> >> do { >> sequence = read_seqbegin(&__xtime_lock); >>- >>- sec = __xtime.tv_sec; >>- usec = __xtime.tv_nsec / 1000; >>- >>- if (__vxtime.mode != VXTIME_HPET) { >>- t = get_cycles_sync(); >>- if (t < __vxtime.last_tsc) >>- t = __vxtime.last_tsc; >>- usec += ((t - __vxtime.last_tsc) * >>- __vxtime.tsc_quot) >> 32; >>- /* See comment in x86_64 do_gettimeofday. */ >>- } else { >>- usec += ((readl((void __iomem *) >>- fix_to_virt(VSYSCALL_HPET) + 0xf0) - >>- __vxtime.last) * __vxtime.quot) >> 32; >>- } >>- } while (read_seqretry(&__xtime_lock, sequence)); >>+ cpu = apicid(); >>+ >>+ sec = __vxtime.pcpu[cpu].tv_sec; >>+ usec = __vxtime.pcpu[cpu].tv_usec; >>+ rdtscll(t); >>+ >>+ usec += (((t - __vxtime.pcpu[cpu].last_tsc) * >>+ __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC; >>+ } while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu); > > > Hmm, isn't there still a (unlikely, but possible) race: > > CPU #0 CPU #1 > read cpu number > switch to CPU #1 > read TSC > switch back to CPU #0 > check cpu number > check succeeds, but you got wrong TSC data > > How do you prevent that? I suppose you could force the seqlock to retry > in this case in the context switch, but I don't think you do that? I couldn't figure out how to tell if a context switch has happened from userland. I tried putting a per-cpu context switch count, but I couldn't figure out how to get it atomically along with the CPU number.. > Also I'm not sure what your context switch code is good for. > > -Andi Thanks for the feedback. -- Suleiman ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 2:25 ` Suleiman Souhlal @ 2006-11-14 2:44 ` Andi Kleen 2006-11-14 3:35 ` dean gaudet 2006-11-14 3:54 ` Suleiman Souhlal 0 siblings, 2 replies; 18+ messages in thread From: Andi Kleen @ 2006-11-14 2:44 UTC (permalink / raw) To: Suleiman Souhlal; +Cc: Linux Kernel ML, vojtech, Jiri Bohac > > Because CPUID returns the APIC ID, and I was under the impression that > the cpu numbers > smp_processor_id() were dynamically allocated and didn't necessarily > match the > APIC ID. Correct, but one can use a mapping table. > Yes, it's only needed on HLT and cpufreq change. > The code here is to force a "resynch" with the HPET if we've done a HLT. > It has to be done before we switch to any userland thread that might > use the per-cpu vxtime. switch_to() seemed like the most natural place > to put this. I don't think so. The natural place after HLT is in the idle loop or better in idle notifiers[1] and after cpufreq is in the appropiate cpufreq notifiers. [1] unfortunately they are still subtly broken in .19, but will be fixed in .20 > A cow-orker suggested that we use SIDT and encode the CPU number in the > limit of the IDT, which should be even faster than LSL. Possible yes. Did you time it? But then we would make the IDT variable length in memory? While the CPUs probably won't care some Hypervisors seem to be picky about these limits. LSL still seems somewhat safer. > I couldn't figure out how to tell if a context switch has happened from > userland. I tried putting a per-cpu context switch count, but I couldn't > figure out how to get it atomically along with the CPU number.. It's tricky. That is why we asked for RDTSCP. -Andi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 2:44 ` Andi Kleen @ 2006-11-14 3:35 ` dean gaudet 2006-11-14 4:22 ` dean gaudet 2006-11-14 3:54 ` Suleiman Souhlal 1 sibling, 1 reply; 18+ messages in thread From: dean gaudet @ 2006-11-14 3:35 UTC (permalink / raw) To: Andi Kleen; +Cc: Suleiman Souhlal, Linux Kernel ML, vojtech, Jiri Bohac On Tue, 14 Nov 2006, Andi Kleen wrote: > > A cow-orker suggested that we use SIDT and encode the CPU number in the > > limit of the IDT, which should be even faster than LSL. > > Possible yes. Did you time it? > > But then we would make the IDT variable length in memory? While > the CPUs probably won't care some Hypervisors seem to be picky > about these limits. LSL still seems somewhat safer. i'm one of the coworkers suleiman is referring to... below is the README from <http://arctic.org/~dean/vtime64.tar.gz>. see the tarball if you want to peruse the code. the nomenclature in this benchmark doesn't line up with the patch suleiman posted, but the concept is similar. in this code i mock-up an implementation of a "uint64_t vtime64(void)" vsyscall which return 64-bit ns since the epoch. i think this is a much more useful syscall than gettimeofday() because it doesn't require extra multiply/divide to break the data into two pieces (which most folks then recombine back into a uint64_t). the concepts are the same for a vgettimeofday. note that fundamentally the same code as vtime64() can be used to provide clock_gettime(CLOCK_MONOTONIC) (akin to gethrtime() on slowaris). it just needs a different epoch (one which causes the values to remain monotonic across adjtime()). i mock up three new methods of implementing vgetcpu() suggested by Nathan Laredo -- using sidt/sgdt/sldt, and present a comparison vs. the existing kernel lsl code. the s*dt instructions have varying degrees of complexity in their use in a vgetcpu() implementation. sldt is clearly the fastest but has conflicts with code such as wine. note that the sidt limit is essentially "infinity" if it's >= 0xfff (64-bit) or 0x7ff (32-bit) ... because there are only 256 software interrupts. the s*dt instructions are faster than lsl everywhere simply because lsl is microcoded, involves protection tests, and extra memory references. note that i don't present the data, but sidt is faster than rdtscp on rev F opteron especially if all you want is the cpuid. first -- an implementation where the userland code handles restarting the vsyscall. guide to the table: ff = family mm = model ss = stepping lm = long mode or not (note the 32-bit code was timed on 64-bit boxes in long mode... it might be different if the kernel itself was also in 32-bit mode) vendor name ffmmss lm |----------- timings all in cycles ----------| note GenuineIntel 060f05 32 sgdt 112.0 sidt 111.1 sldt 107.1 lsl 196.1 core2 GenuineIntel 060f05 64 sgdt 102.1 sidt 104.1 sldt 96.2 lsl 178.1 core2 AuthenticAMD 0f4102 32 sgdt 80.1 sidt 80.1 sldt 58.1 lsl 156.0 revF opteron AuthenticAMD 0f4102 64 sgdt 67.1 sidt 65.1 sldt 41.0 lsl 136.0 revF opteron AuthenticAMD 0f2102 32 sgdt 77.0 sidt 77.7 sldt 56.0 lsl 154.3 revE opteron AuthenticAMD 0f2102 64 sgdt 65.0 sidt 63.1 sldt 40.0 lsl 137.6 revE opteron GenuineIntel 0f0401 32 sgdt 231.7 sidt 225.9 sldt 218.0 lsl 421.5 nocona GenuineIntel 0f0401 64 sgdt 212.3 sidt 210.0 sldt 200.7 lsl 449.9 nocona GenuineIntel 0f0403 32 sgdt 232.1 sidt 244.1 sldt 221.4 lsl 420.1 p4 desktop GenuineIntel 0f0403 64 sgdt 216.1 sidt 216.8 sldt 204.1 lsl 396.1 p4 desktop GenuineIntel 0f0209 32 sgdt 240.1 sidt 232.1 sldt 224.4 lsl 384.1 xeon next an implementation which relies on the kernel restarting the computation when necessary. this would be achieved by testing to see when the task to be restarted is on the vsyscall page and backtracking the task to the vsyscall entry point. this is challenging when the vsyscall is implemented in C -- because of potential stack usage. there are ways to get this to work though, even without resorting to assembly. i'm presenting this only as a best case scenario should such an effort be undertaken. (i have a crazy idea involving the direction flag which i need to mock up.) vendor name ffmmss lm |----------- timings all in cycles ----------| note GenuineIntel 060f05 32 sgdt 90.0 sidt 91.0 sldt 86.0 lsl 128.0 core2 GenuineIntel 060f05 64 sgdt 76.0 sidt 78.1 sldt 76.5 lsl 113.0 core2 AuthenticAMD 0f4102 32 sgdt 43.0 sidt 43.1 sldt 28.0 lsl 82.0 revF opteron AuthenticAMD 0f4102 64 sgdt 29.0 sidt 28.6 sldt 16.0 lsl 72.0 revF opteron AuthenticAMD 0f2102 32 sgdt 44.0 sidt 42.8 sldt 28.0 lsl 82.6 revE opteron AuthenticAMD 0f2102 64 sgdt 27.0 sidt 25.6 sldt 14.5 lsl 72.0 revE opteron GenuineIntel 0f0401 32 sgdt 111.9 sidt 120.1 sldt 108.6 lsl 225.0 nocona GenuineIntel 0f0401 64 sgdt 100.9 sidt 100.0 sldt 100.0 lsl 158.0 nocona GenuineIntel 0f0403 32 sgdt 129.5 sidt 116.1 sldt 112.0 lsl 228.0 p4 desktop GenuineIntel 0f0403 64 sgdt 104.9 sidt 102.2 sldt 100.0 lsl 138.0 p4 desktop GenuineIntel 0f0209 32 sgdt 136.0 sidt 136.1 sldt 132.5 lsl 200.0 xeon -dean p.s. i work at google, and google paid for this experiment. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 3:35 ` dean gaudet @ 2006-11-14 4:22 ` dean gaudet 0 siblings, 0 replies; 18+ messages in thread From: dean gaudet @ 2006-11-14 4:22 UTC (permalink / raw) To: Andi Kleen; +Cc: Suleiman Souhlal, Linux Kernel ML, vojtech, Jiri Bohac On Mon, 13 Nov 2006, dean gaudet wrote: > next an implementation which relies on the kernel restarting the computation when > necessary. this would be achieved by testing to see when the task to be restarted > is on the vsyscall page and backtracking the task to the vsyscall entry point. > > this is challenging when the vsyscall is implemented in C -- because of potential > stack usage. there are ways to get this to work though, even without resorting to > assembly. i'm presenting this only as a best case scenario should such an effort > be undertaken. (i have a crazy idea involving the direction flag which i need to > mock up.) nevermind the crazy idea using DF... i was hoping to use DF as a generic "restart a vsyscall" indicator -- switch_to() would note the task is on the vsyscall page and unilaterally clear DF before restoring eflags. then a vsyscall critical section could be surrounded like so: unsigned long tmp; do { asm volatile("std"); critical section asm volatile( "\n pushf" "\n pop %0" "\n cld" : "=r" (tmp)); } while ((tmp & 0x400) == 0); it works great on k8 ... but DF manipulation hurts way too much on core2 and p4. i even tried reading DF using a string instruction: long tmp; do { asm volatile("std"); critical section asm volatile( "\n mov %%rsp,%%rsi" "\n lodsl" "\n sub %%rsp,%%rsi" "\n cld" : "=S" (tmp)); } while (tmp > 0); it's no better. i've also tried similar tricks setting the EFLAGS.ID bit... but the popf hurts in that case. i think a general vsyscall restart mechanism would be useful (for more than just the time functions), but still haven't found one which is cheap enough. -dean ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 2:44 ` Andi Kleen 2006-11-14 3:35 ` dean gaudet @ 2006-11-14 3:54 ` Suleiman Souhlal 2006-11-14 11:12 ` Andi Kleen 1 sibling, 1 reply; 18+ messages in thread From: Suleiman Souhlal @ 2006-11-14 3:54 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux Kernel ML, vojtech, Jiri Bohac Andi Kleen wrote: >>Because CPUID returns the APIC ID, and I was under the impression that >>the cpu numbers >>smp_processor_id() were dynamically allocated and didn't necessarily >>match the >>APIC ID. > > > Correct, but one can use a mapping table. > >>Yes, it's only needed on HLT and cpufreq change. >>The code here is to force a "resynch" with the HPET if we've done a HLT. >> It has to be done before we switch to any userland thread that might >>use the per-cpu vxtime. switch_to() seemed like the most natural place >>to put this. > > > I don't think so. The natural place after HLT is in the idle loop or > better in idle notifiers[1] and after cpufreq is in the appropiate cpufreq > notifiers. > > > [1] unfortunately they are still subtly broken in .19, but will be fixed > in .20 I initially had it after the HLT in the idle loop, but realized there would be a small race: We could get switched away from the idle thread after the HLT but before doing the resynch. It seems like idle notifiers would do the trick, except that they don't appear to include the id of the CPU that went/exited from idle. Do you want me to also send a patch that addresses this? >>A cow-orker suggested that we use SIDT and encode the CPU number in the >>limit of the IDT, which should be even faster than LSL. > > > Possible yes. Did you time it? > > But then we would make the IDT variable length in memory? While > the CPUs probably won't care some Hypervisors seem to be picky > about these limits. LSL still seems somewhat safer. Does the LSL from the magic GDT entry return the APIC ID or the smp_processor_id? If it's the latter, I can avoid using the smp_processor_id->apic id table completely. > >>I couldn't figure out how to tell if a context switch has happened from >>userland. I tried putting a per-cpu context switch count, but I couldn't >>figure out how to get it atomically along with the CPU number.. > > > It's tricky. That is why we asked for RDTSCP. Another way I could fix this would be to just touch the seqlock, in switch_to(), if we happen to be in vgettimeofday. Would this be acceptable? -- Suleiman ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 3:54 ` Suleiman Souhlal @ 2006-11-14 11:12 ` Andi Kleen 0 siblings, 0 replies; 18+ messages in thread From: Andi Kleen @ 2006-11-14 11:12 UTC (permalink / raw) To: Suleiman Souhlal; +Cc: Linux Kernel ML, vojtech, Jiri Bohac > I initially had it after the HLT in the idle loop, but realized there > would be a small race: We could get switched away from the idle thread > after the HLT but before doing the resynch. idle threads are pinned to CPUs of course. > It seems like idle notifiers would do the trick, except that they don't > appear to include the id of the CPU that went/exited from idle. Just use smp_processor_id() > Does the LSL from the magic GDT entry return the APIC ID or the > smp_processor_id? smp_processor_id > Another way I could fix this would be to just touch the seqlock, in > switch_to(), if we happen to be in vgettimeofday. Would this be acceptable? Hmm, i'm not sure that would cover all cases with multiple cpu switches. You would need to do it for all CPUs, which might get tricky. I would also prefer to not do anything complicated in switch_to -Andi ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] Introduce a vmonotonic_clock() vsyscall. 2006-11-14 1:06 [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Suleiman Souhlal 2006-11-14 1:08 ` [PATCH 1/2] " Suleiman Souhlal @ 2006-11-14 1:09 ` Suleiman Souhlal 2006-11-14 1:50 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Andi Kleen 2 siblings, 0 replies; 18+ messages in thread From: Suleiman Souhlal @ 2006-11-14 1:09 UTC (permalink / raw) Cc: Andi Kleen, Linux Kernel ML This is the vsyscall equivalent of monotonic_clock(), using the per-cpu vxtime. It returns the number of nanoseconds since boot. It's intended to be used by processes that need a cheap and nanosecond resolution time counter. Signed-off-by: Suleiman Souhlal <suleiman@google.com> --- arch/x86_64/kernel/vsyscall.c | 21 +++++++++++++++++++++ include/asm-x86_64/vsyscall.h | 1 + 2 files changed, 22 insertions(+), 0 deletions(-) diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c index 9025699..f124cc6 100644 --- a/arch/x86_64/kernel/vsyscall.c +++ b/arch/x86_64/kernel/vsyscall.c @@ -52,6 +52,7 @@ #include <asm/unistd.h> #define NS_SCALE 10 #define NSEC_PER_TICK (NSEC_PER_SEC / HZ) +extern unsigned long long monotonic_base; extern unsigned long hpet_tick; extern unsigned long vxtime_hz; @@ -108,11 +109,13 @@ void vxtime_update_pcpu(void) do { seq = read_seqbegin(&xtime_lock); + vxtime.pcpu[cpu].monotonic_base = monotonic_base; vxtime.pcpu[cpu].tv_sec = xtime.tv_sec; vxtime.pcpu[cpu].tv_usec = xtime.tv_nsec / 1000; offset = hpet_readl(HPET_COUNTER) - vxtime.last; } while (read_seqretry(&xtime_lock, seq)); + vxtime.pcpu[cpu].monotonic_base += offset * NSEC_PER_TICK / hpet_tick; vxtime.pcpu[cpu].tv_usec += (offset * vxtime.quot) >> 32; vxtime.pcpu[cpu].last_tsc = get_cycles_sync(); @@ -256,6 +259,24 @@ vgetcpu(unsigned *cpu, unsigned *node, s return 0; } +unsigned long __vsyscall(3) vmonotonic_clock(void) +{ + unsigned long nsec, seq, t; + int cpu; + + do { + seq = read_seqbegin(&__vxtime.vx_seq); + cpu = apicid(); + nsec = __vxtime.pcpu[cpu].monotonic_base; + + rdtscll(t); + nsec += ((t - __vxtime.pcpu[cpu].last_tsc) + * __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE; + } while (read_seqretry(&__vxtime.vx_seq, seq) || cpu != apicid()); + + return (nsec); +} + #ifdef CONFIG_SYSCTL #define SYSCALL 0x050f diff --git a/include/asm-x86_64/vsyscall.h b/include/asm-x86_64/vsyscall.h index 707353b..5a10fd0 100644 --- a/include/asm-x86_64/vsyscall.h +++ b/include/asm-x86_64/vsyscall.h @@ -31,6 +31,7 @@ #define VGETCPU_RDTSCP 1 #define VGETCPU_LSL 2 struct vxtime_pcpu { + unsigned long monotonic_base; time_t tv_sec; long tv_usec; unsigned long tsc_nsquot; ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 1:06 [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Suleiman Souhlal 2006-11-14 1:08 ` [PATCH 1/2] " Suleiman Souhlal 2006-11-14 1:09 ` [PATCH 2/2] Introduce a vmonotonic_clock() vsyscall Suleiman Souhlal @ 2006-11-14 1:50 ` Andi Kleen 2006-11-14 2:06 ` Suleiman Souhlal 2006-11-14 7:42 ` Arjan van de Ven 2 siblings, 2 replies; 18+ messages in thread From: Andi Kleen @ 2006-11-14 1:50 UTC (permalink / raw) To: Suleiman Souhlal; +Cc: Linux Kernel ML, vojtech, Jiri Bohac On Tuesday 14 November 2006 02:06, Suleiman Souhlal wrote: > I've had a proof-of-concept for this since August, and finally got around to > somewhat cleaning it up. Thanks. I got a competing implementation for this unfortunately now from Vojtech & Jiri Yours is simpler, but I'm not sure as complete. What are your assurances against non monoticity for example? > > It can certainly still be improved, namely by using vgetcpu() instead of CPUID > to find the cpu number (but I couldn't get it to work, when I tried). What did not work? > Another possible improvement would be to use RDTSCP when available. > There's also a small race in do_gettimeofday(), vgettimeofday() and > vmonotonic_clock() but I've never seen it happen. I did a vposix_getclock with monotonic clock support on my own already, was about to be merged with the vDSO. It still used global synchronized TSC though. -Andi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 1:50 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Andi Kleen @ 2006-11-14 2:06 ` Suleiman Souhlal 2006-11-14 11:10 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() II Andi Kleen ` (2 more replies) 2006-11-14 7:42 ` Arjan van de Ven 1 sibling, 3 replies; 18+ messages in thread From: Suleiman Souhlal @ 2006-11-14 2:06 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux Kernel ML, vojtech, Jiri Bohac Andi Kleen wrote: > On Tuesday 14 November 2006 02:06, Suleiman Souhlal wrote: > >>I've had a proof-of-concept for this since August, and finally got around to >>somewhat cleaning it up. > > > Thanks. > > I got a competing implementation for this unfortunately now from Vojtech & Jiri > > Yours is simpler, but I'm not sure as complete. What are your assurances > against non monoticity for example? I believe that the results returned will always be monotonic, as long as the frequency of the TSC does not change from under us (that is, without the kernel knowing). This is because we "synchronize" each CPU's vxtime with a global time source (HPET) every time we know the TSC rate changes. >>It can certainly still be improved, namely by using vgetcpu() instead of CPUID >>to find the cpu number (but I couldn't get it to work, when I tried). > > > What did not work? I was not able to make vgettimeofday use vgetcpu(). It seemed like vgetcpu() was not returning the same value as smp_processor_id() would, so the values I'd get with vgettimeofday() did not completely agree with the ones from gettimeofday(2). I didn't have the chance to investigate more. >>Another possible improvement would be to use RDTSCP when available. >>There's also a small race in do_gettimeofday(), vgettimeofday() and >>vmonotonic_clock() but I've never seen it happen. > > > I did a vposix_getclock with monotonic clock support on my own already, was about to > be merged with the vDSO. It still used global synchronized TSC though. -- Suleiman ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday() II 2006-11-14 2:06 ` Suleiman Souhlal @ 2006-11-14 11:10 ` Andi Kleen 2006-11-14 12:30 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Shem Multinymous 2006-11-14 21:28 ` Christoph Lameter 2 siblings, 0 replies; 18+ messages in thread From: Andi Kleen @ 2006-11-14 11:10 UTC (permalink / raw) To: Suleiman Souhlal; +Cc: Linux Kernel ML, vojtech, Jiri Bohac > I was not able to make vgettimeofday use vgetcpu(). It seemed like vgetcpu() > was not returning the same value as smp_processor_id() would, so the > values I'd get with vgettimeofday() did not completely agree with the > ones from gettimeofday(2). I didn't have the chance to investigate more. I investigated now. You got cpu hotplug disabled, right? The code currently only works with cpu hotplug enabled. Will fix. -Andi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 2:06 ` Suleiman Souhlal 2006-11-14 11:10 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() II Andi Kleen @ 2006-11-14 12:30 ` Shem Multinymous 2006-11-14 17:06 ` Suleiman Souhlal 2006-11-14 21:28 ` Christoph Lameter 2 siblings, 1 reply; 18+ messages in thread From: Shem Multinymous @ 2006-11-14 12:30 UTC (permalink / raw) To: Suleiman Souhlal Cc: Andi Kleen, Linux Kernel ML, vojtech, Jiri Bohac, Nigel Cunningham, Pavel Machek On 11/14/06, Suleiman Souhlal <ssouhlal@freebsd.org> wrote: > I believe that the results returned will always be monotonic, as long as > the frequency of the TSC does not change from under us (that is, without > the kernel knowing). This is because we "synchronize" each CPU's vxtime > with a global time source (HPET) every time we know the TSC rate changes. Does this hold after a suspend/resume cycle? You could be resuming with a different CPU clock than what you suspended with, and I'm not sure anything guarantees an early enough re-sync. Shem ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 12:30 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Shem Multinymous @ 2006-11-14 17:06 ` Suleiman Souhlal 2006-11-14 18:30 ` Andi Kleen 0 siblings, 1 reply; 18+ messages in thread From: Suleiman Souhlal @ 2006-11-14 17:06 UTC (permalink / raw) To: Shem Multinymous Cc: Andi Kleen, Linux Kernel ML, vojtech, Jiri Bohac, Nigel Cunningham, Pavel Machek Shem Multinymous wrote: > On 11/14/06, Suleiman Souhlal <ssouhlal@freebsd.org> wrote: > >> I believe that the results returned will always be monotonic, as long as >> the frequency of the TSC does not change from under us (that is, without >> the kernel knowing). This is because we "synchronize" each CPU's vxtime >> with a global time source (HPET) every time we know the TSC rate changes. > > > Does this hold after a suspend/resume cycle? You could be resuming > with a different CPU clock than what you suspended with, and I'm not > sure anything guarantees an early enough re-sync. I have to admit that I didn't really design this patch with suspend/resume in mind, and that I'm not too familiar with how they work. Do the HPET/ACPI timers still run while the system is suspended? If not, there shouldn't be any problem, as long as the kernel is informed of the resume early enough. -- Suleiman ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 17:06 ` Suleiman Souhlal @ 2006-11-14 18:30 ` Andi Kleen 0 siblings, 0 replies; 18+ messages in thread From: Andi Kleen @ 2006-11-14 18:30 UTC (permalink / raw) To: Suleiman Souhlal Cc: Shem Multinymous, Linux Kernel ML, vojtech, Jiri Bohac, Nigel Cunningham, Pavel Machek > I have to admit that I didn't really design this patch with > suspend/resume in mind, and that I'm not too familiar with how they work. > > Do the HPET/ACPI timers still run while the system is suspended? No. In suspend to RAM only the RAM is still powered. On suspend to disk everything is off. But you can do every needed synchronization on resume. -Andi ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 2:06 ` Suleiman Souhlal 2006-11-14 11:10 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() II Andi Kleen 2006-11-14 12:30 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Shem Multinymous @ 2006-11-14 21:28 ` Christoph Lameter 2 siblings, 0 replies; 18+ messages in thread From: Christoph Lameter @ 2006-11-14 21:28 UTC (permalink / raw) To: Suleiman Souhlal; +Cc: Andi Kleen, Linux Kernel ML, vojtech, Jiri Bohac On Mon, 13 Nov 2006, Suleiman Souhlal wrote: > I believe that the results returned will always be monotonic, as long as the > frequency of the TSC does not change from under us (that is, without the > kernel knowing). This is because we "synchronize" each CPU's vxtime with a > global time source (HPET) every time we know the TSC rate changes. TSC frequencies of different clocks coming with multiple processors may vary which creates a differential even if they are initially synchronized. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] Make the TSC safe to be used by gettimeofday(). 2006-11-14 1:50 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Andi Kleen 2006-11-14 2:06 ` Suleiman Souhlal @ 2006-11-14 7:42 ` Arjan van de Ven 1 sibling, 0 replies; 18+ messages in thread From: Arjan van de Ven @ 2006-11-14 7:42 UTC (permalink / raw) To: Andi Kleen; +Cc: Suleiman Souhlal, Linux Kernel ML, vojtech, Jiri Bohac On Tue, 2006-11-14 at 02:50 +0100, Andi Kleen wrote: > On Tuesday 14 November 2006 02:06, Suleiman Souhlal wrote: > > I've had a proof-of-concept for this since August, and finally got around to > > somewhat cleaning it up. > > Thanks. > > I got a competing implementation for this unfortunately now from Vojtech & Jiri where is this posted? Since last weeks discussion on tickless several people started implementing alternatives since nothing existed in public ... ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2006-11-14 21:28 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-11-14 1:06 [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Suleiman Souhlal 2006-11-14 1:08 ` [PATCH 1/2] " Suleiman Souhlal 2006-11-14 2:05 ` Andi Kleen 2006-11-14 2:25 ` Suleiman Souhlal 2006-11-14 2:44 ` Andi Kleen 2006-11-14 3:35 ` dean gaudet 2006-11-14 4:22 ` dean gaudet 2006-11-14 3:54 ` Suleiman Souhlal 2006-11-14 11:12 ` Andi Kleen 2006-11-14 1:09 ` [PATCH 2/2] Introduce a vmonotonic_clock() vsyscall Suleiman Souhlal 2006-11-14 1:50 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Andi Kleen 2006-11-14 2:06 ` Suleiman Souhlal 2006-11-14 11:10 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() II Andi Kleen 2006-11-14 12:30 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Shem Multinymous 2006-11-14 17:06 ` Suleiman Souhlal 2006-11-14 18:30 ` Andi Kleen 2006-11-14 21:28 ` Christoph Lameter 2006-11-14 7:42 ` Arjan van de Ven
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox