From: Suleiman Souhlal <ssouhlal@FreeBSD.org>
To: Andi Kleen <ak@suse.de>
Cc: Linux Kernel ML <linux-kernel@vger.kernel.org>,
vojtech@suse.cz, Jiri Bohac <jbohac@suse.cz>
Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().
Date: Mon, 13 Nov 2006 18:25:45 -0800 [thread overview]
Message-ID: <45592929.2000606@FreeBSD.org> (raw)
In-Reply-To: <200611140305.00383.ak@suse.de>
Andi Kleen wrote:
> On Tuesday 14 November 2006 02:08, Suleiman Souhlal wrote:
>
>>This is done by a per-cpu vxtime structure that stores the last TSC and HPET
>>values.
>>
>>Whenever we switch to a userland process after a HLT instruction has been
>>executed or after the CPU frequency has changed, we force a new read of the
>>TSC, HPET and xtime so that we know the correct frequency we have to deal
>>with.
>
>
> Hmm, interesting approach.
>
>
>>With this, we can safely use RDTSC in gettimeofday() in CPUs where the
>>TSCs are not synchronized, such as Opterons, instead of doing a very expensive
>>HPET read.
>
>
>>+ cpu = hard_smp_processor_id();
>
>
> Why not smp_processor_id ?
Because CPUID returns the APIC ID, and I was under the impression that
the cpu numbers
smp_processor_id() were dynamically allocated and didn't necessarily
match the
APIC ID.
>>+
>>+ apicid = hard_smp_processor_id();
>
>
> The apicid <-> linux cpuid mapping should be 1:1 so i don't know
> why you do this separately. All the uses of hard_smp_processor_id
> are bogus.
>
>
>>+
>>+ /*
>>+ * We will need to also do this when switching to kernel tasks if we
>>+ * want to use the per-cpu monotonic_clock in the kernel
>>+ */
>>+ if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL)
>>+ vxtime_update_pcpu();
>
>
> Why? It should be really only needed on HLT/cpufreq change, or?
>
> Anyways, I'm not very fond of adding code to the context switch critical
> path.
Yes, it's only needed on HLT and cpufreq change.
The code here is to force a "resynch" with the HPET if we've done a HLT.
It has to be done before we switch to any userland thread that might
use the per-cpu vxtime. switch_to() seemed like the most natural place
to put this.
>
>
>> seq = read_seqbegin(&xtime_lock);
>>+ preempt_disable();
>>+ cpu = hard_smp_processor_id();
>
>
> Again, shouldn't use hard
>
>
>>
>>- sec = xtime.tv_sec;
>>- usec = xtime.tv_nsec / NSEC_PER_USEC;
>>+ sec = vxtime.pcpu[cpu].tv_sec;
>>+ usec = vxtime.pcpu[cpu].tv_usec;
>>
>> /* i386 does some correction here to keep the clock
>> monotonous even when ntpd is fixing drift.
>>@@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv)
>> be found. Note when you fix it here you need to do the same
>> in arch/x86_64/kernel/vsyscall.c and export all needed
>> variables in vmlinux.lds. -AK */
>>- usec += do_gettimeoffset();
>>+ t = get_cycles_sync();
>>+ x = (((t - vxtime.pcpu[cpu].last_tsc) *
>>+ vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
>>+ usec += x;
>>
>> } while (read_seqretry(&xtime_lock, seq));
>>+ preempt_enable();
>
>
> If it can be implemented without races in user space, why does
> it need preempt disable in kernel space?
Because there is indeed a small race in user space (as you noted below)
that I still haven't found how to address.
> The faster way to access all the vxtime code would be to stick
> a pointer to the per CPU vxtime data into the PDA
>
>
>
>>+#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
>>+extern unsigned long hpet_tick;
>>+
>>+extern unsigned long vxtime_hz;
>
>
> No externs in .c files. Happened earlier too I think
>
>
>>+
>> static __always_inline void timeval_normalize(struct timeval * tv)
>> {
>> time_t __sec;
>>@@ -57,35 +67,107 @@ static __always_inline void timeval_norm
>> }
>> }
>>
>>+inline int apicid(void)
>>+{
>>+ int cpu;
>>+
>>+ __asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx");
>>+ return (cpu >> 24);
>
>
> The faster way to do this is to use LSL from a magic GDT entry.
A cow-orker suggested that we use SIDT and encode the CPU number in the
limit of the IDT, which should be even faster than LSL.
>
>>+}
>>+
>> static __always_inline void do_vgettimeofday(struct timeval * tv)
>> {
>> long sequence, t;
>> unsigned long sec, usec;
>>+ int cpu;
>>
>> do {
>> sequence = read_seqbegin(&__xtime_lock);
>>-
>>- sec = __xtime.tv_sec;
>>- usec = __xtime.tv_nsec / 1000;
>>-
>>- if (__vxtime.mode != VXTIME_HPET) {
>>- t = get_cycles_sync();
>>- if (t < __vxtime.last_tsc)
>>- t = __vxtime.last_tsc;
>>- usec += ((t - __vxtime.last_tsc) *
>>- __vxtime.tsc_quot) >> 32;
>>- /* See comment in x86_64 do_gettimeofday. */
>>- } else {
>>- usec += ((readl((void __iomem *)
>>- fix_to_virt(VSYSCALL_HPET) + 0xf0) -
>>- __vxtime.last) * __vxtime.quot) >> 32;
>>- }
>>- } while (read_seqretry(&__xtime_lock, sequence));
>>+ cpu = apicid();
>>+
>>+ sec = __vxtime.pcpu[cpu].tv_sec;
>>+ usec = __vxtime.pcpu[cpu].tv_usec;
>>+ rdtscll(t);
>>+
>>+ usec += (((t - __vxtime.pcpu[cpu].last_tsc) *
>>+ __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
>>+ } while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu);
>
>
> Hmm, isn't there still a (unlikely, but possible) race:
>
> CPU #0 CPU #1
> read cpu number
> switch to CPU #1
> read TSC
> switch back to CPU #0
> check cpu number
> check succeeds, but you got wrong TSC data
>
> How do you prevent that? I suppose you could force the seqlock to retry
> in this case in the context switch, but I don't think you do that?
I couldn't figure out how to tell if a context switch has happened from
userland. I tried putting a per-cpu context switch count, but I couldn't
figure out how to get it atomically along with the CPU number..
> Also I'm not sure what your context switch code is good for.
>
> -Andi
Thanks for the feedback.
-- Suleiman
next prev parent reply other threads:[~2006-11-14 2:25 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-11-14 1:06 [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Suleiman Souhlal
2006-11-14 1:08 ` [PATCH 1/2] " Suleiman Souhlal
2006-11-14 2:05 ` Andi Kleen
2006-11-14 2:25 ` Suleiman Souhlal [this message]
2006-11-14 2:44 ` Andi Kleen
2006-11-14 3:35 ` dean gaudet
2006-11-14 4:22 ` dean gaudet
2006-11-14 3:54 ` Suleiman Souhlal
2006-11-14 11:12 ` Andi Kleen
2006-11-14 1:09 ` [PATCH 2/2] Introduce a vmonotonic_clock() vsyscall Suleiman Souhlal
2006-11-14 1:50 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Andi Kleen
2006-11-14 2:06 ` Suleiman Souhlal
2006-11-14 11:10 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() II Andi Kleen
2006-11-14 12:30 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Shem Multinymous
2006-11-14 17:06 ` Suleiman Souhlal
2006-11-14 18:30 ` Andi Kleen
2006-11-14 21:28 ` Christoph Lameter
2006-11-14 7:42 ` Arjan van de Ven
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45592929.2000606@FreeBSD.org \
--to=ssouhlal@freebsd.org \
--cc=ak@suse.de \
--cc=jbohac@suse.cz \
--cc=linux-kernel@vger.kernel.org \
--cc=vojtech@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox