Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andi Kleen <ak@suse.de>
To: Suleiman Souhlal <ssouhlal@freebsd.org>
Cc: Linux Kernel ML <linux-kernel@vger.kernel.org>,
	vojtech@suse.cz, Jiri Bohac <jbohac@suse.cz>
Subject: Re: [PATCH 1/2] Make the TSC safe to be used by gettimeofday().
Date: Tue, 14 Nov 2006 03:05:00 +0100	[thread overview]
Message-ID: <200611140305.00383.ak@suse.de> (raw)
In-Reply-To: <45591706.3030106@FreeBSD.org>

On Tuesday 14 November 2006 02:08, Suleiman Souhlal wrote:
> This is done by a per-cpu vxtime structure that stores the last TSC and HPET
> values.
> 
> Whenever we switch to a userland process after a HLT instruction has been
> executed or after the CPU frequency has changed, we force a new read of the
> TSC, HPET and xtime so that we know the correct frequency we have to deal
> with.

Hmm, interesting approach. 

> 
> With this, we can safely use RDTSC in gettimeofday() in CPUs where the
> TSCs are not synchronized, such as Opterons, instead of doing a very expensive
> HPET read.

> +	cpu = hard_smp_processor_id();

Why not smp_processor_id ? 


> +
> +	apicid = hard_smp_processor_id();

The apicid <-> linux cpuid mapping should be 1:1 so i don't know
why you do this separately. All the uses of hard_smp_processor_id
are bogus.

> +
> +	/*
> +	 * We will need to also do this when switching to kernel tasks if we
> +	 * want to use the per-cpu monotonic_clock in the kernel
> +	 */
> +	if (vxtime.pcpu[apicid].need_update == 1 && next_p->mm != NULL)
> +		vxtime_update_pcpu();

Why? It should be really only needed on HLT/cpufreq change, or?

Anyways, I'm not very fond of adding code to the context switch critical
path.


>  		seq = read_seqbegin(&xtime_lock);
> +		preempt_disable();
> +		cpu = hard_smp_processor_id();

Again, shouldn't use hard

>  
> -		sec = xtime.tv_sec;
> -		usec = xtime.tv_nsec / NSEC_PER_USEC;
> +		sec = vxtime.pcpu[cpu].tv_sec;
> +		usec = vxtime.pcpu[cpu].tv_usec;
>  
>  		/* i386 does some correction here to keep the clock 
>  		   monotonous even when ntpd is fixing drift.
> @@ -135,9 +138,13 @@ void do_gettimeofday(struct timeval *tv)
>  		   be found. Note when you fix it here you need to do the same
>  		   in arch/x86_64/kernel/vsyscall.c and export all needed
>  		   variables in vmlinux.lds. -AK */ 
> -		usec += do_gettimeoffset();
> +		t = get_cycles_sync();
> +		x = (((t - vxtime.pcpu[cpu].last_tsc) *
> +		    vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
> +		usec += x;
>  
>  	} while (read_seqretry(&xtime_lock, seq));
> +	preempt_enable();

If it can be implemented without races in user space, why does
it need preempt disable in kernel space?

The faster way to access all the vxtime code would be to stick
a pointer to the per CPU vxtime data into the PDA


> +#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)
> +extern unsigned long hpet_tick;
> +
> +extern unsigned long vxtime_hz;

No externs in .c files. Happened earlier too I think

> +
>  static __always_inline void timeval_normalize(struct timeval * tv)
>  {
>  	time_t __sec;
> @@ -57,35 +67,107 @@ static __always_inline void timeval_norm
>  	}
>  }
>  
> +inline int apicid(void)
> +{
> +	int cpu;
> +
> +	__asm __volatile("cpuid" : "=b" (cpu) : "a" (1) : "cx", "dx");
> +	return (cpu >> 24);

The faster way to do this is to use LSL from a magic GDT entry.

> +}
> +
>  static __always_inline void do_vgettimeofday(struct timeval * tv)
>  {
>  	long sequence, t;
>  	unsigned long sec, usec;
> +	int cpu;
>  
>  	do {
>  		sequence = read_seqbegin(&__xtime_lock);
> -		
> -		sec = __xtime.tv_sec;
> -		usec = __xtime.tv_nsec / 1000;
> -
> -		if (__vxtime.mode != VXTIME_HPET) {
> -			t = get_cycles_sync();
> -			if (t < __vxtime.last_tsc)
> -				t = __vxtime.last_tsc;
> -			usec += ((t - __vxtime.last_tsc) *
> -				 __vxtime.tsc_quot) >> 32;
> -			/* See comment in x86_64 do_gettimeofday. */
> -		} else {
> -			usec += ((readl((void __iomem *)
> -				   fix_to_virt(VSYSCALL_HPET) + 0xf0) -
> -				  __vxtime.last) * __vxtime.quot) >> 32;
> -		}
> -	} while (read_seqretry(&__xtime_lock, sequence));
> +		cpu = apicid();
> +
> +		sec = __vxtime.pcpu[cpu].tv_sec;
> +		usec = __vxtime.pcpu[cpu].tv_usec;
> +		rdtscll(t);
> +
> +		usec += (((t - __vxtime.pcpu[cpu].last_tsc) *
> +		    __vxtime.pcpu[cpu].tsc_nsquot) >> NS_SCALE) / NSEC_PER_USEC;
> +	} while (read_seqretry(&__xtime_lock, sequence) || apicid() != cpu);

Hmm, isn't there still a (unlikely, but possible) race: 

         CPU #0                          CPU #1
         read cpu number
 	 switch to CPU #1
                                         read TSC 
                                         switch back to CPU #0
         check cpu number
         check succeeds, but you got wrong TSC data

How do you prevent that? I suppose you could force the seqlock to retry
in this case in the context switch, but I don't think you do that? 

Also I'm not sure what your context switch code is good for.

-Andi

next prev parent reply	other threads:[~2006-11-14  2:05 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-11-14  1:06 [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Suleiman Souhlal
2006-11-14  1:08 ` [PATCH 1/2] " Suleiman Souhlal
2006-11-14  2:05   ` Andi Kleen [this message]
2006-11-14  2:25     ` Suleiman Souhlal
2006-11-14  2:44       ` Andi Kleen
2006-11-14  3:35         ` dean gaudet
2006-11-14  4:22           ` dean gaudet
2006-11-14  3:54         ` Suleiman Souhlal
2006-11-14 11:12           ` Andi Kleen
2006-11-14  1:09 ` [PATCH 2/2] Introduce a vmonotonic_clock() vsyscall Suleiman Souhlal
2006-11-14  1:50 ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Andi Kleen
2006-11-14  2:06   ` Suleiman Souhlal
2006-11-14 11:10     ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() II Andi Kleen
2006-11-14 12:30     ` [PATCH 0/2] Make the TSC safe to be used by gettimeofday() Shem Multinymous
2006-11-14 17:06       ` Suleiman Souhlal
2006-11-14 18:30         ` Andi Kleen
2006-11-14 21:28     ` Christoph Lameter
2006-11-14  7:42   ` Arjan van de Ven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200611140305.00383.ak@suse.de \
    --to=ak@suse.de \
    --cc=jbohac@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ssouhlal@freebsd.org \
    --cc=vojtech@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox