public inbox for linux-kernel@vger.kernel.org
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>,
	akpm@linux-foundation.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: [patch 17/17] x86 trace clock
Date: Fri, 28 Nov 2008 09:17:04 -0500	[thread overview]
Message-ID: <20081128141704.GE10401@Krystal> (raw)
In-Reply-To: <878wr64cmj.fsf@basil.nowhere.org>

* Andi Kleen (andi@firstfloor.org) wrote:
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> writes:
> 
> > X86 trace clock. Depends on tsc_sync to detect if timestamp counters are
> > synchronized on the machine.
> 
> For that tsc_sync needs to be fixed first? e.g. see thread about
> it firing incorrectly on vmware some time ago.
> 

I assume you are talking about this thread:
http://lkml.org/lkml/2008/10/20/467

Thanks for pointing it out to me.

Yes, the x86 trace clock depends on the tsc_sync detection code. Note
that the patchset I propose here includes new tsc_sync detection code
which behaves differently from the current x86 tsc_sync code. We can
specify an acceptable cycle delta between the CPUs, and also, given
that I retry 10 times for each CPU and take the _best_ result of the
test, it should ensure the measurement does not suffer from NMI, MCE,
SMI, or hypervisor delays.


> > A "Big Fat" (TM) warning is shown on the console when the trace clock is used on
> > systems without synchronized TSCs to tell the user to
> >
> 
> How about Intel systems where the TSC only stops in C3 and deeper?
> You don't seem to handle that case well.
> 

The thing is: if the user specifies the idle=poll kernel command line
option, the cycle counters will stay synchronized even if they would
otherwise stop in C3 and deeper halt states. This code assumes that the
idle thread has already executed on at least one of the CPUs before the
synchronization code is called. Maybe it would be safer to explicitly
call the idle loop on one CPU to ensure we would detect the delta it
causes?

But hrm, if we have a virtualized environment, with one guest using
idle=poll and then a second guest using the halt states starting later,
I wonder what the result would be ?

Also, all this stuff about vmware giving TSCs "synchronized enough" for
timekeeping does not work for tracing. NTP precision, if I recall
correctly, was too loose for tracing needs.

> On modern Intel systems that's a common case.
> > +			new_tsc = last_tsc + TRACE_CLOCK_MIN_PROBE_DURATION;
> > +		/*
> > +		 * If cmpxchg fails with a value higher than the new_tsc, don't
> > +		 * retry : the value has been incremented and the events
> > +		 * happened almost at the same time.
> > +		 * We must retry if cmpxchg fails with a lower value :
> > +		 * it means that we are the CPU with highest frequency and
> > +		 * therefore MUST update the value.
> 
> Sorry but any global TSC measurements without scaling when the TSCs
> run on different frequencies just don't make sense. The results will
> be always poor. You really have to scale appropriately then and also
> handle the "instable period"

This scheme only offers the following guarantees:

- the max time imprecision is 1 jiffy.
- the cycles counter values read will increase monotonically across the
  system.
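To make the monotonicity guarantee concrete, here is an untested userspace
sketch of the cmpxchg scheme quoted above, using C11 atomics in place of the
kernel's cmpxchg; MIN_STEP stands in for TRACE_CLOCK_MIN_PROBE_DURATION and
the function name is made up:

```c
#include <stdint.h>
#include <stdatomic.h>

/* Stand-in for TRACE_CLOCK_MIN_PROBE_DURATION. */
#define MIN_STEP 1

static _Atomic uint64_t last_val;

/*
 * Return a value that never goes backwards system-wide, given a raw
 * cycle counter read that might.
 */
uint64_t trace_clock_read(uint64_t raw)
{
	uint64_t last = atomic_load(&last_val);

	for (;;) {
		/* Never return a value <= the last one handed out. */
		uint64_t candidate = raw > last ? raw : last + MIN_STEP;

		if (atomic_compare_exchange_strong(&last_val, &last,
						   candidate))
			return candidate;
		/*
		 * cmpxchg failed; 'last' now holds the current global
		 * value.  If it is already >= our candidate, another CPU
		 * advanced the clock past us: the events happened almost
		 * at the same time, so take its value without retrying.
		 */
		if (last >= candidate)
			return last;
		/* We still hold the highest value: we MUST retry. */
	}
}
```

A CPU whose raw counter lags simply gets last + MIN_STEP, which is why the
imprecision is bounded but event order across CPUs stays correct.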

And also, yes, if all CPUs happen to run slower than their original
speed, the cpu_khz value used will be wrong and time will appear to go
slower than it really does.

Given that, we could do "better" by also keeping a per-CPU offset and
frequency and scaling each counter read using these reference values, so
the TSCs would be almost in sync, but this would not deal correctly with
the C3 and deeper halt states anyway, so there might be limited interest
in this.
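For the record, the per-CPU scaling I mean would look roughly like this
untested sketch (structure and names are made up for illustration): rescale
cycles elapsed since the sync point from the CPU's own frequency to a common
reference frequency.

```c
#include <stdint.h>

/*
 * Per-CPU reference values captured at synchronization time.
 * Illustrative only; not from the patchset.
 */
struct cpu_clock_ref {
	uint64_t offset;	/* raw counter value at the sync point */
	uint64_t cpu_khz;	/* this CPU's measured frequency */
};

/*
 * Rescale a raw per-CPU cycle count to a common timebase running at
 * ref_khz.  Overflow handling is omitted for clarity.
 */
uint64_t scale_tsc(uint64_t raw, const struct cpu_clock_ref *ref,
		   uint64_t ref_khz)
{
	return (raw - ref->offset) * ref_khz / ref->cpu_khz;
}
```

E.g. a CPU running at 2 GHz mapped onto a 1 GHz reference timebase reports
half its elapsed raw cycles. But as said above, none of this helps once the
counter has actually stopped in a deep halt state.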

And yes, we would eventually have to record the frequency change events
in a specific tracing buffer, which would let us reconstruct the CPU
speed along the execution, but I think starting with tracing data that
is less precise yet still accurate (in terms of event order) would be a
simpler start. We can always add this frequency change monitoring layer
in a second phase.

So, given that, what do you think would be "good enough" for a first
shot at a tracing clock, given we keep room for later improvement?

Considering your comments here, we may want to add some knowledge about
the current CPU frequency to the trace clock TSC read primitive, so it
advances at a rate which follows the real speed more closely, even
though this would not take care of the C3 and deeper halt states. And
given that, we might want to resynchronize the TSC values based on HPET
reads periodically (e.g. every time we come out of idle, but at most
once per jiffy) to deal with halt states. We could wrap that into the
cmpxchg-based method I proposed here to deal with the "instable period".
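The idle-exit resync I have in mind would be something like this untested
sketch (all names illustrative; the reference read stands in for an HPET
read converted to cycles): rate-limit to one resync per jiffy, then push the
monotonic clock forward if the reference is ahead of the stalled TSC.

```c
#include <stdint.h>
#include <stdatomic.h>

static _Atomic uint64_t trace_clock_last;	/* monotonic clock state */
static _Atomic uint64_t last_resync;		/* jiffy of last resync */

/*
 * Called when a CPU leaves idle.  ref_cycles is the reference clock
 * (HPET in the proposal) already converted to the cycle timebase.
 */
void resync_on_idle_exit(uint64_t now_jiffies, uint64_t ref_cycles)
{
	uint64_t prev = atomic_load(&last_resync);
	uint64_t last;

	/* At most one resync per jiffy, system-wide. */
	if (now_jiffies == prev)
		return;
	if (!atomic_compare_exchange_strong(&last_resync, &prev,
					    now_jiffies))
		return;	/* another CPU claimed this jiffy's resync */

	/* Push the clock forward if the TSC fell behind during the
	 * halt state; never move it backwards. */
	last = atomic_load(&trace_clock_last);
	while (ref_cycles > last &&
	       !atomic_compare_exchange_weak(&trace_clock_last, &last,
					     ref_cycles))
		;
}
```

This composes naturally with the cmpxchg read path: the resync only ever
raises the shared value, so readers keep their monotonicity guarantee and
the 1-jiffy imprecision bound.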

Hm ?

Mathieu

> 
> -Andi
> 
> -- 
> ak@linux.intel.com

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


Thread overview: 21+ messages
2008-11-26 12:42 [patch 00/17] Trace Clock v4 Mathieu Desnoyers
2008-11-26 12:42 ` [patch 01/17] get_cycles() : kconfig HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-26 12:42 ` [patch 02/17] get_cycles() : x86 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-26 12:42 ` [patch 03/17] get_cycles() : sparc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-26 12:42 ` [patch 04/17] get_cycles() : powerpc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-11-26 12:42 ` [patch 05/17] MIPS : export hpt frequency for get_cycles Mathieu Desnoyers
2008-11-26 12:42 ` [patch 06/17] get_cycles() : MIPS HAVE_GET_CYCLES_32 Mathieu Desnoyers
2008-11-26 12:42 ` [patch 07/17] Trace clock core Mathieu Desnoyers
2008-11-26 12:42 ` [patch 08/17] Trace clock generic Mathieu Desnoyers
2008-11-26 12:42 ` [patch 09/17] Powerpc : Trace clock Mathieu Desnoyers
2008-11-26 12:42 ` [patch 10/17] Sparc64 " Mathieu Desnoyers
2008-11-26 12:42 ` [patch 11/17] LTTng timestamp sh Mathieu Desnoyers
2008-11-26 12:42 ` [patch 12/17] LTTng - TSC synchronicity test Mathieu Desnoyers
2008-11-26 12:42 ` [patch 13/17] x86 : remove arch-specific tsc_sync.c Mathieu Desnoyers
2008-11-26 12:43 ` [patch 14/17] MIPS use tsc_sync.c Mathieu Desnoyers
2008-11-26 12:43 ` [patch 15/17] MIPS create empty sync_core() Mathieu Desnoyers
2008-11-26 12:43 ` [patch 16/17] MIPS : Trace clock Mathieu Desnoyers
2008-11-26 12:43 ` [patch 17/17] x86 trace clock Mathieu Desnoyers
2008-11-26 17:32   ` Andi Kleen
2008-11-28 14:17     ` Mathieu Desnoyers [this message]
  -- strict thread matches above, loose matches on Subject: below --
2008-11-12 23:15 [patch 00/17] Trace Clock v3 Mathieu Desnoyers
2008-11-12 23:16 ` [patch 17/17] x86 trace clock Mathieu Desnoyers
