public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>,
	David Miller <davem@davemloft.net>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC patch 15/15] LTTng timestamp x86
Date: Thu, 16 Oct 2008 21:28:35 -0400	[thread overview]
Message-ID: <20081017012835.GA30195@Krystal> (raw)
In-Reply-To: <alpine.LFD.2.00.0810161701470.3288@nehalem.linux-foundation.org>

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> 
> 
> On Thu, 16 Oct 2008, Mathieu Desnoyers wrote:
> >
> > +static inline cycles_t ltt_async_tsc_read(void)
> 
> (a) this shouldn't be inline
> 

Ok, will fix. I will put this in a new arch/x86/kernel/ltt.c.

> > +	rdtsc_barrier();
> > +	new_tsc = get_cycles();
> > +	rdtsc_barrier();
> > +	do {
> > +		last_tsc = ltt_last_tsc;
> > +		if (new_tsc < last_tsc)
> > +			new_tsc = last_tsc + LTT_MIN_PROBE_DURATION;
> > +		/*
> > +		 * If cmpxchg fails with a value higher than the new_tsc, don't
> > +		 * retry : the value has been incremented and the events
> > +		 * happened almost at the same time.
> > +		 * We must retry if cmpxchg fails with a lower value :
> > +		 * it means that we are the CPU with highest frequency and
> > +		 * therefore MUST update the value.
> > +		 */
> > +	} while (cmpxchg64(&ltt_last_tsc, last_tsc, new_tsc) < new_tsc);
> 
> (b) This is really quite expensive.
> 

Ok, let's try to figure out what the use-cases are, because we are
really facing an architectural mess (thanks to Intel and AMD). I don't
think there is a single perfect solution for all, but I'll try to
explain why I accept the cache-line bouncing behavior when
unsynchronized TSCs are detected by LTTng.

First, the most important thing in LTTng is to provide the event flow
in the correct order across CPUs. Secondary to that, getting the precise
execution time is a nice-to-have when the architecture supports it, but
the time granularity itself is not crucially important, as long as we
have a way to determine which of two events close in time happens first.
The principal use-case where I have seen such tracer in action is when
one have to understand why one or more processes are slower than
expected. The root cause can easily sit on another CPU, be a locking
delay in a particular race condition, or just a process waiting for
other processes waiting for a timeout.


> Why do things like this? Make the timestamps be per-cpu. If you do things 
> like the above, then just getting the timestamp means that every single 
> trace event will cause a cacheline bounce, and if you do that, you might 
> as well just not have per-cpu tracing at all.
> 

This cache-line bouncing global clock is a best-effort to provide
correct event order in the trace on architectures with unsync tsc. It's
actually better than a global tracing buffer because it limits the
number of cache line transfers required to one per event. Global tracing
buffers may require to transfer many cache lines across CPUs when events
are written across cache lines or larger than a cache line.

> It really boils down to two cases:
> 
>  - you do per-CPU traces
> 
>    If so, you need to ONLY EVER touch per-cpu data when tracing, and the 
>    above is a fundamental BUG. Dirtying shared cachelines makes the whole 
>    per-cpu thing pointless.

Sharing only a single cache-line is not completely pointless, as
explained above, but yes, there is a big performance hit involved.

I agree that we should maybe add a degree of flexibility in this time
infrastructure to let users select the type of time source they want :

- Global clock, potentially slow on unsynchronized CPUs.
- Local clock, fast, possibility unsynchronized across CPUs.

> 
>  - you do global traces
> 
>    Sure, then the above works, but why bother? You'll get the ordering 
>    from the global trace, you might as well do time stamps with local 
>    counts.
> 

I simply don't like the global traces because of the extra cache-line
bouncing experienced by events written on multiple cache-lines.

> So in neither case does it make any sense to try to do that global 
> ltt_last_tsc.
> 
> Perhaps more importantly - if the TSC really are out of whack, that just 
> means that now all your timestamps are worthless, because the value you 
> calculate ends up having NOTHING to do with the timestamp. So you cannot 
> even use it to see how long something took, because it may be that you're 
> running on the CPU that runs behind, and all you ever see is the value of 
> LTT_MIN_PROBE_DURATION.
> 

I thought about this one. There is actually a FIXME in the code which
plans to add an IPI called at each timer interrupt to do a "read tsc" on
each CPU. This would give an HZ upper bound to the time precision, which
would give a trace with events ordered across CPUs and manage to have
the execution time at a HZ precision.

So given that global buffers are less efficient that just synchronizing
a single cache-line and that some people are willing to pay the price to
get events synchronized across CPUs and others are not, what do you
think of leaving the choice to the user about globally/locally
synchronized timestamps ?

Thanks for the feedback,

Mathieu

> 			Linus

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  parent reply	other threads:[~2008-10-17  1:28 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-16 23:27 [RFC patch 00/15] Tracer Timestamping Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 01/15] get_cycles() : kconfig HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 02/15] get_cycles() : x86 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 03/15] get_cycles() : sparc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-17  2:48   ` [RFC patch 03/15] get_cycles() : sparc64 HAVE_GET_CYCLES (update) Mathieu Desnoyers
2008-10-17  2:57     ` David Miller
2008-10-16 23:27 ` [RFC patch 04/15] get_cycles() : powerpc64 HAVE_GET_CYCLES Mathieu Desnoyers
2008-10-17  0:26   ` Paul Mackerras
2008-10-17  0:43     ` [RFC patch 04/15] get_cycles() : powerpc64 HAVE_GET_CYCLES (update) Mathieu Desnoyers
2008-10-17  0:54       ` Paul Mackerras
2008-10-17  1:42       ` David Miller
2008-10-17  2:08         ` Mathieu Desnoyers
2008-10-17  2:33           ` David Miller
2008-10-16 23:27 ` [RFC patch 05/15] get_cycles() : MIPS HAVE_GET_CYCLES_32 Mathieu Desnoyers
2008-10-26 10:18   ` Ralf Baechle
2008-10-26 20:39     ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 06/15] LTTng build Mathieu Desnoyers
2008-10-17  8:10   ` KOSAKI Motohiro
2008-10-17 16:18     ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 07/15] LTTng timestamp Mathieu Desnoyers
2008-10-17  8:15   ` KOSAKI Motohiro
2008-10-17 16:23     ` Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 08/15] LTTng - Timestamping Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 09/15] LTTng mips export hpt frequency Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 10/15] LTTng timestamp mips Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 11/15] LTTng timestamp powerpc Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 12/15] LTTng timestamp sparc64 Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 13/15] LTTng timestamp sh Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 14/15] LTTng - TSC synchronicity test Mathieu Desnoyers
2008-10-16 23:27 ` [RFC patch 15/15] LTTng timestamp x86 Mathieu Desnoyers
2008-10-17  0:08   ` Linus Torvalds
2008-10-17  0:12     ` Linus Torvalds
2008-10-17  1:28     ` Mathieu Desnoyers [this message]
2008-10-17  2:19       ` Luck, Tony
2008-10-17 17:25         ` Steven Rostedt
2008-10-17 18:08           ` Luck, Tony
2008-10-17 18:42             ` Mathieu Desnoyers
2008-10-17 18:58               ` Luck, Tony
2008-10-17 20:23                 ` Mathieu Desnoyers
2008-10-17 23:52                   ` Luck, Tony
2008-10-18 17:01                     ` Mathieu Desnoyers
2008-10-18 17:35                       ` Linus Torvalds
2008-10-18 17:50                         ` Ingo Molnar
2008-10-22 16:19                           ` Mathieu Desnoyers
2008-10-22 15:53                         ` Mathieu Desnoyers
2008-10-20 18:07                       ` Luck, Tony
2008-10-22 16:51                         ` Mathieu Desnoyers
2008-10-17 19:17               ` Steven Rostedt
2008-10-20 20:10               ` Linus Torvalds
2008-10-20 21:38                 ` john stultz
2008-10-20 22:06                   ` Linus Torvalds
2008-10-20 22:17                     ` Ingo Molnar
2008-10-20 22:29                     ` H. Peter Anvin
2008-10-21 18:10                       ` Bjorn Helgaas
2008-10-23 15:47                         ` Linus Torvalds
2008-10-23 16:39                           ` H. Peter Anvin
2008-10-23 21:54                           ` Paul Mackerras
2008-10-20 23:47                     ` john stultz
2008-10-22 17:05                 ` Mathieu Desnoyers
2008-10-17 19:36         ` Christoph Lameter
2008-10-17  7:59 ` [RFC patch 00/15] Tracer Timestamping Peter Zijlstra
2008-10-20 20:25   ` Mathieu Desnoyers
2008-10-21  0:20     ` Nicolas Pitre
2008-10-21  1:32       ` Mathieu Desnoyers
2008-10-21  2:32         ` Nicolas Pitre
2008-10-21  4:05           ` Mathieu Desnoyers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081017012835.GA30195@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=hpa@zytor.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox