public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: john stultz <johnstul@us.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>,
	bob.picco@hp.com, Steven Rostedt <rostedt@goodmis.org>,
	LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Christoph Hellwig <hch@infradead.org>,
	Gregory Haskins <ghaskins@novell.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tim Bird <tim.bird@am.sony.com>, Sam Ravnborg <sam@ravnborg.org>,
	"Frank Ch. Eigler" <fche@redhat.com>,
	Steven Rostedt <srostedt@redhat.com>
Subject: Re: [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays
Date: Thu, 10 Jan 2008 15:42:59 -0500	[thread overview]
Message-ID: <20080110204259.GB5836@Krystal> (raw)
In-Reply-To: <1199996959.6108.18.camel@localhost.localdomain>

* john stultz (johnstul@us.ibm.com) wrote:
> 
> On Thu, 2008-01-10 at 11:54 -0800, Tony Luck wrote:
> > > Tony:  ia64 also needs something like this, but I found the fsyscall asm
> > > bits a little difficult to grasp. So I'll need some assistance on how to
> > > include the accumulated cycles into the final calculation.
> > 
> > I'm trying to figure out all the ramifications of the new
> > "cycle_accumulated" field.  Does it really need to be
> > propagated all the way to the low level assembler (which
> > I don't want to mess with unless I really, really have to).
> > Can't I do the necessary calculations in update_vsyscall()
> > [Where I can do them in C :-)] and keep the same low
> > level assembly code.  I think I must be missing some
> > important bit of what is going on here.
> 
> (Added Bob Picco to the mail, as he was involved in the ia64 clocksource
> work).
> 
> So the background on the patch is this:
> 
> Some clocksources wrap frequently (every few seconds, for example). This
> can cause issues if we defer the update_wall_time() function where we
> accumulate time for too long (this really only happens on -rt systems
> right now). 
> 
> To avoid that issue, we've added the cycle_accumulated value, which acts
> as a midpoint, where we can quickly accumulate cycles off of the
> counter, without doing the more expensive update_wall_time() function.
> This avoids issues with the clocksource wrapping, but requires that
> cycle_accumulated be added in to the gettiemofday() calculation.
> 
> If you noticed in my email, the fix for ppc was a bit easier, as it has
> only a 64bit counter that is quite unlikely to wrap twice between calls
> to update_wall_time(). There we could decrement the cycles_last value by
> cycles_accumulated and get the same effect of adding it in.
> 
> Unfortunately on ia64, I suspect it will be necessary to do similar to
> the x86_64 code and add in the cycles accumulated value in
> vgettime/fgettime function, since there is the possibility of quickly
> wrapping clocksources on that architecture. 
> 
> So unless someone can point out a nicer trick, it likely means adding a
> new cycles_accumulated value to the fsyscall structure and the asm to do
> the addition. :(
> 

I think it's about time I introduce the approach I have taken for LTTng
timestamping. Basically, one of the main issues with the clock sources
is the xtime lock : having a read seqlock nested over a write seqlock is
a really, really bad idea. This can happen with NMIs. Basically, it
would cause a deadlock.

What I have done is an RCU algorithm that extends a 32 bits TSC (that's
the case on MIPS, for instance) to 64 bits. The update of the MSBs is
done by a periodical timer (fired often enough to make sure we always
detect the 32 LSBs wrap-around) and the read-side only has to disable
preemption.

I use a 2 slots array, each of them keeping, alternatively, the last 64
bits counter value, to implement the RCU algorithm.

Since we are discussing time source modification, this is one that I
would really like to see in the Linux kernel : it would provide the kind
of time source needed for function entry/exit tracing and for generic
kernel tracing as well.

Mathieu

Here is the patch, for reference. It applies on 2.6.24-rc7, after some
other LTTng patches in my patchset.


LTTng timestamp

LTTng synthetic TSC code for timestamping. Extracts 64 bits tsc from a 32 bits
counter, kept up to date by periodical timer interrupt. Lockless.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
 init/Kconfig        |    2 
 ltt/Kconfig         |   17 ++++
 ltt/Makefile        |    1 
 ltt/ltt-timestamp.c |  198 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 218 insertions(+)

Index: linux-2.6-lttng/ltt/ltt-timestamp.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-timestamp.c	2007-12-05 20:54:48.000000000 -0500
@@ -0,0 +1,198 @@
+/*
+ * (C) Copyright	2006,2007 -
+ * 		Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * notes : ltt-timestamp timer-based clock cannot be used for early tracing in
+ * the boot process, as it depends on timer interrupts.
+ *
+ * The timer needs to be only on one CPU to support hotplug.
+ * We have the choice between schedule_delayed_work_on and an IPI to get each
+ * CPU to write the heartbeat. IPI has been chosen because it is considered
+ * faster than passing through the timer to get the work scheduled on all the
+ * CPUs.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/timer.h>
+#include <linux/workqueue.h>
+#include <linux/cpu.h>
+#include <linux/timex.h>
+#include <linux/bitops.h>
+#include <linux/ltt.h>
+#include <linux/smp.h>
+#include <linux/sched.h> /* FIX for m68k local_irq_enable in on_each_cpu */
+
+atomic_t lttng_generic_clock;
+EXPORT_SYMBOL(lttng_generic_clock);
+
+/* Expected maximum interrupt latency in ms : 15ms, *2 for security */
+#define EXPECTED_INTERRUPT_LATENCY	30
+
+static struct timer_list stsc_timer;
+static unsigned int precalc_expire;
+
+/* For architectures with 32 bits TSC */
+static struct synthetic_tsc_struct {
+	u32 tsc[2][2];	/* a pair of 2 32 bits. [0] is the MSB, [1] is LSB */
+	unsigned int index;	/* Index of the current synth. tsc. */
+} ____cacheline_aligned synthetic_tsc[NR_CPUS];
+
+/* Called from IPI : either in interrupt or process context */
+static void ltt_update_synthetic_tsc(void)
+{
+	struct synthetic_tsc_struct *cpu_synth;
+	u32 tsc;
+
+	preempt_disable();
+	cpu_synth = &synthetic_tsc[smp_processor_id()];
+	tsc = ltt_get_timestamp32();		/* We deal with a 32 LSB TSC */
+
+	if (tsc < cpu_synth->tsc[cpu_synth->index][1]) {
+		unsigned int new_index = cpu_synth->index ? 0 : 1; /* 0 <-> 1 */
+		/*
+		 * Overflow
+		 * Non atomic update of the non current synthetic TSC, followed
+		 * by an atomic index change. There is no write concurrency,
+		 * so the index read/write does not need to be atomic.
+		 */
+		cpu_synth->tsc[new_index][1] = tsc; /* LSB update */
+		cpu_synth->tsc[new_index][0] =
+			cpu_synth->tsc[cpu_synth->index][0]+1; /* MSB update */
+		cpu_synth->index = new_index;	/* atomic change of index */
+	} else {
+		/*
+		 * No overflow : we can simply update the 32 LSB of the current
+		 * synthetic TSC as it's an atomic write.
+		 */
+		cpu_synth->tsc[cpu_synth->index][1] = tsc;
+	}
+	preempt_enable();
+}
+
+/* Called from buffer switch : in _any_ context (even NMI) */
+u64 ltt_read_synthetic_tsc(void)
+{
+	struct synthetic_tsc_struct *cpu_synth;
+	u64 ret;
+	unsigned int index;
+	u32 tsc;
+
+	preempt_disable();
+	cpu_synth = &synthetic_tsc[smp_processor_id()];
+	index = cpu_synth->index; /* atomic read */
+	tsc = ltt_get_timestamp32();		/* We deal with a 32 LSB TSC */
+
+	if (tsc < cpu_synth->tsc[index][1]) {
+		/* Overflow */
+		ret = ((u64)(cpu_synth->tsc[index][0]+1) << 32) | ((u64)tsc);
+	} else {
+		/* no overflow */
+		ret = ((u64)cpu_synth->tsc[index][0] << 32) | ((u64)tsc);
+	}
+	preempt_enable();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_read_synthetic_tsc);
+
+static void synthetic_tsc_ipi(void *info)
+{
+	ltt_update_synthetic_tsc();
+}
+
+/* We need to be in process context to do an IPI */
+static void synthetic_tsc_work(struct work_struct *work)
+{
+	on_each_cpu(synthetic_tsc_ipi, NULL, 1, 1);
+}
+static DECLARE_WORK(stsc_work, synthetic_tsc_work);
+
+/*
+ * stsc_timer : - Timer function synchronizing synthetic TSC.
+ * @data: unused
+ *
+ * Guarantees at least 1 execution before low word of TSC wraps.
+ */
+static void stsc_timer_fct(unsigned long data)
+{
+	PREPARE_WORK(&stsc_work, synthetic_tsc_work);
+	schedule_work(&stsc_work);
+
+	mod_timer(&stsc_timer, jiffies + precalc_expire);
+}
+
+/*
+ * precalc_stsc_interval: - Precalculates the interval between the 32 bits TSC
+ * wraparounds.
+ */
+static int __init precalc_stsc_interval(void)
+{
+	unsigned long mask;
+
+	mask = 0xFFFFFFFFUL;
+	precalc_expire =
+		(mask/((ltt_frequency() / HZ * ltt_freq_scale()) << 1)
+			- 1 - (EXPECTED_INTERRUPT_LATENCY*HZ/1000)) >> 1;
+	WARN_ON(precalc_expire == 0);
+	printk(KERN_DEBUG "Synthetic TSC timer will fire each %u jiffies.\n",
+		precalc_expire);
+	return 0;
+}
+
+/*
+ * 	hotcpu_callback - CPU hotplug callback
+ * 	@nb: notifier block
+ * 	@action: hotplug action to take
+ * 	@hcpu: CPU number
+ *
+ *	Sets the new CPU's current synthetic TSC to the same value as the
+ *	currently running CPU.
+ *
+ * 	Returns the success/failure of the operation. (NOTIFY_OK, NOTIFY_BAD)
+ */
+static int __cpuinit hotcpu_callback(struct notifier_block *nb,
+				unsigned long action,
+				void *hcpu)
+{
+	unsigned int hotcpu = (unsigned long)hcpu;
+	struct synthetic_tsc_struct *cpu_synth;
+	u64 local_count;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+		cpu_synth = &synthetic_tsc[hotcpu];
+		local_count = ltt_read_synthetic_tsc();
+		cpu_synth->tsc[0][1] = (u32)local_count; /* LSB */
+		cpu_synth->tsc[0][0] = (u32)(local_count >> 32); /* MSB */
+		cpu_synth->index = 0;
+		smp_wmb();	/* Writing in data of CPU about to come up */
+		break;
+	case CPU_ONLINE:
+		/*
+		 * FIXME : heartbeat events are currently broken with CPU
+		 * hotplug : events can be recorded before heartbeat, heartbeat
+		 * too far from trace start and are broken with trace stop/start
+		 * as well.
+		 */
+		/* As we are preemptible, make sure it runs on the right cpu */
+		smp_call_function_single(hotcpu, synthetic_tsc_ipi, NULL, 1, 0);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+/* Called from one CPU, before any tracing starts, to init each structure */
+static int __init ltt_init_synthetic_tsc(void)
+{
+	int cpu;
+	hotcpu_notifier(hotcpu_callback, 3);
+	precalc_stsc_interval();
+	init_timer(&stsc_timer);
+	stsc_timer.function = stsc_timer_fct;
+	stsc_timer.expires = jiffies + precalc_expire;
+	add_timer(&stsc_timer);
+	return 0;
+}
+
+__initcall(ltt_init_synthetic_tsc);
Index: linux-2.6-lttng/ltt/Kconfig
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/Kconfig	2007-12-05 20:54:48.000000000 -0500
@@ -0,0 +1,17 @@
+menu "Linux Trace Toolkit"
+
+config LTT_TIMESTAMP
+	bool "LTTng fine-grained timestamping"
+	default y
+	help
+	  Allow fine-grained timestamps to be taken from tracing applications.
+
+config HAVE_LTT_CLOCK
+	def_bool n
+
+config HAVE_LTT_SYNTHETIC_TSC
+	bool
+	default y if (!HAVE_LTT_CLOCK)
+	default n if HAVE_LTT_CLOCK
+
+endmenu
Index: linux-2.6-lttng/ltt/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/Makefile	2007-12-05 20:54:48.000000000 -0500
@@ -0,0 +1 @@
+obj-$(CONFIG_HAVE_LTT_SYNTHETIC_TSC)	+= ltt-timestamp.o
Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig	2007-12-05 20:53:35.000000000 -0500
+++ linux-2.6-lttng/init/Kconfig	2007-12-05 20:54:48.000000000 -0500
@@ -682,6 +682,8 @@ config MARKERS
 	  Place an empty function call at each marker site. Can be
 	  dynamically changed for a probe function.
 
+source "ltt/Kconfig"
+
 source "arch/Kconfig"
 
 config DISABLE_IMMEDIATE



-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

  reply	other threads:[~2008-01-10 20:43 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-09 23:29 [RFC PATCH 00/22 -v2] mcount and latency tracing utility -v2 Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 01/22 -v2] Add basic support for gcc profiler instrumentation Steven Rostedt
2008-01-10 18:19   ` Jan Kiszka
2008-01-10 19:54     ` Steven Rostedt
2008-01-10 23:02     ` Steven Rostedt
2008-01-10 18:28   ` Sam Ravnborg
2008-01-10 19:10     ` Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 02/22 -v2] Annotate core code that should not be traced Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 03/22 -v2] x86_64: notrace annotations Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 04/22 -v2] add notrace annotations to vsyscall Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 05/22 -v2] add notrace annotations for NMI routines Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 06/22 -v2] mcount based trace in the form of a header file library Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 07/22 -v2] tracer add debugfs interface Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 08/22 -v2] mcount tracer output file Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 09/22 -v2] mcount tracer show task comm and pid Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 10/22 -v2] Add a symbol only trace output Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 11/22 -v2] Reset the tracer when started Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 12/22 -v2] separate out the percpu date into a percpu struct Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays Steven Rostedt
2008-01-10  0:00   ` john stultz
2008-01-10  0:09     ` Steven Rostedt
2008-01-10 19:54     ` Tony Luck
2008-01-10 20:15       ` Steven Rostedt
2008-01-10 20:41         ` john stultz
2008-01-10 20:29       ` john stultz
2008-01-10 20:42         ` Mathieu Desnoyers [this message]
2008-01-10 21:25           ` john stultz
2008-01-10 22:00             ` Mathieu Desnoyers
2008-01-10 22:40               ` Steven Rostedt
2008-01-10 22:51               ` john stultz
2008-01-10 23:05                 ` john stultz
2008-01-10 21:33         ` [RFC PATCH 13/22 -v2] handle accurate time keeping over longdelays Luck, Tony
2008-01-10  0:19   ` [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays john stultz
2008-01-10  0:25     ` Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 14/22 -v2] time keeping add cycle_raw for actual incrementation Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 15/22 -v2] initialize the clock source to jiffies clock Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 16/22 -v2] add get_monotonic_cycles Steven Rostedt
2008-01-10  3:28   ` Daniel Walker
2008-01-15 21:46   ` Mathieu Desnoyers
2008-01-15 22:01     ` Steven Rostedt
2008-01-15 22:03       ` Steven Rostedt
2008-01-15 22:08       ` Mathieu Desnoyers
2008-01-16  1:38         ` Steven Rostedt
2008-01-16  3:17           ` Mathieu Desnoyers
2008-01-16 13:17             ` Steven Rostedt
2008-01-16 14:56               ` Mathieu Desnoyers
2008-01-16 15:06                 ` Steven Rostedt
2008-01-16 15:28                   ` Mathieu Desnoyers
2008-01-16 15:58                     ` Steven Rostedt
2008-01-16 17:00                       ` Mathieu Desnoyers
2008-01-16 17:49                         ` Mathieu Desnoyers
2008-01-16 19:43                         ` Steven Rostedt
2008-01-16 20:17                           ` Mathieu Desnoyers
2008-01-16 20:45                             ` Tim Bird
2008-01-16 20:49                             ` Steven Rostedt
2008-01-17 20:08                             ` Steven Rostedt
2008-01-17 20:37                               ` Frank Ch. Eigler
2008-01-17 21:03                                 ` Steven Rostedt
2008-01-18 22:26                                   ` Mathieu Desnoyers
2008-01-18 22:49                                     ` Steven Rostedt
2008-01-18 23:19                                       ` Mathieu Desnoyers
2008-01-19  3:36                                         ` Frank Ch. Eigler
2008-01-19  3:55                                           ` Steven Rostedt
2008-01-19  4:23                                             ` Frank Ch. Eigler
2008-01-19 15:29                                               ` Mathieu Desnoyers
2008-01-19  3:32                                       ` Frank Ch. Eigler
2008-01-16 18:01                       ` Tim Bird
2008-01-16 22:36                 ` john stultz
2008-01-16 22:51                   ` john stultz
2008-01-16 23:33                     ` Steven Rostedt
2008-01-17  2:28                       ` john stultz
2008-01-17  2:40                         ` Mathieu Desnoyers
2008-01-17  2:50                           ` Mathieu Desnoyers
2008-01-17  3:02                             ` Steven Rostedt
2008-01-17  3:21                             ` Paul Mackerras
2008-01-17  3:39                               ` Steven Rostedt
2008-01-17  4:22                                 ` Mathieu Desnoyers
2008-01-17  4:25                                 ` Mathieu Desnoyers
2008-01-17  4:14                               ` Mathieu Desnoyers
2008-01-17 15:22                                 ` Steven Rostedt
2008-01-17 17:46                                 ` Linus Torvalds
2008-01-17  2:51                           ` Steven Rostedt
2008-01-16 23:39                     ` Mathieu Desnoyers
2008-01-16 23:50                       ` Steven Rostedt
2008-01-17  0:36                         ` Steven Rostedt
2008-01-17  0:33                       ` john stultz
2008-01-17  2:20                         ` Mathieu Desnoyers
2008-01-17  1:03                       ` Linus Torvalds
2008-01-17  1:35                         ` Mathieu Desnoyers
2008-01-17  2:20                       ` john stultz
2008-01-17  2:35                         ` Mathieu Desnoyers
2008-01-09 23:29 ` [RFC PATCH 17/22 -v2] Add timestamps to tracer Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 18/22 -v2] Sort trace by timestamp Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 19/22 -v2] speed up the output of the tracer Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 20/22 -v2] Add latency_trace format tor tracer Steven Rostedt
2008-01-10  3:41   ` Daniel Walker
2008-01-09 23:29 ` [RFC PATCH 21/22 -v2] Split out specific tracing functions Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 22/22 -v2] Trace irq disabled critical timings Steven Rostedt
2008-01-10  3:58   ` Daniel Walker
2008-01-10 14:45     ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080110204259.GB5836@Krystal \
    --to=mathieu.desnoyers@polymtl.ca \
    --cc=a.p.zijlstra@chello.nl \
    --cc=acme@ghostprotocols.net \
    --cc=akpm@linux-foundation.org \
    --cc=bob.picco@hp.com \
    --cc=fche@redhat.com \
    --cc=ghaskins@novell.com \
    --cc=hch@infradead.org \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rostedt@goodmis.org \
    --cc=sam@ravnborg.org \
    --cc=srostedt@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tim.bird@am.sony.com \
    --cc=tony.luck@intel.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox