From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: john stultz <johnstul@us.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>,
bob.picco@hp.com, Steven Rostedt <rostedt@goodmis.org>,
LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Christoph Hellwig <hch@infradead.org>,
Gregory Haskins <ghaskins@novell.com>,
Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
Thomas Gleixner <tglx@linutronix.de>,
Tim Bird <tim.bird@am.sony.com>, Sam Ravnborg <sam@ravnborg.org>,
"Frank Ch. Eigler" <fche@redhat.com>,
Steven Rostedt <srostedt@redhat.com>
Subject: Re: [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays
Date: Thu, 10 Jan 2008 15:42:59 -0500 [thread overview]
Message-ID: <20080110204259.GB5836@Krystal> (raw)
In-Reply-To: <1199996959.6108.18.camel@localhost.localdomain>
* john stultz (johnstul@us.ibm.com) wrote:
>
> On Thu, 2008-01-10 at 11:54 -0800, Tony Luck wrote:
> > > Tony: ia64 also needs something like this, but I found the fsyscall asm
> > > bits a little difficult to grasp. So I'll need some assistance on how to
> > > include the accumulated cycles into the final calculation.
> >
> > I'm trying to figure out all the ramifications of the new
> > "cycle_accumulated" field. Does it really need to be
> > propagated all the way to the low level assembler (which
> > I don't want to mess with unless I really, really have to).
> > Can't I do the necessary calculations in update_vsyscall()
> > [Where I can do them in C :-)] and keep the same low
> > level assembly code. I think I must be missing some
> > important bit of what is going on here.
>
> (Added Bob Picco to the mail, as he was involved in the ia64 clocksource
> work).
>
> So the background on the patch is this:
>
> Some clocksources wrap frequently (every few seconds, for example). This
> can cause issues if we defer the update_wall_time() function where we
> accumulate time for too long (this really only happens on -rt systems
> right now).
>
> To avoid that issue, we've added the cycle_accumulated value, which acts
> as a midpoint, where we can quickly accumulate cycles off of the
> counter, without doing the more expensive update_wall_time() function.
> This avoids issues with the clocksource wrapping, but requires that
> cycle_accumulated be added in to the gettiemofday() calculation.
>
> If you noticed in my email, the fix for ppc was a bit easier, as it has
> only a 64bit counter that is quite unlikely to wrap twice between calls
> to update_wall_time(). There we could decrement the cycles_last value by
> cycles_accumulated and get the same effect of adding it in.
>
> Unfortunately on ia64, I suspect it will be necessary to do similar to
> the x86_64 code and add in the cycles accumulated value in
> vgettime/fgettime function, since there is the possibility of quickly
> wrapping clocksources on that architecture.
>
> So unless someone can point out a nicer trick, it likely means adding a
> new cycles_accumulated value to the fsyscall structure and the asm to do
> the addition. :(
>
I think it's about time I introduce the approach I have taken for LTTng
timestamping. Basically, one of the main issues with the clock sources
is the xtime lock : having a read seqlock nested over a write seqlock is
a really, really bad idea. This can happen with NMIs. Basically, it
would cause a deadlock.
What I have done is an RCU algorithm that extends a 32 bits TSC (that's
the case on MIPS, for instance) to 64 bits. The update of the MSBs is
done by a periodical timer (fired often enough to make sure we always
detect the 32 LSBs wrap-around) and the read-side only has to disable
preemption.
I use a 2 slots array, each of them keeping, alternatively, the last 64
bits counter value, to implement the RCU algorithm.
Since we are discussing time source modification, this is one that I
would really like to see in the Linux kernel : it would provide the kind
of time source needed for function entry/exit tracing and for generic
kernel tracing as well.
Mathieu
Here is the patch, for reference. It applies on 2.6.24-rc7, after some
other LTTng patches in my patchset.
LTTng timestamp
LTTng synthetic TSC code for timestamping. Extracts 64 bits tsc from a 32 bits
counter, kept up to date by periodical timer interrupt. Lockless.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
---
init/Kconfig | 2
ltt/Kconfig | 17 ++++
ltt/Makefile | 1
ltt/ltt-timestamp.c | 198 ++++++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 218 insertions(+)
Index: linux-2.6-lttng/ltt/ltt-timestamp.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/ltt-timestamp.c 2007-12-05 20:54:48.000000000 -0500
@@ -0,0 +1,198 @@
+/*
+ * (C) Copyright 2006,2007 -
+ * Mathieu Desnoyers (mathieu.desnoyers@polymtl.ca)
+ *
+ * notes : ltt-timestamp timer-based clock cannot be used for early tracing in
+ * the boot process, as it depends on timer interrupts.
+ *
+ * The timer needs to be only on one CPU to support hotplug.
+ * We have the choice between schedule_delayed_work_on and an IPI to get each
+ * CPU to write the heartbeat. IPI has been chosen because it is considered
+ * faster than passing through the timer to get the work scheduled on all the
+ * CPUs.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/delay.h>
+#include <linux/timer.h>
+#include <linux/workqueue.h>
+#include <linux/cpu.h>
+#include <linux/timex.h>
+#include <linux/bitops.h>
+#include <linux/ltt.h>
+#include <linux/smp.h>
+#include <linux/sched.h> /* FIX for m68k local_irq_enable in on_each_cpu */
+
+atomic_t lttng_generic_clock;
+EXPORT_SYMBOL(lttng_generic_clock);
+
+/* Expected maximum interrupt latency in ms : 15ms, *2 for security */
+#define EXPECTED_INTERRUPT_LATENCY 30
+
+static struct timer_list stsc_timer;
+static unsigned int precalc_expire;
+
+/* For architectures with 32 bits TSC */
+static struct synthetic_tsc_struct {
+ u32 tsc[2][2]; /* a pair of 2 32 bits. [0] is the MSB, [1] is LSB */
+ unsigned int index; /* Index of the current synth. tsc. */
+} ____cacheline_aligned synthetic_tsc[NR_CPUS];
+
+/* Called from IPI : either in interrupt or process context */
+static void ltt_update_synthetic_tsc(void)
+{
+ struct synthetic_tsc_struct *cpu_synth;
+ u32 tsc;
+
+ preempt_disable();
+ cpu_synth = &synthetic_tsc[smp_processor_id()];
+ tsc = ltt_get_timestamp32(); /* We deal with a 32 LSB TSC */
+
+ if (tsc < cpu_synth->tsc[cpu_synth->index][1]) {
+ unsigned int new_index = cpu_synth->index ? 0 : 1; /* 0 <-> 1 */
+ /*
+ * Overflow
+ * Non atomic update of the non current synthetic TSC, followed
+ * by an atomic index change. There is no write concurrency,
+ * so the index read/write does not need to be atomic.
+ */
+ cpu_synth->tsc[new_index][1] = tsc; /* LSB update */
+ cpu_synth->tsc[new_index][0] =
+ cpu_synth->tsc[cpu_synth->index][0]+1; /* MSB update */
+ cpu_synth->index = new_index; /* atomic change of index */
+ } else {
+ /*
+ * No overflow : we can simply update the 32 LSB of the current
+ * synthetic TSC as it's an atomic write.
+ */
+ cpu_synth->tsc[cpu_synth->index][1] = tsc;
+ }
+ preempt_enable();
+}
+
+/* Called from buffer switch : in _any_ context (even NMI) */
+u64 ltt_read_synthetic_tsc(void)
+{
+ struct synthetic_tsc_struct *cpu_synth;
+ u64 ret;
+ unsigned int index;
+ u32 tsc;
+
+ preempt_disable();
+ cpu_synth = &synthetic_tsc[smp_processor_id()];
+ index = cpu_synth->index; /* atomic read */
+ tsc = ltt_get_timestamp32(); /* We deal with a 32 LSB TSC */
+
+ if (tsc < cpu_synth->tsc[index][1]) {
+ /* Overflow */
+ ret = ((u64)(cpu_synth->tsc[index][0]+1) << 32) | ((u64)tsc);
+ } else {
+ /* no overflow */
+ ret = ((u64)cpu_synth->tsc[index][0] << 32) | ((u64)tsc);
+ }
+ preempt_enable();
+ return ret;
+}
+EXPORT_SYMBOL_GPL(ltt_read_synthetic_tsc);
+
+static void synthetic_tsc_ipi(void *info)
+{
+ ltt_update_synthetic_tsc();
+}
+
+/* We need to be in process context to do an IPI */
+static void synthetic_tsc_work(struct work_struct *work)
+{
+ on_each_cpu(synthetic_tsc_ipi, NULL, 1, 1);
+}
+static DECLARE_WORK(stsc_work, synthetic_tsc_work);
+
+/*
+ * stsc_timer : - Timer function synchronizing synthetic TSC.
+ * @data: unused
+ *
+ * Guarantees at least 1 execution before low word of TSC wraps.
+ */
+static void stsc_timer_fct(unsigned long data)
+{
+ PREPARE_WORK(&stsc_work, synthetic_tsc_work);
+ schedule_work(&stsc_work);
+
+ mod_timer(&stsc_timer, jiffies + precalc_expire);
+}
+
+/*
+ * precalc_stsc_interval: - Precalculates the interval between the 32 bits TSC
+ * wraparounds.
+ */
+static int __init precalc_stsc_interval(void)
+{
+ unsigned long mask;
+
+ mask = 0xFFFFFFFFUL;
+ precalc_expire =
+ (mask/((ltt_frequency() / HZ * ltt_freq_scale()) << 1)
+ - 1 - (EXPECTED_INTERRUPT_LATENCY*HZ/1000)) >> 1;
+ WARN_ON(precalc_expire == 0);
+ printk(KERN_DEBUG "Synthetic TSC timer will fire each %u jiffies.\n",
+ precalc_expire);
+ return 0;
+}
+
+/*
+ * hotcpu_callback - CPU hotplug callback
+ * @nb: notifier block
+ * @action: hotplug action to take
+ * @hcpu: CPU number
+ *
+ * Sets the new CPU's current synthetic TSC to the same value as the
+ * currently running CPU.
+ *
+ * Returns the success/failure of the operation. (NOTIFY_OK, NOTIFY_BAD)
+ */
+static int __cpuinit hotcpu_callback(struct notifier_block *nb,
+ unsigned long action,
+ void *hcpu)
+{
+ unsigned int hotcpu = (unsigned long)hcpu;
+ struct synthetic_tsc_struct *cpu_synth;
+ u64 local_count;
+
+ switch (action) {
+ case CPU_UP_PREPARE:
+ cpu_synth = &synthetic_tsc[hotcpu];
+ local_count = ltt_read_synthetic_tsc();
+ cpu_synth->tsc[0][1] = (u32)local_count; /* LSB */
+ cpu_synth->tsc[0][0] = (u32)(local_count >> 32); /* MSB */
+ cpu_synth->index = 0;
+ smp_wmb(); /* Writing in data of CPU about to come up */
+ break;
+ case CPU_ONLINE:
+ /*
+ * FIXME : heartbeat events are currently broken with CPU
+ * hotplug : events can be recorded before heartbeat, heartbeat
+ * too far from trace start and are broken with trace stop/start
+ * as well.
+ */
+ /* As we are preemptible, make sure it runs on the right cpu */
+ smp_call_function_single(hotcpu, synthetic_tsc_ipi, NULL, 1, 0);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+/* Called from one CPU, before any tracing starts, to init each structure */
+static int __init ltt_init_synthetic_tsc(void)
+{
+ int cpu;
+ hotcpu_notifier(hotcpu_callback, 3);
+ precalc_stsc_interval();
+ init_timer(&stsc_timer);
+ stsc_timer.function = stsc_timer_fct;
+ stsc_timer.expires = jiffies + precalc_expire;
+ add_timer(&stsc_timer);
+ return 0;
+}
+
+__initcall(ltt_init_synthetic_tsc);
Index: linux-2.6-lttng/ltt/Kconfig
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/Kconfig 2007-12-05 20:54:48.000000000 -0500
@@ -0,0 +1,17 @@
+menu "Linux Trace Toolkit"
+
+config LTT_TIMESTAMP
+ bool "LTTng fine-grained timestamping"
+ default y
+ help
+ Allow fine-grained timestamps to be taken from tracing applications.
+
+config HAVE_LTT_CLOCK
+ def_bool n
+
+config HAVE_LTT_SYNTHETIC_TSC
+ bool
+ default y if (!HAVE_LTT_CLOCK)
+ default n if HAVE_LTT_CLOCK
+
+endmenu
Index: linux-2.6-lttng/ltt/Makefile
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/ltt/Makefile 2007-12-05 20:54:48.000000000 -0500
@@ -0,0 +1 @@
+obj-$(CONFIG_HAVE_LTT_SYNTHETIC_TSC) += ltt-timestamp.o
Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig 2007-12-05 20:53:35.000000000 -0500
+++ linux-2.6-lttng/init/Kconfig 2007-12-05 20:54:48.000000000 -0500
@@ -682,6 +682,8 @@ config MARKERS
Place an empty function call at each marker site. Can be
dynamically changed for a probe function.
+source "ltt/Kconfig"
+
source "arch/Kconfig"
config DISABLE_IMMEDIATE
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
next prev parent reply other threads:[~2008-01-10 20:43 UTC|newest]
Thread overview: 100+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-09 23:29 [RFC PATCH 00/22 -v2] mcount and latency tracing utility -v2 Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 01/22 -v2] Add basic support for gcc profiler instrumentation Steven Rostedt
2008-01-10 18:19 ` Jan Kiszka
2008-01-10 19:54 ` Steven Rostedt
2008-01-10 23:02 ` Steven Rostedt
2008-01-10 18:28 ` Sam Ravnborg
2008-01-10 19:10 ` Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 02/22 -v2] Annotate core code that should not be traced Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 03/22 -v2] x86_64: notrace annotations Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 04/22 -v2] add notrace annotations to vsyscall Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 05/22 -v2] add notrace annotations for NMI routines Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 06/22 -v2] mcount based trace in the form of a header file library Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 07/22 -v2] tracer add debugfs interface Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 08/22 -v2] mcount tracer output file Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 09/22 -v2] mcount tracer show task comm and pid Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 10/22 -v2] Add a symbol only trace output Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 11/22 -v2] Reset the tracer when started Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 12/22 -v2] separate out the percpu date into a percpu struct Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays Steven Rostedt
2008-01-10 0:00 ` john stultz
2008-01-10 0:09 ` Steven Rostedt
2008-01-10 19:54 ` Tony Luck
2008-01-10 20:15 ` Steven Rostedt
2008-01-10 20:41 ` john stultz
2008-01-10 20:29 ` john stultz
2008-01-10 20:42 ` Mathieu Desnoyers [this message]
2008-01-10 21:25 ` john stultz
2008-01-10 22:00 ` Mathieu Desnoyers
2008-01-10 22:40 ` Steven Rostedt
2008-01-10 22:51 ` john stultz
2008-01-10 23:05 ` john stultz
2008-01-10 21:33 ` [RFC PATCH 13/22 -v2] handle accurate time keeping over longdelays Luck, Tony
2008-01-10 0:19 ` [RFC PATCH 13/22 -v2] handle accurate time keeping over long delays john stultz
2008-01-10 0:25 ` Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 14/22 -v2] time keeping add cycle_raw for actual incrementation Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 15/22 -v2] initialize the clock source to jiffies clock Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 16/22 -v2] add get_monotonic_cycles Steven Rostedt
2008-01-10 3:28 ` Daniel Walker
2008-01-15 21:46 ` Mathieu Desnoyers
2008-01-15 22:01 ` Steven Rostedt
2008-01-15 22:03 ` Steven Rostedt
2008-01-15 22:08 ` Mathieu Desnoyers
2008-01-16 1:38 ` Steven Rostedt
2008-01-16 3:17 ` Mathieu Desnoyers
2008-01-16 13:17 ` Steven Rostedt
2008-01-16 14:56 ` Mathieu Desnoyers
2008-01-16 15:06 ` Steven Rostedt
2008-01-16 15:28 ` Mathieu Desnoyers
2008-01-16 15:58 ` Steven Rostedt
2008-01-16 17:00 ` Mathieu Desnoyers
2008-01-16 17:49 ` Mathieu Desnoyers
2008-01-16 19:43 ` Steven Rostedt
2008-01-16 20:17 ` Mathieu Desnoyers
2008-01-16 20:45 ` Tim Bird
2008-01-16 20:49 ` Steven Rostedt
2008-01-17 20:08 ` Steven Rostedt
2008-01-17 20:37 ` Frank Ch. Eigler
2008-01-17 21:03 ` Steven Rostedt
2008-01-18 22:26 ` Mathieu Desnoyers
2008-01-18 22:49 ` Steven Rostedt
2008-01-18 23:19 ` Mathieu Desnoyers
2008-01-19 3:36 ` Frank Ch. Eigler
2008-01-19 3:55 ` Steven Rostedt
2008-01-19 4:23 ` Frank Ch. Eigler
2008-01-19 15:29 ` Mathieu Desnoyers
2008-01-19 3:32 ` Frank Ch. Eigler
2008-01-16 18:01 ` Tim Bird
2008-01-16 22:36 ` john stultz
2008-01-16 22:51 ` john stultz
2008-01-16 23:33 ` Steven Rostedt
2008-01-17 2:28 ` john stultz
2008-01-17 2:40 ` Mathieu Desnoyers
2008-01-17 2:50 ` Mathieu Desnoyers
2008-01-17 3:02 ` Steven Rostedt
2008-01-17 3:21 ` Paul Mackerras
2008-01-17 3:39 ` Steven Rostedt
2008-01-17 4:22 ` Mathieu Desnoyers
2008-01-17 4:25 ` Mathieu Desnoyers
2008-01-17 4:14 ` Mathieu Desnoyers
2008-01-17 15:22 ` Steven Rostedt
2008-01-17 17:46 ` Linus Torvalds
2008-01-17 2:51 ` Steven Rostedt
2008-01-16 23:39 ` Mathieu Desnoyers
2008-01-16 23:50 ` Steven Rostedt
2008-01-17 0:36 ` Steven Rostedt
2008-01-17 0:33 ` john stultz
2008-01-17 2:20 ` Mathieu Desnoyers
2008-01-17 1:03 ` Linus Torvalds
2008-01-17 1:35 ` Mathieu Desnoyers
2008-01-17 2:20 ` john stultz
2008-01-17 2:35 ` Mathieu Desnoyers
2008-01-09 23:29 ` [RFC PATCH 17/22 -v2] Add timestamps to tracer Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 18/22 -v2] Sort trace by timestamp Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 19/22 -v2] speed up the output of the tracer Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 20/22 -v2] Add latency_trace format tor tracer Steven Rostedt
2008-01-10 3:41 ` Daniel Walker
2008-01-09 23:29 ` [RFC PATCH 21/22 -v2] Split out specific tracing functions Steven Rostedt
2008-01-09 23:29 ` [RFC PATCH 22/22 -v2] Trace irq disabled critical timings Steven Rostedt
2008-01-10 3:58 ` Daniel Walker
2008-01-10 14:45 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080110204259.GB5836@Krystal \
--to=mathieu.desnoyers@polymtl.ca \
--cc=a.p.zijlstra@chello.nl \
--cc=acme@ghostprotocols.net \
--cc=akpm@linux-foundation.org \
--cc=bob.picco@hp.com \
--cc=fche@redhat.com \
--cc=ghaskins@novell.com \
--cc=hch@infradead.org \
--cc=johnstul@us.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=rostedt@goodmis.org \
--cc=sam@ravnborg.org \
--cc=srostedt@redhat.com \
--cc=tglx@linutronix.de \
--cc=tim.bird@am.sony.com \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox