[RFC] Fast accurate clock readable from user space and NMI handler

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: mbligh@google.com, Daniel Walker <dwalker@mvista.com>
Cc: linux-kernel@vger.kernel.org
Subject: [RFC] Fast accurate clock readable from user space and NMI handler
Date: Sat, 24 Feb 2007 11:19:06 -0500
Message-ID: <20070224161906.GA9497@Krystal>
In-Reply-To: <1164585589.16871.52.camel@localhost.localdomain>

Hi,

I am trying to improve the Linux kernel time source so that it can be read
from NMI handlers without taking a seqlock. I have also seen some interest in
having such an accurate monotonic clock readable from user space. This mainly
requires the time value to be updated atomically. I am also trying to figure
out a way to support multi-CPU architectures with non-synchronized TSCs.

I would like to have your comments on the following idea.

Thanks in advance,

Mathieu

Monotonic accurate time

The goal of this design is to provide a monotonic time that is:

Readable from user space without a system call
Readable from an NMI handler
Readable without disabling interrupts
Readable without disabling preemption
Based on a single clock source (the most precise available: the TSC)
Usable on architectures with variable TSC frequency

The main difference from the wall time currently implemented in the Linux
kernel: the time update is done atomically instead of under a write seqlock.
This permits reading the time from an NMI handler and from user space.
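
To illustrate why a seqlock-protected time is a problem for an NMI handler,
here is a minimal user-space sketch, not the kernel's seqlock implementation.
The names seq_time, seq_write_begin, seq_write_end and seq_read_bounded are
hypothetical. A seqlock reader has to retry while the sequence count is odd
(write in progress); if the reader is an NMI that interrupted the writer on
the same CPU, the writer can never finish, so the retry would never end. The
reader below is bounded only so that the sketch terminates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical seqlock-style time value (illustration only). */
struct seq_time {
	_Atomic unsigned long seq;	/* odd while a write is in progress */
	_Atomic uint64_t walltime_ns;
};

static void seq_write_begin(struct seq_time *s) { s->seq++; }
static void seq_write_end(struct seq_time *s)   { s->seq++; }

/* Try to read a consistent value, giving up after max_tries attempts.
 * A real seqlock reader has no such bound: it spins until the writer
 * completes, which never happens if the reader interrupted the writer. */
static bool seq_read_bounded(struct seq_time *s, uint64_t *out, int max_tries)
{
	assert(out);
	while (max_tries-- > 0) {
		unsigned long start = s->seq;
		uint64_t v;

		if (start & 1)
			continue;	/* writer active: retry */
		v = s->walltime_ns;
		if (s->seq == start) {	/* no write happened meanwhile */
			*out = v;
			return true;
		}
	}
	return false;
}
```

Between seq_write_begin() and seq_write_end() every read attempt fails, which
is the window the atomic update below is meant to eliminate.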

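struct time_info {
	u64 tsc;	/* TSC value at the last update */
	u64 freq;	/* TSC frequency since that update */
	u64 walltime;	/* walltime (ns) at that update */
};

struct time_struct {
	struct time_info time_sel[2];	/* active slot is update_count & 1 */
	long update_count;
};

DEFINE_PER_CPU(struct time_struct, cpu_time);

/* Number of times the scheduler is called on each CPU */
DEFINE_PER_CPU(unsigned long, sched_nr);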

/* On frequency change event */
/* In irq context */
void freq_change_cb(unsigned int new_freq)
{
	struct time_struct *this_cpu_time =
		&per_cpu(cpu_time, smp_processor_id());
	struct time_info *write_time, *current_time;

	/* Fill the inactive slot; readers keep using the current one. */
	write_time =
		&this_cpu_time->time_sel[(this_cpu_time->update_count + 1) & 1];
	current_time =
		&this_cpu_time->time_sel[this_cpu_time->update_count & 1];
	write_time->tsc = get_cycles();
	write_time->freq = new_freq;
	/* We accumulate the division imprecision. This is the downside of
	 * using the TSC with variable frequency as a time base.
	 * The quotient is in ns only if freq is expressed in cycles per ns;
	 * any other unit needs a scaling factor here. */
	write_time->walltime =
		current_time->walltime +
			(write_time->tsc - current_time->tsc) /
			current_time->freq;
	/* Make the new slot visible before publishing it. */
	wmb();
	this_cpu_time->update_count++;
}
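
The accumulated division imprecision mentioned in the comment above can be
made concrete with a small user-space sketch. It assumes the frequency is
given in kHz (an assumption; the RFC does not state the unit), so a cycle
delta is converted with a 1000000 scaling factor. The helper names
cycles_to_ns and accumulate_ns are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a cycle count to ns. freq_khz is an assumed unit (cycles per ms).
 * The multiplication overflows for deltas above about 1.8e13 cycles. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t freq_khz)
{
	assert(freq_khz != 0);
	return cycles * 1000000ULL / freq_khz;	/* truncates toward zero */
}

/* Sum the per-interval conversions, as successive updates would do.
 * Each interval can lose up to just under 1 ns to truncation. */
static uint64_t accumulate_ns(const uint64_t *deltas, size_t n,
			      uint64_t freq_khz)
{
	uint64_t walltime_ns = 0;

	for (size_t i = 0; i < n; i++)
		walltime_ns += cycles_to_ns(deltas[i], freq_khz);
	return walltime_ns;
}
```

At 2400000 kHz (2.4 GHz), three intervals of 1000 cycles accumulate to
1248 ns, whereas converting the same 3000 cycles in one step gives 1250 ns.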

/* Init cpu freq */
void init_cpu_freq(void)
{
	struct time_struct *this_cpu_time =
		&per_cpu(cpu_time, smp_processor_id());
	struct time_info *current_time;

	memset(this_cpu_time, 0, sizeof(*this_cpu_time));
	current_time =
		&this_cpu_time->time_sel[this_cpu_time->update_count & 1];
	/* Init current time */
	/* Get frequency */
	/* Reset cpus to 0 ns, 0 tsc, start their tsc. */
}

/* After a CPU comes back from hlt */
/* The trick is to sync all the other CPUs on the first CPU up when they come
 * up. If all CPUs are down, then there is no need to increment the walltime :
 * let's simply define the useful walltime on a machine as the time elapsed
 * while there is a CPU running. If we want, when no cpu is active, we can use
 * a lower resolution clock to somehow keep track of walltime. */

void wake_from_hlt(void)
{
	/* TODO */
}

/* Read time from anywhere in the kernel. Returns the walltime in ns. */
/* If the update_count changes while we read the context, it may be invalid.
 * This would happen if we are scheduled out for a period of time long enough to
 * permit 2 frequency changes. We simply start the loop again if it happens.
 * We detect it by comparing the update_count running counter.
 * We detect preemption by incrementing a counter sched_nr within schedule().
 * This counter is readable by user space through the vsyscall page. */
u64 read_time(void)
{
	u64 walltime;
	long update_count;
	struct time_struct *this_cpu_time;
	struct time_info *current_time;
	unsigned int cpu;
	unsigned long prev_sched_nr;

	/* Retry until a snapshot is read with no preemption and no update. */
	for (;;) {
		cpu = _smp_processor_id();
		prev_sched_nr = per_cpu(sched_nr, cpu);
		if (cpu != _smp_processor_id())
			continue;	/* changed CPU between reading the CPU
					   id and getting sched_nr */
		this_cpu_time = &per_cpu(cpu_time, cpu);
		update_count = this_cpu_time->update_count;
		rmb();	/* pairs with wmb() in freq_change_cb() */
		current_time = &this_cpu_time->time_sel[update_count & 1];
		walltime = current_time->walltime +
				(get_cycles() - current_time->tsc) /
				current_time->freq;
		rmb();	/* read the slot before re-checking the counters */
		if (per_cpu(sched_nr, cpu) != prev_sched_nr)
			continue;	/* been preempted */
		if (this_cpu_time->update_count == update_count)
			break;
	}
	return walltime;
}
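
For experimenting with the two-slot update outside the kernel, the following
is a minimal user-space sketch using C11 atomics. It is not the kernel code:
the TSC value is passed in as a parameter instead of calling get_cycles(),
the frequency is assumed to be in kHz, there is a single writer, and the
names time_init, publish and read_ns are hypothetical. All fields are
_Atomic with the default sequentially consistent ordering, which is stronger
than the wmb()/rmb() pairing needs but keeps the sketch simple.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct time_info {
	_Atomic uint64_t tsc;		/* cycle counter at last update */
	_Atomic uint64_t freq_khz;	/* assumed unit: kHz */
	_Atomic uint64_t walltime_ns;	/* walltime at last update */
};

struct time_struct {
	struct time_info time_sel[2];	/* active slot: update_count & 1 */
	_Atomic unsigned long update_count;
};

static void time_init(struct time_struct *t, uint64_t tsc, uint64_t freq_khz)
{
	for (int i = 0; i < 2; i++) {
		t->time_sel[i].tsc = tsc;
		t->time_sel[i].freq_khz = freq_khz;
		t->time_sel[i].walltime_ns = 0;
	}
	t->update_count = 0;
}

/* Single writer: fill the inactive slot, then publish it by bumping the
 * counter. Readers never see a half-written active slot. */
static void publish(struct time_struct *t, uint64_t now_tsc,
		    uint64_t new_freq_khz)
{
	unsigned long count = t->update_count;
	struct time_info *cur = &t->time_sel[count & 1];
	struct time_info *next = &t->time_sel[(count + 1) & 1];

	assert(cur->freq_khz != 0);
	next->tsc = now_tsc;
	next->freq_khz = new_freq_khz;
	next->walltime_ns = cur->walltime_ns +
		(now_tsc - cur->tsc) * 1000000ULL / cur->freq_khz;
	t->update_count = count + 1;	/* publication point */
}

/* Reader: never blocks; retries only if an update was published while
 * the slot was being read. */
static uint64_t read_ns(struct time_struct *t, uint64_t now_tsc)
{
	unsigned long count;
	uint64_t ns;

	do {
		count = t->update_count;
		struct time_info *cur = &t->time_sel[count & 1];

		ns = cur->walltime_ns +
			(now_tsc - cur->tsc) * 1000000ULL / cur->freq_khz;
	} while (t->update_count != count);
	return ns;
}
```

Starting at 1 GHz (1000000 kHz) and switching to 2 GHz at cycle 1000, a read
at cycle 3000 returns 2000 ns: 1000 ns before the switch plus 2000 cycles at
2 GHz after it.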

/* Userspace */
/* Export all this data to user space through the vsyscall page. Use a function
 * like read_time to read the walltime. This function can be implemented as-is
 * because it doesn't need to disable preemption. */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Candidate, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68