[RFC] Fast accurate clock readable from user space and NMI handler

From: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
To: mbligh@google.com, Daniel Walker <dwalker@mvista.com>
Cc: linux-kernel@vger.kernel.org
Subject: [RFC] Fast accurate clock readable from user space and NMI handler
Date: Sat, 24 Feb 2007 11:19:06 -0500
Message-ID: <20070224161906.GA9497@Krystal>
In-Reply-To: <1164585589.16871.52.camel@localhost.localdomain>

Hi,

I am trying to improve the Linux kernel time source so it can be read
without a seqlock from NMI handlers. I have also seen some interest in
such an accurate monotonic clock readable from user space. It mainly
implies an atomic update of the time value. I am also trying to figure out
a way to support architectures with multiple CPUs whose TSCs are not
synchronized.

I would like to have your comments on the following idea.

Thanks in advance,

Mathieu

Monotonic accurate time

The goal of this design is to provide a monotonic time that is:

Readable from user space without a system call
Readable from an NMI handler
Readable without disabling interrupts
Readable without disabling preemption
Based on only one clock source (the most precise available: the TSC)
Usable on architectures with a variable TSC frequency.

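The main difference from the wall time currently implemented in the Linux
kernel is that the time update is done atomically instead of under a write
seqlock. This permits reading the time from an NMI handler and from user
space. A seqlock reader running in an NMI that interrupted the writer on the
same CPU would spin forever waiting for the update to complete. Here, a
reader never waits on the writer: it reads the slot the writer is not
modifying.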
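To make the two-slot idea concrete, here is a minimal user-space sketch of the publication scheme that the code below relies on. It makes three assumptions. First, C11 <stdatomic.h> stands in for the kernel's wmb(). Second, a single global instance replaces the per-CPU data. Third, the frequency field is omitted. The names struct slot, struct snapshot, publish() and read_snapshot() are hypothetical.

The writer fills the slot that readers are not using, then makes it current with a release store. A reader takes no lock and only retries if the counter moved while it was reading.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for struct time_info (freq omitted). */
struct slot {
	_Atomic uint64_t tsc;
	_Atomic uint64_t walltime;
};

/* Plain copy returned to readers. */
struct snapshot {
	uint64_t tsc;
	uint64_t walltime;
};

static struct slot slots[2];		/* zero-initialized: a valid state */
static atomic_long update_count;

/* Single writer: fill the inactive slot, then make it current. */
static void publish(uint64_t tsc, uint64_t walltime)
{
	long n = atomic_load_explicit(&update_count, memory_order_relaxed);
	struct slot *w = &slots[(n + 1) & 1];

	/* Order the previous count update before these slot stores, so a
	 * reader that observes any of them also observes a newer count. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&w->tsc, tsc, memory_order_relaxed);
	atomic_store_explicit(&w->walltime, walltime, memory_order_relaxed);
	/* Role of wmb(): the slot is complete before it becomes current. */
	atomic_store_explicit(&update_count, n + 1, memory_order_release);
}

/* Reader: never blocks, so it may safely interrupt publish(). */
static struct snapshot read_snapshot(void)
{
	struct snapshot s;
	long n;

	do {
		n = atomic_load_explicit(&update_count, memory_order_acquire);
		s.tsc = atomic_load_explicit(&slots[n & 1].tsc,
					     memory_order_relaxed);
		s.walltime = atomic_load_explicit(&slots[n & 1].walltime,
						  memory_order_relaxed);
		/* Keep the slot loads ordered before the re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&update_count, memory_order_relaxed) != n);
	return s;
}
```

Calling publish(100, 1000) followed by read_snapshot() returns {100, 1000}. The loop in read_snapshot() only repeats when an update lands between its two reads of update_count.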

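struct time_info {
	u64 tsc;	/* TSC value at the last update */
	u64 freq;	/* TSC frequency in effect since that update */
	u64 walltime;	/* walltime (ns) at the last update */
};

/* Two slots: readers use time_sel[update_count & 1] while the writer
 * fills the other one. */
struct time_struct {
	struct time_info time_sel[2];
	long update_count;
};

DEFINE_PER_CPU(struct time_struct, cpu_time);

/* Number of times the scheduler is called on each CPU */
DEFINE_PER_CPU(unsigned long, sched_nr);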

/* On frequency change event */
/* In irq context, on the local CPU */
void freq_change_cb(unsigned int new_freq)
{
	struct time_struct *this_cpu_time =
		&per_cpu(cpu_time, smp_processor_id());
	struct time_info *write_time, *current_time;

	/* Write the slot readers are not using; read the current one. */
	write_time =
		&this_cpu_time->time_sel[(this_cpu_time->update_count + 1) & 1];
	current_time =
		&this_cpu_time->time_sel[this_cpu_time->update_count & 1];
	write_time->tsc = get_cycles();
	write_time->freq = new_freq;
	/* We accumulate the division imprecision. This is the downside of
	 * using the TSC with variable frequency as a time base.
	 * The quotient is in ns only if freq is expressed in cycles per ns.
	 * On 32-bit architectures, kernel code needs do_div() for this
	 * 64-bit division. */
	write_time->walltime =
		current_time->walltime +
			(write_time->tsc - current_time->tsc) /
			current_time->freq;
	/* Make the new slot visible before publishing it. */
	wmb();
	this_cpu_time->update_count++;
}
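The walltime update above divides a cycle delta by freq, and its comment notes that the division imprecision accumulates. If freq were held as an integer number of cycles per nanosecond, a 2.5 GHz TSC would be stored as 2. Every interval would then read 25% long (1000 cycles would count as 500 ns instead of 400 ns).

One alternative is the fixed-point conversion used by the Linux clocksource code: precompute a multiplier so that ns = (cycles * mult) >> shift. This swaps in a different technique from the one in this RFC. The sketch below assumes frequencies in Hz and a shift of 24. The names cyc2ns_mult(), cyc2ns() and fold_walltime() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define CYC2NS_SHIFT 24		/* hypothetical choice of shift */

/* Fixed-point multiplier: mult / 2^shift ~= NSEC_PER_SEC / freq_hz. */
static uint32_t cyc2ns_mult(uint64_t freq_hz)
{
	uint64_t mult;

	assert(freq_hz != 0);
	/* Round to nearest rather than truncate. */
	mult = ((NSEC_PER_SEC << CYC2NS_SHIFT) + freq_hz / 2) / freq_hz;
	/* Holds for freq_hz above roughly 4 MHz with this shift. */
	assert(mult <= UINT32_MAX);
	return (uint32_t)mult;
}

/* The 64-bit product limits the delta to 2^64 / mult cycles
 * (2^41 cycles, about 18 minutes, at 2 GHz with this shift). */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult)
{
	return (cycles * (uint64_t)mult) >> CYC2NS_SHIFT;
}

/* Same shape as freq_change_cb(): fold the time elapsed at the old
 * rate into walltime before switching to the new rate. */
static uint64_t fold_walltime(uint64_t walltime, uint64_t old_tsc,
			      uint64_t new_tsc, uint32_t old_mult)
{
	return walltime + cyc2ns(new_tsc - old_tsc, old_mult);
}
```

The shift trades range for precision. A larger shift yields a finer mult but lowers the number of cycles that can be converted before the 64-bit product overflows. The updates in freq_change_cb() must therefore happen often enough to keep deltas within that range.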

/* Init cpu freq */
void init_cpu_freq(void)
{
	struct time_struct *this_cpu_time =
		&per_cpu(cpu_time, smp_processor_id());
	struct time_info *current_time;

	memset(this_cpu_time, 0, sizeof(*this_cpu_time));
	current_time = &this_cpu_time->time_sel[this_cpu_time->update_count & 1];
	/* Init current time */
	/* Get frequency: current_time->freq must be non-zero before
	 * read_time() divides by it. */
	/* Reset cpus to 0 ns, 0 tsc, start their tsc. */
}

/* After a CPU comes back from hlt */
/* The trick is to synchronize each CPU, as it comes up, on the first CPU
 * that is already up. If all CPUs are down, there is no need to increment
 * the walltime: let's simply define the useful walltime on a machine as the
 * time elapsed while at least one CPU is running. If desired, a
 * lower-resolution clock can keep track of walltime while no CPU is active. */

void wake_from_hlt(void)
{
	/* TODO */
}

/* Read time from anywhere in the kernel. Returns the walltime in ns. */
/* If update_count changes while we read the context, the context may be
 * invalid. This happens if we are scheduled out (or interrupted) long
 * enough for 2 frequency changes to occur: the second one overwrites the
 * slot we are reading. We detect it by comparing the update_count running
 * counter, and simply start the loop again if it happens.
 * We detect preemption by incrementing a counter, sched_nr, within
 * schedule(). This counter is readable by user space through the vsyscall
 * page. */
u64 read_time(void)
{
	struct time_struct *this_cpu_time;
	struct time_info *current_time;
	unsigned long prev_sched_nr;
	unsigned int cpu;
	long update_count;
	u64 walltime;

	/* for (;;) rather than do/while: in a do/while, "continue" jumps to
	 * the loop condition, not back to the top of the loop. */
	for (;;) {
		cpu = raw_smp_processor_id();
		prev_sched_nr = per_cpu(sched_nr, cpu);
		if (cpu != raw_smp_processor_id())
			continue;	/* changed CPU between getting the
					   CPU id and sched_nr */
		this_cpu_time = &per_cpu(cpu_time, cpu);
		update_count = this_cpu_time->update_count;
		rmb();	/* read update_count before the slot it selects */
		current_time = &this_cpu_time->time_sel[update_count & 1];
		walltime = current_time->walltime +
				(get_cycles() - current_time->tsc) /
				current_time->freq;
		rmb();	/* read the slot before re-checking the counters */
		if (per_cpu(sched_nr, cpu) != prev_sched_nr)
			continue;	/* been preempted */
		if (this_cpu_time->update_count == update_count)
			return walltime;	/* no update raced with us */
	}
}
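The retry rule described in the comment above read_time() can be checked without threads by replaying the problematic interleaving by hand:

1. A reader selects a slot.
2. Two updates land.
3. The second update rewrites that same slot.

Only the update_count comparison reveals that the values read are stale. This is a single-threaded model under stated assumptions. It does not model sched_nr or CPU migration, and the names struct time_info_model, update(), read_begin() and read_valid() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of struct time_info (freq omitted). */
struct time_info_model {
	uint64_t tsc;
	uint64_t walltime;
};

static struct time_info_model sel[2];
static long update_count_model;

/* Writer, as in freq_change_cb(): write the inactive slot, then flip. */
static void update(uint64_t tsc, uint64_t walltime)
{
	struct time_info_model *w = &sel[(update_count_model + 1) & 1];

	w->tsc = tsc;
	w->walltime = walltime;
	update_count_model++;
}

/* Reader split in two halves so updates can be injected between them. */
struct read_ctx {
	long count;
	const struct time_info_model *slot;
};

static struct read_ctx read_begin(void)
{
	struct read_ctx c = { update_count_model,
			      &sel[update_count_model & 1] };
	return c;
}

/* True if no update happened since read_begin(). */
static bool read_valid(const struct read_ctx *c)
{
	return update_count_model == c->count;
}
```

In this model, the reader's slot survives the first update and is rewritten only by the second, which is the case the comment describes. Retrying on any change of update_count, as read_time() does, is the conservative way to catch it.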

/* Userspace */
/* Export all this data to user space through the vsyscall page. A function
 * like read_time() can then read the walltime as-is, because it does not
 * need to disable preemption. */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Candidate, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

Thread overview: 20+ messages
2006-11-24 21:59 [PATCH 8/16] LTTng 0.6.36 for 2.6.18 : Timestamp Mathieu Desnoyers
     [not found] ` <1164475747.5196.5.camel@localhost.localdomain>
     [not found]   ` <20061126170542.GA30771@Krystal>
     [not found]     ` <1164561427.16871.14.camel@localhost.localdomain>
     [not found]       ` <20061126231833.GA22241@Krystal>
     [not found]         ` <1164585589.16871.52.camel@localhost.localdomain>
2007-02-24 16:19           ` Mathieu Desnoyers [this message]
2007-02-24 18:06             ` [RFC] Fast assurate clock readable from user space and NMI handler Daniel Walker
2007-02-26 20:53               ` Mathieu Desnoyers
2007-02-26 21:27                 ` Daniel Walker
2007-02-26 22:14                   ` Mathieu Desnoyers
2007-02-26 23:12                     ` Daniel Walker
2007-02-27  3:54                       ` Mathieu Desnoyers
2007-02-27  4:22                         ` Daniel Walker
2007-02-27  4:47                           ` Mathieu Desnoyers
2007-02-27  6:29                           ` Ingo Molnar
2007-02-27  7:38                             ` Mathieu Desnoyers
2007-02-27  8:48                               ` Thomas Gleixner
2007-02-27 10:18                               ` Daniel Walker
2007-02-27 16:02                                 ` Mathieu Desnoyers
2007-02-27 17:24                                   ` Daniel Walker
2007-02-27 19:04                                     ` Mathieu Desnoyers
2007-02-27 19:40                                       ` john stultz
2007-02-27 20:09                                       ` Daniel Walker
2007-02-27  9:59                             ` Daniel Walker
