Re: [patch 15/23] high-res timers: core

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Andrew Morton <akpm@osdl.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
	Jim Gettys <jg@laptop.org>, John Stultz <johnstul@us.ibm.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Arjan van de Ven <arjan@infradead.org>,
	Dave Jones <davej@redhat.com>
Subject: Re: [patch 15/23] high-res timers: core
Date: Sat, 30 Sep 2006 01:43:56 -0700	[thread overview]
Message-ID: <20060930014356.aa22dd48.akpm@osdl.org> (raw)
In-Reply-To: <20060929234440.521325000@cruncher.tec.linutronix.de>

On Fri, 29 Sep 2006 23:58:34 -0000
Thomas Gleixner <tglx@linutronix.de> wrote:

> From: Thomas Gleixner <tglx@linutronix.de>
> 
> add the core bits of high-res timers support.
> 
> the design makes use of the existing hrtimers subsystem which manages a
> per-CPU and per-clock tree of timers, and the clockevents framework, which
> provides a standard API to request programmable clock events from. The
> core code does not have to know about the clock details - it makes use
> of clockevents_set_next_event().
> 
> the code also provides dyntick functionality: it is implemented via a
> per-cpu sched_tick hrtimer that is set to HZ frequency, but which is
> reprogrammed to a longer timeout before going idle, and reprogrammed to
> HZ again once the CPU goes busy again. (If an non-timer IRQ hits the
> idle task then it will process jiffies before calling the IRQ code.)
> 
> the impact to non-high-res architectures is intended to be minimal.
> 
> ...
>  
> @@ -108,17 +134,53 @@ struct hrtimer_cpu_base {
>  	spinlock_t			lock;
>  	struct lock_class_key		lock_key;
>  	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
> +#ifdef CONFIG_HIGH_RES_TIMERS
> +	ktime_t				expires_next;
> +	int				hres_active;
> +	unsigned long			check_clocks;
> +	struct list_head		cb_pending;
> +	struct hrtimer			sched_timer;
> +	struct pt_regs			*sched_regs;
> +	unsigned long			events;
> +#endif

You forgot to update the kerneldoc for this struct.

Does `events' needs to be long?

<looks>

oh, it's a scalar this time ;)

> +#ifdef CONFIG_HIGH_RES_TIMERS
> +
> +extern void hrtimer_clock_notify(void);
> +extern void clock_was_set(void);
> +extern void hrtimer_interrupt(struct pt_regs *regs);
> +
> +# define hrtimer_cb_get_time(t)	(t)->base->get_time()
> +# define hrtimer_hres_active	(__get_cpu_var(hrtimer_bases).hres_active)

These two could be inline functions?

That might cause include file ordering problems I guess.

> +/*
> + * The resolution of the clocks. The resolution value is returned in
> + * the clock_getres() system call to give application programmers an
> + * idea of the (in)accuracy of timers. Timer values are rounded up to
> + * this resolution values.
> + */
> +# define KTIME_HIGH_RES		(ktime_t) { .tv64 = CONFIG_HIGH_RES_RESOLUTION }
> +# define KTIME_MONOTONIC_RES	KTIME_HIGH_RES
> +
> +#else
> +
> +# define KTIME_MONOTONIC_RES	KTIME_LOW_RES
> +
>  /*
>   * clock_was_set() is a NOP for non- high-resolution systems. The
>   * time-sorted order guarantees that a timer does not expire early and
>   * is expired in the next softirq when the clock was advanced.
>   */
> -#define clock_was_set()		do { } while (0)
> -#define hrtimer_clock_notify()	do { } while (0)
> -extern ktime_t ktime_get(void);
> -extern ktime_t ktime_get_real(void);
> +# define clock_was_set()		do { } while (0)
> +# define hrtimer_clock_notify()		do { } while (0)

these could be inlines.

> +# define hrtimer_cb_get_time(t)		(t)->base->softirq_time

Does this need parenthesisation?  Probably it's OK..  An inline function
would be nicer.

> +# define hrtimer_hres_active		0

Perhaps this would be better if it was presented as a function.

> +/* High resolution timer related functions */
> +#ifdef CONFIG_HIGH_RES_TIMERS
> +
> +static ktime_t last_jiffies_update;

What's this do?

> +/*
> + * Reprogramm the event source with checking both queues for the

"Reprogramme" ;)

> + * next event
> + * Called with interrupts disabled and base->lock held
> + */
> +static void hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base)
> +{
> +	int i;
> +	struct hrtimer_clock_base *base = cpu_base->clock_base;
> +	ktime_t expires;
> +
> +	cpu_base->expires_next.tv64 = KTIME_MAX;
> +
> +	for (i = HRTIMER_MAX_CLOCK_BASES; i ; i--, base++) {

Downcounting loops hurt my brain.  Does it actually generate better code?

> +		struct hrtimer *timer;
> +
> +		if (!base->first)
> +			continue;
> +		timer = rb_entry(base->first, struct hrtimer, node);
> +		expires = ktime_sub(timer->expires, base->offset);
> +		if (expires.tv64 < cpu_base->expires_next.tv64)
> +			cpu_base->expires_next = expires;
> +	}
> +
> +	if (cpu_base->expires_next.tv64 != KTIME_MAX)
> +		clockevents_set_next_event(cpu_base->expires_next, 1);
> +}
> +
> +/*
> + * Shared reprogramming for clock_realtime and clock_monotonic
> + *
> + * When a new expires first timer is enqueued, we have

That sentence might need work.

> +/*
> + * Retrigger next event is called after clock was set
> + */
> +static void retrigger_next_event(void *arg)
> +{
> +	struct hrtimer_cpu_base *base;
> +	struct timespec realtime_offset;
> +	unsigned long flags, seq;
> +
> +	do {
> +		seq = read_seqbegin(&xtime_lock);
> +		set_normalized_timespec(&realtime_offset,
> +					-wall_to_monotonic.tv_sec,
> +					-wall_to_monotonic.tv_nsec);
> +	} while (read_seqretry(&xtime_lock, seq));
> +
> +	base = &per_cpu(hrtimer_bases, smp_processor_id());
> +
> +	/* Adjust CLOCK_REALTIME offset */
> +	spin_lock_irqsave(&base->lock, flags);
> +	base->clock_base[CLOCK_REALTIME].offset =
> +		timespec_to_ktime(realtime_offset);
> +
> +	hrtimer_force_reprogram(base);
> +	spin_unlock_irqrestore(&base->lock, flags);
> +}
> +
> +/*
> + * Clock realtime was set
> + *
> + * Change the offset of the realtime clock vs. the monotonic
> + * clock.
> + *
> + * We might have to reprogram the high resolution timer interrupt. On
> + * SMP we call the architecture specific code to retrigger _all_ high
> + * resolution timer interrupts. On UP we just disable interrupts and
> + * call the high resolution interrupt code.
> + */
> +void clock_was_set(void)
> +{
> +	preempt_disable();
> +	if (hrtimer_hres_active) {
> +		retrigger_next_event(NULL);
> +
> +		if (smp_call_function(retrigger_next_event, NULL, 1, 1))
> +			BUG();
> +	}
> +	preempt_enable();
> +}

If you use on_each_cpu() here you know that retrigger_next_event() will be
called under local_irq_disable().  The preempt_disable() goes away and the
spin_lock_irqsave() in retrigger_next_event() becomes a spin_lock() and
everything becomes simpler.

> +/**
> + * hrtimer_clock_notify - A clock source or a clock event has been installed
> + *
> + * Notify the per cpu softirqs to recheck the clock sources and events
> + */
> +void hrtimer_clock_notify(void)
> +{
> +	int i;
> +
> +	for (i = 0; i < NR_CPUS; i++)
> +		set_bit(0, &per_cpu(hrtimer_bases, i).check_clocks);
> +}

This will go splat if/when the arch chooses to not implement per-cpu
storage for not-possible CPUs.  Use for_each_possible_cpu().

> +
> +static const ktime_t nsec_per_hz = { .tv64 = NSEC_PER_SEC / HZ };
> +

This could use the same trick as KTIME_HIGH_RES and friends.  But perhaps
the compiler will generate the same code..

> +/*
> + * We switched off the global tick source when switching to high resolution
> + * mode. Update jiffies64.
> + *
> + * Must be called with interrupts disabled !
> + *
> + * FIXME: We need a mechanism to assign the update to a CPU. In principle this
> + * is not hard, but when dynamic ticks come into play it starts to be. We don't
> + * want to wake up a complete idle cpu just to update jiffies, so we need
> + * something more intellegent than a mere "do this only on CPUx".
> + */
> +static void update_jiffies64(ktime_t now)
> +{
> +	ktime_t delta;
> +
> +	write_seqlock(&xtime_lock);
> +
> +	delta = ktime_sub(now, last_jiffies_update);
> +	if (delta.tv64 >= nsec_per_hz.tv64) {
> +

stray blank line.

> +		unsigned long orun = 1;

"orun"?

> +
> +		delta = ktime_sub(delta, nsec_per_hz);
> +		last_jiffies_update = ktime_add(last_jiffies_update,
> +						nsec_per_hz);
> +
> +		/* Slow path for long timeouts */
> +		if (unlikely(delta.tv64 >= nsec_per_hz.tv64)) {
> +			s64 incr = ktime_to_ns(nsec_per_hz);
> +			orun = ktime_divns(delta, incr);
> +
> +			last_jiffies_update = ktime_add_ns(last_jiffies_update,
> +							   incr * orun);
> +			jiffies_64 += orun;
> +			orun++;
> +		}

That's a bit of a hack isn't it?  do_timer() owns the modification of
jiffies_64, so why is this code modifying it as well?

> +		do_timer(orun);

twice?

I suspect a bug.

> +	}
> +	write_sequnlock(&xtime_lock);
> +}
> +
> +/*
> + * We rearm the timer until we get disabled by the idle code
> + */
> +static int hrtimer_sched_tick(struct hrtimer *timer)
> +{
> +	unsigned long flags;
> +	struct hrtimer_cpu_base *cpu_base =
> +		container_of(timer, struct hrtimer_cpu_base, sched_timer);
> +
> +	local_irq_save(flags);
> +	/*
> +	 * Do not call, when we are not in irq context and have
> +	 * no valid regs pointer
> +	 */
> +	if (cpu_base->sched_regs) {
> +		update_process_times(user_mode(cpu_base->sched_regs));
> +		profile_tick(CPU_PROFILING, cpu_base->sched_regs);
> +	}
> +
> +	hrtimer_forward(timer, hrtimer_cb_get_time(timer), nsec_per_hz);
> +	local_irq_restore(flags);
> +
> +	return HRTIMER_RESTART;

bah.  hrtimer_restart is an `enum hrtimer_restart', not an integer.

> +	printk(KERN_INFO "hrtimers: Switched to high resolution mode CPU %d\n",
> +	       smp_processor_id());

"on CPU"

> +
> +static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
> +					    struct hrtimer_clock_base *base)
> +{
> +	/*
> +	 * When High resolution timers are active try to reprogram. Note, that
> +	 * in case the state has HRTIMER_CALLBACK set, no reprogramming and no
> +	 * expiry check happens. The timer gets enqueued into the rbtree and
> +	 * the reprogramming / expiry check is done in the hrtimer_interrupt or
> +	 * in the softirq.
> +	 */

This (useful) comment should be above the function, not inside it.

> +	if (hrtimer_hres_active && hrtimer_reprogram(timer, base)) {
> +
> +		/* Timer is expired, act upon the callback mode */
> +		switch(timer->cb_mode) {
> +		case HRTIMER_CB_IRQSAFE_NO_RESTART:
> +			/*
> +			 * We can call the callback from here. No restart
> +			 * happens, so no danger of recursion
> +			 */
> +			BUG_ON(timer->function(timer) != HRTIMER_NORESTART);

Doing assert(thing-which-has-side-effects) is poor form.

I doubt if the kernel will work if someone goes and disables BUG_ON, but
it's a laudable objective.


> +			return 1;
> +		case HRTIMER_CB_IRQSAFE_NO_SOFTIRQ:
> +			/*
> +			 * This is solely for the sched tick emulation with
> +			 * dynamic tick support to ensure that we do not
> +			 * restart the tick right on the edge and end up with
> +			 * the tick timer in the softirq ! The calling site
> +			 * takes care of this.
> +			 */
> +			return 1;
> +		case HRTIMER_CB_IRQSAFE:
> +		case HRTIMER_CB_SOFTIRQ:
> +			/*
> +			 * Move everything else into the softirq pending list !
> +			 */
> +			hrtimer_add_cb_pending(timer, base);
> +			raise_softirq(HRTIMER_SOFTIRQ);
> +			return 1;
> +		default:
> +			BUG();
> +		}
> +	}
> +	return 0;
> +}
> +
> +static inline void hrtimer_resume_jiffie_update(void)

hrtimer_resume_jiffy_update

> +{
> +	unsigned long flags;
> +	ktime_t now = ktime_get();
> +
> +	write_seqlock_irqsave(&xtime_lock, flags);
> +	last_jiffies_update = now;
> +	write_sequnlock_irqrestore(&xtime_lock, flags);
> +}
> +
> +#else
> +
> +# define hrtimer_hres_active		0
> +# define hrtimer_check_clocks()		do { } while (0)
> +# define hrtimer_enqueue_reprogram(t,b)	0
> +# define hrtimer_force_reprogram(b)	do { } while (0)
> +# define hrtimer_cb_pending(t)		0
> +# define hrtimer_remove_cb_pending(t)	do { } while (0)
> +# define hrtimer_init_hres(c)		do { } while (0)
> +# define hrtimer_init_timer_hres(t)	do { } while (0)
> +# define hrtimer_resume_jiffie_update()	do { } while (0)
> +
> +#endif /* CONFIG_HIGH_RES_TIMERS */
> +
>  /*
>   * Timekeeping resumed notification

resume

> +#ifdef CONFIG_HIGH_RES_TIMERS
> +
> +/*
> + * High resolution timer interrupt
> + * Called with interrupts disabled
> + */
> +void hrtimer_interrupt(struct pt_regs *regs)
> +{
> +	struct hrtimer_clock_base *base;
> +	ktime_t expires_next, now;
> +	int i, raise = 0, cpu = smp_processor_id();
> +	struct hrtimer_cpu_base *cpu_base = &per_cpu(hrtimer_bases, cpu);
> +
> +	BUG_ON(!cpu_base->hres_active);
> +
> +	/* Store the regs for an possible sched_timer callback */
> +	cpu_base->sched_regs = regs;
> +	cpu_base->events++;
> +
> + retry:
> +	now = ktime_get();
> +
> +	/* Check, if the jiffies need an update */
> +	update_jiffies64(now);
> +
> +	expires_next.tv64 = KTIME_MAX;
> +
> +	base = cpu_base->clock_base;
> +
> +	for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
> +		ktime_t basenow;
> +		struct rb_node *node;
> +
> +		spin_lock(&cpu_base->lock);
> +
> +		basenow = ktime_add(now, base->offset);

Would it be better to take the lock outside the loop, rather than hammering
on it like this?


> +		while ((node = base->first)) {
> +			struct hrtimer *timer;
> +
> +			timer = rb_entry(node, struct hrtimer, node);
> +
> +			if (basenow.tv64 < timer->expires.tv64) {
> +				ktime_t expires;
> +
> +				expires = ktime_sub(timer->expires,
> +						    base->offset);
> +				if (expires.tv64 < expires_next.tv64)
> +					expires_next = expires;
> +				break;
> +			}
> +
> +			/* Move softirq callbacks to the pending list */
> +			if (timer->cb_mode == HRTIMER_CB_SOFTIRQ) {
> +				__remove_hrtimer(timer, base, HRTIMER_PENDING, 0);
> +				hrtimer_add_cb_pending(timer, base);
> +				raise = 1;
> +				continue;
> +			}
> +
> +			__remove_hrtimer(timer, base, HRTIMER_CALLBACK, 0);
> +
> +			if (timer->function(timer) != HRTIMER_NORESTART) {
> +				BUG_ON(timer->state != HRTIMER_CALLBACK);
> +				/*
> +				 * state == HRTIMER_CALLBACK prevents
> +				 * reprogramming. We do this when we break out
> +				 * of the loop !
> +				 */
> +				enqueue_hrtimer(timer, base);
> +			}
> +			timer->state &= ~HRTIMER_CALLBACK;
> +		}
> +		spin_unlock(&cpu_base->lock);
> +		base++;
> +	}
> +
> +	cpu_base->expires_next = expires_next;
> +
> +	/* Reprogramming necessary ? */
> +	if (expires_next.tv64 != KTIME_MAX) {
> +		if (clockevents_set_next_event(expires_next, 0))
> +			goto retry;
> +	}
> +
> +	/* Invalidate regs */
> +	cpu_base->sched_regs = NULL;
> +
> +	/* Raise softirq ? */
> +	if (raise)
> +		raise_softirq(HRTIMER_SOFTIRQ);
> +}
> +
>  
> ...
>
>  static int __sched do_nanosleep(struct hrtimer_sleeper *t, enum hrtimer_mode mode)
> @@ -701,7 +1226,8 @@ static int __sched do_nanosleep(struct h
>  		set_current_state(TASK_INTERRUPTIBLE);
>  		hrtimer_start(&t->timer, t->timer.expires, mode);
>  
> -		schedule();
> +		if (likely(t->task))
> +			schedule();

Why?  Needs a comment.

> @@ -0,0 +1,22 @@
> +#
> +# Timer subsystem related configuration options
> +#
> +config HIGH_RES_TIMERS
> +	bool "High Resolution Timer Support"
> +	depends on GENERIC_TIME
> +	help
> +	  This option enables high resolution timer support. If your
> +	  hardware is not capable then this option only increases
> +	  the size of the kernel image.
> +
> +config HIGH_RES_RESOLUTION
> +	int "High Resolution Timer resolution (nanoseconds)"
> +	depends on HIGH_RES_TIMERS
> +	default 1000
> +	help
> +	  This sets the resolution in nanoseconds of the high resolution
> +	  timers. Too fine a resolution (small a number) will usually
> +	  not be observable due to normal system latencies.  For an
> +          800 MHz processor about 10,000 (10 microseconds) is recommended as a
> +	  finest resolution.  If you don't need that sort of resolution,
> +	  larger values may generate less overhead.

In that case the default is far too low.

What value are you suggesting that users and vendors set it to?

next prev parent reply	other threads:[~2006-09-30  8:44 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-29 23:58 [patch 00/23] Thomas Gleixner
2006-09-29 23:58 ` [patch 01/23] GTOD: exponential update_wall_time Thomas Gleixner
2006-09-29 23:58 ` [patch 02/23] GTOD: persistent clock support, core Thomas Gleixner
2006-09-30  8:35   ` Andrew Morton
2006-09-30 17:15     ` Jan Engelhardt
2006-10-02 21:49     ` john stultz
2006-09-29 23:58 ` [patch 03/23] GTOD: persistent clock support, i386 Thomas Gleixner
2006-09-30  8:36   ` Andrew Morton
2006-10-02 22:03     ` john stultz
2006-10-02 22:44       ` Andrew Morton
2006-10-02 23:09         ` john stultz
2006-10-03 23:30         ` Thomas Gleixner
2006-09-29 23:58 ` [patch 04/23] time: uninline jiffies.h Thomas Gleixner
2006-09-29 23:58 ` [patch 05/23] time: fix msecs_to_jiffies() bug Thomas Gleixner
2006-09-29 23:58 ` [patch 06/23] time: fix timeout overflow Thomas Gleixner
2006-09-29 23:58 ` [patch 07/23] cleanup: uninline irq_enter() and move it into a function Thomas Gleixner
2006-09-30  8:36   ` Andrew Morton
2006-09-29 23:58 ` [patch 08/23] dynticks: prepare the RCU code Thomas Gleixner
2006-09-30  8:36   ` Andrew Morton
2006-09-30 12:25     ` Dipankar Sarma
2006-09-30 13:09       ` Ingo Molnar
2006-09-30 13:52         ` Dipankar Sarma
2006-09-30 21:35           ` Ingo Molnar
2006-09-29 23:58 ` [patch 09/23] dynticks: extend next_timer_interrupt() to use a reference jiffie Thomas Gleixner
2006-09-30  8:37   ` Andrew Morton
2006-09-29 23:58 ` [patch 10/23] hrtimers: clean up locking Thomas Gleixner
2006-09-30  8:37   ` Andrew Morton
2006-09-29 23:58 ` [patch 11/23] hrtimers: state tracking Thomas Gleixner
2006-09-30  8:37   ` Andrew Morton
2006-09-29 23:58 ` [patch 12/23] hrtimers: clean up callback tracking Thomas Gleixner
2006-09-29 23:58 ` [patch 13/23] clockevents: core Thomas Gleixner
2006-09-30  8:39   ` Andrew Morton
2006-10-03  4:33     ` John Kacur
2006-09-29 23:58 ` [patch 14/23] clockevents: drivers for i386 Thomas Gleixner
2006-09-30  8:40   ` Andrew Morton
2006-09-29 23:58 ` [patch 15/23] high-res timers: core Thomas Gleixner
2006-09-30  8:43   ` Andrew Morton [this message]
2006-09-29 23:58 ` [patch 16/23] dynticks: core Thomas Gleixner
2006-09-30  8:44   ` Andrew Morton
2006-09-30 12:11     ` Dipankar Sarma
2006-09-29 23:58 ` [patch 17/23] dyntick: add nohz stats to /proc/stat Thomas Gleixner
2006-09-29 23:58 ` [patch 18/23] dynticks: i386 arch code Thomas Gleixner
2006-09-30  8:45   ` Andrew Morton
2006-09-29 23:58 ` [patch 19/23] high-res timers, dynticks: enable i386 support Thomas Gleixner
2006-09-29 23:58 ` [patch 20/23] add /proc/sys/kernel/timeout_granularity Thomas Gleixner
2006-09-30  8:45   ` Andrew Morton
2006-09-29 23:58 ` [patch 21/23] debugging feature: timer stats Thomas Gleixner
2006-09-30  8:46   ` Andrew Morton
2006-09-29 23:58 ` [patch 22/23] dynticks: increase SLAB timeouts Thomas Gleixner
2006-09-30  8:49   ` Andrew Morton
2006-09-29 23:58 ` [patch 23/23] dynticks: decrease I8042_POLL_PERIOD Thomas Gleixner
2006-09-30  8:49   ` Andrew Morton
2006-09-30  8:35 ` [patch 00/23] Andrew Morton
2006-09-30 19:17   ` Thomas Gleixner
2006-09-30  8:35 ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060930014356.aa22dd48.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=arjan@infradead.org \
    --cc=davej@redhat.com \
    --cc=dwmw2@infradead.org \
    --cc=jg@laptop.org \
    --cc=johnstul@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox