linux-kernel.vger.kernel.org archive mirror
* Updated dynamic tick patches
@ 2005-08-31 16:58 Srivatsa Vaddagiri
  2005-08-31 17:12 ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Srivatsa Vaddagiri
                   ` (3 more replies)
  0 siblings, 4 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-08-31 16:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: arjan, s0348365, kernel, tytso, cfriesen, rlrevell, trenn, george,
	johnstul, akpm

I have cleaned up the dynamic tick patch that Con last posted. One issue
that still has to be addressed precisely is recovering lost ticks.

This is supposedly much easier with something like the ACPI PM timer, but
I found that the code which calculates lost ticks in timer_pm.c is
not accurate. I have attempted to fix it (in the patch that follows).
With the fix, time has remained stable for ~36 hrs on one machine,
but the same fix does not help on another machine (time speeds up
by a couple of seconds after 4-6 hrs). Hence I consider that it needs
some more rework. Suggestions on accurate lost-tick recovery are welcome.

Using TSC to recover ticks is also trickier, and hence I have not
enabled TSC support in these patches.

This patch does not address those machines where all CPUs have to
be put to sleep simultaneously (otherwise they don't work well,
or something like that), as pointed out by Tony. We could
add support for such machines in another release if they
are common enough.


Following patches related to dynamic tick are posted in separate mails,
for convenience of review. The first patch probably applies w/o dynamic
tick consideration also.

Patch 1/3  -> Fixup lost tick calculation in timer_pm.c
Patch 2/3  -> Dyn-tick cleanups
Patch 3/3  -> Use lost tick information in dyn-tick time recovery 

These patches are against 2.6.13-rc6-mm2.

Con, would be great if you can upload a consolidated new version of
dyn-tick patch on your website!


-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-08-31 16:58 Updated dynamic tick patches Srivatsa Vaddagiri
@ 2005-08-31 17:12 ` Srivatsa Vaddagiri
  2005-08-31 22:36   ` Zachary Amsden
                     ` (2 more replies)
  2005-08-31 17:26 ` [PATCH 2/3] Updated dynamic tick patches - Cleanup Srivatsa Vaddagiri
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-08-31 17:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: arjan, s0348365, kernel, tytso, cfriesen, rlrevell, trenn, george,
	johnstul, akpm

On Wed, Aug 31, 2005 at 10:28:43PM +0530, Srivatsa Vaddagiri wrote:
> Following patches related to dynamic tick are posted in separate mails,
> for convenience of review. The first patch probably applies w/o dynamic
> tick consideration also.
> 
> Patch 1/3  -> Fixup lost tick calculation in timer_pm.c

Currently, the lost tick calculation in timer_pm.c is based on the number
of microseconds that have elapsed since the last tick. The number of
microseconds is approximated by cyc2us, which basically does:

	microsec = (cycles * 286) / 1024

Consider 10 ticks lost. This amounts to 14319*10 = 143190 cycles
(14319 = PMTMR_EXPECTED_RATE/(CALIBRATE_LATCH/LATCH)).
This amounts to 39992 microseconds as per the above equation,
or 39992 / 4000 = 9 lost ticks, which is incorrect.
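
The arithmetic above can be reproduced in user space. This is only a
sketch: the constants are the ones quoted in this mail, the 4000 usec
tick length corresponds to HZ=250 (inferred from the division above), and
lost_from_usecs()/lost_from_cycles() are hypothetical helper names, not
functions in timer_pm.c.

```c
#include <assert.h>
#include <stdint.h>

/* Tick length used in the mail's example; 4000 usec implies HZ=250 (assumed). */
#define USEC_PER_TICK 4000u

/* Same approximation the mail attributes to cyc2us(): 286/1024 usec per cycle.
 * A 64-bit intermediate is used here so the sketch cannot overflow. */
static uint32_t cyc2us(uint32_t cycles)
{
	return (uint32_t)(((uint64_t)cycles * 286u) / 1024u);
}

/* Lost ticks derived from elapsed microseconds (the existing approach). */
static uint32_t lost_from_usecs(uint32_t cycles)
{
	return cyc2us(cycles) / USEC_PER_TICK;
}

/* Lost ticks derived directly from the cycle count (the proposed approach). */
static uint32_t lost_from_cycles(uint32_t cycles, uint32_t cycles_per_tick)
{
	assert(cycles_per_tick != 0);
	return cycles / cycles_per_tick;
}
```

For 143190 cycles, lost_from_usecs() yields 9, while lost_from_cycles()
with 14319 cycles per tick yields 10.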

I feel lost ticks can be based on the cycle difference directly
rather than on the microseconds that have elapsed.

The following patch is in that direction.
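
One way to picture the cycle-based approach: remember the PM timer value
of the last accounted tick boundary and advance it only by whole ticks,
so that any remainder carries over to the next interrupt. The user-space
sketch below assumes a 24-bit counter mask and uses hypothetical names
(struct pm_tick_state, pm_account_ticks); it illustrates the idea and is
not the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* The PM timer is treated as a 24-bit counter (assumption of this sketch). */
#define ACPI_PM_MASK 0xFFFFFFu

struct pm_tick_state {
	uint32_t offset_last;     /* counter value at the last accounted tick boundary */
	uint32_t ticks_per_jiffy; /* PM timer cycles per jiffy */
};

/*
 * Return the number of whole jiffies elapsed since offset_last and advance
 * offset_last by exactly that many jiffies. Cycles that do not add up to a
 * full jiffy stay "in the bank" and are seen again on the next call.
 */
static uint32_t pm_account_ticks(struct pm_tick_state *s, uint32_t now)
{
	uint32_t delta, ticks;

	assert(s->ticks_per_jiffy != 0);

	/* Masking makes the subtraction correct across counter wrap-around. */
	delta = (now - s->offset_last) & ACPI_PM_MASK;
	ticks = delta / s->ticks_per_jiffy;

	s->offset_last += ticks * s->ticks_per_jiffy;
	s->offset_last &= ACPI_PM_MASK;

	return ticks;
}
```

Because offset_last moves only in multiples of ticks_per_jiffy, no
separate remainder variable is needed between interrupts.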

With this patch, time kept up really well overnight on one particular
machine (Intel 4-way Pentium 3 box), while on another, newer machine
(Intel 4-way Xeon with HT) it didn't do so well (time sped up after
3 or 4 hours). Hence I consider that this particular patch will need
more review/work.

Patch is against 2.6.13-rc6-mm2.



Fix lost tick calculation in timer_pm.c

---

 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_pm.c |   44 +++++------
 1 files changed, 20 insertions(+), 24 deletions(-)

diff -puN arch/i386/kernel/timers/timer_pm.c~pm_timer_fix arch/i386/kernel/timers/timer_pm.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/timers/timer_pm.c~pm_timer_fix	2005-08-31 16:31:52.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_pm.c	2005-08-31 16:32:51.000000000 +0530
@@ -30,6 +30,8 @@
   ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
 
 
+static int pm_ticks_per_jiffy = PMTMR_EXPECTED_RATE / (CALIBRATE_LATCH/LATCH);
+
 /* The I/O port the PMTMR resides at.
  * The location is detected during setup_arch(),
  * in arch/i386/acpi/boot.c */
@@ -37,8 +39,7 @@ u32 pmtmr_ioport = 0;
 
 
 /* value of the Power timer at last timer interrupt */
-static u32 offset_tick;
-static u32 offset_delay;
+static u32 offset_last;
 
 static unsigned long long monotonic_base;
 static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
@@ -127,6 +128,11 @@ pm_good:
 	if (verify_pmtmr_rate() != 0)
 		return -ENODEV;
 
+	printk ("Using %u PM timer ticks per jiffy \n", pm_ticks_per_jiffy);
+
+	offset_last = read_pmtmr();
+	setup_pit_timer();
+
 	init_cpu_khz();
 	return 0;
 }
@@ -150,47 +156,37 @@ static inline u32 cyc2us(u32 cycles)
  */
 static void mark_offset_pmtmr(void)
 {
-	u32 lost, delta, last_offset;
-	static int first_run = 1;
-	last_offset = offset_tick;
+	u32 lost, delta, deltaus, offset_now;
 
 	write_seqlock(&monotonic_lock);
 
-	offset_tick = read_pmtmr();
+	offset_now = read_pmtmr();
 
 	/* calculate tick interval */
-	delta = (offset_tick - last_offset) & ACPI_PM_MASK;
+	delta = (offset_now - offset_last) & ACPI_PM_MASK;
 
 	/* convert to usecs */
-	delta = cyc2us(delta);
+	deltaus = cyc2us(delta);
 
 	/* update the monotonic base value */
-	monotonic_base += delta * NSEC_PER_USEC;
+	monotonic_base += deltaus * NSEC_PER_USEC;
 	write_sequnlock(&monotonic_lock);
 
 	/* convert to ticks */
-	delta += offset_delay;
-	lost = delta / (USEC_PER_SEC / HZ);
-	offset_delay = delta % (USEC_PER_SEC / HZ);
-
+	lost = delta / pm_ticks_per_jiffy;
+	offset_last += lost * pm_ticks_per_jiffy;
+	offset_last &= ACPI_PM_MASK;
 
 	/* compensate for lost ticks */
 	if (lost >= 2)
 		jiffies_64 += lost - 1;
-
-	/* don't calculate delay for first run,
-	   or if we've got less then a tick */
-	if (first_run || (lost < 1)) {
-		first_run = 0;
-		offset_delay = 0;
-	}
 }
 
 static int pmtmr_resume(void)
 {
 	write_seqlock(&monotonic_lock);
 	/* Assume this is the last mark offset time */
-	offset_tick = read_pmtmr();
+	offset_last = read_pmtmr();
 	write_sequnlock(&monotonic_lock);
 	return 0;
 }
@@ -205,7 +201,7 @@ static unsigned long long monotonic_cloc
 	/* atomically read monotonic base & last_offset */
 	do {
 		seq = read_seqbegin(&monotonic_lock);
-		last_offset = offset_tick;
+		last_offset = offset_last;
 		base = monotonic_base;
 	} while (read_seqretry(&monotonic_lock, seq));
 
@@ -239,11 +235,11 @@ static unsigned long get_offset_pmtmr(vo
 {
 	u32 now, offset, delta = 0;
 
-	offset = offset_tick;
+	offset = offset_last;
 	now = read_pmtmr();
 	delta = (now - offset)&ACPI_PM_MASK;
 
-	return (unsigned long) offset_delay + cyc2us(delta);
+	return (unsigned long) cyc2us(delta);
 }
 
 

_

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* [PATCH 2/3] Updated dynamic tick patches - Cleanup
  2005-08-31 16:58 Updated dynamic tick patches Srivatsa Vaddagiri
  2005-08-31 17:12 ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Srivatsa Vaddagiri
@ 2005-08-31 17:26 ` Srivatsa Vaddagiri
  2005-08-31 17:27 ` [PATCH 3/3] Updated dynamic tick patches - Recover walltime upon wakeup Srivatsa Vaddagiri
  2005-09-01  5:23 ` Updated dynamic tick patches Con Kolivas
  3 siblings, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-08-31 17:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: arjan, s0348365, kernel, tytso, cfriesen, rlrevell, trenn, george,
	johnstul, akpm

On Wed, Aug 31, 2005 at 10:28:43PM +0530, Srivatsa Vaddagiri wrote:
> Following patches related to dynamic tick are posted in separate mails,
> for convenience of review. The first patch probably applies w/o dynamic
> tick consideration also.
> 
> Patch 2/3  -> Dyn-tick cleanups

This patch cleans up dynamic tick further. Notable changes:

	- Remove support for TSC timer. IMO supporting TSC does not
	  make much sense since TSC does not function in some deep
	  ACPI power-save states like C3 (?). Moreover, I think
	  it can drift relative to the PIT, which makes lost tick
	  calculation more complicated.
	- Remove 'use_apic' tunable from sysfs and 'dyntick=noapic' bootup
	  option. Always use APIC timer if available.
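
With the tunable gone, the only arch-side limit left is how far the
fallback PIT can be programmed ahead: its 16-bit counter bounds the number
of ticks that can be skipped when no local APIC is usable. The sketch
below works that limit out in user space. PIT_TICK_RATE (1193182 Hz) and
the helper names pit_latch(), pit_max_skip() and set_max_skip() are
assumptions made for illustration; only the "0xffff / LATCH" expression
and the "lower the limit, never raise it" rule are taken from the patch.

```c
#include <assert.h>

/* PIT input clock in Hz (assumed value of CLOCK_TICK_RATE on i386). */
#define PIT_TICK_RATE 1193182u

struct dyn_tick_limits {
	unsigned int max_skip;	/* 0 means "no limit set yet" */
};

/* PIT counts per tick for a given HZ, rounded to nearest. */
static unsigned int pit_latch(unsigned int hz)
{
	assert(hz != 0);
	return (PIT_TICK_RATE + hz / 2) / hz;
}

/* Largest number of ticks a 16-bit PIT counter can cover in one shot. */
static unsigned int pit_max_skip(unsigned int hz)
{
	return 0xffffu / pit_latch(hz);
}

/* Only ever tighten the limit; the first caller sets it. */
static void set_max_skip(struct dyn_tick_limits *l, unsigned int max_skip)
{
	if (!l->max_skip || max_skip < l->max_skip)
		l->max_skip = max_skip;
}
```

Under these assumptions, pit_max_skip(250) is 13, so a PIT-only system
at HZ=250 can sleep for at most 13 ticks at a time.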

Patch applies on top of Con's last posted patch, available here:

http://ck.kolivas.org/patches/dyn-ticks/2.6.13-rc6-dtck-2.patch

(The above patch was applied on top of 2.6.13-rc6-mm2 in my tests. The
time.c reject that was obtained can be ignored)




---

 linux-2.6.13-rc6-mm2-root/arch/i386/Kconfig           |   24 -----
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/apic.c     |   16 +++
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/dyn-tick.c |   52 ++++--------
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/io_apic.c  |    6 +
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/process.c  |    5 -
 linux-2.6.13-rc6-mm2-root/include/asm-i386/apic.h     |    2 
 linux-2.6.13-rc6-mm2-root/include/asm-i386/dyn-tick.h |   41 ---------
 linux-2.6.13-rc6-mm2-root/include/asm-i386/timer.h    |    3 
 linux-2.6.13-rc6-mm2-root/include/linux/dyn-tick.h    |   14 +--
 linux-2.6.13-rc6-mm2-root/kernel/dyn-tick.c           |   76 +++---------------
 10 files changed, 74 insertions(+), 165 deletions(-)

diff -puN arch/i386/kernel/dyn-tick.c~cleanup arch/i386/kernel/dyn-tick.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/dyn-tick.c~cleanup	2005-08-31 16:34:51.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/dyn-tick.c	2005-08-31 16:35:17.000000000 +0530
@@ -24,15 +24,10 @@ static void arch_reprogram_timer(unsigne
 {
 	unsigned int skip = jif_next - jiffies;
 
-	if (cpu_has_local_apic()) {
-		if (dyn_tick->state & DYN_TICK_TIMER_INT)
-			reprogram_apic_timer(skip);
-	} else {
-		if (dyn_tick->state & DYN_TICK_TIMER_INT)
-			reprogram_pit_timer(skip);
-		else
-			disable_pit_timer();
-	}
+	if (cpu_has_local_apic())
+		reprogram_apic_timer(skip);
+	else
+		reprogram_pit_timer(skip);
 
 	/* Fixme: Disable NMI Watchdog */
 }
@@ -40,8 +35,7 @@ static void arch_reprogram_timer(unsigne
 static void arch_all_cpus_idle(int how_long)
 {
 	if (cpu_has_local_apic())
-		if (dyn_tick->state & DYN_TICK_TIMER_INT)
-			disable_pit_timer();
+		disable_pit_timer();
 }
 
 static struct dyn_tick_timer arch_dyn_tick_timer = {
@@ -49,40 +43,37 @@ static struct dyn_tick_timer arch_dyn_ti
 	.arch_all_cpus_idle	= &arch_all_cpus_idle,
 };
 
-static int __init dyn_tick_init(void)
-{
-	arch_dyn_tick_timer.arch_init = dyn_tick_arch_init;
-	dyn_tick_register(&arch_dyn_tick_timer);
-
-	return 0;
-}
-
-arch_initcall(dyn_tick_init);
-
 int __init dyn_tick_arch_init(void)
 {
 
-	if (!(dyn_tick->state & DYN_TICK_USE_APIC) || !cpu_has_local_apic())
-		dyn_tick->max_skip = 0xffff / LATCH;	/* PIT timer length */
+	if (!cpu_has_local_apic())
+		set_dyn_tick_max_skip(0xffff / LATCH);	/* PIT timer length */
+
 	printk(KERN_INFO "dyn-tick: Maximum ticks to skip limited to %i\n",
 	       dyn_tick->max_skip);
 
 	return 0;
 }
 
-/* Functions that need blank prototypes for !CONFIG_NO_IDLE_HZ below here */
-void set_dyn_tick_max_skip(unsigned int max_skip)
+static int __init dyn_tick_init(void)
 {
-	if (!dyn_tick->max_skip || max_skip < dyn_tick->max_skip)
-		dyn_tick->max_skip = max_skip;
+	arch_dyn_tick_timer.arch_init = dyn_tick_arch_init;
+	dyn_tick_register(&arch_dyn_tick_timer);
+
+	return 0;
 }
 
+arch_initcall(dyn_tick_init);
+
+/* Functions that need blank prototypes for !CONFIG_NO_IDLE_HZ below here */
 void setup_dyn_tick_use_apic(unsigned int calibration_result)
 {
 	if (calibration_result)
-		dyn_tick->state |= DYN_TICK_USE_APIC;
-	else
+		dyn_tick->arch_state |= DYN_TICK_APICABLE;
+	else {
+		dyn_tick->arch_state &= ~DYN_TICK_APICABLE;
 		printk(KERN_INFO "dyn-tick: Cannot use local APIC\n");
+	}
 }
 
 void dyn_tick_interrupt(int irq, struct pt_regs *regs)
@@ -103,8 +94,7 @@ void dyn_tick_interrupt(int irq, struct 
 		/* Recover jiffies */
 		cur_timer->mark_offset();
 		if (cpu_has_local_apic())
-			if (dyn_tick->state & DYN_TICK_TIMER_INT)
-				enable_pit_timer();
+			enable_pit_timer();
 	}
 
 	spin_unlock(&dyn_tick_lock);
diff -puN include/linux/dyn-tick.h~cleanup include/linux/dyn-tick.h
--- linux-2.6.13-rc6-mm2/include/linux/dyn-tick.h~cleanup	2005-08-31 16:34:51.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/include/linux/dyn-tick.h	2005-08-31 16:35:17.000000000 +0530
@@ -16,17 +16,14 @@
 #include <linux/interrupt.h>
 #include <asm/timer.h>
 
-#define DYN_TICK_APICABLE	(1 << 5)
-#define DYN_TICK_TIMER_INT	(1 << 4)
-#define DYN_TICK_USE_APIC	(1 << 3)
-#define DYN_TICK_SKIPPING	(1 << 2)
 #define DYN_TICK_ENABLED	(1 << 1)
 #define DYN_TICK_SUITABLE	(1 << 0)
 
 #define DYN_TICK_MIN_SKIP	2
 
 struct dyn_tick_state {
-	unsigned int state;		/* Current state */
+	unsigned short state;		/* Current state */
+	unsigned short arch_state;	/* Arch-specific state */
 	unsigned int max_skip;		/* Max number of ticks to skip */
 };
 
@@ -43,7 +40,8 @@ extern spinlock_t dyn_tick_lock;
 extern void dyn_tick_register(struct dyn_tick_timer *new_timer);
 
 #ifdef CONFIG_NO_IDLE_HZ
-extern unsigned long dyn_tick_reprogram_timer(void);
+extern unsigned int dyn_tick_reprogram_timer(void);
+extern void set_dyn_tick_max_skip(unsigned int max_skip);
 
 static inline int dyn_tick_enabled(void)
 {
@@ -51,12 +49,12 @@ static inline int dyn_tick_enabled(void)
 }
 
 #else	/* CONFIG_NO_IDLE_HZ */
-static inline int arch_has_safe_halt(void)
+static inline unsigned int dyn_tick_reprogram_timer(void)
 {
 	return 0;
 }
 
-static inline unsigned long dyn_tick_reprogram_timer(void)
+static inline void set_dyn_tick_max_skip(unsigned int max_skip)
 {
 }
 
diff -puN kernel/dyn-tick.c~cleanup kernel/dyn-tick.c
--- linux-2.6.13-rc6-mm2/kernel/dyn-tick.c~cleanup	2005-08-31 16:34:51.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/kernel/dyn-tick.c	2005-08-31 16:35:17.000000000 +0530
@@ -23,9 +23,6 @@
 #include <linux/pm.h>
 #include <linux/dyn-tick.h>
 #include <linux/rcupdate.h>
-#include <asm/io.h>
-
-#include "io_ports.h"
 
 #define DYN_TICK_VERSION	"050610-1"
 #define DYN_TICK_IS_SET(x)	((dyn_tick->state & (x)) == (x))
@@ -37,18 +34,16 @@ spinlock_t dyn_tick_lock;
 
 /*
  * Arch independent code needed to reprogram next timer interrupt.
- * Gets called from cpu_idle() before entering idle loop.
+ * Gets called, with IRQs disabled, from cpu_idle() before entering idle loop.
  */
-unsigned long dyn_tick_reprogram_timer(void)
+unsigned int dyn_tick_reprogram_timer(void)
 {
 	int cpu = smp_processor_id();
-	unsigned long delta, flags;
+	unsigned int delta;
 
 	if (!DYN_TICK_IS_SET(DYN_TICK_ENABLED))
 		return 0;
 
-	local_irq_save(flags);
-
 	if (rcu_pending(cpu) || local_softirq_pending())
 		return 0;
 
@@ -78,11 +73,15 @@ unsigned long dyn_tick_reprogram_timer(v
 
 	write_sequnlock(&xtime_lock);
 
-	local_irq_restore(flags);
-
 	return delta;
 }
 
+void set_dyn_tick_max_skip(unsigned int max_skip)
+{
+	if (!dyn_tick->max_skip || max_skip < dyn_tick->max_skip)
+		dyn_tick->max_skip = max_skip;
+}
+
 void __init dyn_tick_register(struct dyn_tick_timer *arch_timer)
 {
 	dyn_tick_cfg = arch_timer;
@@ -96,10 +95,9 @@ void __init dyn_tick_register(struct dyn
  * ---------------------------------------------------------------------------
  */
 static int __initdata dyntick_autoenable = 1;
-static int __initdata dyntick_useapic = 1;
 
 /*
- * dyntick=[disable],[noapic]
+ * dyntick=[disable]
  */ 
 static int __init dyntick_setup(char *options)
 {
@@ -109,9 +107,6 @@ static int __init dyntick_setup(char *op
 	if (!strncmp(options, "disable", 6))
 		dyntick_autoenable = 0;
 
-	if (strstr(options, "noapic"))
-		dyntick_useapic = 0;
-
 	return 0;
 }
 
@@ -129,13 +124,9 @@ static ssize_t show_dyn_tick_state(struc
 {
 	return sprintf(buf,
 		       "suitable:\t%i\n"
-		       "enabled:\t%i\n"
-		       "apic suitable:\t%i\n"
-		       "using APIC:\t%i\n",
+		       "enabled:\t%i\n",
 		       DYN_TICK_IS_SET(DYN_TICK_SUITABLE),
-		       DYN_TICK_IS_SET(DYN_TICK_ENABLED),
-		       DYN_TICK_IS_SET(DYN_TICK_APICABLE),
-		       DYN_TICK_IS_SET(DYN_TICK_USE_APIC));
+		       DYN_TICK_IS_SET(DYN_TICK_ENABLED));
 }
 
 static ssize_t show_dyn_tick_enable(struct sys_device *dev, char *buf)
@@ -165,35 +156,9 @@ static ssize_t set_dyn_tick_enable(struc
 	return count;
 }
 
-static ssize_t show_dyn_tick_useapic(struct sys_device *dev, char *buf)
-{
-	return sprintf(buf, "using APIC:\t%i\n",
-		       DYN_TICK_IS_SET(DYN_TICK_USE_APIC));
-}
-
-static ssize_t set_dyn_tick_useapic(struct sys_device *dev, const char *buf,
-				    size_t count)
-{
-	unsigned long flags;
-	unsigned int enable = simple_strtoul(buf, NULL, 2);
-
-	if (!DYN_TICK_IS_SET(DYN_TICK_APICABLE))
-		goto out;
-	write_seqlock_irqsave(&xtime_lock, flags);
-	if (enable)
-		dyn_tick->state |= DYN_TICK_USE_APIC;
-	else
-		dyn_tick->state &= ~DYN_TICK_USE_APIC;
-	write_sequnlock_irqrestore(&xtime_lock, flags);
-out:
-	return count;
-}
-
 static SYSDEV_ATTR(state, 0444, show_dyn_tick_state, NULL);
 static SYSDEV_ATTR(enable, 0644, show_dyn_tick_enable,
 		   set_dyn_tick_enable);
-static SYSDEV_ATTR(useapic, 0644, show_dyn_tick_useapic,
-		   set_dyn_tick_useapic);
 
 static struct sysdev_class dyn_tick_sysclass = {
 	set_kset_name("dyn_tick"),
@@ -213,9 +178,7 @@ static int init_dyn_tick_sysfs(void)
 		goto out;
 	if ((error = sysdev_create_file(&device_dyn_tick, &attr_state)))
 		goto out;
-	if ((error = sysdev_create_file(&device_dyn_tick, &attr_enable)))
-		goto out;
-	error = sysdev_create_file(&device_dyn_tick, &attr_useapic);
+	error = sysdev_create_file(&device_dyn_tick, &attr_enable);
 
 out:
 	return error;
@@ -229,14 +192,6 @@ device_initcall(init_dyn_tick_sysfs);
  * ---------------------------------------------------------------------------
  */
 
-static int __init dyn_tick_early_init(void)
-{
-	dyn_tick->state |= DYN_TICK_TIMER_INT;
-	return 0;
-}
-
-subsys_initcall(dyn_tick_early_init);
-
 /*
  * We need to initialize dynamic tick after calibrate delay
  */
@@ -250,11 +205,6 @@ static int __init dyn_tick_late_init(voi
 		return -ENODEV;
 	}
 
-	if (DYNTICK_APICABLE)
-		dyn_tick->state |= DYN_TICK_APICABLE;
-	if (!dyntick_useapic || !DYN_TICK_IS_SET(DYN_TICK_APICABLE))
-		dyn_tick->state &= ~DYN_TICK_USE_APIC;
-
 	if ((ret = dyn_tick_cfg->arch_init())) {
 		printk(KERN_ERR "dyn-tick: Init failed\n");
 		return -ENODEV;
diff -puN include/asm-i386/dyn-tick.h~cleanup include/asm-i386/dyn-tick.h
--- linux-2.6.13-rc6-mm2/include/asm-i386/dyn-tick.h~cleanup	2005-08-31 16:34:51.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/include/asm-i386/dyn-tick.h	2005-08-31 16:35:17.000000000 +0530
@@ -16,23 +16,16 @@
 #include <asm/apic.h>
 
 #ifdef CONFIG_NO_IDLE_HZ
-extern int dyn_tick_arch_init(void);
-extern void disable_pit_timer(void);
-extern void enable_pit_timer(void);
-extern void reprogram_pit_timer(int jiffies_to_skip);
-extern void set_dyn_tick_max_skip(unsigned int max_skip);
 extern void setup_dyn_tick_use_apic(unsigned int calibration_result);
 extern void dyn_tick_interrupt(int irq, struct pt_regs *regs);
 extern void dyn_tick_time_init(struct timer_opts *cur_timer);
-extern u32 apic_timer_val;
 
-#if defined(CONFIG_DYN_TICK_USE_APIC)
-#define DYNTICK_APICABLE	1
+#define DYN_TICK_APICABLE	(1 << 0)
 
 #if (defined(CONFIG_SMP) || defined(CONFIG_X86_UP_APIC))
 static inline int cpu_has_local_apic(void)
 {
-	return (dyn_tick->state & DYN_TICK_USE_APIC);
+	return (dyn_tick->arch_state & DYN_TICK_APICABLE);
 }
 
 #else	/* (defined(CONFIG_SMP) || defined(CONFIG_X86_UP_APIC)) */
@@ -42,37 +35,7 @@ static inline int cpu_has_local_apic(voi
 }
 #endif	/* (defined(CONFIG_SMP) || defined(CONFIG_X86_UP_APIC)) */
 
-#else	/*  defined(CONFIG_DYN_TICK_USE_APIC) */
-#define DYNTICK_APICABLE	0
-static inline int cpu_has_local_apic(void)
-{
-	return 0;
-}
-#endif	/*  defined(CONFIG_DYN_TICK_USE_APIC) */
-
-static inline void reprogram_apic_timer(unsigned int count)
-{
-#ifdef CONFIG_X86_LOCAL_APIC
-	unsigned long flags;
-
-	/* Fixme: Make count more accurate. Otherwise can lead
-	 * 	  to latencies of upto 1 jiffy in servicing timers.
-	 */
-	count *= apic_timer_val;
-	local_irq_save(flags);
-	apic_write_around(APIC_TMICT, count);
-	local_irq_restore(flags);
-#endif	/* CONFIG_X86_LOCAL_APIC */
-}
-
 #else /* CONFIG_NO_IDLE_HZ */
-static inline void set_dyn_tick_max_skip(unsigned int max_skip)
-{
-}
-
-static inline void reprogram_apic_timer(unsigned int count)
-{
-}
 
 static inline void setup_dyn_tick_use_apic(unsigned int calibration_result)
 {
diff -puN arch/i386/Kconfig~cleanup arch/i386/Kconfig
--- linux-2.6.13-rc6-mm2/arch/i386/Kconfig~cleanup	2005-08-31 16:34:51.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/Kconfig	2005-08-31 16:35:17.000000000 +0530
@@ -469,34 +469,14 @@ config NO_IDLE_HZ
 	  This option enables support for skipping timer ticks when the
 	  processor is idle. During system load, timer is continuous.
 	  This option saves power, as it allows the system to stay in
-	  idle mode longer. Currently supported timers are ACPI PM
-	  timer, local APIC timer, and TSC timer. HPET timer is currently
-	  not supported.
+	  idle mode longer. Currently supported timer is ACPI PM
+	  timer. TSC and HPET timers are currently not supported.
 
 	  Note that you can disable dynamic tick timer either by
 	  passing dyntick=disable command line option, or via sysfs:
 
 	  # echo 0 > /sys/devices/system/dyn_tick/dyn_tick0/enable
 
-config DYN_TICK_USE_APIC
-	bool "Use APIC timer instead of PIT timer"
-	depends on NO_IDLE_HZ
-	help
-	  This option enables using APIC timer interrupt if your hardware
-	  supports it. APIC timer allows longer sleep periods compared
-	  to PIT timer, however on MOST recent hardware disabling the PIT
-	  timer also disables APIC timer interrupts, and the system won't
-	  run properly. Symptoms include slow system boot, and time running 
-	  slow.
-
-	  If unsure, do NOT enable this option.
-
-	  Note that you can disable apic usage by dynamic tick timer
-	  either by passing dyntick=noapic command line option, or via 
-	  sysfs:
-
-	  # echo 0 > /sys/devices/system/dyn_tick/dyn_tick0/useapic
-
 config SMP
 	bool "Symmetric multi-processing support"
 	---help---
diff -puN include/asm-i386/timer.h~cleanup include/asm-i386/timer.h
--- linux-2.6.13-rc6-mm2/include/asm-i386/timer.h~cleanup	2005-08-31 16:34:51.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/include/asm-i386/timer.h	2005-08-31 16:35:17.000000000 +0530
@@ -38,6 +38,9 @@ struct init_timer_opts {
 extern struct timer_opts* __init select_timer(void);
 extern void clock_fallback(void);
 void setup_pit_timer(void);
+extern void disable_pit_timer(void);
+extern void enable_pit_timer(void);
+extern void reprogram_pit_timer(int jiffies_to_skip);
 
 /* Modifiers for buggy PIT handling */
 
diff -puN arch/i386/kernel/apic.c~cleanup arch/i386/kernel/apic.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/apic.c~cleanup	2005-08-31 16:34:51.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/apic.c	2005-08-31 16:35:17.000000000 +0530
@@ -928,7 +928,7 @@ void (*wait_timer_tick)(void) __devinitd
 
 #define APIC_DIVISOR 16
 
-u32 apic_timer_val;
+static u32 apic_timer_val;
 
 static void __setup_APIC_LVTT(unsigned int clocks)
 {
@@ -969,6 +969,20 @@ static void __devinit setup_APIC_timer(u
 	local_irq_restore(flags);
 }
 
+/* Used by NO_IDLE_HZ to skip ticks on idle CPUs */
+void reprogram_apic_timer(unsigned int count)
+{
+	unsigned long flags;
+
+	/* Fixme: Make count more accurate. Otherwise can lead
+	 * 	  to latencies of upto 1 jiffy in servicing timers.
+	 */
+	count *= apic_timer_val;
+	local_irq_save(flags);
+	apic_write_around(APIC_TMICT, count);
+	local_irq_restore(flags);
+}
+
 /*
  * In this function we calibrate APIC bus clocks to the external
  * timer. Unfortunately we cannot use jiffies and the timer irq
diff -puN include/asm-i386/apic.h~cleanup include/asm-i386/apic.h
--- linux-2.6.13-rc6-mm2/include/asm-i386/apic.h~cleanup	2005-08-31 16:34:52.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/include/asm-i386/apic.h	2005-08-31 16:35:17.000000000 +0530
@@ -121,6 +121,7 @@ extern void nmi_watchdog_tick (struct pt
 extern int APIC_init_uniprocessor (void);
 extern void disable_APIC_timer(void);
 extern void enable_APIC_timer(void);
+extern void reprogram_apic_timer(unsigned int count);
 
 extern void enable_NMI_through_LVT0 (void * dummy);
 
@@ -132,6 +133,7 @@ extern unsigned int nmi_watchdog;
 
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
+static inline void reprogram_apic_timer(unsigned int count) { }
 
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
diff -puN arch/i386/kernel/process.c~cleanup arch/i386/kernel/process.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/process.c~cleanup	2005-08-31 16:34:52.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/process.c	2005-08-31 16:35:17.000000000 +0530
@@ -202,7 +202,10 @@ void cpu_idle(void)
 			if (cpu_is_offline(cpu))
 				play_dead();
 
-			dyn_tick_reprogram_timer();
+			local_irq_disable();
+			if (!need_resched())
+				dyn_tick_reprogram_timer();
+			local_irq_enable();
 
 			__get_cpu_var(irq_stat).idle_timestamp = jiffies;
 			idle();
diff -puN arch/i386/kernel/io_apic.c~cleanup arch/i386/kernel/io_apic.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/io_apic.c~cleanup	2005-08-31 16:34:52.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/io_apic.c	2005-08-31 16:35:17.000000000 +0530
@@ -1148,7 +1148,11 @@ next:
 
 static struct hw_interrupt_type ioapic_level_type;
 static struct hw_interrupt_type ioapic_edge_type;
+#ifdef CONFIG_NO_IDLE_HZ
 static struct hw_interrupt_type ioapic_edge_type_irq0;
+#else
+#define ioapic_edge_type_irq0 ioapic_edge_type
+#endif
 
 #define IOAPIC_AUTO	-1
 #define IOAPIC_EDGE	0
@@ -2020,6 +2024,7 @@ static struct hw_interrupt_type ioapic_l
 #endif
 };
 
+#ifdef CONFIG_NO_IDLE_HZ
 /* Needed to disable PIT interrupts when all CPUs sleep */
 static struct hw_interrupt_type ioapic_edge_type_irq0 = {
 	.typename 	= "IO-APIC-edge-irq0",
@@ -2031,6 +2036,7 @@ static struct hw_interrupt_type ioapic_e
 	.end 		= end_edge_ioapic,
 	.set_affinity 	= set_ioapic_affinity,
 };
+#endif
 
 static inline void init_IO_APIC_traps(void)
 {

_

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* [PATCH 3/3] Updated dynamic tick patches - Recover walltime upon wakeup
  2005-08-31 16:58 Updated dynamic tick patches Srivatsa Vaddagiri
  2005-08-31 17:12 ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Srivatsa Vaddagiri
  2005-08-31 17:26 ` [PATCH 2/3] Updated dynamic tick patches - Cleanup Srivatsa Vaddagiri
@ 2005-08-31 17:27 ` Srivatsa Vaddagiri
  2005-09-01  5:23 ` Updated dynamic tick patches Con Kolivas
  3 siblings, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-08-31 17:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: arjan, s0348365, kernel, tytso, cfriesen, rlrevell, trenn, george,
	johnstul, akpm

On Wed, Aug 31, 2005 at 10:28:43PM +0530, Srivatsa Vaddagiri wrote:
> Following patches related to dynamic tick are posted in separate mails,
> for convenience of review. The first patch probably applies w/o dynamic
> tick consideration also.
> 
> Patch 3/3  -> Use lost tick information in dyn-tick time recovery 

This patch uses the lost tick information returned by the mark_offset()
function in dyn-tick to recover time.
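
The control flow this introduces can be sketched in user space as follows.
All names here (struct timer_ops, fake_mark_offset, run_tick_work,
timer_interrupt_sketch) are hypothetical stand-ins; the sketch only
mirrors the idea that mark_offset() now returns a count and the caller
runs the per-tick work when dyn-tick is off or when at least one tick has
actually elapsed.

```c
#include <assert.h>

/* Stand-in for the timer_opts hook: returns the number of elapsed ticks. */
struct timer_ops {
	int (*mark_offset)(void);
};

/* Test double: reports whatever fake_lost is set to. */
static int fake_lost;
static int fake_mark_offset(void)
{
	return fake_lost;
}

/* Stand-in for the per-tick work; counts how often it ran. */
static int tick_work_runs;
static void run_tick_work(void)
{
	tick_work_runs++;
}

/*
 * With dyn-tick enabled, an interrupt may arrive before a full tick has
 * elapsed; in that case the per-tick work is skipped.
 */
static void timer_interrupt_sketch(const struct timer_ops *t, int dyn_tick_on)
{
	int lost;

	assert(t && t->mark_offset);

	lost = t->mark_offset();
	if (!dyn_tick_on || lost)
		run_tick_work();
}
```

Timers that cannot report elapsed ticks can simply return 1, which keeps
the per-tick work running on every interrupt as before.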


---

 arch/i386/Kconfig                                                 |    0 
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/dyn-tick.c             |   11 ++++++--
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/time.c                 |   13 ++++++----
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_cyclone.c |    4 ++-
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_hpet.c    |    4 ++-
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_none.c    |    3 +-
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_pit.c     |    3 +-
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_pm.c      |    6 +++-
 linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_tsc.c     |   12 ++++++---
 linux-2.6.13-rc6-mm2-root/include/asm-i386/timer.h                |    2 -
 10 files changed, 40 insertions(+), 18 deletions(-)

diff -puN include/asm-i386/timer.h~drift_fix include/asm-i386/timer.h
--- linux-2.6.13-rc6-mm2/include/asm-i386/timer.h~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/include/asm-i386/timer.h	2005-08-31 16:36:30.000000000 +0530
@@ -19,7 +19,7 @@
  */
 struct timer_opts {
 	char* name;
-	void (*mark_offset)(void);
+	int (*mark_offset)(void);
 	unsigned long (*get_offset)(void);
 	unsigned long long (*monotonic_clock)(void);
 	void (*delay)(unsigned long);
diff -puN arch/i386/kernel/time.c~drift_fix arch/i386/kernel/time.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/time.c~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/time.c	2005-08-31 16:36:30.000000000 +0530
@@ -253,7 +253,7 @@ EXPORT_SYMBOL(profile_pc);
  * timer_interrupt() needs to keep up the real-time clock,
  * as well as call the "do_timer()" routine every clocktick
  */
-static inline void do_timer_interrupt(int irq, struct pt_regs *regs)
+static inline void do_timer_interrupt(int irq, struct pt_regs *regs, int lost)
 {
 #ifdef CONFIG_X86_IO_APIC
 	if (timer_ack) {
@@ -271,7 +271,8 @@ static inline void do_timer_interrupt(in
 	}
 #endif
 
-	do_timer_interrupt_hook(regs);
+	if (!dyn_tick_enabled() || lost)
+		do_timer_interrupt_hook(regs);
 
 
 	if (MCA_bus) {
@@ -296,6 +297,8 @@ static inline void do_timer_interrupt(in
  */
 irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
+	int lost;
+
 	/*
 	 * Here we are in the timer irq handler. We just have irqs locally
 	 * disabled but we don't know if the timer_bh is running on the other
@@ -305,9 +308,9 @@ irqreturn_t timer_interrupt(int irq, voi
 	 */
 	write_seqlock(&xtime_lock);
 
-	cur_timer->mark_offset();
- 
-	do_timer_interrupt(irq, regs);
+	lost = cur_timer->mark_offset();
+
+	do_timer_interrupt(irq, regs, lost);
 
 	write_sequnlock(&xtime_lock);
 	return IRQ_HANDLED;
diff -puN arch/i386/kernel/dyn-tick.c~drift_fix arch/i386/kernel/dyn-tick.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/dyn-tick.c~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/dyn-tick.c	2005-08-31 16:36:30.000000000 +0530
@@ -92,7 +92,13 @@ void dyn_tick_interrupt(int irq, struct 
 
 	if (all_were_sleeping) {
 		/* Recover jiffies */
-		cur_timer->mark_offset();
+		if (irq) {
+			int lost;
+
+			lost = cur_timer->mark_offset();
+			if (lost)
+				do_timer(regs);
+		}
 		if (cpu_has_local_apic())
 			enable_pit_timer();
 	}
@@ -116,8 +122,7 @@ void dyn_tick_time_init(struct timer_opt
 {
 	spin_lock_init(&dyn_tick_lock);
 
-	if (strncmp(cur_timer->name, "tsc", 3) == 0 ||
-	    strncmp(cur_timer->name, "pmtmr", 3) == 0) {
+	if (strncmp(cur_timer->name, "pmtmr", 3) == 0) {
 		dyn_tick->state |= DYN_TICK_SUITABLE;
 		printk(KERN_INFO "dyn-tick: Found suitable timer: %s\n",
 		       cur_timer->name);
diff -puN arch/i386/kernel/timers/timer_cyclone.c~drift_fix arch/i386/kernel/timers/timer_cyclone.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/timers/timer_cyclone.c~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_cyclone.c	2005-08-31 16:36:30.000000000 +0530
@@ -45,7 +45,7 @@ static seqlock_t monotonic_lock = SEQLOC
 	} while (high != cyclone_timer[1]);
 
 
-static void mark_offset_cyclone(void)
+static int mark_offset_cyclone(void)
 {
 	unsigned long lost, delay;
 	unsigned long delta = last_cyclone_low;
@@ -101,6 +101,8 @@ static void mark_offset_cyclone(void)
 	 */
 	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
 		jiffies_64++;
+
+	return 1;
 }
 
 static unsigned long get_offset_cyclone(void)
diff -puN arch/i386/kernel/timers/timer_hpet.c~drift_fix arch/i386/kernel/timers/timer_hpet.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/timers/timer_hpet.c~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_hpet.c	2005-08-31 16:36:30.000000000 +0530
@@ -96,7 +96,7 @@ static unsigned long get_offset_hpet(voi
 	return edx;
 }
 
-static void mark_offset_hpet(void)
+static int mark_offset_hpet(void)
 {
 	unsigned long long this_offset, last_offset;
 	unsigned long offset;
@@ -119,6 +119,8 @@ static void mark_offset_hpet(void)
 	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
 	monotonic_base += cycles_2_ns(this_offset - last_offset);
 	write_sequnlock(&monotonic_lock);
+
+	return 1;
 }
 
 static void delay_hpet(unsigned long loops)
diff -puN arch/i386/kernel/timers/timer_none.c~drift_fix arch/i386/kernel/timers/timer_none.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/timers/timer_none.c~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_none.c	2005-08-31 16:36:30.000000000 +0530
@@ -1,9 +1,10 @@
 #include <linux/init.h>
 #include <asm/timer.h>
 
-static void mark_offset_none(void)
+static int mark_offset_none(void)
 {
 	/* nothing needed */
+	return 1;
 }
 
 static unsigned long get_offset_none(void)
diff -puN arch/i386/kernel/timers/timer_pit.c~drift_fix arch/i386/kernel/timers/timer_pit.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/timers/timer_pit.c~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_pit.c	2005-08-31 16:36:30.000000000 +0530
@@ -32,9 +32,10 @@ static int __init init_pit(char* overrid
 	return 0;
 }
 
-static void mark_offset_pit(void)
+static int mark_offset_pit(void)
 {
 	/* nothing needed */
+	return 1;
 }
 
 static unsigned long long monotonic_clock_pit(void)
diff -puN arch/i386/kernel/timers/timer_pm.c~drift_fix arch/i386/kernel/timers/timer_pm.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/timers/timer_pm.c~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_pm.c	2005-08-31 16:36:30.000000000 +0530
@@ -135,7 +135,7 @@ pm_good:
 	setup_pit_timer();
 
 	init_cpu_khz();
-	set_dyn_tick_max_skip( (0xFFFFFF / (286 * 1000000)) * 1024 * HZ );
+	set_dyn_tick_max_skip(((0xFFFFFF / 1000000) * 286 * HZ) >> 10);
 	return 0;
 }
 
@@ -156,7 +156,7 @@ static inline u32 cyc2us(u32 cycles)
  * this gets called during each timer interrupt
  *   - Called while holding the writer xtime_lock
  */
-static void mark_offset_pmtmr(void)
+static int mark_offset_pmtmr(void)
 {
 	u32 lost, delta, deltaus, offset_now;
 
@@ -182,6 +182,8 @@ static void mark_offset_pmtmr(void)
 	/* compensate for lost ticks */
 	if (lost >= 2)
 		jiffies_64 += lost - 1;
+
+	return lost;
 }
 
 static int pmtmr_resume(void)
diff -puN arch/i386/kernel/timers/timer_tsc.c~drift_fix arch/i386/kernel/timers/timer_tsc.c
--- linux-2.6.13-rc6-mm2/arch/i386/kernel/timers/timer_tsc.c~drift_fix	2005-08-31 16:36:17.000000000 +0530
+++ linux-2.6.13-rc6-mm2-root/arch/i386/kernel/timers/timer_tsc.c	2005-08-31 16:36:30.000000000 +0530
@@ -177,7 +177,7 @@ static inline void update_monotonic_base
 }
 
 #ifdef CONFIG_HPET_TIMER
-static void mark_offset_tsc_hpet(void)
+static int mark_offset_tsc_hpet(void)
 {
 	unsigned long long last_offset;
  	unsigned long offset, temp, hpet_current;
@@ -221,6 +221,8 @@ static void mark_offset_tsc_hpet(void)
 	delay_at_last_interrupt = hpet_current - offset;
 	ASM_MUL64_REG(temp, delay_at_last_interrupt,
 			hpet_usec_quotient, delay_at_last_interrupt);
+
+	return 1;
 }
 #endif
 
@@ -347,7 +349,7 @@ int recalibrate_cpu_khz(void)
 }
 EXPORT_SYMBOL(recalibrate_cpu_khz);
 
-static void mark_offset_tsc(void)
+static int mark_offset_tsc(void)
 {
 	unsigned long lost,delay;
 	unsigned long delta = last_tsc_low;
@@ -456,8 +458,12 @@ static void mark_offset_tsc(void)
 	 * between tsc and pit reads (as noted when
 	 * usec delta is > 90% # of usecs/tick)
 	 */
-	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
+	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ)) {
 		jiffies_64++;
+		lost++;
+	}
+
+	return 1;
 }
 
 static int __init init_tsc(char* override)
diff -puN arch/i386/Kconfig~drift_fix arch/i386/Kconfig

_



-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-08-31 17:12 ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Srivatsa Vaddagiri
@ 2005-08-31 22:36   ` Zachary Amsden
  2005-08-31 22:47     ` john stultz
  2005-09-02 15:43   ` [PATCH 1/3] dynticks - implement no idle hz for x86 Con Kolivas
  2005-09-03  4:05   ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Lee Revell
  2 siblings, 1 reply; 96+ messages in thread
From: Zachary Amsden @ 2005-08-31 22:36 UTC (permalink / raw)
  To: vatsa
  Cc: linux-kernel, arjan, s0348365, kernel, tytso, cfriesen, rlrevell,
	trenn, george, johnstul, akpm, Tim Mann

Srivatsa Vaddagiri wrote:

>On Wed, Aug 31, 2005 at 10:28:43PM +0530, Srivatsa Vaddagiri wrote:
>  
>
>>Following patches related to dynamic tick are posted in separate mails,
>>for convenience of review. The first patch probably applies w/o dynamic
>>tick consideration also.
>>
>>Patch 1/3  -> Fixup lost tick calculation in timer_pm.c
>>    
>>
>
>Currently, the lost tick calculation in timer_pm.c is based on the
>number of microseconds that have elapsed since the last tick. The
>number of microseconds is approximated by cyc2us, which basically does:
>
>	microsec = (cycles * 286) / 1024
>
>Consider 10 lost ticks. This amounts to 14319*10 = 143190 cycles
>(14319 = PMTMR_EXPECTED_RATE/(CALIBRATE_LATCH/LATCH)).
>By the above equation this is 39992 microseconds,
>or 39992 / 4000 = 9 lost ticks, which is incorrect.
>
>I feel lost ticks can be computed from the cycle difference directly
>rather than from the microseconds that have elapsed.
>
>The following patch is in that direction.
>
>With this patch, time kept up really well overnight on one particular
>machine (Intel 4-way Pentium 3 box), while on another, newer machine
>(Intel 4-way Xeon with HT) it didn't do so well (time sped up after
>3 or 4 hours). Hence I consider that this particular patch needs more
>review/work.
>
>  
>

Does this patch help address the issues pointed out here?

http://bugzilla.kernel.org/show_bug.cgi?id=5127


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-08-31 22:36   ` Zachary Amsden
@ 2005-08-31 22:47     ` john stultz
  0 siblings, 0 replies; 96+ messages in thread
From: john stultz @ 2005-08-31 22:47 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: vatsa, linux-kernel, arjan, s0348365, kernel, tytso, cfriesen,
	rlrevell, trenn, george, akpm, Tim Mann

On Wed, 2005-08-31 at 15:36 -0700, Zachary Amsden wrote:
> >I feel lost ticks can be computed from the cycle difference directly
> >rather than from the microseconds that have elapsed.
> >
> >The following patch is in that direction.
> >
> >With this patch, time kept up really well overnight on one particular
> >machine (Intel 4-way Pentium 3 box), while on another, newer machine
> >(Intel 4-way Xeon with HT) it didn't do so well (time sped up after
> >3 or 4 hours). Hence I consider that this particular patch needs more
> >review/work.
> >
> >  
> >
> 
> Does this patch help address the issues pointed out here?
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=5127

Unfortunately no. The issue there is that once the lost tick
compensation code has fired, should those "lost" ticks appear later we
end up over-compensating.

This patch does, however, help to make sure that when the lost tick code
fires, the error from converting to usecs doesn't bite us. It could
probably go into mainline independently of the dynamic ticks patch (with
further testing, of course).

thanks
-john



* Re: Updated dynamic tick patches
  2005-08-31 16:58 Updated dynamic tick patches Srivatsa Vaddagiri
                   ` (2 preceding siblings ...)
  2005-08-31 17:27 ` [PATCH 3/3] Updated dynamic tick patches - Recover walltime upon wakeup Srivatsa Vaddagiri
@ 2005-09-01  5:23 ` Con Kolivas
  2005-09-01 13:07   ` Tony Lindgren
  3 siblings, 1 reply; 96+ messages in thread
From: Con Kolivas @ 2005-09-01  5:23 UTC (permalink / raw)
  To: vatsa
  Cc: linux-kernel, arjan, s0348365, tytso, cfriesen, rlrevell, trenn,
	george, johnstul, akpm

On Thu, 1 Sep 2005 02:58 am, Srivatsa Vaddagiri wrote:
> Following patches related to dynamic tick are posted in separate mails,
> for convenience of review. The first patch probably applies w/o dynamic
> tick consideration also.
>
> Patch 1/3  -> Fixup lost tick calculation in timer_pm.c
> Patch 2/3  -> Dyn-tick cleanups
> Patch 3/3  -> Use lost tick information in dyn-tick time recovery
>
> These patches are against 2.6.13-rc6-mm2.
>
> Con, would be great if you can upload a consolidated new version of
> dyn-tick patch on your website!

Great, thanks. I'll wait till 2.6.13-mm1 is out since that's due shortly and 
I'll resync everything with that and perhaps tweak along the way.

Cheers,
Con


* Re: Updated dynamic tick patches
  2005-09-01  5:23 ` Updated dynamic tick patches Con Kolivas
@ 2005-09-01 13:07   ` Tony Lindgren
  2005-09-01 13:19     ` David Weinehall
                       ` (2 more replies)
  0 siblings, 3 replies; 96+ messages in thread
From: Tony Lindgren @ 2005-09-01 13:07 UTC (permalink / raw)
  To: Con Kolivas
  Cc: vatsa, linux-kernel, arjan, s0348365, tytso, cfriesen, rlrevell,
	trenn, george, johnstul, akpm

[-- Attachment #1: Type: text/plain, Size: 1753 bytes --]

* Con Kolivas <kernel@kolivas.org> [050901 08:22]:
> On Thu, 1 Sep 2005 02:58 am, Srivatsa Vaddagiri wrote:
> > Following patches related to dynamic tick are posted in separate mails,
> > for convenience of review. The first patch probably applies w/o dynamic
> > tick consideration also.
> >
> > Patch 1/3  -> Fixup lost tick calculation in timer_pm.c
> > Patch 2/3  -> Dyn-tick cleanups
> > Patch 3/3  -> Use lost tick information in dyn-tick time recovery
> >
> > These patches are against 2.6.13-rc6-mm2.
> >
> > Con, would be great if you can upload a consolidated new version of
> > dyn-tick patch on your website!
> 
> Great, thanks. I'll wait till 2.6.13-mm1 is out since that's due shortly and 
> I'll resync everything with that and perhaps tweak along the way.

I tried this quickly on a loaner ThinkPad T30, and needed the following
patch to compile. The patch does work with PIT, but with lapic the
system does not wake to timer interrupts :(

I also hacked together a little timer test utility that should run through
with no errors on a completely idle system. I also posted it to:

http://www.muru.com/linux/dyntick/tools/dyntick-test.c

Srivatsa, could you try the dyntick-test.c on your system after booting
to init=/bin/sh to make the system as idle as possible?

Unfortunately I cannot debug the APIC issue right now, but I seem to
have an issue on ARM OMAP where the timer test occasionally fails on
some longer values, for example 3 second sleep can take 4 seconds.

I don't know yet if this is the problem George Anzinger mentioned with
next_timer_interrupt(), or if this is OMAP specific. But it only seems
to occur with very low idle HZ values. This may be related to the slow
boot time issue I mentioned yesterday.

Regards,

Tony

[-- Attachment #2: patch-fix-up-compile --]
[-- Type: text/plain, Size: 257 bytes --]

--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -2034,7 +2034,9 @@
 	.disable 	= mask_IO_APIC_irq,
 	.ack 		= ack_edge_ioapic,
 	.end 		= end_edge_ioapic,
+#ifdef CONFIG_SMP
 	.set_affinity 	= set_ioapic_affinity,
+#endif
 };
 #endif
 

[-- Attachment #3: dyntick-test.c --]
[-- Type: text/x-csrc, Size: 2927 bytes --]

/*
 * Tests timers to make sure dynamic tick works properly
 */

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

#define MAX_SLEEP	(10)		/* seconds */
#define MAX_LATENCY	(100 * 1000)	/* usecs */

int test_sleep(unsigned int msec_len)
{
	sleep(msec_len / 1000);
	return 0;
}

int test_select(unsigned int msec_len)
{
	struct timeval tv_sel;

	tv_sel.tv_sec = msec_len / 1000;
	tv_sel.tv_usec = (msec_len % 1000) * 1000;

	return select(0, NULL, NULL, NULL, &tv_sel);
}

int test_usleep(unsigned int msec_len)
{
	/* usleep() takes microseconds; return its status like test_select() */
	return usleep(msec_len * 1000);
}

/* Modified from a GNU example so that it does _not_ clobber y */
int timeval_subtract(struct timeval *result,
		     const struct timeval *x,
		     const struct timeval *y)
{
	struct timeval tmp;

	tmp.tv_sec = y->tv_sec;
	tmp.tv_usec = y->tv_usec;

	/* Perform the carry for the later subtraction */
	if (x->tv_usec < y->tv_usec) {
		int nsec = (y->tv_usec - x->tv_usec) / 1000000 + 1;
		tmp.tv_usec -= 1000000 * nsec;
		tmp.tv_sec += nsec;
	}
	if (x->tv_usec - y->tv_usec > 1000000) {
		int nsec = (x->tv_usec - y->tv_usec) / 1000000;
		tmp.tv_usec += 1000000 * nsec;
		tmp.tv_sec -= nsec;
	}
     
	/* Compute the time remaining to wait.
	   tv_usec is certainly positive. */
	result->tv_sec = x->tv_sec - tmp.tv_sec;
	result->tv_usec = x->tv_usec - tmp.tv_usec;
     
	/* Return 1 if result is negative. */
	return x->tv_sec < tmp.tv_sec;
}

int do_test(char * name, int (* test)(unsigned int len),
	    unsigned int len, int count)
{
	int i, ret;
	struct timeval tv_in;
	struct timeval tv_beg;
	struct timeval tv_end;
	struct timeval tv_len;
	struct timeval tv_lat;
	struct timezone tz;
	char * status = "OK";
	char * latency_type = "";

	tv_in.tv_sec = len / 1000;
	tv_in.tv_usec = (len % 1000) * 1000;

	gettimeofday(&tv_beg, &tz);
	for (i = 0; i < count; i++) {
		ret = test(len);
	}
	gettimeofday(&tv_end, &tz);

	ret = timeval_subtract(&tv_len, &tv_end, &tv_beg);
	if (ret)
		status = "ERROR";

	ret = timeval_subtract(&tv_lat, &tv_len, &tv_in);
	if (ret) {
		latency_type = "-";
		timeval_subtract(&tv_lat, &tv_in, &tv_len);
	}

	if (tv_lat.tv_sec > 0 || tv_lat.tv_usec > MAX_LATENCY)
		status = "ERROR";

	printf("  Test: %6s %4ums time: %2u.%06us "
	       "latency: %1s%u.%06us status: %s\n",
	       name, 
	       (len * count),
	       (unsigned int)tv_len.tv_sec,
	       (unsigned int)tv_len.tv_usec,
	       latency_type,
	       (unsigned int)tv_lat.tv_sec,
	       (unsigned int)tv_lat.tv_usec,
	       status);

	return ret;
}

int main(void)
{
	unsigned int i;
	int max_secs = MAX_SLEEP;

	printf("Testing sub-second select and usleep\n");
	for (i = 0; i < 1000; i += 100) {
		do_test("select", test_select, i, 1);
		do_test("usleep", test_usleep, i, 1);
	}

	printf("Testing multi-second select and sleep\n");
	for (i = 0; i < max_secs; i++) {
		do_test("select", test_select, i * 1000, 1);
		do_test("sleep", test_sleep, i * 1000, 1);
	}

	return 0;
}


* Re: Updated dynamic tick patches
  2005-09-01 13:07   ` Tony Lindgren
@ 2005-09-01 13:19     ` David Weinehall
  2005-09-01 13:46       ` Tony Lindgren
  2005-09-01 14:11     ` Srivatsa Vaddagiri
  2005-09-02 17:34     ` Srivatsa Vaddagiri
  2 siblings, 1 reply; 96+ messages in thread
From: David Weinehall @ 2005-09-01 13:19 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Con Kolivas, vatsa, linux-kernel, arjan, s0348365, tytso,
	cfriesen, rlrevell, trenn, george, johnstul, akpm

On Thu, Sep 01, 2005 at 04:07:22PM +0300, Tony Lindgren wrote:
[snip]
> I tried this quickly on a loaner ThinkPad T30, and needed the following
> patch to compile. The patch does work with PIT, but with lapic the
> system does not wake to timer interrupts :(

That may be a thinkpad issue; I have to boot my Thinkpad with nolapic.

[snip]


Regards: David
-- 
 /) David Weinehall <tao@acc.umu.se> /) Northern lights wander      (\
//  Maintainer of the v2.0 kernel   //  Dance across the winter sky //
\)  http://www.acc.umu.se/~tao/    (/   Full colour fire           (/


* Re: Updated dynamic tick patches
  2005-09-01 13:19     ` David Weinehall
@ 2005-09-01 13:46       ` Tony Lindgren
  0 siblings, 0 replies; 96+ messages in thread
From: Tony Lindgren @ 2005-09-01 13:46 UTC (permalink / raw)
  To: Con Kolivas, vatsa, linux-kernel, arjan, s0348365, tytso,
	cfriesen, rlrevell, trenn, george, johnstul, akpm

* David Weinehall <tao@acc.umu.se> [050901 16:19]:
> On Thu, Sep 01, 2005 at 04:07:22PM +0300, Tony Lindgren wrote:
> [snip]
> > I tried this quickly on a loaner ThinkPad T30, and needed the following
> > patch to compile. The patch does work with PIT, but with lapic the
> > system does not wake to timer interrupts :(
> 
> That may be a thinkpad issue; I have to boot my Thinkpad with nolapic.

Yeah, that could be. Or it could be the same old "P4 does not wake to
APIC interrupt while P3 does" bug :)

Tony


* Re: Updated dynamic tick patches
  2005-09-01 13:07   ` Tony Lindgren
  2005-09-01 13:19     ` David Weinehall
@ 2005-09-01 14:11     ` Srivatsa Vaddagiri
  2005-09-02 17:34     ` Srivatsa Vaddagiri
  2 siblings, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-01 14:11 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Con Kolivas, linux-kernel, arjan, s0348365, tytso, cfriesen,
	rlrevell, trenn, george, johnstul, akpm

On Thu, Sep 01, 2005 at 04:07:22PM +0300, Tony Lindgren wrote:
> I tried this quickly on a loaner ThinkPad T30, and needed the following
> patch to compile. The patch does work with PIT, but with lapic the
> system does not wake to timer interrupts :(

I have also found that enabling lapic breaks it on my T30! I think
that is a T30 issue, as I don't see any other reason why it should not
work (note that I have tested it on some other SMP P4 servers, where
it works well).

> I also hacked together a little timer test utility that should go through
> on a completely idle system with no errors. Also posted it to:
> 
> http://www.muru.com/linux/dyntick/tools/dyntick-test.c
> 
> Srivatsa, could you try the dyntick-test.c on your system after booting
> to init=/bin/sh to make the system as idle as possible?

Thanks for this test. I will run it and let you know how it goes on x86.
At the moment I am trying to track down the lost-tick calculation problem
with the ACPI PM timer.

> Unfortunately I cannot debug the APIC issue right now, but I seem to
> have an issue on ARM OMAP where the timer test occasionally fails on
> some longer values, for example 3 second sleep can take 4 seconds.
> 
> I don't know yet if this is the problem George Anzinger mentioned with
> next_timer_interrupt(), or if this is OMAP specific. But it only seems

Will let you know if I see it on x86 too.



-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-08-31 17:12 ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Srivatsa Vaddagiri
  2005-08-31 22:36   ` Zachary Amsden
@ 2005-09-02 15:43   ` Con Kolivas
  2005-09-02 15:45     ` [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c Con Kolivas
  2005-09-02 16:56     ` [PATCH 1/3] dynticks - implement no idle hz for x86 Russell King
  2005-09-03  4:05   ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Lee Revell
  2 siblings, 2 replies; 96+ messages in thread
From: Con Kolivas @ 2005-09-02 15:43 UTC (permalink / raw)
  To: vatsa; +Cc: linux-kernel, akpm, ck list

[-- Attachment #1: Type: text/plain, Size: 415 bytes --]

OK, I've resynced all the patches with 2.6.13-mm1 and made some cleanups and
minor modifications. As the PM timer is the only timer supported for
dynticks, I've also made the config option depend on it.

A rollup patch against 2.6.13-mm1 is here:

http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch

The older patches and the split-out patches posted here are also available
in the dyn-ticks directory.

Cheers,
Con
---



[-- Attachment #2: dyntick.patch --]
[-- Type: text/x-diff, Size: 34898 bytes --]

This patch implements the no idle hz feature, aka dynamic ticks, for the
i386 architecture. Work originally by Tony Lindgren <tony@atomide.com>
and Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>.

Modified for better SMP scalability and accurate time by Srivatsa Vaddagiri
<vatsa@in.ibm.com>, with minor cleanups and modifications by Con Kolivas.

Signed-off-By: Con Kolivas <kernel@kolivas.org>

Index: linux-2.6.13-mm1/arch/i386/defconfig
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/defconfig	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/defconfig	2005-09-02 23:57:33.000000000 +1000
@@ -91,6 +91,7 @@ CONFIG_X86_INTEL_USERCOPY=y
 CONFIG_X86_USE_PPRO_CHECKSUM=y
 # CONFIG_HPET_TIMER is not set
 # CONFIG_HPET_EMULATE_RTC is not set
+# CONFIG_NO_IDLE_HZ is not set
 CONFIG_SMP=y
 CONFIG_NR_CPUS=8
 CONFIG_SCHED_SMT=y
Index: linux-2.6.13-mm1/arch/i386/Kconfig
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/Kconfig	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/Kconfig	2005-09-02 23:57:33.000000000 +1000
@@ -462,6 +462,21 @@ config HPET_EMULATE_RTC
 	depends on HPET_TIMER && RTC=y
 	default y
 
+config NO_IDLE_HZ
+	bool "Dynamic Tick Timer - Skip timer ticks during idle"
+	depends on EXPERIMENTAL && X86_PM_TIMER
+	help
+	  This option enables support for skipping timer ticks when the
+	  processor is idle. During system load, timer is continuous.
+	  This option saves power, as it allows the system to stay in
+	  idle mode longer. Currently supported timer is ACPI PM
+	  timer. TSC and HPET timers are currently not supported.
+
+	  Note that you can disable dynamic tick timer either by
+	  passing dyntick=disable command line option, or via sysfs:
+
+	  # echo 0 > /sys/devices/system/dyn_tick/dyn_tick0/enable
+
 config SMP
 	bool "Symmetric multi-processing support"
 	---help---
Index: linux-2.6.13-mm1/arch/i386/kernel/apic.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/apic.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/apic.c	2005-09-02 23:57:33.000000000 +1000
@@ -27,6 +27,7 @@
 #include <linux/kernel_stat.h>
 #include <linux/sysdev.h>
 #include <linux/cpu.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/atomic.h>
 #include <asm/smp.h>
@@ -927,6 +928,8 @@ void (*wait_timer_tick)(void) __devinitd
 
 #define APIC_DIVISOR 16
 
+static u32 apic_timer_val;
+
 static void __setup_APIC_LVTT(unsigned int clocks)
 {
 	unsigned int lvtt_value, tmp_value, ver;
@@ -945,7 +948,9 @@ static void __setup_APIC_LVTT(unsigned i
 				& ~(APIC_TDR_DIV_1 | APIC_TDR_DIV_TMBASE))
 				| APIC_TDR_DIV_16);
 
-	apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
+	apic_timer_val = clocks / APIC_DIVISOR;
+
+	apic_write_around(APIC_TMICT, apic_timer_val);
 }
 
 static void __devinit setup_APIC_timer(unsigned int clocks)
@@ -964,6 +969,21 @@ static void __devinit setup_APIC_timer(u
 	local_irq_restore(flags);
 }
 
+/* Used by NO_IDLE_HZ to skip ticks on idle CPUs */
+void reprogram_apic_timer(unsigned int count)
+{
+	unsigned long flags;
+
+	/*
+	 * FIXME: Make count more accurate. Otherwise can lead
+	 * 	  to latencies of upto 1 jiffy in servicing timers.
+	 */
+	count *= apic_timer_val;
+	local_irq_save(flags);
+	apic_write_around(APIC_TMICT, count);
+	local_irq_restore(flags);
+}
+
 /*
  * In this function we calibrate APIC bus clocks to the external
  * timer. Unfortunately we cannot use jiffies and the timer irq
@@ -1058,6 +1078,10 @@ void __init setup_boot_APIC_clock(void)
 	 */
 	setup_APIC_timer(calibration_result);
 
+	setup_dyn_tick_use_apic(calibration_result);
+	set_dyn_tick_max_skip((0xFFFFFFFF / calibration_result) *
+		APIC_DIVISOR);
+
 	local_irq_enable();
 }
 
@@ -1196,6 +1220,9 @@ fastcall void smp_apic_timer_interrupt(s
 	 * interrupt lock, which is the WrongThing (tm) to do.
 	 */
 	irq_enter();
+
+	dyn_tick_interrupt(LOCAL_TIMER_VECTOR, regs);
+
 	smp_local_timer_interrupt(regs);
 	irq_exit();
 }
@@ -1208,6 +1235,9 @@ fastcall void smp_spurious_interrupt(str
 	unsigned long v;
 
 	irq_enter();
+
+	dyn_tick_interrupt(SPURIOUS_APIC_VECTOR, regs);
+
 	/*
 	 * Check if this really is a spurious interrupt and ACK it
 	 * if it is a vectored one.  Just in case...
@@ -1232,6 +1262,9 @@ fastcall void smp_error_interrupt(struct
 	unsigned long v, v1;
 
 	irq_enter();
+
+	dyn_tick_interrupt(ERROR_APIC_VECTOR, regs);
+
 	/* First tickle the hardware, only then report what went on. -- REW */
 	v = apic_read(APIC_ESR);
 	apic_write(APIC_ESR, 0);
Index: linux-2.6.13-mm1/arch/i386/kernel/dyn-tick.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/dyn-tick.c	2005-01-12 16:19:45.000000000 +1100
+++ linux-2.6.13-mm1/arch/i386/kernel/dyn-tick.c	2005-09-03 01:11:54.000000000 +1000
@@ -0,0 +1,132 @@
+/*
+ * linux/arch/i386/kernel/dyn-tick.c
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgen <tony@atomide.com> and
+ * Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/version.h>
+#include <linux/config.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/dyn-tick.h>
+#include <linux/timer.h>
+#include <linux/irq.h>
+#include <asm/apic.h>
+
+static void arch_reprogram_timer(unsigned long jif_next)
+{
+	unsigned int skip = jif_next - jiffies;
+
+	if (cpu_has_local_apic())
+		reprogram_apic_timer(skip);
+	else
+		reprogram_pit_timer(skip);
+
+	/* Fixme: Disable NMI Watchdog */
+}
+
+static void arch_all_cpus_idle(int how_long)
+{
+	if (cpu_has_local_apic())
+		disable_pit_timer();
+}
+
+static struct dyn_tick_timer arch_dyn_tick_timer = {
+	.arch_reprogram_timer	= &arch_reprogram_timer,
+	.arch_all_cpus_idle	= &arch_all_cpus_idle,
+};
+
+int __init dyn_tick_arch_init(void)
+{
+	if (!cpu_has_local_apic())
+		set_dyn_tick_max_skip(0xffff / LATCH);	/* PIT timer length */
+
+	printk(KERN_INFO "dyn-tick: Maximum ticks to skip limited to %i\n",
+		dyn_tick->max_skip);
+
+	return 0;
+}
+
+static int __init dyn_tick_init(void)
+{
+	arch_dyn_tick_timer.arch_init = dyn_tick_arch_init;
+	dyn_tick_register(&arch_dyn_tick_timer);
+
+	return 0;
+}
+
+arch_initcall(dyn_tick_init);
+
+/* Functions that need blank prototypes for !CONFIG_NO_IDLE_HZ below here */
+void setup_dyn_tick_use_apic(unsigned int calibration_result)
+{
+	if (calibration_result)
+		dyn_tick->arch_state |= DYN_TICK_APICABLE;
+	else {
+		dyn_tick->arch_state &= ~DYN_TICK_APICABLE;
+		printk(KERN_INFO "dyn-tick: Cannot use local APIC\n");
+	}
+}
+
+void dyn_tick_interrupt(int irq, struct pt_regs *regs)
+{
+	int all_were_sleeping = 0;
+	int cpu = smp_processor_id();
+
+	if (!cpu_isset(cpu, nohz_cpu_mask))
+		return;
+
+	spin_lock(&dyn_tick_lock);
+
+	if (cpus_equal(nohz_cpu_mask, cpu_online_map))
+		all_were_sleeping = 1;
+	cpu_clear(cpu, nohz_cpu_mask);
+
+	if (all_were_sleeping) {
+		/* Recover jiffies */
+		cur_timer->mark_offset();
+		if (cpu_has_local_apic())
+			enable_pit_timer();
+	}
+
+	spin_unlock(&dyn_tick_lock);
+
+	if (cpu_has_local_apic())
+		/* Fixme: Needs to be more accurate */
+		reprogram_apic_timer(1);
+	else
+		reprogram_pit_timer(1);
+
+	conditional_run_local_timers();
+
+	/* Fixme: Enable NMI watchdog */
+}
+
+
+void dyn_tick_time_init(struct timer_opts *cur_timer)
+{
+	spin_lock_init(&dyn_tick_lock);
+
+	if (strncmp(cur_timer->name, "pmtmr", 3) == 0) {
+		dyn_tick->state |= DYN_TICK_SUITABLE;
+		printk(KERN_INFO "dyn-tick: Found suitable timer: %s\n",
+		       cur_timer->name);
+	} else
+		printk(KERN_ERR "dyn-tick: Cannot use timer %s\n",
+		       cur_timer->name);
+}
+
+void idle_reprogram_timer(void)
+{
+	local_irq_disable();
+	if (!need_resched())
+		dyn_tick_reprogram_timer();
+	local_irq_enable();
+}
Index: linux-2.6.13-mm1/arch/i386/kernel/io_apic.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/io_apic.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/io_apic.c	2005-09-03 00:54:58.000000000 +1000
@@ -33,6 +33,7 @@
 #include <linux/acpi.h>
 #include <linux/module.h>
 #include <linux/sysdev.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/io.h>
 #include <asm/smp.h>
@@ -1159,15 +1160,19 @@ static inline void ioapic_register_intr(
 		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
 				trigger == IOAPIC_LEVEL)
 			irq_desc[vector].handler = &ioapic_level_type;
-		else
+		else if (vector)
 			irq_desc[vector].handler = &ioapic_edge_type;
+		else
+			irq_desc[vector].handler = IOAPIC_EDGE_TYPE_IRQ0;
 		set_intr_gate(vector, interrupt[vector]);
 	} else	{
 		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
 				trigger == IOAPIC_LEVEL)
 			irq_desc[irq].handler = &ioapic_level_type;
-		else
+		else if (irq)
 			irq_desc[irq].handler = &ioapic_edge_type;
+		else
+			irq_desc[irq].handler = IOAPIC_EDGE_TYPE_IRQ0;
 		set_intr_gate(vector, interrupt[irq]);
 	}
 }
@@ -1280,7 +1285,7 @@ static void __init setup_ExtINT_IRQ0_pin
 	 * The timer IRQ doesn't have to know that behind the
 	 * scene we have a 8259A-master in AEOI mode ...
 	 */
-	irq_desc[0].handler = &ioapic_edge_type;
+	irq_desc[0].handler = IOAPIC_EDGE_TYPE_IRQ0;
 
 	/*
 	 * Add it to the IO-APIC irq-routing table:
@@ -2015,6 +2020,20 @@ static struct hw_interrupt_type ioapic_l
 #endif
 };
 
+/* Needed to disable PIT interrupts when all CPUs sleep */
+struct hw_interrupt_type ioapic_edge_type_irq0 = {
+	.typename 	= "IO-APIC-edge-irq0",
+	.startup 	= startup_edge_ioapic,
+	.shutdown 	= shutdown_edge_ioapic,
+	.enable 	= unmask_IO_APIC_irq,
+	.disable 	= mask_IO_APIC_irq,
+	.ack 		= ack_edge_ioapic,
+	.end 		= end_edge_ioapic,
+#ifdef CONFIG_SMP
+	.set_affinity 	= set_ioapic_affinity,
+#endif
+};
+
 static inline void init_IO_APIC_traps(void)
 {
 	int irq;
Index: linux-2.6.13-mm1/arch/i386/kernel/irq.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/irq.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/irq.c	2005-09-02 23:57:33.000000000 +1000
@@ -18,6 +18,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/delay.h>
+#include <linux/dyn-tick.h>
 
 DEFINE_PER_CPU(irq_cpustat_t, irq_stat) ____cacheline_maxaligned_in_smp;
 EXPORT_PER_CPU_SYMBOL(irq_stat);
@@ -79,6 +80,8 @@ fastcall unsigned int do_IRQ(struct pt_r
 	}
 #endif
 
+	dyn_tick_interrupt(irq, regs);
+
 #ifdef CONFIG_4KSTACKS
 
 	curctx = (union irq_ctx *) current_thread_info();
Index: linux-2.6.13-mm1/arch/i386/kernel/Makefile
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/Makefile	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/Makefile	2005-09-02 23:57:33.000000000 +1000
@@ -33,6 +33,7 @@ obj-$(CONFIG_MODULES)		+= module.o
 obj-y				+= sysenter.o vsyscall.o
 obj-$(CONFIG_ACPI_SRAT) 	+= srat.o
 obj-$(CONFIG_HPET_TIMER) 	+= time_hpet.o
+obj-$(CONFIG_NO_IDLE_HZ) 	+= dyn-tick.o
 obj-$(CONFIG_EFI) 		+= efi.o efi_stub.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
 
Index: linux-2.6.13-mm1/arch/i386/kernel/process.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/process.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/process.c	2005-09-02 23:57:33.000000000 +1000
@@ -40,6 +40,7 @@
 #include <linux/ptrace.h>
 #include <linux/random.h>
 #include <linux/kprobes.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -201,6 +202,8 @@ void cpu_idle(void)
 			if (cpu_is_offline(cpu))
 				play_dead();
 
+			idle_reprogram_timer();
+
 			__get_cpu_var(irq_stat).idle_timestamp = jiffies;
 			idle();
 		}
Index: linux-2.6.13-mm1/arch/i386/kernel/smp.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/smp.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/smp.c	2005-09-02 23:57:33.000000000 +1000
@@ -21,6 +21,7 @@
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
 #include <linux/module.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/nmi.h>
 #include <asm/mtrr.h>
@@ -315,6 +316,8 @@ fastcall void smp_invalidate_interrupt(s
 {
 	unsigned long cpu;
 
+	dyn_tick_interrupt(INVALIDATE_TLB_VECTOR, regs);
+
 	cpu = get_cpu();
 
 	if (!cpu_isset(cpu, flush_cpumask))
@@ -695,6 +698,8 @@ void smp_send_stop(void)
 fastcall void smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
+
+	dyn_tick_interrupt(RESCHEDULE_VECTOR, regs);
 }
 
 fastcall void smp_call_function_interrupt(struct pt_regs *regs)
@@ -704,6 +709,9 @@ fastcall void smp_call_function_interrup
 	int wait = call_data->wait;
 
 	ack_APIC_irq();
+
+	dyn_tick_interrupt(CALL_FUNCTION_VECTOR, regs);
+
 	/*
 	 * Notify initiating CPU that I've grabbed the data and am
 	 * about to execute the function
Index: linux-2.6.13-mm1/arch/i386/kernel/time.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/time.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/time.c	2005-09-03 01:11:54.000000000 +1000
@@ -46,6 +46,7 @@
 #include <linux/bcd.h>
 #include <linux/efi.h>
 #include <linux/mca.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/io.h>
 #include <asm/smp.h>
@@ -429,7 +430,7 @@ static struct sysdev_class timer_sysclas
 
 
 /* XXX this driverfs stuff should probably go elsewhere later -john */
-static struct sys_device device_timer = {
+struct sys_device device_timer = {
 	.id	= 0,
 	.cls	= &timer_sysclass,
 };
@@ -485,5 +486,7 @@ void __init time_init(void)
 	cur_timer = select_timer();
 	printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
 
+	dyn_tick_time_init(cur_timer);
+
 	time_init_hook();
 }
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pit.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_pit.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pit.c	2005-09-03 01:11:54.000000000 +1000
@@ -148,6 +148,38 @@ static unsigned long get_offset_pit(void
 	return count;
 }
 
+void disable_pit_timer(void)
+{
+	irq_desc[0].handler->disable(0);
+}
+
+void enable_pit_timer(void)
+{
+	irq_desc[0].handler->enable(0);
+}
+
+/*
+ * Reprograms the next timer interrupt
+ * PIT timer reprogramming code taken from APM code.
+ * Note that the PIT is a 16-bit timer clocked at about 1.19 MHz,
+ * which allows a maximum skip of only about 55 ms.
+ */
+void reprogram_pit_timer(int jiffies_to_skip)
+{
+	int skip;
+	extern spinlock_t i8253_lock;
+	unsigned long flags;
+
+	skip = jiffies_to_skip * LATCH;
+	if (skip > 0xffff)
+		skip = 0xffff;
+
+	spin_lock_irqsave(&i8253_lock, flags);
+	outb_p(0x34, PIT_MODE);		/* binary, mode 2, LSB/MSB, ch 0 */
+	outb_p(skip & 0xff, PIT_CH0);	/* LSB */
+	outb(skip >> 8, PIT_CH0);	/* MSB */
+	spin_unlock_irqrestore(&i8253_lock, flags);
+}
 
 /* tsc timer_opts struct */
 struct timer_opts timer_pit = {
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pm.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_pm.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pm.c	2005-09-03 01:11:55.000000000 +1000
@@ -15,6 +15,7 @@
 #include <linux/module.h>
 #include <linux/device.h>
 #include <linux/init.h>
+#include <linux/dyn-tick.h>
 #include <asm/types.h>
 #include <asm/timer.h>
 #include <asm/smp.h>
@@ -128,6 +129,7 @@ pm_good:
 		return -ENODEV;
 
 	init_cpu_khz();
+	set_dyn_tick_max_skip((0xFFFFFF / (286 * 1000000)) * 1024 * HZ);
 	return 0;
 }
 
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_tsc.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_tsc.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_tsc.c	2005-09-03 01:11:54.000000000 +1000
@@ -14,6 +14,7 @@
 #include <linux/cpufreq.h>
 #include <linux/string.h>
 #include <linux/jiffies.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/timer.h>
 #include <asm/io.h>
@@ -32,8 +33,6 @@ static unsigned long hpet_last;
 static struct timer_opts timer_tsc;
 #endif
 
-static inline void cpufreq_delayed_get(void);
-
 int tsc_disable __devinitdata = 0;
 
 static int use_tsc;
@@ -166,10 +165,19 @@ static void delay_tsc(unsigned long loop
 	} while ((now-bclock) < loops);
 }
 
+/* update the monotonic base value */
+static inline void update_monotonic_base(unsigned long long last_offset)
+{
+	unsigned long long this_offset;
+
+	this_offset = ((unsigned long long)last_tsc_high << 32) | last_tsc_low;
+	monotonic_base += cycles_2_ns(this_offset - last_offset);
+}
+
 #ifdef CONFIG_HPET_TIMER
 static void mark_offset_tsc_hpet(void)
 {
-	unsigned long long this_offset, last_offset;
+	unsigned long long last_offset;
  	unsigned long offset, temp, hpet_current;
 
 	write_seqlock(&monotonic_lock);
@@ -197,9 +205,7 @@ static void mark_offset_tsc_hpet(void)
 	}
 	hpet_last = hpet_current;
 
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	monotonic_base += cycles_2_ns(this_offset - last_offset);
+	update_monotonic_base(last_offset);
 	write_sequnlock(&monotonic_lock);
 
 	/* calculate delay_at_last_interrupt */
@@ -237,7 +243,7 @@ static void handle_cpufreq_delayed_get(v
  * to verify the CPU frequency the timing core thinks the CPU is running
  * at is still correct.
  */
-static inline void cpufreq_delayed_get(void) 
+void cpufreq_delayed_get(void)
 {
 	if (cpufreq_init && !cpufreq_delayed_issched) {
 		cpufreq_delayed_issched = 1;
@@ -316,7 +322,7 @@ static int __init cpufreq_tsc(void)
 core_initcall(cpufreq_tsc);
 
 #else /* CONFIG_CPU_FREQ */
-static inline void cpufreq_delayed_get(void) { return; }
+void cpufreq_delayed_get(void) { return; }
 #endif 
 
 int recalibrate_cpu_khz(void)
@@ -346,8 +352,8 @@ static void mark_offset_tsc(void)
 	int count;
 	int countmp;
 	static int count1 = 0;
-	unsigned long long this_offset, last_offset;
-	static int lost_count = 0;
+	unsigned long long last_offset;
+	int lost_count = 0;
 
 	write_seqlock(&monotonic_lock);
 	last_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
@@ -417,26 +423,11 @@ static void mark_offset_tsc(void)
 	if (lost >= 2) {
 		jiffies_64 += lost-1;
 
-		/* sanity check to ensure we're not always losing ticks */
-		if (lost_count++ > 100) {
-			printk(KERN_WARNING "Losing too many ticks!\n");
-			printk(KERN_WARNING "TSC cannot be used as a timesource.  \n");
-			printk(KERN_WARNING "Possible reasons for this are:\n");
-			printk(KERN_WARNING "  You're running with Speedstep,\n");
-			printk(KERN_WARNING "  You don't have DMA enabled for your hard disk (see hdparm),\n");
-			printk(KERN_WARNING "  Incorrect TSC synchronization on an SMP system (see dmesg).\n");
-			printk(KERN_WARNING "Falling back to a sane timesource now.\n");
-
-			clock_fallback();
-		}
-		/* ... but give the TSC a fair chance */
-		if (lost_count > 25)
-			cpufreq_delayed_get();
+		tsc_sanity_check(&lost_count);
 	} else
 		lost_count = 0;
-	/* update the monotonic base value */
-	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
-	monotonic_base += cycles_2_ns(this_offset - last_offset);
+
+	update_monotonic_base(last_offset);
 	write_sequnlock(&monotonic_lock);
 
 	/* calculate delay_at_last_interrupt */
@@ -537,6 +528,9 @@ static int __init init_tsc(char* overrid
 					cpu_khz / 1000, cpu_khz % 1000);
 			}
 			set_cyc2ns_scale(cpu_khz/1000);
+			/* FIXME: Make use of 64-bit TSC to recover jiffies */
+			set_dyn_tick_max_skip((0xFFFFFFFF /
+				(cpu_khz * 1000)) * HZ);
 			return 0;
 		}
 	}
Index: linux-2.6.13-mm1/drivers/acpi/Kconfig
===================================================================
--- linux-2.6.13-mm1.orig/drivers/acpi/Kconfig	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/drivers/acpi/Kconfig	2005-09-02 23:57:33.000000000 +1000
@@ -302,6 +302,8 @@ config X86_PM_TIMER
 	  like aggressive processor idling, throttling, frequency and/or
 	  voltage scaling, unlike the commonly used Time Stamp Counter
 	  (TSC) timing source.
+	  
+	  This timer is required by dyntick (NO_IDLE_HZ).
 
 	  So, if you see messages like 'Losing too many ticks!' in the
 	  kernel logs, and/or you are using this on a notebook which
Index: linux-2.6.13-mm1/include/asm-i386/apic.h
===================================================================
--- linux-2.6.13-mm1.orig/include/asm-i386/apic.h	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/include/asm-i386/apic.h	2005-09-02 23:57:33.000000000 +1000
@@ -121,6 +121,7 @@ extern void nmi_watchdog_tick (struct pt
 extern int APIC_init_uniprocessor (void);
 extern void disable_APIC_timer(void);
 extern void enable_APIC_timer(void);
+extern void reprogram_apic_timer(unsigned int count);
 
 extern void enable_NMI_through_LVT0 (void * dummy);
 
@@ -132,6 +133,7 @@ extern unsigned int nmi_watchdog;
 
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
+static inline void reprogram_apic_timer(unsigned int count) { }
 
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
Index: linux-2.6.13-mm1/include/asm-i386/dyn-tick.h
===================================================================
--- linux-2.6.13-mm1.orig/include/asm-i386/dyn-tick.h	2005-01-12 16:19:45.000000000 +1100
+++ linux-2.6.13-mm1/include/asm-i386/dyn-tick.h	2005-09-03 01:12:03.000000000 +1000
@@ -0,0 +1,91 @@
+/*
+ * linux/include/asm-i386/dyn-tick.h
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ * Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _ASM_I386_DYN_TICK_H_
+#define _ASM_I386_DYN_TICK_H_
+
+#include <asm/apic.h>
+#include <asm/timer.h>
+
+extern void cpufreq_delayed_get(void);
+
+#ifdef CONFIG_NO_IDLE_HZ
+extern void setup_dyn_tick_use_apic(unsigned int calibration_result);
+extern void dyn_tick_interrupt(int irq, struct pt_regs *regs);
+extern void dyn_tick_time_init(struct timer_opts *cur_timer);
+
+#define DYN_TICK_APICABLE	(1 << 0)
+
+#if (defined(CONFIG_SMP) || defined(CONFIG_X86_UP_APIC))
+static inline int cpu_has_local_apic(void)
+{
+	return (dyn_tick->arch_state & DYN_TICK_APICABLE);
+}
+
+#else	/* (defined(CONFIG_SMP) || defined(CONFIG_X86_UP_APIC)) */
+static inline int cpu_has_local_apic(void)
+{
+	return 0;
+}
+#endif	/* (defined(CONFIG_SMP) || defined(CONFIG_X86_UP_APIC)) */
+
+#define IOAPIC_EDGE_TYPE_IRQ0 (&(ioapic_edge_type_irq0))
+
+extern struct hw_interrupt_type ioapic_edge_type_irq0;
+
+extern void idle_reprogram_timer(void);
+
+static inline void tsc_sanity_check(int *lost_count)
+{
+}
+
+#else /* CONFIG_NO_IDLE_HZ */
+
+#define IOAPIC_EDGE_TYPE_IRQ0 (&(ioapic_edge_type))
+
+static inline void setup_dyn_tick_use_apic(unsigned int calibration_result)
+{
+}
+
+static inline void dyn_tick_interrupt(int irq, struct pt_regs *regs)
+{
+}
+
+static inline void dyn_tick_time_init(struct timer_opts *cur_timer)
+{
+}
+
+static inline void idle_reprogram_timer(void)
+{
+}
+
+static inline void tsc_sanity_check(int *lost_count)
+{
+	/* sanity check to ensure we're not always losing ticks */
+	if ((*lost_count)++ > 100) {
+		printk(KERN_WARNING "Losing too many ticks!\n");
+		printk(KERN_WARNING "TSC cannot be used as a timesource.  \n");
+		printk(KERN_WARNING "Possible reasons for this are:\n");
+		printk(KERN_WARNING "  You're running with Speedstep,\n");
+		printk(KERN_WARNING "  You don't have DMA enabled for your hard disk (see hdparm),\n");
+		printk(KERN_WARNING "  Incorrect TSC synchronization on an SMP system (see dmesg).\n");
+		printk(KERN_WARNING "Falling back to a sane timesource now.\n");
+
+		clock_fallback();
+	}
+	/* ... but give the TSC a fair chance */
+	if (*lost_count > 25)
+		cpufreq_delayed_get();
+}
+#endif /* CONFIG_NO_IDLE_HZ */
+
+#endif /* _ASM_I386_DYN_TICK_H_ */
Index: linux-2.6.13-mm1/include/asm-i386/timer.h
===================================================================
--- linux-2.6.13-mm1.orig/include/asm-i386/timer.h	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/include/asm-i386/timer.h	2005-09-03 01:11:54.000000000 +1000
@@ -38,6 +38,9 @@ struct init_timer_opts {
 extern struct timer_opts* __init select_timer(void);
 extern void clock_fallback(void);
 void setup_pit_timer(void);
+extern void disable_pit_timer(void);
+extern void enable_pit_timer(void);
+extern void reprogram_pit_timer(int jiffies_to_skip);
 
 /* Modifiers for buggy PIT handling */
 
Index: linux-2.6.13-mm1/include/linux/dyn-tick.h
===================================================================
--- linux-2.6.13-mm1.orig/include/linux/dyn-tick.h	2005-01-12 16:19:45.000000000 +1100
+++ linux-2.6.13-mm1/include/linux/dyn-tick.h	2005-09-03 01:00:51.000000000 +1000
@@ -0,0 +1,70 @@
+/*
+ * linux/include/linux/dyn-tick.h
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ * Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _DYN_TICK_TIMER_H
+#define _DYN_TICK_TIMER_H
+
+#include <linux/interrupt.h>
+#include <asm/timer.h>
+
+#define DYN_TICK_ENABLED	(1 << 1)
+#define DYN_TICK_SUITABLE	(1 << 0)
+
+#define DYN_TICK_MIN_SKIP	2
+
+struct dyn_tick_state {
+	unsigned short state;		/* Current state */
+	unsigned short arch_state;	/* Arch-specific state */
+	unsigned int max_skip;		/* Max number of ticks to skip */
+};
+
+struct dyn_tick_timer {
+	int (*arch_init) (void);
+	void (*arch_enable) (void);
+	void (*arch_disable) (void);
+	void (*arch_reprogram_timer) (unsigned long);
+	void (*arch_all_cpus_idle) (int);
+};
+
+extern struct dyn_tick_state *dyn_tick;
+extern spinlock_t dyn_tick_lock;
+extern void dyn_tick_register(struct dyn_tick_timer *new_timer);
+
+#ifdef CONFIG_NO_IDLE_HZ
+extern unsigned int dyn_tick_reprogram_timer(void);
+extern void set_dyn_tick_max_skip(unsigned int max_skip);
+
+static inline int dyn_tick_enabled(void)
+{
+	return (dyn_tick->state & DYN_TICK_ENABLED);
+}
+
+#else	/* CONFIG_NO_IDLE_HZ */
+static inline unsigned int dyn_tick_reprogram_timer(void)
+{
+	return 0;
+}
+
+static inline void set_dyn_tick_max_skip(unsigned int max_skip)
+{
+}
+
+static inline int dyn_tick_enabled(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_NO_IDLE_HZ */
+
+/* Pick up arch specific header */
+#include <asm/dyn-tick.h>
+
+#endif	/* _DYN_TICK_TIMER_H */
Index: linux-2.6.13-mm1/include/linux/timer.h
===================================================================
--- linux-2.6.13-mm1.orig/include/linux/timer.h	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/include/linux/timer.h	2005-09-03 00:22:21.000000000 +1000
@@ -91,6 +91,7 @@ static inline void add_timer(struct time
 
 extern void init_timers(void);
 extern void run_local_timers(void);
+extern void conditional_run_local_timers(void);
 extern void it_real_fn(unsigned long);
 
 #endif
Index: linux-2.6.13-mm1/kernel/dyn-tick.c
===================================================================
--- linux-2.6.13-mm1.orig/kernel/dyn-tick.c	2005-01-12 16:19:45.000000000 +1100
+++ linux-2.6.13-mm1/kernel/dyn-tick.c	2005-09-03 00:22:44.000000000 +1000
@@ -0,0 +1,222 @@
+/*
+ * linux/kernel/dyn-tick.c
+ *
+ * Beginnings of generic dynamic tick timer support
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ * Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/version.h>
+#include <linux/config.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sysdev.h>
+#include <linux/interrupt.h>
+#include <linux/cpumask.h>
+#include <linux/pm.h>
+#include <linux/dyn-tick.h>
+#include <linux/rcupdate.h>
+
+#define DYN_TICK_VERSION	"050610-1"
+#define DYN_TICK_IS_SET(x)	((dyn_tick->state & (x)) == (x))
+
+static struct dyn_tick_state dyn_tick_state;
+struct dyn_tick_state *dyn_tick = &dyn_tick_state;
+static struct dyn_tick_timer *dyn_tick_cfg;
+spinlock_t dyn_tick_lock;
+
+/*
+ * Arch independent code needed to reprogram next timer interrupt.
+ * Gets called, with IRQs disabled, from cpu_idle() before entering idle loop.
+ */
+unsigned int dyn_tick_reprogram_timer(void)
+{
+	int cpu = smp_processor_id();
+	unsigned int delta;
+
+	if (!DYN_TICK_IS_SET(DYN_TICK_ENABLED))
+		return 0;
+
+	if (rcu_pending(cpu) || local_softirq_pending())
+		return 0;
+
+	/* Check if we can start skipping ticks */
+	write_seqlock(&xtime_lock);
+
+	delta = next_timer_interrupt() - jiffies;
+	if (delta > dyn_tick->max_skip)
+		delta = dyn_tick->max_skip;
+
+	if (delta > DYN_TICK_MIN_SKIP) {
+		int idle_time = 0;
+
+		spin_lock(&dyn_tick_lock);
+
+		dyn_tick_cfg->arch_reprogram_timer(jiffies + delta);
+
+		cpu_set(cpu, nohz_cpu_mask);
+		if (cpus_equal(nohz_cpu_mask, cpu_online_map))
+			/* Fixme: idle_time needs to be computed */
+			dyn_tick_cfg->arch_all_cpus_idle(idle_time);
+
+		spin_unlock(&dyn_tick_lock);
+
+	} else
+		delta = 0;
+
+	write_sequnlock(&xtime_lock);
+
+	return delta;
+}
+
+void set_dyn_tick_max_skip(unsigned int max_skip)
+{
+	if (!dyn_tick->max_skip || max_skip < dyn_tick->max_skip)
+		dyn_tick->max_skip = max_skip;
+}
+
+void __init dyn_tick_register(struct dyn_tick_timer *arch_timer)
+{
+	dyn_tick_cfg = arch_timer;
+	printk(KERN_INFO "dyn-tick: Registering dynamic tick timer v%s\n",
+	       DYN_TICK_VERSION);
+}
+
+/*
+ * ---------------------------------------------------------------------------
+ * Command line options
+ * ---------------------------------------------------------------------------
+ */
+static int __initdata dyntick_autoenable = 1;
+
+/*
+ * dyntick=[disable]
+ */ 
+static int __init dyntick_setup(char *options)
+{
+	if (!options)
+		return 0;
+
+	if (!strncmp(options, "disable", 7))
+		dyntick_autoenable = 0;
+
+	return 0;
+}
+
+__setup("dyntick=", dyntick_setup);
+
+/*
+ * ---------------------------------------------------------------------------
+ * Sysfs interface
+ * ---------------------------------------------------------------------------
+ */
+
+extern struct sys_device device_timer;
+
+static ssize_t show_dyn_tick_state(struct sys_device *dev, char *buf)
+{
+	return sprintf(buf,
+		       "suitable:\t%i\n"
+		       "enabled:\t%i\n",
+		       DYN_TICK_IS_SET(DYN_TICK_SUITABLE),
+		       DYN_TICK_IS_SET(DYN_TICK_ENABLED));
+}
+
+static ssize_t show_dyn_tick_enable(struct sys_device *dev, char *buf)
+{
+	return sprintf(buf, "enabled:\t%i\n",
+		DYN_TICK_IS_SET(DYN_TICK_ENABLED));
+}
+
+static ssize_t set_dyn_tick_enable(struct sys_device *dev, const char *buf,
+				   size_t count)
+{
+	unsigned long flags;
+	unsigned int enable = simple_strtoul(buf, NULL, 2);
+
+	write_seqlock_irqsave(&xtime_lock, flags);
+	if (enable) {
+		if (dyn_tick_cfg->arch_enable)
+			dyn_tick_cfg->arch_enable();
+		dyn_tick->state |= DYN_TICK_ENABLED;
+	} else {
+		if (dyn_tick_cfg->arch_disable)
+			dyn_tick_cfg->arch_disable();
+		dyn_tick->state &= ~DYN_TICK_ENABLED;
+	}
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+
+	return count;
+}
+
+static SYSDEV_ATTR(state, 0444, show_dyn_tick_state, NULL);
+static SYSDEV_ATTR(enable, 0644, show_dyn_tick_enable,
+		   set_dyn_tick_enable);
+
+static struct sysdev_class dyn_tick_sysclass = {
+	set_kset_name("dyn_tick"),
+};
+
+static struct sys_device device_dyn_tick = {
+	.id = 0,
+	.cls = &dyn_tick_sysclass,
+};
+
+static int init_dyn_tick_sysfs(void)
+{
+	int error = 0;
+	if ((error = sysdev_class_register(&dyn_tick_sysclass)))
+		goto out;
+	if ((error = sysdev_register(&device_dyn_tick)))
+		goto out;
+	if ((error = sysdev_create_file(&device_dyn_tick, &attr_state)))
+		goto out;
+	error = sysdev_create_file(&device_dyn_tick, &attr_enable);
+
+out:
+	return error;
+}
+
+device_initcall(init_dyn_tick_sysfs);
+
+/*
+ * ---------------------------------------------------------------------------
+ * Init functions
+ * ---------------------------------------------------------------------------
+ */
+
+/*
+ * We need to initialize dynamic tick after calibrate delay
+ */
+static int __init dyn_tick_late_init(void)
+{
+	int ret = 0;
+
+	if (dyn_tick_cfg == NULL || dyn_tick_cfg->arch_init == NULL ||
+	    !DYN_TICK_IS_SET(DYN_TICK_SUITABLE)) {
+		printk(KERN_ERR "dyn-tick: No suitable timer found\n");
+		return -ENODEV;
+	}
+
+	if ((ret = dyn_tick_cfg->arch_init())) {
+		printk(KERN_ERR "dyn-tick: Init failed\n");
+		return -ENODEV;
+	}
+
+	if (!ret && dyntick_autoenable) {
+		dyn_tick->state |= DYN_TICK_ENABLED;
+		printk(KERN_INFO "dyn-tick: Timer using dynamic tick\n");
+	} else
+		printk(KERN_INFO "dyn-tick: Timer not enabled during boot\n");
+
+	return ret;
+}
+
+late_initcall(dyn_tick_late_init);
Index: linux-2.6.13-mm1/kernel/Makefile
===================================================================
--- linux-2.6.13-mm1.orig/kernel/Makefile	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/kernel/Makefile	2005-09-02 23:57:33.000000000 +1000
@@ -32,6 +32,7 @@ obj-$(CONFIG_DETECT_SOFTLOCKUP) += softl
 obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
 obj-$(CONFIG_SECCOMP) += seccomp.o
+obj-$(CONFIG_NO_IDLE_HZ) += dyn-tick.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-2.6.13-mm1/kernel/timer.c
===================================================================
--- linux-2.6.13-mm1.orig/kernel/timer.c	2005-09-02 23:57:29.000000000 +1000
+++ linux-2.6.13-mm1/kernel/timer.c	2005-09-03 00:32:19.000000000 +1000
@@ -939,6 +939,13 @@ void run_local_timers(void)
 	}
 }
 
+void conditional_run_local_timers(void)
+{
+	tvec_base_t *base  = &__get_cpu_var(tvec_bases);
+
+	if (base->timer_jiffies != jiffies)
+		run_local_timers();
+}
 /*
  * Called by the timer interrupt. xtime_lock must already be taken
  * by the timer IRQ!


* [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c
  2005-09-02 15:43   ` [PATCH 1/3] dynticks - implement no idle hz for x86 Con Kolivas
@ 2005-09-02 15:45     ` Con Kolivas
  2005-09-02 15:46       ` [PATCH 3/3] dyntick - Recover walltime upon wakeup Con Kolivas
  2005-09-02 17:25       ` [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c Srivatsa Vaddagiri
  2005-09-02 16:56     ` [PATCH 1/3] dynticks - implement no idle hz for x86 Russell King
  1 sibling, 2 replies; 96+ messages in thread
From: Con Kolivas @ 2005-09-02 15:45 UTC (permalink / raw)
  To: vatsa; +Cc: linux-kernel, akpm, ck list

[-- Attachment #1: Type: text/plain, Size: 13 bytes --]



Con
---




[-- Attachment #2: dyntick-Fix_lost_tick_calculation_in_timer_pm_c.patch --]
[-- Type: text/x-diff, Size: 4267 bytes --]

Currently, the lost tick calculation in timer_pm.c is based on the number
of microseconds that have elapsed since the last tick. The number of
microseconds is approximated by cyc2us, which basically does:

	microsec = (cycles * 286) / 1024

Consider 10 lost ticks. This amounts to 14319 * 10 = 143190 cycles
(14319 = PMTMR_EXPECTED_RATE/(CALIBRATE_LATCH/LATCH)).
This amounts to 39992 microseconds as per the above equation,
or 39992 / 4000 = 9 lost ticks, which is incorrect.
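
The arithmetic above can be checked with a small user-space sketch. This is
not kernel code: the helper names lost_ticks_from_usecs() and
lost_ticks_from_cycles() are made up for illustration, cyc2us() here only
mirrors the formula quoted above, and the constants are the ones from the
example (4000 microseconds and 14319 PM timer cycles per tick).

```c
#include <stdint.h>

/* Constants taken from the worked example above. */
#define USECS_PER_TICK     4000u	/* microseconds per tick in the example */
#define PM_CYCLES_PER_TICK 14319u	/* PM timer cycles per tick in the example */

/* Mirrors the quoted approximation: microsec = (cycles * 286) / 1024. */
static uint32_t cyc2us(uint32_t cycles)
{
	return (cycles * 286u) >> 10;
}

/* Hypothetical helper: tick count derived via the microsecond value. */
static uint32_t lost_ticks_from_usecs(uint32_t cycles)
{
	return cyc2us(cycles) / USECS_PER_TICK;
}

/* Hypothetical helper: tick count derived from the raw cycle delta. */
static uint32_t lost_ticks_from_cycles(uint32_t cycles)
{
	return cycles / PM_CYCLES_PER_TICK;
}
```

For 143190 cycles the microsecond path truncates 39992.5 to 39992 and then
39992 / 4000 to 9, while dividing the cycle count directly gives 10.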

I feel lost ticks can be calculated from the cycle difference directly
rather than from the number of microseconds that have elapsed.

The following patch is a step in that direction.
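
As a rough user-space sketch of the cycle-based idea (not the kernel code
itself): the last accounted counter value is advanced only by whole ticks, so
the sub-tick remainder is carried into the next interval, and the subtraction
is masked so that a wrap of the 24-bit counter is handled. The struct
pm_tick_state and the function pm_account_ticks() are hypothetical names;
PM_MASK is assumed to match ACPI_PM_MASK (0xFFFFFF).

```c
#include <stdint.h>

#define PM_MASK 0xFFFFFFu	/* assumed 24-bit PM timer, as with ACPI_PM_MASK */

/* Hypothetical state for this sketch. */
struct pm_tick_state {
	uint32_t offset_last;		/* counter value at the last accounted tick boundary */
	uint32_t ticks_per_jiffy;	/* PM timer cycles per jiffy, must be non-zero */
};

/*
 * Returns the number of whole ticks elapsed since offset_last.
 * offset_last advances by whole ticks only, so any partial tick
 * stays in the delta seen by the next call.
 */
static uint32_t pm_account_ticks(struct pm_tick_state *s, uint32_t now)
{
	uint32_t delta = (now - s->offset_last) & PM_MASK;
	uint32_t ticks = delta / s->ticks_per_jiffy;

	s->offset_last = (s->offset_last + ticks * s->ticks_per_jiffy) & PM_MASK;
	return ticks;
}
```

Because the remainder is never discarded, rounding errors do not accumulate
from one interrupt to the next, which is the point of working in cycles.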

With this patch, time kept up really well overnight on one particular
machine (a 4-way Intel Pentium 3 box), while on another, newer machine
(a 4-way Intel Xeon with HT) it did not do so well (time sped up after
3 or 4 hours). Hence I think this particular patch needs more review
and work.

Fix lost tick calculation in timer_pm.c

Code by: Srivatsa Vaddagiri <vatsa@in.ibm.com>

Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pm.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_pm.c	2005-09-03 01:11:55.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pm.c	2005-09-03 01:12:11.000000000 +1000
@@ -31,6 +31,8 @@
   ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
 
 
+static int pm_ticks_per_jiffy = PMTMR_EXPECTED_RATE / (CALIBRATE_LATCH/LATCH);
+
 /* The I/O port the PMTMR resides at.
  * The location is detected during setup_arch(),
  * in arch/i386/acpi/boot.c */
@@ -38,8 +40,7 @@ u32 pmtmr_ioport = 0;
 
 
 /* value of the Power timer at last timer interrupt */
-static u32 offset_tick;
-static u32 offset_delay;
+static u32 offset_last;
 
 static unsigned long long monotonic_base;
 static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
@@ -128,6 +129,11 @@ pm_good:
 	if (verify_pmtmr_rate() != 0)
 		return -ENODEV;
 
+	printk ("Using %u PM timer ticks per jiffy \n", pm_ticks_per_jiffy);
+
+	offset_last = read_pmtmr();
+	setup_pit_timer();
+
 	init_cpu_khz();
 	set_dyn_tick_max_skip((0xFFFFFF / (286 * 1000000)) * 1024 * HZ);
 	return 0;
@@ -152,47 +158,37 @@ static inline u32 cyc2us(u32 cycles)
  */
 static void mark_offset_pmtmr(void)
 {
-	u32 lost, delta, last_offset;
-	static int first_run = 1;
-	last_offset = offset_tick;
+	u32 lost, delta, deltaus, offset_now;
 
 	write_seqlock(&monotonic_lock);
 
-	offset_tick = read_pmtmr();
+	offset_now = read_pmtmr();
 
 	/* calculate tick interval */
-	delta = (offset_tick - last_offset) & ACPI_PM_MASK;
+	delta = (offset_now - offset_last) & ACPI_PM_MASK;
 
 	/* convert to usecs */
-	delta = cyc2us(delta);
+	deltaus = cyc2us(delta);
 
 	/* update the monotonic base value */
-	monotonic_base += delta * NSEC_PER_USEC;
+	monotonic_base += deltaus * NSEC_PER_USEC;
 	write_sequnlock(&monotonic_lock);
 
 	/* convert to ticks */
-	delta += offset_delay;
-	lost = delta / (USEC_PER_SEC / HZ);
-	offset_delay = delta % (USEC_PER_SEC / HZ);
-
+	lost = delta / pm_ticks_per_jiffy;
+	offset_last += lost * pm_ticks_per_jiffy;
+	offset_last &= ACPI_PM_MASK;
 
 	/* compensate for lost ticks */
 	if (lost >= 2)
 		jiffies_64 += lost - 1;
-
-	/* don't calculate delay for first run,
-	   or if we've got less then a tick */
-	if (first_run || (lost < 1)) {
-		first_run = 0;
-		offset_delay = 0;
-	}
 }
 
 static int pmtmr_resume(void)
 {
 	write_seqlock(&monotonic_lock);
 	/* Assume this is the last mark offset time */
-	offset_tick = read_pmtmr();
+	offset_last = read_pmtmr();
 	write_sequnlock(&monotonic_lock);
 	return 0;
 }
@@ -207,7 +203,7 @@ static unsigned long long monotonic_cloc
 	/* atomically read monotonic base & last_offset */
 	do {
 		seq = read_seqbegin(&monotonic_lock);
-		last_offset = offset_tick;
+		last_offset = offset_last;
 		base = monotonic_base;
 	} while (read_seqretry(&monotonic_lock, seq));
 
@@ -241,11 +237,11 @@ static unsigned long get_offset_pmtmr(vo
 {
 	u32 now, offset, delta = 0;
 
-	offset = offset_tick;
+	offset = offset_last;
 	now = read_pmtmr();
 	delta = (now - offset)&ACPI_PM_MASK;
 
-	return (unsigned long) offset_delay + cyc2us(delta);
+	return (unsigned long) cyc2us(delta);
 }
 
 


* [PATCH 3/3] dyntick - Recover walltime upon wakeup
  2005-09-02 15:45     ` [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c Con Kolivas
@ 2005-09-02 15:46       ` Con Kolivas
  2005-09-02 17:25       ` [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c Srivatsa Vaddagiri
  1 sibling, 0 replies; 96+ messages in thread
From: Con Kolivas @ 2005-09-02 15:46 UTC (permalink / raw)
  To: vatsa; +Cc: linux-kernel, akpm, ck list

[-- Attachment #1: Type: text/plain, Size: 10 bytes --]

Con
---



[-- Attachment #2: dyntick-Recover_walltime_upon_wakeup.patch --]
[-- Type: text/x-diff, Size: 7878 bytes --]


This patch uses the lost tick count returned by the mark_offset()
function to recover time in dyn-tick.
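
A minimal user-space sketch of that flow, under stated assumptions: the
struct wall_clock and both *_sketch() functions are hypothetical stand-ins,
not the kernel functions. It assumes mark_offset() adds all but one of the
elapsed ticks to jiffies and returns the count, and that the caller runs the
normal one-tick update only when dyn-tick is disabled or the count is
non-zero.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's jiffies counter. */
struct wall_clock {
	uint64_t jiffies;
};

/*
 * Stand-in for mark_offset(): accounts for all but one of the
 * elapsed ticks and reports how many ticks elapsed.
 */
static int mark_offset_sketch(struct wall_clock *c, int elapsed_ticks)
{
	if (elapsed_ticks >= 2)
		c->jiffies += (uint64_t)(elapsed_ticks - 1);
	return elapsed_ticks;
}

/*
 * Stand-in for the timer interrupt path: the regular one-tick
 * update runs when dyn-tick is off, or when at least one tick
 * has really elapsed.
 */
static void timer_interrupt_sketch(struct wall_clock *c, int dyn_tick_on,
				   int elapsed_ticks)
{
	int lost = mark_offset_sketch(c, elapsed_ticks);

	if (!dyn_tick_on || lost)
		c->jiffies++;	/* what do_timer() would account for */
}
```

With dyn-tick on, an early wakeup with no elapsed tick leaves jiffies
untouched, while a wakeup after 10 skipped ticks advances it by exactly 10.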

Code by Srivatsa Vaddagiri <vatsa@in.ibm.com>

Index: linux-2.6.13-mm1/arch/i386/kernel/dyn-tick.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/dyn-tick.c	2005-09-03 01:11:54.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/dyn-tick.c	2005-09-03 01:12:11.000000000 +1000
@@ -91,7 +91,13 @@ void dyn_tick_interrupt(int irq, struct 
 
 	if (all_were_sleeping) {
 		/* Recover jiffies */
-		cur_timer->mark_offset();
+		if (irq) {
+			int lost;
+
+			lost = cur_timer->mark_offset();
+			if (lost)
+				do_timer(regs);
+		}
 		if (cpu_has_local_apic())
 			enable_pit_timer();
 	}
Index: linux-2.6.13-mm1/arch/i386/kernel/time.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/time.c	2005-09-03 01:11:54.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/time.c	2005-09-03 01:12:11.000000000 +1000
@@ -250,7 +250,7 @@ EXPORT_SYMBOL(profile_pc);
  * timer_interrupt() needs to keep up the real-time clock,
  * as well as call the "do_timer()" routine every clocktick
  */
-static inline void do_timer_interrupt(int irq, struct pt_regs *regs)
+static inline void do_timer_interrupt(int irq, struct pt_regs *regs, int lost)
 {
 #ifdef CONFIG_X86_IO_APIC
 	if (timer_ack) {
@@ -268,7 +268,8 @@ static inline void do_timer_interrupt(in
 	}
 #endif
 
-	do_timer_interrupt_hook(regs);
+	if (!dyn_tick_enabled() || lost)
+		do_timer_interrupt_hook(regs);
 
 
 	if (MCA_bus) {
@@ -293,6 +294,8 @@ static inline void do_timer_interrupt(in
  */
 irqreturn_t timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
 {
+	int lost;
+
 	/*
 	 * Here we are in the timer irq handler. We just have irqs locally
 	 * disabled but we don't know if the timer_bh is running on the other
@@ -302,9 +305,9 @@ irqreturn_t timer_interrupt(int irq, voi
 	 */
 	write_seqlock(&xtime_lock);
 
-	cur_timer->mark_offset();
- 
-	do_timer_interrupt(irq, regs);
+	lost = cur_timer->mark_offset();
+
+	do_timer_interrupt(irq, regs, lost);
 
 	write_sequnlock(&xtime_lock);
 	return IRQ_HANDLED;
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_cyclone.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_cyclone.c	2005-09-03 01:11:54.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_cyclone.c	2005-09-03 01:12:11.000000000 +1000
@@ -45,7 +45,7 @@ static seqlock_t monotonic_lock = SEQLOC
 	} while (high != cyclone_timer[1]);
 
 
-static void mark_offset_cyclone(void)
+static int mark_offset_cyclone(void)
 {
 	unsigned long lost, delay;
 	unsigned long delta = last_cyclone_low;
@@ -101,6 +101,8 @@ static void mark_offset_cyclone(void)
 	 */
 	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
 		jiffies_64++;
+
+	return 1;
 }
 
 static unsigned long get_offset_cyclone(void)
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_hpet.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_hpet.c	2005-09-03 01:11:54.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_hpet.c	2005-09-03 01:12:11.000000000 +1000
@@ -96,7 +96,7 @@ static unsigned long get_offset_hpet(voi
 	return edx;
 }
 
-static void mark_offset_hpet(void)
+static int mark_offset_hpet(void)
 {
 	unsigned long long this_offset, last_offset;
 	unsigned long offset;
@@ -119,6 +119,8 @@ static void mark_offset_hpet(void)
 	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
 	monotonic_base += cycles_2_ns(this_offset - last_offset);
 	write_sequnlock(&monotonic_lock);
+
+	return 1;
 }
 
 static void delay_hpet(unsigned long loops)
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_none.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_none.c	2005-09-03 01:11:54.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_none.c	2005-09-03 01:12:11.000000000 +1000
@@ -1,9 +1,10 @@
 #include <linux/init.h>
 #include <asm/timer.h>
 
-static void mark_offset_none(void)
+static int mark_offset_none(void)
 {
 	/* nothing needed */
+	return 1;
 }
 
 static unsigned long get_offset_none(void)
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pit.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_pit.c	2005-09-03 01:11:54.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pit.c	2005-09-03 01:12:11.000000000 +1000
@@ -32,9 +32,10 @@ static int __init init_pit(char* overrid
 	return 0;
 }
 
-static void mark_offset_pit(void)
+static int mark_offset_pit(void)
 {
 	/* nothing needed */
+	return 1;
 }
 
 static unsigned long long monotonic_clock_pit(void)
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pm.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_pm.c	2005-09-03 01:12:11.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pm.c	2005-09-03 01:12:11.000000000 +1000
@@ -135,7 +135,7 @@ pm_good:
 	setup_pit_timer();
 
 	init_cpu_khz();
-	set_dyn_tick_max_skip((0xFFFFFF / (286 * 1000000)) * 1024 * HZ);
+	set_dyn_tick_max_skip(((0xFFFFFF / 1000000) * 286 * HZ) >> 10);
 	return 0;
 }
 
@@ -156,7 +156,7 @@ static inline u32 cyc2us(u32 cycles)
  * this gets called during each timer interrupt
  *   - Called while holding the writer xtime_lock
  */
-static void mark_offset_pmtmr(void)
+static int mark_offset_pmtmr(void)
 {
 	u32 lost, delta, deltaus, offset_now;
 
@@ -182,6 +182,8 @@ static void mark_offset_pmtmr(void)
 	/* compensate for lost ticks */
 	if (lost >= 2)
 		jiffies_64 += lost - 1;
+
+	return lost;
 }
 
 static int pmtmr_resume(void)
Index: linux-2.6.13-mm1/arch/i386/kernel/timers/timer_tsc.c
===================================================================
--- linux-2.6.13-mm1.orig/arch/i386/kernel/timers/timer_tsc.c	2005-09-03 01:11:54.000000000 +1000
+++ linux-2.6.13-mm1/arch/i386/kernel/timers/timer_tsc.c	2005-09-03 01:12:11.000000000 +1000
@@ -175,7 +175,7 @@ static inline void update_monotonic_base
 }
 
 #ifdef CONFIG_HPET_TIMER
-static void mark_offset_tsc_hpet(void)
+static int mark_offset_tsc_hpet(void)
 {
 	unsigned long long last_offset;
  	unsigned long offset, temp, hpet_current;
@@ -219,6 +219,8 @@ static void mark_offset_tsc_hpet(void)
 	delay_at_last_interrupt = hpet_current - offset;
 	ASM_MUL64_REG(temp, delay_at_last_interrupt,
 			hpet_usec_quotient, delay_at_last_interrupt);
+
+	return 1;
 }
 #endif
 
@@ -345,7 +347,7 @@ int recalibrate_cpu_khz(void)
 }
 EXPORT_SYMBOL(recalibrate_cpu_khz);
 
-static void mark_offset_tsc(void)
+static int mark_offset_tsc(void)
 {
 	unsigned long lost,delay;
 	unsigned long delta = last_tsc_low;
@@ -438,8 +440,12 @@ static void mark_offset_tsc(void)
 	 * between tsc and pit reads (as noted when
 	 * usec delta is > 90% # of usecs/tick)
 	 */
-	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
+	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ)) {
 		jiffies_64++;
+		lost++;
+	}
+
+	return 1;
 }
 
 static int __init init_tsc(char* override)
Index: linux-2.6.13-mm1/include/asm-i386/timer.h
===================================================================
--- linux-2.6.13-mm1.orig/include/asm-i386/timer.h	2005-09-03 01:11:54.000000000 +1000
+++ linux-2.6.13-mm1/include/asm-i386/timer.h	2005-09-03 01:12:11.000000000 +1000
@@ -19,7 +19,7 @@
  */
 struct timer_opts {
 	char* name;
-	void (*mark_offset)(void);
+	int (*mark_offset)(void);
 	unsigned long (*get_offset)(void);
 	unsigned long long (*monotonic_clock)(void);
 	void (*delay)(unsigned long);
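
A side note on the set_dyn_tick_max_skip() change in the timer_pm.c hunk above: in integer arithmetic the removed expression evaluates to 0, because 0xFFFFFF / (286 * 1000000) truncates to 0 before the multiplication happens. The minimal user-space sketch below is not part of the patch; it only evaluates both expressions. HZ = 250 is an assumption taken from the 4000 usecs-per-tick figure used elsewhere in this thread, and max_skip_old/max_skip_new are hypothetical helper names.

```c
/* Assumed tick rate: 250 matches the 4000 usecs/tick used in this thread. */
#define HZ 250

/* Expression removed by the patch: the division truncates to 0 first. */
static unsigned long max_skip_old(void)
{
	return (0xFFFFFF / (286 * 1000000)) * 1024 * HZ;
}

/* Expression added by the patch: whole seconds of range, scaled to ticks. */
static unsigned long max_skip_new(void)
{
	return ((0xFFFFFF / 1000000) * 286 * HZ) >> 10;
}
```

With HZ = 250, max_skip_old() returns 0 and max_skip_new() returns 1117 ticks, roughly 4.47 seconds. That is slightly conservative, since 0xFFFFFF / 1000000 truncates 16.78 down to 16.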


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-02 15:43   ` [PATCH 1/3] dynticks - implement no idle hz for x86 Con Kolivas
  2005-09-02 15:45     ` [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c Con Kolivas
@ 2005-09-02 16:56     ` Russell King
  2005-09-02 17:12       ` Srivatsa Vaddagiri
  2005-09-03  6:13       ` Con Kolivas
  1 sibling, 2 replies; 96+ messages in thread
From: Russell King @ 2005-09-02 16:56 UTC (permalink / raw)
  To: Con Kolivas; +Cc: vatsa, linux-kernel, akpm, ck list

On Sat, Sep 03, 2005 at 01:43:57AM +1000, Con Kolivas wrote:
> Ok I've resynced all the patches with 2.6.13-mm1, made some cleanups and minor 
> modifications. As pm timer is the only supported timer for dynticks I've also 
> made it depend on it.
> 
> A rollup patch against 2.6.13-mm1 is here:
> 
> http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch
> 
> also available in the dyn-ticks directory are the older patches and these 
> split out patches posted here.

Are you guys going to sync your interfaces with what ARM has, or are
we going to have two differing dyntick interfaces in the kernel, one
for ARM and one for x86?

I mentioned this before.  I seem to be ignored.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-02 16:56     ` [PATCH 1/3] dynticks - implement no idle hz for x86 Russell King
@ 2005-09-02 17:12       ` Srivatsa Vaddagiri
  2005-09-03  6:13       ` Con Kolivas
  1 sibling, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-02 17:12 UTC (permalink / raw)
  To: Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On Fri, Sep 02, 2005 at 05:56:23PM +0100, Russell King wrote:
> Are you guys going to sync your interfaces with what ARM has, or are
> we going to have two differing dyntick interfaces in the kernel, one
> for ARM and one for x86?

Three actually, including s390 :) I know that it would be really nice to sync 
up with what is there in ARM/s390. I haven't looked closely at either 
implementation yet. I will have a look and post an update that keeps the 
interfaces alike on all platforms.

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c
  2005-09-02 15:45     ` [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c Con Kolivas
  2005-09-02 15:46       ` [PATCH 3/3] dyntick - Recover walltime upon wakeup Con Kolivas
@ 2005-09-02 17:25       ` Srivatsa Vaddagiri
  2005-09-02 20:18         ` Thomas Schlichter
  1 sibling, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-02 17:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, akpm, ck list, thomas.schlichter

Con,
	Please use this updated "Lost tick" calculation patch, which rectifies the
two problems Thomas pointed out. I have done some basic testing with it. 

Would it be possible to incorporate this updated patch in
http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch?

Sorry for the inconvenience!

----


Currently, the lost tick calculation in timer_pm.c is based on the number
of microseconds that have elapsed since the last tick. The number of
microseconds is approximated by cyc2us, which basically does:

	microsec = (cycles * 286) / 1024

Consider 10 lost ticks. This amounts to 14319*10 = 143190 cycles 
(14319 = PMTMR_EXPECTED_RATE/(CALIBRATE_LATCH/LATCH)).
By the above equation this is 39992 microseconds, or
39992 / 4000 = 9 lost ticks (at 4000 microseconds per tick), which is
incorrect.

I feel lost ticks can be computed from the cycle difference directly
rather than from the number of microseconds that have elapsed.

The following patch is in that direction. 
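
The arithmetic above can be reproduced with a minimal user-space C sketch. It is not part of the patch. The constants 14319 and 4000 are taken from this message (4000 usecs per tick implies HZ = 250); lost_via_usecs and lost_via_cycles are hypothetical helper names, and the sketch ignores the offset_delay remainder that the old code carried between ticks.

```c
#include <stdint.h>

#define PM_TICKS_PER_JIFFY 14319u	/* PM timer cycles per tick, as quoted */
#define USEC_PER_JIFFY     4000u	/* usecs per tick, assuming HZ = 250 */

/* Same approximation as cyc2us(): usecs = cycles * 286 / 1024. */
static uint32_t cyc2us(uint32_t cycles)
{
	return (cycles * 286u) >> 10;
}

/* Old approach: convert cycles to usecs, then usecs to ticks. */
static uint32_t lost_via_usecs(uint32_t delta_cycles)
{
	return cyc2us(delta_cycles) / USEC_PER_JIFFY;
}

/* Patched approach: divide the cycle delta directly. */
static uint32_t lost_via_cycles(uint32_t delta_cycles)
{
	return delta_cycles / PM_TICKS_PER_JIFFY;
}
```

For delta_cycles = 143190, cyc2us() gives 39992, lost_via_usecs() gives 9 and lost_via_cycles() gives 10.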

With this patch, time kept up really well overnight on one particular
machine (Intel 4-way Pentium 3 box), while on another, newer machine
(Intel 4-way Xeon with HT) it didn't do so well (time sped up after
3 or 4 hours). Hence I think this particular patch will need more
review/work.

Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
---

 linux-2.6.13-mm1-root/arch/i386/kernel/timers/timer_pm.c |   48 ++++++---------
 1 files changed, 22 insertions(+), 26 deletions(-)

diff -puN arch/i386/kernel/timers/timer_pm.c~pm_timer_fix arch/i386/kernel/timers/timer_pm.c
--- linux-2.6.13-mm1/arch/i386/kernel/timers/timer_pm.c~pm_timer_fix	2005-09-02 22:13:44.000000000 +0530
+++ linux-2.6.13-mm1-root/arch/i386/kernel/timers/timer_pm.c	2005-09-02 22:18:09.000000000 +0530
@@ -30,6 +30,8 @@
   ((CALIBRATE_LATCH * (PMTMR_TICKS_PER_SEC >> 10)) / (CLOCK_TICK_RATE>>10))
 
 
+static int pm_ticks_per_jiffy = PMTMR_EXPECTED_RATE / (CALIBRATE_LATCH/LATCH);
+
 /* The I/O port the PMTMR resides at.
  * The location is detected during setup_arch(),
  * in arch/i386/acpi/boot.c */
@@ -37,8 +39,7 @@ u32 pmtmr_ioport = 0;
 
 
 /* value of the Power timer at last timer interrupt */
-static u32 offset_tick;
-static u32 offset_delay;
+static u32 offset_last;
 
 static unsigned long long monotonic_base;
 static seqlock_t monotonic_lock = SEQLOCK_UNLOCKED;
@@ -127,6 +128,11 @@ pm_good:
 	if (verify_pmtmr_rate() != 0)
 		return -ENODEV;
 
+	printk ("Using %u PM timer ticks per jiffy \n", pm_ticks_per_jiffy);
+
+	offset_last = read_pmtmr();
+	setup_pit_timer();
+
 	init_cpu_khz();
 	return 0;
 }
@@ -150,47 +156,37 @@ static inline u32 cyc2us(u32 cycles)
  */
 static void mark_offset_pmtmr(void)
 {
-	u32 lost, delta, last_offset;
-	static int first_run = 1;
-	last_offset = offset_tick;
+	u32 lost, delta, deltaus, offset_now;
 
 	write_seqlock(&monotonic_lock);
 
-	offset_tick = read_pmtmr();
+	offset_now = read_pmtmr();
 
 	/* calculate tick interval */
-	delta = (offset_tick - last_offset) & ACPI_PM_MASK;
+	delta = (offset_now - offset_last) & ACPI_PM_MASK;
+
+	/* convert to ticks */
+	lost = delta / pm_ticks_per_jiffy;
+	offset_last += lost * pm_ticks_per_jiffy;
+	offset_last &= ACPI_PM_MASK;
 
 	/* convert to usecs */
-	delta = cyc2us(delta);
+	deltaus = cyc2us(lost*pm_ticks_per_jiffy);
 
 	/* update the monotonic base value */
-	monotonic_base += delta * NSEC_PER_USEC;
+	monotonic_base += deltaus * NSEC_PER_USEC;
 	write_sequnlock(&monotonic_lock);
 
-	/* convert to ticks */
-	delta += offset_delay;
-	lost = delta / (USEC_PER_SEC / HZ);
-	offset_delay = delta % (USEC_PER_SEC / HZ);
-
-
 	/* compensate for lost ticks */
 	if (lost >= 2)
 		jiffies_64 += lost - 1;
-
-	/* don't calculate delay for first run,
-	   or if we've got less then a tick */
-	if (first_run || (lost < 1)) {
-		first_run = 0;
-		offset_delay = 0;
-	}
 }
 
 static int pmtmr_resume(void)
 {
 	write_seqlock(&monotonic_lock);
 	/* Assume this is the last mark offset time */
-	offset_tick = read_pmtmr();
+	offset_last = read_pmtmr();
 	write_sequnlock(&monotonic_lock);
 	return 0;
 }
@@ -205,7 +201,7 @@ static unsigned long long monotonic_cloc
 	/* atomically read monotonic base & last_offset */
 	do {
 		seq = read_seqbegin(&monotonic_lock);
-		last_offset = offset_tick;
+		last_offset = offset_last;
 		base = monotonic_base;
 	} while (read_seqretry(&monotonic_lock, seq));
 
@@ -239,11 +235,11 @@ static unsigned long get_offset_pmtmr(vo
 {
 	u32 now, offset, delta = 0;
 
-	offset = offset_tick;
+	offset = offset_last;
 	now = read_pmtmr();
 	delta = (now - offset)&ACPI_PM_MASK;
 
-	return (unsigned long) offset_delay + cyc2us(delta);
+	return (unsigned long) cyc2us(delta);
 }
 
 
_

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: Updated dynamic tick patches
  2005-09-01 13:07   ` Tony Lindgren
  2005-09-01 13:19     ` David Weinehall
  2005-09-01 14:11     ` Srivatsa Vaddagiri
@ 2005-09-02 17:34     ` Srivatsa Vaddagiri
  2005-09-03 10:16       ` Tony Lindgren
  2 siblings, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-02 17:34 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Con Kolivas, linux-kernel, arjan, s0348365, tytso, cfriesen,
	rlrevell, trenn, george, johnstul, akpm

On Thu, Sep 01, 2005 at 04:07:22PM +0300, Tony Lindgren wrote:
> Srivatsa, could you try the dyntick-test.c on your system after booting
> to init=/bin/sh to make the system as idle as possible?

Tony,
	I get this output when I run your test on my SMP system with
2.6.13-mm1 + Con's latest patches (including the most recent
lost tick calculation patch that I posted after that).

Testing sub-second select and usleep
  Test: select    0ms time:  0.000012s latency:  0.000012s status: OK
  Test: usleep    0ms time:  0.000013s latency:  0.000013s status: OK
  Test: select  100ms time:  0.099386s latency: -0.000614s status: OK
  Test: usleep  100ms time:  0.104019s latency:  0.004019s status: OK
  Test: select  200ms time:  0.200013s latency:  0.000013s status: OK
  Test: usleep  200ms time:  0.204016s latency:  0.004016s status: OK
  Test: select  300ms time:  0.300043s latency:  0.000043s status: OK
  Test: usleep  300ms time:  0.304056s latency:  0.004056s status: OK
  Test: select  400ms time:  0.400010s latency:  0.000010s status: OK
  Test: usleep  400ms time:  0.404098s latency:  0.004098s status: OK
  Test: select  500ms time:  0.499992s latency: -0.000008s status: OK
  Test: usleep  500ms time:  0.504000s latency:  0.004000s status: OK
  Test: select  600ms time:  0.600050s latency:  0.000050s status: OK
  Test: usleep  600ms time:  0.603959s latency:  0.003959s status: OK
  Test: select  700ms time:  0.699969s latency: -0.000031s status: OK
  Test: usleep  700ms time:  0.704037s latency:  0.004037s status: OK
  Test: select  800ms time:  0.800026s latency:  0.000026s status: OK
  Test: usleep  800ms time:  0.803978s latency:  0.003978s status: OK
  Test: select  900ms time:  0.900046s latency:  0.000046s status: OK
  Test: usleep  900ms time:  0.904003s latency:  0.004003s status: OK
Testing multi-second select and sleep
  Test: select    0ms time:  0.000005s latency:  0.000005s status: OK
  Test:  sleep    0ms time:  0.000006s latency:  0.000006s status: OK
  Test: select 1000ms time:  1.000062s latency:  0.000062s status: OK
  Test:  sleep 1000ms time:  1.004069s latency:  0.004069s status: OK
  Test: select 2000ms time:  2.000727s latency:  0.000727s status: OK
  Test:  sleep 2000ms time:  2.004141s latency:  0.004141s status: OK
  Test: select 3000ms time:  3.000127s latency:  0.000127s status: OK
  Test:  sleep 3000ms time:  3.004048s latency:  0.004048s status: OK
  Test: select 4000ms time:  4.000032s latency:  0.000032s status: OK
  Test:  sleep 4000ms time:  4.004827s latency:  0.004827s status: OK
  Test: select 5000ms time:  5.000118s latency:  0.000118s status: OK
  Test:  sleep 5000ms time:  5.004131s latency:  0.004131s status: OK
  Test: select 6000ms time:  5.997241s latency: -0.002759s status: OK
  Test:  sleep 6000ms time:  6.008025s latency:  0.008025s status: OK
  Test: select 7000ms time:  6.997195s latency: -0.002805s status: OK
  Test:  sleep 7000ms time:  7.004180s latency:  0.004180s status: OK
  Test: select 8000ms time:  8.000512s latency:  0.000512s status: OK
  Test:  sleep 8000ms time:  8.008116s latency:  0.008116s status: OK
  Test: select 9000ms time:  8.996997s latency: -0.003003s status: OK
  Test:  sleep 9000ms time:  9.004279s latency:  0.004279s status: OK


I don't see any ERROR status. The negative latencies don't look good,
though. Do you see them too? I ran your test on my RH9-based T30 and
found several negative latencies there too.
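
For readers without dyntick-test.c at hand, the "latency" column can be thought of as measured elapsed time minus the requested timeout. The sketch below is an assumption about how such a measurement might be made; it is not Tony's actual test program, and select_latency is a hypothetical name.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Sleep for `ms` milliseconds using select() with no file descriptors,
 * then return (measured elapsed time - requested time) in seconds.
 */
static double select_latency(long ms)
{
	struct timeval start, end, timeout;
	double elapsed;

	timeout.tv_sec = ms / 1000;
	timeout.tv_usec = (ms % 1000) * 1000;

	gettimeofday(&start, NULL);
	select(0, NULL, NULL, NULL, &timeout);
	gettimeofday(&end, NULL);

	elapsed = (double)(end.tv_sec - start.tv_sec)
		+ (double)(end.tv_usec - start.tv_usec) / 1e6;
	return elapsed - (double)ms / 1000.0;
}
```

A negative return value means gettimeofday() saw less time pass than the timeout that was requested, which is what the negative entries in the output above report.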



-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c
  2005-09-02 17:25       ` [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c Srivatsa Vaddagiri
@ 2005-09-02 20:18         ` Thomas Schlichter
  2005-09-02 21:21           ` john stultz
  0 siblings, 1 reply; 96+ messages in thread
From: Thomas Schlichter @ 2005-09-02 20:18 UTC (permalink / raw)
  To: vatsa; +Cc: linux-kernel, akpm, john stultz, Con Kolivas, ck list


Hi Srivatsa,

thank you for improving your patch by fixing the two problems. Now I do have 
just two minor nits which you may consider:

1. You don't need to hold the monotonic_lock that long, it is only necessary 
when updating offset_last and monotonic_base. So I would propose something 
like this:

  offset_now = read_pmtmr();

  /* calculate tick interval */
  delta = (offset_now - offset_last) & ACPI_PM_MASK;

  /* convert to ticks */
  lost = delta / pm_ticks_per_jiffy;

  /* convert ticks to usecs */
  deltaus = cyc2us(lost * pm_ticks_per_jiffy);
  // can we use this instead: ?
  // deltaus = jiffies_to_usecs(lost);

  write_seqlock(&monotonic_lock);
  offset_last += lost * pm_ticks_per_jiffy;
  offset_last &= ACPI_PM_MASK;

  /* update the monotonic base value */
  monotonic_base += deltaus * NSEC_PER_USEC;
  write_sequnlock(&monotonic_lock);

2. Can we really ensure that the monotonic clock is still monotonic?
I think with your new code we estimate the monotonic clock value and the 
offset_last at the last tick.
But if we underestimate monotonic_base or overestimate offset_last (even 
simply by rounding errors), the time will make a small step backwards with 
the value-update.
And as far as I understand the monotonic clock, it's not that bad if it drifts a 
bit, but it is really bad if time makes steps backward...
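
The rounding concern can be shown with a small user-space C sketch. This is a simplified model under stated assumptions, not the kernel code: it works in usecs rather than nsecs, ignores the ACPI_PM_MASK wrap, takes 14319 cycles per tick from the patch description, and uses a hypothetical struct pm_state in place of the file-scope variables.

```c
#include <stdint.h>

#define PM_TICKS_PER_JIFFY 14319u	/* cycles per tick, from the patch text */

/* Same approximation as cyc2us(): usecs = cycles * 286 / 1024. */
static uint32_t cyc2us(uint32_t cycles)
{
	return (cycles * 286u) >> 10;
}

/* Hypothetical stand-in for monotonic_base and offset_last. */
struct pm_state {
	uint32_t base_us;
	uint32_t offset_last;
};

/* Models the patched mark_offset_pmtmr(): advance by whole ticks only. */
static void mark_offset(struct pm_state *s, uint32_t now)
{
	uint32_t lost = (now - s->offset_last) / PM_TICKS_PER_JIFFY;

	s->offset_last += lost * PM_TICKS_PER_JIFFY;
	s->base_us += cyc2us(lost * PM_TICKS_PER_JIFFY);
}

/* Models monotonic_clock_pmtmr(): base plus converted cycles since mark. */
static uint32_t mono_read_us(const struct pm_state *s, uint32_t now)
{
	return s->base_us + cyc2us(now - s->offset_last);
}
```

Starting from a zeroed state with the counter at 14322, mono_read_us() returns 4000 before mark_offset() and 3999 after it: cyc2us(14319) + cyc2us(3) is smaller than cyc2us(14322) because each conversion truncates separately.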

But maybe you can show me that I am wrong with my second point.
I hope I don't bother you too much with this kind of stuff...

  Thomas

P.S.: I CC'd John because he knows the monotonic clock better than I do... :-)

Am Freitag, 2. September 2005 19:25 schrieb Srivatsa Vaddagiri:
> Con,
> 	Pls use this updated "Lost tick" calculation patch, which rectifies the
> two problems Thomas pointed out. I have done some basic test with it.
>
> Would it be possible to incorporate this updated patch in
> http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch?
>
> Sorry for the inconvenience!



* Re: [PATCH 2/3] dyntick - Fix lost tick calculation in timer pm.c
  2005-09-02 20:18         ` Thomas Schlichter
@ 2005-09-02 21:21           ` john stultz
  0 siblings, 0 replies; 96+ messages in thread
From: john stultz @ 2005-09-02 21:21 UTC (permalink / raw)
  To: Thomas Schlichter; +Cc: vatsa, linux-kernel, akpm, Con Kolivas, ck list

On Fri, 2005-09-02 at 22:18 +0200, Thomas Schlichter wrote:
> 2. Can we really assure that the monotonic clock is still monotonic?
> I think with your new code we estimate the monotonic clock value and the 
> offset_last at the last tick.
> But if we underestimate monotonic_base or overestimate offset_last (even 
> simply by rounding errors), the time will make a small step backwards with 
> the value-update.
> And as far as I understand the monotonic clock its not that bad if it drifts a 
> bit, but it is really bad if time makes steps backward...
> 
> But maybe you can show me that I am wrong with my second point.
> I hope I don't bother you too much with this kind of stuff...
> 
>   Thomas
> 
> P.S.: I CC'd John because he knows the monotonic clock better than I do... :-)


Thanks Thomas, that's a good catch. Since monotonic_clock has no real
notion of interrupt edges (it was designed to be constant regardless of
whether we miss ticks), I would keep accumulating the full inter-tick
intervals, converted to usecs, into the monotonic_base.
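
One possible reading of this suggestion, sketched in user-space C: the monotonic clock accumulates the full cycle delta at every mark and remembers the exact counter value it last saw, independently of the jiffy-aligned offset used for lost-tick accounting. The separate struct mono_state and the function names are assumptions made for illustration; the sketch works in usecs and ignores the ACPI_PM_MASK wrap.

```c
#include <stdint.h>

/* Same approximation as cyc2us(): usecs = cycles * 286 / 1024. */
static uint32_t cyc2us(uint32_t cycles)
{
	return (cycles * 286u) >> 10;
}

/* Hypothetical state kept only for the monotonic clock. */
struct mono_state {
	uint32_t base_us;	/* accumulated usecs */
	uint32_t last;		/* counter value at the last mark */
};

/* Accumulate the whole interval since the last mark, not just whole ticks. */
static void mono_mark(struct mono_state *s, uint32_t now)
{
	s->base_us += cyc2us(now - s->last);
	s->last = now;
}

/* Reading: base plus converted cycles since the last mark. */
static uint32_t mono_read_us(const struct mono_state *s, uint32_t now)
{
	return s->base_us + cyc2us(now - s->last);
}
```

With this scheme a reading taken immediately after mono_mark() equals the reading taken immediately before it, so the clock cannot step backwards at a mark. Truncation in cyc2us() can still lose a fraction of a usec per interval, which shows up as slow drift rather than a backward step.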

thanks
-john




* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-08-31 17:12 ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Srivatsa Vaddagiri
  2005-08-31 22:36   ` Zachary Amsden
  2005-09-02 15:43   ` [PATCH 1/3] dynticks - implement no idle hz for x86 Con Kolivas
@ 2005-09-03  4:05   ` Lee Revell
  2005-09-03  4:18     ` Peter Williams
                       ` (4 more replies)
  2 siblings, 5 replies; 96+ messages in thread
From: Lee Revell @ 2005-09-03  4:05 UTC (permalink / raw)
  To: vatsa
  Cc: linux-kernel, arjan, s0348365, kernel, tytso, cfriesen, trenn,
	george, johnstul, akpm

On Wed, 2005-08-31 at 22:42 +0530, Srivatsa Vaddagiri wrote:
> With this patch, time had kept up really well on one particular
> machine (Intel 4way Pentium 3 box) overnight, while
> on another newer machine (Intel 4way Xeon with HT) it didnt do so
> well (time sped up after 3 or 4 hours). Hence I consider this
> particular patch will need more review/work.
> 

Are lost ticks really that common?  If so, any idea what's disabling
interrupts for so long (or if it's a hardware issue)?  And if not, it
seems like you'd need an artificial way to simulate lost ticks in order
to test this stuff.

Lee



* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-03  4:05   ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Lee Revell
@ 2005-09-03  4:18     ` Peter Williams
  2005-09-03  4:34       ` Lee Revell
  2005-09-03  5:15     ` Parag Warudkar
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 96+ messages in thread
From: Peter Williams @ 2005-09-03  4:18 UTC (permalink / raw)
  To: Lee Revell
  Cc: vatsa, linux-kernel, arjan, s0348365, kernel, tytso, cfriesen,
	trenn, george, johnstul, akpm

Lee Revell wrote:
> On Wed, 2005-08-31 at 22:42 +0530, Srivatsa Vaddagiri wrote:
> 
>>With this patch, time had kept up really well on one particular
>>machine (Intel 4way Pentium 3 box) overnight, while
>>on another newer machine (Intel 4way Xeon with HT) it didnt do so
>>well (time sped up after 3 or 4 hours). Hence I consider this
>>particular patch will need more review/work.
>>
> 
> 
> Are lost ticks really that common?  If so, any idea what's disabling
> interrupts for so long (or if it's a hardware issue)?  And if not, it
> seems like you'd need an artificial way to simulate lost ticks in order
> to test this stuff.

In my experience, turning off DMA for IDE disks is a pretty good way to 
generate lost ticks :-)

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-03  4:18     ` Peter Williams
@ 2005-09-03  4:34       ` Lee Revell
  2005-09-03  4:48         ` Peter Williams
  0 siblings, 1 reply; 96+ messages in thread
From: Lee Revell @ 2005-09-03  4:34 UTC (permalink / raw)
  To: Peter Williams
  Cc: vatsa, linux-kernel, arjan, s0348365, kernel, tytso, cfriesen,
	trenn, george, johnstul, akpm

On Sat, 2005-09-03 at 14:18 +1000, Peter Williams wrote:
> In my experience, turning off DMA for IDE disks is a pretty good way to 
> generate lost ticks :-)

For this to "work" you have to unset "unmask IRQ" with hdparm, right?

Lee



* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-03  4:34       ` Lee Revell
@ 2005-09-03  4:48         ` Peter Williams
  0 siblings, 0 replies; 96+ messages in thread
From: Peter Williams @ 2005-09-03  4:48 UTC (permalink / raw)
  To: Lee Revell
  Cc: vatsa, linux-kernel, arjan, s0348365, kernel, tytso, cfriesen,
	trenn, george, johnstul, akpm

Lee Revell wrote:
> On Sat, 2005-09-03 at 14:18 +1000, Peter Williams wrote:
> 
>>In my experience, turning off DMA for IDE disks is a pretty good way to 
>>generate lost ticks :-)
> 
> 
> For this to "work" you have to unset "unmask IRQ" with hdparm, right?

I'm not familiar with that method.  When I've experienced this it's been 
due to me accidentally not configuring IDE DMA during configuration.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-03  4:05   ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Lee Revell
  2005-09-03  4:18     ` Peter Williams
@ 2005-09-03  5:15     ` Parag Warudkar
  2005-09-03  5:30       ` Lee Revell
  2005-09-03  5:20     ` Srivatsa Vaddagiri
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 96+ messages in thread
From: Parag Warudkar @ 2005-09-03  5:15 UTC (permalink / raw)
  To: Lee Revell
  Cc: vatsa, linux-kernel, arjan, s0348365, kernel, tytso, cfriesen,
	trenn, george, johnstul, akpm

Lee Revell wrote:

> Are lost ticks really that common? If so, any idea what's disabling
>
>interrupts for so long (or if it's a hardware issue)?  And if not, it
>seems like you'd need an artificial way to simulate lost ticks in order
>to test this stuff.
>
>Lee
>  
>
Yes - I know many people with laptops who have this lost ticks problem,
so no simulation or special effort is required. If anyone wants a test
bed - my laptop is the perfect instrument.

In my case the rip is always at acpi_processor_idle nowadays. Earlier
it used to be at acpi_ec_read.

Parag


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-03  4:05   ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Lee Revell
  2005-09-03  4:18     ` Peter Williams
  2005-09-03  5:15     ` Parag Warudkar
@ 2005-09-03  5:20     ` Srivatsa Vaddagiri
  2005-09-06 10:32     ` Pavel Machek
  2005-09-06 18:04     ` john stultz
  4 siblings, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-03  5:20 UTC (permalink / raw)
  To: Lee Revell
  Cc: linux-kernel, arjan, s0348365, kernel, tytso, cfriesen, trenn,
	george, johnstul, akpm

On Sat, Sep 03, 2005 at 12:05:00AM -0400, Lee Revell wrote:
> Are lost ticks really that common?  If so, any idea what's disabling

It becomes common with a patch like dynamic ticks, where we purposefully
skip ticks when the CPU is idle. When the CPU wakes up, we have to regain
the lost/skipped ticks, and that's where I ran into incorrect lost-tick
calculation issues.

> interrupts for so long (or if it's a hardware issue)?  And if not, it
> seems like you'd need an artificial way to simulate lost ticks in order
> to test this stuff.

The dyn-tick patch is enough to simulate these lost ticks!

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-03  5:15     ` Parag Warudkar
@ 2005-09-03  5:30       ` Lee Revell
  0 siblings, 0 replies; 96+ messages in thread
From: Lee Revell @ 2005-09-03  5:30 UTC (permalink / raw)
  To: Parag Warudkar
  Cc: vatsa, linux-kernel, arjan, s0348365, kernel, tytso, cfriesen,
	trenn, george, johnstul, akpm

On Sat, 2005-09-03 at 01:15 -0400, Parag Warudkar wrote:
> Lee Revell wrote:
> 
> > Are lost ticks really that common? If so, any idea what's disabling
> >
> >interrupts for so long (or if it's a hardware issue)?  And if not, it
> >seems like you'd need an artificial way to simulate lost ticks in order
> >to test this stuff.
> >
> >Lee
> >  
> >
> Yes - I know many people with laptops who have this lost ticks problem. 
> So no simulation and/or
> special efforts required.  If anyone wants a test bed - my laptop is the 
> perfect instrument.
> 
> In my case the rip is always as acpi_processor_idle now a days. Earlier 
> it used to be at acpi_ec_read.

Ah, OK, I forgot about SMM traps.

Lee



* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-02 16:56     ` [PATCH 1/3] dynticks - implement no idle hz for x86 Russell King
  2005-09-02 17:12       ` Srivatsa Vaddagiri
@ 2005-09-03  6:13       ` Con Kolivas
  2005-09-03  7:58         ` Russell King
  1 sibling, 1 reply; 96+ messages in thread
From: Con Kolivas @ 2005-09-03  6:13 UTC (permalink / raw)
  To: Russell King; +Cc: vatsa, linux-kernel, akpm, ck list

On Sat, 3 Sep 2005 02:56, Russell King wrote:
> On Sat, Sep 03, 2005 at 01:43:57AM +1000, Con Kolivas wrote:
> > Ok I've resynced all the patches with 2.6.13-mm1, made some cleanups and
> > minor modifications. As pm timer is the only supported timer for dynticks
> > I've also made it depend on it.
> >
> > A rollup patch against 2.6.13-mm1 is here:
> >
> > http://ck.kolivas.org/patches/dyn-ticks/2.6.13-mm1-dtck1.patch
> >
> > also available in the dyn-ticks directory are the older patches and these
> > split out patches posted here.
>
> Are you guys going to sync your interfaces with what ARM has, or are
> we going to have two differing dyntick interfaces in the kernel, one
> for ARM and one for x86?
>
> I mentioned this before.  I seem to be ignored.

RMK

No one's ignoring you. 

What we need to do is ensure that dynamic ticks is working properly on x86 and 
worth including before anything else. If and when we confirm this, only then 
does it make sense to try to merge code from the other 2 architectures into as 
much common code as possible, as no doubt we'll be modifying other 
architectures we're less familiar with. At that stage we will definitely want 
to tread even more cautiously.

Cheers,
Con


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-03  6:13       ` Con Kolivas
@ 2005-09-03  7:58         ` Russell King
  2005-09-03  8:01           ` Con Kolivas
  0 siblings, 1 reply; 96+ messages in thread
From: Russell King @ 2005-09-03  7:58 UTC (permalink / raw)
  To: Con Kolivas; +Cc: vatsa, linux-kernel, akpm, ck list

On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> Noone's ignoring you. 
> 
> What we need to do is ensure that dynamic ticks is working properly on x86 and 
> worth including before anything else. If and when we confirm this it makes 
> sense only then to try and merge code from the other 2 architectures to as 
> much common code as possible as no doubt we'll be modifying other 
> architectures we're less familiar with. At that stage we will definitely want 
> to tread even more cautiously at that stage.

dyntick has all the hallmarks of ending up another mess just like the
"generic" (hahaha) irq stuff in kernel/irq - it's being developed in
precisely the same way - by ignoring non-x86 stuff.

I can well see that someone will say "ok, this is ready, merge it"
at which point we then end up with not only multiple differing
userspace methods of controlling it depending on the architecture,
but multiple differing kernel interfaces as well.

Indeed, you seem to be at the point where you'd like akpm to merge
it.  That sets alarm bells ringing if you haven't considered these
issues.

I want to avoid that.  Just because a couple of people say "we'll
deal with that later" it's no guarantee that it _will_ happen.  I
want to ensure that ARM doesn't get fscked over again like it did
with the generic IRQ crap.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-03  7:58         ` Russell King
@ 2005-09-03  8:01           ` Con Kolivas
  2005-09-03  8:06             ` Russell King
  0 siblings, 1 reply; 96+ messages in thread
From: Con Kolivas @ 2005-09-03  8:01 UTC (permalink / raw)
  To: Russell King; +Cc: vatsa, linux-kernel, akpm, ck list

On Sat, 3 Sep 2005 17:58, Russell King wrote:
> On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> > Noone's ignoring you.
> >
> > What we need to do is ensure that dynamic ticks is working properly on
> > x86 and worth including before anything else. If and when we confirm this
> > it makes sense only then to try and merge code from the other 2
> > architectures to as much common code as possible as no doubt we'll be
> > modifying other architectures we're less familiar with. At that stage we
> > will definitely want to tread even more cautiously at that stage.
>
> dyntick has all the hallmarks of ending up another mess just like the
> "generic" (hahaha) irq stuff in kernel/irq - it's being developed in
> precisely the same way - by ignore non-x86 stuff.
>
> I can well see that someone will say "ok, this is ready, merge it"
> at which point we then end up with multiple differing userspace
> methods of controlling it depending on the architecture, but
> multiple differing kernel interfaces as well.
>
> Indeed, you seem to be at the point where you'd like akpm to merge
> it.  That sets alarm bells ringing if you haven't considered these
> issues.
>
> I want to avoid that.  Just because a couple of people say "we'll
> deal with that later" it's no guarantee that it _will_ happen.  I
> want to ensure that ARM doesn't get fscked over again like it did
> with the generic IRQ crap.

Ok I'll make it clearer. We don't merge x86 dynticks to mainline till all are 
consolidated in -mm.

Con

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-03  8:01           ` Con Kolivas
@ 2005-09-03  8:06             ` Russell King
  2005-09-03  8:14               ` Con Kolivas
  0 siblings, 1 reply; 96+ messages in thread
From: Russell King @ 2005-09-03  8:06 UTC (permalink / raw)
  To: Con Kolivas; +Cc: vatsa, linux-kernel, akpm, ck list

On Sat, Sep 03, 2005 at 06:01:08PM +1000, Con Kolivas wrote:
> On Sat, 3 Sep 2005 17:58, Russell King wrote:
> > On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> > > Noone's ignoring you.
> > >
> > > What we need to do is ensure that dynamic ticks is working properly on
> > > x86 and worth including before anything else. If and when we confirm this
> > > it makes sense only then to try and merge code from the other 2
> > > architectures to as much common code as possible as no doubt we'll be
> > > modifying other architectures we're less familiar with. At that stage we
> > > will definitely want to tread even more cautiously at that stage.
> >
> > dyntick has all the hallmarks of ending up another mess just like the
> > "generic" (hahaha) irq stuff in kernel/irq - it's being developed in
> > precisely the same way - by ignore non-x86 stuff.
> >
> > I can well see that someone will say "ok, this is ready, merge it"
> > at which point we then end up with multiple differing userspace
> > methods of controlling it depending on the architecture, but
> > multiple differing kernel interfaces as well.
> >
> > Indeed, you seem to be at the point where you'd like akpm to merge
> > it.  That sets alarm bells ringing if you haven't considered these
> > issues.
> >
> > I want to avoid that.  Just because a couple of people say "we'll
> > deal with that later" it's no guarantee that it _will_ happen.  I
> > want to ensure that ARM doesn't get fscked over again like it did
> > with the generic IRQ crap.
> 
> Ok I'll make it clearer. We don't merge x86 dynticks to mainline till all are 
> consolidated in -mm.

Does this mean you're seriously going to rewrite bits of it after
you've spent what seems like months sorting out all the problems
currently being found?

Excuse me for being stupid, but I somehow don't see that happening.
Those months would be effectively wasted effort, both on the side
of the people working on the patches and those testing them.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-03  8:06             ` Russell King
@ 2005-09-03  8:14               ` Con Kolivas
  2005-09-04 20:10                 ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Con Kolivas @ 2005-09-03  8:14 UTC (permalink / raw)
  To: Russell King; +Cc: vatsa, linux-kernel, akpm, ck list

On Sat, 3 Sep 2005 18:06, Russell King wrote:
> On Sat, Sep 03, 2005 at 06:01:08PM +1000, Con Kolivas wrote:
> > On Sat, 3 Sep 2005 17:58, Russell King wrote:
> > > On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> > > > Noone's ignoring you.
> > > >
> > > > What we need to do is ensure that dynamic ticks is working properly
> > > > on x86 and worth including before anything else. If and when we
> > > > confirm this it makes sense only then to try and merge code from the
> > > > other 2 architectures to as much common code as possible as no doubt
> > > > we'll be modifying other architectures we're less familiar with. At
> > > > that stage we will definitely want to tread even more cautiously at
> > > > that stage.
> > >
> > > dyntick has all the hallmarks of ending up another mess just like the
> > > "generic" (hahaha) irq stuff in kernel/irq - it's being developed in
> > > precisely the same way - by ignore non-x86 stuff.
> > >
> > > I can well see that someone will say "ok, this is ready, merge it"
> > > at which point we then end up with multiple differing userspace
> > > methods of controlling it depending on the architecture, but
> > > multiple differing kernel interfaces as well.
> > >
> > > Indeed, you seem to be at the point where you'd like akpm to merge
> > > it.  That sets alarm bells ringing if you haven't considered these
> > > issues.
> > >
> > > I want to avoid that.  Just because a couple of people say "we'll
> > > deal with that later" it's no guarantee that it _will_ happen.  I
> > > want to ensure that ARM doesn't get fscked over again like it did
> > > with the generic IRQ crap.
> >
> > Ok I'll make it clearer. We don't merge x86 dynticks to mainline till all
> > are consolidated in -mm.
>
> Does this mean you're seriously going to rewrite bits of it after
> you've spent what seems like months sorting out all the problems
> currently being found?
>
> Excuse me for being stupid, but I somehow don't see that happening.
> Those months would be effectively wasted effort, both on the side
> of the people working on the patches and those testing them.

I've personally been on this code for 3 separate days in total and have no 
deadline or requirement for this to go in ever so I should stop speaking on 
behalf of the others.

Cheers,
Con

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: Updated dynamic tick patches
  2005-09-02 17:34     ` Srivatsa Vaddagiri
@ 2005-09-03 10:16       ` Tony Lindgren
  0 siblings, 0 replies; 96+ messages in thread
From: Tony Lindgren @ 2005-09-03 10:16 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Tony Lindgren, Con Kolivas, linux-kernel, arjan, s0348365, tytso,
	cfriesen, rlrevell, trenn, george, johnstul, akpm

On Fri, Sep 02, 2005 at 11:04:32PM +0530, Srivatsa Vaddagiri wrote:
> On Thu, Sep 01, 2005 at 04:07:22PM +0300, Tony Lindgren wrote:
> > Srivatsa, could you try the dyntick-test.c on your system after booting
> > to init=/bin/sh to make the system as idle as possible?
> 
> Tony,
> 	I get this o/p when I run your test on my SMP system with
> 2.6.13-mm1 + Con's latest patches (including the most recent
> lost tick calculation patch that I posted after that).
...
> 
> Don't see any ERROR status. The negative latencies doesn't seem to sound
> good. Do you see them too? I ran your test on my RH9 based T30 and
> find several negative latencies there too.

Good, thanks for testing.

>  Test: select 3000ms time:  3.000127s latency:  0.000127s status: OK

This is where I started seeing errors. It looks like something gets
confused if the next event from next_timer_interrupt() is more than HZ
ticks away and the idle HZ is very low, such as 3 - 4 HZ.

I'll be looking into it more a bit later on, but until the problem
is solved, we should limit MAX_SKIP_TICKS to HZ/2.
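
A minimal sketch of that clamp, written as plain user-space C so it can be
compiled on its own. The helper name dyn_tick_skip() and its hz parameter
are illustrative assumptions and are not taken from the patch; only the
HZ/2 cap comes from the suggestion above.

```c
#include <assert.h>

/*
 * Return how many ticks may be skipped, given the current jiffies value
 * ('now'), the jiffies value of the next pending timer ('next_event', as
 * next_timer_interrupt() would report it) and the tick rate ('hz').
 * The result is capped at hz / 2, i.e. the proposed MAX_SKIP_TICKS.
 */
static unsigned long dyn_tick_skip(unsigned long now,
                                   unsigned long next_event,
                                   unsigned long hz)
{
        unsigned long max_skip_ticks = hz / 2;
        long delta = (long)(next_event - now);  /* wrap-safe difference */

        assert(hz >= 2);        /* otherwise the cap would be zero */

        if (delta <= 0)
                return 0;       /* timer already due: do not skip */
        if ((unsigned long)delta > max_skip_ticks)
                return max_skip_ticks;
        return (unsigned long)delta;
}
```

With hz = 1000, an event 3000 ticks away would be cut back to 500 ticks,
so the CPU would wake at least twice a second while idle.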

Tony

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-03  8:14               ` Con Kolivas
@ 2005-09-04 20:10                 ` Nishanth Aravamudan
  2005-09-04 20:26                   ` Russell King
  2005-09-05  7:00                   ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-04 20:10 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Russell King, vatsa, linux-kernel, akpm, ck list

On 03.09.2005 [18:14:48 +1000], Con Kolivas wrote:
> On Sat, 3 Sep 2005 18:06, Russell King wrote:
> > On Sat, Sep 03, 2005 at 06:01:08PM +1000, Con Kolivas wrote:
> > > On Sat, 3 Sep 2005 17:58, Russell King wrote:
> > > > On Sat, Sep 03, 2005 at 04:13:10PM +1000, Con Kolivas wrote:
> > > > > Noone's ignoring you.
> > > > >
> > > > > What we need to do is ensure that dynamic ticks is working properly
> > > > > on x86 and worth including before anything else. If and when we
> > > > > confirm this it makes sense only then to try and merge code from the
> > > > > other 2 architectures to as much common code as possible as no doubt
> > > > > we'll be modifying other architectures we're less familiar with. At
> > > > > that stage we will definitely want to tread even more cautiously at
> > > > > that stage.
> > > >
> > > > dyntick has all the hallmarks of ending up another mess just like the
> > > > "generic" (hahaha) irq stuff in kernel/irq - it's being developed in
> > > > precisely the same way - by ignore non-x86 stuff.
> > > >
> > > > I can well see that someone will say "ok, this is ready, merge it"
> > > > at which point we then end up with multiple differing userspace
> > > > methods of controlling it depending on the architecture, but
> > > > multiple differing kernel interfaces as well.
> > > >
> > > > Indeed, you seem to be at the point where you'd like akpm to merge
> > > > it.  That sets alarm bells ringing if you haven't considered these
> > > > issues.
> > > >
> > > > I want to avoid that.  Just because a couple of people say "we'll
> > > > deal with that later" it's no guarantee that it _will_ happen.  I
> > > > want to ensure that ARM doesn't get fscked over again like it did
> > > > with the generic IRQ crap.
> > >
> > > Ok I'll make it clearer. We don't merge x86 dynticks to mainline till all
> > > are consolidated in -mm.
> >
> > Does this mean you're seriously going to rewrite bits of it after
> > you've spent what seems like months sorting out all the problems
> > currently being found?
> >
> > Excuse me for being stupid, but I somehow don't see that happening.
> > Those months would be effectively wasted effort, both on the side
> > of the people working on the patches and those testing them.
> 
> I've personally been on this code for 3 separate days in total and have no 
> deadline or requirement for this to go in ever so I should stop speaking on 
> behalf of the others.

To join in this conversation late:

I've got a few ideas that I think might help push Con's patch coalescing
efforts in an arch-independent fashion.

First of all, and maybe this is just me, I think it would be good to
make the dyn_tick_timer per-interrupt source, as opposed to each arch?
Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
APIC, ACPI PM-timer and the HPET. These structures could be put in
arch-specific timer.c files (there currently is not one for x86, I
believe). Then, at compilation time, the appropriate structure would be
linked to the arch-generic code. That should make the arch-independent
code simple to implement (I do have some patches in the works, but it's
slow going, right now, sorry). I think ARM and s390 could perhaps use
this infrastructure as well?
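
Roughly what I have in mind, as a user-space sketch rather than kernel
code. All names here (the two fake sources, dyn_tick_register(),
dyn_tick_reprogram_generic()) are made up for illustration, and the
"hardware" is just a variable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-interrupt-source descriptor. */
struct dyn_tick_timer {
        const char      *name;
        void            (*reprogram)(unsigned long ticks);
};

/* --- arch-specific side (would live in an arch timer file) --- */
static unsigned long pit_next, hpet_next;  /* stand-ins for hardware registers */

static void pit_reprogram(unsigned long ticks)  { pit_next = ticks; }
static void hpet_reprogram(unsigned long ticks) { hpet_next = ticks; }

static struct dyn_tick_timer pit_dyn_tick  = { "pit",  pit_reprogram };
static struct dyn_tick_timer hpet_dyn_tick = { "hpet", hpet_reprogram };

/* --- arch-independent side: knows only the structure --- */
static struct dyn_tick_timer *dyn_tick;    /* source currently in use */

static void dyn_tick_register(struct dyn_tick_timer *t)
{
        assert(t != NULL && t->reprogram != NULL);
        dyn_tick = t;
}

static void dyn_tick_reprogram_generic(unsigned long ticks)
{
        if (dyn_tick)           /* nothing registered: keep ticking */
                dyn_tick->reprogram(ticks);
}
```

The generic half never names a particular interrupt source, so whether the
structure is chosen at link time or at boot makes no difference to it.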

Also, I am a bit confused by the use of "dynamic-tick" to describe these
changes. To me, these are all NO_IDLE_HZ implementations, as they are
only invoked from cpu_idle() (or their equivalent) routines. I know this
is true of s390 and the x86 code, and I believe it is true of the ARM
code? If it were dynamic-tick, I would think we would be adjusting the
timer interrupt frequency continuously (e.g., at the end of
__run_timers() and at every call to {add,mod,del}_timer()). I was
working on a patch which did some renaming to no_idle_hz_timer, etc.,
but it's mostly code churn :)

Con, wrt the x86 implementation, I think the max_skip field should be
a member of the interrupt source (dyn_tick_timer) structure, as opposed
to the state. This would require dyn_tick_reprogram_timer() to change
slightly: either push the max_skip check into arch-specific code (and
then have arch_reprogram() return the actual number of jiffies
programmed to skip) or simply change the check conditional.

Also, what exactly is the purpose of conditional_run_local_timers()? It
seems identical to run_local_timers(), except you check for inequality
before potentially raising the softirq. It seems like the conditional
check in run_timer_softirq() [the TIMER_SOFTIRQ callback] will achieve
the same thing? And, in fact, I think that conditional is always true?
At the end of __run_timers, base->timer_jiffies should be greater than
jiffies by 1.

In any case, sorry for all the words and no code... I will try and
rectify that soon. I think it *is* possible to do some architecting now,
so that other architectures can also easily implement no_idle_hz.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-04 20:10                 ` Nishanth Aravamudan
@ 2005-09-04 20:26                   ` Russell King
  2005-09-04 20:37                     ` Nishanth Aravamudan
                                       ` (2 more replies)
  2005-09-05  7:00                   ` Srivatsa Vaddagiri
  1 sibling, 3 replies; 96+ messages in thread
From: Russell King @ 2005-09-04 20:26 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: Con Kolivas, vatsa, linux-kernel, akpm, ck list

On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> I've got a few ideas that I think might help push Con's patch coalescing
> efforts in an arch-independent fashion.

Note that ARM contains cleanups on top of Tony's original work, on
which the x86 version is based.

Basically, Tony submitted his ARM version, we discussed it, fixed up
some locking problems and simplified it (it contained multiple
structures which weren't necessary, even in multiple timer-based systems).

I'd be really surprised if any architecture couldn't use what ARM has
today - in other words, this is the only kernel-side interface:

#ifdef CONFIG_NO_IDLE_HZ

#define DYN_TICK_SKIPPING       (1 << 2)
#define DYN_TICK_ENABLED        (1 << 1)
#define DYN_TICK_SUITABLE       (1 << 0)

struct dyn_tick_timer {
        unsigned int    state;                  /* Current state */
        int             (*enable)(void);        /* Enables dynamic tick */
        int             (*disable)(void);       /* Disables dynamic tick */
        void            (*reprogram)(unsigned long); /* Reprograms the timer */
        int             (*handler)(int, void *, struct pt_regs *);
};

void timer_dyn_reprogram(void);
#else
#define timer_dyn_reprogram()   do { } while (0)
#endif

> First of all, and maybe this is just me, I think it would be good to
> make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> APIC, ACPI PM-timer and the HPET. These structures could be put in
> arch-specific timer.c files (there currently is not one for x86, I
> believe).

Each timer source should have its own struct dyn_tick_timer.  On x86,
maybe it makes sense having a pointer in the init_timer_opts or timer_opts
structures?

> I think ARM and s390 could perhaps use this infrastructure as well?

ARM already has a well thought-out encapsulation which is 100% suited to
its needs - which are essentially the same as x86 - the ability to select
one of several timer sources at boot time.

I would suggest having a good look at the ARM implementation.  See:
 include/asm-arm/mach/time.h (bit quoted above)
 arch/arm/kernel/irq.c (to update system time before calling any irq handler)
 arch/arm/kernel/time.c (initialisation and sysfs interface, etc)
 arch/arm/mach-sa1100/time.c, arch/arm/mach-pxa/time.c, and
 arch/arm/mach-omap1/time.c (dyntick machine class implementations).

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-04 20:26                   ` Russell King
@ 2005-09-04 20:37                     ` Nishanth Aravamudan
  2005-09-04 21:17                       ` Russell King
                                         ` (2 more replies)
  2005-09-04 20:41                     ` Nishanth Aravamudan
  2005-09-05  5:32                     ` Srivatsa Vaddagiri
  2 siblings, 3 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-04 20:37 UTC (permalink / raw)
  To: Con Kolivas, vatsa, linux-kernel, akpm, ck list

On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > I've got a few ideas that I think might help push Con's patch coalescing
> > efforts in an arch-independent fashion.
> 
> Note that ARM contains cleanups on top of Tony's original work, on
> which the x86 version is based.
> 
> Basically, Tony submitted his ARM version, we discussed it, fixed up
> some locking problems and simplified it (it contained multiple
> structures which weren't necessary, even in multiple timer-based systems).

Makes sense. Thanks for the quick feedback!

> I'd be really surprised if any architecture couldn't use what ARM has
> today - in other words, this is the only kernel-side interface:
> 
> #ifdef CONFIG_NO_IDLE_HZ
> 
> #define DYN_TICK_SKIPPING       (1 << 2)
> #define DYN_TICK_ENABLED        (1 << 1)
> #define DYN_TICK_SUITABLE       (1 << 0)
> 
> struct dyn_tick_timer {
>         unsigned int    state;                  /* Current state */
>         int             (*enable)(void);        /* Enables dynamic tick */
>         int             (*disable)(void);       /* Disables dynamic tick */
>         void            (*reprogram)(unsigned long); /* Reprograms the timer */
>         int             (*handler)(int, void *, struct pt_regs *);
> };
> 
> void timer_dyn_reprogram(void);
> #else
> #define timer_dyn_reprogram()   do { } while (0)
> #endif

That looks great! So I guess I'm just suggesting moving this from
include/asm-arm/mach/time.h to arch-independent headers? Perhaps
timer.h is the best place for now, as it already contains the
next_timer_interrupt() prototype (which probably should be in the #ifdef
with timer_dyn_reprogram()).

> > First of all, and maybe this is just me, I think it would be good to
> > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> > Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> > APIC, ACPI PM-timer and the HPET. These structures could be put in
> > arch-specific timer.c files (there currently is not one for x86, I
> > believe).
> 
> Each timer source should have its own struct dyn_tick_timer.  On x86,
> maybe it makes sense having a pointer in the init_timer_opts or timer_opts
> structures?

Well, I know John Stultz is not a big fan of timer_opts, and is trying
to get rid of it :) timer_opts is supposed to be for timesources, I
believe, which are distinct from interrupt sources (e.g., TSC, Cyclone,
etc.), whereas I think dyn-tick is dealing with interrupt sources. I
guess if hardware (like the acpi_pm) can do both, there could be some
sort of inter-hooking.

> > I think ARM and s390 could perhaps use this infrastructure as well?
> 
> ARM already has a well thought-out encapsulation which is 100% suited to
> its needs - which are essentially the same as x86 - the ability to select
> one of several timer sources at boot time.
> 
> I would suggest having a good look at the ARM implementation.  See:
>  include/asm-arm/mach/time.h (bit quoted above)
>  arch/arm/kernel/irq.c (to update system time before calling any irq handler)
>  arch/arm/kernel/time.c (initialisation and sysfs interface, etc)
>  arch/arm/mach-sa1100/time.c, arch/arm/mach-pxa/time.c, and
>  arch/arm/mach-omap1/time.c (dyntick machine class implementations).

Yeah, I took a quick look before sending out my mail, but obviously need
to study it more. Thanks for the pointers! I guess that the time.h,
irq.c and time.c bits could all (or mostly) be done in arch-independent
code? I agree that your encapsulation seems to be suited to most arch's
use of NO_IDLE_HZ.

Overall, though, do you agree it would be best to have the common code
in a common file? If so, I'll work harder on getting some patches out.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-04 20:26                   ` Russell King
  2005-09-04 20:37                     ` Nishanth Aravamudan
@ 2005-09-04 20:41                     ` Nishanth Aravamudan
  2005-09-05  5:32                     ` Srivatsa Vaddagiri
  2 siblings, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-04 20:41 UTC (permalink / raw)
  To: Con Kolivas, vatsa, linux-kernel, akpm, ck list

On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > I've got a few ideas that I think might help push Con's patch coalescing
> > efforts in an arch-independent fashion.
> 
> Note that ARM contains cleanups on top of Tony's original work, on
> which the x86 version is based.
> 
> Basically, Tony submitted his ARM version, we discussed it, fixed up
> some locking problems and simplified it (it contained multiple
> structures which weren't necessary, even in multiple timer-based systems).

<snip>

> > First of all, and maybe this is just me, I think it would be good to
> > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> > Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> > APIC, ACPI PM-timer and the HPET. These structures could be put in
> > arch-specific timer.c files (there currently is not one for x86, I
> > believe).
> 
> Each timer source should have its own struct dyn_tick_timer.  On x86,
> maybe it makes sense having a pointer in the init_timer_opts or timer_opts
> structures?

Just to be clear, I think we mean the same thing with timer source and
interrupt source. But I believe time sources are distinct (which is why,
I think, John hates the naming (his own) of timer_opts).

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-04 20:37                     ` Nishanth Aravamudan
@ 2005-09-04 21:17                       ` Russell King
  2005-09-05  3:08                       ` Con Kolivas
  2005-09-05  6:58                       ` Tony Lindgren
  2 siblings, 0 replies; 96+ messages in thread
From: Russell King @ 2005-09-04 21:17 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: Con Kolivas, vatsa, linux-kernel, akpm, ck list

On Sun, Sep 04, 2005 at 01:37:55PM -0700, Nishanth Aravamudan wrote:
> That looks great! So I guess I'm just suggesting moving this from
> include/asm-arch/mach/time.h to arch-independent headers? Perhaps
> timer.h is the best place for now, as it already contains the
> next_timer_interrupt() prototype (which probably should be in the #ifdef
> with timer_dyn_reprogram()).

Sounds great!

> Overall, though, do you agree it would be best to have the common code
> in a common file? If so, I'll work harder on getting some patches out.

Absolutely, with the proviso that ARM doesn't (yet) use the generic IRQ
code.  I say "(yet)" there because there are some folk working in this
area, and I've recently merged a couple of bits which reduce the impact
of their patches.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-04 20:37                     ` Nishanth Aravamudan
  2005-09-04 21:17                       ` Russell King
@ 2005-09-05  3:08                       ` Con Kolivas
  2005-09-05 16:28                         ` Nishanth Aravamudan
  2005-09-05  6:58                       ` Tony Lindgren
  2 siblings, 1 reply; 96+ messages in thread
From: Con Kolivas @ 2005-09-05  3:08 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: vatsa, linux-kernel, akpm, ck list

On Mon, 5 Sep 2005 06:37 am, Nishanth Aravamudan wrote:
> On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > I've got a few ideas that I think might help push Con's patch
> > > coalescing efforts in an arch-independent fashion.

Thanks very much Nish!

I've updated the patches here http://ck.kolivas.org/patches/dyn-ticks/ with 
the latest change to timer_pm.c that Srivatsa sent me and have a new rollup 
there as well as the split out patches. The ball is in Nish's court now so 
we'll avoid touching the code till you get back to us (this project needs 
some form of locking ;) ).

Cheers,
Con

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-04 20:26                   ` Russell King
  2005-09-04 20:37                     ` Nishanth Aravamudan
  2005-09-04 20:41                     ` Nishanth Aravamudan
@ 2005-09-05  5:32                     ` Srivatsa Vaddagiri
  2005-09-05  5:48                       ` Nishanth Aravamudan
  2005-09-05  7:37                       ` Russell King
  2 siblings, 2 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05  5:32 UTC (permalink / raw)
  To: Nishanth Aravamudan, Con Kolivas, linux-kernel, akpm, ck list,
	rmk+lkml

On Sun, Sep 04, 2005 at 09:26:16PM +0100, Russell King wrote:
> I'd be really surprised if any architecture couldn't use what ARM has
> today - in other words, this is the only kernel-side interface:

Russell,
	I went through the ARM implementation and have some remarks (mostly
from an SMP perspective):

1. On an SMP platform, we want to let individual CPUs "sleep" independent of
   each other. What this means is there has to be some way of tracking which
   CPUs are currently sleeping, so that code like RCU ignores sleeping CPUs.
   This was the reason the nohz_cpu_mask bitmap was added. I don't see that
   bitmap being updated at all in the ARM implementation.

2. On architectures like x86 there is a separate jiffy interrupt source 
   (PIT) which is used to update time-of-day. This is different from the
   HZ timer interrupts used on each CPU (local apic timer). When all 
   CPUs are idle and sleeping, we want to shut off this PIT timer as well.
   That's why I added 'arch_all_cpus_idle' interface. One could argue that
   this can be done as part of the dyn_tick->reprogram interface as well,
   but I felt that having a separate arch_all_cpus_idle is cleaner and
   makes it clear what its purpose is.

3. The fact that we want to manipulate the bitmap (set a bit when a CPU is
   going idle and unset it when it is waking up) _and_ the fact that we want
   to take some action when all CPUs are idle or when the first CPU is waking
   up requires the use of a spinlock, which is again not present in the ARM
   implementation.

4. Again the fact that CPUs could be sleeping independent of each other
   requires do_IRQ to check out whether the current CPU was sleeping as 
   its first step. If the CPU was sleeping, it needs to unset itself
   from the bitmap _and_ if we are coming out of "all-cpu-asleep" state,
   the PIT timer needs to be restarted as well as time recovered. Note
   that these two steps need not be undertaken if we were not in 
   "all-cpus-asleep" state.

I don't see provisions for all these in the current ARM implementation.
In fact the x86 patch that Tony/Con posted didn't take most of these into
account either, which is the reason I jumped in to fix the above issues.

5. I don't see how DYN_TICK_SKIPPING is being used. In an SMP scenario,
   it doesn't make sense since it would have to be per-cpu. The bitmap
   that I talked of tells exactly that (whether a CPU is skipping
   ticks or not).

6. S390 makes use of notifier mechanism to notify when CPUs are coming
   in and out of idle state. Don't know how it will be used in other
   arches. But obviously, if we are talking of unifying, we have to
   provide one.

I hope this makes clear why some of the rework happened, which
in a way is extending the interface that ARM already has. Having
said all this, I do agree that having a consistent interface
is good (for example: x86 has a dyn_tick_state structure whereas
ARM uses the dyn_tick_timer structure itself to store the state etc).
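
To make points 1-4 concrete, here is a small user-space model of the
bookkeeping. It is only a sketch: the lock is a C11 atomic_flag standing in
for a spinlock, pit_running stands in for starting and stopping the PIT,
and dyn_tick_cpu_idle() / dyn_tick_cpu_wakeup() are illustrative names, not
the functions in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4

static atomic_flag dyn_tick_lock = ATOMIC_FLAG_INIT;    /* spinlock stand-in */
static unsigned long nohz_cpu_mask;     /* bit n set => CPU n is skipping ticks */
static const unsigned long cpu_online_mask = (1UL << NR_CPUS) - 1;
static int pit_running = 1;             /* stand-in for the jiffy interrupt source */

static void lock(void)   { while (atomic_flag_test_and_set(&dyn_tick_lock)) ; }
static void unlock(void) { atomic_flag_clear(&dyn_tick_lock); }

/* Called from the idle loop when 'cpu' is about to skip ticks. */
static void dyn_tick_cpu_idle(int cpu)
{
        assert(cpu >= 0 && cpu < NR_CPUS);
        lock();
        nohz_cpu_mask |= 1UL << cpu;
        if (nohz_cpu_mask == cpu_online_mask)
                pit_running = 0;        /* what arch_all_cpus_idle() is for */
        unlock();
}

/*
 * Called first thing on interrupt entry.  Returns 1 if this CPU is the
 * first to leave the all-CPUs-asleep state, i.e. lost time must be
 * recovered; returns 0 otherwise.
 */
static int dyn_tick_cpu_wakeup(int cpu)
{
        int recover = 0;

        assert(cpu >= 0 && cpu < NR_CPUS);
        lock();
        if (nohz_cpu_mask & (1UL << cpu)) {
                if (nohz_cpu_mask == cpu_online_mask) {
                        pit_running = 1;        /* restart the jiffy source */
                        recover = 1;
                }
                nohz_cpu_mask &= ~(1UL << cpu);
        }
        unlock();
        return recover;
}
```

The mask update and the "was everybody asleep?" test have to happen under
the same lock, otherwise two CPUs waking together could both (or neither)
decide to restart the PIT.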
   

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  5:32                     ` Srivatsa Vaddagiri
@ 2005-09-05  5:48                       ` Nishanth Aravamudan
  2005-09-05  6:32                         ` Srivatsa Vaddagiri
  2005-09-05  7:37                       ` Russell King
  1 sibling, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05  5:48 UTC (permalink / raw)
  To: Srivatsa Vaddagiri; +Cc: Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On 05.09.2005 [11:02:25 +0530], Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 09:26:16PM +0100, Russell King wrote:
> > I'd be really surprised if any architecture couldn't use what ARM has
> > today - in other words, this is the only kernel-side interface:
> 
> Russel,
> 	I went thr' the ARM implementation and have some remarks (mostly
> from a SMP perspective):
> 
> 1. On a SMP platform, we want to let individual CPUs "sleep" independent of 
>    each other. What this mean is there has to be some way of tracking which
>    CPU's are sleeping currently, so that code like RCU ignores sleeping CPUs.
>    This was the reason nohz_cpu_mask bitmap was added. I don't see that
>    bitmap being updated at all in ARM implementation.

Admittedly, I don't think SMP ARM has been around all that long? Maybe
the existing code just has not been extended.

> 2. On architectures like x86 there is a separate jiffy interrupt source 
>    (PIT) which is used to update time-of-day. This is different from the
>    HZ timer interrupts used on each CPU (local apic timer). When all 
>    CPUs are idle and sleeping, we want to shut off this PIT timer as well.
>    That's why I added 'arch_all_cpus_idle' interface. One could argue that
>    this can be done as part of the dyn_tick->reprogram interface as well,
>    but I felt that having a separate arch_all_cpus_idle is cleaner and
>    makes it clear what its purpose is.

I'm not sure on this. It's going to be NULL for other architectures, or
end up being called by the reprogram() call for the last CPU to go idle,
right (presuming there isn't a separate TOD source, like in x86). I
think it is better to be in the reprogram() interface.

> 3. The fact that we want to manipulate the bitmap (set a bit when CPU is going
>    idle and unset it when it is waking up) _and_ the fact that want to take
>    some action when all CPUs are idle or when the first CPU is waking up, 
>    requires the use of a spinlock, which is again not present in the ARM 
>    implementation.

This might just be tied to the same UP capabilities, I'm not sure.

> 4. Again the fact that CPUs could be sleeping independent of each other
>    requires do_IRQ to check out whether the current CPU was sleeping as 
>    its first step. If the CPU was sleeping, it needs to unset itself
>    from the bitmap _and_ if we are coming out of "all-cpu-asleep" state,
>    the PIT timer needs to be restarted as well as time recovered. Note
>    that these two steps need not be undertaken if we were not in 
>    "all-cpus-asleep" state.

I agree; the latter can be in the arch-specific reprogram() code, just
like arch_all_cpus_idle() (which might be better named to
arch_set_all_cpus_idle()).

> I don't see provisions for all these in the current ARM
> implementation.  In fact the x86 patch that Tony/Con posted didnt take
> into account most of these as well, which is the reason I jumped in to
> fix the above issues.

I definitely appreciate you doing so; dyn-tick for x86 has clearly come
a long way in a short time.

> 5. Don't see how DYN_TICK_SKIPPING is being used. In SMP scenario,
>    it doesnt make sense since it will have to be per-cpu. The bitmap
>    that I talked of exactly tells that (whether a CPU is skipping
>    ticks or not).

I'll take a look at this.

> 6. S390 makes use of notifier mechanism to notify when CPUs are coming
>    in and out of idle state. Don't know how it will be used in other
>    arches. But obviously, if we are talking of unifying, we have to
>    provide one.

Couldn't that be part of the s390-specific init() code? That member is
non-existent in the ARM implementation either, but it should not be hard
to add? The only issue I see, though, is that the function which the
init() member points to should not be marked __init, as we could have an
empty pointer later on?

> I hope this makes clear why some of the rework happened, which
> in a way is extending the interface that ARM already has. Having
> said all these, I do agree that having a consistent interface 
> is good (for example: x86 has dyn_tick_state structure whereas
> ARM uses dyn_tick_timer strucuture itself to store the state etc).

I'm not sure on this last one, though, what part of the state can't
simply be represented by an integer with appropriate &-ing?
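
To illustrate the "&-ing" I mean, here is a minimal sketch using the three
flag values from the ARM header quoted earlier in the thread. The helper
names dyn_tick_usable() and dyn_tick_update() and the DYN_TICK_ALL_FLAGS
mask are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define DYN_TICK_SKIPPING	(1 << 2)
#define DYN_TICK_ENABLED	(1 << 1)
#define DYN_TICK_SUITABLE	(1 << 0)
#define DYN_TICK_ALL_FLAGS	(DYN_TICK_SKIPPING | DYN_TICK_ENABLED | DYN_TICK_SUITABLE)

/* Hypothetical helper: true only when the timer is both suitable and enabled. */
bool dyn_tick_usable(unsigned int state)
{
	unsigned int need = DYN_TICK_SUITABLE | DYN_TICK_ENABLED;

	return (state & need) == need;
}

/* Hypothetical helper: set or clear one flag and return the new state word. */
unsigned int dyn_tick_update(unsigned int state, unsigned int flag, bool set)
{
	assert(flag != 0 && (flag & ~DYN_TICK_ALL_FLAGS) == 0);
	return set ? (state | flag) : (state & ~flag);
}
```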

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  5:48                       ` Nishanth Aravamudan
@ 2005-09-05  6:32                         ` Srivatsa Vaddagiri
  2005-09-05  6:44                           ` Nishanth Aravamudan
  2005-09-07 16:14                           ` Bill Davidsen
  0 siblings, 2 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05  6:32 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> Admittedly, I don't think SMP ARM has been around all that long? Maybe
> the existing code just has not been extended.

Yeah, maybe ARM never cared for SMP. But we do care :)

> I'm not sure on this. It's going to be NULL for other architectures, or
> end up being called by the reprogram() call for the last CPU to go idle,
> right (presuming there isn't a separate TOD source, like in x86). I
> think it is better to be in the reprogram() interface.

Non-x86 could have it set to NULL, in which case it doesn't get called.
(I know the current code does not take care of this situation).
But having an explicit 'all_cpus_idle' interface may be good, since 
Tony talked of idling some devices when all CPUs are idle. So it
probably has non-x86/PIT uses too.

> > 6. S390 makes use of notifier mechanism to notify when CPUs are coming
> >    in and out of idle state. Don't know how it will be used in other
> >    arches. But obviously, if we are talking of unifying, we have to
> >    provide one.
> 
> Couldn't that be part of the s390-specific init() code? That member is
> non-existent in the ARM implementation either, but it should not be hard
> to add? The only issue I see, though, is that the function which the
> init() member points to should not be marked __init, as we could have an
> empty pointer later on?

If we consider that only s390 needs it and other arches don't, then it need 
not be even part of the init code. Basically the notifier list can be maintained
by s390 in its arch-code entirely and have 'stop_hz_timer' call into
dyn_tick_reprogram_timer (or something like that)? But I feel there will be 
other uses for the notifier list (I know the slab reap timer fires every two 
seconds and that may be unnecessary on idle CPUs if it is not reaping 
anything - perhaps it could use such a notifier to fire at longer intervals on 
idle CPUs? That may be true of other short-timers that kernel/drivers may be 
using. This is just a thought and may need more consideration before we 
put a notifier mechanism in arch-independent code).
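
As a rough illustration of the idea, here is a user-space sketch of an idle
notifier list with one subscriber that stretches its period while a CPU is
idle. This is an assumption-laden model, not the kernel's notifier API: all
names (idle_notifier_register, idle_notify, reap_timer) are made up for the
example, and the 10-second idle interval is an arbitrary value. Only the
two-second period comes from the slab reap timer mentioned above.

```c
#include <assert.h>
#include <stddef.h>

enum idle_event { CPU_IDLE_ENTER, CPU_IDLE_EXIT };	/* hypothetical events */

typedef void (*idle_notifier_fn)(enum idle_event ev, int cpu, void *data);

#define MAX_IDLE_NOTIFIERS 8

static struct {
	idle_notifier_fn fn;
	void *data;
} idle_notifiers[MAX_IDLE_NOTIFIERS];
static size_t nr_idle_notifiers;

/* Add a callback to the list; returns 0 on success, -1 if the list is full. */
int idle_notifier_register(idle_notifier_fn fn, void *data)
{
	assert(fn != NULL);
	if (nr_idle_notifiers == MAX_IDLE_NOTIFIERS)
		return -1;
	idle_notifiers[nr_idle_notifiers].fn = fn;
	idle_notifiers[nr_idle_notifiers].data = data;
	nr_idle_notifiers++;
	return 0;
}

/* Called when a CPU enters or leaves idle; runs every registered callback. */
void idle_notify(enum idle_event ev, int cpu)
{
	for (size_t i = 0; i < nr_idle_notifiers; i++)
		idle_notifiers[i].fn(ev, cpu, idle_notifiers[i].data);
}

/* Example subscriber: a periodic job that fires less often on an idle CPU. */
struct reap_timer {
	unsigned int interval_ms;
};

void reap_idle_callback(enum idle_event ev, int cpu, void *data)
{
	struct reap_timer *t = data;

	(void)cpu;	/* a real version would keep per-CPU state */
	t->interval_ms = (ev == CPU_IDLE_ENTER) ? 10000 : 2000;
}
```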

> I'm not sure on this last one, though, what part of the state can't
> simply be represented by an integer with appropriate &-ing?

Everything can be represented in bits! I was just comparing composition
of structures in ARM and x86. The state bitfield is part of 
'struct dyn_tick_timer' itself in ARM while it is part of a separate structure 
(dyn_tick_state) in x86. Similar minor points need to be sorted out while 
unifying.


-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  6:32                         ` Srivatsa Vaddagiri
@ 2005-09-05  6:44                           ` Nishanth Aravamudan
  2005-09-06 20:51                             ` Nishanth Aravamudan
  2005-09-07 16:14                           ` Bill Davidsen
  1 sibling, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05  6:44 UTC (permalink / raw)
  To: Srivatsa Vaddagiri; +Cc: Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On 05.09.2005 [12:02:29 +0530], Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> > Admittedly, I don't think SMP ARM has been around all that long?
> > Maybe the existing code just has not been extended.
> 
> Yeah, maybe ARM never cared for SMP. But we do care :)

I just took a look at arm/Kconfig and SMP is marked as EXPERIMENTAL &&
BROKEN. So I'm guessing that is the only reason for some of the
differences you mentioned (the differences are of course, valid and the
x86 SMP implementation makes sense to me to extend arch-independently).

> > I'm not sure on this. It's going to be NULL for other architectures,
> > or end up being called by the reprogram() call for the last CPU to
> > go idle, right (presuming there isn't a separate TOD source, like in
> > x86). I think it is better to be in the reprogram() interface.
> 
> Non-x86 could have it set to NULL, in which case it doesn't get
> called.  (I know the current code does not take care of this
> situation).  But having an explicit 'all_cpus_idle' interface may be
> good, since Tony talked of idling some devices when all CPUs are idle.
> So it probably has non-x86/PIT uses too.

OK, not a problem. I'll try and write up a general intsource.h file
(interrupt source header) tonight and tomorrow and send it to this list
to see if everybody agrees on what's in the structure and where the
arch-independent/dependent line lies.

> > > 6. S390 makes use of notifier mechanism to notify when CPUs are
> > >    coming in and out of idle state. Don't know how it will be used
> > >    in other arches. But obviously, if we are talking of unifying,
> > >    we have to provide one.
> > 
> > Couldn't that be part of the s390-specific init() code? That member
> > is non-existent in the ARM implementation either, but it should not
> > be hard to add? The only issue I see, though, is that the function
> > which the init() member points to should not be marked __init, as we
> > could have an empty pointer later on?
> 
> If we consider that only s390 needs it and other arch's dont, then it
> need not be even part of the init code. Basically the notifier list
> can be maintained by s390 in its arch-code entirely and have
> 'stop_hz_timer' call into dyn_tick_reprogram_timer (or something like
> that)? But I feel there will be other uses for the notifier list (I
> know the slab reap timer fires every two seconds and that may be
> unnecessary on idle CPUs if it is not reaping anything - perhaps it
> could use such a notifier to fire at longer intervals on idle CPUs?
> That may be true of other short-timers that kernel/drivers may be
> using. This is just a thought and may need more consideration before
> we put a notifier mechanism in arch-independent code).

Yeah, maybe we would be ok with keeping the notifier setup s390-specific
for now, and then extending the faculty to arch-independent code if we
find good (clean) reasons to do so. I'm not saying the slab reaping code
is insufficient, but I want to keep the structure and code as simple as
possible at first (in the design phase, at least).

> > I'm not sure on this last one, though, what part of the state can't
> > simply be represented by an integer with appropriate &-ing?
> 
> Everything can be represented in bits! I was just comparing
> composition of structures in ARM and x86. The state bitfield is part
> of 'struct dyn_tick_timer' itself in ARM while it is part of a
> separate structure (dyn_tick_state) in x86. Similar minor points need
> to be sorted out while unifying.

Heh, I agree :) I just wanted to make sure that I hadn't missed
something and there was a *specific* reason the x86 code was using a
separate structure. I actually prefer keeping it tied to the interrupt
source; it's simpler to me.

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-04 20:37                     ` Nishanth Aravamudan
  2005-09-04 21:17                       ` Russell King
  2005-09-05  3:08                       ` Con Kolivas
@ 2005-09-05  6:58                       ` Tony Lindgren
  2005-09-05 16:30                         ` Nishanth Aravamudan
  2 siblings, 1 reply; 96+ messages in thread
From: Tony Lindgren @ 2005-09-05  6:58 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: Con Kolivas, vatsa, linux-kernel, akpm, ck list

* Nishanth Aravamudan <nacc@us.ibm.com> [050904 23:38]:
> On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > I've got a few ideas that I think might help push Con's patch coalescing
> > > efforts in an arch-independent fashion.
> > 
> > Note that ARM contains cleanups on top of Tony's original work, on
> > which the x86 version is based.
> > 
> > Basically, Tony submitted his ARM version, we discussed it, fixed up
> > some locking problems and simplified it (it contained multiple
> > structures which weren't necessary, even in multiple timer-based systems).
> 
> Make sense. Thanks for the quick feedback!
> 
> > I'd be really surprised if any architecture couldn't use what ARM has
> > today - in other words, this is the only kernel-side interface:
> > 
> > #ifdef CONFIG_NO_IDLE_HZ
> > 
> > #define DYN_TICK_SKIPPING       (1 << 2)
> > #define DYN_TICK_ENABLED        (1 << 1)
> > #define DYN_TICK_SUITABLE       (1 << 0)
> > 
> > struct dyn_tick_timer {
> >         unsigned int    state;                  /* Current state */
> >         int             (*enable)(void);        /* Enables dynamic tick */
> >         int             (*disable)(void);       /* Disables dynamic tick */
> >         void            (*reprogram)(unsigned long); /* Reprograms the timer */
> >         int             (*handler)(int, void *, struct pt_regs *);
> > };
> > 
> > void timer_dyn_reprogram(void);
> > #else
> > #define timer_dyn_reprogram()   do { } while (0)
> > #endif
> 
> That looks great! So I guess I'm just suggesting moving this from
> include/asm-arch/mach/time.h to arch-independent headers? Perhaps
> timer.h is the best place for now, as it already contains the
> next_timer_interrupt() prototype (which probably should be in the #ifdef
> with timer_dyn_reprogram()).

Yes, the above should be enough on all platforms. I believe x86 still uses
two structs, and should be updated to use the interface above. There are
some extra state flags on x86, but even some of those might be
unnecessary now.

It may not be obvious from the mailing list discussions, but really the
remaining problems are to fix the x86 legacy issues with all the timers,
not with the interface.

Tony


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-04 20:10                 ` Nishanth Aravamudan
  2005-09-04 20:26                   ` Russell King
@ 2005-09-05  7:00                   ` Srivatsa Vaddagiri
  2005-09-05  7:27                     ` Tony Lindgren
                                       ` (3 more replies)
  1 sibling, 4 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05  7:00 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Con Kolivas, Russell King, linux-kernel, akpm, ck list

On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> First of all, and maybe this is just me, I think it would be good to
> make the dyn_tick_timer per-interrupt source, as opposed to each arch?

Nish, that may be a good idea as it may make the code cleaner (it will
remove the 'if (cpu_has_local_apic())' kind of code that is there
currently in x86). However, note that ARM currently has a 'handler' member as
part of it, which is used to recover time and that has nothing to do with
the interrupt source. Unless there is something like John's TOD, we still
need to recover time in an arch-dependent fashion. Where do you
propose to have that 'handler' member?


> Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> APIC, ACPI PM-timer and the HPET. These structures could be put in

Does the ACPI PM-timer also support generating interrupts? I have the same
question for HPET.

> arch-specific timer.c files (there currently is not one for x86, I
> believe). Then, at compilation time, the appropriate structure would be
> linked to the arch-generic code. That should make the arch-independent

I think this binding has to be done at run-time, instead of compile-time?
(since we may build in support for local APIC but not find one at run-time, in 
which case we have to fall back on PIT as interrupt source).

> code simple to implement (I do have some patches in the works, but it's
> slow going, right now, sorry). I think ARM and s390 could perhaps use
> this infrastructure as well?
> 
> Also, I am a bit confused by the use of "dynamic-tick" to describe these
> changes. To me, these are all NO_IDLE_HZ implementations, as they are
> only invoked from cpu_idle() (or their equivalent) routines. I know this
> is true of s390 and the x86 code, and I believe it is true of the ARM
> code? If it were dynamic-tick, I would think we would be adjusting the
> timer interrupt frequency continuously (e.g., at the end of
> __run_timers() and at every call to {add,mod,del}_timer()). I was
> working on a patch which did some renaming to no_idle_hz_timer, etc.,
> but it's mostly code churn :)

Yes, the name 'dynamic-tick' is misleading!

> Con, wrt to the x86 implementation, I think the max_skip field should be
> a member of the interrupt source (dyn_tick_timer) structure, as opposed
> to the state. This would require dyn_tick_reprogram_timer() to change

max_skip is dictated by two things - the interrupt source and the backing time
source. In the case of the local APIC, it may allow ticks to be skipped for up
to a few tens of seconds, but if we are using the ACPI PM timer to recover
time, then we can really skip no more than what the 24-bit PM timer allows for
recovering time (a few seconds if I remember correctly). Do you agree?
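
The PM-timer bound can be checked with a little arithmetic. The ACPI PM
timer counts at 3.579545 MHz, so a 24-bit counter wraps after
2^24 / 3579545, i.e. about 4.7 seconds. The sketch below turns that into a
tick count for a given HZ and takes the smaller of the two limits. The
function names are hypothetical, and a real implementation would presumably
keep some safety margin below the wrap point.

```c
#include <assert.h>
#include <stdint.h>

#define PM_TIMER_HZ 3579545ULL	/* ACPI PM timer frequency, 3.579545 MHz */

/* Whole ticks (at rate hz) that fit into one wrap of the PM timer counter. */
uint64_t pm_timer_wrap_ticks(unsigned int hz, unsigned int counter_bits)
{
	assert(counter_bits == 24 || counter_bits == 32);
	return ((1ULL << counter_bits) * hz) / PM_TIMER_HZ;
}

/* max_skip is bounded by both the interrupt source and the time source. */
uint64_t max_skip_ticks(uint64_t irq_source_max, uint64_t time_source_max)
{
	return irq_source_max < time_source_max ? irq_source_max : time_source_max;
}
```

With HZ=1000 and a 24-bit counter this gives 4686 ticks, so even if the
local APIC could stay quiet for tens of seconds, the time source would be
the limiting factor.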


> Also, what exactly the purpose of conditional_run_local_timers()? It
> seems identical to run_local_timers(), except you check for inequality
> before potentially raising the softirq. It seems like the conditional
> check in run_timer_softirq() [the TIMER_SOFTIRQ callback] will achieve
> the same thing? And, in fact, I think that conditional is always true?

Nish, that was just an optimization for not raising the softirq at all
if the CPU was woken up w/o having skipped any ticks (because
of some external interrupt).


-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:00                   ` Srivatsa Vaddagiri
@ 2005-09-05  7:27                     ` Tony Lindgren
  2005-09-05 17:02                       ` Nishanth Aravamudan
  2005-09-05  7:44                     ` Russell King
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 96+ messages in thread
From: Tony Lindgren @ 2005-09-05  7:27 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Nishanth Aravamudan, Con Kolivas, Russell King, linux-kernel,
	akpm, ck list

* Srivatsa Vaddagiri <vatsa@in.ibm.com> [050905 10:03]:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > 
> > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > is true of s390 and the x86 code, and I believe it is true of the ARM
> > code? If it were dynamic-tick, I would think we would be adjusting the
> > timer interrupt frequency continuously (e.g., at the end of
> > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > but it's mostly code churn :)
> 
> Yes, the name 'dynamic-tick' is misleading!

Huh? For most people dynamic-tick is much more descriptive name than
NO_IDLE_HZ or VST!

If you wanted, you could reprogram the next timer to happen from
{add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.

And you would want to do that if you wanted sub-jiffy timer interrupts.

So I'd rather not limit the name to the currently implemented functionality
only :)

Tony


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  5:32                     ` Srivatsa Vaddagiri
  2005-09-05  5:48                       ` Nishanth Aravamudan
@ 2005-09-05  7:37                       ` Russell King
  2005-09-05  7:49                         ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 96+ messages in thread
From: Russell King @ 2005-09-05  7:37 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Nishanth Aravamudan, Con Kolivas, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 11:02:25AM +0530, Srivatsa Vaddagiri wrote:
> I don't see provisions for all these in the current ARM implementation.

That's because, like x86, we've been ignoring each other.  ARM
doesn't handle dyntick SMP yet - ARM is fairly young as far as
SMP issues go, and as yet doesn't include a full SMP
implementation in mainline.

Despite that, the timers as implemented on the hardware are not
suitable for dyntick use - attempting to use them, you lose long
term precision of the timer interrupts.

> 5. Don't see how DYN_TICK_SKIPPING is being used. In SMP scenario,
>    it doesnt make sense since it will have to be per-cpu. The bitmap
>    that I talked of exactly tells that (whether a CPU is skipping
>    ticks or not).

What's DYN_TICK_SKIPPING and what's it used for?  It looks like
a redundant definition left over from Tony's original implementation.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:00                   ` Srivatsa Vaddagiri
  2005-09-05  7:27                     ` Tony Lindgren
@ 2005-09-05  7:44                     ` Russell King
  2005-09-05  8:19                       ` Srivatsa Vaddagiri
  2005-09-05 17:04                       ` Nishanth Aravamudan
  2005-09-05 13:19                     ` Srivatsa Vaddagiri
  2005-09-05 16:57                     ` Nishanth Aravamudan
  3 siblings, 2 replies; 96+ messages in thread
From: Russell King @ 2005-09-05  7:44 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Nishanth Aravamudan, Con Kolivas, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 12:30:53PM +0530, Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > First of all, and maybe this is just me, I think it would be good to
> > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> 
> Nish, may be a good idea as it may make the code more cleaner (it will
> remove the 'if (cpu_has_local_apic())' kind of code that is there
> currently in x86). However note that ARM currently has 'handler' member also 
> part of it, which is used to recover time and that has nothing to do with 
> interrupt source. Unless there is something like John's TOD, we still
> need to recover time in a arch-dependent fashion ..Where do you
> propose to have that 'handler' member?

Exactly where it is.  It's there because of the problem you allude to
above - it's there to catch up system time.  Any generic code can't
answer the question "how much time has passed since we disabled the
timer" without additional information.

However, we could change "handler" to be a function pointer which
returns the number of missed ticks instead, and then updates the
kernels time and tick keeping.  That would probably be more efficient.
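
A minimal user-space sketch of that split, under loud assumptions: the hook
type missed_ticks_fn, the toy_timer structure and catch_up_time() are all
hypothetical, and a plain counter stands in for the kernel's jiffies. The
timer-specific hook only reports how many ticks went by, and the generic
side accounts for each of them.

```c
#include <assert.h>

static unsigned long jiffies_model;	/* stand-in for the kernel's jiffies */

/* Hypothetical per-timer hook: how many ticks passed while we were asleep? */
typedef unsigned int (*missed_ticks_fn)(void *hw);

/* Toy hardware: a free-running counter plus its value at the last tick. */
struct toy_timer {
	unsigned int counter;		/* current counter value */
	unsigned int last_tick;		/* counter value at the last accounted tick */
	unsigned int cycles_per_tick;
};

unsigned int toy_missed_ticks(void *hw)
{
	struct toy_timer *t = hw;
	unsigned int missed;

	assert(t->cycles_per_tick > 0);
	missed = (t->counter - t->last_tick) / t->cycles_per_tick;
	t->last_tick += missed * t->cycles_per_tick;	/* keep the sub-tick remainder */
	return missed;
}

/* Generic side: account for every missed tick, not just one. */
unsigned long catch_up_time(missed_ticks_fn handler, void *hw)
{
	unsigned int missed = handler(hw);

	while (missed--)
		jiffies_model++;	/* the kernel would call do_timer() here */
	return jiffies_model;
}
```

Keeping the remainder in last_tick rather than resetting it to the current
counter value means fractions of a tick are not dropped on each wakeup.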

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:37                       ` Russell King
@ 2005-09-05  7:49                         ` Srivatsa Vaddagiri
  2005-09-05  8:00                           ` Russell King
  0 siblings, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05  7:49 UTC (permalink / raw)
  To: Nishanth Aravamudan, Con Kolivas, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 08:37:28AM +0100, Russell King wrote:
> That's because, like x86, we've been ignoring each other.  ARM
> doesn't handle dyntick SMP yet - ARM is fairly young as far as
> SMP issues goes, and as yet doesn't include a full SMP
> implementation in mainline.



> 
> Despite that, the timers as implemented on the hardware are not
> suitable for dyntick use - attempting to use them, you lose long
> term precision of the timer interrupts.

That's one of the problems I am seeing on x86 as well. Recovering
wall-time precisely after sleep is tough, especially if the interrupt
source (PIT) and backing-time source (TSC/PM Timer/HPET) can
drift wrt each other. PPC64 should be much better I hope (which is what I 
intend to take up next).

> > 5. Don't see how DYN_TICK_SKIPPING is being used. In SMP scenario,
> >    it doesnt make sense since it will have to be per-cpu. The bitmap
> >    that I talked of exactly tells that (whether a CPU is skipping
> >    ticks or not).
> 
> What's DYN_TICK_SKIPPING and what's it used for?  It looks like
> a redundant definition left over from Tony's original implementation.

Tony was using it to signal that all CPUs are idle and timers are
being skipped. With the SMP changes I made, I felt it can be
substituted with the nohz_cpu_mask bitmap and hence I removed
it.


-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:49                         ` Srivatsa Vaddagiri
@ 2005-09-05  8:00                           ` Russell King
  2005-09-05 16:33                             ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Russell King @ 2005-09-05  8:00 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Nishanth Aravamudan, Con Kolivas, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 01:19:28PM +0530, Srivatsa Vaddagiri wrote:
> > Despite that, the timers as implemented on the hardware are not
> > suitable for dyntick use - attempting to use them, you lose long
> > term precision of the timer interrupts.
> 
> Thats one of the problems I am seeing on x86 as well. Recovering
> wall-time precisely after sleep is tough esepcially if the interrupt
> source (PIT) and backing-time source (TSC/PM Timer/HPET) can
> drift wrt each other. PPC64 should be much better I hope (which is what I 
> intend to take up next).

This is why the config option to enable it on ARM has a warning in
there about it.  Some hardware timer implementations just aren't
suitable for this, so users should be warned about it (and are on
ARM.)

> Tony was using it to signal that all CPUs are idle and timer are
> being skipped. With the SMP changes I made, I felt it can be
> substituted with the nohz_cpu_mask bitmap and hence I removed
> it.

Well, consider that definition removed from ARM.  Forget you even
saw it in there. 8)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:44                     ` Russell King
@ 2005-09-05  8:19                       ` Srivatsa Vaddagiri
  2005-09-05  8:32                         ` Russell King
  2005-09-05 17:04                       ` Nishanth Aravamudan
  1 sibling, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05  8:19 UTC (permalink / raw)
  To: Nishanth Aravamudan, Con Kolivas, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 08:44:25AM +0100, Russell King wrote:
> Exactly where it is.  It's there because of the problem you allude to
> above - it's there to catch up system time.  Any generic code can't
> answer the question "how much time has passed since we disabled the
> timer" without additional information.
> 
> However, we could change "handler" to be a function pointer which
> returns the number of missed ticks instead, and then updates the
> kernels time and tick keeping.  That would probably be more efficient.

This is precisely what I have done. I have made cur_timer->mark_offset()
return the lost ticks, and wall-time is updated from the caller, which
can be either the timer_interrupt handler or, in the dyn-tick case, the dyn-tick
code (I have called it dyn_tick_interrupt), which is called before processing
_any_ interrupt. If ARM had a timer_opts equivalent we could have followed
the same approach, i.e. remove the 'handler' member and call dyn_tick_interrupt
as the first step in __do_irq/do_IRQ to process whatever it wants (recover wall
time, start the PIT timer in the case of x86, etc.). This is the definition of
dyn_tick_interrupt that I have in my patch:


~~~~~~~~~~~~~


asm-i386/dyn-tick.h:

#ifdef CONFIG_NO_IDLE_HZ

extern void dyn_tick_interrupt(int irq, struct pt_regs *regs);

#else

static inline void dyn_tick_interrupt(int irq, struct pt_regs *regs)
{
}

#endif

And dyn_tick_interrupt is coded as:


arch/i386/kernel/dyn-tick.c:

void dyn_tick_interrupt(int irq, struct pt_regs *regs)
{
        int all_were_sleeping = 0;
        int cpu = smp_processor_id();

        if (!cpu_isset(cpu, nohz_cpu_mask))
                return;

        spin_lock(&dyn_tick_lock);

        if (cpus_equal(nohz_cpu_mask, cpu_online_map))
                all_were_sleeping = 1;
        cpu_clear(cpu, nohz_cpu_mask);

        if (all_were_sleeping) {
                /* Recover jiffies */
                if (irq) {
                        int lost;

                        lost = cur_timer->mark_offset();
                        if (lost)
                                do_timer(regs);
                }
                if (cpu_has_local_apic())
                        enable_pit_timer();
        }

        spin_unlock(&dyn_tick_lock);

        if (cpu_has_local_apic())
                /* Fixme: Needs to be more accurate */
                reprogram_apic_timer(1);
        else
                reprogram_pit_timer(1);

        conditional_run_local_timers();

        /* Fixme: Enable NMI watchdog */
}


~~~~~~~~~~~

Considering that ARM does not have any such timer_opts structure,
could you call into the INT_OS_TIMER handler from dyn_tick_interrupt? AFAICS,
the INT_OS_TIMER handler and dyn_tick->handler are the same.



-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  8:19                       ` Srivatsa Vaddagiri
@ 2005-09-05  8:32                         ` Russell King
  2005-09-05  9:24                           ` Srivatsa Vaddagiri
  2005-09-05 17:06                           ` Nishanth Aravamudan
  0 siblings, 2 replies; 96+ messages in thread
From: Russell King @ 2005-09-05  8:32 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Nishanth Aravamudan, Con Kolivas, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 01:49:35PM +0530, Srivatsa Vaddagiri wrote:
> This is precisely what I have done. I have made cur_timer->mark_offset()
> return the lost ticks, and the wall-time update is done by the caller, which
> can be either the timer_interrupt handler or, in the dyn-tick case, the dyn-tick
> code (I have called it dyn_tick_interrupt), which is called before processing
> _any_ interrupt.

When you have a timer which constantly increments from 0 to MAX and
wraps, and you can set the value to match to cause an interrupt,
it makes more sense to handle it the way we're doing it (which
incidentally leads to no loss of precision.)

Calculating the number of ticks missed, updating the kernel time,
and updating the timer match will cause problems with these - if
the timer has already passed the number of ticks you originally
calculated, you may not get another interrupt for a long time.

So I don't actually think that your proposal will work for these
(SA11x0 and PXA).
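
A minimal userspace sketch of the match-register scheme described above,
assuming a free-running 32-bit counter and a constant tick period. The names
`struct match_timer` and `match_timer_catch_up` are hypothetical and are not
taken from the SA11x0 or PXA code. The idea is that the match value is advanced
in whole tick periods until it lies ahead of the counter, so no precision is
lost and the next interrupt is never programmed into the past.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a free-running counter with a match register. */
struct match_timer {
        uint32_t match;   /* counter value at which the next interrupt fires */
        uint32_t latch;   /* counter cycles per tick (assumed constant) */
};

/*
 * Advance the match register one tick period at a time until it lies
 * ahead of 'now'.  Returns the number of ticks accounted for.  The signed
 * difference keeps the comparison correct across counter wrap-around.
 */
static unsigned long match_timer_catch_up(struct match_timer *t, uint32_t now)
{
        unsigned long ticks = 0;

        assert(t->latch > 0);
        while ((int32_t)(now - t->match) >= 0) {
                ticks++;
                t->match += t->latch;
        }
        return ticks;
}
```

In this model 'now' is a snapshot passed in by the caller; a real handler
would presumably re-read the hardware counter on each pass of the loop.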

> If ARM had a timer_opts equivalent we could have followed 

I think your timer_opts is effectively our struct sys_timer.

>                         int lost;
> 
>                         lost = cur_timer->mark_offset();
>                         if (lost)
>                                 do_timer(regs);

This seems to only recover one tick.  What if multiple ticks were lost?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  8:32                         ` Russell King
@ 2005-09-05  9:24                           ` Srivatsa Vaddagiri
  2005-09-05 17:06                           ` Nishanth Aravamudan
  1 sibling, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05  9:24 UTC (permalink / raw)
  To: Nishanth Aravamudan, Con Kolivas, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 09:32:21AM +0100, Russell King wrote:
> When you have a timer which constantly increments from 0 to MAX and
> wraps, and you can set the value to match to cause an interrupt,
> it makes more sense to handle it the way we're doing it (which
> incidentally leads to no loss of precision.)

> Calculating the number of ticks missed, updating the kernel time,
> and updating the timer match will cause problems with these - if
> the timer has already passed the number of ticks you originally
> calculated, you may not get another interrupt for a long time.
> 
> So I don't actually think that your proposal will work for these
> (SA11x0 and PXA).

I presume you are referring to code as in omap_32k_timer_interrupt,
which calculates lost ticks, updates wall-time and sets up the next
interrupt. (BTW, doesn't 'now' need to be refreshed every time in the
loop? Otherwise it will cause the problem you cite: we may not get an
interrupt for a long time.) Tony, that may have caused the slow
bootups for you :)

I am not saying that all of the above should be done from the caller. In fact,
in the case of ARM, the same handler can be called from dyn_tick_interrupt.
Having some form of 'dyn_tick_interrupt' makes sense because
it encapsulates functionality such as:

	- If CPU is not sleeping currently, return (which can happen in SMP)
	- Reset the CPU from the bitmap, under the cover of a spinlock
	- Recover wall-time if we are coming out of 'all-cpus-were-asleep'
	  state.  In case of ARM, dyn_tick_timer->handler could be called
	  for this purpose.

 
> This seems to only recover one tick.  What if multiple ticks were lost?

cur_timer->mark_offset() recovers the rest.
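
How a single do_timer() call can still account for several lost ticks may be
easier to see in a small userspace model. This is only a sketch under
assumptions: `model_mark_offset`, `model_do_timer` and `model_jiffies` are
hypothetical stand-ins, and the split shown (mark_offset adds all but one of
the lost ticks, the caller's do_timer adds the last one) is an interpretation
of the reply above, not a copy of timer_pm.c.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t model_jiffies;          /* stand-in for the kernel tick count */

/* Stand-in for do_timer(): accounts for exactly one tick. */
static void model_do_timer(void)
{
        model_jiffies++;
}

/*
 * Stand-in for cur_timer->mark_offset().  'delta_usec' is the time elapsed
 * since the last call, 'usec_per_tick' the tick length.  All lost ticks
 * except one are added here; the return value tells the caller how many
 * ticks were lost in total.
 */
static int model_mark_offset(uint64_t delta_usec, uint32_t usec_per_tick)
{
        int lost;

        assert(usec_per_tick > 0);
        lost = (int)(delta_usec / usec_per_tick);
        if (lost >= 2)
                model_jiffies += lost - 1;
        return lost;
}

/* Mirrors the "if (lost) do_timer(regs);" sequence in dyn_tick_interrupt(). */
static void model_recover(uint64_t delta_usec, uint32_t usec_per_tick)
{
        if (model_mark_offset(delta_usec, usec_per_tick))
                model_do_timer();
}
```

The model drops the sub-tick remainder of the elapsed time; real code would
need to carry it over to the next call to avoid long-term drift.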

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:00                   ` Srivatsa Vaddagiri
  2005-09-05  7:27                     ` Tony Lindgren
  2005-09-05  7:44                     ` Russell King
@ 2005-09-05 13:19                     ` Srivatsa Vaddagiri
  2005-09-05 16:57                     ` Nishanth Aravamudan
  3 siblings, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05 13:19 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Con Kolivas, Russell King, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 12:30:53PM +0530, Srivatsa Vaddagiri wrote:
> > Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> > APIC, ACPI PM-timer and the HPET. These structures could be put in
> 
> Does the ACPI PM-timer support generating interrupts also? Same question
> I have for HPET.

I think HPET does support generation of interrupts but not PM timer.

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  3:08                       ` Con Kolivas
@ 2005-09-05 16:28                         ` Nishanth Aravamudan
  0 siblings, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 16:28 UTC (permalink / raw)
  To: Con Kolivas; +Cc: vatsa, linux-kernel, akpm, ck list

On 05.09.2005 [13:08:20 +1000], Con Kolivas wrote:
> On Mon, 5 Sep 2005 06:37 am, Nishanth Aravamudan wrote:
> > On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > I've got a few ideas that I think might help push Con's patch
> > > > coalescing efforts in an arch-independent fashion.
> 
> Thanks very much Nish!
> 
> I've updated the patches here http://ck.kolivas.org/patches/dyn-ticks/ with 
> the latest change to timer_pm.c that Srivatsa sent me and have a new rollup 
> there as well as the split out patches. The ball is in Nish's court now so 
> we'll avoid touching the code till you get back to us (this project needs 
> some form of locking ;) ).

That said, don't take that to mean that other people shouldn't keep doing
what they are doing (e.g., Srivatsa with his pm_timer work and scalability
work) :) Hopefully, any changes I make will not take too long.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  6:58                       ` Tony Lindgren
@ 2005-09-05 16:30                         ` Nishanth Aravamudan
  0 siblings, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 16:30 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: Con Kolivas, vatsa, linux-kernel, akpm, ck list

On 05.09.2005 [09:58:59 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <nacc@us.ibm.com> [050904 23:38]:
> > On 04.09.2005 [21:26:16 +0100], Russell King wrote:
> > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > I've got a few ideas that I think might help push Con's patch coalescing
> > > > efforts in an arch-independent fashion.
> > > 
> > > Note that ARM contains cleanups on top of Tony's original work, on
> > > which the x86 version is based.
> > > 
> > > Basically, Tony submitted his ARM version, we discussed it, fixed up
> > > some locking problems and simplified it (it contained multiple
> > > structures which weren't necessary, even in multiple timer-based systems).
> > 
> > Makes sense. Thanks for the quick feedback!
> > 
> > > I'd be really surprised if any architecture couldn't use what ARM has
> > > today - in other words, this is the only kernel-side interface:
> > > 
> > > #ifdef CONFIG_NO_IDLE_HZ
> > > 
> > > #define DYN_TICK_SKIPPING       (1 << 2)
> > > #define DYN_TICK_ENABLED        (1 << 1)
> > > #define DYN_TICK_SUITABLE       (1 << 0)
> > > 
> > > struct dyn_tick_timer {
> > >         unsigned int    state;                  /* Current state */
> > >         int             (*enable)(void);        /* Enables dynamic tick */
> > >         int             (*disable)(void);       /* Disables dynamic tick */
> > >         void            (*reprogram)(unsigned long); /* Reprograms the timer */
> > >         int             (*handler)(int, void *, struct pt_regs *);
> > > };
> > > 
> > > void timer_dyn_reprogram(void);
> > > #else
> > > #define timer_dyn_reprogram()   do { } while (0)
> > > #endif
> > 
> > That looks great! So I guess I'm just suggesting moving this from
> > include/asm-arch/mach/time.h to arch-independent headers? Perhaps
> > timer.h is the best place for now, as it already contains the
> > next_timer_interrupt() prototype (which probably should be in the #ifdef
> > with timer_dyn_reprogram()).
> 
> Yes, the above should be enough on all platforms. I believe x86 still uses
> two structs, and should be updated to use the interface above. There are
> some extra state flags on x86, but even some of those might be
> unnecessary now.

Yes, I agree.

> It may not be obvious from the mailing list discussions, but really the
> remaining problems are to fix the x86 legacy issues with all the timers,
> not with the interface.

The interface in x86 is fine, I agree. But the problem I see, is that we
would now have *3* different implementations of dyn-tick. At the point
where Con or anyone else is ready to propose merging of the code, I
think the dyn-tick work comes across much better if it simultaneously
unifies the existing NO_IDLE_HZ implementations in common files where
appropriate.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  8:00                           ` Russell King
@ 2005-09-05 16:33                             ` Nishanth Aravamudan
  0 siblings, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 16:33 UTC (permalink / raw)
  To: Srivatsa Vaddagiri, Con Kolivas, linux-kernel, akpm, ck list

On 05.09.2005 [09:00:28 +0100], Russell King wrote:
> On Mon, Sep 05, 2005 at 01:19:28PM +0530, Srivatsa Vaddagiri wrote:
> > > Despite that, the timers as implemented on the hardware are not
> > > suitable for dyntick use - attempting to use them, you lose long
> > > term precision of the timer interrupts.
> > 
> > That's one of the problems I am seeing on x86 as well. Recovering
> > wall-time precisely after sleep is tough, especially if the interrupt
> > source (PIT) and backing-time source (TSC/PM Timer/HPET) can drift
> > wrt each other. PPC64 should be much better I hope (which is what I
> > intend to take up next).
> 
> This is why the config option to enable it on ARM has a warning in
> there about it.  Some hardware timer implementations just aren't
> suitable for this, so users should be warned about it (and are on
> ARM.)

And this is where almost all of the bugs are going to come from in the
x86 implementation. John Stultz's rework helps remove some of the
interrupt dependency of the timeofday code, but he's reworking it now.

> > Tony was using it to signal that all CPUs are idle and timer are
> > being skipped. With the SMP changes I made, I felt it can be
> > substituted with the nohz_cpu_mask bitmap and hence I removed
> > it.
> 
> Well, consider that definition removed from ARM.  Forget you even
> saw it in there. 8)

Yes, the cpu_mask covers the same concept, I think it's a good choice.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:00                   ` Srivatsa Vaddagiri
                                       ` (2 preceding siblings ...)
  2005-09-05 13:19                     ` Srivatsa Vaddagiri
@ 2005-09-05 16:57                     ` Nishanth Aravamudan
  2005-09-05 17:25                       ` Srivatsa Vaddagiri
  3 siblings, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 16:57 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Con Kolivas, Russell King, linux-kernel, akpm, ck list, johnstul

On 05.09.2005 [12:30:53 +0530], Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > First of all, and maybe this is just me, I think it would be good to
> > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> 
> Nish, that may be a good idea, as it may make the code cleaner (it will
> remove the 'if (cpu_has_local_apic())' kind of code that is there
> currently in x86).

Yes, exactly. I think that kind of interrupt-source-specific code can
be handled by the interrupt source :)

> However, note that ARM currently has a 'handler' member as part of it,
> which is used to recover time and that has nothing to do with the
> interrupt source. Unless there is something like John's TOD, we still
> need to recover time in an arch-dependent fashion. Where do you
> propose to have that 'handler' member?

I think it's ok where it is. Currently, with x86, at least, you can have
an independent interrupt source and time source (not true for all archs,
of course, ppc64 being a good example, I think?) Perhaps "handler"
should be called arch_recover_time() and may point to a timesource
function (currently PIT/TSC/ACPI_PM/HPET on x86, right?) which does the
appropriate catch-up for the time-related variables. In any case, since
most of the timesource code is lost-tick aware, I think it is possible.

> > Thus, for x86, we would have a dyn_tick_timer structure for the PIT,
> > APIC, ACPI PM-timer and the HPET. These structures could be put in
> 
> Does the ACPI PM-timer support generating interrupts also? Same question
> I have for HPET.

I think, as you figured out, the HPET can, but the ACPI_PM can not. John
might know for sure (I always end up asking him), have added him to the
Cc.

> 
> > arch-specific timer.c files (there currently is not one for x86, I
> > believe). Then, at compilation time, the appropriate structure would be
> > linked to the arch-generic code. That should make the arch-independent
> 
> I think this binding has to be done at run-time, instead of
> compile-time?  (since we may build in support for local APIC but not
> find one at run-time, in which case we have to fall back on PIT as
> interrupt source).

What may be useful is something similar to what John Stultz does in his
rework, attaching priorities to the various interrupt sources. For
example, on x86, if we have an HPET, then we should use it, if not, then
use APIC and PIT, but if the APIC doesn't exist in h/w, or is buggy
(perhaps determined via a calibration loop), then only use the PIT.
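
A rough sketch of what priority-based selection could look like, written as
plain userspace C. Everything here is hypothetical: `struct interrupt_source`,
its `priority` and `probe` fields, and the example table are illustrations
only and do not correspond to existing kernel structures or to John's rework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor; not an existing kernel structure. */
struct interrupt_source {
        const char *name;
        int priority;             /* higher is preferred */
        int (*probe)(void);       /* returns nonzero if usable at run time */
};

/* Pick the usable source with the highest priority, or NULL if none. */
static const struct interrupt_source *
select_interrupt_source(const struct interrupt_source *srcs, size_t n)
{
        const struct interrupt_source *best = NULL;
        size_t i;

        assert(srcs != NULL || n == 0);
        for (i = 0; i < n; i++) {
                if (!srcs[i].probe || !srcs[i].probe())
                        continue;
                if (!best || srcs[i].priority > best->priority)
                        best = &srcs[i];
        }
        return best;
}

/* Example probes: pretend this box has no HPET but a working APIC and PIT. */
static int probe_absent(void)  { return 0; }
static int probe_present(void) { return 1; }

static const struct interrupt_source example_sources[] = {
        { "hpet", 30, probe_absent  },
        { "apic", 20, probe_present },
        { "pit",  10, probe_present },
};
```

With the example table, the selection falls through the absent HPET and picks
the APIC entry; a probe that runs a calibration loop could reject a buggy APIC
in the same way and leave only the PIT.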

> > code simple to implement (I do have some patches in the works, but it's
> > slow going, right now, sorry). I think ARM and s390 could perhaps use
> > this infrastructure as well?
> > 
> > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > is true of s390 and the x86 code, and I believe it is true of the ARM
> > code? If it were dynamic-tick, I would think we would be adjusting the
> > timer interrupt frequency continuously (e.g., at the end of
> > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > but it's mostly code churn :)
> 
> Yes, the name 'dynamic-tick' is misleading!

Especially, to me, as the .config option is NO_IDLE_HZ. I prefer
referring to everything as interrupt_source or something similar, I
think (after looking more at the code), then it doesn't matter whether
it is being used for (what is technically) NO_IDLE_HZ or dynamic-tick.

> > Con, wrt to the x86 implementation, I think the max_skip field should be
> > a member of the interrupt source (dyn_tick_timer) structure, as opposed
> > to the state. This would require dyn_tick_reprogram_timer() to change
> 
> max_skip is dictated by two things - the interrupt source and the backing time source.
> In the case of the local APIC, it may allow ticks to be skipped for up to a few tens of
> seconds, but if we are using the ACPI PM timer to recover time, then we can
> really skip no more than what the 24-bit PM timer allows for recovering time
> (a few seconds, if I remember correctly). Do you agree?

I agree. I guess max_skip, to me, is what the kernel thinks the
interrupt source should maximally skip by, not what the interrupt source
thinks it can do. So, I think it fits in fine with what you are saying
and with the code you have in the current patch.
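
To put a number on the "few seconds" limit, here is a small arithmetic sketch.
It assumes the standard ACPI PM timer frequency of 3.579545 MHz and the 24-bit
counter width mentioned above; the helper name `pmtmr_max_skip_ticks` is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PMTMR_HZ    3579545ULL     /* ACPI PM timer frequency in Hz */
#define PMTMR_BITS  24             /* counter width discussed above */

/*
 * Largest number of whole ticks that fit into one full wrap of the PM
 * timer for a given tick rate 'hz'.
 */
static uint64_t pmtmr_max_skip_ticks(unsigned int hz)
{
        assert(hz > 0);
        return ((1ULL << PMTMR_BITS) * hz) / PMTMR_HZ;
}
```

A 24-bit counter at this frequency wraps after roughly 4.68 seconds, so at
HZ=1000 the helper returns 4686 ticks. A practical max_skip would presumably
stay well below that to leave margin for interrupt latency.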

> > Also, what exactly is the purpose of conditional_run_local_timers()? It
> > seems identical to run_local_timers(), except you check for
> > inequality before potentially raising the softirq. It seems like the
> > conditional check in run_timer_softirq() [the TIMER_SOFTIRQ
> > callback] will achieve the same thing? And, in fact, I think that
> > conditional is always true?
> 
> Nish, that was just an optimization for not raising the softirq at all
> if the CPU was woken up w/o having skipped any ticks (because of some
> external interrupt).

I was just wondering; I guess it makes sense, but did you check to see
if it ever *doesn't* get called? Like I said, __run_timers() [from how I
understand it], will always increment base->timer_jiffies to one more
than jiffies. So if we disable interrupts and come right back, that
conditional is still true, but time_after_eq(jiffies,
base->timer_jiffies) [the condition in run_timer_softirq()] is not. How
much does it cost to raise the softirq, if it is going to return
immediately from the callback?
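
The difference between the two checks can be modelled in a few lines of
userspace C. This is a sketch under assumptions: the inequality test is the
form of conditional_run_local_timers() as described in this thread, not its
actual source, and `model_time_after_eq` is a wrap-safe comparison in the
spirit of the kernel's time_after_eq(), not the kernel macro itself.

```c
#include <assert.h>

/* Wrap-safe "a is at or after b" comparison on tick counters. */
#define model_time_after_eq(a, b)  ((long)((a) - (b)) >= 0)

/* Assumed form of the check in conditional_run_local_timers(). */
static int cond_check_inequality(unsigned long jiffies_now,
                                 unsigned long timer_jiffies)
{
        return jiffies_now != timer_jiffies;
}

/* Form of the check described for run_timer_softirq(). */
static int softirq_check(unsigned long jiffies_now,
                         unsigned long timer_jiffies)
{
        return model_time_after_eq(jiffies_now, timer_jiffies);
}
```

If __run_timers() leaves timer_jiffies at jiffies + 1, the inequality check
passes (so the softirq would be raised) while the softirq's own check fails
(so the callback returns immediately), which is the case being asked about.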

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:27                     ` Tony Lindgren
@ 2005-09-05 17:02                       ` Nishanth Aravamudan
  2005-09-07  7:37                         ` Tony Lindgren
  0 siblings, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 17:02 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Srivatsa Vaddagiri, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list

On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> * Srivatsa Vaddagiri <vatsa@in.ibm.com> [050905 10:03]:
> > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > 
> > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > timer interrupt frequency continuously (e.g., at the end of
> > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > but it's mostly code churn :)
> > 
> > Yes, the name 'dynamic-tick' is misleading!
> 
> Huh? For most people dynamic-tick is much more descriptive name than
> NO_IDLE_HZ or VST!

I understand this. My point is that the structures are *not*
dynamic-tick specific. They are interrupt source specific, generally
(also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
users of the interrupt source reprogramming functions, but not the
reprogrammers themselves, in my mind. Also, it still would be confusing
to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)

> If you wanted, you could reprogram the next timer to happen from
> {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.

I messed with this with my soft-timer rework (which has since fallen
by the wayside). It is a bit of overhead, especially del_timer(), but
it's possible. This is what I would consider "dynamic-tick." And I would
setup a *different* .config option to enable it. Perhaps depending on
CONFIG_NO_IDLE_HZ.

> And you would want to do that if you wanted sub-jiffy timer
> interrupts.

Yes, true, it does enable that. Well, to be honest, it completely
redefines (in some sense) the jiffy, as it is potentially continuously
changing, not just at idle times.

> So I'd rather not limit the name to the currently implemented
> functionality only :)

I'm not trying to limit the name, but make sure we are tying the
structures and functions to the right abstraction (interrupt source, in
my opinion).

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  7:44                     ` Russell King
  2005-09-05  8:19                       ` Srivatsa Vaddagiri
@ 2005-09-05 17:04                       ` Nishanth Aravamudan
  2005-09-05 17:27                         ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 17:04 UTC (permalink / raw)
  To: Srivatsa Vaddagiri, Con Kolivas, linux-kernel, akpm, ck list

On 05.09.2005 [08:44:25 +0100], Russell King wrote:
> On Mon, Sep 05, 2005 at 12:30:53PM +0530, Srivatsa Vaddagiri wrote:
> > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > First of all, and maybe this is just me, I think it would be good to
> > > make the dyn_tick_timer per-interrupt source, as opposed to each arch?
> > 
> > Nish, that may be a good idea, as it may make the code cleaner (it will
> > remove the 'if (cpu_has_local_apic())' kind of code that is there
> > currently in x86). However, note that ARM currently has a 'handler' member
> > as part of it, which is used to recover time and that has nothing to do with
> > the interrupt source. Unless there is something like John's TOD, we still
> > need to recover time in an arch-dependent fashion. Where do you
> > propose to have that 'handler' member?
> 
> Exactly where it is.  It's there because of the problem you allude to
> above - it's there to catch up system time.  Any generic code can't
> answer the question "how much time has passed since we disabled the
> timer" without additional information.

I agree.

> However, we could change "handler" to be a function pointer which
> returns the number of missed ticks instead, and then updates the
> kernel's time and tick keeping.  That would probably be more efficient.

Yes, I think

unsigned long (*recover_time)(int, void *, struct pt_regs *);

or something similar (not sure about the params), might be more
appropriate.
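
A minimal sketch of how such a hook might be used from generic code. All names
are hypothetical (`struct dyn_tick_timer_sketch`, `dyn_tick_catch_up`,
`sketch_jiffies`, `example_recover_time`), and the parameter list is reduced to
`void` for brevity since the real parameters are still undecided above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of dyn_tick_timer with a tick-recovery hook. */
struct dyn_tick_timer_sketch {
        unsigned int state;
        unsigned long (*recover_time)(void);  /* returns missed ticks */
};

static unsigned long sketch_jiffies;   /* stand-in for the kernel tick count */

/* Generic side: ask the arch hook how many ticks passed and account them. */
static unsigned long dyn_tick_catch_up(struct dyn_tick_timer_sketch *t)
{
        unsigned long missed;

        assert(t != NULL && t->recover_time != NULL);
        missed = t->recover_time();
        sketch_jiffies += missed;
        return missed;
}

/* Example arch hook that pretends seven ticks were skipped. */
static unsigned long example_recover_time(void)
{
        return 7;
}
```

The arch-specific part only answers "how many ticks passed"; updating the tick
count stays in one generic place, which is the split suggested above.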

Thanks,
Nish


^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  8:32                         ` Russell King
  2005-09-05  9:24                           ` Srivatsa Vaddagiri
@ 2005-09-05 17:06                           ` Nishanth Aravamudan
  1 sibling, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 17:06 UTC (permalink / raw)
  To: Srivatsa Vaddagiri, Con Kolivas, linux-kernel, akpm, ck list,
	johnstul

On 05.09.2005 [09:32:21 +0100], Russell King wrote:
> On Mon, Sep 05, 2005 at 01:49:35PM +0530, Srivatsa Vaddagiri wrote:
> > This is precisely what I have done. I have made cur_timer->mark_offset()
> > return the lost ticks, and the wall-time update is done by the caller, which
> > can be either the timer_interrupt handler or, in the dyn-tick case, the dyn-tick
> > code (I have called it dyn_tick_interrupt), which is called before processing
> > _any_ interrupt.
> 
> When you have a timer which constantly increments from 0 to MAX and
> wraps, and you can set the value to match to cause an interrupt,
> it makes more sense to handle it the way we're doing it (which
> incidentally leads to no loss of precision.)

This is the way ppc works, I believe (match register).

> Calculating the number of ticks missed, updating the kernel time,
> and updating the timer match will cause problems with these - if
> the timer has already passed the number of ticks you originally
> calculated, you may not get another interrupt for a long time.

Yes, this is the source of much bugginess, especially with bad hardware
:)

> > If ARM had a timer_opts equivalent we could have followed 
> 
> I think your timer_opts is effectively our struct sys_timer.

I agree, in looking over the two. Perhaps those structures could be
unified as well? John Stultz would be the one to talk to,
though.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05 16:57                     ` Nishanth Aravamudan
@ 2005-09-05 17:25                       ` Srivatsa Vaddagiri
  2005-09-05 18:11                         ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05 17:25 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Con Kolivas, Russell King, linux-kernel, akpm, ck list, johnstul

On Mon, Sep 05, 2005 at 09:57:30AM -0700, Nishanth Aravamudan wrote:
> I think it's ok where it is. Currently, with x86, at least, you can have
> an independent interrupt source and time source (not true for all archs,
> of course, ppc64 being a good example, I think?) Perhaps "handler"

By "independent" do you mean driven by separate clocks? PPC64 does
use decrementer as its interrupt source and Time-base-register as 
its timesource AFAIK. Both are driven by the same clock I think.

> What may be useful is something similar to what John Stultz does in his
> rework, attaching priorities to the various interrupt sources. For
> example, on x86, if we have an HPET, then we should use it, if not, then
> use APIC and PIT, but if the APIC doesn't exist in h/w, or is buggy
> (perhaps determined via a calibration loop), then only use the PIT.

This logic is what the arch-code should follow in picking its interrupt
source and is independent of dyn-tick. dyn-tick just works with whatever 
arch-code has chosen as its interrupt source.

> I agree. I guess max_skip, to me, is what the kernel thinks the
> interrupt source should maximally skip by, not what the interrupt source
> thinks it can do. So, I think it fits in fine with what you are saying
> and with the code you have in the current patch.

Great!

> I was just wondering; I guess it makes sense, but did you check to see
> if it ever *doesn't* get called? Like I said, __run_timers() [from how I

Haven't tested that, but I feel it can happen in practice, since we don't
control device interrupts.

> base->timer_jiffies) [the condition in run_timer_softirq()] is not. How
> much does it cost to raise the softirq, if it is going to return
> immediately from the callback?

Don't know. It just felt nice to avoid any unnecessary invocations.

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05 17:04                       ` Nishanth Aravamudan
@ 2005-09-05 17:27                         ` Srivatsa Vaddagiri
  2005-09-05 18:06                           ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-05 17:27 UTC (permalink / raw)
  To: Nishanth Aravamudan; +Cc: Con Kolivas, linux-kernel, akpm, ck list

On Mon, Sep 05, 2005 at 10:04:24AM -0700, Nishanth Aravamudan wrote:
> > However, we could change "handler" to be a function pointer which
> > returns the number of missed ticks instead, and then updates the
> > kernel's time and tick keeping.  That would probably be more efficient.
> 
> Yes, I think
> 
> unsigned long (*recover_time)(int, void *, struct pt_regs *);
> 
> or something similar (not sure about the params), might be more
> appropriate.

What would this be for x86? This could be cur_timer->mark_offset()
itself for now, I think, until John's TOD comes along.

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05 17:27                         ` Srivatsa Vaddagiri
@ 2005-09-05 18:06                           ` Nishanth Aravamudan
  0 siblings, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 18:06 UTC (permalink / raw)
  To: Srivatsa Vaddagiri; +Cc: Con Kolivas, linux-kernel, akpm, ck list

On 05.09.2005 [22:57:14 +0530], Srivatsa Vaddagiri wrote:
> On Mon, Sep 05, 2005 at 10:04:24AM -0700, Nishanth Aravamudan wrote:
> > > However, we could change "handler" to be a function pointer which
> > > returns the number of missed ticks instead, and then updates the
> > > kernel's time and tick keeping.  That would probably be more efficient.
> > 
> > Yes, I think
> > 
> > unsigned long (*recover_time)(int, void *, struct pt_regs *);
> > 
> > or something similar (not sure about the params), might be more
> > appropriate.
> 
> What would this be for x86? This could be cur_timer->mark_offset()
> itself for now, I think, until John's TOD comes along.

Yes, exactly. I was planning on hooking into the timer_opts for x86
until John's timesource rework lands. That will keep the code pretty
similar across the change, and helps keep it clear *why* we are calling
mark_offset(), at least to me.

Thanks,
Nish

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05 17:25                       ` Srivatsa Vaddagiri
@ 2005-09-05 18:11                         ` Nishanth Aravamudan
  0 siblings, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-05 18:11 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Con Kolivas, Russell King, linux-kernel, akpm, ck list, johnstul

On 05.09.2005 [22:55:01 +0530], Srivatsa Vaddagiri wrote:
> On Mon, Sep 05, 2005 at 09:57:30AM -0700, Nishanth Aravamudan wrote:
> > I think it's ok where it is. Currently, with x86, at least, you can have
> > an independent interrupt source and time source (not true for all archs,
> > of course, ppc64 being a good example, I think?) Perhaps "handler"
> 
> By "independent" do you mean driven by separate clocks? PPC64 does
> use decrementer as its interrupt source and Time-base-register as 
> its timesource AFAIK. Both are driven by the same clock I think.

Well, independent as in not the same, I meant. Let me think about it and
look at the code a bit, before making myself or anyone else more
confused. John, do you have any input on what I'm getting at? I know we
have discussed this before...

> > What may be useful is something similar to what John Stultz does in his
> > rework, attaching priorities to the various interrupt sources. For
> > example, on x86, if we have an HPET, then we should use it, if not, then
> > use APIC and PIT, but if the APIC doesn't exist in h/w, or is buggy
> > (perhaps determined via a calibration loop), then only use the PIT.
> 
> This logic is what the arch-code should follow in picking its interrupt
> source and is independent of dyn-tick. dyn-tick just works with whatever 
> arch-code has chosen as its interrupt source.

Yes, true. I didn't mean for the h/w interrupt source selection to be
part of the arch-independent code, but that we might need to include a
priority field in the interrupt_source structure to allow the
arch-dependent code to do so.

> > I agree. I guess max_skip, to me, is what the kernel thinks the
> > interrupt source should maximally skip by, not what the interrupt source
> > thinks it can do. So, I think it fits in fine with what you are saying
> > and with the code you have in the current patch.
> 
> Great!
> 
> > I was just wondering; I guess it makes sense, but did you check to see
> > if it ever *doesn't* get called? Like I said, __run_timers() [from how I
> 
> Haven't tested that, but I feel it can happen in practice, since we don't
> control device interrupts.

Well, it would be interesting to see if there's any difference without
that function, or if it's even getting called.

> > base->timer_jiffies) [the condition in run_timer_softirq()] is not. How
> > much does it cost to raise the softirq, if it is going to return
> > immediately from the callback?
> 
> Don't know. It just felt nice to avoid any unnecessary invocations.

Yes, but it also might add a function which doesn't need to exist. I'll
take a closer look at this too.

Thanks,
Nish


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-03  4:05   ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Lee Revell
                       ` (2 preceding siblings ...)
  2005-09-03  5:20     ` Srivatsa Vaddagiri
@ 2005-09-06 10:32     ` Pavel Machek
  2005-09-06 10:46       ` Srivatsa Vaddagiri
  2005-09-06 18:04     ` john stultz
  4 siblings, 1 reply; 96+ messages in thread
From: Pavel Machek @ 2005-09-06 10:32 UTC (permalink / raw)
  To: Lee Revell
  Cc: vatsa, linux-kernel, arjan, s0348365, kernel, tytso, cfriesen,
	trenn, george, johnstul, akpm

Hi!

> > With this patch, time had kept up really well on one particular
> > machine (Intel 4way Pentium 3 box) overnight, while
> > on another newer machine (Intel 4way Xeon with HT) it didnt do so
> > well (time sped up after 3 or 4 hours). Hence I consider this
> > particular patch will need more review/work.
> > 
> 
> Are lost ticks really that common?  If so, any idea what's disabling
> interrupts for so long (or if it's a hardware issue)?  And if not, it
> seems like you'd need an artificial way to simulate lost ticks in order
> to test this stuff.

Try running this from userspace (and watch for time going completely
crazy). Try it in mainline, too; it broke even vanilla some time
ago. It needs to run as root.

								Pavel

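#include <stdio.h>	/* perror() */
#include <unistd.h>	/* sleep() */
#include <sys/io.h>	/* iopl(); x86 with glibc */

int
main(void)
{
        int i;

        /* Raise the I/O privilege level so cli/sti are allowed in userspace. */
        if (iopl(3) < 0) {
                perror("iopl");
                return 1;
        }
        while (1) {
                asm volatile("cli");	/* disable interrupts on this CPU */
                //              for (i=0; i<20000000; i++)
                for (i=0; i<1000000000; i++)
                        asm volatile("");	/* keeps the busy loop from being optimized away */
                asm volatile("sti");	/* re-enable interrupts */
                sleep(1);
        }
}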


-- 
if you have sharp zaurus hardware you don't need... you know my address


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-06 10:32     ` Pavel Machek
@ 2005-09-06 10:46       ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-06 10:46 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Lee Revell, linux-kernel, arjan, s0348365, kernel, tytso,
	cfriesen, trenn, george, johnstul, akpm

On Tue, Sep 06, 2005 at 12:32:32PM +0200, Pavel Machek wrote:
> Try running this from userspace (and watch for time going completely
> crazy). Try it in mainline, too; it broke even vanilla some time
> ago. Need to run as root. 

Note that the kernel relies on some backing time source (like TSC/PM)
to recover lost ticks (& time). And these backing time sources have
their own limitations on the maximum number of lost ticks you can recover,
which in turn limits how long you can have interrupts blocked.
In the case of TSC, since only a 32-bit previous snapshot is maintained (on
x86 at least), it allows for ticks to be lost only up to a second (if I
remember correctly), while the 24-bit ACPI PM timer allows for up to 3-4
seconds.

I found that the while loop below takes 3.66 seconds running
on a 1.8GHz P4 CPU. That may be too much if the kernel is using
(a 32-bit snapshot of) the TSC to recover ticks, and may be just
at the maximum limit allowed for the ACPI PM timer.
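
As a rough cross-check of those limits, here is a small sketch that
computes how long an N-bit counter runs before it wraps. It assumes the
nominal ACPI PM timer frequency of 3.579545 MHz and a TSC that ticks at
the CPU clock rate; `wrap_seconds()` is a hypothetical helper, not
kernel code. The window the kernel can actually recover may be smaller
than the raw wrap interval, depending on how the lost-tick code uses
the counter.

```c
#define ACPI_PM_TIMER_HZ	3579545.0	/* nominal ACPI PM timer frequency */

/* Seconds until a counter of the given width wraps at frequency hz. */
double wrap_seconds(unsigned int bits, double hz)
{
	return (double)(1ULL << bits) / hz;
}
```

Under these assumptions, `wrap_seconds(24, ACPI_PM_TIMER_HZ)` comes to
roughly 4.7 seconds and `wrap_seconds(32, 1.8e9)` to roughly 2.4
seconds, so a 3.66 second interrupts-off loop exceeds a 32-bit TSC
window at 1.8GHz and sits fairly close to the 24-bit PM timer wrap.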

I will test this code with the lost-tick recovery fixes
for ACPI PM timer that I sent out and let you know
how it performs!

>                 for (i=0; i<1000000000; i++)
>                         asm volatile("");

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c
  2005-09-03  4:05   ` [PATCH 1/3] Updated dynamic tick patches - Fix lost tick calculation in timer_pm.c Lee Revell
                       ` (3 preceding siblings ...)
  2005-09-06 10:32     ` Pavel Machek
@ 2005-09-06 18:04     ` john stultz
  4 siblings, 0 replies; 96+ messages in thread
From: john stultz @ 2005-09-06 18:04 UTC (permalink / raw)
  To: Lee Revell
  Cc: vatsa, linux-kernel, arjan, s0348365, kernel, tytso, cfriesen,
	trenn, george, akpm

On Sat, 2005-09-03 at 00:05 -0400, Lee Revell wrote:
> On Wed, 2005-08-31 at 22:42 +0530, Srivatsa Vaddagiri wrote:
> > With this patch, time had kept up really well on one particular
> > machine (Intel 4way Pentium 3 box) overnight, while
> > on another newer machine (Intel 4way Xeon with HT) it didnt do so
> > well (time sped up after 3 or 4 hours). Hence I consider this
> > particular patch will need more review/work.
> > 
> 
> Are lost ticks really that common?  If so, any idea what's disabling
> interrupts for so long (or if it's a hardware issue)?  And if not, it
> seems like you'd need an artificial way to simulate lost ticks in order
> to test this stuff.

Pavel came up with a pretty good test for this a while back.

http://marc.theaimsgroup.com/?l=linux-kernel&m=110519095425851&w=2

Adding:
	unsigned long mask = 0x1;
	sched_setaffinity(0, sizeof(mask), &mask);

to the top helps it work on SMP systems.
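
For readers wondering what the 0x1 mask means: each bit selects one
CPU, so 0x1 pins the test to CPU 0, which keeps the cli/sti loop and
the sleep on the same processor. The sketch below spells that out. It
goes through the raw Linux syscall, which takes a plain bitmask as in
the snippet above (current glibc wrappers take a cpu_set_t instead).
`cpu_mask()` and `pin_current_thread()` are hypothetical helper names.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* Bitmask selecting a single CPU; cpu must be below the bit width of long. */
unsigned long cpu_mask(unsigned int cpu)
{
	return 1UL << cpu;
}

/* Pin the calling thread to the CPUs in mask; returns 0 on success, -1 on error. */
long pin_current_thread(unsigned long mask)
{
	return syscall(SYS_sched_setaffinity, 0, sizeof(mask), &mask);
}
```

Calling `pin_current_thread(cpu_mask(0))` at the top of main() would be
the equivalent of the two lines above.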

thanks
-john



* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  6:44                           ` Nishanth Aravamudan
@ 2005-09-06 20:51                             ` Nishanth Aravamudan
  2005-09-07  8:13                               ` Tony Lindgren
  0 siblings, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-06 20:51 UTC (permalink / raw)
  To: Srivatsa Vaddagiri; +Cc: Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On 04.09.2005 [23:44:16 -0700], Nishanth Aravamudan wrote:
> On 05.09.2005 [12:02:29 +0530], Srivatsa Vaddagiri wrote:
> > On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> > > Admittedly, I don't think SMP ARM has been around all that long?
> > > Maybe the existing code just has not been extended.
> > 
> > Yeah, maybe ARM never cared for SMP. But we do care :)
> 
> I just took a look at arm/Kconfig and SMP is marked as EXPERIMENTAL &&
> BROKEN. So I'm guessing that is the only reason for some of the
> differences you mentioned (the differences are of course, valid and the
> x86 SMP implementation makes sense to me to extend arch-independently).
> 
> > > I'm not sure on this. It's going to be NULL for other architectures,
> > > or end up being called by the reprogram() call for the last CPU to
> > > go idle, right (presuming there isn't a separate TOD source, like in
> > > x86). I think it is better to be in the reprogram() interface.
> > 
> > Non-x86 could have it set to NULL, in which case it doesn't get
> > called.  (I know the current code does not take care of this
> > situation).  But having an explicit 'all_cpus_idle' interface may be
> > good, since Tony talked of idling some devices when all CPUs are idle.
> > So it probably has non-x86/PIT uses too.
> 
> OK, not a problem. I'll try and write up a general intsource.h file
> (interrupt source header) tonight and tomorrow and send it to this list
> to see if everybody agrees on what's in the structure and where the
> arch-independent/dependent line lies.

Sigh, later than I had hoped, but here is what I have hashed out so far.
Does it seem like a step in the right direction? Rather hand-wavy, but I
think it's mostly correct ;)

Thanks,
Nish


- include/linux/intsource.h
	with definitions in kernel/intsource.c

#define DYN_TICK_ENABLED	(1 << 1)
#define DYN_TICK_SUITABLE	(1 << 0)

#define DYN_TICK_MIN_SKIP	2

/* Abstraction of an interrupt source
 * @state: current state
 * @max_skip: current maximum number of ticks to skip
 * @arch_init: initialization routine
 * @arch_enable_dyn_tick: called via sysfs to enable interrupt skipping
 * @arch_disable_dyn_tick: called via sysfs to disable interrupt
 * 				skipping
 * @arch_reprogram: reprograms the interrupt source; returns the number
 * 				of ticks skipped
 * @arch_recover_time: handler for returning from skipped ticks and keeping
 * 				time consistent
 * @arch_set_all_cpus_idle: last cpu to go idle calls this, which should
 * 				disable any timesource (e.g. PIT on x86)
 */
struct interrupt_source {
	unsigned int state;
	unsigned long max_skip;
	int (*arch_init) (void);
	void (*arch_enable_dyn_tick) (void);
	void (*arch_disable_dyn_tick) (void);
	unsigned long (*arch_reprogram) (unsigned long); /* return number of ticks skipped */
	unsigned long (*arch_recover_time) (int, void *, struct pt_regs *); /* handler in arm */
	/* following empty in UP */
	void (*arch_set_all_cpus_idle) (int);
	spinlock_t lock;
};

extern void interrupt_source_register(struct interrupt_source *new_interrupt_source);
extern struct interrupt_source *current_intsource;

#ifdef CONFIG_NO_IDLE_HZ
extern void set_interrupt_max_skip(unsigned long max_skip);
/* idle_reprogram_interrupt calls reprogram_interrupt calls current_intsource->arch_reprogram()
 * do we really need the first step? */
extern void idle_reprogram_interrupt(void);
/* return number of ticks skipped, potentially for accounting purposes? */
extern unsigned long reprogram_interrupt(void);

extern struct interrupt_source * __init arch_select_interrupt_source(void);
extern void __init dyn_tick_init(void); /* calls arch_select_interrupt_source(), verifies source is usable, then calls interrupt_source_register() */

static inline int dyn_tick_enabled(void)
{
	return (current_intsource->state & DYN_TICK_ENABLED);
}

#else	/* CONFIG_NO_IDLE_HZ */
static inline void set_interrupt_max_skip(unsigned long max_skip)
{
}

static inline void idle_reprogram_interrupt(void)
{
}

static inline unsigned long reprogram_interrupt(void)
{
	return 0;
}

static inline void dyn_tick_init(void)
{
}

static inline int dyn_tick_enabled(void)
{
	return 0;
}
#endif	/* CONFIG_NO_IDLE_HZ */

/* Pick up arch specific header */
#include <asm/intsource.h>

#endif	/* _DYN_TICK_TIMER_H */

- sched.c / sched.h
	/* do we want these elsewhere? */
	cpumask_t no_idle_hz_cpumask;

* each arch-specific file pair needs to provide:
	arch_select_interrupt_source();
	appropriate struct interrupt_source definitions, functions, etc.

- include/asm-i386/intsource.h
	with defines in arch/i386/intsource.c

- include/asm-arm/arch-omap/intsource.h
	with definitions in arch/arm/mach-omap/intsource.c

- include/asm-s390/intsource.h
	with definitions in arch/s390/intsource.c

- include/asm-generic/intsource.h
	do I need something here?
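
The init flow described in the dyn_tick_init() comment above could be
sketched roughly as follows. This is a userspace mock under stated
assumptions: the struct is cut down to two fields, "usable" is taken to
mean the DYN_TICK_SUITABLE bit is set, the int return value is added
for illustration (the outline declares the function void), and
`mock_pit` plus the body of arch_select_interrupt_source() are
stand-ins for whatever the arch code would provide.

```c
#include <stddef.h>

#define DYN_TICK_ENABLED	(1 << 1)
#define DYN_TICK_SUITABLE	(1 << 0)

/* Cut-down stand-in for the struct in the outline. */
struct interrupt_source {
	unsigned int state;
	int (*arch_init)(void);
};

struct interrupt_source *current_intsource;

/* Hypothetical arch side: a single source that is suitable for skipping. */
struct interrupt_source mock_pit = { .state = DYN_TICK_SUITABLE };

struct interrupt_source *arch_select_interrupt_source(void)
{
	return &mock_pit;
}

void interrupt_source_register(struct interrupt_source *new_interrupt_source)
{
	current_intsource = new_interrupt_source;
}

/* Select, verify, then register. Returns 0 on success, -1 otherwise. */
int dyn_tick_init_sketch(void)
{
	struct interrupt_source *src = arch_select_interrupt_source();

	if (!src || !(src->state & DYN_TICK_SUITABLE))
		return -1;
	if (src->arch_init && src->arch_init() != 0)
		return -1;
	interrupt_source_register(src);
	return 0;
}
```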


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05 17:02                       ` Nishanth Aravamudan
@ 2005-09-07  7:37                         ` Tony Lindgren
  2005-09-07 15:05                           ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Tony Lindgren @ 2005-09-07  7:37 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Srivatsa Vaddagiri, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list

* Nishanth Aravamudan <nacc@us.ibm.com> [050905 20:02]:
> On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > * Srivatsa Vaddagiri <vatsa@in.ibm.com> [050905 10:03]:
> > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > 
> > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > timer interrupt frequency continuously (e.g., at the end of
> > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > but it's mostly code churn :)
> > > 
> > > Yes, the name 'dynamic-tick' is misleading!
> > 
> > Huh? For most people dynamic-tick is much more descriptive name than
> > NO_IDLE_HZ or VST!
> 
> I understand this. My point is that the structures are *not*
> dynamic-tick specific. They are interrupt source specific, generally
> (also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
> users of the interrupt source reprogramming functions, but not the
> reprogrammers themselves, in my mind. Also, it still would be confusing
> to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)

I see what you mean, it's a confusing naming issue currently :) Would
the following solution work for you:

- Dynamic tick is the structure you register with, and then you use it
  for any kind of non-continuous timer tinkering 

- This structure has at least two possible users, NO_IDLE_HZ and
  sub-jiffie timers

So we could have the following config options:

CONFIG_DYNTICK
CONFIG_NO_IDLE_HZ	depends on dyntick
CONFIG_SUBJIFFIE_TIMER	depends on dyntick
 
> > If you wanted, you could reprogram the next timer to happen from
> > {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.
> 
> I messed with this with my soft-timer rework (which has since has fallen
> by the wayside). It is a bit of overhead, especially del_timer(), but
> it's possible. This is what I would consider "dynamic-tick." And I would
> setup a *different* .config option to enable it. Perhaps depending on
> CONFIG_NO_IDLE_HZ.

Yes, I agree it should be a different .config option. Maybe the example
above would work for that?
 
> > And you would want to do that if you wanted sub-jiffie timer
> > interrupts.
> 
> Yes, true, it does enable that. Well, to be honest, it completely
> redefines (in some sense) the jiffy, as it is potentially continuously
> changing, not just at idle times.

Yeah. But it should still work, as we already accept interrupts at any
point in between jiffies to update time, and update the system time from
a second, continuously running timer :)
 
> > So I'd rather not limit the name to the currently implemented
> > functionality only :)
> 
> I'm not trying to limit the name, but make sure we are tying the
> strcutures and functions to the right abstraction (interrupt source, in
> my opinion).

But other devices are interrupt sources too... And really the only use
for this struct is non-continuous timer stuff, right?

Tony


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-06 20:51                             ` Nishanth Aravamudan
@ 2005-09-07  8:13                               ` Tony Lindgren
  2005-09-07 15:00                                 ` Nishanth Aravamudan
  2005-09-07 15:53                                 ` Nishanth Aravamudan
  0 siblings, 2 replies; 96+ messages in thread
From: Tony Lindgren @ 2005-09-07  8:13 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Srivatsa Vaddagiri, Con Kolivas, linux-kernel, akpm, ck list,
	rmk+lkml

* Nishanth Aravamudan <nacc@us.ibm.com> [050906 23:55]:

...

> Sigh, later than I had hoped, but here is what I have hashed out so far.
> Does it seem like a step in the right direction? Rather hand-wavy, but I
> think it's mostly correct ;)

Some comments below.

> - include/linux/intsource.h
> 	with definitions in kernel/intsource.c
> 
> #define DYN_TICK_ENABLED	(1 << 1)
> #define DYN_TICK_SUITABLE	(1 << 0)
> 
> #define DYN_TICK_MIN_SKIP	2
> 
> /* Abstraction of an interrupt source
>  * @state: current state
>  * @max_skip: current maximum number of ticks to skip
>  * @arch_init: initialization routine
>  * @arch_enable_dyn_tick: called via sysfs to enable interrupt skipping
>  * @arch_disable_dyn_tick: called via sysfs to disable interrupt
>  * 				skipping
>  * @arch_set_all_cpus_idle: last cpu to go idle calls this, which should
>  * 				disable any timesource (e.g. PIT on x86)
>  * @arch_recover_time: handler for returning from skipped ticks and keeping
>  * 				time consistent
>  */
> struct interrupt_source {
> 	unsigned int state;
> 	unsigned long max_skip;
> 	int (*arch_init) (void);
> 	void (*arch_enable_dyn_tick) (void);
> 	void (*arch_disable_dyn_tick) (void);
> 	unsigned long (*arch_reprogram) (unsigned long); /* return number of ticks skipped */
> 	unsigned long (*arch_recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> 	/* following empty in UP */
> 	void (*arch_set_all_cpus_idle) (int);
> 	spinlock_t lock;
> };

I would still call the struct dyntick, have CONFIG_DYNTICK, and then have
CONFIG_NO_IDLE_HZ and possibly CONFIG_SUBJIFFIE_TIMER register to use it
like I said in my earlier mail. Would that solve the issues you have
with the naming?

> /* return number of ticks skipped, potentially for accounting purposes? */
> extern unsigned long reprogram_interrupt(void);

The number of ticks skipped can potentially be used in idle loops to
select which ACPI C state to go to depending on the estimated length of
sleep.
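
A minimal sketch of that idea, assuming made-up thresholds: the longer
the expected sleep (in skipped ticks), the deeper the C state that is
worth its exit latency. `pick_cstate()` and both threshold values are
hypothetical; real cut-offs would come from the latencies the platform
reports.

```c
enum cstate { CSTATE_C1 = 1, CSTATE_C2 = 2, CSTATE_C3 = 3 };

/* Hypothetical thresholds in ticks; not taken from any real platform. */
#define C2_MIN_TICKS	2
#define C3_MIN_TICKS	20

/* Map the number of ticks about to be skipped to a C state. */
enum cstate pick_cstate(unsigned long ticks_skipped)
{
	if (ticks_skipped >= C3_MIN_TICKS)
		return CSTATE_C3;
	if (ticks_skipped >= C2_MIN_TICKS)
		return CSTATE_C2;
	return CSTATE_C1;
}
```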

Regards,
 
Tony


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07  8:13                               ` Tony Lindgren
@ 2005-09-07 15:00                                 ` Nishanth Aravamudan
  2005-09-07 15:53                                 ` Nishanth Aravamudan
  1 sibling, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-07 15:00 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Srivatsa Vaddagiri, Con Kolivas, linux-kernel, akpm, ck list,
	rmk+lkml

On 07.09.2005 [11:13:04 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <nacc@us.ibm.com> [050906 23:55]:
> 
> ...
> 
> > Sigh, later than I had hoped, but here is what I have hashed out so far.
> > Does it seem like a step in the right direction? Rather hand-wavy, but I
> > think it's mostly correct ;)
> 
> Some comments below.

Thanks, Tony!

> > - include/linux/intsource.h
> > 	with definitions in kernel/intsource.c
> > 
> > #define DYN_TICK_ENABLED	(1 << 1)
> > #define DYN_TICK_SUITABLE	(1 << 0)
> > 
> > #define DYN_TICK_MIN_SKIP	2
> > 
> > /* Abstraction of an interrupt source
> >  * @state: current state
> >  * @max_skip: current maximum number of ticks to skip
> >  * @arch_init: initialization routine
> >  * @arch_enable_dyn_tick: called via sysfs to enable interrupt skipping
> >  * @arch_disable_dyn_tick: called via sysfs to disable interrupt
> >  * 				skipping
> >  * @arch_set_all_cpus_idle: last cpu to go idle calls this, which should
> >  * 				disable any timesource (e.g. PIT on x86)
> >  * @arch_recover_time: handler for returning from skipped ticks and keeping
> >  * 				time consistent
> >  */
> > struct interrupt_source {
> > 	unsigned int state;
> > 	unsigned long max_skip;
> > 	int (*arch_init) (void);
> > 	void (*arch_enable_dyn_tick) (void);
> > 	void (*arch_disable_dyn_tick) (void);
> > 	unsigned long (*arch_reprogram) (unsigned long); /* return number of ticks skipped */
> > 	unsigned long (*arch_recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > 	/* following empty in UP */
> > 	void (*arch_set_all_cpus_idle) (int);
> > 	spinlock_t lock;
> > };
> 
> I would still call the struct dyntick, have CONFIG_DYNTICK, and then have
> CONFIG_NO_IDLE_HZ and possibly CONFIG_SUBJIFFIE_TIMER register to use it
> like I said in my earlier mail. Would that solve the issues you have
> with the naming?

I'll respond more fully there, but I think it might. If that's the case,
though, I think I'll just push all of the code down into timer.c and
timer.h, no need for a separate file, really. I'll mull it over, see
what the others think as well...

> > /* return number of ticks skipped, potentially for accounting purposes? */
> > extern unsigned long reprogram_interrupt(void);
> 
> The number of ticks skipped can be potentially used in idle loops to
> select which ACPI C state to go to depending on the estimated length of
> sleep.

Ah true!

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07  7:37                         ` Tony Lindgren
@ 2005-09-07 15:05                           ` Nishanth Aravamudan
  2005-09-08 10:00                             ` Tony Lindgren
  0 siblings, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-07 15:05 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Srivatsa Vaddagiri, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list

On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <nacc@us.ibm.com> [050905 20:02]:
> > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > * Srivatsa Vaddagiri <vatsa@in.ibm.com> [050905 10:03]:
> > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > 
> > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > but it's mostly code churn :)
> > > > 
> > > > Yes, the name 'dynamic-tick' is misleading!
> > > 
> > > Huh? For most people dynamic-tick is much more descriptive name than
> > > NO_IDLE_HZ or VST!
> > 
> > I understand this. My point is that the structures are *not*
> > dynamic-tick specific. They are interrupt source specific, generally
> > (also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
> > users of the interrupt source reprogramming functions, but not the
> > reprogrammers themselves, in my mind. Also, it still would be confusing
> > to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)
> 
> I see what you mean, it's a confusing naming issue currently :) Would
> the following solution work for you:
> 
> - Dynamic tick is the structure you register with, and then you use it
>   for any kind of non-continuous timer tinkering 
> 
> - This structure has at least two possible users, NO_IDLE_HZ and
>   sub-jiffie timers
> 
> So we could have following config options:
> 
> CONFIG_DYNTICK
> CONFIG_NO_IDLE_HZ	depends on dyntick
> CONFIG_SUBJIFFIE_TIMER	depends on dyntick

Hrm, yes; first, you are right about the dependency ordering. I take it
CONFIG_DYNTICK is simply there as NO_IDLE_HZ and SUBJIFFIE_TIMER are
independent users of the same underlying infrastructure.

> > > If you wanted, you could reprogram the next timer to happen from
> > > {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.
> > 
> > I messed with this with my soft-timer rework (which has since has fallen
> > by the wayside). It is a bit of overhead, especially del_timer(), but
> > it's possible. This is what I would consider "dynamic-tick." And I would
> > setup a *different* .config option to enable it. Perhaps depending on
> > CONFIG_NO_IDLE_HZ.
> 
> Yes, I agree it should be a different .config option. Maybe the example
> above would work for that?

Yes, I'm thinking it might.

> > > And you would want to do that if you wanted sub-jiffie timer
> > > interrupts.
> > 
> > Yes, true, it does enable that. Well, to be honest, it completely
> > redefines (in some sense) the jiffy, as it is potentially continuously
> > changing, not just at idle times.
> 
> Yeah. But should still work as we already accept interrupts at any point
> inbetween jiffies to update time, and update the system time from a
> second continuously running timer :)

The problem with subjiffie timers is that the precision of soft-timers
is jiffies currently. It requires some serious effort to modify the
soft-timer subsystem to be aware of the extra bits it needs,
efficiently -- take a look at what HRT has had to do.
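
To illustrate the precision problem: with expiry stored in jiffies, any
timeout shorter than one tick collapses to the same value. The sketch
below is a hypothetical userspace helper, not the kernel's conversion
code, and it assumes HZ divides one second evenly (true for 100, 250
and 1000).

```c
#include <stdint.h>

/* Round a timeout in nanoseconds up to whole jiffies at the given HZ. */
uint64_t ns_to_jiffies_roundup(uint64_t ns, unsigned int hz)
{
	uint64_t ns_per_jiffy = 1000000000ULL / hz;

	return (ns + ns_per_jiffy - 1) / ns_per_jiffy;
}
```

At HZ=1000, a 100 microsecond timeout and a 1 millisecond timeout both
come out as 1 jiffy, so the soft-timer code cannot tell them apart
without carrying extra sub-jiffy bits.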

> > > So I'd rather not limit the name to the currently implemented
> > > functionality only :)
> > 
> > I'm not trying to limit the name, but make sure we are tying the
> > strcutures and functions to the right abstraction (interrupt source, in
> > my opinion).
> 
> But other devices are interrupt sources too... And really the only use
> for this stuct is non-continuous timer stuff, right?

Would "tick_source" be better? I guess you are right, that there is only
this one consumer... Although if that is the case, then maybe a separate
.h/.c file is the right way to go, to isolate the code, reduce
#ifdeffery in timer.h/.c.

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07  8:13                               ` Tony Lindgren
  2005-09-07 15:00                                 ` Nishanth Aravamudan
@ 2005-09-07 15:53                                 ` Nishanth Aravamudan
  2005-09-07 17:07                                   ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-07 15:53 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Srivatsa Vaddagiri, Con Kolivas, linux-kernel, akpm, ck list,
	rmk+lkml

On 07.09.2005 [11:13:04 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <nacc@us.ibm.com> [050906 23:55]:
> 
> ...
> 
> > Sigh, later than I had hoped, but here is what I have hashed out so far.
> > Does it seem like a step in the right direction? Rather hand-wavy, but I
> > think it's mostly correct ;)
> 
> Some comments below.

Updated document below.

Thanks,
Nish


- include/linux/timer.h
	with definitions in kernel/timer.c

OR better in
- include/linux/ticksource.h
	with definitions in kernel/ticksource.c?

#define DYN_TICK_ENABLED	(1 << 1)
#define DYN_TICK_SUITABLE	(1 << 0)

#define DYN_TICK_MIN_SKIP	2

/* Abstraction of a tick source
 * @state: current state
 * @max_skip: current maximum number of ticks to skip
 * @init: initialization routine
 * @enable_dyn_tick: called via sysfs to enable interrupt skipping
 * @disable_dyn_tick: called via sysfs to disable interrupt
 * 				skipping
 * @reprogram: reprograms the tick source; returns the number of
 * 				ticks skipped
 * @recover_time: handler for returning from skipped ticks and keeping
 * 				time consistent
 * @set_all_cpus_idle: last cpu to go idle calls this, which should
 * 				disable any timesource (e.g. PIT on x86)
 */
struct tick_source {
	unsigned int state;
	unsigned long max_skip;
	int (*init) (void);
	void (*enable_dyn_tick) (void);
	void (*disable_dyn_tick) (void);
	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
	unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
	/* following empty in UP */
	void (*set_all_cpus_idle) (int);
	spinlock_t lock;
};

extern void tick_source_register(struct tick_source *new_tick_source);
extern struct tick_source *current_ticksource;

#ifdef CONFIG_NO_IDLE_HZ /* which means CONFIG_DYNTICK is also on */
extern void set_tick_max_skip(unsigned long max_skip);
/* idle_reprogram_tick calls reprogram_tick calls current_ticksource->reprogram()
 * do we really need the first step? */
extern void idle_reprogram_tick(void);
/* return number of ticks skipped, potentially for accounting purposes? */
extern unsigned long reprogram_tick(void);

extern struct tick_source * __init arch_select_tick_source(void);
extern void __init dyn_tick_init(void); /* calls arch_select_tick_source(), verifies source is usable, then calls tick_source_register() */

static inline int dyn_tick_enabled(void)
{
	return (current_ticksource->state & DYN_TICK_ENABLED);
}

#else	/* CONFIG_NO_IDLE_HZ */
static inline void set_tick_max_skip(unsigned long max_skip)
{
}

static inline void idle_reprogram_tick(void)
{
}

static inline unsigned long reprogram_tick(void)
{
	return 0;
}

static inline void dyn_tick_init(void)
{
}

static inline int dyn_tick_enabled(void)
{
	return 0;
}
#endif	/* CONFIG_NO_IDLE_HZ */

/* Pick up arch specific header */
#include <asm/timer.h> /* or <asm/ticksource.h>, depending */

- sched.c / sched.h
	/* do we want these elsewhere? */
	cpumask_t no_idle_hz_cpumask;

- each arch-specific file pair needs to provide:
	arch_select_tick_source();
	appropriate struct tick_source definitions, functions, etc.

- include/asm-i386/timer.h /* or ticksource.h */
	with defines in arch/i386/timer.c /* or ticksource.c */

- include/asm-arm/arch-omap/timer.h /* or ticksource.h */
	with definitions in arch/arm/mach-omap/timer.c /* or ticksource.c */

- include/asm-s390/timer.h /* or ticksource.h */
	with definitions in arch/s390/timer.c /* or ticksource.c */
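
For discussion, a rough userspace sketch of how reprogram_tick() might
use max_skip and DYN_TICK_MIN_SKIP before calling the ->reprogram()
hook. Several things here are assumptions, not part of the outline: the
distance to the next timer is passed in as an argument (the outline's
reprogram_tick() takes none), DYN_TICK_MIN_SKIP is treated as "do not
bother skipping fewer ticks than this", and `mock_source` and
`mock_reprogram()` stand in for an arch implementation.

```c
#define DYN_TICK_MIN_SKIP	2

/* Cut-down stand-in for the struct in the outline. */
struct tick_source {
	unsigned long max_skip;
	unsigned long (*reprogram)(unsigned long);
};

unsigned long last_programmed;	/* records what the mock was asked to skip */

/* Mock arch hook: pretend the hardware accepted the full request. */
unsigned long mock_reprogram(unsigned long ticks)
{
	last_programmed = ticks;
	return ticks;
}

struct tick_source mock_source = {
	.max_skip = 50,
	.reprogram = mock_reprogram,
};
struct tick_source *current_ticksource = &mock_source;

/* Clamp the request to max_skip, refuse tiny skips, then call the hook. */
unsigned long reprogram_tick_sketch(unsigned long ticks_to_next_timer)
{
	unsigned long skip = ticks_to_next_timer;

	if (skip > current_ticksource->max_skip)
		skip = current_ticksource->max_skip;
	if (skip < DYN_TICK_MIN_SKIP)
		return 0;	/* not worth reprogramming */
	return current_ticksource->reprogram(skip);
}
```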


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-05  6:32                         ` Srivatsa Vaddagiri
  2005-09-05  6:44                           ` Nishanth Aravamudan
@ 2005-09-07 16:14                           ` Bill Davidsen
  2005-09-07 16:42                             ` Nish Aravamudan
  1 sibling, 1 reply; 96+ messages in thread
From: Bill Davidsen @ 2005-09-07 16:14 UTC (permalink / raw)
  To: vatsa; +Cc: Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

Srivatsa Vaddagiri wrote:
> On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> 
>>Admittedly, I don't think SMP ARM has been around all that long? Maybe
>>the existing code just has not been extended.
> 
> 
> Yeah, maybe ARM never cared for SMP. But we do care :)
> 
> 
>>I'm not sure on this. It's going to be NULL for other architectures, or
>>end up being called by the reprogram() call for the last CPU to go idle,
>>right (presuming there isn't a separate TOD source, like in x86). I
>>think it is better to be in the reprogram() interface.
> 
> 
> Non-x86 could have it set to NULL, in which case it doesn't get called.
> (I know the current code does not take care of this situation).
> But having an explicit 'all_cpus_idle' interface may be good, since 
> Tony talked of idling some devices when all CPUs are idle. So it
> probably has non-x86/PIT uses too.

If this is intended to reduce power, and it originally came from that 
root, then this is the time to put in a hook for transitions to<=>from 
the all-idle state. Various archs may have things other than the PIT 
which should (or at least can) be stopped, and which need to be restarted.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 16:14                           ` Bill Davidsen
@ 2005-09-07 16:42                             ` Nish Aravamudan
  2005-09-07 17:17                               ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 96+ messages in thread
From: Nish Aravamudan @ 2005-09-07 16:42 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: vatsa, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On 9/7/05, Bill Davidsen <davidsen@tmr.com> wrote:
> Srivatsa Vaddagiri wrote:
> > On Sun, Sep 04, 2005 at 10:48:13PM -0700, Nishanth Aravamudan wrote:
> >
> >>Admittedly, I don't think SMP ARM has been around all that long? Maybe
> >>the existing code just has not been extended.
> >
> >
> > Yeah, maybe ARM never cared for SMP. But we do care :)
> >
> >
> >>I'm not sure on this. It's going to be NULL for other architectures, or
> >>end up being called by the reprogram() call for the last CPU to go idle,
> >>right (presuming there isn't a separate TOD source, like in x86). I
> >>think it is better to be in the reprogram() interface.
> >
> >
> > Non-x86 could have it set to NULL, in which case it doesn't get called.
> > (I know the current code does not take care of this situation).
> > But having an explicit 'all_cpus_idle' interface may be good, since
> > Tony talked of idling some devices when all CPUs are idle. So it
> > probably has non-x86/PIT uses too.
> 
> If this is intended to reduce power, and it originally came from that
> root, then this is the time to put in a hook for transitions to<=>from
> the all-idle state. Various arch may have things other than the PIT
> which should (or at least can) be stopped, and which need to be restarted.

Hrm, got dropped from the Cc... :) Yes, the dynamic-tick generic
infrastructure being proposed, with the idle CPU mask and the
set_all_cpus_idle() tick_source hook, would allow exactly this in
arch-specific code.

Is there a generic location where the all-idle state is entered?
Currently, I think we can do it via the generic reprogram() routine
checking the mask and then calling set_all_cpus_idle(), if
appropriate, after reprogramming the last idle CPU.

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 15:53                                 ` Nishanth Aravamudan
@ 2005-09-07 17:07                                   ` Srivatsa Vaddagiri
  2005-09-07 17:23                                     ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-07 17:07 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Tony Lindgren, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On Wed, Sep 07, 2005 at 08:53:52AM -0700, Nishanth Aravamudan wrote:
> 
> - include/linux/timer.h
> 	with definitions in kernel/timer.c
> 
> OR better in
> - include/linux/ticksource.h
> 	with definitions in kernel/ticksource.c?

> 
> #define DYN_TICK_ENABLED	(1 << 1)
> #define DYN_TICK_SUITABLE	(1 << 0)
> 
> #define DYN_TICK_MIN_SKIP	2
> 
> /* Abstraction of a tick source

I think "tick source" probably doesn't bring out the fact that we are
also dealing with more than tick source here (@recover_time and 
@set_all_cpus_idle). From this perspective, I feel that the original 
'dyn_tick_timer' name itself was better (at least it captures the 
'dynamic' nature of ticks).

>  * @state: current state
>  * @max_skip: current maximum number of ticks to skip
>  * @init: initialization routine
>  * @enable_dyn_tick: called via sysfs to enable interrupt skipping
>  * @disable_dyn_tick: called via sysfs to disable interrupt
>  * 				skipping
>  * @set_all_cpus_idle: last cpu to go idle calls this, which should
>  * 				disable any timesource (e.g. PIT on x86)
>  * @recover_time: handler for returning from skipped ticks and keeping
>  * 				time consistent
>  */
> struct tick_source {
> 	unsigned int state;
> 	unsigned long max_skip;
> 	int (*init) (void);
> 	void (*enable_dyn_tick) (void);
> 	void (*disable_dyn_tick) (void);
> 	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */

How will it be able to return the number of ticks skipped? Or are you referring
to max_skip here?

> 	unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> 	/* following empty in UP */
> 	void (*set_all_cpus_idle) (int);

Does 'set' in 'set_all_cpus_idle' signify anything?

> 	spinlock_t lock;

I think the 'lock' fits in nicely here.

> };
> 
> extern void tick_source_register(struct tick_source *new_tick_source);

I tend to prefer the original interface - dyn_tick_register - itself (since 
as I said it captures the dynamic nature of the timer).

> extern struct tick_source *current_ticksource;

In x86-like architectures, there can be multiple ticksources that can
be simultaneously active - ex: APIC and PIT. So one "current_ticksource" 
doesn't capture that fact?

> 
> #ifdef CONFIG_NO_IDLE_HZ /* which means CONFIG_DYNTICK is also on */
> extern void set_tick_max_skip(unsigned long max_skip);
> /* idle_reprogram_tick calls reprogram_tick calls current_ticksource->reprogram()
>  * do we really need the first step? */

If 'idle_reprogram_tick' is the equivalent of 'idle_reprogram_timer' that is 
in the latest patch published by Con, then I think we can avoid the first step 
yes.

> extern void idle_reprogram_tick(void);
> /* return number of ticks skipped, potentially for accounting purposes? */
> extern unsigned long reprogram_tick(void);
> 
> extern struct tick_source * __init arch_select_tick_source(void);
> extern void __init dyn_tick_init(void); /* calls select_tick_source(), verifies source is usable, then calls tick_source_register() */

I presume dyn_tick_init will be in arch-independent code. In that case, how
does it "verify" that the source is usable? Seems like we need arch-hooks
for this as well?

> static inline int dyn_tick_enabled(void)
> {
> 	return (current_ticksource->state & DYN_TICK_ENABLED);
> }
> 
> #else	/* CONFIG_NO_IDLE_HZ */
> static inline void set_tick_max_skip(unsigned long max_skip)
> {
> }
> 
> static inline void idle_reprogram_tick(void)
> {
> }
> 
> static inline unsigned long reprogram_tick(void)
> {
> 	return 0;
> }
> 
> static inline void dyn_tick_init(void)
> {
> }
> 
> static inline int dyn_tick_enabled(void)
> {
> 	return 0;
> }
> #endif	/* CONFIG_NO_IDLE_HZ */
> 
> /* Pick up arch specific header */
> #include <asm/timer.h> /* or <asm/ticksource.h>, depending */
> 
> - sched.c / sched.h
> 	/* do we want these elsewhere? */
> 	cpumask_t no_idle_hz_cpumask;

Could be moved to timer.h/c?

> - each arch-specific file pair needs to provide:
> 	arch_select_tick_source();
> 	appropriate struct tick_source definitions, functions, etc.
> 
> - include/asm-i386/timer.h /* or ticksource.h */
> 	with defines in arch/i386/timer.c /* or ticksource.c */
> 
> - include/asm-arm/arch-omap/timer.h /* or ticksource.h */
> 	with definitions in arch/arm/mach-omap/timer.c /* or ticksource.c */
> 
> - include/asm-s390/timer.h /* or ticksource.h */
> 	with definitions in arch/s390/timer.c /* or ticksource.c */

I somehow consider that we can retain what currently exists - 
include/asm-i386/dyn-tick.h and arch/i386/kernel/dyn-tick.c ..
IMO current abstraction of 'dyn_tick_timer' is good enough to unify all the 
ports of no-idle-hz. We probably need to just iron out the differences between
how ARM and x86 define this.

As far as the problem of multiple interrupt sources (like APIC, PIT, HPET)
is concerned, it can be completely handled by the architecture code itself and 
it appropriately sets the 'reprogram_timer' member to point to APIC, PIT or 
HPET reprogramming routines. That would also avoid the 
'if (cpu_has_local_apic())' kind of code that exists now.
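
A minimal user-space sketch of that idea, not kernel code: the
architecture picks the reprogramming routine once, and the idle path
then calls through the pointer without testing which hardware is
present. Only 'reprogram_timer' and 'dyn_tick_timer' come from this
thread; the enum, the three stub routines and the function names below
are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers for the interrupt sources named above. */
enum tick_hw { TICK_HW_PIT, TICK_HW_APIC, TICK_HW_HPET };

/* Records which stub ran last, so the dispatch can be observed. */
static enum tick_hw last_programmed;

/* Stubs standing in for the real per-hardware reprogramming routines. */
static unsigned long pit_reprogram(unsigned long ticks)
{
	last_programmed = TICK_HW_PIT;
	return ticks;
}

static unsigned long apic_reprogram(unsigned long ticks)
{
	last_programmed = TICK_HW_APIC;
	return ticks;
}

static unsigned long hpet_reprogram(unsigned long ticks)
{
	last_programmed = TICK_HW_HPET;
	return ticks;
}

struct dyn_tick_timer {
	unsigned long (*reprogram_timer)(unsigned long ticks);
};

/* Arch code decides once which routine to use. Returns 0 on success. */
static int arch_select_reprogram(struct dyn_tick_timer *t, enum tick_hw hw)
{
	switch (hw) {
	case TICK_HW_PIT:
		t->reprogram_timer = pit_reprogram;
		return 0;
	case TICK_HW_APIC:
		t->reprogram_timer = apic_reprogram;
		return 0;
	case TICK_HW_HPET:
		t->reprogram_timer = hpet_reprogram;
		return 0;
	}
	return -1;
}

/* Generic idle path: no hardware checks, just an indirect call. */
static unsigned long sketch_idle_reprogram(struct dyn_tick_timer *t,
					   unsigned long ticks)
{
	assert(t->reprogram_timer != NULL);
	return t->reprogram_timer(ticks);
}
```

The hardware test happens once at selection time, so the generic path
stays free of per-source conditionals.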


-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 16:42                             ` Nish Aravamudan
@ 2005-09-07 17:17                               ` Srivatsa Vaddagiri
  2005-09-07 17:27                                 ` Nish Aravamudan
  2005-09-09 16:27                                 ` Bill Davidsen
  0 siblings, 2 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-07 17:17 UTC (permalink / raw)
  To: Nish Aravamudan
  Cc: Bill Davidsen, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On Wed, Sep 07, 2005 at 09:42:24AM -0700, Nish Aravamudan wrote:
> Hrm, got dropped from the Cc... :) Yes, the dynamic-tick generic
> infrastructure being proposed, with the idle CPU mask and the
> set_all_cpus_idle() tick_source hook, would allow exactly this in
> arch-specific code.

I think Bill is referring to the "resume" interface, i.e. an
unset_all_cpus_idle() interface, which is missing (set/unset
are probably not good prefixes). I feel we can
add one.

> Is there a generic location where the all-idle state is entered?

Should be from the place where the last cpu is set in the bitmap
and bitmap is found equal to cpu_online_map.

> Currently, I think we can do it via the generic reprogram() routine
> checking the mask and then calling set_all_cpus_idle(), if
> appropriate, after reprogramming the last idle CPU.

So are you saying that setting of the CPU in the bitmap will be done
inside reprogram_timer routine? If we consider that reprogram_timer can 
directly point to a routine in an interrupt source file (like apic.c/timer_pit.c)
I don't think that it is the right place to set bits in the nohz_cpu_mask.
It can be done by the caller of reprogram_timer itself.
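
A rough user-space sketch of that split, under stated assumptions: the
hardware routine only programs the timer, while its caller owns the
idle-mask bookkeeping and the comparison against the online map. The
masks are plain unsigned longs standing in for nohz_cpu_mask and
cpu_online_map, and every function name here is hypothetical. Locking
is left out; real code would need it.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 8	/* hypothetical limit for this sketch */

static unsigned long online_mask;	/* stand-in for cpu_online_map */
static unsigned long nohz_mask;		/* stand-in for nohz_cpu_mask */
static unsigned long last_hw_request;	/* what the hardware stub was asked */

/* Hardware-only routine: it knows nothing about CPU masks. */
static unsigned long hw_reprogram(unsigned long ticks)
{
	last_hw_request = ticks;
	return ticks;
}

/*
 * Caller-side wrapper: reprogram first, then mark this CPU idle.
 * Returns true when the idle mask now equals the online mask, i.e.
 * this CPU was the last one to go idle.
 */
static bool sketch_idle_enter(unsigned int cpu, unsigned long ticks)
{
	assert(cpu < SKETCH_NR_CPUS);
	hw_reprogram(ticks);
	nohz_mask |= 1UL << cpu;
	return nohz_mask == online_mask;
}

/* Clear the bit again when the CPU leaves idle. */
static void sketch_idle_exit(unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	nohz_mask &= ~(1UL << cpu);
}
```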

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 17:07                                   ` Srivatsa Vaddagiri
@ 2005-09-07 17:23                                     ` Nishanth Aravamudan
  2005-09-07 18:14                                       ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-07 17:23 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Tony Lindgren, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On 07.09.2005 [22:37:03 +0530], Srivatsa Vaddagiri wrote:
> On Wed, Sep 07, 2005 at 08:53:52AM -0700, Nishanth Aravamudan wrote:
> > 
> > - include/linux/timer.h
> > 	with definitions in kernel/timer.c
> > 
> > OR better in
> > - include/linux/ticksource.h
> > 	with definitions in kernel/ticksource.c?
> 
> > 
> > #define DYN_TICK_ENABLED	(1 << 1)
> > #define DYN_TICK_SUITABLE	(1 << 0)
> > 
> > #define DYN_TICK_MIN_SKIP	2

Another point. Why is this 2? I guess if you're going to make it 2, why
bother defining/checking it at all?

> > /* Abstraction of a tick source
> 
> I think "tick source" probably doesn't bring out the fact that we are
> also dealing with more than tick source here (@recover_time and 
> @set_all_cpus_idle). From this perspective, I feel that the original 
> 'dyn_tick_timer' name itself was better (at least it captures the 
> 'dynamic' nature of ticks).

Yes, maybe you are right. dyn_tick_source, maybe?

> >  * @state: current state
> >  * @max_skip: current maximum number of ticks to skip
> >  * @init: initialization routine
> >  * @enable_dyn_tick: called via sysfs to enable interrupt skipping
> >  * @disable_dyn_tick: called via sysfs to disable interrupt
> >  * 				skipping
> >  * @set_all_cpus_idle: last cpu to go idle calls this, which should
> >  * 				disable any timesource (e.g. PIT on x86)
> >  * @recover_time: handler for returning from skipped ticks and keeping
> >  * 				time consistent
> >  */
> > struct tick_source {
> > 	unsigned int state;
> > 	unsigned long max_skip;
> > 	int (*init) (void);
> > 	void (*enable_dyn_tick) (void);
> > 	void (*disable_dyn_tick) (void);
> > 	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> 
> How will it be able to return the number of ticks skipped? Or are you
> referring to max_skip here?

Yes, maybe this can be a void function... I was thinking more along the
lines of you can send whatever request you want to reprogram(), it does
what it can with the request (cuts it short if too long, ignores it if
too short) and then returns what it actually did.
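
To make that return-value contract concrete, here is a small sketch in
plain C. It assumes the limits are a minimum and a maximum skip count,
as with DYN_TICK_MIN_SKIP and max_skip above; the struct and function
names are made up for the example, and a return of 0 is my assumed
way of signalling "request ignored".

```c
#include <assert.h>

/* Hypothetical container for the two limits discussed above. */
struct skip_limits {
	unsigned long min_skip;	/* requests below this are ignored */
	unsigned long max_skip;	/* requests above this are cut short */
};

/*
 * Returns the number of ticks that would actually be skipped:
 * 0 if the request was too short, max_skip if it was too long,
 * otherwise the request unchanged.
 */
static unsigned long sketch_reprogram(const struct skip_limits *lim,
				      unsigned long requested)
{
	assert(lim->min_skip <= lim->max_skip);

	if (requested < lim->min_skip)
		return 0;
	if (requested > lim->max_skip)
		return lim->max_skip;
	return requested;
}
```

With a non-void return the caller can tell a shortened request from an
ignored one, which is what accounting code would need.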

> > 	unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > 	/* following empty in UP */
> > 	void (*set_all_cpus_idle) (int);
> 
> Does 'set' in 'set_all_cpus_idle' signify anything?

It is the callback for the architecture when the last CPU goes locally
idle, so then system-wide idle code can be placed here (disable PIT on
x86). The name may be wrong, though, you are right.

> > 	spinlock_t lock;
> 
> I think the 'lock' fits in nicely here.
> 
> > };
> > 
> > extern void tick_source_register(struct tick_source *new_tick_source);
> 
> I tend to prefer the original interface - dyn_tick_register - itself (since 
> as I said it captures the dynamic nature of the timer).
> 
> > extern struct tick_source *current_ticksource;
> 
> In x86-like architectures, there can be multiple ticksources that can
> be simultaneously active - ex: APIC and PIT. So one
> "current_ticksource" doesn't capture that fact?

Not really, though, right? Only one is registered to do the timer
callbacks? So, for x86, if you use the PIT ticksource, you only need to
be PIT aware, but if you use the APIC ticksource, then it needs to be
aware of the APIC and PIT (I believe you mentioned they are tied to each
other), but that's ticksource-specific. CMIIW, though, please.

> > #ifdef CONFIG_NO_IDLE_HZ /* which means CONFIG_DYNTICK is also on */
> > extern void set_tick_max_skip(unsigned long max_skip);
> > /* idle_reprogram_tick calls reprogram_tick calls current_ticksource->reprogram()
> >  * do we really need the first step? */
> 
> If 'idle_reprogram_tick' is the equivalent of 'idle_reprogram_timer' that is 
> in the latest patch published by Con, then I think we can avoid the first step 
> yes.

Yes, exactly.

> > extern void idle_reprogram_tick(void);
> > /* return number of ticks skipped, potentially for accounting purposes? */
> > extern unsigned long reprogram_tick(void);
> > 
> > extern struct tick_source * __init arch_select_tick_source(void);
> > extern void __init dyn_tick_init(void); /* calls select_tick_source(), verifies source is usable, then calls tick_source_register() */
> 
> I presume dyn_tick_init will be in arch-independent code. In that
> case, how does it "verify" that the source is usable? Seems like we
> need arch-hooks for this as well?

hrm, you are right. So then maybe the "usable" checks should be pushed
into arch_select_tick_source()?

<snip>

> > - sched.c / sched.h
> > 	/* do we want these elsewhere? */
> > 	cpumask_t no_idle_hz_cpumask;
> 
> Could be moved to timer.h/c?

Yes, that's what I was thinking. I just wasn't sure if there was a
specific reason for keeping them in sched.h/.c.

> > - each arch-specific file pair needs to provide:
> > 	arch_select_tick_source();
> > 	appropriate struct tick_source definitions, functions, etc.
> > 
> > - include/asm-i386/timer.h /* or ticksource.h */
> > 	with defines in arch/i386/timer.c /* or ticksource.c */
> > 
> > - include/asm-arm/arch-omap/timer.h /* or ticksource.h */
> > 	with definitions in arch/arm/mach-omap/timer.c /* or ticksource.c */
> > 
> > - include/asm-s390/timer.h /* or ticksource.h */
> > 	with definitions in arch/s390/timer.c /* or ticksource.c */
> 
> I somehow consider that we can retain what currently exists -
> include/asm-i386/dyn-tick.h and arch/i386/kernel/dyn-tick.c ..  IMO
> current abstraction of 'dyn_tick_timer' is good enough to unify all
> the ports of no-idle-hz. We probably need to just iron out the
> differences between how ARM and x86 define this.

Maybe you are right. I don't like having a separate struct for the
state, though, and the dyn_tick_timer struct doesn't have a
recover_time() style member. If you look closely, my structure is
basically exactly what the x86 work has, just some different names
(don't need arch_ prefix, for instance, because it's clearly
dyn_tick_timer specific, etc.) I also would like to hear from the s390
folks about their issues/opinions.

> As far as the problem of multiple interrupt sources (like APIC, PIT,
> HPET) is concerned, it can be completely handled by the architecture
> code itself and it appropriately sets the 'reprogram_timer' member to
> point to APIC, PIT or HPET reprogramming routines. That would also
> avoid the 'if (cpu_has_local_apic())' kind of code that exists now.

Yes, true. I'm wondering, do we need to make the
current_ticksource/current_dyn_tick_timer per-CPU? I am just wondering
how to gracefully handle the SMP case. Or is that not a problem?

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 17:17                               ` Srivatsa Vaddagiri
@ 2005-09-07 17:27                                 ` Nish Aravamudan
  2005-09-07 18:18                                   ` Srivatsa Vaddagiri
  2005-09-09 16:27                                 ` Bill Davidsen
  1 sibling, 1 reply; 96+ messages in thread
From: Nish Aravamudan @ 2005-09-07 17:27 UTC (permalink / raw)
  To: vatsa; +Cc: Bill Davidsen, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On 9/7/05, Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:
> On Wed, Sep 07, 2005 at 09:42:24AM -0700, Nish Aravamudan wrote:
> > Hrm, got dropped from the Cc... :) Yes, the dynamic-tick generic
> > infrastructure being proposed, with the idle CPU mask and the
> > set_all_cpus_idle() tick_source hook, would allow exactly this in
> > arch-specific code.
> 
> I think Bill is referring to the "resume" interface, i.e. an
> unset_all_cpus_idle() interface, which is missing (set/unset
> are probably not good prefixes). I feel we can
> add one.

Yes, can be added.

enter_all_cpus_idle() and exit_all_cpus_idle() would be better?

> > Is there a generic location where the all-idle state is entered?
> 
> Should be from the place where the last cpu is set in the bitmap
> and bitmap is found equal to cpu_online_map.

Yes, this is what I said.

> > Currently, I think we can do it via the generic reprogram() routine
> > checking the mask and then calling set_all_cpus_idle(), if
> > appropriate, after reprogramming the last idle CPU.
> 
> So are you saying that setting of the CPU in the bitmap will be done
> inside reprogram_timer routine? If we consider that reprogram_timer can
> directly point to a routine in an interrupt source file (like apic.c/timer_pit.c)
> I don't think that it is the right place to set bits in the nohz_cpu_mask.
> It can be done by the caller of reprogram_timer itself.

No, I was saying what you were, if a little unclearly, so the caller
does something like:

current_dyn_tick_timer->reprogram();
check_cpu_mask(nohz_cpu_mask);
if (we_are_last_idle)
  enter_all_cpus_idle();
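
Fleshed out a little as a user-space sketch (assumptions: plain
unsigned long masks instead of cpumask_t, no locking, and made-up
stop_pit()/start_pit() stubs standing in for whatever the architecture
would really do), the enter/exit pair could look like this. The hooks
may be NULL, which covers architectures that have nothing to stop.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 8	/* hypothetical limit for this sketch */

struct dyn_tick_timer {
	void (*enter_all_cpus_idle)(void);	/* may be NULL */
	void (*exit_all_cpus_idle)(void);	/* may be NULL */
};

static unsigned long online_mask;	/* stand-in for cpu_online_map */
static unsigned long nohz_mask;		/* stand-in for nohz_cpu_mask */
static int all_idle;			/* set while every CPU is idle */

/* Hypothetical arch hooks: pretend to stop and restart the PIT. */
static int pit_stopped;
static void stop_pit(void)  { pit_stopped = 1; }
static void start_pit(void) { pit_stopped = 0; }

/* Called by the idle path after the timer has been reprogrammed. */
static void sketch_cpu_goes_idle(struct dyn_tick_timer *t, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	nohz_mask |= 1UL << cpu;
	if (nohz_mask == online_mask && !all_idle) {
		all_idle = 1;
		if (t->enter_all_cpus_idle)
			t->enter_all_cpus_idle();
	}
}

/* Called when a CPU leaves idle; the first waker undoes the all-idle state. */
static void sketch_cpu_wakes(struct dyn_tick_timer *t, unsigned int cpu)
{
	assert(cpu < SKETCH_NR_CPUS);
	if (all_idle) {
		all_idle = 0;
		if (t->exit_all_cpus_idle)
			t->exit_all_cpus_idle();
	}
	nohz_mask &= ~(1UL << cpu);
}
```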

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 17:23                                     ` Nishanth Aravamudan
@ 2005-09-07 18:14                                       ` Srivatsa Vaddagiri
  2005-09-07 18:22                                         ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-07 18:14 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Tony Lindgren, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml,
	schwidefsky

On Wed, Sep 07, 2005 at 10:23:15AM -0700, Nishanth Aravamudan wrote:
> > > #define DYN_TICK_MIN_SKIP	2
> 
> Another point. Why is this 2? I guess if you're going to make it 2, why
> bother defining/checking it at all?

I think that should be arch-specific.


> > > 	void (*disable_dyn_tick) (void);
> > > 	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > 
> > How will it be able to return the number of ticks skipped? Or are you
> > referring to max_skip here?
> 
> Yes, maybe this can be a void function... I was thinking more along the
> lines of you can send whatever request you want to reprogram(), it does
> what it can with the request (cuts it short if too long, ignores it if
> too short) and then returns what it actually did.

Looks fine in that case to have a non-void return.

> > In x86-like architectures, there can be multiple ticksources that can
> > be simultaneously active - ex: APIC and PIT. So one
> > "current_ticksource" doesn't capture that fact?
> 
> Not really, though, right? Only one is registered to do the timer
> callbacks? 

True.

> So, for x86, if you use the PIT ticksource, you only need to
> be PIT aware, but if you use the APIC ticksource, then it needs to be
> aware of the APIC and PIT (I believe you mentioned they are tied to each
> other), but that's ticksource-specific. CMIIW, though, please.

I was going more by the meaning 'current_ticksource' may convey - from
a pure "ticksource" perspective, both (PIT/APIC) are tick sources!
That's why current_ticksource may not be a good term.

> Maybe you are right. I don't like having a separate struct for the
> state, though, and the dyn_tick_timer struct doesn't have a
> recover_time() style member. If you look closely, my structure is

I agree we can remove the separate struct for state and have
recover_time member. Although in x86, it may have to be a wrapper
around mark_offset() since mark_offset does not recover time
completely (it expects the caller to recover one remaining tick).
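
Purely as an illustration of such a wrapper, here is a user-space
model. It assumes mark_offset() behaves as described above, i.e. it
accounts for all elapsed ticks except one; the counters and function
names are hypothetical and do not correspond to the real i386 code.

```c
#include <assert.h>

static unsigned long long jiffies_sketch;	/* models the jiffies counter */
static unsigned long pending_ticks;	/* ticks elapsed while sleeping; a
					 * stand-in for reading a hw counter */

/* Models mark_offset(): recovers every lost tick except the last one. */
static void mark_offset_sketch(void)
{
	if (pending_ticks >= 2)
		jiffies_sketch += pending_ticks - 1;
}

/*
 * Models a recover_time() wrapper: lets mark_offset do its part, then
 * adds the one tick it leaves for its caller. Returns the number of
 * ticks recovered in total.
 */
static unsigned long recover_time_sketch(void)
{
	unsigned long long before = jiffies_sketch;
	unsigned long elapsed = pending_ticks;

	mark_offset_sketch();
	if (elapsed >= 1)
		jiffies_sketch += 1;
	pending_ticks = 0;

	/* Every elapsed tick must have been accounted exactly once. */
	assert(jiffies_sketch - before == elapsed);
	return elapsed;
}
```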

> basically exactly what the x86 work has, just some different names
> (don't need arch_ prefix, for instance, because it's clearly
> dyn_tick_timer specific, etc.) I also would like to hear from the s390
> folks about their issues/opinions.

Martin Schwidefsky (whom I have CC'ed) may be the person who can comment on 
behalf of s390.

> Yes, true. I'm wondering, do we need to make the
> current_ticksource/current_dyn_tick_timer per-CPU? I am just wondering
> how to gracefully handle the SMP case. Or is that not a problem?

I don't see current_ticksource/current_dyn_tick_timer as being write-heavy.
In fact I see them being initialized during bootup and mostly read-only
after that. That may not warrant a per-CPU structure.


-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 17:27                                 ` Nish Aravamudan
@ 2005-09-07 18:18                                   ` Srivatsa Vaddagiri
  2005-09-07 18:33                                     ` Nish Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-07 18:18 UTC (permalink / raw)
  To: Nish Aravamudan
  Cc: Bill Davidsen, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On Wed, Sep 07, 2005 at 10:27:43AM -0700, Nish Aravamudan wrote:
> enter_all_cpus_idle() and exit_all_cpus_idle() would be better?

Looks perfect.

> No, I was saying what you were, if a little unclearly, so the caller
> does something like:
> 
> current_dyn_tick_timer->reprogram();
> check_cpu_mask(nohz_cpu_mask);
> if (we_are_last_idle)
>   enter_all_cpus_idle();

Looks fine!

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 18:14                                       ` Srivatsa Vaddagiri
@ 2005-09-07 18:22                                         ` Nishanth Aravamudan
  0 siblings, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-07 18:22 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Tony Lindgren, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml,
	schwidefsky

On 07.09.2005 [23:44:45 +0530], Srivatsa Vaddagiri wrote:
> On Wed, Sep 07, 2005 at 10:23:15AM -0700, Nishanth Aravamudan wrote:
> > > > #define DYN_TICK_MIN_SKIP	2
> > 
> > Another point. Why is this 2? I guess if you're going to make it 2, why
> > bother defining/checking it at all?
> 
> I think that should be arch-specific.

Yes, I agree, will perhaps add a min_skip member to the structure.

> > > > 	void (*disable_dyn_tick) (void);
> > > > 	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > > 
> > > How will it be able to return the number of ticks skipped? Or are you
> > > referring to max_skip here?
> > 
> > Yes, maybe this can be a void function... I was thinking more along the
> > lines of you can send whatever request you want to reprogram(), it does
> > what it can with the request (cuts it short if too long, ignores it if
> > too short) and then returns what it actually did.
> 
> Looks fine in that case to have a non-void return.

Ok, I agree.

> > > In x86-like architectures, there can be multiple ticksources that can
> > > be simultaneously active - ex: APIC and PIT. So one
> > > "current_ticksource" doesn't capture that fact?
> > 
> > Not really, though, right? Only one is registered to do the timer
> > callbacks? 
> 
> True.

Thanks for confirming, I wasn't sure if that was the case or not.

> > So, for x86, if you use the PIT ticksource, you only need to
> > be PIT aware, but if you use the APIC ticksource, then it needs to be
> > aware of the APIC and PIT (I believe you mentioned they are tied to each
> > other), but that's ticksource-specific. CMIIW, though, please.
> 
> I was going more by the meaning 'current_ticksource' may convey - from
> a pure "ticksource" perspective, both (PIT/APIC) are tick sources!
> That's why current_ticksource may not be a good term.

Ah, true. Maybe current_dyn_tick_source or something to that effect?
Because there should only be one dyn_tick_timer, yes?

> > Maybe you are right. I don't like having a separate struct for the
> > state, though, and the dyn_tick_timer struct doesn't have a
> > recover_time() style member. If you look closely, my structure is
> 
> I agree we can remove the separate struct for state and have
> recover_time member. Although in x86, it may have to be a wrapper
> around mark_offset() since mark_offset does not recover time
> completely (it expects the caller to recover one remaining tick).

Yes, certainly, in fact, I think some of these functions may provide
impetus to eventually clean up other code.

> > basically exactly what the x86 work has, just some different names
> > (don't need arch_ prefix, for instance, because it's clearly
> > dyn_tick_timer specific, etc.) I also would like to hear from the s390
> > folks about their issues/opinions.
> 
> Martin Schwidefsky (whom I have CC'ed) may be the person who can comment on 
> behalf of s390.

Thanks, I forwarded him the first plan I submitted a few days ago, but
didn't add him to the Cc.

> > Yes, true. I'm wondering, do we need to make the
> > current_ticksource/current_dyn_tick_timer per-CPU? I am just wondering
> > how to gracefully handle the SMP case. Or is that not a problem?
> 
> I don't see current_ticksource/current_dyn_tick_timer as being write-heavy.
> In fact I see them being initialized during bootup and mostly read-only
> after that. That may not warrant a per-CPU structure.

I meant more for making sure we can manage that one CPU may have access
to the PIT, but others may not (CPU0)? More along the lines of diverse
h/w setups where perhaps the HPET is tied to one chip, but not the
other. So, actually different hardware per-cpu. If that can't be the
case (at least not currently), then the nohz_mask and spin_lock is
enough to guarantee we don't muck with cpus accidentally.

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 18:18                                   ` Srivatsa Vaddagiri
@ 2005-09-07 18:33                                     ` Nish Aravamudan
  0 siblings, 0 replies; 96+ messages in thread
From: Nish Aravamudan @ 2005-09-07 18:33 UTC (permalink / raw)
  To: vatsa; +Cc: Bill Davidsen, Con Kolivas, linux-kernel, akpm, ck list, rmk+lkml

On 9/7/05, Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:
> On Wed, Sep 07, 2005 at 10:27:43AM -0700, Nish Aravamudan wrote:
> > enter_all_cpus_idle() and exit_all_cpus_idle() would be better?
> 
> Looks perfect.
> 
> > No, I was saying what you were, if a little unclearly, so the caller
> > does something like:
> >
> > current_dyn_tick_timer->reprogram();
> > check_cpu_mask(nohz_cpu_mask);
> > if (we_are_last_idle)
> >   enter_all_cpus_idle();
> 
> Looks fine!

Great!

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 15:05                           ` Nishanth Aravamudan
@ 2005-09-08 10:00                             ` Tony Lindgren
  2005-09-08 21:22                               ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Tony Lindgren @ 2005-09-08 10:00 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Srivatsa Vaddagiri, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list

* Nishanth Aravamudan <nacc@us.ibm.com> [050907 18:07]:
> On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> > * Nishanth Aravamudan <nacc@us.ibm.com> [050905 20:02]:
> > > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > > * Srivatsa Vaddagiri <vatsa@in.ibm.com> [050905 10:03]:
> > > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > > 
> > > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > > but it's mostly code churn :)
> > > > > 
> > > > > Yes, the name 'dynamic-tick' is misleading!
> > > > 
> > > > Huh? For most people dynamic-tick is much more descriptive name than
> > > > NO_IDLE_HZ or VST!
> > > 
> > > I understand this. My point is that the structures are *not*
> > > dynamic-tick specific. They are interrupt source specific, generally
> > > (also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
> > > users of the interrupt source reprogramming functions, but not the
> > > reprogrammers themselves, in my mind. Also, it still would be confusing
> > > to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)
> > 
> > I see what you mean, it's a confusing naming issue currently :) Would
> > the following solution work for you:
> > 
> > - Dynamic tick is the structure you register with, and then you use it
> >   for any kind of non-continuous timer tinkering 
> > 
> > - This structure has at least two possible users, NO_IDLE_HZ and
> >   sub-jiffie timers
> > 
> > So we could have following config options:
> > 
> > CONFIG_DYNTICK
> > CONFIG_NO_IDLE_HZ	depends on dyntick
> > CONFIG_SUBJIFFIE_TIMER	depends on dyntick
> 
> Hrm, yes, first you are right with the dependency ordering. I take it
> CONFIG_DYNTICK is simply there as NO_IDLE_HZ and SUBJIFFIE_TIMER are
> independent users of the same underlying infrastructure.

Cool, I'm glad we got the dependencies figured out now rather than later :)
 
> > > > If you wanted, you could reprogram the next timer to happen from
> > > > {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.
> > > 
> > > I messed with this with my soft-timer rework (which has since fallen
> > > by the wayside). It is a bit of overhead, especially del_timer(), but
> > > it's possible. This is what I would consider "dynamic-tick." And I would
> > > setup a *different* .config option to enable it. Perhaps depending on
> > > CONFIG_NO_IDLE_HZ.
> > 
> > Yes, I agree it should be a different .config option. Maybe the example
> > above would work for that?
> 
> Yes, I'm thinking it might.
> 
> > > > And you would want to do that if you wanted sub-jiffie timer
> > > > interrupts.
> > > 
> > > Yes, true, it does enable that. Well, to be honest, it completely
> > > redefines (in some sense) the jiffy, as it is potentially continuously
> > > changing, not just at idle times.
> > 
> > Yeah. But should still work as we already accept interrupts at any point
> > in between jiffies to update time, and update the system time from a
> > second continuously running timer :)
> 
> The problem with subjiffie timers is that the precision of soft-timers
> is jiffies currently. It requires some serious effort to modify the
> soft-timer subsystem to be aware of the extra bits it needs,
> efficiently -- take a look at what HRT has had to do.

Yes, we should coordinate that with HRT. BTW, we can reduce the overhead
of del_timer() by _not_ calling next_timer_interrupt(), and programming
the next timer interrupt to happen where next jiffie would be. Then once
we get to the idle, we call next_timer_interrupt()...
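
To illustrate the del_timer() idea above, here is a minimal userspace
sketch (not kernel code). Everything prefixed sim_ or hw_, the fixed-size
timer array, and the scans counter are hypothetical stand-ins; only the
idea itself (skip the scan in del_timer(), defer next_timer_interrupt()
to idle) comes from the discussion.

```c
#include <assert.h>
#include <limits.h>

#define MAX_TIMERS	8
#define NO_TIMER	ULONG_MAX

static unsigned long jiffies;			/* current tick count (model) */
static unsigned long timers[MAX_TIMERS];	/* expiry per slot, NO_TIMER if unused */
static unsigned long hw_next_event;		/* tick the fake h/w is programmed for */
static unsigned int scans;			/* number of full timer scans performed */

static void hw_program(unsigned long expires)
{
	hw_next_event = expires;
}

static void sim_init(void)
{
	int i;

	for (i = 0; i < MAX_TIMERS; i++)
		timers[i] = NO_TIMER;
	jiffies = 0;
	scans = 0;
	hw_program(jiffies + 1);	/* start out ticking every jiffy */
}

/* Stand-in for next_timer_interrupt(): a full scan for the earliest expiry. */
static unsigned long sim_next_timer_interrupt(void)
{
	unsigned long next = NO_TIMER;
	int i;

	scans++;
	for (i = 0; i < MAX_TIMERS; i++)
		if (timers[i] < next)
			next = timers[i];
	return next;
}

static void sim_add_timer(int slot, unsigned long expires)
{
	timers[slot] = expires;
	if (expires < hw_next_event)	/* only pull the event earlier */
		hw_program(expires);
}

/* Cheap path: no scan, just fall back to the next jiffy. */
static void sim_del_timer(int slot)
{
	timers[slot] = NO_TIMER;
	hw_program(jiffies + 1);
}

/* Idle path: this is where the expensive scan is paid for. */
static void sim_idle(void)
{
	unsigned long next = sim_next_timer_interrupt();

	if (next != NO_TIMER)
		hw_program(next);
}
```

The trade-off is one possibly unnecessary interrupt at the next jiffy
after a del_timer(), in exchange for keeping the scan out of the
del_timer() path.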
 
> > > > So I'd rather not limit the name to the currently implemented
> > > > functionality only :)
> > > 
> > > I'm not trying to limit the name, but make sure we are tying the
> > > strcutures and functions to the right abstraction (interrupt source, in
> > > my opinion).
> > 
> > But other devices are interrupt sources too... And really the only use
> > for this stuct is non-continuous timer stuff, right?
> 
> Would "tick_source" be better? I guess you are right, that there is only
> this one consumer... Although if that is the case, then maybe a separate
> .h/.c file is the right way to go, to isolate the code, reduce
> #ifdeffery in timer.h/.c.

Hmmm, seems like dyntick.[ch] is still the best name for it...

Tony


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-08 10:00                             ` Tony Lindgren
@ 2005-09-08 21:22                               ` Nishanth Aravamudan
  2005-09-08 22:08                                 ` Nishanth Aravamudan
  0 siblings, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-08 21:22 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Srivatsa Vaddagiri, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list

On 08.09.2005 [13:00:36 +0300], Tony Lindgren wrote:
> * Nishanth Aravamudan <nacc@us.ibm.com> [050907 18:07]:
> > On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> > > * Nishanth Aravamudan <nacc@us.ibm.com> [050905 20:02]:
> > > > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > > > * Srivatsa Vaddagiri <vatsa@in.ibm.com> [050905 10:03]:
> > > > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > > > 
> > > > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > > > but it's mostly code churn :)
> > > > > > 
> > > > > > Yes, the name 'dynamic-tick' is misleading!
> > > > > 
> > > > > Huh? For most people dynamic-tick is much more descriptive name than
> > > > > NO_IDLE_HZ or VST!
> > > > 
> > > > I understand this. My point is that the structures are *not*
> > > > dynamic-tick specific. They are interrupt source specific, generally
> > > > (also known as hardware timers) -- dynamic tick or NO_IDLE_HZ are the
> > > > users of the interrupt source reprogramming functions, but not the
> > > > reprogrammers themselves, in my mind. Also, it still would be confusing
> > > > to use dynamic-tick, when the .config option is NO_IDLE_HZ! :)
> > > 
> > > I see what you mean, it's a confusing naming issue currently :) Would
> > > the following solution work for you:
> > > 
> > > - Dynamic tick is the structure you register with, and then you use it
> > >   for any kind of non-continuous timer tinkering 
> > > 
> > > - This structure has at least two possible users, NO_IDLE_HZ and
> > >   sub-jiffie timers
> > > 
> > > So we could have following config options:
> > > 
> > > CONFIG_DYNTICK
> > > CONFIG_NO_IDLE_HZ	depends on dyntick
> > > CONFIG_SUBJIFFIE_TIMER	depends on dyntick
> > 
> > Hrm, yes, first you are right with the dependency ordering. I take it
> > CONFIG_DYNTICK is simply there as NO_IDLE_HZ and SUBJIFFIE_TIMER are
> > independent users of the same underlying infrastructure.
> 
> Cool, I'm glad we got the dependencies figured out now rather than later :)

Yup, I think that makes the most sense. I appreciate your help with it!

> > > > > If you wanted, you could reprogram the next timer to happen from
> > > > > {add,mod,del}_timer() just by calling the timer_dyn_reprogram() there.
> > > > 
> > > > I messed with this with my soft-timer rework (which has since has fallen
> > > > by the wayside). It is a bit of overhead, especially del_timer(), but
> > > > it's possible. This is what I would consider "dynamic-tick." And I would
> > > > setup a *different* .config option to enable it. Perhaps depending on
> > > > CONFIG_NO_IDLE_HZ.
> > > 
> > > Yes, I agree it should be a different .config option. Maybe the example
> > > above would work for that?
> > 
> > Yes, I'm thinking it might.
> > 
> > > > > And you would want to do that if you wanted sub-jiffie timer
> > > > > interrupts.
> > > > 
> > > > Yes, true, it does enable that. Well, to be honest, it completely
> > > > redefines (in some sense) the jiffy, as it is potentially continuously
> > > > changing, not just at idle times.
> > > 
> > > Yeah. But should still work as we already accept interrupts at any point
> > > inbetween jiffies to update time, and update the system time from a
> > > second continuously running timer :)
> > 
> > The problem with subjiffie timers is that the precision of soft-timers
> > is jiffies currently. It requires some serious effort to modify the
> > soft-timer subsystem to be aware of the extra bits it needs,
> > efficiently -- take a look at what HRT has had to do.
> 
> Yes, we should coordinate that with HRT. BTW, we can reduce the overhead
> of del_timer() by _not_ calling next_timer_interrupt(), and programming
> the next timer interrupt to happen where next jiffie would be. Then once
> we get to the idle, we call next_timer_interrupt()...

Yes, I agree with the del_timer() changes.

> > > > > So I'd rather not limit the name to the currently implemented
> > > > > functionality only :)
> > > > 
> > > > I'm not trying to limit the name, but make sure we are tying the
> > > > strcutures and functions to the right abstraction (interrupt source, in
> > > > my opinion).
> > > 
> > > But other devices are interrupt sources too... And really the only use
> > > for this stuct is non-continuous timer stuff, right?
> > 
> > Would "tick_source" be better? I guess you are right, that there is only
> > this one consumer... Although if that is the case, then maybe a separate
> > .h/.c file is the right way to go, to isolate the code, reduce
> > #ifdeffery in timer.h/.c.
> 
> Hmmm, seems like dyntick.[ch] is still the best name for it...

I guess you are right, I'll change my little summary and send it out so
it's archived.

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-08 21:22                               ` Nishanth Aravamudan
@ 2005-09-08 22:08                                 ` Nishanth Aravamudan
  2005-09-09 22:30                                   ` Nishanth Aravamudan
  2005-09-20 11:06                                   ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-08 22:08 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Srivatsa Vaddagiri, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list, schwidefsky

On 08.09.2005 [14:22:13 -0700], Nishanth Aravamudan wrote:
> On 08.09.2005 [13:00:36 +0300], Tony Lindgren wrote:
> > * Nishanth Aravamudan <nacc@us.ibm.com> [050907 18:07]:
> > > On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> > > > * Nishanth Aravamudan <nacc@us.ibm.com> [050905 20:02]:
> > > > > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > > > > * Srivatsa Vaddagiri <vatsa@in.ibm.com> [050905 10:03]:
> > > > > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > > > > 
> > > > > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > > > > but it's mostly code churn :)
> > > > > > > 
> > > > > > > Yes, the name 'dynamic-tick' is misleading!

<snipping much useful feedback and many constructive conversations>

So, after *all* that, I'm going back to dyntick (notice no hyphen though
:-P). Everyone ok with this doc?

Thanks,
Nish


- include/linux/dyntick.h
	with definitions in kernel/dyntick.c

#define DYN_TICK_ENABLED	(1 << 1)
#define DYN_TICK_SUITABLE	(1 << 0)

#define DYN_TICK_MIN_SKIP	2

/* Abstraction of a dynamic tick source
 * @state: current state
 * @max_skip: current maximum number of jiffies to program h/w to skip
 * @min_skip: current minimum number of jiffies to program h/w to skip
 * @init: initialization routine
 * @enable_dyn_tick: called via sysfs to enable interrupt skipping
 * @disable_dyn_tick: called via sysfs to disable interrupt
 * 			skipping
 * @reprogram: actually interact with h/w, return number of ticks the
 *			h/w will skip
 * @recover_time: handler for returning from skipped ticks and keeping
 * 			time consistent
 * @enter_all_cpus_idle: last cpu to go idle calls this, which should
 * 			disable any timer source (e.g. PIT on x86)
 * @exit_all_cpus_idle: first cpu to wake after @enter_all_cpus_idle has
 *			been called should use this to revert the
 *			effects of that function
 */
struct dyntick_timer {
	unsigned int state;
	unsigned long max_skip;
	unsigned long min_skip;
	int (*init) (void);
	void (*enable_dyn_tick) (void);
	void (*disable_dyn_tick) (void);
	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
	unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
	/* following empty in UP */
	void (*enter_all_cpus_idle) (int);
	void (*exit_all_cpus_idle) (int);
	spinlock_t lock;
};

extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
 /* so do we need this?
 	Maybe it can just be static to dyntick.c and all the callable
	functions will call-down to the structure members? */
extern struct dyntick_timer *current_dyntick_timer;

#ifdef CONFIG_NO_IDLE_HZ /* which means CONFIG_DYNTICK is also on */
extern void set_dyntick_max_skip(unsigned long max_skip);
extern void set_dyntick_min_skip(unsigned long min_skip);
/* return number of ticks skipped, as we can request any number
	called from cpu_idle() in dyntick-enabled arch's */
extern unsigned long reprogram_dyntick(void);

extern struct tick_source * __init arch_select_tick_source(void);
/* calls select_tick_source(), then calls tick_source_register() */
extern void __init dyn_tick_init(void);

static inline int dyn_tick_enabled(void)
{
	return (current_dyntick_timer->state & DYN_TICK_ENABLED);
}

#else	/* CONFIG_NO_IDLE_HZ */
static inline void set_dyntick_max_skip(unsigned long max_skip)
{
}

static inline void set_dyntick_min_skip(unsigned long min_skip)
{
}

static inline unsigned long reprogram_dyntick(void)
{
	return 0;
}

static inline void dyn_tick_init(void)
{
}

static inline int dyn_tick_enabled(void)
{
	return 0;
}
#endif	/* CONFIG_NO_IDLE_HZ */

/* Pick up arch specific header */
#include <asm/dyntick.h>

- timer.c / timer.h
	/* moved from sched.c/.h */
	cpumask_t no_idle_hz_cpumask;

- each arch-specific file pair needs to provide:
	arch_select_tick_source();
	appropriate struct tick_source definitions, functions, etc. for
		each usable h/w

- include/asm-i386/dyntick.h
	with defines in arch/i386/dyntick.c
	/* basically already done */

- include/asm-arm/arch-omap/dyntick.h
	with definitions in arch/arm/mach-omap/dyntick.c

- include/asm-s390/dyntick.h
	with definitions in arch/s390/dyntick.c
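
For illustration, the registration and reprogram path in the doc above
could be modeled in userspace roughly as follows. This is a sketch under
assumptions, not the proposed implementation: the structure is cut down
to the fields used here, reprogram_dyntick_for() takes the wanted skip
as an argument (the doc's reprogram_dyntick() takes none), setting
DYN_TICK_SUITABLE at registration time is a guess, and the fake_* names
are hypothetical arch-side stand-ins.

```c
#include <assert.h>
#include <stddef.h>

#define DYN_TICK_ENABLED	(1 << 1)
#define DYN_TICK_SUITABLE	(1 << 0)

/* Cut-down model of struct dyntick_timer: no lock, no SMP hooks. */
struct dyntick_timer {
	unsigned int state;
	unsigned long max_skip;
	unsigned long min_skip;
	int (*init)(void);
	unsigned long (*reprogram)(unsigned long);
};

static struct dyntick_timer *current_dyntick_timer;

static void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer)
{
	/* Assumption: a failing init() means the h/w is unusable. */
	if (new_dyntick_timer->init && new_dyntick_timer->init() != 0)
		return;
	new_dyntick_timer->state |= DYN_TICK_SUITABLE;
	current_dyntick_timer = new_dyntick_timer;
}

/* Returns the number of ticks the h/w was actually told to skip. */
static unsigned long reprogram_dyntick_for(unsigned long wanted)
{
	struct dyntick_timer *t = current_dyntick_timer;

	if (!t || !(t->state & DYN_TICK_ENABLED))
		return 0;
	if (wanted < t->min_skip)	/* not worth reprogramming */
		return 0;
	if (wanted > t->max_skip)	/* h/w cannot sleep that long */
		wanted = t->max_skip;
	return t->reprogram(wanted);
}

/* Hypothetical arch side: a fake timer that accepts any request. */
static int fake_init(void)
{
	return 0;
}

static unsigned long fake_hw_reprogram(unsigned long ticks)
{
	return ticks;
}

static struct dyntick_timer fake_timer = {
	.max_skip	= 100,
	.min_skip	= 2,
	.init		= fake_init,
	.reprogram	= fake_hw_reprogram,
};
```

In this model the generic code owns the min_skip/max_skip clamping, so
the arch reprogram() hook only ever sees a request the h/w can satisfy.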




* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-07 17:17                               ` Srivatsa Vaddagiri
  2005-09-07 17:27                                 ` Nish Aravamudan
@ 2005-09-09 16:27                                 ` Bill Davidsen
  1 sibling, 0 replies; 96+ messages in thread
From: Bill Davidsen @ 2005-09-09 16:27 UTC (permalink / raw)
  To: vatsa; +Cc: Nish Aravamudan, Con Kolivas, linux-kernel, akpm, ck list,
	rmk+lkml

Srivatsa Vaddagiri wrote:

>On Wed, Sep 07, 2005 at 09:42:24AM -0700, Nish Aravamudan wrote:
>
>>Hrm, got dropped from the Cc... :) Yes, the dynamic-tick generic
>>infrastructure being proposed, with the idle CPU mask and the
>>set_all_cpus_idle() tick_source hook, would allow exactly this in
>>arch-specific code.
>
>I think Bill is referring to the "resume" interface i.e an
>unset_all_cpus_idle() interface, which is missing (set/unset
>probably are not good prefixes maybe?). I feel we can
>add one.
>
Exactly what I had in mind. If there are hooks for all_idle transitions 
then architectures can hang whatever makes sense there. That hopefully 
would result in readable code for both power reduction (laptop) and for 
the strange things that embedded systems sometimes do.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-08 22:08                                 ` Nishanth Aravamudan
@ 2005-09-09 22:30                                   ` Nishanth Aravamudan
  2005-09-20 11:06                                   ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-09 22:30 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Srivatsa Vaddagiri, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list, schwidefsky

On 08.09.2005 [15:08:54 -0700], Nishanth Aravamudan wrote:
> On 08.09.2005 [14:22:13 -0700], Nishanth Aravamudan wrote:
> > On 08.09.2005 [13:00:36 +0300], Tony Lindgren wrote:
> > > * Nishanth Aravamudan <nacc@us.ibm.com> [050907 18:07]:
> > > > On 07.09.2005 [10:37:43 +0300], Tony Lindgren wrote:
> > > > > * Nishanth Aravamudan <nacc@us.ibm.com> [050905 20:02]:
> > > > > > On 05.09.2005 [10:27:05 +0300], Tony Lindgren wrote:
> > > > > > > * Srivatsa Vaddagiri <vatsa@in.ibm.com> [050905 10:03]:
> > > > > > > > On Sun, Sep 04, 2005 at 01:10:54PM -0700, Nishanth Aravamudan wrote:
> > > > > > > > > 
> > > > > > > > > Also, I am a bit confused by the use of "dynamic-tick" to describe these
> > > > > > > > > changes. To me, these are all NO_IDLE_HZ implementations, as they are
> > > > > > > > > only invoked from cpu_idle() (or their equivalent) routines. I know this
> > > > > > > > > is true of s390 and the x86 code, and I believe it is true of the ARM
> > > > > > > > > code? If it were dynamic-tick, I would think we would be adjusting the
> > > > > > > > > timer interrupt frequency continuously (e.g., at the end of
> > > > > > > > > __run_timers() and at every call to {add,mod,del}_timer()). I was
> > > > > > > > > working on a patch which did some renaming to no_idle_hz_timer, etc.,
> > > > > > > > > but it's mostly code churn :)
> > > > > > > > 
> > > > > > > > Yes, the name 'dynamic-tick' is misleading!
> 
> <snipping much useful feedback and many constructive conversations>
> 
> So, after *all* that, I'm going back to dyntick (notice no hyphen though
> :-P). Everyone ok with this doc?
> 
> Thanks,
> Nish
> 
> 
> - include/linux/dyntick.h
> 	with definitions in kernel/dyntick.c
> 
> #define DYN_TICK_ENABLED	(1 << 1)
> #define DYN_TICK_SUITABLE	(1 << 0)
> 
> #define DYN_TICK_MIN_SKIP	2
> 
> /* Abstraction of a dynamic tick source
>  * @state: current state
>  * @max_skip: current maximum number of jiffies to program h/w to skip
>  * @min_skip: current minimum number of jiffies to program h/w to skip
>  * @init: initialization routine
>  * @enable_dyn_tick: called via sysfs to enable interrupt skipping
>  * @disable_dyn_tick: called via sysfs to disable interrupt
>  * 			skipping
>  * @reprogram: actually interact with h/w, return number of ticks the
>  *			h/w will skip
>  * @recover_time: handler for returning from skipped ticks and keeping
>  * 			time consistent
>  * @enter_all_cpus_idle: last cpu to go idle calls this, which should
>  * 			disable any timer source (e.g. PIT on x86)
>  * @exit_all_cpus_idle: first cpu to wake after @enter_all_cpus_idle has
>  *			been called should use this to revert the
>  *			effects of that function
>  */
> struct dyntick_timer {

Ah ha! I think I figured out what my problem was with the naming (I
*just* can't get my head around it). As we added the cpus_idle() hooks,
recover_time() and other non-per-interrupt-source related functionality,
I think it might be best to name this structure:

struct dyntick_control;

indicating its dynamic tick basis, but that it's used to control the
subsystem.

I think that makes it clear, and keeps it clear why we have
non-"timer" stuff in there. It's also much clearer to me now why we
*don't* need it per-CPU, as we're relying on one set of callbacks for a
subsystem, and not for each CPU's hardware (works out to be the same,
but makes more sense to me that way).

Does that make more sense?

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-08 22:08                                 ` Nishanth Aravamudan
  2005-09-09 22:30                                   ` Nishanth Aravamudan
@ 2005-09-20 11:06                                   ` Srivatsa Vaddagiri
  2005-09-20 14:58                                     ` Nishanth Aravamudan
  1 sibling, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-20 11:06 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Tony Lindgren, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list, schwidefsky

Nish,
	I did some study of how s390 and ARM are architected and have
some comments as a result of that.

On Thu, Sep 08, 2005 at 03:08:54PM -0700, Nishanth Aravamudan wrote:
> struct dyntick_timer {
> 	unsigned int state;
> 	unsigned long max_skip;
> 	unsigned long min_skip;
> 	int (*init) (void);
> 	void (*enable_dyn_tick) (void);
> 	void (*disable_dyn_tick) (void);
> 	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> 	unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> 	/* following empty in UP */
> 	void (*enter_all_cpus_idle) (int);
> 	void (*exit_all_cpus_idle) (int);
> 	spinlock_t lock;
> };

The usage of 'lock' probably needs to be made clear. I intended it to be used
mainly for serializing the enter/exit_all_cpus_idle routines.

Considering that not all architectures have such routines, the use of the
spinlock can be entirely within the arch code. Maybe the 'enter' routine
is invoked as part of the 'reprogram' routine (when the last CPU goes down)
and the 'exit' routine is invoked as part of dyn_tick_interrupt() (when
coming out of all_idle_state), both being serialized using the spinlock?

Another interesting point is that I expected recover_time to be
invoked only as part of 'exit_all_cpus_idle', but s390 seems to
have an unconditional call to account_ticks (for recovering time) on
any CPU that wakes up. I guess it will be a no-op if other CPUs
were active.

We probably also need to document how 'reprogram' will be invoked 
- with xtime_lock held or not. Again s390 does not seem to require it
while ARM is using one. I think we should let the arch code take
xtime_lock if they deem it necessary.

> extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
>  /* so do we need this?
>  	Maybe it can just be static to dyntick.c and all the callable
> 	functions will call-down to the structure members? */
> extern struct dyntick_timer *current_dyntick_timer;

I don't think this can be static - since the low-level arch-code
will need access to, for example, 'recover_time'/'handler' 
and 'enter/exit_all_cpus_idle' routines?


> extern struct tick_source * __init arch_select_tick_source(void);
> /* calls select_tick_source(), then calls tick_source_register() */
> extern void __init dyn_tick_init(void);

Hmm ..I think just tick_source_register is sufficient ..we can
let the arch-code select what tick source it wants and call
register with the selected source ..

From the point of view of getting this reviewed by arch-maintainers, I think
it will help if a new version of this interface is posted, pointing out how the
existing s390/ARM interfaces will be affected. I could help out if you are busy.

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-20 11:06                                   ` Srivatsa Vaddagiri
@ 2005-09-20 14:58                                     ` Nishanth Aravamudan
  2005-09-22 13:38                                       ` Martin Schwidefsky
  0 siblings, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-20 14:58 UTC (permalink / raw)
  To: Srivatsa Vaddagiri
  Cc: Tony Lindgren, Con Kolivas, Russell King, linux-kernel, akpm,
	ck list, schwidefsky

On 20.09.2005 [16:36:54 +0530], Srivatsa Vaddagiri wrote:
> Nish,
> 	I did some study of how s390 and ARM are architected and have
> some comments as a result of that.
> 
> On Thu, Sep 08, 2005 at 03:08:54PM -0700, Nishanth Aravamudan wrote:
> > struct dyntick_timer {
> > 	unsigned int state;
> > 	unsigned long max_skip;
> > 	unsigned long min_skip;
> > 	int (*init) (void);
> > 	void (*enable_dyn_tick) (void);
> > 	void (*disable_dyn_tick) (void);
> > 	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > 	unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > 	/* following empty in UP */
> > 	void (*enter_all_cpus_idle) (int);
> > 	void (*exit_all_cpus_idle) (int);
> > 	spinlock_t lock;
> > };
> 
> The usage of 'lock' probably needs to be made clear. I intended it to
> be used for mainly serializing enter/exit_all_cpus_idle routines.

Yes, now that we are somewhat settled on the structure, I will add
comments to the structure in dyn-tick.h and send out patches. Sorry,
I've been swamped with other tasks lately...

> Considering that not all architectures have such routines, then the
> use of spinlock can be entirely within the arch code. Maybe the
> 'enter' routine is invoked as part of 'reprogram' routine (when the
> last CPU goes down) and 'exit' routine is invoked as part of
> dyn_tick_interrupt() (when coming out of all_idle_state), both being
> serialized using the spinlock?

I think I suggested this at one point or another :)

The usage I envisioned is

dyntick_timer_reprogram():

current_dyntick_timer->reprogram();
if (cpus_full(noidlehz_mask))
	/* takes current_dyntick_timer->lock, if necessary */
	current_dyntick_timer->enter_all_cpus_idle();

dyn_tick_interrupt():

if (cpus_full(noidlehz_mask)) {
	cpu_unset(cpu, noidlehz_mask);
	/* takes current_dyntick_timer->lock, if necessary */
	current_dyntick_timer->exit_all_cpus_idle();
}
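
The two call sites above can also be modeled in userspace to check the
transitions. This is only a sketch: the mask is a plain bitmask rather
than a cpumask_t, the pit_running flag and the call counters are
hypothetical, and locking is left out entirely (in the real thing
current_dyntick_timer->lock would serialize the two hooks).

```c
#include <assert.h>

#define NR_CPUS		4
#define ALL_CPUS_MASK	((1u << NR_CPUS) - 1)

static unsigned int noidlehz_mask;	/* bit set = that cpu stopped its tick */
static int pit_running = 1;		/* model of a global tick source */
static unsigned int enter_calls, exit_calls;

/* Hook: last cpu to go idle shuts the global tick source off. */
static void enter_all_cpus_idle(int cpu)
{
	(void)cpu;
	pit_running = 0;
	enter_calls++;
}

/* Hook: first cpu to wake reverts the effects of the hook above. */
static void exit_all_cpus_idle(int cpu)
{
	(void)cpu;
	pit_running = 1;
	exit_calls++;
}

/* Idle path: mark this cpu tickless, then check whether it was the last. */
static void sim_dyntick_timer_reprogram(int cpu)
{
	noidlehz_mask |= 1u << cpu;
	if (noidlehz_mask == ALL_CPUS_MASK)
		enter_all_cpus_idle(cpu);
}

/* Wake path: only the first cpu out of the all-idle state runs the hook. */
static void sim_dyn_tick_interrupt(int cpu)
{
	int was_all_idle = (noidlehz_mask == ALL_CPUS_MASK);

	noidlehz_mask &= ~(1u << cpu);
	if (was_all_idle)
		exit_all_cpus_idle(cpu);
}
```

Note that the "was everything idle" test has to be sampled before the
bit is cleared, which is the same ordering as in the pseudocode.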

> Another interesting point is that I expected recover_time to be
> invoked only as part of 'exit_all_cpus_idle', but s390 seems to 
> have unconditional call to account_ticks (for recovering time) on
> any CPU that wakes up. I guess it will be a no-op if other CPUs
> were active.

Maybe that is just paranoia? In theory, if nothing has happened, then
accounting should not need to do anything; but I'm not sure, I'll add
this to my "code to look at" list ;)

> We probably also need to document how 'reprogram' will be invoked 
> - with xtime_lock held or not. Again s390 does not seem to require it
> while ARM is using one. I think we should let the arch code take
> xtime_lock if they deem it necessary.

That seems buggy. I'm guessing they need the xtime_lock there just as
much as ARM and x86 will. In fact, I'm pretty sure all archs will need
it. But I'm fine with leaving it to the arch for now, and then unifying
the locking later, if we find that all archs call seq_lock(xtime_lock).

> > extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
> >  /* so do we need this?
> >  	Maybe it can just be static to dyntick.c and all the callable
> > 	functions will call-down to the structure members? */
> > extern struct dyntick_timer *current_dyntick_timer;
> 
> I don't think this can be static - since the low-level arch-code
> will need access to, for example, 'recover_time'/'handler' 
> and 'enter/exit_all_cpus_idle' routines?

Ah yes, you are probably right...

> > extern struct tick_source * __init arch_select_tick_source(void);
> > /* calls select_tick_source(), then calls tick_source_register() */
> > extern void __init dyn_tick_init(void);
> 
> Hmm ..I think just tick_source_register is sufficient ..we can do let
> the arch-code select what tick source it wants and call register with
> the selected source ..

Ok, that is fine by me.

> From a point of getting this reviewed by arch-maintainers, I think it
> will help if a new version of this interface is posted and point out
> how the existing s390/ARM interfaces will be affected. I could help
> out if you are busy.

That would be great. I will try to get your changes merged into what I
already have pending for dyn-tick.h to make sure everyone is still in
agreement.

I think for x86, it's mostly assigning the members of the structure and
perhaps renaming some functions for the PIT/APIC, etc. Same for s390 as
there is only the one timer source. Ideally, the same will hold for ARM,
but it may require some validation.

Thanks again,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-20 14:58                                     ` Nishanth Aravamudan
@ 2005-09-22 13:38                                       ` Martin Schwidefsky
  2005-09-22 14:52                                         ` Nishanth Aravamudan
  2005-09-23  6:55                                         ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 96+ messages in thread
From: Martin Schwidefsky @ 2005-09-22 13:38 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Srivatsa Vaddagiri, Tony Lindgren, Con Kolivas, Russell King,
	linux-kernel, akpm, ck list

On Tue, 2005-09-20 at 07:58 -0700, Nishanth Aravamudan wrote:

Hi folks,
finally found some time to catch up on the dynticks discussion. Quite
lengthy already..

> > On Thu, Sep 08, 2005 at 03:08:54PM -0700, Nishanth Aravamudan wrote:
> > > struct dyntick_timer {
> > > 	unsigned int state;
> > > 	unsigned long max_skip;
> > > 	unsigned long min_skip;
> > > 	int (*init) (void);
> > > 	void (*enable_dyn_tick) (void);
> > > 	void (*disable_dyn_tick) (void);
> > > 	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > > 	unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > > 	/* following empty in UP */
> > > 	void (*enter_all_cpus_idle) (int);
> > > 	void (*exit_all_cpus_idle) (int);
> > > 	spinlock_t lock;
> > > };
> > 
> > The usage of 'lock' probably needs to be made clear. I intended it to
> > be used for mainly serializing enter/exit_all_cpus_idle routines.
> 
> Yes, now that we are somewhat settled on the structure, I will add
> comments to the structure in dyn-tick.h and send out patches. Sorry,
> I've been swamped with other tasks lately...

When I first saw the dyntick_timer structure I thought: "why does it have
to be that complicated?". Must be because of the requirements for
high-res-timers, since it's overkill for no-idle-hz.

> > Considering that not all architectures have such routines, then the
> > use of spinlock can be entirely within the arch code. Maybe the
> > 'enter' routine is invoked as part of 'reprogram' routine (when the
> > last CPU goes down) and 'exit' routine is invoked as part of
> > dyn_tick_interrupt() (when coming out of all_idle_state), both being
> > serialized using the spinlock?
> 
> I think I suggested this at one point or another :)
> 
> The usage I envisioned is
> 
> dyntick_timer_reprogram():
> 
> current_dyntick_timer->reprogram();
> if (cpus_full(noidlehz_mask))
> 	current_dyntick_timer->enter_all_cpus_idle(); // which will lock
> 					// with
> 					// current_dyntick_timer->lock,
> 					// if necessary
> 
> dyn_tick_interrupt():
> 
> if (cpus_full(noidlehz_mask)) {
> 	cpu_unset(cpu, noidlehz_mask);
> 	current_dyntick_timer->exit_all_cpus_idle(); // which will lock
> 					// with
> 					// current_dyntick_timer->lock,
> 					// if necessary

I would really like to see how all the fields from the dyntick_timer
structure are supposed to be used, especially the who-calls-what graph.
If I got it right, the low-level arch code calls common code functions,
which in turn call functions from the dyntick_timer structure.
The question is what should be the connecting points between the arch
code and the common timer code? With the current code it's
* do_timer()
* update_process_times()
* next_timer_event()
and the non-trivial interactions with rcu via
* test/set/clear bit on nohz_cpu_mask
* rcu_pending() and friends.

> > Another interesting point is that I expected recover_time to be
> > invoked only as part of 'exit_all_cpus_idle', but s390 seems to 
> > have unconditional call to account_ticks (for recovering time) on
> > any CPU that wakes up. I guess it will be a no-op if other CPUs
> > were active.
> 
> Maybe that is just paranoia? In theory, if nothing has happened, then
> accounting should not need to do anything; but I'm not sure, I'll add
> this to my "code to look at" list ;)

After a cpu wakes up, some code needs to check if a tick has passed or not.
For s390 this is done in the start_hz_timer function called from the
idle notifier. Even if "nothing" has happened it's not just paranoia:
account_ticks sets up the clock comparator for the next tick, updates
xtime if necessary, and calls account_user_vtime()/update_process_times()
for cpu time accounting. It is the exact same function that is used for
the regular tick interrupts, it's just called from a different context.

The reason that xtime and process ticks are done in a single function is
that s390 doesn't have the distinction between wall-clock timer and
local APIC timer. Each s390 cpu has something called the clock
comparator that each cpu can set up individually. Whenever the value of
the time-of-day (TOD) clock is greater than the clock comparator the cpu
gets an external interrupt if the cpu is enabled for the interrupt. This
timer interrupt source is used for both the xtime updates and the
process ticks. So there won't be any exit_all_cpus_idle special handling
for s390.
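
To make the clock comparator scheme above concrete, here is a small
userspace model. It is not the arch/s390 code: TICK_LEN, jiffies_model,
last_tick_tod and the sim_/timer_ function names are made up for the
sketch, and the real account_ticks also does the cpu time accounting,
which is left out here.

```c
#include <assert.h>

#define TICK_LEN	10UL	/* TOD units per tick; arbitrary for the model */

static unsigned long tod_clock;		/* free-running time-of-day clock */
static unsigned long clock_comparator;	/* interrupt once tod_clock exceeds this */
static unsigned long last_tick_tod;	/* TOD value of the last accounted tick */
static unsigned long jiffies_model;	/* stand-in for the tick counter */

/* The external interrupt is pending whenever TOD > clock comparator. */
static int timer_interrupt_pending(void)
{
	return tod_clock > clock_comparator;
}

/*
 * Same routine for a regular tick and for a wakeup from idle: account
 * however many whole ticks have passed (possibly zero), then set the
 * comparator up for the next tick boundary.
 */
static unsigned long sim_account_ticks(void)
{
	unsigned long ticks = (tod_clock - last_tick_tod) / TICK_LEN;

	jiffies_model += ticks;
	last_tick_tod += ticks * TICK_LEN;
	clock_comparator = last_tick_tod + TICK_LEN;
	return ticks;
}
```

Calling it when no tick has elapsed accounts zero ticks but still
re-arms the comparator, which is why the call on every wakeup is
harmless in this model.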

> > We probably also need to document how 'reprogram' will be invoked 
> > - with xtime_lock held or not. Again s390 does not seem to require it
> > while ARM is using one. I think we should let the arch code take
> > xtime_lock if they deem it necessary.
> 
> That seems buggy. I'm guessing they need the xtime_lock there just as
> much as ARM and x86 will. In fact, I'm pretty sure all archs will need
> it. But I'm fine with leaving it to the arch for now, and then unifying
> the locking later, if we find that all archs call seq_lock(xtime_lock).

Why do you need the xtime_lock to reprogram the clock comparator (=local
APIC timer) for the next timer interrupt? Neither xtime nor jiffies are
needed to reprogram the timer.

> > > extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
> > >  /* so do we need this?
> > >  	Maybe it can just be static to dyntick.c and all the callable
> > > 	functions will call-down to the structure members? */
> > > extern struct dyntick_timer *current_dyntick_timer;
> > 
> > I don't think this can be static - since the low-level arch-code
> > will need access to, for example, 'recover_time'/'handler' 
> > and 'enter/exit_all_cpus_idle' routines?
> 
> Ah yes, you are probably right...

Again the question who-calls-what. 

> > > extern struct tick_source * __init arch_select_tick_source(void);
> > > /* calls select_tick_source(), then calls tick_source_register() */
> > > extern void __init dyn_tick_init(void);
> > 
> > Hmm ..I think just tick_source_register is sufficient ..we can do let
> > the arch-code select what tick source it wants and call register with
> > the selected source ..
> 
> Ok, that is fine by me.
> 
> > From a point of getting this reviewed by arch-maintainers, I think it
> > will help if a new version of this interface is posted and point out
> > how the existing s390/ARM interfaces will be affected. I could help
> > out if you are busy.
> 
> That would be great. I will try to get your changes merged into what I
> already have pending for dyn-tick.h to make sure everyone is still in
> agreement.

Yes, that would be good. And I will happily review and test the changes
for s390.

-- 
blue skies,
   Martin

Martin Schwidefsky
Linux for zSeries Development & Services
IBM Deutschland Entwicklung GmbH




* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-22 13:38                                       ` Martin Schwidefsky
@ 2005-09-22 14:52                                         ` Nishanth Aravamudan
  2005-09-22 18:32                                           ` Srivatsa Vaddagiri
  2005-09-23  6:55                                         ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 96+ messages in thread
From: Nishanth Aravamudan @ 2005-09-22 14:52 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Srivatsa Vaddagiri, Tony Lindgren, Con Kolivas, Russell King,
	linux-kernel, akpm, ck list

On 22.09.2005 [15:38:10 +0200], Martin Schwidefsky wrote:
> On Tue, 2005-09-20 at 07:58 -0700, Nishanth Aravamudan wrote:
> 
> Hi folks,
> finally found some time to catch up on the dynticks discussion. Quite
> lengthy already..
> 
> > > On Thu, Sep 08, 2005 at 03:08:54PM -0700, Nishanth Aravamudan wrote:
> > > > struct dyntick_timer {
> > > > 	unsigned int state;
> > > > 	unsigned long max_skip;
> > > > 	unsigned long min_skip;
> > > > 	int (*init) (void);
> > > > 	void (*enable_dyn_tick) (void);
> > > > 	void (*disable_dyn_tick) (void);
> > > > 	unsigned long (*reprogram) (unsigned long); /* return number of ticks skipped */
> > > > 	unsigned long (*recover_time) (int, void *, struct pt_regs *); /* handler in arm */
> > > > 	/* following empty in UP */
> > > > 	void (*enter_all_cpus_idle) (int);
> > > > 	void (*exit_all_cpus_idle) (int);
> > > > 	spinlock_t lock;
> > > > };
> > > 
> > > The usage of 'lock' probably needs to be made clear. I intended it to
> > > be used for mainly serializing enter/exit_all_cpus_idle routines.
> > 
> > Yes, now that we are somewhat settled on the structure, I will add
> > comments to the structure in dyn-tick.h and send out patches. Sorry,
> > I've been swamped with other tasks lately...
> 
> As I first saw the dyntick_timer structure I thought: "why does it have
> to be that complicated?". Must be because of the requirements for
>  high-res-timers, since it's overkill for no-idle-hz.

It has become complicated :) HRT does not (yet) use any of this
functionality for their work; Thomas is interested in perhaps doing so
with ktimers, but that is an eventuality, not a guarantee.

Let me try and give an overview of what is in here. Keep in mind that,
for now, we are only aiming for NO_IDLE_HZ in an arch-generic, clean
fashion (not VST, therefore). And, it is certainly reasonable for
certain members to be NULL depending on the arch (like the idle entry
and exit points).

- state: contains the current state of the dynamic-tick subsystem
  (enabled, etc.)
- max_skip, min_skip: the h/w may only be able to skip so far into the
  future (32-bit issues, or other random bit-values), so we need to
  constrain the dynamic-tick subsystem. Initialization code should set
  this value appropriately (on x86 it would depend on what interrupt
  source is eventually selected, e.g.). min_skip exists because it might
  make sense not to go through the hassle of reprogramming when the time
  necessary to reprogram the interrupt is about the same as how long we
  would be disabling the interrupt for.
- init(): arch-dependent initialization routine
- {enable,disable}_dyn_tick(): sysfs hooks so that userspace can disable
  the dynamic-tick subsystem (useful for debugging, mostly?)
- reprogram(): actually interact with whatever arch-dependent underlying
  h/w controls the timer interrupt frequency.
- recover_time(): since we skip ticks, we *must* recover time. Now, one
  option is simply to hook into the normal interrupt path, and I am
  talking with John about the best approach for that (e.g., just have an
  interrupt happen, and *in theory* the time/timer subsystem should be
  capable of recovering from "missed" or "lost" ticks), but for now we
  have a manual callback to catch up.
- {enter,exit}_all_cpus_idle(): some architectures (ahem, x86) may need
  to do additional work, if *all* CPUs go idle (disable the PIT, e.g.).
  This may also be a place for eventual PM code to catch the all going
  idle event and put the processor(s) into a sleep/low-power state.
- lock: guarantee atomicity of the enter/exit idle routines. we only
  want one CPU to be the "last" CPU going idle :)
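
To make the role of min_skip and max_skip concrete, here is a minimal
user-space sketch (not kernel code). The helper name clamp_skip() and its
signature are made up for illustration; only the min_skip/max_skip
semantics come from the description above.

```c
#include <assert.h>

/*
 * Hypothetical helper: decide how many ticks to skip.
 * Returns 0 when the next timer is too close to be worth reprogramming,
 * otherwise the requested delta capped at what the hardware can handle.
 */
static unsigned long clamp_skip(unsigned long delta,
				unsigned long min_skip,
				unsigned long max_skip)
{
	assert(min_skip <= max_skip);

	if (delta < min_skip)
		return 0;		/* not worth the reprogramming cost */
	if (delta > max_skip)
		return max_skip;	/* hardware cannot sleep any longer */
	return delta;
}
```

A return value of 0 would mean "leave the tick running"; anything else is
the count that would be handed to reprogram().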

> > > Considering that not all architectures have such routines, then the
> > > use of spinlock can be entirely within the arch code. Maybe the
> > > 'enter' routine is invoked as part of 'reprogram' routine (when the
> > > last CPU goes down) and 'exit' routine is invoked as part of
> > > dyn_tick_interrupt() (when coming out of all_idle_state), both being
> > > serialized using the spinlock?
> > 
> > I think I suggested this at one point or another :)
> > 
> > The usage I envisioned is
> > 
> > dyntick_timer_reprogram():
> > 
> > current_dyntick_timer->reprogram();
> > if (cpus_full(noidlehz_mask))
> > 	current_dyntick_timer->enter_all_cpus_idle(); // which will lock
> > 					// with
> > 					// current_dyntick_timer->lock,
> > 					// if necessary
> > 
> > dyn_tick_interrupt():
> > 
> > if (cpus_full(noidlehz_mask)) {
> > 	cpu_unset(cpu, noidlehz_mask);
> > 	current_dyntick_timer->exit_all_cpus_idle(); // which will lock
> > 					// with
> > 					// current_dyntick_timer->lock,
> > 					// if necessary
> 
> I would really like to see how all the fields from the dyntick_timer
> structure are supposed to be used. Especially the who-calls-what graph,
> if I got it right then the low-level arch code calls common code
> functions which in turn call functions from the dyntick_timer structure.

Yes, I will try and write this up for you and everyone else. In theory,
since we are only dealing with NO_IDLE_HZ right now, what should happen
is the following (very roughly):

during boot, we of course will initialize the subsystem, which will call
current_dyntick_timer->init().

in cpu_idle() [only?], the arch-dependent idle routine, we call
reprogram_dyntick(), which checks if tick skipping is enabled, finds out
when next_timer_interrupt() thinks the next timer is going to expire,
and then calls into current_dyntick_timer->reprogram() with the delta
value. We also would check here to see if we are the last CPU to go idle
(see above) and hook into current_dyntick_timer->enter_all_cpus_idle()
if so.

Then, on the interrupt, we would see if we are the first CPU coming back
from idle, and hook into current_dyntick_timer->exit_all_cpus_idle() if
so. We then call current_dyntick_timer->recover_time() to make sure our
time subsystem catches up after missing X ticks. Then we execute the
common code-path for the timer interrupt.
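
The rough flow above can be modelled in user space to check the order in
which the callbacks would fire. This is only a sketch under assumptions:
the struct is a cut-down stand-in for dyntick_timer, the two-CPU mask and
the t_*() stubs are invented for illustration, and locking is deliberately
left out.

```c
#include <assert.h>
#include <string.h>

#define NR_CPUS_SKETCH 2
#define ALL_CPUS_MASK ((1u << NR_CPUS_SKETCH) - 1)

/* Cut-down, hypothetical stand-in for struct dyntick_timer. */
struct dyntick_timer_sketch {
	int (*init)(void);
	unsigned long (*reprogram)(unsigned long delta);
	unsigned long (*recover_time)(void);
	void (*enter_all_cpus_idle)(int cpu);	/* may be NULL */
	void (*exit_all_cpus_idle)(int cpu);	/* may be NULL */
};

static unsigned int idle_mask;	/* bit n set: CPU n is skipping ticks */
static char call_log[64];	/* one letter per callback, in call order */

/* Stub callbacks that only record that they were called. */
static int t_init(void) { strcat(call_log, "I"); return 0; }
static unsigned long t_reprogram(unsigned long d) { strcat(call_log, "R"); return d; }
static unsigned long t_recover(void) { strcat(call_log, "T"); return 0; }
static void t_enter(int cpu) { (void)cpu; strcat(call_log, "E"); }
static void t_exit(int cpu) { (void)cpu; strcat(call_log, "X"); }

static struct dyntick_timer_sketch sketch_timer = {
	t_init, t_reprogram, t_recover, t_enter, t_exit
};

/* Idle path: program the skip, then notice if every CPU is now idle. */
static void reprogram_dyntick(struct dyntick_timer_sketch *t, int cpu,
			      unsigned long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);

	idle_mask |= 1u << cpu;
	t->reprogram(delta);
	if (idle_mask == ALL_CPUS_MASK && t->enter_all_cpus_idle)
		t->enter_all_cpus_idle(cpu);
}

/* Wakeup path: leave the all-idle state first, then catch up on time. */
static void dyntick_interrupt(struct dyntick_timer_sketch *t, int cpu)
{
	int all_were_idle = (idle_mask == ALL_CPUS_MASK);

	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);

	idle_mask &= ~(1u << cpu);
	if (all_were_idle && t->exit_all_cpus_idle)
		t->exit_all_cpus_idle(cpu);
	t->recover_time();
	/* the normal timer interrupt path would run after this point */
}
```

With two CPUs going idle one after the other and waking in reverse order,
the enter/exit hooks fire exactly once each while recover_time() runs on
every wakeup.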

> The question is what should be the connecting points between the arch
> code and the common timer code? With the current code it's
> * do_timer()
> * update_process_times()
> * next_timer_event()

So I guess our interaction points would include those three and
cpu_idle(). By next_timer_event() did you mean next_timer_interrupt()?

> and the non-trivial interactions with rcu via
> * test/set/clear bit on nohz_cpu_mask
> * rcu_pending() and friends.

Yes, these are definitely the more complicated interactions.

> > > Another interesting point is that I expected recover_time to be
> > > invoked only as part of 'exit_all_cpus_idle', but s390 seems to 
> > > have unconditional call to account_ticks (for recovering time) on
> > > any CPU that wakes up. I guess it will be a no-op if other CPUs
> > > were active.
> > 
> > Maybe that is just paranoia? In theory, if nothing has happened, then
> > accounting should not need to do anything; but I'm not sure, I'll add
> > this to my "code to look at" list ;)
> 
> After a cpu wakes up, some code needs to check whether a tick has passed.
> For s390 this is done in the start_hz_timer function called from the
> idle notifier. Even if "nothing" has happened it's not just paranoia:
> account_ticks sets up the clock comparator for the next tick, updates
> xtime if necessary, and calls account_user_vtime()/update_process_times()
> for cpu time accounting. It is the exact same function that is used for
> the regular tick interrupts; it's just called from a different context.

You are right, and in fact, I mis-wrote when I replied to Vatsa. I think
this is how all archs should treat recover_time(). But doesn't the timer
interrupt already do much of this? I mean, if we were to allow the next
interrupt to occur as usual (maybe forced to be called from our common
dyntick_interrupt()), should time not get caught up that way?

> The reason that xtime and process ticks are done in a single function is
> that s390 doesn't have the distinction between wall-clock timer and
> local APIC timer. Each s390 cpu has something called the clock
> comparator that each cpu can set up individually. Whenever the value of
> the time-of-day (TOD) clock is bigger than the clock comparator the cpu
> gets an external interrupt if the cpu is enabled for the interrupt. This
> timer interrupt source is used for both the xtime updates and the
> process ticks. So there won't be any exit_all_cpus_idle special handling
> for s390.

Ah yes, that's why we are trying to figure out what needs to go into
arch code and what doesn't ;)


> > > We probably also need to document how 'reprogram' will be invoked 
> > > - with xtime_lock held or not. Again s390 does not seem to require it
> > > while ARM is using one. I think we should let the arch code take
> > > xtime_lock if they deem it necessary.
> > 
> > That seems buggy. I'm guessing they need the xtime_lock there just as
> > much as ARM and x86 will. In fact, I'm pretty sure all archs will need
> > it. But I'm fine with leaving it to the arch for now, and then unifying
> > the locking later, if we find that all archs call seq_lock(xtime_lock).
> 
> Why do you need the xtime_lock to reprogram the clock comparator (=local
> APIC timer) for the next timer interrupt? Neither xtime nor jiffies are
> needed to reprogram the timer.

Hrm, you might be right. I might have mis-typed again. At the point
where we are ready to hook into the underlying arch-dependent
current_dyntick_timer->reprogram() routine, we should know the delta
according to next_timer_interrupt(), so we should not need to query
jiffies or xtime anymore...

> > > > extern void dyntick_timer_register(struct dyntick_timer *new_dyntick_timer);
> > > >  /* so do we need this?
> > > >  	Maybe it can just be static to dyntick.c and all the callable
> > > > 	functions will call-down to the structure members? */
> > > > extern struct dyntick_timer *current_dyntick_timer;
> > > 
> > > I don't think this can be static - since the low-level arch-code
> > > will need access to, for example, 'recover_time'/'handler' 
> > > and 'enter/exit_all_cpus_idle' routines?
> > 
> > Ah yes, you are probably right...
> 
> Again the question who-calls-what. 

This is what I get for replying on little sleep. Vatsa, what arch-code
should need access to current_dyntick_timer->recover_time() or the
current_dyntick_timer->{enter,exit}_all_cpus_idle() routines? Ah...maybe
the problem is I removed the generic dyntick_interrupt()? If we have
that function again, we can call current_dyntick_timer->recover_time()
from there as well as current_dyntick_timer->exit_all_cpus_idle().
current_dyntick_timer->enter_all_cpus_idle() should only need to be
called from reprogram_dyntick().

> > > > extern struct tick_source * __init arch_select_tick_source(void);
> > > > /* calls select_tick_source(), then calls tick_source_register() */
> > > > extern void __init dyn_tick_init(void);
> > > 
> > > Hmm ..I think just tick_source_register is sufficient ..we can do let
> > > the arch-code select what tick source it wants and call register with
> > > the selected source ..
> > 
> > Ok, that is fine by me.
> > 
> > > From a point of getting this reviewed by arch-maintainers, I think it
> > > will help if a new version of this interface is posted and point out
> > > how the existing s390/ARM interfaces will be affected. I could help
> > > out if you are busy.
> > 
> > That would be great. I will try to get your changes merged into what I
> > already have pending for dyn-tick.h to make sure everyone is still in
> > agreement.
> 
> Yes, that would be good. And I will happily review and test the changes
> for s390.

Great! That's good news. Thank you for reviewing what we've gotten
cobbled together so far.

Thanks,
Nish


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-22 14:52                                         ` Nishanth Aravamudan
@ 2005-09-22 18:32                                           ` Srivatsa Vaddagiri
  2005-09-26 15:08                                             ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-22 18:32 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Martin Schwidefsky, Tony Lindgren, Con Kolivas, Russell King,
	linux-kernel, akpm, ck list

[-- Attachment #1: Type: text/plain, Size: 10565 bytes --]

On Thu, Sep 22, 2005 at 07:52:22AM -0700, Nishanth Aravamudan wrote:

> - recover_time(): since we skip ticks, we *must* recover time. Now, one
>   option is simply to hook into the normal interrupt path, and I am
>   talking with John about the best approach for that (e.g., just have an
>   interrupt happen, and *in theory* the time/timer subsystem should be
>   capable of recovering from "missed" or "lost" ticks), but for now we
>   have a manual callback to catch up.

It's not just xtime; jiffies also needs to be recovered. My understanding
of John's TOD work is that it keeps track of time but not jiffies, in which
case we would still need to recover jiffies?

Also, I think "recover_time" on s390 may do more than recovering time (as 
per Martin's note). So we should probably change the name - "handler" as in
ARM :)?

> > > Considering that not all architectures have such routines, then the
> > > use of spinlock can be entirely within the arch code. Maybe the

I eat my words. I tried embedding the spinlock entirely inside the arch-code
and found it non-trivial and very racy, as illustrated below:

[CPU0 going down]

> > current_dyntick_timer->reprogram();
> > if (cpus_full(noidlehz_mask))

	Let's say that this check was true (because CPU0 and CPU1 both
	were idle). But before we grab the spinlock, CPU1 wakes up
	(in the code below), also notices that all_were_idle,
	gets the spinlock and calls exit_all_cpus_idle. After CPU1 releases
	the spinlock, CPU0 now gets the spinlock and calls
	enter_all_cpus_idle, which is _wrong_.
	
> >     current_dyntick_timer->enter_all_cpus_idle(); // which will lock
> >                                     // with
> >                                     // current_dyntick_timer->lock,
> >                                     // if necessary


[CPU1 coming up]

> >
> > dyn_tick_interrupt():
> >
> > if (cpus_full(noidlehz_mask)) {
> >     cpu_unset(cpu, noidlehz_mask);
> >     current_dyntick_timer->exit_all_cpus_idle(); // which will lock
> >                                     // with
> >                                     // current_dyntick_timer->lock,
> >                                     // if necessary
>


This is a lot simpler if we use the lock to set/clear the bitmap also.
The question is: what overhead will that introduce on other arches
(like s390/ARM) where it is not really required (since they will not
have enter/exit_all_idle routines)? We could define some macro,
dyn_tick_lock()?, which evaluates to a no-op on those arches, but
becomes a spinlock on x86-like arches, something like:

static inline void dyn_tick_lock(void)
{
	spin_lock(&dyn_tick->lock);
}

for x86, and

static inline void dyn_tick_lock(void)
{
}

for s390/ARM.
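
The bad interleaving, and why covering the bitmap update with the same
lock removes it, can be replayed deterministically in user space. This is
a sketch under assumptions: there are no real threads or locks here, the
two-CPU mask and all function names are invented for illustration, and
"under the lock" is modelled simply by running the steps back to back.

```c
#include <assert.h>

#define ALL_CPUS 0x3u		/* two CPUs: bits 0 and 1 */

static unsigned int idle_mask;	/* stand-in for the nohz bitmap */
static int all_idle_state;	/* 1 after enter_all_cpus_idle() */

static void enter_all_cpus_idle(void) { all_idle_state = 1; }
static void exit_all_cpus_idle(void)  { all_idle_state = 0; }

/* Wakeup path for one CPU; identical in both variants. */
static void cpu_wakes_up(int cpu)
{
	int all_were_idle = (idle_mask == ALL_CPUS);

	assert(cpu == 0 || cpu == 1);

	idle_mask &= ~(1u << cpu);
	if (all_were_idle)
		exit_all_cpus_idle();
}

/*
 * Racy variant: CPU0 tests the mask, then CPU1 wakes up before CPU0
 * acts on the (now stale) result of that test.
 */
static int racy_interleaving(void)
{
	int cpu0_saw_all_idle;

	idle_mask = ALL_CPUS;		/* both CPUs marked idle */
	all_idle_state = 0;

	cpu0_saw_all_idle = (idle_mask == ALL_CPUS);	/* CPU0: check */
	cpu_wakes_up(1);				/* CPU1: runs in the window */
	if (cpu0_saw_all_idle)
		enter_all_cpus_idle();			/* CPU0: stale decision */

	return all_idle_state;	/* 1 here is wrong, CPU1 is running */
}

/*
 * Serialized variant: mask update, check and enter form one step, so
 * CPU1's wakeup can only run before or after the whole step.
 */
static int serialized_interleaving(void)
{
	idle_mask = 1u << 1;		/* CPU1 already idle */
	all_idle_state = 0;

	/* CPU0, conceptually holding the lock */
	idle_mask |= 1u << 0;
	if (idle_mask == ALL_CPUS)
		enter_all_cpus_idle();
	/* lock dropped; CPU1 takes it and wakes up */
	cpu_wakes_up(1);

	return all_idle_state;	/* 0: exit ran after enter */
}
```

In the racy ordering the system ends up marked "all idle" while CPU1 is
running; in the serialized ordering the exit always follows the enter.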

> > I would really like to see how all the fields from the dyntick_timer
> > structure are supposed to be used. Especially the who-calls-what graph,
> > if I got it right then the low-level arch code calls common code
> > functions which in turn call functions from the dyntick_timer structure.


> > After a cpu woke up some code need to check if a tick has passed or not.
> > For s390 this is done in the start_hz_timer function called from the
> > idle notifier. Even if "nothing" has happened it's not just paranoia,
> > account_ticks sets up the clock comparator for the next tick, updates
> > xtime if it is necessary and account_user_vtime()/update_process_times()
> > for cpu time accounting. It is the exact same function that is used for
> > the regular tick interrupts, its just called from a different context.
> 
> You are right, and in fact, I mis-wrote when I replied to Vatsa. I think
> this is how all archs should treat recover_time(). But doesn't the timer
> interrupt already do much of this? I mean, if we were to allow the next
> interrupt to occur as usual (maybe forced to be called from our common
> dyntick_interrupt()), should time not get caught up that way?

If you are referring to calling the actual timer interrupt handler "as is" from
dyntick_interrupt(), then I feel uncomfortable about it, primarily because
of hardware interactions the timer interrupt handler may have that assume
it was invoked due to a timer interrupt.

> > > > We probably also need to document how 'reprogram' will be invoked 
> > > > - with xtime_lock held or not. Again s390 does not seem to require it
> > > > while ARM is using one. I think we should let the arch code take
> > > > xtime_lock if they deem it necessary.
> > > 
> > > That seems buggy. I'm guessing they need the xtime_lock there just as
> > > much as ARM and x86 will. In fact, I'm pretty sure all archs will need
> > > it. But I'm fine with leaving it to the arch for now, and then unifying
> > > the locking later, if we find that all archs call seq_lock(xtime_lock).
> > 
> > Why do you need the xtime_lock to reprogram the clock comparator (=local
> > APIC timer) for the next timer interrupt? Neither xtime nor jiffies are
> > needed to reprogram the timer.
> 
> Hrm, you might be right. I might have mis-typed again. At the point
> where we are ready to hook into the underlying arch-dependent
> current_dyntick_timer->reprogram() routine, we should know the delta
> according to next_timer_interrupt(), so we should not need to query
> jiffies or xtime anymore...

I feel this is a bit tricky on non-comparator based interrupt sources like
a decrementer on PPC64 or the local APIC timer.


	cpu_idle()
	  | 
	  | (IRQs disabled)
	  V
	unsigned int dyn_tick_reprogram_timer(void)
	{
		int cpu = smp_processor_id();
		unsigned int delta;

		cpu_set(cpu, nohz_cpu_mask);

		smp_wmb();

		if (rcu_pending(cpu) || local_softirq_pending()) {
			cpu_clear(cpu, nohz_cpu_mask);
			return 0;
		}

a.		delta = next_timer_interrupt() - jiffies;

		if (delta < dyn_tick->min_skip) {
			cpu_clear(cpu, nohz_cpu_mask);
			return 0;
		}

		if (delta > dyn_tick->max_skip)
			delta = dyn_tick->max_skip;

b.		dyn_tick->reprogram(delta);

		return delta;
	}

				
If we pass just the number of ticks to skip to 'reprogram' then clearly it
is racy with respect to a changing jiffies. For example, let's say that:

At point a) jiffies = 100, next_timer_interrupt = 105. So we pass count 5 to
reprogram.

By the time we reach point b), however, jiffies has changed to 101. Since
only a relative number was passed, the reprogram code will cause the CPU to
wake up at 106 instead of 105.
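
The arithmetic of that example, as a tiny user-space sketch. The helper
name ticks_late() is invented for illustration; the numbers in the usage
note are the ones from the example above.

```c
#include <assert.h>

/*
 * How many ticks late the CPU wakes up when a relative skip count is
 * computed at point a) but only applied at point b).
 */
static unsigned long ticks_late(unsigned long next_timer,
				unsigned long jiffies_at_a,
				unsigned long jiffies_at_b)
{
	unsigned long delta, wakeup;

	assert(jiffies_at_a <= jiffies_at_b && jiffies_at_a <= next_timer);

	delta = next_timer - jiffies_at_a;	/* computed at a) */
	wakeup = jiffies_at_b + delta;		/* applied at b)  */

	return wakeup - next_timer;
}
```

ticks_late(105, 100, 101) gives 1: the wakeup lands on jiffy 106. If
jiffies does not move between a) and b), the result is 0.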

We could consider passing an absolute value to 'reprogram' (say 105), like below:

	unsigned int dyn_tick_reprogram_timer(void)
	{
		int cpu = smp_processor_id();
		unsigned long next, delta, seq;

		cpu_set(cpu, nohz_cpu_mask);

		smp_wmb();

		if (rcu_pending(cpu) || local_softirq_pending()) {
			cpu_clear(cpu, nohz_cpu_mask);
			return 0;
		}

		do { 
			seq = read_seqbegin(&xtime_lock);
	
			next = next_timer_interrupt();
			delta = next - jiffies;

			if (delta < dyn_tick->min_skip) {
				cpu_clear(cpu, nohz_cpu_mask);
				return 0;
			}

			if (delta > dyn_tick->max_skip)
				next = jiffies + dyn_tick->max_skip;

		} while (read_seqretry(&xtime_lock, seq));

		dyn_tick->reprogram(next);

		return delta;
	}
	

Since reprogram has to convert it back to some relative number, it will need
to reference jiffies, which makes it racy and requires a read_seqbegin/retry
based conversion to a relative number. I feel it is a lot cleaner in such
a case to just take write_seqlock(&xtime_lock) for the whole of
dyn_tick_reprogram_timer.
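
For readers not familiar with the read_seqbegin/read_seqretry pattern
mentioned above, here is a single-threaded user-space model of it. This is
a sketch only, not the kernel implementation: there are no memory barriers,
the "concurrent" tick is injected by hand, and every name (seq_sketch,
sketch_*, sim_jiffies, ticks_until) is invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a seqlock: an even count means no writer. */
struct seq_sketch {
	unsigned int sequence;
};

static unsigned long sim_jiffies = 100;	/* stand-in for jiffies */

static unsigned int sketch_read_begin(const struct seq_sketch *s)
{
	return s->sequence;
}

/* Non-zero if a writer was active at the start or has run since then. */
static int sketch_read_retry(const struct seq_sketch *s, unsigned int start)
{
	return (start & 1) || s->sequence != start;
}

static void sketch_write_lock(struct seq_sketch *s)
{
	s->sequence++;			/* count becomes odd */
}

static void sketch_write_unlock(struct seq_sketch *s)
{
	assert(s->sequence & 1);	/* must pair with sketch_write_lock() */
	s->sequence++;			/* count becomes even again */
}

/* One simulated timer tick: jiffies is updated under the write side. */
static void sketch_tick(struct seq_sketch *s)
{
	sketch_write_lock(s);
	sim_jiffies++;
	sketch_write_unlock(s);
}

/*
 * Compute "ticks until next_timer" consistently. 'ticks_to_inject'
 * simulates ticks that race with the first pass(es) of the read loop;
 * '*passes' reports how often the loop body ran.
 */
static unsigned long ticks_until(struct seq_sketch *s,
				 unsigned long next_timer,
				 int ticks_to_inject, int *passes)
{
	unsigned int seq;
	unsigned long delta;

	*passes = 0;
	do {
		seq = sketch_read_begin(s);
		delta = next_timer - sim_jiffies;
		if (ticks_to_inject > 0) {	/* a tick races with us */
			ticks_to_inject--;
			sketch_tick(s);
		}
		(*passes)++;
	} while (sketch_read_retry(s, seq));

	return delta;
}
```

Without a racing tick the loop runs once; with one, the stale delta from
the first pass is thrown away and recomputed. The point made above is that
every reprogram() taking an absolute value would need such a loop, which
is why taking the write side once around the whole function looks simpler.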

> > Again the question who-calls-what. 
> 
> This is what I get for replying on little sleep. Vatsa, what arch-code
> should need access to current_dyntick_timer->recover_time() or the
> current_dyntick_timer->{enter,exit}_all_cpus_idle() routines? Ah...maybe
> the problem is I removed the generic dyntick_interrupt()? If we have
> that function again, we can call current_dyntick_timer->recover_time()
> from there as well as current_dyntick_timer->exit_all_cpus_idle().
> current_dyntick_timer->enter_all_cpus_idle() should only need to be
> called from reprogram_dyntick().


Here's what I had in mind for the entire call-flow. This is based on the
interface attached in the mail.

At bootup:

	a. dyn_tick_timer structure is initialized and registered
	   (dyn_tick_register). dyn_tick_register will set the
	   global dyn_tick pointer to the passed structure.
	   (See kernel/dyn-tick.c attached)

	b. Somewhere down the line, arch code calls dyn_tick_enable() 
	   to enable skipping ticks.  This should be called only after
	   initializing various h/w resources that will be reprogrammed
	   when we call dyn_tick->reprogram(). 

	   Typically both a. and b. are done during device_initcall time.

	c. init_dyn_tick_sysfs() creates the
	   "/sys/devices/system/dyn_tick/dyn_tick0/enable" file if
	   dyn_tick_register has been called by now (i.e. dyn_tick pointer
	   is non-NULL).


Entering tickless state:

	a. cpu_idle calls dyn_tick_reprogram_timer() with IRQs disabled.

	b. dyn_tick_reprogram_timer finds out when the next timer is and
	   whether it is at least min_skip away. If so, it calls
	   dyn_tick->reprogram(). In the interface that I have attached,
	   this is called with write_lock held on xtime_lock and is
	   passed the relative value of number of ticks to be skipped.
	   Also on x86-arch, it is called with dyn_tick->lock held.

	c. dyn_tick->reprogram will arrange for that many ticks to be
	   skipped. In addition, on x86 like platforms, it can do
	   step d.

	d. if (cpus_equal(nohz_cpu_mask, cpu_online_map))
		dyn_tick->enter_all_cpus_idle();


Exiting tickless state:

	a. H/w interrupt comes in. dyn_tick_interrupt() is called 
	   as one of the first steps. dyn_tick_interrupt() is completely
	   defined in arch-code and does what it wants. As an example,
	   it can do this:
	   
		b. dyn_tick_lock(); 	/* spin_lock(&dyn_tick->lock) on x86 */

		   if (cpus_equal(nohz_cpu_mask, cpu_online_map))
			all_were_idle = 1;
		   else
			all_were_idle = 0;
	
		   cpu_clear(cpu, nohz_cpu_mask);
	
		   if (all_were_idle)
			dyn_tick->exit_all_cpus_idle();

		   dyn_tick_unlock(); 	/* spin_unlock(&dyn_tick->lock) on x86 */
	
		c. dyn_tick->recover_time(); 
			This can recover jiffies/xtime and also setup the
			next timer. On s390 this can be account_ticks() for
			example. Maybe we should call this dyn_tick->handler?


	d. dyn_tick_interrupt() returns so that rest of interrupt processing
	   can occur.


Note that a-d are completely inside arch-code.
		

I have attached include/linux/dyn-tick.h & kernel/dyn-tick.c as
detailed reference of the interface.


-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

[-- Attachment #2: dyn-tick.patch --]
[-- Type: text/plain, Size: 7339 bytes --]

---

 linux-2.6.14-rc1-vatsa/include/linux/dyn-tick.h |   84 +++++++++++
 linux-2.6.14-rc1-vatsa/kernel/dyn-tick.c        |  182 ++++++++++++++++++++++++
 2 files changed, 266 insertions(+)

diff -puN /dev/null kernel/dyn-tick.c
--- /dev/null	2003-01-30 15:54:37.000000000 +0530
+++ linux-2.6.14-rc1-vatsa/kernel/dyn-tick.c	2005-09-22 23:58:51.000000000 +0530
@@ -0,0 +1,182 @@
+/*
+ * linux/kernel/dyn-tick.c
+ *
+ * Beginnings of generic dynamic tick timer support
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ * Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/version.h>
+#include <linux/config.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sysdev.h>
+#include <linux/interrupt.h>
+#include <linux/cpumask.h>
+#include <linux/pm.h>
+#include <linux/dyn-tick.h>
+#include <linux/rcupdate.h>
+
+#define DYN_TICK_VERSION	"050610-1"
+
+struct dyn_tick_timer *dyn_tick;
+
+/*
+ * Arch independent code needed to reprogram next timer interrupt.
+ * Gets called, with IRQs disabled, from cpu_idle() before entering idle loop.
+ */
+unsigned int dyn_tick_reprogram_timer(void)
+{
+	int cpu = smp_processor_id();
+	unsigned int delta;
+
+	if (!dyn_tick_enabled())
+		return 0;
+
+	/* This is defined in asm/dyn-tick.h */
+	dyn_tick_lock();
+
+	/* Check if we can start skipping ticks */
+	write_seqlock(&xtime_lock);
+
+	cpu_set(cpu, nohz_cpu_mask);
+
+	smp_wmb();
+
+	delta = 0;	/* returned if we bail out below */
+	if (rcu_pending(cpu) || local_softirq_pending()) {
+		cpu_clear(cpu, nohz_cpu_mask);
+		goto out;
+	}
+
+	delta = next_timer_interrupt() - jiffies;
+
+	if (delta < dyn_tick->min_skip) {
+		cpu_clear(cpu, nohz_cpu_mask);
+		delta = 0;
+		goto out;
+	}
+
+	if (delta > dyn_tick->max_skip)
+		delta = dyn_tick->max_skip;
+
+	dyn_tick->reprogram(delta);
+out:	/* early exits must drop both locks too */
+	write_sequnlock(&xtime_lock);
+	dyn_tick_unlock();
+
+	return delta;
+}
+
+int dyn_tick_enable(void)
+{
+        unsigned long flags;
+        int ret = -ENODEV;
+
+        if (dyn_tick) {
+                write_seqlock_irqsave(&xtime_lock, flags);
+                ret = 0;
+                if (!dyn_tick_enabled()) {
+                        ret = dyn_tick->enable();
+
+                        if (ret == 0)
+                                dyn_tick->state |= DYN_TICK_ENABLED;
+                }
+                write_sequnlock_irqrestore(&xtime_lock, flags);
+        }
+
+        return ret;
+}
+
+int dyn_tick_disable(void)
+{
+        unsigned long flags;
+        int ret = -ENODEV;
+
+        if (dyn_tick) {
+                write_seqlock_irqsave(&xtime_lock, flags);
+                ret = 0;
+                if (dyn_tick_enabled()) {
+                        ret = dyn_tick->disable();
+
+                        if (ret == 0)
+                                dyn_tick->state &= ~DYN_TICK_ENABLED;
+                }
+                write_sequnlock_irqrestore(&xtime_lock, flags);
+        }
+
+        return ret;
+}
+
+
+void dyn_tick_register(struct dyn_tick_timer *arch_timer)
+{
+	dyn_tick = arch_timer;
+	printk(KERN_INFO "dyn-tick: Registering dynamic tick timer v%s\n",
+	       DYN_TICK_VERSION);
+}
+
+/*
+ * ---------------------------------------------------------------------------
+ * Sysfs interface
+ * ---------------------------------------------------------------------------
+ */
+
+extern struct sys_device device_timer;
+
+static ssize_t show_dyn_tick_enable(struct sys_device *dev, char *buf)
+{
+	return sprintf(buf, "enabled:\t%i\n", dyn_tick_enabled());
+}
+
+static ssize_t set_dyn_tick_enable(struct sys_device *dev, const char *buf,
+				   size_t count)
+{
+	unsigned long flags;
+	unsigned int enable = simple_strtoul(buf, NULL, 2);
+
+	if (enable)
+		dyn_tick_enable();
+	else
+		dyn_tick_disable();
+
+	return count;
+}
+
+static SYSDEV_ATTR(enable, 0644, show_dyn_tick_enable,
+		   set_dyn_tick_enable);
+
+static struct sysdev_class dyn_tick_sysclass = {
+	set_kset_name("dyn_tick"),
+};
+
+static struct sys_device device_dyn_tick = {
+	.id = 0,
+	.cls = &dyn_tick_sysclass,
+};
+
+static int init_dyn_tick_sysfs(void)
+{
+	int error = 0;
+
+	if (!dyn_tick)
+		goto out;
+
+	if ((error = sysdev_class_register(&dyn_tick_sysclass)))
+		goto out;
+	if ((error = sysdev_register(&device_dyn_tick)))
+		goto out;
+	error = sysdev_create_file(&device_dyn_tick, &attr_enable);
+
+out:
+	return error;
+}
+
+late_initcall(init_dyn_tick_sysfs);
diff -puN /dev/null include/linux/dyn-tick.h
--- /dev/null	2003-01-30 15:54:37.000000000 +0530
+++ linux-2.6.14-rc1-vatsa/include/linux/dyn-tick.h	2005-09-22 23:59:46.000000000 +0530
@@ -0,0 +1,84 @@
+/*
+ * linux/include/linux/dyn-tick.h
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ * Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _DYN_TICK_TIMER_H
+#define _DYN_TICK_TIMER_H
+
+#include <linux/interrupt.h>
+#include <asm/timer.h>
+
+#define DYN_TICK_ENABLED	(1 << 1)
+
+/*
+ * Abstraction of a dynamic tick source:
+ *
+ * @state	     	: current state (bitfield)
+ * @max_skip	     	: maximum number of ticks to skip
+ * @min_skip		: minimum number of ticks to skip
+ * @enable		: called via sysfs to enable interrupt skipping
+ * @disable		: called via sysfs to disable interrupt skipping
+ * @reprogram		: reprogram the interrupt source to skip ticks.
+ *			  Takes one argument, the number of ticks
+ *			  to skip. Called with IRQs disabled and with
+ *			  xtime_lock held for writing.
+ * @recover_time	: handler to recover time. Called by each CPU when
+ *			  it comes out of the tickless state.
+ * @enter_all_cpus_idle	: the last CPU to go idle can call this. Typically
+ *			  invoked from the reprogram routine.
+ * @exit_all_cpus_idle	: called when coming out of the all_cpus_idle state
+ * @lock		: used to serialize the enter/exit routines
+ *			  with modifications to nohz_cpu_mask.
+ */
+
+struct dyn_tick_timer {
+	unsigned int state;
+	unsigned long max_skip;
+	unsigned long min_skip;
+	void (*enable) (void);
+	void (*disable) (void);
+	unsigned long (*reprogram) (unsigned long);
+	unsigned long (*recover_time) (int, void *, struct pt_regs *);
+	void (*enter_all_cpus_idle) (int);
+	void (*exit_all_cpus_idle) (int);
+	spinlock_t lock;
+};
+
+extern struct dyn_tick_timer *dyn_tick;
+
+extern void dyn_tick_timer_register(struct dyn_tick_timer *new_dyntick_timer);
+extern int dyn_tick_enable(void);
+extern int dyn_tick_disable(void);
+
+#ifdef CONFIG_NO_IDLE_HZ
+extern unsigned int dyn_tick_reprogram_timer(void);
+
+static inline int dyn_tick_enabled(void)
+{
+	return (dyn_tick->state & DYN_TICK_ENABLED);
+}
+
+#else	/* CONFIG_NO_IDLE_HZ */
+static inline unsigned int dyn_tick_reprogram_timer(void)
+{
+	return 0;
+}
+
+static inline int dyn_tick_enabled(void)
+{
+	return 0;
+}
+#endif	/* CONFIG_NO_IDLE_HZ */
+
+/* Pick up arch specific header */
+#include <asm/dyn-tick.h>
+
+#endif	/* _DYN_TICK_TIMER_H */

_


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-22 13:38                                       ` Martin Schwidefsky
  2005-09-22 14:52                                         ` Nishanth Aravamudan
@ 2005-09-23  6:55                                         ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-23  6:55 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Nishanth Aravamudan, Tony Lindgren, Con Kolivas, Russell King,
	linux-kernel, akpm, ck list

On Thu, Sep 22, 2005 at 03:38:10PM +0200, Martin Schwidefsky wrote:
> As I first saw the dyntick_timer structure I thought: "why does it have
> to be that complicated?". Must be because of the requirements for
> high-res-timers, since it's overkill for no-idle-hz.

I think most of the complication is because of the different needs of
various architectures. But if you think the structure can be cut down
further, that would be good to consider.

> I would really like to see how all the fields from the dyntick_timer
> structure are supposed to be used. Especially the who-calls-what graph,
> if I got it right then the low-level arch code calls common code
> functions which in turn call functions from the dyntick_timer structure.
> The question is what should be the connecting points between the arch
> code and the common timer code? With the current code it's

> * do_timer()
> * update_process_times()
> * next_timer_event()
> and the non-trivial interactions with rcu via
> * test/set/clear bit on nohz_cpu_mask
> * rcu_pending() and friends.

I think with dyn-tick, next_timer_event is replaced by 
dyn_tick_reprogram_timer().  We should also add add_timer_on() to the
list.
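
To make the who-calls-what question a bit more concrete, here is a minimal
user-space sketch of the call graph as I understand it: the arch code
registers a timer, the idle path calls a common reprogram routine, and the
common routine calls back into the arch through the structure. Every mock_
name below is hypothetical and exists only for illustration; this is not the
code from the patch, and it assumes the next event is never in the past.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct dyn_tick_timer. */
struct mock_dyn_tick_timer {
	unsigned long max_skip;
	unsigned long min_skip;
	unsigned long (*reprogram)(unsigned long ticks);
	unsigned long (*recover_time)(void);
};

static unsigned long mock_jiffies;	/* stand-in for jiffies */
static unsigned long mock_programmed;	/* ticks the "hardware" will skip */
static struct mock_dyn_tick_timer *mock_dyn_tick;

/* Arch side: program the interrupt source to skip 'ticks' ticks. */
static unsigned long mock_arch_reprogram(unsigned long ticks)
{
	mock_programmed = ticks;
	return ticks;
}

/* Arch side: on wakeup, account for the ticks that were skipped. */
static unsigned long mock_arch_recover_time(void)
{
	unsigned long lost = mock_programmed;

	mock_jiffies += lost;
	mock_programmed = 0;
	return lost;
}

static struct mock_dyn_tick_timer mock_arch_timer = {
	.max_skip	= 50,
	.min_skip	= 2,
	.reprogram	= mock_arch_reprogram,
	.recover_time	= mock_arch_recover_time,
};

/* Common side: the arch registers its timer once at init time. */
static void mock_timer_register(struct mock_dyn_tick_timer *timer)
{
	assert(timer != NULL && timer->reprogram != NULL &&
	       timer->recover_time != NULL);
	mock_dyn_tick = timer;
}

/* Common side: clamp the skip and hand it to the arch callback. */
static unsigned long mock_reprogram_timer(unsigned long next_event)
{
	unsigned long delta = next_event - mock_jiffies;

	if (delta < mock_dyn_tick->min_skip)
		return 0;	/* not worth skipping */
	if (delta > mock_dyn_tick->max_skip)
		delta = mock_dyn_tick->max_skip;
	return mock_dyn_tick->reprogram(delta);
}

/* Idle path: reprogram, "sleep", then recover time on wakeup. */
static unsigned long mock_idle_once(unsigned long next_event)
{
	unsigned long skipped = mock_reprogram_timer(next_event);

	if (skipped)
		mock_dyn_tick->recover_time();
	return skipped;
}
```

In this sketch the only connecting points are the register call and the two
callbacks; the RCU and nohz_cpu_mask interactions are deliberately left out.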



-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017


* Re: [PATCH 1/3] dynticks - implement no idle hz for x86
  2005-09-22 18:32                                           ` Srivatsa Vaddagiri
@ 2005-09-26 15:08                                             ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 96+ messages in thread
From: Srivatsa Vaddagiri @ 2005-09-26 15:08 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: Martin Schwidefsky, Tony Lindgren, Con Kolivas, Russell King,
	linux-kernel, akpm, ck list

On Fri, Sep 23, 2005 at 12:02:15AM +0530, Srivatsa Vaddagiri wrote:
> I feel this is a bit tricky on non-comparator based interrupt sources like
> a decrementer on PPC64 or the local APIC timer.

[snip]

> We could consider passing absolute value to 'reprogram' (say 105), like below:
> 
> 	unsigned int dyn_tick_reprogram_timer(void)
> 	{
> 		int cpu = smp_processor_id();
> 		unsigned long next, delta, seq;
> 
> 		cpu_set(cpu, nohz_cpu_mask);
> 
> 		smp_wmb();
> 
> 		if (rcu_pending(cpu) || local_softirq_pending()) {
> 			cpu_clear(cpu, nohz_cpu_mask);
> 			return 0;
> 		}
> 
> 		do { 
> 			seq = read_seqbegin(&xtime_lock);
> 	
> 			next = next_timer_interrupt();
> 			delta = next - jiffies;
> 
> 			if (delta < dyn_tick->min_skip) {
> 				cpu_clear(cpu, nohz_cpu_mask);
> 				return 0;
> 			}
> 
> 			if (delta > dyn_tick->max_skip)
> 				next = jiffies + dyn_tick->max_skip;
> 
> 		} while (read_seqretry(&xtime_lock, seq));
> 
> 		dyn_tick->reprogram(next);
> 
> 		return delta;
> 	}
> 	
> 
> Since reprogram has to convert it back to a relative number, it will need
> to reference jiffies, which makes it racy and requires a read_seqbegin/retry
> based conversion to the relative number.  I feel it is a lot cleaner in such
> a case to just take write_seqlock(&xtime_lock) for the whole of
> dyn_tick_reprogram_timer.

OTOH, write_seqlock is probably heavier than read_seqbegin/read_seqretry. So
I am OK if we want to call 'reprogram' w/o xtime_lock held and let that
routine internally use read_seqbegin/read_seqretry if it wants.
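
For reference, a minimal user-space sketch of what such a read-side
conversion inside 'reprogram' could look like. It mocks the seqlock with C11
atomics and assumes a single writer (the kernel seqlock also takes a spinlock
on the write side, which is omitted here). All mock_ names are hypothetical
and are not the kernel API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical single-writer seqlock: odd sequence means write in progress. */
struct mock_seqlock {
	atomic_uint sequence;
};

static struct mock_seqlock mock_xtime_lock;	/* stand-in for xtime_lock */
static atomic_ulong mock_jiffies;		/* stand-in for jiffies */

static unsigned int mock_read_seqbegin(struct mock_seqlock *sl)
{
	unsigned int seq;

	/* Wait until no writer is in the middle of an update. */
	do {
		seq = atomic_load_explicit(&sl->sequence, memory_order_acquire);
	} while (seq & 1);
	return seq;
}

static int mock_read_seqretry(struct mock_seqlock *sl, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}

static void mock_write_seqlock(struct mock_seqlock *sl)
{
	atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_acq_rel);
}

static void mock_write_sequnlock(struct mock_seqlock *sl)
{
	unsigned int prev;

	prev = atomic_fetch_add_explicit(&sl->sequence, 1, memory_order_release);
	assert(prev & 1);	/* must pair with mock_write_seqlock() */
}

/* Writer: what the timer tick would do to jiffies. */
static void mock_tick(void)
{
	mock_write_seqlock(&mock_xtime_lock);
	atomic_fetch_add_explicit(&mock_jiffies, 1, memory_order_relaxed);
	mock_write_sequnlock(&mock_xtime_lock);
}

/* Reader: convert an absolute expiry into a relative tick count. */
static unsigned long mock_abs_to_rel(unsigned long next)
{
	unsigned long delta;
	unsigned int seq;

	do {
		seq = mock_read_seqbegin(&mock_xtime_lock);
		delta = next - atomic_load_explicit(&mock_jiffies,
						    memory_order_relaxed);
	} while (mock_read_seqretry(&mock_xtime_lock, seq));

	return delta;
}
```

The point of the sketch is only that the retry loop stays local to the
conversion, so the caller does not need to hold the lock for writing.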

Let me know what you guys think about this and the rest of the interface.
If it seems OK, I can post a modified i386 patch based on this interface and
would request Martin/Tony to do the S390/ARM ports.


-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

