* RE: [PATCH] Dynamic tick for x86 version 050602-1
@ 2005-06-08 22:14 Pallipadi, Venkatesh
  2005-06-09  1:40 ` Tony Lindgren
From: Pallipadi, Venkatesh @ 2005-06-08 22:14 UTC
  To: Jonathan Corbet, Tony Lindgren; +Cc: linux-kernel

>-----Original Message-----
>From: linux-kernel-owner@vger.kernel.org
>[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Jonathan Corbet
>Sent: Tuesday, June 07, 2005 1:36 PM
>To: Tony Lindgren
>Cc: linux-kernel@vger.kernel.org
>Subject: Re: [PATCH] Dynamic tick for x86 version 050602-1
>
>Tony Lindgren <tony@atomide.com> wrote:
>
>> --- linux-dev.orig/arch/i386/kernel/irq.c	2005-06-01 17:51:36.000000000 -0700
>> +++ linux-dev/arch/i386/kernel/irq.c	2005-06-01 17:54:32.000000000 -0700
>> [...]
>> @@ -102,6 +103,12 @@ fastcall unsigned int do_IRQ(struct pt_r
>>  		);
>>  	} else
>>  #endif
>> +
>> +#ifdef CONFIG_NO_IDLE_HZ
>> +	if (dyn_tick->state & (DYN_TICK_ENABLED | DYN_TICK_SKIPPING) && irq != 0)
>> +		dyn_tick->interrupt(irq, NULL, regs);
>> +#endif
>> +
>>  		__do_IRQ(irq, regs);
>
>Forgive me if I'm being obtuse (again...), but this hunk doesn't look
>like it would work well in the 4K stacks case.  When 4K stacks are being
>used, dyn_tick->interrupt() will only get called in the nested interrupt
>case, when the interrupt stack is already in use.  This change also
>pushes the non-assembly __do_IRQ() call out of the else branch, meaning
>that, when the switch is made to the interrupt stack (most of the time),
>__do_IRQ() will be called twice for the same interrupt.
>
>It looks to me like you want to put your #ifdef chunk *after* the call
>to __do_IRQ(), unless you have some reason for needing it to happen
>before the regular interrupt handler is invoked.
>

Good catch. This indeed looks like a bug.
With 050602-1 version I am seeing double the number of calls to
timer_interrupt routine than expected. Say, when all CPUs are fully busy,
I see 2*HZ timer interrupt count in /proc/interrupts

And things look normal once I change this hunk as below

>>  	} else
>>  #endif
>> +
+	{
>> +#ifdef CONFIG_NO_IDLE_HZ
>> +	if (dyn_tick->state & (DYN_TICK_ENABLED | DYN_TICK_SKIPPING) && irq != 0)
>> +		dyn_tick->interrupt(irq, NULL, regs);
>> +#endif
>> +
>>  		__do_IRQ(irq, regs);
+	}

Thanks,
Venki
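The scoping problem described above can be reproduced in a few lines of user-space C. This is only a sketch, not the kernel code: fake_do_IRQ() and fake_dyn_tick_interrupt() are hypothetical stand-ins for __do_IRQ() and dyn_tick->interrupt(), and the first argument of each function stands in for the 4K-stacks decision to switch to the interrupt stack. Without braces the else binds only to the dyn-tick if statement, so the handler call after it runs unconditionally.

```c
#include <assert.h>

static int do_irq_calls;	/* calls to the stand-in for __do_IRQ() */
static int dyn_tick_calls;	/* calls to the stand-in for dyn_tick->interrupt() */

/* Hypothetical stand-ins; they only count how often they are called. */
static void fake_do_IRQ(void) { do_irq_calls++; }
static void fake_dyn_tick_interrupt(void) { dyn_tick_calls++; }

static void reset_counts(void)
{
	do_irq_calls = 0;
	dyn_tick_calls = 0;
}

/* Shape of the 050602-1 hunk: the else governs only the next statement. */
static void do_irq_unbraced(int switch_stack, int dyn_tick_active)
{
	if (switch_stack)
		fake_do_IRQ();	/* stands in for the asm call on the IRQ stack */
	else
	if (dyn_tick_active)
		fake_dyn_tick_interrupt();

	fake_do_IRQ();		/* no longer part of the else branch */
}

/* Shape of the braced variant from the reply. */
static void do_irq_braced(int switch_stack, int dyn_tick_active)
{
	if (switch_stack) {
		fake_do_IRQ();
	} else {
		if (dyn_tick_active)
			fake_dyn_tick_interrupt();
		fake_do_IRQ();
	}
}

/* Usage example: count the calls made for a single interrupt. */
static void self_check(void)
{
	reset_counts();
	do_irq_unbraced(1, 1);
	assert(do_irq_calls == 2);	/* handler ran twice */
	assert(dyn_tick_calls == 0);	/* hook skipped on the stack-switch path */

	reset_counts();
	do_irq_braced(1, 1);
	assert(do_irq_calls == 1);

	reset_counts();
	do_irq_braced(0, 1);
	assert(do_irq_calls == 1 && dyn_tick_calls == 1);
}
```

In this sketch the braced variant removes the double call, but the dyn-tick hook still runs only on the else path, which is the other half of the objection quoted above.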
* Re: [PATCH] Dynamic tick for x86 version 050602-1
  2005-06-08 22:14 [PATCH] Dynamic tick for x86 version 050602-1 Pallipadi, Venkatesh
@ 2005-06-09  1:40 ` Tony Lindgren
  2005-06-10  4:30   ` [PATCH] Dynamic tick for x86 version 050609-2 Tony Lindgren
From: Tony Lindgren @ 2005-06-09  1:40 UTC
  To: Pallipadi, Venkatesh; +Cc: Jonathan Corbet, linux-kernel

* Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com> [050608 15:14]:
>
> >-----Original Message-----
> >From: linux-kernel-owner@vger.kernel.org
> >[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Jonathan Corbet
> >Sent: Tuesday, June 07, 2005 1:36 PM
> >To: Tony Lindgren
> >Cc: linux-kernel@vger.kernel.org
> >Subject: Re: [PATCH] Dynamic tick for x86 version 050602-1
> >
> >Tony Lindgren <tony@atomide.com> wrote:
> >
> >> --- linux-dev.orig/arch/i386/kernel/irq.c	2005-06-01 17:51:36.000000000 -0700
> >> +++ linux-dev/arch/i386/kernel/irq.c	2005-06-01 17:54:32.000000000 -0700
> >> [...]
> >> @@ -102,6 +103,12 @@ fastcall unsigned int do_IRQ(struct pt_r
> >>  		);
> >>  	} else
> >>  #endif
> >> +
> >> +#ifdef CONFIG_NO_IDLE_HZ
> >> +	if (dyn_tick->state & (DYN_TICK_ENABLED | DYN_TICK_SKIPPING) && irq != 0)
> >> +		dyn_tick->interrupt(irq, NULL, regs);
> >> +#endif
> >> +
> >>  		__do_IRQ(irq, regs);
> >
> >Forgive me if I'm being obtuse (again...), but this hunk doesn't look
> >like it would work well in the 4K stacks case.  When 4K stacks are being
> >used, dyn_tick->interrupt() will only get called in the nested interrupt
> >case, when the interrupt stack is already in use.  This change also
> >pushes the non-assembly __do_IRQ() call out of the else branch, meaning
> >that, when the switch is made to the interrupt stack (most of the time),
> >__do_IRQ() will be called twice for the same interrupt.
> >
> >It looks to me like you want to put your #ifdef chunk *after* the call
> >to __do_IRQ(), unless you have some reason for needing it to happen
> >before the regular interrupt handler is invoked.
> >
>
> Good catch. This indeed looks like a bug.
> With 050602-1 version I am seeing double the number of calls to
> timer_interrupt routine than expected. Say, when all CPUs are fully busy,
> I see 2*HZ timer interrupt count in /proc/interrupts
>
> And things look normal once I change this hunk as below
>
> >>  	} else
> >>  #endif
> >> +
> +	{
> >> +#ifdef CONFIG_NO_IDLE_HZ
> >> +	if (dyn_tick->state & (DYN_TICK_ENABLED | DYN_TICK_SKIPPING) && irq != 0)
> >> +		dyn_tick->interrupt(irq, NULL, regs);
> >> +#endif
> >> +
> >>  		__do_IRQ(irq, regs);
> +	}

Cool. Sorry for not responding earlier, my hard drive crashed yesterday
morning... I also managed to fry my spare computer's motherboard while
trying to recover some data from the broken disk :)

I'll try to post an updated patch tomorrow.

Tony
* [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-09  1:40 ` Tony Lindgren
@ 2005-06-10  4:30   ` Tony Lindgren
  2005-06-10  9:10     ` Pavel Machek
                       ` (2 more replies)
From: Tony Lindgren @ 2005-06-10  4:30 UTC
  To: linux-kernel
  Cc: Pallipadi, Venkatesh, Jonathan Corbet, Pavel Machek,
	Bernard Blackham, Christian Hesse, Zwane Mwaikambo

[-- Attachment #1: Type: text/plain, Size: 738 bytes --]

Hi all,

Thanks for all the comments. Here's an updated dyntick patch.

Changes from last patch:

- Patch from Bernard Blackham for missing cpu_has_local_apic()

- Fixed interrupt code as noted by Pavel Machek, Jonathan Corbet and
  Venkatesh Pallipadi.

- Added kernel command line options as in the ARM patch suggested by
  Russell King. You now need to pass:

  dyntick=enable

  and possibly

  dyntick=enable,forceapic

  if you have a P3 system that works with dyntick with APIC (a P4 won't).

  You can also enable dyntick via sysfs during runtime:

  # echo 1 > /sys/devices/system/timer/timer0/dyn_tick_state

- Separated debug code into another optional patch available at:

  http://muru.com/linux/dyntick/

Regards,

Tony

[-- Attachment #2: patch-dynamic-tick-2.6.12-rc6-050609-2 --]
[-- Type: text/plain, Size: 24499 bytes --]

Index: linux-dev/arch/i386/Kconfig
===================================================================
--- linux-dev.orig/arch/i386/Kconfig	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/Kconfig	2005-06-09 20:55:03.000000000 -0700
@@ -458,6 +458,37 @@
 	bool "Provide RTC interrupt"
 	depends on HPET_TIMER && RTC=y
 
+config NO_IDLE_HZ
+	bool "Dynamic Tick Timer - Skip timer ticks during idle"
+	help
+	  This option enables support for skipping timer ticks when the
+	  processor is idle. During system load, timer is continuous.
+	  This option saves power, as it allows the system to stay in
+	  idle mode longer. Currently supported timers are ACPI PM
+	  timer, local APIC timer, and TSC timer. HPET timer is currently
+	  not supported.
+
+	  Note that you need to enable dynamic tick timer either by
+	  passing dyntick=enable command line option, or via sysfs:
+
+	  # echo 1 > /sys/devices/system/timer/timer0/dyn_tick_state
+
+config DYN_TICK_USE_APIC
+	bool "Use APIC timer instead of PIT timer"
+	help
+	  This option enables using APIC timer interrupt if your hardware
+	  supports it. APIC timer allows longer sleep periods compared
+	  to PIT timer.
+
+	  Note that on most recent hardware disabling PIT timer also
+	  disables APIC timer interrupts, and system won't run properly.
+	  Symptoms include slow system boot, and time running slow.
+
+	  If unsure, don't enable this option.
+
+	  Note that you still need to pass dyntick=enable,forceapic
+	  command line option to use APIC timer.
+
 config SMP
 	bool "Symmetric multi-processing support"
 	---help---
Index: linux-dev/arch/i386/kernel/Makefile
===================================================================
--- linux-dev.orig/arch/i386/kernel/Makefile	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/kernel/Makefile	2005-06-09 20:55:03.000000000 -0700
@@ -31,6 +31,7 @@
 obj-y				+= sysenter.o vsyscall.o
 obj-$(CONFIG_ACPI_SRAT)	+= srat.o
 obj-$(CONFIG_HPET_TIMER)	+= time_hpet.o
+obj-$(CONFIG_NO_IDLE_HZ)	+= dyn-tick.o
 obj-$(CONFIG_EFI)		+= efi.o efi_stub.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
 
Index: linux-dev/arch/i386/kernel/apic.c
===================================================================
--- linux-dev.orig/arch/i386/kernel/apic.c	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/kernel/apic.c	2005-06-09 20:55:03.000000000 -0700
@@ -26,6 +26,7 @@
 #include <linux/mc146818rtc.h>
 #include <linux/kernel_stat.h>
 #include <linux/sysdev.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/atomic.h>
 #include <asm/smp.h>
@@ -909,6 +910,8 @@
 
 #define APIC_DIVISOR 16
 
+static u32 apic_timer_val;
+
 static void __setup_APIC_LVTT(unsigned int clocks)
 {
 	unsigned int lvtt_value, tmp_value, ver;
@@ -927,7 +930,15 @@
 				& ~(APIC_TDR_DIV_1 | APIC_TDR_DIV_TMBASE))
 				| APIC_TDR_DIV_16);
 
-	apic_write_around(APIC_TMICT, clocks/APIC_DIVISOR);
+	apic_timer_val = clocks/APIC_DIVISOR;
+
+#ifdef CONFIG_NO_IDLE_HZ
+	/* Local APIC timer is 24-bit */
+	if (apic_timer_val)
+		dyn_tick->max_skip = 0xffffff / apic_timer_val;
+#endif
+
+	apic_write_around(APIC_TMICT, apic_timer_val);
 }
 
 static void __init setup_APIC_timer(unsigned int clocks)
@@ -1040,6 +1051,13 @@
 	 */
 	setup_APIC_timer(calibration_result);
 
+#ifdef CONFIG_NO_IDLE_HZ
+	if (calibration_result)
+		dyn_tick->state |= DYN_TICK_USE_APIC;
+	else
+		printk(KERN_INFO "dyn-tick: Cannot use local APIC\n");
+#endif
+
 	local_irq_enable();
 }
 
@@ -1068,6 +1086,18 @@
 	}
 }
 
+#if defined(CONFIG_NO_IDLE_HZ)
+void reprogram_apic_timer(unsigned int count)
+{
+	unsigned long flags;
+
+	count *= apic_timer_val;
+	local_irq_save(flags);
+	apic_write_around(APIC_TMICT, count);
+	local_irq_restore(flags);
+}
+#endif
+
 /*
  * the frequency of the profiling timer can be changed
  * by writing a multiplier value into /proc/profile.
@@ -1160,6 +1190,7 @@
 
 fastcall void smp_apic_timer_interrupt(struct pt_regs *regs)
 {
+	unsigned long seq;
 	int cpu = smp_processor_id();
 
 	/*
@@ -1178,6 +1209,23 @@
 	 * interrupt lock, which is the WrongThing (tm) to do.
 	 */
 	irq_enter();
+
+#ifdef CONFIG_NO_IDLE_HZ
+	/*
+	 * Check if we need to wake up PIT interrupt handler.
+	 * Otherwise just wake up local APIC timer.
+	 */
+	do {
+		seq = read_seqbegin(&xtime_lock);
+		if (dyn_tick->state & (DYN_TICK_ENABLED | DYN_TICK_SKIPPING)) {
+			if (dyn_tick->skip_cpu == cpu && dyn_tick->skip > DYN_TICK_MIN_SKIP)
+				dyn_tick->interrupt(99, NULL, regs);
+			else
+				reprogram_apic_timer(1);
+		}
+	} while (read_seqretry(&xtime_lock, seq));
+#endif
+
 	smp_local_timer_interrupt(regs);
 	irq_exit();
 }
Index: linux-dev/arch/i386/kernel/dyn-tick.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-dev/arch/i386/kernel/dyn-tick.c	2005-06-09 20:55:03.000000000 -0700
@@ -0,0 +1,45 @@
+/*
+ * linux/arch/i386/kernel/dyn-tick.c
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ *	      Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/version.h>
+#include <linux/config.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/dyn-tick.h>
+
+void arch_reprogram_timer(void)
+{
+	if (cpu_has_local_apic()) {
+		disable_pit_timer();
+		if (dyn_tick->state & DYN_TICK_TIMER_INT)
+			reprogram_apic_timer(dyn_tick->skip);
+	} else {
+		if (dyn_tick->state & DYN_TICK_TIMER_INT)
+			reprogram_pit_timer(dyn_tick->skip);
+		else
+			disable_pit_timer();
+	}
+}
+
+static struct dyn_tick_timer arch_dyn_tick_timer = {
+	.arch_reprogram_timer	= &arch_reprogram_timer,
+};
+
+int __init dyn_tick_init(void)
+{
+	arch_dyn_tick_timer.arch_init = dyn_tick_arch_init;
+	dyn_tick_register(&arch_dyn_tick_timer);
+
+	return 0;
+}
+arch_initcall(dyn_tick_init);
Index: linux-dev/include/asm-i386/dyn-tick.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-dev/include/asm-i386/dyn-tick.h	2005-06-09 20:55:03.000000000 -0700
@@ -0,0 +1,42 @@
+/*
+ * linux/include/asm-i386/dyn-tick.h
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ *	      Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _ASM_I386_DYN_TICK_H_
+#define _ASM_I386_DYN_TICK_H_
+
+extern int dyn_tick_arch_init(void);
+extern void disable_pit_timer(void);
+extern void reprogram_pit_timer(int jiffies_to_skip);
+extern void reprogram_apic_timer(unsigned int count);
+extern void replace_timer_interrupt(void * new_handler);
+
+#if defined(CONFIG_NO_IDLE_HZ) && defined(CONFIG_X86_LOCAL_APIC)
+extern void reprogram_apic_timer(unsigned int count);
+#else
+#define reprogram_apic_timer(x)	{}
+#endif
+
+#undef DEBUG
+#ifdef DEBUG
+#define dbg_dyn_tick_irq()	{if (skipped && skipped < dyn_tick->skip) \
+				printk("%u/%li ", skipped, dyn_tick->skip);}
+#else
+#define dbg_dyn_tick_irq()	{}
+#endif
+
+#if defined(CONFIG_DYN_TICK_USE_APIC) && (defined(CONFIG_SMP) || defined(CONFIG_X86_UP_APIC))
+#define cpu_has_local_apic()	(dyn_tick->state & DYN_TICK_USE_APIC)
+#else
+#define cpu_has_local_apic()	0
+#endif
+
+#endif /* _ASM_I386_DYN_TICK_H_ */
Index: linux-dev/arch/i386/kernel/irq.c
===================================================================
--- linux-dev.orig/arch/i386/kernel/irq.c	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/kernel/irq.c	2005-06-09 20:55:03.000000000 -0700
@@ -15,6 +15,7 @@
 #include <linux/seq_file.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
+#include <linux/dyn-tick.h>
 
 DEFINE_PER_CPU(irq_cpustat_t, irq_stat) ____cacheline_maxaligned_in_smp;
 EXPORT_PER_CPU_SYMBOL(irq_stat);
@@ -73,6 +74,11 @@
 	}
 #endif
 
+#ifdef CONFIG_NO_IDLE_HZ
+	if (dyn_tick->state & (DYN_TICK_ENABLED | DYN_TICK_SKIPPING) && irq != 0)
+		dyn_tick->interrupt(irq, NULL, regs);
+#endif
+
 #ifdef CONFIG_4KSTACKS
 
 	curctx = (union irq_ctx *) current_thread_info();
Index: linux-dev/arch/i386/kernel/process.c
===================================================================
--- linux-dev.orig/arch/i386/kernel/process.c	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/kernel/process.c	2005-06-09 21:08:49.000000000 -0700
@@ -37,6 +37,7 @@
 #include <linux/kallsyms.h>
 #include <linux/ptrace.h>
 #include <linux/random.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -160,6 +161,8 @@
 			if (!idle)
 				idle = default_idle;
 
+			dyn_tick_reprogram_timer();
+
 			__get_cpu_var(irq_stat).idle_timestamp = jiffies;
 			idle();
 		}
Index: linux-dev/arch/i386/kernel/time.c
===================================================================
--- linux-dev.orig/arch/i386/kernel/time.c	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/kernel/time.c	2005-06-09 20:55:03.000000000 -0700
@@ -46,6 +46,7 @@
 #include <linux/bcd.h>
 #include <linux/efi.h>
 #include <linux/mca.h>
+#include <linux/dyn-tick.h>
 
 #include <asm/io.h>
 #include <asm/smp.h>
@@ -308,6 +309,44 @@
 	return IRQ_HANDLED;
 }
 
+#ifdef CONFIG_NO_IDLE_HZ
+static unsigned long long last_tick;
+
+/*
+ * This interrupt handler updates the time based on number of jiffies skipped
+ * It would be somewhat more optimized to have a custom handler in each timer
+ * using hardware ticks instead of nanoseconds. Note that CONFIG_NO_IDLE_HZ
+ * currently disables timer fallback on skipped jiffies.
+ */
+irqreturn_t dyn_tick_timer_interrupt(int irq, void *dev_id, struct pt_regs *regs)
+{
+	unsigned long flags;
+	volatile unsigned long long now;
+	unsigned int skipped = 0;
+
+	write_seqlock_irqsave(&xtime_lock, flags);
+	now = cur_timer->monotonic_clock();
+	while (now - last_tick >= NS_TICK_LEN) {
+		last_tick += NS_TICK_LEN;
+		cur_timer->mark_offset();
+		do_timer_interrupt(irq, NULL, regs);
+		skipped++;
+	}
+	if (dyn_tick->state & (DYN_TICK_ENABLED | DYN_TICK_SKIPPING)) {
+		dbg_dyn_tick_irq();
+		dyn_tick->skip = 1;
+		if (cpu_has_local_apic())
+			reprogram_apic_timer(dyn_tick->skip);
+		reprogram_pit_timer(dyn_tick->skip);
+		dyn_tick->state |= DYN_TICK_ENABLED;
+		dyn_tick->state &= ~DYN_TICK_SKIPPING;
+	}
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+
+	return IRQ_HANDLED;
+}
+#endif /* CONFIG_NO_IDLE_HZ */
+
 /* not static: needed by APM */
 unsigned long get_cmos_time(void)
 {
@@ -416,7 +455,7 @@
 
 
 /* XXX this driverfs stuff should probably go elsewhere later -john */
-static struct sys_device device_timer = {
+struct sys_device device_timer = {
 	.id	= 0,
 	.cls	= &timer_sysclass,
 };
@@ -452,6 +491,28 @@
 }
 #endif
 
+#ifdef CONFIG_NO_IDLE_HZ
+
+int __init dyn_tick_arch_init(void)
+{
+	unsigned long flags;
+
+	write_seqlock_irqsave(&xtime_lock, flags);
+	last_tick = cur_timer->monotonic_clock();
+	dyn_tick->skip = 1;
+	if (!(dyn_tick->state & DYN_TICK_USE_APIC) || !cpu_has_local_apic())
+		dyn_tick->max_skip = 0xffff/LATCH;	/* PIT timer length */
+	printk(KERN_INFO "dyn-tick: Maximum ticks to skip limited to %i\n",
+	       dyn_tick->max_skip);
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+
+	dyn_tick->interrupt = dyn_tick_timer_interrupt;
+	replace_timer_interrupt(dyn_tick->interrupt);
+
+	return 0;
+}
+#endif /* CONFIG_NO_IDLE_HZ */
+
 void __init time_init(void)
 {
 #ifdef CONFIG_HPET_TIMER
@@ -472,5 +533,16 @@
 	cur_timer = select_timer();
 	printk(KERN_INFO "Using %s for high-res timesource\n",cur_timer->name);
 
+#ifdef CONFIG_NO_IDLE_HZ
+	if (strncmp(cur_timer->name, "tsc", 3) == 0 ||
+	    strncmp(cur_timer->name, "pmtmr", 3) == 0) {
+		dyn_tick->state |= DYN_TICK_SUITABLE;
+		printk(KERN_INFO "dyn-tick: Found suitable timer: %s\n",
+		       cur_timer->name);
+	} else
+		printk(KERN_ERR "dyn-tick: Cannot use timer %s\n",
+		       cur_timer->name);
+#endif
+
 	time_init_hook();
 }
Index: linux-dev/arch/i386/kernel/timers/timer_pm.c
===================================================================
--- linux-dev.orig/arch/i386/kernel/timers/timer_pm.c	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/kernel/timers/timer_pm.c	2005-06-09 20:55:03.000000000 -0700
@@ -168,6 +168,7 @@
 	monotonic_base += delta * NSEC_PER_USEC;
 	write_sequnlock(&monotonic_lock);
 
+#ifndef CONFIG_NO_IDLE_HZ
 	/* convert to ticks */
 	delta += offset_delay;
 	lost = delta / (USEC_PER_SEC / HZ);
@@ -184,6 +185,7 @@
 		first_run = 0;
 		offset_delay = 0;
 	}
+#endif
 }
 
 
Index: linux-dev/arch/i386/kernel/timers/timer_tsc.c
===================================================================
--- linux-dev.orig/arch/i386/kernel/timers/timer_tsc.c	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/kernel/timers/timer_tsc.c	2005-06-09 20:55:03.000000000 -0700
@@ -368,6 +368,7 @@
 
 	rdtsc(last_tsc_low, last_tsc_high);
 
+#ifndef CONFIG_NO_IDLE_HZ
 	spin_lock(&i8253_lock);
 	outb_p(0x00, PIT_MODE);     /* latch the count ASAP */
 
@@ -435,11 +436,14 @@
 			cpufreq_delayed_get();
 	} else
 		lost_count = 0;
+#endif
+
 	/* update the monotonic base value */
 	this_offset = ((unsigned long long)last_tsc_high<<32)|last_tsc_low;
 	monotonic_base += cycles_2_ns(this_offset - last_offset);
 	write_sequnlock(&monotonic_lock);
 
+#ifndef CONFIG_NO_IDLE_HZ
 	/* calculate delay_at_last_interrupt */
 	count = ((LATCH-1) - count) * TICK_SIZE;
 	delay_at_last_interrupt = (count + LATCH/2) / LATCH;
@@ -450,6 +454,7 @@
 	 */
 	if (lost && abs(delay - delay_at_last_interrupt) > (900000/HZ))
 		jiffies_64++;
+#endif
 }
 
 static int __init init_tsc(char* override)
Index: linux-dev/arch/i386/mach-default/setup.c
===================================================================
--- linux-dev.orig/arch/i386/mach-default/setup.c	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/mach-default/setup.c	2005-06-09 20:55:03.000000000 -0700
@@ -85,6 +85,22 @@
 	setup_irq(0, &irq0);
 }
 
+/**
+ * replace_timer_interrupt - allow replacing timer interrupt handler
+ *
+ * Description:
+ *	Can be used to replace timer interrupt handler with a more optimized
+ *	handler. Used for enabling and disabling of CONFIG_NO_IDLE_HZ.
+ */
+void replace_timer_interrupt(void * new_handler)
+{
+	unsigned long flags;
+
+	write_seqlock_irqsave(&xtime_lock, flags);
+	irq0.handler = new_handler;
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+}
+
 #ifdef CONFIG_MCA
 /**
  * mca_nmi_hook - hook into MCA specific NMI chain
Index: linux-dev/include/linux/dyn-tick.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-dev/include/linux/dyn-tick.h	2005-06-09 20:55:03.000000000 -0700
@@ -0,0 +1,61 @@
+/*
+ * linux/include/linux/dyn-tick.h
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ *	      Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _DYN_TICK_TIMER_H
+#define _DYN_TICK_TIMER_H
+
+#include <linux/interrupt.h>
+
+#define DYN_TICK_TIMER_INT	(1 << 4)
+#define DYN_TICK_USE_APIC	(1 << 3)
+#define DYN_TICK_SKIPPING	(1 << 2)
+#define DYN_TICK_ENABLED	(1 << 1)
+#define DYN_TICK_SUITABLE	(1 << 0)
+
+struct dyn_tick_state {
+	unsigned int state;		/* Current state */
+	int skip_cpu;			/* Skip handling processor */
+	unsigned long skip;		/* Ticks to skip */
+	unsigned int max_skip;		/* Max number of ticks to skip */
+	unsigned long irq_skip_mask;	/* Do not update time from these irqs */
+	irqreturn_t (*interrupt)(int, void *, struct pt_regs *);
+};
+
+struct dyn_tick_timer {
+	int (*arch_init) (void);
+	void (*arch_enable) (void);
+	void (*arch_disable) (void);
+	void (*arch_reprogram_timer) (void);
+};
+
+extern struct dyn_tick_state * dyn_tick;
+extern void dyn_tick_register(struct dyn_tick_timer * new_timer);
+
+#define NS_TICK_LEN		((1 * 1000000000)/HZ)
+#define DYN_TICK_MIN_SKIP	2
+
+#ifdef CONFIG_NO_IDLE_HZ
+
+extern unsigned long dyn_tick_reprogram_timer(void);
+
+#else
+
+#define arch_has_safe_halt()		0
+#define dyn_tick_reprogram_timer()	{}
+
+
+#endif	/* CONFIG_NO_IDLE_HZ */
+
+/* Pick up arch specific header */
+#include <asm/dyn-tick.h>
+
+#endif	/* _DYN_TICK_TIMER_H */
Index: linux-dev/kernel/Makefile
===================================================================
--- linux-dev.orig/kernel/Makefile	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/kernel/Makefile	2005-06-09 20:55:03.000000000 -0700
@@ -28,6 +28,7 @@
 obj-$(CONFIG_SYSFS) += ksysfs.o
 obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_SECCOMP) += seccomp.o
+obj-$(CONFIG_NO_IDLE_HZ) += dyn-tick.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux-dev/kernel/dyn-tick.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-dev/kernel/dyn-tick.c	2005-06-09 20:57:27.000000000 -0700
@@ -0,0 +1,209 @@
+/*
+ * linux/kernel/dyn-tick.c
+ *
+ * Beginnings of generic dynamic tick timer support
+ *
+ * Copyright (C) 2004 Nokia Corporation
+ * Written by Tony Lindgren <tony@atomide.com> and
+ *	      Tuukka Tikkanen <tuukka.tikkanen@elektrobit.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/version.h>
+#include <linux/config.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sysdev.h>
+#include <linux/interrupt.h>
+#include <linux/cpumask.h>
+#include <linux/pm.h>
+#include <linux/dyn-tick.h>
+#include <asm/io.h>
+
+#include "io_ports.h"
+
+#define DYN_TICK_VERSION	"050609-2"
+
+struct dyn_tick_state dyn_tick_state;
+struct dyn_tick_state * dyn_tick = &dyn_tick_state;
+struct dyn_tick_timer * dyn_tick_cfg;
+static cpumask_t dyn_cpu_map;
+
+/*
+ * Arch independent code needed to reprogram next timer interrupt.
+ * Gets called from cpu_idle() before entering idle loop. Note that
+ * we want to have all processors idle before reprogramming the
+ * next timer interrupt.
+ */
+unsigned long dyn_tick_reprogram_timer(void)
+{
+	int cpu;
+	unsigned long flags;
+	cpumask_t idle_cpus;
+	unsigned long next;
+
+	if (!(dyn_tick->state & DYN_TICK_ENABLED))
+		return 0;
+
+	/* Check if we are already skipping ticks and can idle other cpus */
+	if (dyn_tick->state & DYN_TICK_SKIPPING) {
+		if (cpu_has_local_apic())
+			reprogram_apic_timer(dyn_tick->skip);
+		return 0;
+	}
+
+	/* Check if we can start skipping ticks */
+	write_seqlock_irqsave(&xtime_lock, flags);
+	cpu = smp_processor_id();
+	cpu_set(cpu, dyn_cpu_map);
+	cpus_and(idle_cpus, dyn_cpu_map, cpu_online_map);
+	if (cpus_equal(idle_cpus, cpu_online_map)) {
+		next = next_timer_interrupt();
+		if (jiffies > next)
+			dyn_tick->skip = 1;
+		else
+			dyn_tick->skip = next_timer_interrupt() - jiffies;
+		if (dyn_tick->skip > DYN_TICK_MIN_SKIP) {
+			if (dyn_tick->skip > dyn_tick->max_skip)
+				dyn_tick->skip = dyn_tick->max_skip;
+
+			dyn_tick_cfg->arch_reprogram_timer();
+
+			dyn_tick->skip_cpu = cpu;
+			dyn_tick->state |= DYN_TICK_SKIPPING;
+		}
+		cpus_clear(dyn_cpu_map);
+	}
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+
+	return dyn_tick->skip;
+}
+
+void __init dyn_tick_register(struct dyn_tick_timer * arch_timer)
+{
+	dyn_tick_cfg = arch_timer;
+	printk(KERN_INFO "dyn-tick: Registering dynamic tick timer v%s\n",
+	       DYN_TICK_VERSION);
+}
+
+/*
+ * ---------------------------------------------------------------------------
+ * Command line options
+ * ---------------------------------------------------------------------------
+ */
+static int __initdata dyntick_autoenable = 0;
+static int __initdata dyntick_useapic = 0;
+
+/*
+ * dyntick=[enable|disable],[forceapic]
+ */
+static int __init dyntick_setup(char *options)
+{
+	if (!options)
+		return 0;
+
+	if (strstr(options, "enable"))
+		dyntick_autoenable = 1;
+
+	if (strstr(options, "forceapic"))
+		dyntick_useapic = 1;
+
+	return 0;
+}
+
+__setup("dyntick=", dyntick_setup);
+
+/*
+ * ---------------------------------------------------------------------------
+ * Sysfs interface
+ * ---------------------------------------------------------------------------
+ */
+
+extern struct sys_device device_timer;
+
+static ssize_t show_dyn_tick_state(struct sys_device *dev, char *buf)
+{
+	return sprintf(buf, "suitable:\t%i\n"
+		       "enabled:\t%i\n"
+		       "using APIC:\t%i\n",
+		       dyn_tick->state & DYN_TICK_SUITABLE,
+		       (dyn_tick->state & DYN_TICK_ENABLED) >> 1,
+		       (dyn_tick->state & DYN_TICK_USE_APIC) >> 3);
+}
+
+static ssize_t set_dyn_tick_state(struct sys_device *dev, const char * buf,
+				  size_t count)
+{
+	unsigned long flags;
+	unsigned int enable = simple_strtoul(buf, NULL, 2);
+
+	write_seqlock_irqsave(&xtime_lock, flags);
+	if (enable) {
+		if (dyn_tick_cfg->arch_enable)
+			dyn_tick_cfg->arch_enable();
+		dyn_tick->state |= DYN_TICK_ENABLED;
+	} else {
+		if (dyn_tick_cfg->arch_disable)
+			dyn_tick_cfg->arch_disable();
+		dyn_tick->state &= ~DYN_TICK_ENABLED;
+	}
+	write_sequnlock_irqrestore(&xtime_lock, flags);
+
+	return count;
+}
+
+static SYSDEV_ATTR(dyn_tick_state, 0644, show_dyn_tick_state,
+		   set_dyn_tick_state);
+
+/*
+ * ---------------------------------------------------------------------------
+ * Init functions
+ * ---------------------------------------------------------------------------
+ */
+
+static int __init dyn_tick_early_init(void)
+{
+	dyn_tick->state |= DYN_TICK_TIMER_INT;
+	return 0;
+}
+
+subsys_initcall(dyn_tick_early_init);
+
+/*
+ * We need to initialize dynamic tick after calibrate delay
+ */
+static int __init dyn_tick_late_init(void)
+{
+	int ret = 0;
+
+	if (dyn_tick_cfg == NULL || dyn_tick_cfg->arch_init == NULL ||
+	    !(dyn_tick->state & DYN_TICK_SUITABLE)) {
+		printk(KERN_ERR "dyn-tick: No suitable timer found\n");
+		return -ENODEV;
+	}
+
+	if (!dyntick_useapic)
+		dyn_tick->state &= ~DYN_TICK_USE_APIC;
+
+	ret = dyn_tick_cfg->arch_init();
+	if (ret != 0) {
+		printk(KERN_ERR "dyn-tick: Init failed\n");
+		return -ENODEV;
+	}
+
+	ret = sysdev_create_file(&device_timer, &attr_dyn_tick_state);
+
+	if (ret == 0 && dyntick_autoenable) {
+		dyn_tick->state |= DYN_TICK_ENABLED;
+		printk(KERN_INFO "dyn-tick: Timer using dynamic tick\n");
+	} else
+		printk(KERN_INFO "dyn-tick: Timer not enabled during boot\n");
+
+	return ret;
+}
+
+late_initcall(dyn_tick_late_init);
Index: linux-dev/arch/i386/kernel/timers/timer_pit.c
===================================================================
--- linux-dev.orig/arch/i386/kernel/timers/timer_pit.c	2005-06-09 20:49:06.000000000 -0700
+++ linux-dev/arch/i386/kernel/timers/timer_pit.c	2005-06-09 20:55:03.000000000 -0700
@@ -149,6 +149,43 @@
 	return count;
 }
 
+/*
+ * REVISIT: Looks like on P3 APIC timer keeps running if PIT mode
+ *	    is changed. On P4, changing PIT mode seems to kill
+ *	    APIC timer interrupts. Same thing with disabling PIT
+ *	    interrupt.
+ */
+void disable_pit_timer(void)
+{
+	extern spinlock_t i8253_lock;
+	unsigned long flags;
+	spin_lock_irqsave(&i8253_lock, flags);
+	outb_p(0x32, PIT_MODE);		/* binary, mode 1, LSB/MSB, ch 0 */
+	spin_unlock_irqrestore(&i8253_lock, flags);
+}
+
+/*
+ * Reprograms the next timer interrupt
+ * PIT timer reprogramming code taken from APM code.
+ * Note that PIT timer is a 16-bit timer, which allows max
+ * skip of only few seconds.
+ */
+void reprogram_pit_timer(int jiffies_to_skip)
+{
+	int skip;
+	extern spinlock_t i8253_lock;
+	unsigned long flags;
+
+	skip = jiffies_to_skip * LATCH;
+	if (skip > 0xffff)
+		skip = 0xffff;
+
+	spin_lock_irqsave(&i8253_lock, flags);
+	outb_p(0x34, PIT_MODE);		/* binary, mode 2, LSB/MSB, ch 0 */
+	outb_p(skip & 0xff, PIT_CH0);	/* LSB */
+	outb(skip >> 8, PIT_CH0);	/* MSB */
+	spin_unlock_irqrestore(&i8253_lock, flags);
+}
 
 /* tsc timer_opts struct */
 struct timer_opts timer_pit = {
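The max_skip limits set in the patch (0xffff/LATCH for the PIT, 0xffffff / apic_timer_val for the local APIC) are easier to judge with numbers plugged in. The following is a rough user-space sketch, not part of the patch. It assumes the usual i386 PIT input clock of 1193182 Hz and the kernel's rounding LATCH formula; the APIC counts-per-tick value in the usage example is purely hypothetical, since the real one comes from calibration.

```c
#include <assert.h>

/* Assumption: i386 PIT input clock in Hz (CLOCK_TICK_RATE on kernels of this era). */
#define PIT_CLOCK_HZ	1193182UL

/* PIT counts per tick, rounded to nearest, mirroring the kernel's LATCH macro. */
static unsigned long pit_latch(unsigned long hz)
{
	return (PIT_CLOCK_HZ + hz / 2) / hz;
}

/* Largest whole number of ticks that fits in the 16-bit PIT counter. */
static unsigned long pit_max_skip(unsigned long hz)
{
	return 0xffffUL / pit_latch(hz);
}

/*
 * Same idea for the 24-bit local APIC counter. counts_per_tick plays the
 * role of apic_timer_val in the patch; 0 means calibration failed.
 */
static unsigned long apic_max_skip(unsigned long counts_per_tick)
{
	return counts_per_tick ? 0xffffffUL / counts_per_tick : 0;
}

/* Usage example with common HZ values and a made-up APIC calibration. */
static void self_check(void)
{
	assert(pit_latch(1000) == 1193);
	assert(pit_max_skip(1000) == 54);	/* about 54 ms */
	assert(pit_max_skip(250) == 13);	/* about 52 ms */
	assert(pit_max_skip(100) == 5);		/* about 50 ms */
	assert(apic_max_skip(100000) == 167);	/* hypothetical calibration */
	assert(apic_max_skip(0) == 0);
}
```

Under these assumptions the PIT limit works out to roughly 50 to 55 ms of skipped ticks regardless of HZ, since a 16-bit count at about 1.19 MHz wraps after about 55 ms. That is noticeably shorter than the "few seconds" mentioned in the reprogram_pit_timer() comment, and it is consistent with the Kconfig help text saying the APIC timer allows longer sleep periods.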
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-10  4:30   ` [PATCH] Dynamic tick for x86 version 050609-2 Tony Lindgren
@ 2005-06-10  9:10     ` Pavel Machek
  2005-06-10 15:10       ` Tony Lindgren
  2005-06-13  4:54     ` Valdis.Kletnieks
  2005-06-13 17:09     ` Srivatsa Vaddagiri
From: Pavel Machek @ 2005-06-10  9:10 UTC
  To: Tony Lindgren
  Cc: linux-kernel, Pallipadi, Venkatesh, Jonathan Corbet,
	Bernard Blackham, Christian Hesse, Zwane Mwaikambo

Hi!

Some more nitpicking...

> +/*
> + * ---------------------------------------------------------------------------
> + * Command line options
> + * ---------------------------------------------------------------------------
> + */
> +static int __initdata dyntick_autoenable = 0;
> +static int __initdata dyntick_useapic = 0;
> +
> +/*
> + * dyntick=[enable|disable],[forceapic]
> + */
> +static int __init dyntick_setup(char *options)
> +{
> +	if (!options)
> +		return 0;
> +
> +	if (strstr(options, "enable"))
> +		dyntick_autoenable = 1;
> +
> +	if (strstr(options, "forceapic"))
> +		dyntick_useapic = 1;
> +
> +	return 0;
> +}
> +
> +__setup("dyntick=", dyntick_setup);

Well, your parsing is a little too simplistic. If I pass
dyntick=do_not_dare_to_enable_it, it still enables :-).

> +/*
> + * ---------------------------------------------------------------------------
> + * Sysfs interface
> + * ---------------------------------------------------------------------------
> + */
> +
> +extern struct sys_device device_timer;
> +
> +static ssize_t show_dyn_tick_state(struct sys_device *dev, char *buf)
> +{
> +	return sprintf(buf, "suitable:\t%i\n"
> +		       "enabled:\t%i\n"
> +		       "using APIC:\t%i\n",
> +		       dyn_tick->state & DYN_TICK_SUITABLE,
> +		       (dyn_tick->state & DYN_TICK_ENABLED) >> 1,
> +		       (dyn_tick->state & DYN_TICK_USE_APIC) >> 3);

You basically hardcode values of DYN_TICK_* here. Why not use !!() and
lose the dependency?

								Pavel
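The !!() suggestion above can be sketched in user space as follows. This is a minimal illustration, not the kernel code: format_state() is a hypothetical helper that mirrors the shape of show_dyn_tick_state(), the flag values are copied from the patch, and snprintf() is used in place of the kernel's sprintf() into a sysfs buffer. The point is that !!(state & FLAG) yields 0 or 1 whatever bit position the flag occupies, so no shift count has to match the flag definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flag values as defined in the patch's include/linux/dyn-tick.h. */
#define DYN_TICK_USE_APIC	(1 << 3)
#define DYN_TICK_ENABLED	(1 << 1)
#define DYN_TICK_SUITABLE	(1 << 0)

/* Hypothetical user-space counterpart of show_dyn_tick_state(). */
static int format_state(char *buf, size_t len, unsigned int state)
{
	return snprintf(buf, len,
			"suitable:\t%i\n"
			"enabled:\t%i\n"
			"using APIC:\t%i\n",
			!!(state & DYN_TICK_SUITABLE),
			!!(state & DYN_TICK_ENABLED),
			!!(state & DYN_TICK_USE_APIC));
}

/* Usage example: two flags set, one clear. */
static void self_check(void)
{
	char buf[64];

	format_state(buf, sizeof(buf), DYN_TICK_ENABLED | DYN_TICK_USE_APIC);
	assert(strcmp(buf, "suitable:\t0\nenabled:\t1\nusing APIC:\t1\n") == 0);
}
```

If one of the DYN_TICK_* values were later renumbered, this version would keep printing 0 or 1, whereas the shifted version would need its shift counts updated by hand.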
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-10  9:10 ` Pavel Machek
@ 2005-06-10 15:10   ` Tony Lindgren
  0 siblings, 0 replies; 20+ messages in thread
From: Tony Lindgren @ 2005-06-10 15:10 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Pallipadi, Venkatesh, Jonathan Corbet,
	Bernard Blackham, Christian Hesse, Zwane Mwaikambo

* Pavel Machek <pavel@ucw.cz> [050610 02:10]:
> Hi!
>
> Some more nitpicking...

Great!

> > +/*
> > + * ---------------------------------------------------------------------------
> > + * Command line options
> > + * ---------------------------------------------------------------------------
> > + */
> > +static int __initdata dyntick_autoenable = 0;
> > +static int __initdata dyntick_useapic = 0;
> > +
> > +/*
> > + * dyntick=[enable|disable],[forceapic]
> > + */
> > +static int __init dyntick_setup(char *options)
> > +{
> > +	if (!options)
> > +		return 0;
> > +
> > +	if (strstr(options, "enable"))
> > +		dyntick_autoenable = 1;
> > +
> > +	if (strstr(options, "forceapic"))
> > +		dyntick_useapic = 1;
> > +
> > +	return 0;
> > +}
> > +
> > +__setup("dyntick=", dyntick_setup);
>
>
> Well, your parsing is a little too simplistic. If I pass
> dyntick=do_not_dare_to_enable_it, it still enables :-).

OK, I'll change that to test that enable is the first option.

> > +/*
> > + * ---------------------------------------------------------------------------
> > + * Sysfs interface
> > + * ---------------------------------------------------------------------------
> > + */
> > +
> > +extern struct sys_device device_timer;
> > +
> > +static ssize_t show_dyn_tick_state(struct sys_device *dev, char *buf)
> > +{
> > +	return sprintf(buf, "suitable:\t%i\n"
> > +		       "enabled:\t%i\n"
> > +		       "using APIC:\t%i\n",
> > +		       dyn_tick->state & DYN_TICK_SUITABLE,
> > +		       (dyn_tick->state & DYN_TICK_ENABLED) >> 1,
> > +		       (dyn_tick->state & DYN_TICK_USE_APIC) >> 3);
>
> You basically hardcode the values of DYN_TICK_* here. Why not use !!() and
> lose the dependency?
>
> 								Pavel

OK, thanks.

Tony

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2 2005-06-10 4:30 ` [PATCH] Dynamic tick for x86 version 050609-2 Tony Lindgren 2005-06-10 9:10 ` Pavel Machek @ 2005-06-13 4:54 ` Valdis.Kletnieks 2005-06-13 15:25 ` Tony Lindgren 2005-06-13 17:09 ` Srivatsa Vaddagiri 2 siblings, 1 reply; 20+ messages in thread From: Valdis.Kletnieks @ 2005-06-13 4:54 UTC (permalink / raw) To: Tony Lindgren Cc: linux-kernel, Pallipadi, Venkatesh, Jonathan Corbet, Pavel Machek, Bernard Blackham, Christian Hesse, Zwane Mwaikambo [-- Attachment #1: Type: text/plain, Size: 920 bytes --] On Thu, 09 Jun 2005 21:30:18 PDT, Tony Lindgren said: > Thanks for all the comments. Here's an updated dyntick patch. Patches with 3 minor rejects against -rc6-mm1, boots, and seems to work well on my Dell Latitude C840 laptop - although running at full load with seti@home causes the expected 250 timer ticks/sec, running a mostly-idle X session only gets about 117, and having xmms and a few other things running it hits about 170 tics/sec. I've had the CPU speed bounce between 1.2G and 1.6G a few times and it didn't seem to blink either. Even NTP is happy with what it sees.. ;) Need to rebuild with CONFIG_HZ=1000 and see what it does, and see what it does to actual power consumption. Minor nit: The implementation of /sys/devices/system/timer/timer0/dyn_tick_state violates the one-value-per-file rule for sysfs. I suspect this needs to become a directory with 3-4 files in it, each containing one value. [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2 2005-06-13 4:54 ` Valdis.Kletnieks @ 2005-06-13 15:25 ` Tony Lindgren 2005-06-13 16:47 ` Valdis.Kletnieks 0 siblings, 1 reply; 20+ messages in thread From: Tony Lindgren @ 2005-06-13 15:25 UTC (permalink / raw) To: Valdis.Kletnieks Cc: linux-kernel, Pallipadi, Venkatesh, Jonathan Corbet, Pavel Machek, Bernard Blackham, Christian Hesse, Zwane Mwaikambo, Thomas Renninger * Valdis.Kletnieks@vt.edu <Valdis.Kletnieks@vt.edu> [050612 21:55]: > On Thu, 09 Jun 2005 21:30:18 PDT, Tony Lindgren said: > > > Thanks for all the comments. Here's an updated dyntick patch. > > Patches with 3 minor rejects against -rc6-mm1, boots, and seems to work well on > my Dell Latitude C840 laptop - although running at full load with seti@home > causes the expected 250 timer ticks/sec, running a mostly-idle X session only > gets about 117, and having xmms and a few other things running it hits about > 170 tics/sec. I've had the CPU speed bounce between 1.2G and 1.6G a few times > and it didn't seem to blink either. Even NTP is happy with what it sees.. ;) Cool. > Need to rebuild with CONFIG_HZ=1000 and see what it does, and see what it does > to actual power consumption. You may also want to check out the patch by Thomas Renninger for ACPI C-states. I've added a link to it at: http://muru.com/dyntick/ > Minor nit: The implementation of /sys/devices/system/timer/timer0/dyn_tick_state > violates the one-value-per-file rule for sysfs. I suspect this needs to > become a directory with 3-4 files in it, each containing one value. Yeah, I'll clean up that for the next version. Tony ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-13 15:25 ` Tony Lindgren
@ 2005-06-13 16:47   ` Valdis.Kletnieks
  2005-06-13 18:01     ` Thomas Renninger
  0 siblings, 1 reply; 20+ messages in thread
From: Valdis.Kletnieks @ 2005-06-13 16:47 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: linux-kernel, Pallipadi, Venkatesh, Jonathan Corbet, Pavel Machek,
	Bernard Blackham, Christian Hesse, Zwane Mwaikambo,
	Thomas Renninger

[-- Attachment #1: Type: text/plain, Size: 858 bytes --]

On Mon, 13 Jun 2005 08:25:07 PDT, Tony Lindgren said:

> You may also want to check out the patch by Thomas Renninger for ACPI
> C-states. I've added a link to it at:
>
> http://muru.com/dyntick/

I think that's muru.com/linux/dyntick ?

I'm not sure what Thomas's patch will do for me - here's what I currently have:

% cat /proc/acpi/processor/CPU0/power
active state:            C2
max_cstate:              C8
bus master activity:     00000000
states:
    C1:                  type[C1] promotion[C2] demotion[--] latency[000] usage[00000010]
   *C2:                  type[C2] promotion[--] demotion[C1] latency[050] usage[01314979]

Near as I can tell, we start off in C1, drop into C2, and stay there no
matter what happens - we never move back up to C1, and there's no C3 to drop
into....

Should there be a C3/C4?  Is my laptop just plain borked? :)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-13 16:47 ` Valdis.Kletnieks
@ 2005-06-13 18:01   ` Thomas Renninger
  2005-06-13 18:22     ` Tony Lindgren
  2005-06-13 19:07     ` Valdis.Kletnieks
  0 siblings, 2 replies; 20+ messages in thread
From: Thomas Renninger @ 2005-06-13 18:01 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Tony Lindgren, linux-kernel, Pallipadi, Venkatesh,
	Jonathan Corbet, Pavel Machek, Bernard Blackham, Christian Hesse,
	Zwane Mwaikambo

Valdis.Kletnieks@vt.edu wrote:
> On Mon, 13 Jun 2005 08:25:07 PDT, Tony Lindgren said:
>
>>You may also want to check out the patch by Thomas Renninger for ACPI
>>C-states. I've added a link to it at:
>>
>>http://muru.com/dyntick/
>
> I think that's muru.com/linux/dyntick ?
>
> I'm not sure what Thomas's patch will do for me
Not much.
The one measures how long your machine really stays in
each C-state (Tony's pmstats should be sufficient for you).

The other one tried to calc the next C-state to go, based on
statistics of bus master activity and idleness of the machine.
But it is *wrong*.
Tony could you please remove the link to:
ftp://ftp.suse.com/pub/people/trenn/dyn_tick_c_states/dynamic_tick_cstate_patch.diff
Therefore we also will never get such good results as stated in:
ftp://ftp.suse.com/pub/people/trenn/dyn_tick_c_states/measures_C4_machine

The problem is that if there is bus master activity, a certain amount of time
has to be waited (nobody could tell me how long this must be, currently
it's 40 ms. Then it's assumed bm transfers have been finished) before C3/C4 can be called ->
-> bm activity is not interrupt driven -> this needs ticks to be enabled.

Therefore a final patch could look like:
Let ticks be enabled (maybe reduced?) as long as machine is still in C1/C2 and
only disable them for deeper sleeping states (C3/C4).

> - here's what I currently have:
>
> % cat /proc/acpi/processor/CPU0/power
> active state:            C2
> max_cstate:              C8
> bus master activity:     00000000
> states:
>     C1:                  type[C1] promotion[C2] demotion[--] latency[000] usage[00000010]
>    *C2:                  type[C2] promotion[--] demotion[C1] latency[050] usage[01314979]
>
> Near as I can tell, we start off in C1, drop into C2, and stay there no
> matter what happens - we never move back up to C1, and there's no C3 to drop
> into....
>
> Should there be a C3/C4?  Is my laptop just plain borked? :)
Depends on your machine and BIOS, whether it's supported -> seems as if it's not.

You could verify by having a deeper look in your FADT/DSDT.
You need the acpi tools from Len Brown (acpidmp/acpixtract) and the iasl Intel ACPI
compiler.
AFAIK checking for C-support is rather robust in recent kernels as long as you don't have a broken
DSDT table.
Maybe you find a newer BIOS supporting C3?

To be honest, I doubt you save much power even with dyn tick enabled if you only have support
for C1 and C2. The pmstats tool from Tony (see link above)
could tell you nicely whether you gain anything.

      Thomas

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2 2005-06-13 18:01 ` Thomas Renninger @ 2005-06-13 18:22 ` Tony Lindgren 2005-06-13 19:07 ` Valdis.Kletnieks 1 sibling, 0 replies; 20+ messages in thread From: Tony Lindgren @ 2005-06-13 18:22 UTC (permalink / raw) To: Thomas Renninger Cc: Valdis.Kletnieks, linux-kernel, Pallipadi, Venkatesh, Jonathan Corbet, Pavel Machek, Bernard Blackham, Christian Hesse, Zwane Mwaikambo * Thomas Renninger <trenn@suse.de> [050613 11:01]: > Valdis.Kletnieks@vt.edu wrote: > > On Mon, 13 Jun 2005 08:25:07 PDT, Tony Lindgren said: > > > >>You may also want to check out the patch by Thomas Renninger for ACPI > >>C-states. I've added a link to it at: > >> > >>http://muru.com/dyntick/ > > > > I think that's muru.com/linux/dyntick ? Oops, that's correct. > > I'm not sure what Thomas's patch will do for me > Not much. > The one measures how long your machine really stays in > each C-state (Tony's pmstats should be sufficient for you). > > The other one tried to calc the next C-state to go, based on > statistics of bus master activity and idleness of the machine. > But it is *wrong*. > Tony could you please remove the link to: > ftp://ftp.suse.com/pub/people/trenn/dyn_tick_c_states/dynamic_tick_cstate_patch.diff > Therefore we also will never get such good results as stated in: > ftp://ftp.suse.com/pub/people/trenn/dyn_tick_c_states/measures_C4_machine OK, removed. > The problem is that if there is bus master activity, a certain amount of time > has to be waited (nobody could tell me how long this must be, currently > it's 40 ms. Then it's assumed bm transfers have been finished) before C3/C4 can be called -> > -> bm activity is not interrupt driven -> this needs ticks to be enabled. > > Therefore a final patch could look like: > Let ticks be enabled (maybe reduced?) as long as machine is still in C1/C2 and > only disable them for deeper sleeping states (C3/C4). 
> > - here's what I currently have: > > > > % cat /proc/acpi/processor/CPU0/power > > active state: C2 > > max_cstate: C8 > > bus master activity: 00000000 > > states: > > C1: type[C1] promotion[C2] demotion[--] latency[000] usage[00000010] > > *C2: type[C2] promotion[--] demotion[C1] latency[050] usage[01314979] > > > > Near as I can tell, we start off in C1, drop into C2, and stay there no > > matter what happens - we never move back up to C1, and there's no C3 to drop > > into.... > > > > Should there be a C3/C4? Is my laptop just plain borked? :) > Depends on your machine and BIOS, whether it's supported -> seems as if it's not. > > You could verify by having a deeper look in your FADT/DSDT. > You need the acpi tools from Len Brown (acpidmp/acpixtract) and the iasl Intel ACPI > compiler. > AFAIK checking for C-support is rather robust in recent kernels as long as you don't have a broken > DSDT table. > Maybe you find a newer BIOS supporting C3? > > To be honest, I doubt you save much power even with dyn tick enabled if you only have support > for C1 and C2. The pmstats tool from Tony (see link above) > could tell you nicely whether you gain anything. Yes, the savings are hard to get currently. In the long run using C4 and idling some devices when the ticks to skip is longer should give better savings. Tony ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2 2005-06-13 18:01 ` Thomas Renninger 2005-06-13 18:22 ` Tony Lindgren @ 2005-06-13 19:07 ` Valdis.Kletnieks 2005-06-14 9:39 ` Thomas Renninger 1 sibling, 1 reply; 20+ messages in thread From: Valdis.Kletnieks @ 2005-06-13 19:07 UTC (permalink / raw) To: Thomas Renninger Cc: Tony Lindgren, linux-kernel, Pallipadi, Venkatesh, Jonathan Corbet, Pavel Machek, Bernard Blackham, Christian Hesse, Zwane Mwaikambo [-- Attachment #1: Type: text/plain, Size: 1109 bytes --] On Mon, 13 Jun 2005 20:01:11 +0200, Thomas Renninger said: > > Should there be a C3/C4? Is my laptop just plain borked? :) > Depends on your machine and BIOS, whether it's supported -> seems as if it's not. > > You could verify by having a deeper look in your FADT/DSDT. > You need the acpi tools from Len Brown (acpidmp/acpixtract) and the iasl Intel ACPI > compiler. > AFAIK checking for C-support is rather robust in recent kernels as long as you don't have a broken > DSDT table. OK, found acpidmp and iasl, now have a decompiled DSDT - now to figure out if it's busticated or not.... :) > Maybe you find a newer BIOS supporting C3? Nope, I just checked, and the A13 BIOS from 02/06/2004 is the latest that Dell has released for the C840. Not much hope there unless there's some special secret site that even newer BIOS updates hide until they escape.. ;) > To be honest, I doubt you save much power even with dyn tick enabled if you only have support > for C1 and C2. The pmstats tool from Tony (see link above) > could tell you nicely whether you gain anything. Well, it's a start, anyhow. :) [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-13 19:07 ` Valdis.Kletnieks
@ 2005-06-14  9:39   ` Thomas Renninger
  2005-06-14 15:40     ` Valdis.Kletnieks
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Renninger @ 2005-06-14  9:39 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Tony Lindgren, linux-kernel, Pallipadi, Venkatesh,
	Jonathan Corbet, Pavel Machek, Bernard Blackham, Christian Hesse,
	Zwane Mwaikambo

Valdis.Kletnieks@vt.edu wrote:
> On Mon, 13 Jun 2005 20:01:11 +0200, Thomas Renninger said:
>
>>>Should there be a C3/C4?  Is my laptop just plain borked? :)
>>Depends on your machine and BIOS, whether it's supported -> seems as if it's not.
>>
>>You could verify by having a deeper look in your FADT/DSDT.
>>You need the acpi tools from Len Brown (acpidmp/acpixtract) and the iasl Intel ACPI
>>compiler.
>>AFAIK checking for C-support is rather robust in recent kernels as long as you don't have a broken
>>DSDT table.
>
> OK, found acpidmp and iasl, now have a decompiled DSDT - now to figure out
> if it's busticated or not.... :)

There are two ways C-state addresses are exported to the OS.

  - Some flags in the FADT (-> ACPI spec) -> this gives you two C-states maximum, AFAIK.
  - Through the _CST function in your DSDT (-> ACPI spec, sorry). If you
    have a look in dsdt.dsl at the _CST function, as many packages are returned as
    C-states your BIOS claims to support. Hmm, _CST code is often in the SSDT, an extension
    of the DSDT code. If you have one: acpidmp > acpidmp; acpixtract ssdt acpidmp >my_ssdt;
    iasl -d my_ssdt.

If your dsdt/ssdt compiles again without errors (better tell me privately if you get some),
you should not have much hope, higher states are probably not supported.

You could also increase ACPI debug output when loading the processor module to
get more information:

cat /proc/acpi/debug_level /proc/acpi/debug_layer
echo 0x00000FFF >/proc/acpi/debug_level
(or whatever enables INFO, you need ACPI_DEBUG defined)
modprobe processor

      Thomas

>
>>Maybe you find a newer BIOS supporting C3?
>
> Nope, I just checked, and the A13 BIOS from 02/06/2004 is the latest that Dell
> has released for the C840.  Not much hope there unless there's some special
> secret site that even newer BIOS updates hide until they escape.. ;)
>
>>To be honest, I doubt you save much power even with dyn tick enabled if you only have support
>>for C1 and C2. The pmstats tool from Tony (see link above)
>>could tell you nicely whether you gain anything.
>
> Well, it's a start, anyhow. :)

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-14  9:39 ` Thomas Renninger
@ 2005-06-14 15:40   ` Valdis.Kletnieks
  0 siblings, 0 replies; 20+ messages in thread
From: Valdis.Kletnieks @ 2005-06-14 15:40 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Tony Lindgren, linux-kernel, Pallipadi, Venkatesh,
	Jonathan Corbet, Pavel Machek, Bernard Blackham, Christian Hesse,
	Zwane Mwaikambo

[-- Attachment #1: Type: text/plain, Size: 1108 bytes --]

On Tue, 14 Jun 2005 11:39:27 +0200, Thomas Renninger said:

> There are two ways C-state addresses are exported to OS.
>
>   - Some flags in the FADT (-> ACPI spec) -> this gives you two C-states maximum, AFAIK.

This must be what I have, because...

>   - Through the _CST function in your DSDT (-> ACPI spec, sorry). If you have
>     have a look in dsdt.dsl at the _CST function there are that much packages returned as
>     your BIOS claims to support. Hmm, _CST code is often in the SSDT an extention
>     of the DSDT code. If you have one: acpidmp > acpidmp; acpixtract ssdt acpidmp >my_ssdt;
>     iasl -d my_ssdt.

I tried (using pmtools-20031210 and acpica-unix-20050513):

acpidmp > c840.dmp
acpixtract dsdt c840.dmp > c840.dsdt
acpixtract ssdt c840.dmp > c840.ssdt
iasl -d c840.dsdt
iasl -d c840.ssdt

No signs of a _CST in either the DSDT or SSDT (in fact, xtract ssdt got me
a zero-length file, so I suspect there's no SSDT at all in there).

Oh well.. looks like short of BIOS/DSDT hacking, I'm stuck.  At least the
dynamic tick code got some testing out of all this... ;)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-10  4:30 ` [PATCH] Dynamic tick for x86 version 050609-2 Tony Lindgren
  2005-06-10  9:10   ` Pavel Machek
  2005-06-13  4:54   ` Valdis.Kletnieks
@ 2005-06-13 17:09   ` Srivatsa Vaddagiri
  2005-06-13 17:55     ` Andi Kleen
  2005-06-13 18:27     ` Tony Lindgren
  2 siblings, 2 replies; 20+ messages in thread
From: Srivatsa Vaddagiri @ 2005-06-13 17:09 UTC (permalink / raw)
  To: Tony Lindgren; +Cc: linux-kernel

Hi Tony,
	I went through the dynamic-tick patch on your website
(patch-dynamic-tick-2.6.12-rc6-050610-1) and had some
questions about it:

1. dyn_tick->skip is set to the number of ticks that have
   to be skipped. This is set on the CPU which is the last
   (in online_map) to go idle and is based on when that
   CPU's next timer is set to expire.

   Other CPUs also seem to use the same interval
   to skip ticks. Shouldn't other CPUs check their nearest timer
   rather than blindly skipping dyn_tick->skip number of ticks?

2. reprogram_apic_timer seems to reprogram the count-down
   APIC timer (APIC_TMICT) with an integral number of apic_timer_val.
   How accurate will this be? Shouldn't this take into account
   that we may not be reprogramming the timer on exactly a "jiffy"
   boundary?

3. Is there any strong reason why you reprogram timers only when
   _all_ CPUs are idle?

4. In what aspects do you think your patch differs from VST (other
   than not relying on HRT!)?

-- 


Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2 2005-06-13 17:09 ` Srivatsa Vaddagiri @ 2005-06-13 17:55 ` Andi Kleen 2005-06-13 18:37 ` Tony Lindgren 2005-06-13 18:27 ` Tony Lindgren 1 sibling, 1 reply; 20+ messages in thread From: Andi Kleen @ 2005-06-13 17:55 UTC (permalink / raw) To: vatsa; +Cc: linux-kernel Srivatsa Vaddagiri <vatsa@in.ibm.com> writes: > > 2. reprogram_apic_timer seems to reprogram the count-down > APIC timer (APIC_TMICT) with an integral number of apic_timer_val. > How accurate will this be? Shouldnt this take into account > that we may not be reprogramming the timer on exactly "jiffy" > boundary? All PIT based reprogramming schemes will lose time. Only with HPET you can do better (but even there it is difficult to do properly) > 3. Is there any strong reason why you reprogram timers only when > _all_ CPUs are idle? There is none imho - my x86-64 no idle tick patch doesn't do it. Actually there is a small reason - RCU currently does not get updated by a fully idle CPU and can stall other CPUs. But that is in practice not too big an issue yet because so many subsystems cause ticks now and then, so the CPUs tend to wake up often enough to not stall the rest of the system too badly. -Andi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-13 17:55 ` Andi Kleen
@ 2005-06-13 18:37   ` Tony Lindgren
  2005-06-13 18:51     ` Andi Kleen
  0 siblings, 1 reply; 20+ messages in thread
From: Tony Lindgren @ 2005-06-13 18:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: vatsa, linux-kernel

* Andi Kleen <ak@muc.de> [050613 10:57]:
> Srivatsa Vaddagiri <vatsa@in.ibm.com> writes:
> >
> > 2. reprogram_apic_timer seems to reprogram the count-down
> >    APIC timer (APIC_TMICT) with an integral number of apic_timer_val.
> >    How accurate will this be? Shouldnt this take into account
> >    that we may not be reprogramming the timer on exactly "jiffy"
> >    boundary?
>
> All PIT based reprogramming schemes will lose time.

Not true if the timesource is different from the interrupt source.

Consider PM timer for timesource, and PIT for interrupt source. Reprogramming
the PIT should not affect the PM timer. Time is always updated from the PM timer.

> Only with HPET you can do better (but even there it is difficult to
> do properly)
>
> > 3. Is there any strong reason why you reprogram timers only when
> >    _all_ CPUs are idle?
>
> There is none imho - my x86-64 no idle tick patch doesn't do it.
>
> Actually there is a small reason - RCU currently does not get
> updated by a fully idle CPU and can stall other CPUs. But that is in
> practice not too big an issue yet because so many subsystems
> cause ticks now and then, so the CPUs tend to wake up often
> enough to not stall the rest of the system too badly.

I guess it should be safe to reprogram the timer even if other CPUs are not
idle, assuming the busy CPUs reprogramming the timer will also wake up the idle
CPUs.

There's one thing that should be considered though: reprogramming
timers should be avoided if the system is busy, as it causes
performance issues. Especially reprogramming the PIT.

Andi, where's your latest x86-64 patch BTW? I'd like to try it out
on my laptop :)

Tony

^ permalink raw reply	[flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2 2005-06-13 18:37 ` Tony Lindgren @ 2005-06-13 18:51 ` Andi Kleen 2005-06-13 19:35 ` Tony Lindgren 2005-06-13 19:48 ` Valdis.Kletnieks 0 siblings, 2 replies; 20+ messages in thread From: Andi Kleen @ 2005-06-13 18:51 UTC (permalink / raw) To: Tony Lindgren; +Cc: vatsa, linux-kernel On Mon, Jun 13, 2005 at 11:37:16AM -0700, Tony Lindgren wrote: > * Andi Kleen <ak@muc.de> [050613 10:57]: > > Srivatsa Vaddagiri <vatsa@in.ibm.com> writes: > > > > > > 2. reprogram_apic_timer seems to reprogram the count-down > > > APIC timer (APIC_TMICT) with an integral number of apic_timer_val. > > > How accurate will this be? Shouldnt this take into account > > > that we may not be reprogramming the timer on exactly "jiffy" > > > boundary? > > > > All PIT based reprogramming schemes will lose time. > > Not true if the timesource is different from interrupt source. > > Consider PM timer for timesource, and PIT for interrupt source. Reprogamming > PIT should not affect PM timer. Time is always updated from PM timer. PM timer is not really suitable for this because it overflows too quickly (several times a second). Also you still lose time in timers (e.g. your internal timers slowly drift) unless you regularly sync with the time source, but that has other drawbacks. > > > > Actually there is a small reason - RCU currently does not get > > updated by a fully idle CPU and can stall other CPUs. But that is in > > practice not too big an issue yet because so many subsystems > > cause ticks now and then, so the CPUs tend to wake up often > > enough to not stall the rest of the system too badly. > > I guess it should be safe to reprogram timer even if other CPUs are not > idle, assuming the busy CPUs reprogramming timer will also wake up the idle > CPUs. > > There's one thing that should be considered though; Reprogamming > timers should be avoided if the system is busy as it causes > performance issues. Especially reprogramming PIT. 
Just forget about reprogramming with PIT. IMHO that should be never used in production. The right way for this is HPET. The main issue with HPET is that many BIOS even though the chipsets have it don't set up the HPET table because Windows doesn't use it right now. However that can be avoided with some chipset specific code. -Andi ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2 2005-06-13 18:51 ` Andi Kleen @ 2005-06-13 19:35 ` Tony Lindgren 2005-06-13 19:48 ` Valdis.Kletnieks 1 sibling, 0 replies; 20+ messages in thread From: Tony Lindgren @ 2005-06-13 19:35 UTC (permalink / raw) To: Andi Kleen; +Cc: vatsa, linux-kernel * Andi Kleen <ak@muc.de> [050613 11:51]: > On Mon, Jun 13, 2005 at 11:37:16AM -0700, Tony Lindgren wrote: > > * Andi Kleen <ak@muc.de> [050613 10:57]: > > > Srivatsa Vaddagiri <vatsa@in.ibm.com> writes: > > > > > > > > 2. reprogram_apic_timer seems to reprogram the count-down > > > > APIC timer (APIC_TMICT) with an integral number of apic_timer_val. > > > > How accurate will this be? Shouldnt this take into account > > > > that we may not be reprogramming the timer on exactly "jiffy" > > > > boundary? > > > > > > All PIT based reprogramming schemes will lose time. > > > > Not true if the timesource is different from interrupt source. > > > > Consider PM timer for timesource, and PIT for interrupt source. Reprogamming > > PIT should not affect PM timer. Time is always updated from PM timer. > > PM timer is not really suitable for this because it overflows > too quickly (several times a second). It's longer than PIT overflow, which means it can be used. > Also you still lose time in timers (e.g. your internal timers slowly drift) > unless you regularly sync with the time source, but that has other > drawbacks. > > > > > > > Actually there is a small reason - RCU currently does not get > > > updated by a fully idle CPU and can stall other CPUs. But that is in > > > practice not too big an issue yet because so many subsystems > > > cause ticks now and then, so the CPUs tend to wake up often > > > enough to not stall the rest of the system too badly. > > > > I guess it should be safe to reprogram timer even if other CPUs are not > > idle, assuming the busy CPUs reprogramming timer will also wake up the idle > > CPUs. 
> > > > There's one thing that should be considered though; Reprogamming > > timers should be avoided if the system is busy as it causes > > performance issues. Especially reprogramming PIT. > > Just forget about reprogramming with PIT. IMHO that should > be never used in production. The right way for this > is HPET. PIT + PM timer / TSC is already working quite nicely. Of course it does not allow long sleeps, but it still helps in bringing down the HZ to about 35HZ. > The main issue with HPET is that many BIOS even though the chipsets > have it don't set up the HPET table because Windows doesn't use > it right now. However that can be avoided with some chipset > specific code. I don't have any x86 HPET hardware right now. But it sounds like it should allow multisecond skipping of ticks. Regards, Tony ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2 2005-06-13 18:51 ` Andi Kleen 2005-06-13 19:35 ` Tony Lindgren @ 2005-06-13 19:48 ` Valdis.Kletnieks 1 sibling, 0 replies; 20+ messages in thread From: Valdis.Kletnieks @ 2005-06-13 19:48 UTC (permalink / raw) To: Andi Kleen; +Cc: Tony Lindgren, vatsa, linux-kernel [-- Attachment #1: Type: text/plain, Size: 350 bytes --] On Mon, 13 Jun 2005 20:51:09 +0200, Andi Kleen said: > The main issue with HPET is that many BIOS even though the chipsets > have it don't set up the HPET table because Windows doesn't use > it right now. However that can be avoided with some chipset > specific code. If the Intel 845 chipset in my laptop has an HPET, I'm willing to test code. ;) [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] Dynamic tick for x86 version 050609-2
  2005-06-13 17:09 ` Srivatsa Vaddagiri
  2005-06-13 17:55   ` Andi Kleen
@ 2005-06-13 18:27   ` Tony Lindgren
  1 sibling, 0 replies; 20+ messages in thread
From: Tony Lindgren @ 2005-06-13 18:27 UTC (permalink / raw)
  To: Srivatsa Vaddagiri; +Cc: linux-kernel

* Srivatsa Vaddagiri <vatsa@in.ibm.com> [050613 10:09]:
> Hi Tony,
> 	I went through the dynamic-tick patch on your website
> (patch-dynamic-tick-2.6.12-rc6-050610-1) and was having some
> questions about it:
>
> 1. dyn_tick->skip is set to the number of ticks that have
>    to be skipped. This is set on the CPU which is the last
>    (in online_map) to go idle and is based on when that
>    CPU's next timer is set to expire.
>
>    Other CPUs also seem to use the same interval
>    to skip ticks. Shouldnt other CPU check their nearest timer
>    rather than blindly skipping dyn_tick->skip number of ticks?

Probably, unless the wake-up of the first CPU will also wake up
the rest.

>
> 2. reprogram_apic_timer seems to reprogram the count-down
>    APIC timer (APIC_TMICT) with an integral number of apic_timer_val.
>    How accurate will this be? Shouldnt this take into account
>    that we may not be reprogramming the timer on exactly "jiffy"
>    boundary?

The timer reprogramming functions should be converted to use usecs.
We just currently get the time in jiffies from next_timer_interrupt().

> 3. Is there any strong reason why you reprogram timers only when
>    _all_ CPUs are idle?

I don't know this for sure. It seemed like the safest way to go for now.

> 4. In what aspects you think does your patch differ from VST (other
>    than not relying on HRT!)?

Dyntick uses next_timer_interrupt(), which is already part of the mainline
kernel. It also works with PIT + PM timer or TSC.

Tony

^ permalink raw reply	[flat|nested] 20+ messages in thread
end of thread, other threads:[~2005-06-14 15:42 UTC | newest]

Thread overview: 20+ messages
2005-06-08 22:14 [PATCH] Dynamic tick for x86 version 050602-1 Pallipadi, Venkatesh
2005-06-09  1:40 ` Tony Lindgren
2005-06-10  4:30   ` [PATCH] Dynamic tick for x86 version 050609-2 Tony Lindgren
2005-06-10  9:10     ` Pavel Machek
2005-06-10 15:10       ` Tony Lindgren
2005-06-13  4:54     ` Valdis.Kletnieks
2005-06-13 15:25       ` Tony Lindgren
2005-06-13 16:47         ` Valdis.Kletnieks
2005-06-13 18:01           ` Thomas Renninger
2005-06-13 18:22             ` Tony Lindgren
2005-06-13 19:07             ` Valdis.Kletnieks
2005-06-14  9:39               ` Thomas Renninger
2005-06-14 15:40                 ` Valdis.Kletnieks
2005-06-13 17:09     ` Srivatsa Vaddagiri
2005-06-13 17:55       ` Andi Kleen
2005-06-13 18:37         ` Tony Lindgren
2005-06-13 18:51           ` Andi Kleen
2005-06-13 19:35             ` Tony Lindgren
2005-06-13 19:48             ` Valdis.Kletnieks
2005-06-13 18:27       ` Tony Lindgren