* [patch 0/2] NOHZ vs. profile/oprofile @ 2009-05-28 15:04 Martin Schwidefsky 2009-05-28 15:04 ` [patch 1/2] idle profile hits with NOHZ Martin Schwidefsky 2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky 0 siblings, 2 replies; 11+ messages in thread From: Martin Schwidefsky @ 2009-05-28 15:04 UTC (permalink / raw) To: linux-kernel Cc: Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner, john stultz Greetings, Rob pointed me to a deficiency with the current profile/oprofile code together with NOHZ. For us this problem crept in with the conversion of s390 to generic clock events, git commit 5a62b192 If the system is running with the HZ-tick disabled and the cpu spents time in idle we see skewed percentages e.g. with the oprofile output. On an I/O bound system the number of idle ticks is way to small. The reason is that the generic clock events code reports either zero or one tick to profile/oprofile on wakeup from idle even if the cpu has slept much longer. I've tried to fix that with the two patches in this series and another pure s390 specific fix (profile_tick is called from clock_comparator_work which is nonsense). These three do correct the oprofile output on s390. The best idea I had to get oprofile in good shape again is to let the system tick at the HZ rate while oprofile is working. Better ideas welcome. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 11+ messages in thread
* [patch 1/2] idle profile hits with NOHZ 2009-05-28 15:04 [patch 0/2] NOHZ vs. profile/oprofile Martin Schwidefsky @ 2009-05-28 15:04 ` Martin Schwidefsky 2009-05-28 20:19 ` Thomas Gleixner 2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky 1 sibling, 1 reply; 11+ messages in thread From: Martin Schwidefsky @ 2009-05-28 15:04 UTC (permalink / raw) To: linux-kernel Cc: Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner, john stultz, Martin Schwidefsky [-- Attachment #1: profile-idle-tick.diff --] [-- Type: text/plain, Size: 2963 bytes --] From: Martin Schwidefsky <schwidefsky@de.ibm.com> On a NOHZ system the generic clock events code reports a maximum of 1 tick to the kernel profiler. If the system slept for more than a single tick or has not been woken by a timer interrupt the number of ticks reported will be incorrect. This skewes the percentages in the profile. A good place to report the profile ticks is in tick_nohz_restart_sched_tick. We calculate the number of ticks for the cpu accounting anyway. The only obstacle is that we need an instruction address for the profile ticks. In order to get one extend the tick_sched structure by an idle_pc field and set it in tick_nohz_stop_idle which is called from tick_check_idle. Note: this does not solve the same problem in oprofile. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> --- include/linux/tick.h | 1 + kernel/time/tick-sched.c | 13 ++++++++----- 2 files changed, 9 insertions(+), 5 deletions(-) Index: quilt-2.6/include/linux/tick.h =================================================================== --- quilt-2.6.orig/include/linux/tick.h +++ quilt-2.6/include/linux/tick.h @@ -64,6 +64,7 @@ struct tick_sched { unsigned long last_jiffies; unsigned long next_jiffies; ktime_t idle_expires; + unsigned long idle_pc; }; extern void __init tick_init(void); Index: quilt-2.6/kernel/time/tick-sched.c =================================================================== --- quilt-2.6.orig/kernel/time/tick-sched.c +++ quilt-2.6/kernel/time/tick-sched.c @@ -166,6 +166,7 @@ static void tick_nohz_stop_idle(int cpu) ts->idle_lastupdate = now; ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta); ts->idle_active = 0; + ts->idle_pc = profile_pc(get_irq_regs()); sched_clock_idle_wakeup_event(0); } @@ -419,9 +420,7 @@ void tick_nohz_restart_sched_tick(void) { int cpu = smp_processor_id(); struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu); -#ifndef CONFIG_VIRT_CPU_ACCOUNTING unsigned long ticks; -#endif ktime_t now; local_irq_disable(); @@ -443,19 +442,23 @@ void tick_nohz_restart_sched_tick(void) tick_do_update_jiffies64(now); cpumask_clear_cpu(cpu, nohz_cpu_mask); + ticks = jiffies - ts->idle_jiffies; #ifndef CONFIG_VIRT_CPU_ACCOUNTING /* * We stopped the tick in idle. Update process times would miss the * time we slept as update_process_times does only a 1 tick * accounting. Enforce that this is accounted to idle ! - */ - ticks = jiffies - ts->idle_jiffies; - /* + * * We might be one off. Do not randomly account a huge number of ticks! */ if (ticks && ticks < LONG_MAX) account_idle_ticks(ticks); #endif +#ifdef CONFIG_PROFILING + /* Create profile hits for all ticks we slept in idle. */ + if (ticks && ticks < LONG_MAX) + profile_hits(CPU_PROFILING, (void *) ts->idle_pc, ticks); +#endif touch_softlockup_watchdog(); /* -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 1/2] idle profile hits with NOHZ 2009-05-28 15:04 ` [patch 1/2] idle profile hits with NOHZ Martin Schwidefsky @ 2009-05-28 20:19 ` Thomas Gleixner 2009-05-29 12:56 ` Martin Schwidefsky 0 siblings, 1 reply; 11+ messages in thread From: Thomas Gleixner @ 2009-05-28 20:19 UTC (permalink / raw) To: Martin Schwidefsky Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz On Thu, 28 May 2009, Martin Schwidefsky wrote: > --- quilt-2.6.orig/kernel/time/tick-sched.c > +++ quilt-2.6/kernel/time/tick-sched.c > @@ -166,6 +166,7 @@ static void tick_nohz_stop_idle(int cpu) > ts->idle_lastupdate = now; > ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta); > ts->idle_active = 0; > + ts->idle_pc = profile_pc(get_irq_regs()); Hmm, tick_nohz_stop_idle() is called from tick_nohz_restart_sched_tick() as well in the context of the idle task. I think there is no guarantee that get_irq_regs() will return anything useful in thread context. So get_irq_regs() might return a NULL pointer which will explode some of the profile_pc() implementations. If not it can still feed total nonsense to the profile_hits() call. Thanks, tglx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 1/2] idle profile hits with NOHZ 2009-05-28 20:19 ` Thomas Gleixner @ 2009-05-29 12:56 ` Martin Schwidefsky 2009-05-29 13:15 ` Thomas Gleixner 0 siblings, 1 reply; 11+ messages in thread From: Martin Schwidefsky @ 2009-05-29 12:56 UTC (permalink / raw) To: Thomas Gleixner Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz On Thu, 28 May 2009 22:19:44 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > On Thu, 28 May 2009, Martin Schwidefsky wrote: > > --- quilt-2.6.orig/kernel/time/tick-sched.c > > +++ quilt-2.6/kernel/time/tick-sched.c > > @@ -166,6 +166,7 @@ static void tick_nohz_stop_idle(int cpu) > > ts->idle_lastupdate = now; > > ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta); > > ts->idle_active = 0; > > + ts->idle_pc = profile_pc(get_irq_regs()); > > Hmm, tick_nohz_stop_idle() is called from > tick_nohz_restart_sched_tick() as well in the context of the idle > task. > > I think there is no guarantee that get_irq_regs() will return > anything useful in thread context. > > So get_irq_regs() might return a NULL pointer which will explode > some of the profile_pc() implementations. If not it can still feed > total nonsense to the profile_hits() call. Drat, but if it is called from idle we should be able to just skip the assignment to ts->idle_pc, no? Then a simple in_interrupt() test is missing. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 1/2] idle profile hits with NOHZ 2009-05-29 12:56 ` Martin Schwidefsky @ 2009-05-29 13:15 ` Thomas Gleixner 0 siblings, 0 replies; 11+ messages in thread From: Thomas Gleixner @ 2009-05-29 13:15 UTC (permalink / raw) To: Martin Schwidefsky Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz On Fri, 29 May 2009, Martin Schwidefsky wrote: > On Thu, 28 May 2009 22:19:44 +0200 (CEST) > Thomas Gleixner <tglx@linutronix.de> wrote: > > > On Thu, 28 May 2009, Martin Schwidefsky wrote: > > > --- quilt-2.6.orig/kernel/time/tick-sched.c > > > +++ quilt-2.6/kernel/time/tick-sched.c > > > @@ -166,6 +166,7 @@ static void tick_nohz_stop_idle(int cpu) > > > ts->idle_lastupdate = now; > > > ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta); > > > ts->idle_active = 0; > > > + ts->idle_pc = profile_pc(get_irq_regs()); > > > > Hmm, tick_nohz_stop_idle() is called from > > tick_nohz_restart_sched_tick() as well in the context of the idle > > task. > > > > I think there is no guarantee that get_irq_regs() will return > > anything useful in thread context. > > > > So get_irq_regs() might return a NULL pointer which will explode > > some of the profile_pc() implementations. If not it can still feed > > total nonsense to the profile_hits() call. > > Drat, but if it is called from idle we should be able to just skip the > assignment to ts->idle_pc, no? Then a simple in_interrupt() test is > missing. That should do the trick. Thanks, tglx ^ permalink raw reply [flat|nested] 11+ messages in thread
* [patch 2/2] keep on ticking if oprofile is active 2009-05-28 15:04 [patch 0/2] NOHZ vs. profile/oprofile Martin Schwidefsky 2009-05-28 15:04 ` [patch 1/2] idle profile hits with NOHZ Martin Schwidefsky @ 2009-05-28 15:04 ` Martin Schwidefsky 2009-05-28 20:29 ` Thomas Gleixner 2009-06-01 8:09 ` Andi Kleen 1 sibling, 2 replies; 11+ messages in thread From: Martin Schwidefsky @ 2009-05-28 15:04 UTC (permalink / raw) To: linux-kernel Cc: Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner, john stultz, Martin Schwidefsky [-- Attachment #1: profile-stop-nohz.diff --] [-- Type: text/plain, Size: 3958 bytes --] From: Martin Schwidefsky <schwidefsky@de.ibm.com> On a NOHZ system with oprofile enabled the timer tick should not be stopped when a cpu goes idle. Oprofile needs the pt_regs structure of the interrupt and allocates memory in the ring buffer for each sample. Current a maximum of 1 tick is accounted with oprofile if a cpu sleeps for a longer period of time. This does bad things to the percentages in the oprofile output. To postpone the oprofile tick to tick_nohz_restart_sched_tick analog to the in kernel profiler is not possible as there is no pt_regs structure in the context the tick_nohz_restart_sched_tick function is called and it is not a good idea to create hundreds of samples at once. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> --- drivers/oprofile/oprof.c | 3 +++ include/linux/tick.h | 4 ++++ kernel/time/tick-sched.c | 26 +++++++++++++++++++++++++- 3 files changed, 32 insertions(+), 1 deletion(-) Index: quilt-2.6/drivers/oprofile/oprof.c =================================================================== --- quilt-2.6.orig/drivers/oprofile/oprof.c +++ quilt-2.6/drivers/oprofile/oprof.c @@ -12,6 +12,7 @@ #include <linux/init.h> #include <linux/oprofile.h> #include <linux/moduleparam.h> +#include <linux/tick.h> #include <asm/mutex.h> #include "oprof.h" @@ -103,6 +104,7 @@ int oprofile_start(void) if (oprofile_started) goto out; + tick_nohz_disable(); oprofile_reset_stats(); if ((err = oprofile_ops.start())) @@ -123,6 +125,7 @@ void oprofile_stop(void) goto out; oprofile_ops.stop(); oprofile_started = 0; + tick_nohz_enable(); /* wake up the daemon to read what remains */ wake_up_buffer_waiter(); out: Index: quilt-2.6/include/linux/tick.h =================================================================== --- quilt-2.6.orig/include/linux/tick.h +++ quilt-2.6/include/linux/tick.h @@ -117,6 +117,8 @@ extern void tick_nohz_stop_sched_tick(in extern void tick_nohz_restart_sched_tick(void); extern ktime_t tick_nohz_get_sleep_length(void); extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time); +extern void tick_nohz_enable(void); +extern void tick_nohz_disable(void); # else static inline void tick_nohz_stop_sched_tick(int inidle) { } static inline void tick_nohz_restart_sched_tick(void) { } @@ -127,6 +129,8 @@ static inline ktime_t tick_nohz_get_slee return len; } static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; } +static inline void tick_nohz_enable(void) { } +static inline void tick_nohz_disable(void) { } # endif /* !NO_HZ */ #endif Index: quilt-2.6/kernel/time/tick-sched.c =================================================================== --- quilt-2.6.orig/kernel/time/tick-sched.c +++ quilt-2.6/kernel/time/tick-sched.c @@ -124,6 +124,29 @@ static int __init setup_tick_nohz(char * __setup("nohz=", setup_tick_nohz); +/* + * NO HZ currently disabled ? + */ +static atomic_t tick_nohz_disable_counter = ATOMIC_INIT(0); + +void tick_nohz_enable(void) +{ + atomic_dec(&tick_nohz_disable_counter); +} +EXPORT_SYMBOL_GPL(tick_nohz_enable); + +static void __tick_nohz_disable(void *dummy) +{ +} + +void tick_nohz_disable(void) +{ + if (atomic_inc_return(&tick_nohz_disable_counter) == 1) + /* Wake up all cpus to make them start ticking. */ + smp_call_function(__tick_nohz_disable, NULL, 0); +} +EXPORT_SYMBOL_GPL(tick_nohz_disable); + /** * tick_nohz_update_jiffies - update jiffies when idle was interrupted * @@ -271,7 +294,8 @@ void tick_nohz_stop_sched_tick(int inidl next_jiffies = get_next_timer_interrupt(last_jiffies); delta_jiffies = next_jiffies - last_jiffies; - if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu)) + if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) || + atomic_read(&tick_nohz_disable_counter) > 0) delta_jiffies = 1; /* * Do not stop the tick, if we are only one off -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 2/2] keep on ticking if oprofile is active 2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky @ 2009-05-28 20:29 ` Thomas Gleixner 2009-05-29 12:57 ` Martin Schwidefsky 2009-06-01 8:09 ` Andi Kleen 1 sibling, 1 reply; 11+ messages in thread From: Thomas Gleixner @ 2009-05-28 20:29 UTC (permalink / raw) To: Martin Schwidefsky Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz On Thu, 28 May 2009, Martin Schwidefsky wrote: > From: Martin Schwidefsky <schwidefsky@de.ibm.com> > > On a NOHZ system with oprofile enabled the timer tick should not be > stopped when a cpu goes idle. Oprofile needs the pt_regs structure > of the interrupt and allocates memory in the ring buffer for each > sample. Current a maximum of 1 tick is accounted with oprofile if a > cpu sleeps for a longer period of time. This does bad things to the > percentages in the oprofile output. To postpone the oprofile tick to > tick_nohz_restart_sched_tick analog to the in kernel profiler is not > possible as there is no pt_regs structure in the context the > tick_nohz_restart_sched_tick function is called and it is not a good > idea to create hundreds of samples at once. Sigh. That's stupid. OTOH, thinking more about the patch itself it might be even useful for things aside oprofile. Runtime switching from and to nohz mode for debugging or evaluation purposes comes to my mind. That would need some sysfs interface, but that's not too hard to do. So yeah, I think we should satisfy oprofile needs and utilize it further. Thanks, tglx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 2/2] keep on ticking if oprofile is active 2009-05-28 20:29 ` Thomas Gleixner @ 2009-05-29 12:57 ` Martin Schwidefsky 2009-05-29 13:14 ` Thomas Gleixner 0 siblings, 1 reply; 11+ messages in thread From: Martin Schwidefsky @ 2009-05-29 12:57 UTC (permalink / raw) To: Thomas Gleixner Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz On Thu, 28 May 2009 22:29:38 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote: > On Thu, 28 May 2009, Martin Schwidefsky wrote: > > From: Martin Schwidefsky <schwidefsky@de.ibm.com> > > > > On a NOHZ system with oprofile enabled the timer tick should not be > > stopped when a cpu goes idle. Oprofile needs the pt_regs structure > > of the interrupt and allocates memory in the ring buffer for each > > sample. Current a maximum of 1 tick is accounted with oprofile if a > > cpu sleeps for a longer period of time. This does bad things to the > > percentages in the oprofile output. To postpone the oprofile tick to > > tick_nohz_restart_sched_tick analog to the in kernel profiler is not > > possible as there is no pt_regs structure in the context the > > tick_nohz_restart_sched_tick function is called and it is not a good > > idea to create hundreds of samples at once. > > Sigh. That's stupid. What is stupid, the bug or the fix? > OTOH, thinking more about the patch itself it might be even useful > for things aside oprofile. Runtime switching from and to nohz mode > for debugging or evaluation purposes comes to my mind. That would > need some sysfs interface, but that's not too hard to do. That should be no problem. We used to have the hz_timer system control with the old no-tick solution on s390. > So yeah, I think we should satisfy oprofile needs and utilize it further. Ok, so you are in principle fine with the patch? -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 2/2] keep on ticking if oprofile is active 2009-05-29 12:57 ` Martin Schwidefsky @ 2009-05-29 13:14 ` Thomas Gleixner 0 siblings, 0 replies; 11+ messages in thread From: Thomas Gleixner @ 2009-05-29 13:14 UTC (permalink / raw) To: Martin Schwidefsky Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, john stultz On Fri, 29 May 2009, Martin Schwidefsky wrote: > On Thu, 28 May 2009 22:29:38 +0200 (CEST) > Thomas Gleixner <tglx@linutronix.de> wrote: > > > On Thu, 28 May 2009, Martin Schwidefsky wrote: > > > From: Martin Schwidefsky <schwidefsky@de.ibm.com> > > > > > > On a NOHZ system with oprofile enabled the timer tick should not be > > > stopped when a cpu goes idle. Oprofile needs the pt_regs structure > > > of the interrupt and allocates memory in the ring buffer for each > > > sample. Current a maximum of 1 tick is accounted with oprofile if a > > > cpu sleeps for a longer period of time. This does bad things to the > > > percentages in the oprofile output. To postpone the oprofile tick to > > > tick_nohz_restart_sched_tick analog to the in kernel profiler is not > > > possible as there is no pt_regs structure in the context the > > > tick_nohz_restart_sched_tick function is called and it is not a good > > > idea to create hundreds of samples at once. > > > > Sigh. That's stupid. > > What is stupid, the bug or the fix? The bug :) > > OTOH, thinking more about the patch itself it might be even useful > > for things aside oprofile. Runtime switching from and to nohz mode > > for debugging or evaluation purposes comes to my mind. That would > > need some sysfs interface, but that's not too hard to do. > > That should be no problem. We used to have the hz_timer system control > with the old no-tick solution on s390. > > > So yeah, I think we should satisfy oprofile needs and utilize it further. > > Ok, so you are in principle fine with the patch? Yup. tglx ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 2/2] keep on ticking if oprofile is active 2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky 2009-05-28 20:29 ` Thomas Gleixner @ 2009-06-01 8:09 ` Andi Kleen 2009-06-01 10:22 ` Martin Schwidefsky 1 sibling, 1 reply; 11+ messages in thread From: Andi Kleen @ 2009-06-01 8:09 UTC (permalink / raw) To: Martin Schwidefsky Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner, john stultz Martin Schwidefsky <schwidefsky@de.ibm.com> writes: > From: Martin Schwidefsky <schwidefsky@de.ibm.com> > > On a NOHZ system with oprofile enabled the timer tick should not be The old style profile=... profiler should do that too. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch 2/2] keep on ticking if oprofile is active 2009-06-01 8:09 ` Andi Kleen @ 2009-06-01 10:22 ` Martin Schwidefsky 0 siblings, 0 replies; 11+ messages in thread From: Martin Schwidefsky @ 2009-06-01 10:22 UTC (permalink / raw) To: Andi Kleen Cc: linux-kernel, Rob van der Heij, Heiko Carstens, Ingo Molnar, Thomas Gleixner, john stultz On Mon, 01 Jun 2009 10:09:27 +0200 Andi Kleen <andi@firstfloor.org> wrote: > Martin Schwidefsky <schwidefsky@de.ibm.com> writes: > > > From: Martin Schwidefsky <schwidefsky@de.ibm.com> > > > > On a NOHZ system with oprofile enabled the timer tick should not be > > The old style profile=... profiler should do that too. Technically the old style profiler and NOHZ can be combined (see patch #1 plus a fix as discussed with Thomas). On the other hand it would make the code simpler if we just fallback to the standard tick if profile= is used. My personal preference would be to use my patch. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2009-06-01 10:22 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-05-28 15:04 [patch 0/2] NOHZ vs. profile/oprofile Martin Schwidefsky 2009-05-28 15:04 ` [patch 1/2] idle profile hits with NOHZ Martin Schwidefsky 2009-05-28 20:19 ` Thomas Gleixner 2009-05-29 12:56 ` Martin Schwidefsky 2009-05-29 13:15 ` Thomas Gleixner 2009-05-28 15:04 ` [patch 2/2] keep on ticking if oprofile is active Martin Schwidefsky 2009-05-28 20:29 ` Thomas Gleixner 2009-05-29 12:57 ` Martin Schwidefsky 2009-05-29 13:14 ` Thomas Gleixner 2009-06-01 8:09 ` Andi Kleen 2009-06-01 10:22 ` Martin Schwidefsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox