* [patch 0/2] softlockup watchdog improvements @ 2007-03-27 5:38 Jeremy Fitzhardinge 2007-03-27 5:38 ` [patch 1/2] Ignore stolen time in the softlockup watchdog Jeremy Fitzhardinge 2007-03-27 5:38 ` [patch 2/2] percpu enable flag for " Jeremy Fitzhardinge 0 siblings, 2 replies; 12+ messages in thread From: Jeremy Fitzhardinge @ 2007-03-27 5:38 UTC (permalink / raw) To: Andrew Morton Cc: virtualization, Ingo Molnar, Thomas Gleixner, Linux Kernel, john stultz Here's couple of patches to improve the softlockup watchdog. The first changes the softlockup timer from using jiffies to sched_clock() as a timebase. Xen and VMI implement sched_clock() as counting unstolen time, so time stolen by the hypervisor won't cause the watchdog to bite. The second adds per-cpu enable flags for the watchdog timer. This allows the timer to be disabled when the CPU goes into a (potentially unbounded) tickless sleep. I know this conflicts with fix-bogus-softlockup-warning-with-sysrq-t.patch in -mm2. I think that patch incorrectly changes the behaviour of the softlockup watchdog, and a better solution is to temporarily disable the watchdog while doing something known to be cpu-consuming, like a long sysreq output. J -- ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 5:38 [patch 0/2] softlockup watchdog improvements Jeremy Fitzhardinge @ 2007-03-27 5:38 ` Jeremy Fitzhardinge 2007-03-27 7:00 ` Eric Dumazet 2007-03-27 14:39 ` Prarit Bhargava 2007-03-27 5:38 ` [patch 2/2] percpu enable flag for " Jeremy Fitzhardinge 1 sibling, 2 replies; 12+ messages in thread From: Jeremy Fitzhardinge @ 2007-03-27 5:38 UTC (permalink / raw) To: Andrew Morton Cc: Linux Kernel, virtualization, Ingo Molnar, Thomas Gleixner, john stultz, Zachary Amsden, James Morris, Dan Hecht, Paul Mackerras, Martin Schwidefsky, Prarit Bhargava, Chris Lalancette, Rick Lindsley [-- Attachment #1: softlockup-ignore-stolen-time.patch --] [-- Type: text/plain, Size: 4020 bytes --] The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> --- kernel/softlockup.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) =================================================================== --- a/kernel/softlockup.c +++ b/kernel/softlockup.c @@ -17,8 +17,8 @@ static DEFINE_SPINLOCK(print_lock); -static DEFINE_PER_CPU(unsigned long, touch_timestamp); -static DEFINE_PER_CPU(unsigned long, print_timestamp); +static DEFINE_PER_CPU(unsigned long long, touch_timestamp); +static DEFINE_PER_CPU(unsigned long long, print_timestamp); static DEFINE_PER_CPU(struct task_struct *, watchdog_task); static int did_panic = 0; @@ -37,7 +37,7 @@ static struct notifier_block panic_block void touch_softlockup_watchdog(void) { - __raw_get_cpu_var(touch_timestamp) = jiffies; + __raw_get_cpu_var(touch_timestamp) = sched_clock(); } EXPORT_SYMBOL(touch_softlockup_watchdog); @@ -48,10 +48,15 @@ void softlockup_tick(void) void softlockup_tick(void) { int this_cpu = smp_processor_id(); - unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu); + unsigned long long touch_timestamp = per_cpu(touch_timestamp, this_cpu); + unsigned long long now; - /* prevent double reports: */ - if (per_cpu(print_timestamp, this_cpu) == touch_timestamp || + /* watchdog task hasn't updated timestamp yet */ + if (touch_timestamp == 0) + return; + + /* report at most once a second */ + if (per_cpu(print_timestamp, this_cpu) < (touch_timestamp + NSEC_PER_SEC) || did_panic || !per_cpu(watchdog_task, this_cpu)) return; @@ -62,12 +67,14 @@ void softlockup_tick(void) return; } + now = sched_clock(); + /* Wake up the high-prio watchdog task every second: */ - if (time_after(jiffies, touch_timestamp + HZ)) + if (now > (touch_timestamp + NSEC_PER_SEC)) wake_up_process(per_cpu(watchdog_task, this_cpu)); /* Warn about unreasonable 10+ seconds delays: */ - if (time_after(jiffies, touch_timestamp + 10*HZ)) { + if (now > (touch_timestamp + 10ull*NSEC_PER_SEC)) { per_cpu(print_timestamp, this_cpu) = touch_timestamp; spin_lock(&print_lock); @@ -87,6 +94,9 @@ static int watchdog(void * __bind_cpu) sched_setscheduler(current, SCHED_FIFO, ¶m); current->flags |= PF_NOFREEZE; + + /* initialize timestamp */ + touch_softlockup_watchdog(); /* * Run briefly once per second to reset the softlockup timestamp. @@ -120,7 +130,7 @@ cpu_callback(struct notifier_block *nfb, printk("watchdog for %i failed\n", hotcpu); return NOTIFY_BAD; } - per_cpu(touch_timestamp, hotcpu) = jiffies; + per_cpu(touch_timestamp, hotcpu) = 0; per_cpu(watchdog_task, hotcpu) = p; kthread_bind(p, hotcpu); break; -- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 5:38 ` [patch 1/2] Ignore stolen time in the softlockup watchdog Jeremy Fitzhardinge @ 2007-03-27 7:00 ` Eric Dumazet 2007-03-27 7:12 ` Jeremy Fitzhardinge 2007-03-27 14:39 ` Prarit Bhargava 1 sibling, 1 reply; 12+ messages in thread From: Eric Dumazet @ 2007-03-27 7:00 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar, Thomas Gleixner, john stultz, Zachary Amsden, James Morris, Dan Hecht, Paul Mackerras, Martin Schwidefsky, Prarit Bhargava, Chris Lalancette, Rick Lindsley Jeremy Fitzhardinge a écrit : > +static DEFINE_PER_CPU(unsigned long long, touch_timestamp); ... > void touch_softlockup_watchdog(void) > { > - __raw_get_cpu_var(touch_timestamp) = jiffies; > + __raw_get_cpu_var(touch_timestamp) = sched_clock(); > } Not very clear if this is safe on 32bit, since this is not anymore atomic. Maybe a comment would help ? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 7:00 ` Eric Dumazet @ 2007-03-27 7:12 ` Jeremy Fitzhardinge 2007-03-27 7:50 ` Eric Dumazet 0 siblings, 1 reply; 12+ messages in thread From: Jeremy Fitzhardinge @ 2007-03-27 7:12 UTC (permalink / raw) To: Eric Dumazet Cc: Prarit Bhargava, Rick Lindsley, john stultz, Ingo Molnar, Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky, Thomas Gleixner, Andrew Morton Eric Dumazet wrote: > Jeremy Fitzhardinge a écrit : > >> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp); > > ... > >> void touch_softlockup_watchdog(void) >> { >> - __raw_get_cpu_var(touch_timestamp) = jiffies; >> + __raw_get_cpu_var(touch_timestamp) = sched_clock(); >> } > > Not very clear if this is safe on 32bit, since this is not anymore > atomic. Hm, good point. Don't think it matters very much. These values are per-cpu, and if an interrupt happens between the word updates and the intermediate values causes a timeout, then it was pretty marginal anyway. I guess the worst case is if the low-word gets written first, and it goes from a high value to low, then it could be sampled as if time had gone back by up to ~4 seconds. I'll give it another look. J ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 7:12 ` Jeremy Fitzhardinge @ 2007-03-27 7:50 ` Eric Dumazet 0 siblings, 0 replies; 12+ messages in thread From: Eric Dumazet @ 2007-03-27 7:50 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Prarit Bhargava, Rick Lindsley, john stultz, Ingo Molnar, Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky, Thomas Gleixner, Andrew Morton On Tue, 27 Mar 2007 00:12:53 -0700 Jeremy Fitzhardinge <jeremy@goop.org> wrote: > Eric Dumazet wrote: > > Jeremy Fitzhardinge a écrit : > > > >> +static DEFINE_PER_CPU(unsigned long long, touch_timestamp); > > > > ... > > > >> void touch_softlockup_watchdog(void) > >> { > >> - __raw_get_cpu_var(touch_timestamp) = jiffies; > >> + __raw_get_cpu_var(touch_timestamp) = sched_clock(); > >> } > > > > Not very clear if this is safe on 32bit, since this is not anymore > > atomic. > > Hm, good point. Don't think it matters very much. These values are > per-cpu, and if an interrupt happens between the word updates and the > intermediate values causes a timeout, then it was pretty marginal > anyway. I guess the worst case is if the low-word gets written first, > and it goes from a high value to low, then it could be sampled as if > time had gone back by up to ~4 seconds. > > I'll give it another look. OK thanks. I noticed another 'not clear' bit in your second patch : void softlockup_enable(void) { touch_softlockup_watchdog(); wmb(); /* update timestamp before enable */ __get_cpu_var(enabled) = 1; } Are you sure wmb() is needed here ? I think a barrier() (compiler barrier) should be enough. If not, a nice comment would help too :) ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 5:38 ` [patch 1/2] Ignore stolen time in the softlockup watchdog Jeremy Fitzhardinge 2007-03-27 7:00 ` Eric Dumazet @ 2007-03-27 14:39 ` Prarit Bhargava 2007-03-27 16:37 ` Jeremy Fitzhardinge 1 sibling, 1 reply; 12+ messages in thread From: Prarit Bhargava @ 2007-03-27 14:39 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Rick Lindsley, john stultz, Ingo Molnar, Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky, Thomas Gleixner, Andrew Morton Jeremy Fitzhardinge wrote: > > --- > kernel/softlockup.c | 28 +++++++++++++++++++--------- > 1 file changed, 19 insertions(+), 9 deletions(-) > > =================================================================== > --- a/kernel/softlockup.c > +++ b/kernel/softlockup.c > @@ -17,8 +17,8 @@ > > static DEFINE_SPINLOCK(print_lock); > > -static DEFINE_PER_CPU(unsigned long, touch_timestamp); > -static DEFINE_PER_CPU(unsigned long, print_timestamp); > +static DEFINE_PER_CPU(unsigned long long, touch_timestamp); > +static DEFINE_PER_CPU(unsigned long long, print_timestamp); > static DEFINE_PER_CPU(struct task_struct *, watchdog_task); > > static int did_panic = 0; > @@ -37,7 +37,7 @@ static struct notifier_block panic_block > > void touch_softlockup_watchdog(void) > { > - __raw_get_cpu_var(touch_timestamp) = jiffies; > + __raw_get_cpu_var(touch_timestamp) = sched_clock(); > } > I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour. See this now obsolete patch: http://lkml.org/lkml/2007/3/15/131 P. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 14:39 ` Prarit Bhargava @ 2007-03-27 16:37 ` Jeremy Fitzhardinge 2007-03-27 16:53 ` Prarit Bhargava 0 siblings, 1 reply; 12+ messages in thread From: Jeremy Fitzhardinge @ 2007-03-27 16:37 UTC (permalink / raw) To: Prarit Bhargava Cc: Andrew Morton, Linux Kernel, virtualization, Ingo Molnar, Thomas Gleixner, john stultz, Zachary Amsden, James Morris, Dan Hecht, Paul Mackerras, Martin Schwidefsky, Chris Lalancette, Rick Lindsley Prarit Bhargava wrote: > I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog > and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour. Why? Is that more correct? It seems to me that you're interested in whether a specific CPU has gone and locked up. If touching the watchdog makes it update all CPU timestamps, then you'll hide the fact that other CPUs have locked up, won't it? J ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 16:37 ` Jeremy Fitzhardinge @ 2007-03-27 16:53 ` Prarit Bhargava 2007-03-27 17:10 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 12+ messages in thread From: Prarit Bhargava @ 2007-03-27 16:53 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Rick Lindsley, john stultz, Ingo Molnar, Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky, Thomas Gleixner, Andrew Morton Jeremy Fitzhardinge wrote: > Prarit Bhargava wrote: > >> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog >> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour. >> > > Why? Is that more correct? It seems to me that you're interested in > whether a specific CPU has gone and locked up. If touching the watchdog > > makes it update all CPU timestamps, then you'll hide the fact that other > CPUs have locked up, won't it? > > In case of misuse, yes. But there are cases where we know that all CPUs will have softlockup issues, such as when doing a "big" sysrq-t dump. When doing the sysrq-t we take the tasklist_lock which prevents all other CPUs from scheduling -- this leads to bogus softlockup messages, so we need to reset everyone's watchdog just before releasing the tasklist_lock. Another question -- are you going to expose disable/enable_watchdog to other subsystems? Or are you going to expose touch_softlockup_watchdog? > J > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 16:53 ` Prarit Bhargava @ 2007-03-27 17:10 ` Jeremy Fitzhardinge 2007-03-27 17:20 ` Prarit Bhargava 0 siblings, 1 reply; 12+ messages in thread From: Jeremy Fitzhardinge @ 2007-03-27 17:10 UTC (permalink / raw) To: Prarit Bhargava Cc: virtualization, Rick Lindsley, Andrew Morton, Thomas Gleixner, Martin Schwidefsky, john stultz, Ingo Molnar, Linux Kernel, Paul Mackerras Prarit Bhargava wrote: > Jeremy Fitzhardinge wrote: > >> Prarit Bhargava wrote: >> >> >>> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog >>> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour. >>> >>> >> Why? Is that more correct? It seems to me that you're interested in >> whether a specific CPU has gone and locked up. If touching the watchdog >> >> makes it update all CPU timestamps, then you'll hide the fact that other >> CPUs have locked up, won't it? >> >> >> > In case of misuse, yes. But there are cases where we know that all CPUs > will have softlockup issues, such as when doing a "big" sysrq-t dump. > When doing the sysrq-t we take the tasklist_lock which prevents all > other CPUs from scheduling -- this leads to bogus softlockup messages, > so we need to reset everyone's watchdog just before releasing the > tasklist_lock. > > Another question -- are you going to expose disable/enable_watchdog to > other subsystems? Or are you going to expose touch_softlockup_watchdog? Well, it depends on who turns up. My first thought is to export both the global enable/disable interfaces and touch_softlockup_watchdog. But on second thoughts maybe touch_softlockup_watchdog is completely redundant, since you'd only do it if you're holding off timer interrupts, but the lockup only gets reported if timer interrupts are enabled (in other words, the best it can tell you is "you locked up for a while there", which isn't terribly useful). So perhaps this can just be dropped. I haven't looked at the users to see what they're really trying to achieve. The enable/disable interfaces are more generally useful in that you can say "I *know* I'm going to go away for a while, so don't bother reporting it". J ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 1/2] Ignore stolen time in the softlockup watchdog 2007-03-27 17:10 ` Jeremy Fitzhardinge @ 2007-03-27 17:20 ` Prarit Bhargava 0 siblings, 0 replies; 12+ messages in thread From: Prarit Bhargava @ 2007-03-27 17:20 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: virtualization, Rick Lindsley, Andrew Morton, Thomas Gleixner, Martin Schwidefsky, john stultz, Ingo Molnar, Linux Kernel, Paul Mackerras Jeremy Fitzhardinge wrote: > Prarit Bhargava wrote: > >> Jeremy Fitzhardinge wrote: >> >> >>> Prarit Bhargava wrote: >>> >>> >>> >>>> I'd like to see this patch implement/fix touch_cpu_softlockup_watchdog >>>> and touch_softlockup_watchdog to mimic touch_nmi_watchdog's behaviour. >>>> >>>> >>>> >>> Why? Is that more correct? It seems to me that you're interested in >>> whether a specific CPU has gone and locked up. If touching the watchdog >>> >>> makes it update all CPU timestamps, then you'll hide the fact that other >>> CPUs have locked up, won't it? >>> >>> >>> >>> >> In case of misuse, yes. But there are cases where we know that all CPUs >> will have softlockup issues, such as when doing a "big" sysrq-t dump. >> When doing the sysrq-t we take the tasklist_lock which prevents all >> other CPUs from scheduling -- this leads to bogus softlockup messages, >> so we need to reset everyone's watchdog just before releasing the >> tasklist_lock. >> >> Another question -- are you going to expose disable/enable_watchdog to >> other subsystems? Or are you going to expose touch_softlockup_watchdog? >> > > Well, it depends on who turns up. > > My first thought is to export both the global enable/disable interfaces > and touch_softlockup_watchdog. But on second thoughts maybe > touch_softlockup_watchdog is completely redundant, since you'd only do > IMO, if you export enable/disable you should drop touch_softlockup_watchdog. > it if you're holding off timer interrupts, but the lockup only gets > reported if timer interrupts are enabled (in other words, the best it > can tell you is "you locked up for a while there", which isn't terribly > useful). I like to think of the softlockup watchdog letting me know that a cpu hasn't scheduled in a long time. > So perhaps this can just be dropped. I haven't looked at the > users to see what they're really trying to achieve. > I've looked through much of that code for my previous patch ;) AFAICT the uses appear to be cases where we _know_ that we've gone away for a while and need to reset the timer. But there were some exceptions: touch_nmi_watchdog erroneously calls touch_softlockup_watchdog. In fact, touch_nmi_watchdog is trying to touch all cpus softlockup watchdogs, not just one. IIRC, There was an extra call to touch_softlockup_watchdog which wasn't necessary IIRC... Look at my previous patch where I replaced touch_softlockup_watchdog with touch_cpu_softlockup_watchdog ... > The enable/disable interfaces are more generally useful in that you can > say "I *know* I'm going to go away for a while, so don't bother > reporting it". > > J > ^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch 2/2] percpu enable flag for softlockup watchdog 2007-03-27 5:38 [patch 0/2] softlockup watchdog improvements Jeremy Fitzhardinge 2007-03-27 5:38 ` [patch 1/2] Ignore stolen time in the softlockup watchdog Jeremy Fitzhardinge @ 2007-03-27 5:38 ` Jeremy Fitzhardinge 2007-03-27 14:42 ` Prarit Bhargava 1 sibling, 1 reply; 12+ messages in thread From: Jeremy Fitzhardinge @ 2007-03-27 5:38 UTC (permalink / raw) To: Andrew Morton Cc: Prarit Bhargava, john stultz, Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky, Ingo Molnar, Thomas Gleixner [-- Attachment #1: softlockup-percpu-enable-flag.patch --] [-- Type: text/plain, Size: 6043 bytes --] On a NO_HZ system, there may be an arbitrarily long delay between ticks on a CPU. When we're disabling ticks for a CPU, also disable the softlockup watchdog timer. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> --- include/linux/sched.h | 8 ++++++++ kernel/softlockup.c | 23 +++++++++++++++++++---- kernel/time/tick-sched.c | 34 +++++++++++++++------------------- 3 files changed, 42 insertions(+), 23 deletions(-) =================================================================== --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -232,10 +232,18 @@ extern void scheduler_tick(void); #ifdef CONFIG_DETECT_SOFTLOCKUP extern void softlockup_tick(void); +extern void softlockup_enable(void); +extern void softlockup_disable(void); extern void spawn_softlockup_task(void); extern void touch_softlockup_watchdog(void); #else static inline void softlockup_tick(void) +{ +} +static inline void softlockup_enable(void) +{ +} +static inline void softlockup_disable(void) { } static inline void spawn_softlockup_task(void) =================================================================== --- a/kernel/softlockup.c +++ b/kernel/softlockup.c @@ -20,6 +20,7 @@ static DEFINE_PER_CPU(unsigned long long static DEFINE_PER_CPU(unsigned long long, touch_timestamp); static DEFINE_PER_CPU(unsigned long long, print_timestamp); static DEFINE_PER_CPU(struct task_struct *, watchdog_task); +static DEFINE_PER_CPU(int, enabled); static int did_panic = 0; @@ -41,6 +42,18 @@ void touch_softlockup_watchdog(void) } EXPORT_SYMBOL(touch_softlockup_watchdog); +void softlockup_enable(void) +{ + touch_softlockup_watchdog(); + wmb(); /* update timestamp before enable */ + __get_cpu_var(enabled) = 1; +} + +void softlockup_disable(void) +{ + __get_cpu_var(enabled) = 0; +} + /* * This callback runs from the timer interrupt, and checks * whether the watchdog thread has hung or not: @@ -51,8 +64,8 @@ void softlockup_tick(void) unsigned long long touch_timestamp = per_cpu(touch_timestamp, this_cpu); unsigned long long now; - /* watchdog task hasn't updated timestamp yet */ - if (touch_timestamp == 0) + /* return if not enabled */ + if (!__get_cpu_var(enabled)) return; /* report at most once a second */ @@ -95,8 +108,8 @@ static int watchdog(void * __bind_cpu) sched_setscheduler(current, SCHED_FIFO, ¶m); current->flags |= PF_NOFREEZE; - /* initialize timestamp */ - touch_softlockup_watchdog(); + /* enable on this cpu */ + softlockup_enable(); /* * Run briefly once per second to reset the softlockup timestamp. @@ -109,6 +122,8 @@ static int watchdog(void * __bind_cpu) touch_softlockup_watchdog(); schedule(); } + + softlockup_disable(); return 0; } =================================================================== --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -228,6 +228,8 @@ void tick_nohz_stop_sched_tick(void) ts->idle_tick = ts->sched_timer.expires; ts->tick_stopped = 1; ts->idle_jiffies = last_jiffies; + + softlockup_disable(); } /* * calculate the expiry time for the next timer wheel @@ -255,6 +257,7 @@ void tick_nohz_stop_sched_tick(void) cpu_clear(cpu, nohz_cpu_mask); } raise_softirq_irqoff(TIMER_SOFTIRQ); + out: ts->next_jiffies = next_jiffies; ts->last_jiffies = last_jiffies; @@ -311,6 +314,8 @@ void tick_nohz_restart_sched_tick(void) ts->tick_stopped = 0; hrtimer_cancel(&ts->sched_timer); ts->sched_timer.expires = ts->idle_tick; + + softlockup_enable(); while (1) { /* Forward the time to expire in the future */ @@ -355,17 +360,12 @@ static void tick_nohz_handler(struct clo tick_do_update_jiffies64(now); /* - * When we are idle and the tick is stopped, we have to touch - * the watchdog as we might not schedule for a really long - * time. This happens on complete idle SMP systems while - * waiting on the login prompt. We also increment the "start - * of idle" jiffy stamp so the idle accounting adjustment we - * do when we go busy again does not account too much ticks. - */ - if (ts->tick_stopped) { - touch_softlockup_watchdog(); + * Increment the "start of idle" jiffy stamp so the idle + * accounting adjustment we do when we go busy again does not + * account too much ticks. + */ + if (ts->tick_stopped) ts->idle_jiffies++; - } update_process_times(user_mode(regs)); profile_tick(CPU_PROFILING); @@ -450,17 +450,12 @@ static enum hrtimer_restart tick_sched_t */ if (regs) { /* - * When we are idle and the tick is stopped, we have to touch - * the watchdog as we might not schedule for a really long - * time. This happens on complete idle SMP systems while - * waiting on the login prompt. We also increment the "start of - * idle" jiffy stamp so the idle accounting adjustment we do - * when we go busy again does not account too much ticks. + * Increment the "start of idle" jiffy stamp so the + * idle accounting adjustment we do when we go busy + * again does not account too much ticks. */ - if (ts->tick_stopped) { - touch_softlockup_watchdog(); + if (ts->tick_stopped) ts->idle_jiffies++; - } /* * update_process_times() might take tasklist_lock, hence * drop the base lock. sched-tick hrtimers are per-CPU and @@ -522,6 +517,7 @@ void tick_cancel_sched_timer(int cpu) if (ts->sched_timer.base) hrtimer_cancel(&ts->sched_timer); ts->tick_stopped = 0; + softlockup_enable(); ts->nohz_mode = NOHZ_MODE_INACTIVE; } #endif /* HIGH_RES_TIMERS */ -- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch 2/2] percpu enable flag for softlockup watchdog 2007-03-27 5:38 ` [patch 2/2] percpu enable flag for " Jeremy Fitzhardinge @ 2007-03-27 14:42 ` Prarit Bhargava 0 siblings, 0 replies; 12+ messages in thread From: Prarit Bhargava @ 2007-03-27 14:42 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: john stultz, Ingo Molnar, Linux Kernel, virtualization, Paul Mackerras, Martin Schwidefsky, Thomas Gleixner, Andrew Morton Jeremy Fitzhardinge wrote: > +static DEFINE_PER_CPU(int, enabled); > Minor nit: let's call this softlockup_enabled. Makes it easier for the reader ;) P. ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2007-03-27 17:20 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-03-27 5:38 [patch 0/2] softlockup watchdog improvements Jeremy Fitzhardinge 2007-03-27 5:38 ` [patch 1/2] Ignore stolen time in the softlockup watchdog Jeremy Fitzhardinge 2007-03-27 7:00 ` Eric Dumazet 2007-03-27 7:12 ` Jeremy Fitzhardinge 2007-03-27 7:50 ` Eric Dumazet 2007-03-27 14:39 ` Prarit Bhargava 2007-03-27 16:37 ` Jeremy Fitzhardinge 2007-03-27 16:53 ` Prarit Bhargava 2007-03-27 17:10 ` Jeremy Fitzhardinge 2007-03-27 17:20 ` Prarit Bhargava 2007-03-27 5:38 ` [patch 2/2] percpu enable flag for " Jeremy Fitzhardinge 2007-03-27 14:42 ` Prarit Bhargava
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).