From: Ingo Molnar <mingo@elte.hu>
To: David Miller <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de,
Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: [patch] softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
Date: Wed, 23 Apr 2008 10:50:30 +0200 [thread overview]
Message-ID: <20080423085030.GA22309@elte.hu> (raw)
In-Reply-To: <20080422.030519.259068348.davem@davemloft.net>
* David Miller <davem@davemloft.net> wrote:
> > +++ linux/kernel/time/tick-sched.c
> > @@ -393,6 +393,7 @@ void tick_nohz_restart_sched_tick(void)
> > sub_preempt_count(HARDIRQ_OFFSET);
> > }
> >
> > + touch_softlockup_watchdog();
> > /*
> > * Cancel the scheduled timer and restore the tick
> > */
>
> The NOHZ lockup warnings are gone. But this seems like a band-aid.
> We made sure that cpus don't get into this state via commit:
but that commit:
> @@ -133,6 +133,8 @@ void tick_nohz_update_jiffies(void)
> if (!ts->tick_stopped)
> return;
>
> + touch_softlockup_watchdog();
> +
> cpu_clear(cpu, nohz_cpu_mask);
> now = ktime_get();
updates the softlockup watchdog before we update jiffies:
> ts->idle_waketime = now;
>
> local_irq_save(flags);
> tick_do_update_jiffies64(now);
> local_irq_restore(flags);
so the patch i gave you first should do the trick - the cleaner patch is
attached below.
> While what the guilty patch we're discussing here does is change how
> cpu_clock() is computed, that's it. softlockup uses cpu_clock() to
> calculate it's timestamp. The guilty change modified nothing about
> when touch_softlockup_watchdog() is called, nor any other aspect about
> how the softlockup mechanism itself works.
>
> So we need to figure out why in the world changing how cpu_clock()
> gets calculated makes a difference.
the thing is that that change _fixed_ cpu_clock() - and exposed the
latent softlockup/nohz interaction that was always there and which
caused false positives if a CPU stayed idle for more than 60 seconds.
Ingo
------------------>
Subject: softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
From: Ingo Molnar <mingo@elte.hu>
Date: Wed Apr 23 10:01:08 CEST 2008
David Miller reported softlockup false positives on his 128-way Niagara2
system. The special thing about that system is extremely long clockevent
timeouts combined with extremly long idle times.
Fix rq->clock update bug that can trigger on such a system: in
tick_nohz_update_jiffies() [which is called on all irq entry on all cpus
where the irq entry hits an idle cpu] we call
touch_softlockup_watchdog() before we update jiffies. That works fine
most of the time when idle timeouts are within 60 seconds. But when an
idle timeout is beyond 60 seconds, jiffies is updated with a jump of
more than 60 seconds, which causes a jump in cpu-clock of more than 60
seconds, triggering a false positive.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/time/tick-sched.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux/kernel/time/tick-sched.c
===================================================================
--- linux.orig/kernel/time/tick-sched.c
+++ linux/kernel/time/tick-sched.c
@@ -133,8 +133,6 @@ void tick_nohz_update_jiffies(void)
if (!ts->tick_stopped)
return;
- touch_softlockup_watchdog();
-
cpu_clear(cpu, nohz_cpu_mask);
now = ktime_get();
ts->idle_waketime = now;
@@ -142,6 +140,8 @@ void tick_nohz_update_jiffies(void)
local_irq_save(flags);
tick_do_update_jiffies64(now);
local_irq_restore(flags);
+
+ touch_softlockup_watchdog();
}
void tick_nohz_stop_idle(int cpu)
next prev parent reply other threads:[~2008-04-23 8:50 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-22 8:59 Soft lockup regression from today's sched.git merge David Miller
2008-04-22 9:14 ` Ingo Molnar
2008-04-22 10:05 ` David Miller
2008-04-22 12:45 ` Peter Zijlstra
2008-05-06 22:41 ` Rafael J. Wysocki
2008-05-06 23:05 ` David Miller
2008-05-07 6:43 ` Ingo Molnar
2008-05-07 18:56 ` Rafael J. Wysocki
2008-04-23 8:50 ` Ingo Molnar [this message]
2008-04-23 10:55 ` [patch] softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds David Miller
2008-04-23 12:29 ` David Miller
2008-04-23 13:36 ` Ingo Molnar
2008-04-23 23:23 ` David Miller
2008-04-23 5:42 ` Soft lockup regression from today's sched.git merge David Miller
2008-04-23 7:32 ` Dhaval Giani
2008-04-23 7:51 ` Ingo Molnar
2008-04-23 9:40 ` Ingo Molnar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080423085030.GA22309@elte.hu \
--to=mingo@elte.hu \
--cc=a.p.zijlstra@chello.nl \
--cc=davem@davemloft.net \
--cc=linux-kernel@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox