public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: David Miller <davem@davemloft.net>
Cc: mingo@elte.hu, torvalds@linux-foundation.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	viro@zeniv.linux.org.uk, alan@lxorguk.ukuu.org.uk,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [git pull] scheduler/misc fixes
Date: Fri, 25 Apr 2008 12:19:38 +0200	[thread overview]
Message-ID: <1209118778.7115.417.camel@twins> (raw)
In-Reply-To: <1209109710.7115.400.camel@twins>

On Fri, 2008-04-25 at 09:48 +0200, Peter Zijlstra wrote:

>  a) 15934a37324f32e0fda633dc7984a671ea81cd75 does indeed fix a bug in
>     rq->clock; without that patch it compresses nohz time to a single
>     jiffie, so cpu_clock() which (without the above hack) is based on
>     rq->clock will be short on nohz time. This can 'hide' the clock jump
>     and thus hide false positives.
> 
> 
>  b) there is commit:
> 
> ---
> commit d3938204468dccae16be0099a2abf53db4ed0505
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Wed Nov 28 15:52:56 2007 +0100
>     softlockup: fix false positives on CONFIG_NOHZ
> 
>     David Miller reported soft lockup false-positives that trigger
>     on NOHZ due to CPUs idling for more than 10 seconds.
> 
>     The solution is touch the softlockup watchdog when we return from
>     idle. (by definition we are not 'locked up' when we were idle)
> 
>      http://bugzilla.kernel.org/show_bug.cgi?id=9409
> 
>     Reported-by: David Miller <davem@davemloft.net>
>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 27a2338..cb89fa8 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -133,6 +133,8 @@ void tick_nohz_update_jiffies(void)
>  	if (!ts->tick_stopped)
>  		return;
>  
> +	touch_softlockup_watchdog();
> +
>  	cpu_clear(cpu, nohz_cpu_mask);
>  	now = ktime_get();
> 
> ---
> 
>     which should 'fix' this problem.
> 
> 
>  c) there are 'IPI' handlers on SPARC64 that look like they can wake
>     the CPU from idle sleep but do not appear to call irq_enter() which
>     has the above patch's touch_softlock_watchdog() in its callchain.
> 
> tl0_irq1:       TRAP_IRQ(smp_call_function_client, 1)
> tl0_irq2:       TRAP_IRQ(smp_receive_signal_client, 2)
> tl0_irq3:       TRAP_IRQ(smp_penguin_jailcell, 3)
> tl0_irq4:       TRAP_IRQ(smp_new_mmu_context_version_client, 4)


OK, so David came up with the idea that it might be the reschedule IPI
(smp_receive_signal_client) that did the wakeup resulting in the
following scenario:

 <idle > 60s >
 <resched-IPI> -> nohz_restart() -> restart timer
                                 -> schedule()

 <run stuff>
 <timer> -> softlockup_tick() -> BUG!
 
doing irq_enter/exit() for smp_receive_signal_client() did indeed fix
the whole issue. (x86 also has this bug - its just darn hard to generate
60s+ nohz periods due to the shitty clocks/timers)

So per b) any nohz wake needs to be done with an interrupt _and_ all
such interrupts must pass through irq_enter().

As far as I can tell this is not nessecarily true (or desired from a
performance POV) for all platforms, imagine the core2 monitor/mwait idle
that wakes up because of a memory write. This doesn't require an
interrupt at all to wake up.

So, are we going to require all waking interrupts (IPIs and regular) to
do the irq_enter/exit() dance and add the perhaps unneeded overhead to
these paths and require the non-interrupt driven wake-ups like
monitor/mwait to do the touch_softlockup_watchdog() themselves?

Or,

Is Ingo's initial patch to make nohz_restart() also touch the softlockup
watchdog the best fix (now that we understand what happens)?




  parent reply	other threads:[~2008-04-25 10:19 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-24 22:55 [git pull] scheduler/misc fixes Ingo Molnar
2008-04-25  3:46 ` David Miller
2008-04-25  7:48   ` Peter Zijlstra
2008-04-25  7:57     ` David Miller
2008-04-25  8:13       ` Peter Zijlstra
2008-04-25  8:24       ` Peter Zijlstra
2008-04-25  8:30         ` David Miller
2008-04-25 10:19     ` Peter Zijlstra [this message]
2008-04-25 10:51       ` Ingo Molnar
2008-04-25 20:07         ` David Miller
2008-04-27 18:05       ` Thomas Gleixner
2008-04-25  8:04   ` Ingo Molnar
2008-04-25  8:07     ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1209118778.7115.417.camel@twins \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox