All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: David Miller <davem@davemloft.net>
Cc: mingo@elte.hu, torvalds@linux-foundation.org,
	linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	viro@zeniv.linux.org.uk, alan@lxorguk.ukuu.org.uk,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [git pull] scheduler/misc fixes
Date: Fri, 25 Apr 2008 12:19:38 +0200	[thread overview]
Message-ID: <1209118778.7115.417.camel@twins> (raw)
In-Reply-To: <1209109710.7115.400.camel@twins>

On Fri, 2008-04-25 at 09:48 +0200, Peter Zijlstra wrote:

>  a) 15934a37324f32e0fda633dc7984a671ea81cd75 does indeed fix a bug in
>     rq->clock; without that patch it compresses nohz time to a single
>     jiffie, so cpu_clock() which (without the above hack) is based on
>     rq->clock will be short on nohz time. This can 'hide' the clock jump
>     and thus hide false positives.
> 
> 
>  b) there is commit:
> 
> ---
> commit d3938204468dccae16be0099a2abf53db4ed0505
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Wed Nov 28 15:52:56 2007 +0100
>     softlockup: fix false positives on CONFIG_NOHZ
> 
>     David Miller reported soft lockup false-positives that trigger
>     on NOHZ due to CPUs idling for more than 10 seconds.
> 
>     The solution is touch the softlockup watchdog when we return from
>     idle. (by definition we are not 'locked up' when we were idle)
> 
>      http://bugzilla.kernel.org/show_bug.cgi?id=9409
> 
>     Reported-by: David Miller <davem@davemloft.net>
>     Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 27a2338..cb89fa8 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -133,6 +133,8 @@ void tick_nohz_update_jiffies(void)
>  	if (!ts->tick_stopped)
>  		return;
>  
> +	touch_softlockup_watchdog();
> +
>  	cpu_clear(cpu, nohz_cpu_mask);
>  	now = ktime_get();
> 
> ---
> 
>     which should 'fix' this problem.
> 
> 
>  c) there are 'IPI' handlers on SPARC64 that look like they can wake
>     the CPU from idle sleep but do not appear to call irq_enter() which
>     has the above patch's touch_softlock_watchdog() in its callchain.
> 
> tl0_irq1:       TRAP_IRQ(smp_call_function_client, 1)
> tl0_irq2:       TRAP_IRQ(smp_receive_signal_client, 2)
> tl0_irq3:       TRAP_IRQ(smp_penguin_jailcell, 3)
> tl0_irq4:       TRAP_IRQ(smp_new_mmu_context_version_client, 4)


OK, so David came up with the idea that it might be the reschedule IPI
(smp_receive_signal_client) that did the wakeup resulting in the
following scenario:

 <idle > 60s >
 <resched-IPI> -> nohz_restart() -> restart timer
                                 -> schedule()

 <run stuff>
 <timer> -> softlockup_tick() -> BUG!
 
doing irq_enter/exit() for smp_receive_signal_client() did indeed fix
the whole issue. (x86 also has this bug - its just darn hard to generate
60s+ nohz periods due to the shitty clocks/timers)

So per b) any nohz wake needs to be done with an interrupt _and_ all
such interrupts must pass through irq_enter().

As far as I can tell this is not nessecarily true (or desired from a
performance POV) for all platforms, imagine the core2 monitor/mwait idle
that wakes up because of a memory write. This doesn't require an
interrupt at all to wake up.

So, are we going to require all waking interrupts (IPIs and regular) to
do the irq_enter/exit() dance and add the perhaps unneeded overhead to
these paths and require the non-interrupt driven wake-ups like
monitor/mwait to do the touch_softlockup_watchdog() themselves?

Or,

Is Ingo's initial patch to make nohz_restart() also touch the softlockup
watchdog the best fix (now that we understand what happens)?




  parent reply	other threads:[~2008-04-25 10:19 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-24 22:55 [git pull] scheduler/misc fixes Ingo Molnar
2008-04-25  3:46 ` David Miller
2008-04-25  7:48   ` Peter Zijlstra
2008-04-25  7:57     ` David Miller
2008-04-25  8:13       ` Peter Zijlstra
2008-04-25  8:24       ` Peter Zijlstra
2008-04-25  8:30         ` David Miller
2008-04-25 10:19     ` Peter Zijlstra [this message]
2008-04-25 10:51       ` Ingo Molnar
2008-04-25 20:07         ` David Miller
2008-04-27 18:05       ` Thomas Gleixner
2008-04-25  8:04   ` Ingo Molnar
2008-04-25  8:07     ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1209118778.7115.417.camel@twins \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=alan@lxorguk.ukuu.org.uk \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.