From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org
Cc: stable-review@kernel.org, torvalds@linux-foundation.org,
	akpm@linux-foundation.org, alan@lxorguk.ukuu.org.uk,
	"David S. Miller" <davem@davemloft.net>
Subject: [patch 07/22] sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.
Date: Thu, 10 Sep 2009 17:22:53 -0700
Message-ID: <20090911002409.986396768@mini.kroah.org>
In-Reply-To: <20090911002616.GA12087@kroah.com>

[-- Attachment #1: sparc64-kill-spurious-nmi-watchdog-triggers-by-increasing-limit-to-30-seconds.patch --]
[-- Type: text/plain, Size: 2427 bytes --]

2.6.30-stable review patch.  If anyone has any objections, please let us know.

------------------
From: David S. Miller <davem@davemloft.net>

[ Upstream commit e6617c6ec28a17cf2f90262b835ec05b9b861400 ]

This is a compromise and a temporary workaround for bootup NMI
watchdog triggers some people see with qla2xxx devices present.

This happens when, for example:

CPU 0 is in the driver init and looping submitting mailbox commands to
load the firmware, then waiting for completion.

CPU 1 is receiving the device interrupts.  CPU 1 is where the NMI
watchdog triggers.

CPU 0 is submitting mailbox commands fast enough that by the time CPU
1 returns from the device interrupt handler, a new one is pending.
This sequence runs for more than 5 seconds.

The problematic case is CPU 1's timer interrupt running when the
barrage of device interrupts begins.  Then we have:

	timer interrupt
	return for softirq checking
	pending, thus enable interrupts

		 qla2xxx interrupt
		 return
		 qla2xxx interrupt
		 return
		 ... 5+ seconds pass
		 final qla2xxx interrupt for fw load
		 return

	run timer softirq
	return

At some point in the multi-second qla2xxx interrupt storm we trigger
the NMI watchdog on CPU 1 from the NMI interrupt handler.

The timer softirq, once we get back to running it, is smart enough to
run the timer work enough times to make up for the missed timer
interrupts.

However, the NMI watchdogs (both x86 and sparc) use the timer
interrupt count to notice the CPU is wedged.  But in the above
scenario we'll receive only one such timer interrupt even if we last
all the way back to running the timer softirq.
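
As a rough illustration of the check described above, here is a
user-space sketch (not the kernel code).  The struct and function
names are hypothetical, the "touched" flag is left out, and the reset
branch is an assumption about what happens when the timer interrupt
count does advance.  It only shows that an unchanged timer interrupt
count makes the alert counter climb on every NMI until it reaches the
limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's watchdog bookkeeping. */
struct wd_state {
	unsigned int last_irq_sum;	/* timer irq count seen at the previous NMI */
	unsigned int alert_counter;	/* consecutive NMIs with no new timer irq */
};

/*
 * Model of one watchdog NMI.  Returns true when the alert counter
 * reaches limit_ticks, i.e. when a lockup would be reported.
 */
static bool wd_nmi_tick(struct wd_state *st, unsigned int irq_sum,
			unsigned int limit_ticks)
{
	assert(limit_ticks > 0);

	if (st->last_irq_sum == irq_sum) {
		/* No timer interrupt since the last NMI: looks wedged. */
		st->alert_counter++;
		return st->alert_counter == limit_ticks;
	}

	/* Assumed reset path: progress was seen, start counting again. */
	st->last_irq_sum = irq_sum;
	st->alert_counter = 0;
	return false;
}

/*
 * Deliver up to max_ticks NMIs while the timer irq count stays frozen.
 * Returns the NMI number at which the watchdog fires, or 0 if it never does.
 */
static unsigned int wd_ticks_until_trigger(unsigned int limit_ticks,
					   unsigned int max_ticks)
{
	struct wd_state st = { .last_irq_sum = 1, .alert_counter = 0 };
	unsigned int t;

	for (t = 1; t <= max_ticks; t++)
		if (wd_nmi_tick(&st, 1, limit_ticks))
			return t;
	return 0;
}
```

In this model the interrupt storm corresponds to calling
wd_nmi_tick() repeatedly with the same irq_sum, because only one timer
interrupt was counted before the storm began.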

The default watchdog trigger point is only 5 seconds, which is pretty
low (the softwatchdog triggers at 60 seconds).  So increase it to 30
seconds for now.
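
To make the arithmetic concrete, the sketch below converts a limit in
seconds into NMI ticks the same way the patched comparison does
(seconds * nmi_hz).  The helper names are hypothetical, and both the
nmi_hz value of 10 and the 6 second stall used in the note after the
code are illustrative assumptions, not measured values.

```c
#include <assert.h>
#include <stdbool.h>

/* Consecutive stalled NMI ticks needed before the watchdog fires. */
static unsigned int wd_limit_ticks(unsigned int limit_seconds,
				   unsigned int nmi_hz)
{
	assert(nmi_hz > 0);
	return limit_seconds * nmi_hz;
}

/* True if a stall lasting stall_seconds is long enough to reach the limit. */
static bool wd_stall_trips(unsigned int stall_seconds,
			   unsigned int limit_seconds,
			   unsigned int nmi_hz)
{
	return stall_seconds * nmi_hz >= wd_limit_ticks(limit_seconds, nmi_hz);
}
```

With an assumed nmi_hz of 10, wd_stall_trips(6, 5, 10) is true while
wd_stall_trips(6, 30, 10) is false, so a firmware-load storm of that
length would trip the old limit but not the new one.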

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 arch/sparc/kernel/nmi.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -103,7 +103,7 @@ notrace __kprobes void perfctr_irq(int i
 	}
 	if (!touched && __get_cpu_var(last_irq_sum) == sum) {
 		local_inc(&__get_cpu_var(alert_counter));
-		if (local_read(&__get_cpu_var(alert_counter)) == 5 * nmi_hz)
+		if (local_read(&__get_cpu_var(alert_counter)) == 30 * nmi_hz)
 			die_nmi("BUG: NMI Watchdog detected LOCKUP",
 				regs, panic_on_timeout);
 	} else {


