From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Ben Greear <greearb@candelatech.com>,
Tejun Heo <tj@kernel.org>, Pekka Riikonen <priikone@iki.fi>,
Eric Dumazet <eric.dumazet@gmail.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [ 76/79] Fix lockup related to stop_machine being stuck in __do_softirq.
Date: Tue, 11 Jun 2013 13:03:42 -0700
Message-ID: <20130611195326.009654933@linuxfoundation.org>
In-Reply-To: <20130611195312.352656079@linuxfoundation.org>
3.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Greear <greearb@candelatech.com>
commit 34376a50fb1fa095b9d0636fa41ed2e73125f214 upstream.
The stop machine logic can lock up if all but one of the migration
threads make it through the disable-irq step and the one remaining
thread gets stuck in __do_softirq. The reason __do_softirq can hang is
that it has a bail-out based on jiffies timeout, but in the lockup case,
jiffies itself is not incremented.
To work around this, re-add the max_restart counter in __do_softirq and
stop restarting softirq processing after MAX_SOFTIRQ_RESTART (10) passes.
Thanks to Tejun Heo and Rusty Russell and others for helping me track
this down.
This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce
latencies").
It may be worth looking into ath9k to see if it has issues with its irq
handler at a later date.
The hang stack traces look something like this:
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
Watchdog detected hard LOCKUP on cpu 2
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
Pid: 23, comm: migration/2 Tainted: G C 3.9.4+ #11
Call Trace:
<NMI> warn_slowpath_common+0x85/0x9f
warn_slowpath_fmt+0x46/0x48
watchdog_overflow_callback+0x9c/0xa7
__perf_event_overflow+0x137/0x1cb
perf_event_overflow+0x14/0x16
intel_pmu_handle_irq+0x2dc/0x359
perf_event_nmi_handler+0x19/0x1b
nmi_handle+0x7f/0xc2
do_nmi+0xbc/0x304
end_repeat_nmi+0x1e/0x2e
<<EOE>>
cpu_stopper_thread+0xae/0x162
smpboot_thread_fn+0x258/0x260
kthread+0xc7/0xcf
ret_from_fork+0x7c/0xb0
---[ end trace 4947dfa9b0a4cec3 ]---
BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
irq event stamp: 835637905
hardirqs last enabled at (835637904): __do_softirq+0x9f/0x257
hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
softirqs last enabled at (5654720): __do_softirq+0x1ff/0x257
softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
CPU 1
Pid: 17, comm: migration/1 Tainted: G WC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
RIP: tasklet_hi_action+0xf0/0xf0
Process migration/1
Call Trace:
<IRQ>
__do_softirq+0x117/0x257
irq_exit+0x5f/0xbb
smp_apic_timer_interrupt+0x8a/0x98
apic_timer_interrupt+0x72/0x80
<EOI>
printk+0x4d/0x4f
stop_machine_cpu_stop+0x22c/0x274
cpu_stopper_thread+0xae/0x162
smpboot_thread_fn+0x258/0x260
kthread+0xc7/0xcf
ret_from_fork+0x7c/0xb0
Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Pekka Riikonen <priikone@iki.fi>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/softirq.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -195,8 +195,12 @@ void local_bh_enable_ip(unsigned long ip
EXPORT_SYMBOL(local_bh_enable_ip);
/*
- * We restart softirq processing for at most 2 ms,
- * and if need_resched() is not set.
+ * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
+ * but break the loop if need_resched() is set or after 2 ms.
+ * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
+ * certain cases, such as stop_machine(), jiffies may cease to
+ * increment and so we need the MAX_SOFTIRQ_RESTART limit as
+ * well to make sure we eventually return from this method.
*
* These limits have been established via experimentation.
* The two things to balance is latency against fairness -
@@ -204,6 +208,7 @@ EXPORT_SYMBOL(local_bh_enable_ip);
* should not be able to lock up the box.
*/
#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
+#define MAX_SOFTIRQ_RESTART 10
asmlinkage void __do_softirq(void)
{
@@ -212,6 +217,7 @@ asmlinkage void __do_softirq(void)
unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
int cpu;
unsigned long old_flags = current->flags;
+ int max_restart = MAX_SOFTIRQ_RESTART;
/*
* Mask out PF_MEMALLOC s current task context is borrowed for the
@@ -265,7 +271,8 @@ restart:
pending = local_softirq_pending();
if (pending) {
- if (time_before(jiffies, end) && !need_resched())
+ if (time_before(jiffies, end) && !need_resched() &&
+ --max_restart)
goto restart;
wakeup_softirqd();