From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
dzickus@redhat.com, prarit@redhat.com, ak@linux.intel.com,
babu.moger@oracle.com, peterz@infradead.org, eranian@google.com,
acme@redhat.com, atomlin@redhat.com, akpm@linux-foundation.org,
torvalds@linux-foundation.org
Subject: [PATCH 4.12 36/41] kernel/watchdog: Prevent false positives with turbo modes
Date: Tue, 22 Aug 2017 12:14:04 -0700 [thread overview]
Message-ID: <20170822190943.385312179@linuxfoundation.org> (raw)
In-Reply-To: <20170822190941.918296529@linuxfoundation.org>
4.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner <tglx@linutronix.de>
commit 7edaeb6841dfb27e362288ab8466ebdc4972e867 upstream.
The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.
The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.
A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.
Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.
That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.
Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/Kconfig | 1
include/linux/nmi.h | 8 ++++++
kernel/watchdog.c | 1
kernel/watchdog_hld.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 7 +++++
5 files changed, 76 insertions(+)
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -94,6 +94,7 @@ config X86
select GENERIC_STRNCPY_FROM_USER
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
+ select HARDLOCKUP_CHECK_TIMESTAMP if X86_64
select HAVE_ACPI_APEI if ACPI
select HAVE_ACPI_APEI_NMI if ACPI
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -155,6 +155,14 @@ extern int sysctl_hardlockup_all_cpu_bac
#define sysctl_softlockup_all_cpu_backtrace 0
#define sysctl_hardlockup_all_cpu_backtrace 0
#endif
+
+#if defined(CONFIG_HARDLOCKUP_CHECK_TIMESTAMP) && \
+ defined(CONFIG_HARDLOCKUP_DETECTOR)
+void watchdog_update_hrtimer_threshold(u64 period);
+#else
+static inline void watchdog_update_hrtimer_threshold(u64 period) { }
+#endif
+
extern bool is_hardlockup(void);
struct ctl_table;
extern int proc_watchdog(struct ctl_table *, int ,
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -161,6 +161,7 @@ static void set_sample_period(void)
* hardlockup detector generates a warning
*/
sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
+ watchdog_update_hrtimer_threshold(sample_period);
}
/* Commands for resetting the watchdog */
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -70,6 +70,62 @@ void touch_nmi_watchdog(void)
}
EXPORT_SYMBOL(touch_nmi_watchdog);
+#ifdef CONFIG_HARDLOCKUP_CHECK_TIMESTAMP
+static DEFINE_PER_CPU(ktime_t, last_timestamp);
+static DEFINE_PER_CPU(unsigned int, nmi_rearmed);
+static ktime_t watchdog_hrtimer_sample_threshold __read_mostly;
+
+void watchdog_update_hrtimer_threshold(u64 period)
+{
+ /*
+ * The hrtimer runs with a period of (watchdog_threshold * 2) / 5
+ *
+ * So it runs effectively with 2.5 times the rate of the NMI
+ * watchdog. That means the hrtimer should fire 2-3 times before
+ * the NMI watchdog expires. The NMI watchdog on x86 is based on
+ * unhalted CPU cycles, so if Turbo-Mode is enabled the CPU cycles
+ * might run way faster than expected and the NMI fires in a
+ * smaller period than the one deduced from the nominal CPU
+ * frequency. Depending on the Turbo-Mode factor this might be fast
+ * enough to get the NMI period smaller than the hrtimer watchdog
+ * period and trigger false positives.
+ *
+ * The sample threshold is used to check in the NMI handler whether
+ * the minimum time between two NMI samples has elapsed. That
+ * prevents false positives.
+ *
+ * Set this to 4/5 of the actual watchdog threshold period so the
+ * hrtimer is guaranteed to fire at least once within the real
+ * watchdog threshold.
+ */
+ watchdog_hrtimer_sample_threshold = period * 2;
+}
+
+static bool watchdog_check_timestamp(void)
+{
+ ktime_t delta, now = ktime_get_mono_fast_ns();
+
+ delta = now - __this_cpu_read(last_timestamp);
+ if (delta < watchdog_hrtimer_sample_threshold) {
+ /*
+ * If ktime is jiffies based, a stalled timer would prevent
+ * jiffies from being incremented and the filter would look
+ * at a stale timestamp and never trigger.
+ */
+ if (__this_cpu_inc_return(nmi_rearmed) < 10)
+ return false;
+ }
+ __this_cpu_write(nmi_rearmed, 0);
+ __this_cpu_write(last_timestamp, now);
+ return true;
+}
+#else
+static inline bool watchdog_check_timestamp(void)
+{
+ return true;
+}
+#endif
+
static struct perf_event_attr wd_hw_attr = {
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
@@ -94,6 +150,9 @@ static void watchdog_overflow_callback(s
return;
}
+ if (!watchdog_check_timestamp())
+ return;
+
/* check for a hardlockup
* This is done by making sure our timer interrupt
* is incrementing. The timer interrupt should have
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -345,6 +345,13 @@ config SECTION_MISMATCH_WARN_ONLY
If unsure, say Y.
#
+# Enables a timestamp based low pass filter to compensate for perf based
+# hard lockup detection which runs too fast due to turbo modes.
+#
+config HARDLOCKUP_CHECK_TIMESTAMP
+ bool
+
+#
# Select this config option from the architecture Kconfig, if it
# is preferred to always offer frame pointers as a config
# option on the architecture (regardless of KERNEL_DEBUG):
next prev parent reply other threads:[~2017-08-22 19:15 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-22 19:13 [PATCH 4.12 00/41] 4.12.9-stable review Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 01/41] audit: Fix use after free in audit_remove_watch_rule() Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 02/41] parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 03/41] crypto: ixp4xx - Fix error handling path in aead_perform() Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 04/41] crypto: x86/sha1 - Fix reads beyond the number of blocks passed Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 05/41] drm/i915: Perform an invalidate prior to executing golden renderstate Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 07/41] Input: elan_i2c - add ELAN0608 to the ACPI table Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 08/41] Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 09/41] md: fix test in md_write_start() Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 10/41] md: always clear ->safemode when md_check_recovery gets the mddev lock Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 11/41] MD: not clear ->safemode for external metadata array Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 12/41] ALSA: seq: 2nd attempt at fixing race creating a queue Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 13/41] ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 14/41] ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 15/41] ALSA: usb-audio: add DSD support for new Amanero PID Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 16/41] mm: discard memblock data later Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 17/41] slub: fix per memcg cache leak on css offline Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 18/41] mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 19/41] mm/cma_debug.c: fix stack corruption due to sprintf usage Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 20/41] mm/mempolicy: fix use after free when calling get_mempolicy Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 21/41] mm/vmalloc.c: dont unconditonally use __GFP_HIGHMEM Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 22/41] mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 24/41] ARM: dts: imx6qdl-nitrogen6_som2: fix PCIe reset Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 25/41] blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 26/41] powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 27/41] xen-blkfront: use a right index when checking requests Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 28/41] perf/x86: Fix RDPMC vs. mm_struct tracking Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 29/41] x86/asm/64: Clear AC on NMI entries Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 30/41] x86: Fix norandmaps/ADDR_NO_RANDOMIZE Greg Kroah-Hartman
2017-08-22 19:13 ` [PATCH 4.12 31/41] x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks Greg Kroah-Hartman
2017-08-22 19:14 ` [PATCH 4.12 32/41] irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup() Greg Kroah-Hartman
2017-08-22 19:14 ` [PATCH 4.12 33/41] irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup() Greg Kroah-Hartman
2017-08-22 19:14 ` [PATCH 4.12 34/41] genirq: Restore trigger settings in irq_modify_status() Greg Kroah-Hartman
2017-08-22 19:14 ` [PATCH 4.12 35/41] genirq/ipi: Fixup checks against nr_cpu_ids Greg Kroah-Hartman
2017-08-22 19:14 ` Greg Kroah-Hartman [this message]
2017-08-22 19:14 ` [PATCH 4.12 37/41] Sanitize move_pages() permission checks Greg Kroah-Hartman
2017-08-22 19:14 ` [PATCH 4.12 38/41] pids: make task_tgid_nr_ns() safe Greg Kroah-Hartman
2017-08-22 19:14 ` [PATCH 4.12 39/41] debug: Fix WARN_ON_ONCE() for modules Greg Kroah-Hartman
2017-08-22 19:14 ` [PATCH 4.12 40/41] usb: optimize acpi companion search for usb port devices Greg Kroah-Hartman
2017-08-22 19:14 ` [PATCH 4.12 41/41] usb: qmi_wwan: add D-Link DWM-222 device ID Greg Kroah-Hartman
2017-08-23 0:33 ` [PATCH 4.12 00/41] 4.12.9-stable review Shuah Khan
2017-08-23 0:48 ` Greg Kroah-Hartman
2017-08-27 18:18 ` Guenter Roeck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170822190943.385312179@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=acme@redhat.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=atomlin@redhat.com \
--cc=babu.moger@oracle.com \
--cc=dzickus@redhat.com \
--cc=eranian@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=prarit@redhat.com \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox