Linux IOMMU Development
 help / color / mirror / Atom feed
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>, x86@kernel.org
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	linuxppc-dev@lists.ozlabs.org,
	Ricardo Neri <ricardo.neri@intel.com>,
	Stephane Eranian <eranian@google.com>,
	linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
	Tony Luck <tony.luck@intel.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Ricardo Neri <ricardo.neri-calderon@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Woodhouse <dwmw2@infradead.org>
Subject: [PATCH v6 23/29] x86/watchdog/hardlockup/hpet: Determine if HPET timer caused NMI
Date: Thu,  5 May 2022 17:00:02 -0700	[thread overview]
Message-ID: <20220506000008.30892-24-ricardo.neri-calderon@linux.intel.com> (raw)
In-Reply-To: <20220506000008.30892-1-ricardo.neri-calderon@linux.intel.com>

It is not possible to determine the source of a non-maskable interrupt
(NMI) in x86. When dealing with an HPET channel, the only direct method to
determine whether it caused an NMI would be to read the Interrupt Status
register.

However, reading HPET registers is slow and, therefore, not to be done
while in NMI context. Furthermore, status is not available if the HPET
channel is programmed to deliver an MSI interrupt.

An indirect manner to infer if an incoming NMI was caused by the HPET
channel of the detector is to use the time-stamp counter (TSC). Compute the
value that the TSC is expected to have at the next interrupt of the HPET
channel and compare it with the value it has when the interrupt does
happen. If the actual value falls within a small error window, assume
that the HPET channel of the detector is the source of the NMI.

Let tsc_delta be the difference between the value the TSC has now and the
value it will have when the next HPET channel interrupt happens. Define the
error window as a percentage of tsc_delta.

Below is a table that characterizes the error in the error in the expected
TSC value when the HPET channel fires on a variety of systems. It presents
the error as a percentage of tsc_delta and in microseconds.

The table summarizes the error of 4096 interrupts of the HPET channel
collected after the system has been up for 5 minutes as well as since boot.

The maximum observed error on any system is 0.045%. When the error since
boot is considered, the maximum observed error is 0.198%.

To find the most common error value, the collected data is grouped into
buckets of 0.000001 percentage points of the error and 10ns, respectively.
The most common error on any system is of 0.01317%

Allow a maximum error that is twice as big the maximum error observed in
these experiments: 0.4%

watchdog_thresh     1s                10s                 60s
Error wrt
expected
TSC value     %        us         %        us        %        us

AMD EPYC 7742 64-Core Processor
Abs max
since boot 0.04517   451.74    0.00171   171.04   0.00034   201.89
Abs max    0.04517   451.74    0.00171   171.04   0.00034   201.89
Mode       0.00002     0.18    0.00002     2.07  -0.00003   -19.20

Intel(R) Xeon(R) CPU E7-8890 - INTEL_FAM6_HASWELL_X
abs max
since boot 0.00811    81.15    0.00462   462.40   0.00014    81.65
Abs max    0.00811    81.15    0.00084    84.31   0.00014    81.65
Mode      -0.00422   -42.16   -0.00043   -42.50  -0.00007   -40.40

Intel(R) Xeon(R) Platinum 8170M - INTEL_FAM6_SKYLAKE_X
Abs max
since boot 0.10530  1053.04    0.01324  1324.27   0.00407  2443.25
Abs max    0.01166   116.59    0.00114   114.11   0.00024   143.47
Mode      -0.01023  -102.32   -0.00103  -102.44  -0.00022  -132.38

Intel(R) Xeon(R) CPU E5-2699A v4 - INTEL_FAM6_BROADSWELL_X
Abs max
since boot 0.00010    99.34    0.00099    98.83   0.00016    97.50
Abs max    0.00010    99.34    0.00099    98.83   0.00016    97.50
Mode      -0.00007   -74.29   -0.00074   -73.99  -0.00012   -73.12

Intel(R) Xeon(R) Gold 5318H - INTEL_FAM6_COOPERLAKE_X
Abs max
since boot 0.11262  1126.17    0.01109  1109.17   0.00409  2455.73
Abs max    0.01073   107.31    0.00109   109.02   0.00019   115.34
Mode      -0.00953   -95.26   -0.00094   -93.63  -0.00015   -90.42

Intel(R) Xeon(R) Platinum 8360Y - INTEL_FAM6_ICELAKE_X
Abs max
since boot 0.19853  1985.30    0.00784   783.53  -0.00017  -104.77
Abs max    0.01550   155.02    0.00158   157.56   0.00020   117.74
Mode      -0.01317  -131.65   -0.00136  -136.42  -0.00018  -105.06

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: iommu@lists.linux-foundation.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86@kernel.org
Suggested-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
NOTE: The error characterization data is repetead here from the cover
letter.
---
Changes since v5:
 * Reworked is_hpet_hld_interrupt() to reduce indentation.
 * Use time_in_range64() to compare the actual TSC value vs the expected
   value. This makes it more readable. (Tony)
 * Reduced the error window of the expected TSC value at the time of the
   HPET channel expiration.
 * Described better the heuristics used to determine if the HPET channel
   caused the NMI. (Tony)
 * Added a table to characterize the error in the expected TSC value when
   the HPET channel fires.
 * Removed references to groups of monitored CPUs. Instead, use tsc_khz
   directly.

Changes since v4:
 * Compute the TSC expected value at the next HPET interrupt based on the
   number of monitored packages and not the number of monitored CPUs.

Changes since v3:
 * None

Changes since v2:
 * Reworked condition to check if the expected TSC value is within the
   error margin to avoid an unnecessary conditional. (Peter Zijlstra)
 * Removed TSC error margin from struct hld_data; use a global variable
   instead. (Peter Zijlstra)

Changes since v1:
 * Introduced this patch.
---
 arch/x86/include/asm/hpet.h         |  3 ++
 arch/x86/kernel/watchdog_hld_hpet.c | 54 +++++++++++++++++++++++++++--
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index c88901744848..af0a504b5cff 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -113,6 +113,8 @@ static inline int is_hpet_enabled(void) { return 0; }
  * @channel:			HPET channel assigned to the detector
  * @channe_priv:		Private data of the assigned channel
  * @ticks_per_second:		Frequency of the HPET timer
+ * @tsc_next:			Estimated value of the TSC at the next
+ *				HPET timer interrupt
  * @irq:			IRQ number assigned to the HPET channel
  * @handling_cpu:		CPU handling the HPET interrupt
  * @monitored_cpumask:		CPUs monitored by the hardlockup detector
@@ -124,6 +126,7 @@ struct hpet_hld_data {
 	u32			channel;
 	struct hpet_channel	*channel_priv;
 	u64			ticks_per_second;
+	u64			tsc_next;
 	int			irq;
 	u32			handling_cpu;
 	cpumask_var_t		monitored_cpumask;
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c b/arch/x86/kernel/watchdog_hld_hpet.c
index 9fc7ac2c5059..3effdbf29095 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -34,6 +34,7 @@
 
 static struct hpet_hld_data *hld_data;
 static bool hardlockup_use_hpet;
+static u64 tsc_next_error;
 
 extern struct static_key_false apic_use_ipi_shorthand;
 
@@ -59,10 +60,39 @@ static void __init setup_hpet_channel(struct hpet_hld_data *hdata)
  * Reprogram the timer to expire within watchdog_thresh seconds in the future.
  * If the timer supports periodic mode, it is not kicked unless @force is
  * true.
+ *
+ * Also, compute the expected value of the time-stamp counter at the time of
+ * expiration as well as a deviation from the expected value.
  */
 static void kick_timer(struct hpet_hld_data *hdata, bool force)
 {
-	u64 new_compare, count, period = 0;
+	u64 tsc_curr, tsc_delta, new_compare, count, period = 0;
+
+	tsc_curr = rdtsc();
+
+	/*
+	 * Compute the delta between the value of the TSC now and the value
+	 * it will have the next time the HPET channel fires.
+	 */
+	tsc_delta = watchdog_thresh * tsc_khz * 1000L;
+	hdata->tsc_next = tsc_curr + tsc_delta;
+
+	/*
+	 * Define an error window between the expected TSC value and the actual
+	 * value it will have the next time the HPET channel fires. Define this
+	 * error as percentage of tsc_delta.
+	 *
+	 * The systems that have been tested so far exhibit an error of 0.05%
+	 * of the expected TSC value once the system is up and running. Systems
+	 * that refine tsc_khz exhibit a larger initial error up to 0.2%.
+	 *
+	 * To be safe, allow a maximum error of ~0.4%. This error value can be
+	 * computed by left-shifting tsc_delta by 8 positions. Shift 9
+	 * positions to calculate half the error. When the HPET channel fires,
+	 * check if the actual TSC value is in the range
+	 *  [tsc_next - (tsc_next_error / 2), tsc_next + (tsc_next_error / 2)]
+	 */
+	tsc_next_error = tsc_delta >> 9;
 
 	/* Kick the timer only when needed. */
 	if (!force && hdata->has_periodic)
@@ -126,12 +156,32 @@ static void enable_timer(struct hpet_hld_data *hdata)
  * is_hpet_hld_interrupt() - Check if an HPET timer caused the interrupt
  * @hdata:	A data structure with the timer instance to enable
  *
+ * Checking whether the HPET was the source of this NMI is not possible.
+ * Determining the sources of NMIs is not possible. Furthermore, we have
+ * programmed the HPET channel for MSI delivery, which does not have a
+ * status bit. Also, reading HPET registers is slow.
+ *
+ * Instead, we just assume that any NMI delivered within a time window
+ * of when the HPET was expected to fire probably came from the HPET.
+ *
+ * The window is estimated using the TSC counter. Check the comments in
+ * kick_timer() for details on the size of the time window.
+ *
  * Returns:
  * True if the HPET watchdog timer caused the interrupt. False otherwise.
  */
 static bool is_hpet_hld_interrupt(struct hpet_hld_data *hdata)
 {
-	return false;
+	u64 tsc_curr, tsc_curr_min, tsc_curr_max;
+
+	if (smp_processor_id() != hdata->handling_cpu)
+		return false;
+
+	tsc_curr = rdtsc();
+	tsc_curr_min = tsc_curr - tsc_next_error;
+	tsc_curr_max = tsc_curr + tsc_next_error;
+
+	return time_in_range64(hdata->tsc_next, tsc_curr_min, tsc_curr_max);
 }
 
 /**
-- 
2.17.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  parent reply	other threads:[~2022-05-05 23:58 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-05 23:59 [PATCH v6 00/29] x86: Implement an HPET-based hardlockup detector Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 01/29] irq/matrix: Expose functions to allocate the best CPU for new vectors Ricardo Neri
2022-05-06 19:48   ` Thomas Gleixner
2022-05-12  0:09     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 02/29] x86/apic: Add irq_cfg::delivery_mode Ricardo Neri
2022-05-06 19:53   ` Thomas Gleixner
2022-05-12  0:26     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 03/29] x86/apic/msi: Set the delivery mode individually for each IRQ Ricardo Neri
2022-05-06 20:05   ` Thomas Gleixner
2022-05-12  0:38     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 04/29] x86/apic: Add the X86_IRQ_ALLOC_AS_NMI irq allocation flag Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 05/29] x86/apic/vector: Do not allocate vectors for NMIs Ricardo Neri
2022-05-06 21:12   ` Thomas Gleixner
2022-05-13 18:03     ` Ricardo Neri
2022-05-13 20:50       ` Thomas Gleixner
2022-05-13 23:45         ` Ricardo Neri
2022-05-14  8:15           ` Thomas Gleixner
2022-05-05 23:59 ` [PATCH v6 06/29] x86/apic/vector: Implement support for NMI delivery mode Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 07/29] iommu/vt-d: Clear the redirection hint when the destination mode is physical Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 08/29] iommu/vt-d: Rework prepare_irte() to support per-IRQ delivery mode Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 09/29] iommu/vt-d: Set the IRTE delivery mode individually for each IRQ Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 10/29] iommu/vt-d: Implement minor tweaks for NMI irqs Ricardo Neri
2022-05-06 21:23   ` Thomas Gleixner
2022-05-13 18:07     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 11/29] iommu/amd: Expose [set|get]_dev_entry_bit() Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 12/29] iommu/amd: Enable NMIPass when allocating an NMI irq Ricardo Neri
2022-05-06 21:26   ` Thomas Gleixner
2022-05-13 19:01     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 13/29] iommu/amd: Compose MSI messages for NMI irqs in non-IR format Ricardo Neri
2022-05-06 21:31   ` Thomas Gleixner
2022-05-13 19:03     ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 14/29] x86/hpet: Expose hpet_writel() in header Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 15/29] x86/hpet: Add helper function hpet_set_comparator_periodic() Ricardo Neri
2022-05-06 21:41   ` Thomas Gleixner
2022-05-06 21:51     ` Thomas Gleixner
2022-05-13 21:29       ` Ricardo Neri
2022-05-13 21:19     ` Ricardo Neri
2022-05-14  8:17       ` Thomas Gleixner
2022-05-17 22:54         ` Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 16/29] x86/hpet: Prepare IRQ assignments to use the X86_ALLOC_AS_NMI flag Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 17/29] x86/hpet: Reserve an HPET channel for the hardlockup detector Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 18/29] watchdog/hardlockup: Define a generic function to detect hardlockups Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 19/29] watchdog/hardlockup: Decouple the hardlockup detector from perf Ricardo Neri
2022-05-05 23:59 ` [PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init() Ricardo Neri
2022-05-10 10:38   ` Nicholas Piggin
2022-05-13 23:16     ` Ricardo Neri
2022-05-20  0:25       ` Nicholas Piggin
2022-05-06  0:00 ` [PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category Ricardo Neri
2022-05-09 13:59   ` Thomas Gleixner
2022-05-17 18:41     ` Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 22/29] x86/watchdog/hardlockup: Add an HPET-based hardlockup detector Ricardo Neri
2022-05-09 14:03   ` Thomas Gleixner
2022-05-13 22:16     ` Ricardo Neri
2022-05-14 14:04       ` Thomas Gleixner
2022-05-06  0:00 ` Ricardo Neri [this message]
2022-05-06  0:00 ` [PATCH v6 24/29] watchdog/hardlockup: Use parse_option_str() to handle "nmi_watchdog" Ricardo Neri
2022-05-10 10:46   ` Nicholas Piggin
2022-05-13 23:17     ` Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 25/29] watchdog/hardlockup/hpet: Only enable the HPET watchdog via a boot parameter Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 26/29] x86/watchdog: Add a shim hardlockup detector Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 27/29] watchdog: Expose lockup_detector_reconfigure() Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 28/29] x86/tsc: Restart NMI watchdog after refining tsc_khz Ricardo Neri
2022-05-10 11:16   ` Nicholas Piggin
2022-05-10 11:44     ` Thomas Gleixner
2022-05-17 22:53       ` Ricardo Neri
2022-05-17 22:08     ` Ricardo Neri
2022-05-06  0:00 ` [PATCH v6 29/29] x86/tsc: Switch to perf-based hardlockup detector if TSC become unstable Ricardo Neri
2022-05-10 12:14   ` Nicholas Piggin
2022-05-17  3:09     ` Ricardo Neri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220506000008.30892-24-ricardo.neri-calderon@linux.intel.com \
    --to=ricardo.neri-calderon@linux.intel.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=dwmw2@infradead.org \
    --cc=eranian@google.com \
    --cc=iommu@lists.linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=npiggin@gmail.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=ricardo.neri@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox