linux-perf-users.vger.kernel.org archive mirror
* (no subject)
@ 2024-06-11 16:54 Jacob Pan
  2024-06-11 16:54 ` [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Jacob Pan @ 2024-06-11 16:54 UTC (permalink / raw)
  To: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li

From e52010df700cde894633c45c0b364847e63a9819 Mon Sep 17 00:00:00 2001
From: Jacob Pan <jacob.jun.pan@linux.intel.com>
Date: Tue, 11 Jun 2024 09:49:17 -0700
Subject: [PATCH v2 0/6] Add support for NMI source reporting

Hi Thomas and all,

Non-Maskable Interrupts (NMIs) are routed to the local Advanced Programmable
Interrupt Controller (APIC) using vector #2. Before the advent of the
Flexible Return and Event Delivery (FRED)[1], the vector information set by
the NMI initiator was disregarded or lost within the hardware, compelling
system software to poll every registered NMI handler to pinpoint the source
of the NMI[2]. This approach led to several issues:

1.	Inefficiency due to the CPU's time spent polling all handlers.
2.	Increased latency from the additional time taken to poll all handlers.
3.	Unnecessary NMIs: an NMI that arrives shortly after its source was
	already serviced while handling a different NMI finds no work to do.

To tackle these challenges, Intel introduced NMI source reporting as a part
of the FRED specification (detailed in Chapter 9). This CPU feature ensures
that while all NMI sources are still aggregated into NMI vector (#2) for
delivery, the source of the NMI is now conveyed through FRED event data
(a 16-bit bitmap on the stack). This allows for the selective dispatch
of the NMI source handler based on the bitmap, eliminating the need to
invoke all NMI source handlers indiscriminately.
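
To make the bitmap semantics concrete, here is a minimal user-space sketch of
how one sender-supplied source vector could be folded into the pending 16-bit
bitmap. It follows the rules stated in the irq_vectors.h comment in patch 2
(vector 0 and out-of-range vectors both set bit 0). The function name and the
NMI_SOURCE_BITS constant are made up for illustration. This is a model of the
described behavior, not hardware or kernel code.

```c
#include <stdint.h>

#define NMI_SOURCE_BITS 16 /* width of the source bitmap (illustrative name) */

/*
 * Hypothetical model: fold one 8-bit NMI source vector into the pending
 * 16-bit bitmap that is accumulated until the NMI is delivered.
 */
static uint16_t nmi_source_accumulate(uint16_t pending, uint8_t vec)
{
	/* Out-of-range vectors are reported as "unknown", i.e. bit 0. */
	if (vec >= NMI_SOURCE_BITS)
		vec = 0;

	/* Vector 0 (no source specified) also lands on bit 0. */
	return (uint16_t)(pending | (1u << vec));
}
```

Two NMIs with vectors 5 and 4 that are pending at the same time are delivered
as a single NMI whose bitmap has both bits set, so both handlers can be run
from one event.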

In line with the hardware architecture, various interrupt sources can
generate NMIs by encoding an NMI delivery mode. However, this patchset
activates only the local NMI sources that are currently utilized by the
Linux kernel, which includes:

1.	Performance monitoring.
2.	Inter-Processor Interrupts (IPIs) for functions like CPU backtrace,
	machine check, Kernel GNU Debugger (KGDB), reboot, panic stop, and
	self-test.

NMIs from any other source, or whose source is not reported or cannot be
identified, continue to be handled as before.

Next steps:
1. KVM support
2. Optimization to reuse IDT NMI vector 2 as the NMI source for "known" sources.
   Link: https://lore.kernel.org/lkml/746fecd5-4c79-42f9-919e-912ec415e73f@zytor.com/


[1] https://www.intel.com/content/www/us/en/content-details/779982/flexible-return-and-event-delivery-fred-specification.html
[2] https://lore.kernel.org/lkml/171011362209.2468526.15187874627966416701.tglx@xen13/


Thanks,

Jacob

---
Change logs are in individual patches.

Jacob Pan (6):
  x86/irq: Add enumeration of NMI source reporting CPU feature
  x86/irq: Extend NMI handler registration interface to include source
  x86/irq: Factor out common NMI handling code
  x86/irq: Process nmi sources in NMI handler
  perf/x86: Enable NMI source reporting for perfmon
  x86/irq: Enable NMI source on IPIs delivered as NMI

 arch/x86/Kconfig                         |  9 +++
 arch/x86/events/amd/ibs.c                |  2 +-
 arch/x86/events/core.c                   | 11 ++-
 arch/x86/events/intel/core.c             |  6 +-
 arch/x86/include/asm/apic.h              |  1 +
 arch/x86/include/asm/cpufeatures.h       |  1 +
 arch/x86/include/asm/disabled-features.h |  8 +-
 arch/x86/include/asm/irq_vectors.h       | 38 +++++++++
 arch/x86/include/asm/nmi.h               |  4 +-
 arch/x86/kernel/apic/hw_nmi.c            |  5 +-
 arch/x86/kernel/apic/ipi.c               |  4 +-
 arch/x86/kernel/apic/local.h             | 18 +++--
 arch/x86/kernel/cpu/mce/inject.c         |  4 +-
 arch/x86/kernel/cpu/mshyperv.c           |  2 +-
 arch/x86/kernel/kgdb.c                   |  6 +-
 arch/x86/kernel/nmi.c                    | 99 +++++++++++++++++++++---
 arch/x86/kernel/nmi_selftest.c           |  7 +-
 arch/x86/kernel/reboot.c                 |  4 +-
 arch/x86/kernel/smp.c                    |  4 +-
 arch/x86/kernel/traps.c                  |  4 +-
 arch/x86/platform/uv/uv_nmi.c            |  4 +-
 drivers/acpi/apei/ghes.c                 |  2 +-
 drivers/char/ipmi/ipmi_watchdog.c        |  2 +-
 drivers/edac/igen6_edac.c                |  2 +-
 drivers/watchdog/hpwdt.c                 |  6 +-
 25 files changed, 200 insertions(+), 53 deletions(-)

-- 
2.25.1



* [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-11 16:54 Jacob Pan
@ 2024-06-11 16:54 ` Jacob Pan
  2024-06-12  2:32   ` Xin Li
  2024-06-21 22:23   ` Sohil Mehta
  2024-06-11 16:54 ` [PATCH v2 2/6] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Jacob Pan @ 2024-06-11 16:54 UTC (permalink / raw)
  To: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li, Jacob Pan

The lack of a mechanism to pinpoint the origins of Non-Maskable Interrupts
(NMIs) necessitates that the NMI vector 2 handler consults each NMI source
handler individually. This approach leads to inefficiencies, delays, and
the occurrence of unnecessary NMIs, thereby also constraining the potential
applications of NMIs.

A new CPU feature, known as NMI source reporting, has been introduced as
part of the Flexible Return and Event Delivery (FRED) spec. This feature
enables the NMI vector 2 handler to directly obtain information about the
NMI source from the FRED event data.

The functionality of NMI source reporting is tied to FRED. Although it
is enumerated by a unique CPUID feature bit, it cannot be turned off
independently once FRED is activated.
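
For readers unfamiliar with the feature-number encoding, the following
user-space sketch shows how the value (12*32+20) from the cpufeatures.h hunk
below splits into a word index and a bit, and how the compile-time disable
mask is derived from it. The helper names are hypothetical and exist only for
this illustration. The arithmetic mirrors the DISABLE_NMI_SOURCE definition in
this patch.

```c
#include <stdint.h>

/* Value taken from the cpufeatures.h hunk in this patch. */
#define X86_FEATURE_NMI_SOURCE (12 * 32 + 20)

/* Hypothetical helpers (not kernel API): split a feature number. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

static unsigned int feature_bit(unsigned int feature)
{
	return feature & 31;
}

/*
 * Mirrors DISABLE_NMI_SOURCE: 0 when the option is configured in,
 * otherwise the feature's bit within its 32-bit word.
 */
static uint32_t disable_nmi_source_mask(int config_enabled)
{
	return config_enabled ? 0 : (1u << feature_bit(X86_FEATURE_NMI_SOURCE));
}
```

The feature lives in word 12, bit 20, which is why the mask is OR-ed into
DISABLED_MASK12 and not into any other word.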

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v2: Removed NMI source from static CPU ID dependency table (HPA)
---
 arch/x86/Kconfig                         | 9 +++++++++
 arch/x86/include/asm/cpufeatures.h       | 1 +
 arch/x86/include/asm/disabled-features.h | 8 +++++++-
 arch/x86/kernel/traps.c                  | 4 +++-
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d7122a1883e..b8b15f20b94e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -511,12 +511,21 @@ config X86_CPU_RESCTRL
 config X86_FRED
 	bool "Flexible Return and Event Delivery"
 	depends on X86_64
+	select X86_NMI_SOURCE
 	help
 	  When enabled, try to use Flexible Return and Event Delivery
 	  instead of the legacy SYSCALL/SYSENTER/IDT architecture for
 	  ring transitions and exception/interrupt handling if the
 	  system supports it.
 
+config X86_NMI_SOURCE
+	def_bool n
+	help
+	  Once enabled, information on NMI originator/source can be provided
+	  via FRED event data. This makes NMI processing more efficient in that
+	  NMI handler does not need to check for every possible source at
+	  runtime when NMI is delivered.
+
 config X86_BIGSMP
 	bool "Support for big SMP systems with more than 8 CPUs"
 	depends on SMP && X86_32
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..ec78d361e685 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -327,6 +327,7 @@
 #define X86_FEATURE_FRED		(12*32+17) /* Flexible Return and Event Delivery */
 #define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) GS */
 #define X86_FEATURE_WRMSRNS		(12*32+19) /* "" Non-serializing WRMSR */
+#define X86_FEATURE_NMI_SOURCE		(12*32+20) /* NMI source reporting */
 #define X86_FEATURE_AMX_FP16		(12*32+21) /* "" AMX fp16 Support */
 #define X86_FEATURE_AVX_IFMA            (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
 #define X86_FEATURE_LAM			(12*32+26) /* Linear Address Masking */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index c492bdc97b05..3856c4737d65 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -123,6 +123,12 @@
 # define DISABLE_FRED	(1 << (X86_FEATURE_FRED & 31))
 #endif
 
+#ifdef CONFIG_X86_NMI_SOURCE
+# define DISABLE_NMI_SOURCE	0
+#else
+# define DISABLE_NMI_SOURCE	(1 << (X86_FEATURE_NMI_SOURCE & 31))
+#endif
+
 #ifdef CONFIG_KVM_AMD_SEV
 #define DISABLE_SEV_SNP		0
 #else
@@ -145,7 +151,7 @@
 #define DISABLED_MASK10	0
 #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
 			 DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
-#define DISABLED_MASK12	(DISABLE_FRED|DISABLE_LAM)
+#define DISABLED_MASK12	(DISABLE_FRED|DISABLE_LAM|DISABLE_NMI_SOURCE)
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4fa0b17e5043..465f04e4a79f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1427,8 +1427,10 @@ early_param("fred", fred_setup);
 
 void __init trap_init(void)
 {
-	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred)
+	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred) {
 		setup_clear_cpu_cap(X86_FEATURE_FRED);
+		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
+	}
 
 	/* Init cpu_entry_area before IST entries are set up */
 	setup_cpu_entry_areas();
-- 
2.25.1



* [PATCH v2 2/6] x86/irq: Extend NMI handler registration interface to include source
  2024-06-11 16:54 Jacob Pan
  2024-06-11 16:54 ` [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
@ 2024-06-11 16:54 ` Jacob Pan
  2024-06-24 23:16   ` Sohil Mehta
  2024-06-11 16:54 ` [PATCH v2 3/6] x86/irq: Factor out common NMI handling code Jacob Pan
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2024-06-11 16:54 UTC (permalink / raw)
  To: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li, Jacob Pan

Add a source vector argument to register_nmi_handler() so that designated
NMI originators can use the NMI source reporting feature. Originators that
do not use NMI source reporting pass 0 (unknown) as the source vector. NMI
source vectors (up to 16) are pre-defined.
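
As a rough illustration of the registration idea, the user-space sketch below
records an action in a table indexed by its source vector and leaves vector 0
(unknown) to the existing polling path. All names here (nmi_action_model,
model_register, model_lookup) are invented for the example. The sketch sizes
the table to NR_NMI_SOURCE_VECTORS entries so that the highest defined vector
is a valid index; that is a choice made for this sketch.

```c
#include <stddef.h>

#define NMI_SOURCE_VEC_UNKNOWN 0
#define NR_NMI_SOURCE_VECTORS  9 /* vectors 0..8, as defined in this patch */

/* Simplified stand-in for struct nmiaction (illustrative only). */
struct nmi_action_model {
	const char *name;
	unsigned int source_vec;
};

/* One slot per source vector; slot 0 is never populated. */
static struct nmi_action_model *src_table[NR_NMI_SOURCE_VECTORS];

/*
 * Returns 1 if the action was recorded by source vector, 0 if it uses
 * vector 0 and is reachable only by polling, -1 if the vector is invalid.
 */
static int model_register(struct nmi_action_model *a)
{
	if (a->source_vec >= NR_NMI_SOURCE_VECTORS)
		return -1;
	if (a->source_vec == NMI_SOURCE_VEC_UNKNOWN)
		return 0;
	src_table[a->source_vec] = a;
	return 1;
}

/* Look up the action for a reported vector; NULL if none is registered. */
static struct nmi_action_model *model_lookup(unsigned int vec)
{
	return vec < NR_NMI_SOURCE_VECTORS ? src_table[vec] : NULL;
}
```

An originator such as "PMI" with vector 5 ends up in slot 5, while a handler
registered with vector 0 (for example "perf_ibs" in this patch) is found only
by walking the handler list as before.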

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

---
v2: (address review comments from HPA, not including optimizations)
   - Reserve IDT NMI vector 2 in case devices use a hardcoded vector 2
   - Sort NMI source vectors by priority in descending order
---
 arch/x86/events/amd/ibs.c          |  2 +-
 arch/x86/events/core.c             |  3 ++-
 arch/x86/include/asm/irq_vectors.h | 28 ++++++++++++++++++++++++++++
 arch/x86/include/asm/nmi.h         |  4 +++-
 arch/x86/kernel/apic/hw_nmi.c      |  2 +-
 arch/x86/kernel/cpu/mce/inject.c   |  2 +-
 arch/x86/kernel/cpu/mshyperv.c     |  2 +-
 arch/x86/kernel/kgdb.c             |  4 ++--
 arch/x86/kernel/nmi.c              | 22 ++++++++++++++++++++++
 arch/x86/kernel/nmi_selftest.c     |  5 +++--
 arch/x86/kernel/reboot.c           |  2 +-
 arch/x86/kernel/smp.c              |  2 +-
 arch/x86/platform/uv/uv_nmi.c      |  4 ++--
 drivers/acpi/apei/ghes.c           |  2 +-
 drivers/char/ipmi/ipmi_watchdog.c  |  2 +-
 drivers/edac/igen6_edac.c          |  2 +-
 drivers/watchdog/hpwdt.c           |  6 +++---
 17 files changed, 74 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d62..20989071f59a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1246,7 +1246,7 @@ static __init int perf_event_ibs_init(void)
 	if (ret)
 		goto err_op;
 
-	ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
+	ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs", 0);
 	if (ret)
 		goto err_nmi;
 
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07b1ef1..1ef2201e48ac 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2100,7 +2100,8 @@ static int __init init_hw_perf_events(void)
 		x86_pmu.intel_ctrl = (1 << x86_pmu.num_counters) - 1;
 
 	perf_events_lapic_init();
-	register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI");
+
+	register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI", NMI_SOURCE_VEC_PMI);
 
 	unconstrained = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 13aea8fc3d45..7629319428e5 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -105,6 +105,34 @@
 
 #define NR_VECTORS			 256
 
+/*
+ * The NMI senders specify the NMI source vector as an 8bit integer in their
+ * vector field with NMI delivery mode. A local APIC receiving an NMI will
+ * set the corresponding bit in a 16bit bitmask, which is accumulated until
+ * the NMI is delivered.
+ * When a sender didn't specify an NMI source vector the source vector will
+ * be 0, which will result in bit 0 of the bitmask being set. For out of
+ * bounds vectors >= 16 bit 0 will also be set.
+ * When bit 0 is set, system software must invoke all registered NMI handlers
+ * as if NMI source feature is not enabled.
+ *
+ * Vector 2 is reserved for matching IDT NMI vector where it may be hardcoded
+ * by some external devices.
+ * 
+ * The NMI source vectors are sorted by descending priority with the exceptions
+ * of 0 and 2.
+ */
+#define NMI_SOURCE_VEC_UNKNOWN		0
+#define NMI_SOURCE_VEC_IPI_REBOOT	1	/* Crash reboot */
+#define NMI_SOURCE_VEC_IDT_NMI		2	/* Match IDT NMI vector 2 */
+#define NMI_SOURCE_VEC_IPI_SMP_STOP	3	/* Panic stop CPU */
+#define NMI_SOURCE_VEC_IPI_BT		4	/* CPU backtrace */
+#define NMI_SOURCE_VEC_PMI		5	/* PerfMon counters */
+#define NMI_SOURCE_VEC_IPI_KGDB		6	/* KGDB */
+#define NMI_SOURCE_VEC_IPI_MCE		7	/* MCE injection */
+#define NMI_SOURCE_VEC_IPI_TEST		8	/* For remote and local IPIs */
+#define NR_NMI_SOURCE_VECTORS		9
+
 #ifdef CONFIG_X86_LOCAL_APIC
 #define FIRST_SYSTEM_VECTOR		POSTED_MSI_NOTIFICATION_VECTOR
 #else
diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 41a0ebb699ec..6fe26fea30eb 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -39,15 +39,17 @@ struct nmiaction {
 	u64			max_duration;
 	unsigned long		flags;
 	const char		*name;
+	unsigned int		source_vec;
 };
 
-#define register_nmi_handler(t, fn, fg, n, init...)	\
+#define register_nmi_handler(t, fn, fg, n, src, init...)	\
 ({							\
 	static struct nmiaction init fn##_na = {	\
 		.list = LIST_HEAD_INIT(fn##_na.list),	\
 		.handler = (fn),			\
 		.name = (n),				\
 		.flags = (fg),				\
+		.source_vec = (src),			\
 	};						\
 	__register_nmi_handler((t), &fn##_na);		\
 })
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 45af535c44a0..9f0125d3b8b0 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -54,7 +54,7 @@ NOKPROBE_SYMBOL(nmi_cpu_backtrace_handler);
 static int __init register_nmi_cpu_backtrace_handler(void)
 {
 	register_nmi_handler(NMI_LOCAL, nmi_cpu_backtrace_handler,
-				0, "arch_bt");
+				0, "arch_bt", NMI_SOURCE_VEC_IPI_BT);
 	return 0;
 }
 early_initcall(register_nmi_cpu_backtrace_handler);
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 94953d749475..365a03f11d06 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -769,7 +769,7 @@ static int __init inject_init(void)
 
 	debugfs_init();
 
-	register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0, "mce_notify");
+	register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0, "mce_notify", NMI_SOURCE_VEC_IPI_MCE);
 	mce_register_injector_chain(&inject_nb);
 
 	setup_inj_struct(&i_mce);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0fd57a8ba84..2fb9408a8ba9 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -486,7 +486,7 @@ static void __init ms_hyperv_init_platform(void)
 	}
 
 	register_nmi_handler(NMI_UNKNOWN, hv_nmi_unknown, NMI_FLAG_FIRST,
-			     "hv_nmi_unknown");
+			     "hv_nmi_unknown", 0);
 #endif
 
 #ifdef CONFIG_X86_IO_APIC
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 9c9faa1634fb..d167eb23cf13 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -603,12 +603,12 @@ int kgdb_arch_init(void)
 		goto out;
 
 	retval = register_nmi_handler(NMI_LOCAL, kgdb_nmi_handler,
-					0, "kgdb");
+					0, "kgdb", NMI_SOURCE_VEC_IPI_KGDB);
 	if (retval)
 		goto out1;
 
 	retval = register_nmi_handler(NMI_UNKNOWN, kgdb_nmi_handler,
-					0, "kgdb");
+					0, "kgdb", 0);
 
 	if (retval)
 		goto out2;
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index ed163c8c8604..1ebe93edba7a 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -86,6 +86,12 @@ static DEFINE_PER_CPU(struct nmi_stats, nmi_stats);
 
 static int ignore_nmis __read_mostly;
 
+/*
+ * Actions registered by originators with a source vector, indexed by
+ * source vector. Entry 0 (UNKNOWN NMI source) is never populated.
+ */
+static struct nmiaction *nmiaction_src_table[NR_NMI_SOURCE_VECTORS];
+
 int unknown_nmi_panic;
 /*
  * Prevent NMI reason port (0x61) being accessed simultaneously, can
@@ -163,6 +169,12 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(nmi_handle);
 
+static inline bool use_nmi_source(unsigned int type, struct nmiaction *a)
+{
+	return (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) &&
+		type == NMI_LOCAL && a->source_vec);
+}
+
 int __register_nmi_handler(unsigned int type, struct nmiaction *action)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
@@ -173,6 +185,11 @@ int __register_nmi_handler(unsigned int type, struct nmiaction *action)
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
 
+	if (use_nmi_source(type, action)) {
+		rcu_assign_pointer(nmiaction_src_table[action->source_vec], action);
+		pr_info("NMI source %d registered for %s\n", action->source_vec, action->name);
+	}
+
 	/*
 	 * Indicate if there are multiple registrations on the
 	 * internal NMI handler call chains (SERR and IO_CHECK).
@@ -210,6 +227,11 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 		if (!strcmp(n->name, name)) {
 			WARN(in_nmi(),
 				"Trying to free NMI (%s) from NMI context!\n", n->name);
+			if (use_nmi_source(type, n)) {
+				rcu_assign_pointer(nmiaction_src_table[n->source_vec], NULL);
+				pr_info("NMI source %d unregistered for %s\n", n->source_vec, n->name);
+			}
+
 			list_del_rcu(&n->list);
 			found = n;
 			break;
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index e93a8545c74d..f014c8a66b0c 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -44,7 +44,7 @@ static void __init init_nmi_testsuite(void)
 {
 	/* trap all the unknown NMIs we may generate */
 	register_nmi_handler(NMI_UNKNOWN, nmi_unk_cb, 0, "nmi_selftest_unk",
-			__initdata);
+			0, __initdata);
 }
 
 static void __init cleanup_nmi_testsuite(void)
@@ -67,7 +67,8 @@ static void __init test_nmi_ipi(struct cpumask *mask)
 	unsigned long timeout;
 
 	if (register_nmi_handler(NMI_LOCAL, test_nmi_ipi_callback,
-				 NMI_FLAG_FIRST, "nmi_selftest", __initdata)) {
+				 NMI_FLAG_FIRST, "nmi_selftest", NMI_SOURCE_VEC_IPI_TEST,
+				 __initdata)) {
 		nmi_fail = FAILURE;
 		return;
 	}
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f3130f762784..acc19c1d3b4f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -910,7 +910,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
 	/* Would it be better to replace the trap vector here? */
 	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
-				 NMI_FLAG_FIRST, "crash"))
+				 NMI_FLAG_FIRST, "crash", NMI_SOURCE_VEC_IPI_REBOOT))
 		return;		/* Return what? */
 	/*
 	 * Ensure the new callback function is set before sending
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 18266cc3d98c..f27469e40141 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -143,7 +143,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 static int register_stop_handler(void)
 {
 	return register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
-				    NMI_FLAG_FIRST, "smp_stop");
+				    NMI_FLAG_FIRST, "smp_stop", NMI_SOURCE_VEC_IPI_SMP_STOP);
 }
 
 static void native_stop_other_cpus(int wait)
diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 5c50e550ab63..473c34eb264c 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -1029,10 +1029,10 @@ static int uv_handle_nmi_ping(unsigned int reason, struct pt_regs *regs)
 
 static void uv_register_nmi_notifier(void)
 {
-	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
+	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv", 0))
 		pr_warn("UV: NMI handler failed to register\n");
 
-	if (register_nmi_handler(NMI_LOCAL, uv_handle_nmi_ping, 0, "uvping"))
+	if (register_nmi_handler(NMI_LOCAL, uv_handle_nmi_ping, 0, "uvping", 0))
 		pr_warn("UV: PING NMI handler failed to register\n");
 }
 
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 623cc0cb4a65..393dca95d2b3 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1318,7 +1318,7 @@ static void ghes_nmi_add(struct ghes *ghes)
 {
 	mutex_lock(&ghes_list_mutex);
 	if (list_empty(&ghes_nmi))
-		register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
+		register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes", 0);
 	list_add_rcu(&ghes->list, &ghes_nmi);
 	mutex_unlock(&ghes_list_mutex);
 }
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index 9a459257489f..61bb5dcade5a 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -1272,7 +1272,7 @@ static void check_parms(void)
 	}
 	if (do_nmi && !nmi_handler_registered) {
 		rv = register_nmi_handler(NMI_UNKNOWN, ipmi_nmi, 0,
-						"ipmi");
+						"ipmi", 0);
 		if (rv) {
 			pr_warn("Can't register nmi handler\n");
 			return;
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index dbe9fe5f2ca6..891278245d8b 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1321,7 +1321,7 @@ static int register_err_handler(void)
 	}
 
 	rc = register_nmi_handler(NMI_SERR, ecclog_nmi_handler,
-				  0, IGEN6_NMI_NAME);
+				  0, IGEN6_NMI_NAME, 0);
 	if (rc) {
 		igen6_printk(KERN_ERR, "Failed to register NMI handler\n");
 		return rc;
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index ae30e394d176..5246706afcf6 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -242,13 +242,13 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
 	/*
 	 * Only one function can register for NMI_UNKNOWN
 	 */
-	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout, 0, "hpwdt");
+	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout, 0, "hpwdt", 0);
 	if (retval)
 		goto error;
-	retval = register_nmi_handler(NMI_SERR, hpwdt_pretimeout, 0, "hpwdt");
+	retval = register_nmi_handler(NMI_SERR, hpwdt_pretimeout, 0, "hpwdt", 0);
 	if (retval)
 		goto error1;
-	retval = register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt");
+	retval = register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt", 0);
 	if (retval)
 		goto error2;
 
-- 
2.25.1



* [PATCH v2 3/6] x86/irq: Factor out common NMI handling code
  2024-06-11 16:54 Jacob Pan
  2024-06-11 16:54 ` [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
  2024-06-11 16:54 ` [PATCH v2 2/6] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
@ 2024-06-11 16:54 ` Jacob Pan
  2024-06-11 16:54 ` [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler Jacob Pan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2024-06-11 16:54 UTC (permalink / raw)
  To: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li, Jacob Pan

In preparation for handling NMIs with explicit source reporting, factor
out common code for reuse.
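
The shape of the factored-out helper can be sketched in user space as a
wrapper that times a single handler call and returns both the result and the
duration. Everything below (call_timed, the fake clock, the demo handler) is
hypothetical and only mirrors the structure of do_handle_nmi() in the diff;
the kernel version uses sched_clock() and also emits a trace event.

```c
#include <stdint.h>

typedef int (*handler_fn)(unsigned int type);
typedef uint64_t (*clock_fn)(void);

struct timed_result {
	int handled;       /* return value of the handler */
	uint64_t duration; /* clock ticks spent inside the handler */
};

/* Call one handler and measure how long it ran. */
static struct timed_result call_timed(handler_fn h, unsigned int type,
				      clock_fn now)
{
	struct timed_result r;
	uint64_t start = now();

	r.handled = h(type);
	r.duration = now() - start;
	return r;
}

/* Demo fixtures: a clock that advances 10 ticks per read, a trivial handler. */
static uint64_t fake_ticks;

static uint64_t fake_clock(void)
{
	fake_ticks += 10;
	return fake_ticks;
}

static int demo_handler(unsigned int type)
{
	(void)type;
	return 1;
}
```

With the helper in place, both the list walk and a by-source dispatch can
share the same timing and accounting path.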

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/kernel/nmi.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 1ebe93edba7a..639a34e78bc9 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -135,6 +135,20 @@ static void nmi_check_duration(struct nmiaction *action, u64 duration)
 		action->handler, duration, decimal_msecs);
 }
 
+static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsigned int type)
+{
+	int thishandled;
+	u64 delta;
+
+	delta = sched_clock();
+	thishandled = a->handler(type, regs);
+	delta = sched_clock() - delta;
+	trace_nmi_handler(a->handler, (int)delta, thishandled);
+	nmi_check_duration(a, delta);
+
+	return thishandled;
+}
+
 static int nmi_handle(unsigned int type, struct pt_regs *regs)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
@@ -149,18 +163,8 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
 	 * can be latched at any given time.  Walk the whole list
 	 * to handle those situations.
 	 */
-	list_for_each_entry_rcu(a, &desc->head, list) {
-		int thishandled;
-		u64 delta;
-
-		delta = sched_clock();
-		thishandled = a->handler(type, regs);
-		handled += thishandled;
-		delta = sched_clock() - delta;
-		trace_nmi_handler(a->handler, (int)delta, thishandled);
-
-		nmi_check_duration(a, delta);
-	}
+	list_for_each_entry_rcu(a, &desc->head, list)
+		handled += do_handle_nmi(a, regs, type);
 
 	rcu_read_unlock();
 
-- 
2.25.1



* [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler
  2024-06-11 16:54 Jacob Pan
                   ` (2 preceding siblings ...)
  2024-06-11 16:54 ` [PATCH v2 3/6] x86/irq: Factor out common NMI handling code Jacob Pan
@ 2024-06-11 16:54 ` Jacob Pan
  2024-06-11 18:41   ` H. Peter Anvin
  2024-06-24 23:53   ` Sohil Mehta
  2024-06-11 16:54 ` [PATCH v2 5/6] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Jacob Pan @ 2024-06-11 16:54 UTC (permalink / raw)
  To: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li, Jacob Pan

With NMI source reporting enabled, the NMI handler can prioritize the
handling of sources reported explicitly. If the source is unknown, fall
back to the existing processing flow, i.e. invoke all NMI handlers.
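
The control flow can be modeled in user space roughly as follows. This is a
sketch under simplifying assumptions: handlers take no arguments, the table
has NR_VECS slots indexed by vector, and polling is represented by a single
callback. The names dispatch_by_source, handle_nmi and the demo functions are
invented for the example.

```c
#include <stdint.h>
#include <stddef.h>

#define NR_VECS 9 /* vectors 0..8; bit 0 means "unknown source" */

typedef int (*nmi_fn)(void);

/* Run only the handlers whose bits are set; 0 means "nothing handled". */
static int dispatch_by_source(uint16_t bitmap, nmi_fn const table[NR_VECS])
{
	int handled = 0;

	/* No source information, or bit 0 set: the source is unknown. */
	if (bitmap == 0 || (bitmap & 1u))
		return 0;

	for (unsigned int vec = 1; vec < NR_VECS; vec++) {
		if (!(bitmap & (1u << vec)))
			continue;
		if (table[vec])
			handled += table[vec]();
	}
	return handled;
}

/* Try by-source dispatch first, then fall back to polling every handler. */
static int handle_nmi(uint16_t bitmap, nmi_fn const table[NR_VECS],
		      nmi_fn poll_all)
{
	int handled = dispatch_by_source(bitmap, table);

	return handled ? handled : poll_all();
}

/* Demo fixtures that count how often each path runs. */
static int pmi_calls, poll_calls;

static int demo_pmi(void)
{
	pmi_calls++;
	return 1;
}

static int demo_poll_all(void)
{
	poll_calls++;
	return 1;
}
```

A bitmap with only bit 5 set runs the PMI handler alone. A bitmap with bit 0
set, or one that names a vector without a registered handler, ends up on the
polling path.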

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

---
v2:
   - Disable NMI source reporting once garbage data is given in FRED
     return stack. (HPA)
---
 arch/x86/kernel/nmi.c | 49 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 639a34e78bc9..2c391fd59c34 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -149,12 +149,61 @@ static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsig
 	return thishandled;
 }
 
+static inline int nmi_handle_src(unsigned int type, struct pt_regs *regs)
+{
+	unsigned long source_bitmask;
+	struct nmiaction *a;
+	int handled = 0;
+	int vec = 1;
+
+	if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) || type != NMI_LOCAL)
+		return 0;
+
+	source_bitmask = fred_event_data(regs);
+	if (!source_bitmask) {
+		pr_warn_ratelimited("NMI without source information! Disable source reporting.\n");
+		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
+		return 0;
+	}
+
+	/*
+	 * Per NMI source specification, there is no guarantee that a valid
+	 * NMI vector is always delivered, even when the source specified
+	 * one. It is software's responsibility to check all available NMI
+	 * sources when bit 0 is set in the NMI source bitmap. i.e. we have
+	 * to call every handler as if we have no NMI source.
+	 * On the other hand, if we do get non-zero vectors, we know exactly
+	 * what the sources are. So we only call the handlers with the bit set.
+	 */
+	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
+		pr_warn_ratelimited("NMI received with unknown source\n");
+		return 0;
+	}
+
+	rcu_read_lock();
+	/* Bit 0 is for unknown NMI sources, skip it. */
+	for_each_set_bit_from(vec, &source_bitmask, NR_NMI_SOURCE_VECTORS) {
+		a = rcu_dereference(nmiaction_src_table[vec]);
+		if (!a) {
+			pr_warn_ratelimited("NMI received %d no handler\n", vec);
+			continue;
+		}
+		handled += do_handle_nmi(a, regs, type);
+	}
+	rcu_read_unlock();
+	return handled;
+}
+
 static int nmi_handle(unsigned int type, struct pt_regs *regs)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
 	struct nmiaction *a;
 	int handled=0;
 
+	handled = nmi_handle_src(type, regs);
+	if (handled)
+		return handled;
+
 	rcu_read_lock();
 
 	/*
-- 
2.25.1



* [PATCH v2 5/6] perf/x86: Enable NMI source reporting for perfmon
  2024-06-11 16:54 Jacob Pan
                   ` (3 preceding siblings ...)
  2024-06-11 16:54 ` [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler Jacob Pan
@ 2024-06-11 16:54 ` Jacob Pan
  2024-06-11 19:10   ` H. Peter Anvin
  2024-06-11 16:54 ` [PATCH v2 6/6] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
  2024-06-12  2:04 ` Sean Christopherson
  6 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2024-06-11 16:54 UTC (permalink / raw)
  To: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li, Jacob Pan, Zeng Guang

Program the designated NMI source vector into the performance monitoring
interrupt (PMI) entry of the local vector table. The PMI handler is then
invoked directly when its NMI is generated. This avoids the latency of
calling all NMI handlers blindly.
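
As a small worked example, the value written to the PMI LVT entry can be
composed as below. The sketch assumes APIC_DM_NMI is 0x400 (NMI delivery mode
in bits 10:8), as defined in the kernel's apicdef.h, and uses the PMI source
vector 5 from patch 2. The function name lvtpc_value is made up for this
illustration.

```c
#include <stdint.h>

#define APIC_DM_NMI        0x00400u /* assumed: delivery mode (bits 10:8) = NMI */
#define NMI_SOURCE_VEC_PMI 5u       /* from irq_vectors.h in patch 2 */

/* Compose the LVT value: delivery mode plus, if supported, the source vector. */
static uint32_t lvtpc_value(int nmi_source_supported)
{
	uint32_t val = APIC_DM_NMI;

	if (nmi_source_supported)
		val |= NMI_SOURCE_VEC_PMI; /* vector field, bits 7:0 */

	return val;
}
```

With these values the entry is 0x400 without NMI source reporting and 0x405
with it, so every rewrite of the LVT entry in the PMI handlers has to use the
same composed value.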

Co-developed-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

---
v2: Fix a compile error: apic_perfmon_ctr is undefined in the i386 config
---
 arch/x86/events/core.c       | 8 ++++++--
 arch/x86/events/intel/core.c | 6 +++---
 arch/x86/include/asm/apic.h  | 1 +
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1ef2201e48ac..db8c30881f5c 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -46,6 +46,7 @@
 
 struct x86_pmu x86_pmu __read_mostly;
 static struct pmu pmu;
+u32 apic_perfmon_ctr = APIC_DM_NMI;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 	.enabled = 1,
@@ -1680,7 +1681,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 	 * This generic handler doesn't seem to have any issues where the
 	 * unmasking occurs so it was left at the top.
 	 */
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	apic_write(APIC_LVTPC, apic_perfmon_ctr);
 
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
 		if (!test_bit(idx, cpuc->active_mask))
@@ -1723,7 +1724,10 @@ void perf_events_lapic_init(void)
 	/*
 	 * Always use NMI for PMU
 	 */
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
+		apic_perfmon_ctr |= NMI_SOURCE_VEC_PMI;
+
+	apic_write(APIC_LVTPC, apic_perfmon_ctr);
 }
 
 static int
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 38c1b1f1deaa..b4a70457c678 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3093,7 +3093,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	 * NMI handler.
 	 */
 	if (!late_ack && !mid_ack)
-		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		apic_write(APIC_LVTPC, apic_perfmon_ctr);
 	intel_bts_disable_local();
 	cpuc->enabled = 0;
 	__intel_pmu_disable_all(true);
@@ -3130,7 +3130,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 
 done:
 	if (mid_ack)
-		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		apic_write(APIC_LVTPC, apic_perfmon_ctr);
 	/* Only restore PMU state when it's active. See x86_pmu_disable(). */
 	cpuc->enabled = pmu_enabled;
 	if (pmu_enabled)
@@ -3143,7 +3143,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	 * Haswell CPUs.
 	 */
 	if (late_ack)
-		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		apic_write(APIC_LVTPC, apic_perfmon_ctr);
 	return handled;
 }
 
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 9327eb00e96d..bcf8d17240c8 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -49,6 +49,7 @@ static inline void x86_32_probe_apic(void) { }
 #endif
 
 extern u32 cpuid_to_apicid[];
+extern u32 apic_perfmon_ctr;
 
 #define CPU_ACPIID_INVALID	U32_MAX
 
-- 
2.25.1


* [PATCH v2 6/6] x86/irq: Enable NMI source on IPIs delivered as NMI
  2024-06-11 16:54 Jacob Pan
                   ` (4 preceding siblings ...)
  2024-06-11 16:54 ` [PATCH v2 5/6] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
@ 2024-06-11 16:54 ` Jacob Pan
  2024-06-12  2:04 ` Sean Christopherson
  6 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2024-06-11 16:54 UTC (permalink / raw)
  To: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li, Jacob Pan

Program the designated NMI source vectors for all NMI-delivered IPIs
so that their handlers can be selectively invoked.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/irq_vectors.h | 10 ++++++++++
 arch/x86/kernel/apic/hw_nmi.c      |  3 ++-
 arch/x86/kernel/apic/ipi.c         |  4 ++--
 arch/x86/kernel/apic/local.h       | 18 ++++++++++++------
 arch/x86/kernel/cpu/mce/inject.c   |  2 +-
 arch/x86/kernel/kgdb.c             |  2 +-
 arch/x86/kernel/nmi_selftest.c     |  2 +-
 arch/x86/kernel/reboot.c           |  2 +-
 arch/x86/kernel/smp.c              |  2 +-
 9 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 7629319428e5..0b459fd2aa4e 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -133,6 +133,16 @@
 #define NMI_SOURCE_VEC_IPI_TEST		8	/* For remote and local IPIs */
 #define NR_NMI_SOURCE_VECTORS		9
 
+/*
+ * When programming the local APIC, the IDT NMI vector and the NMI source
+ * vector are encoded in a single 32-bit variable. The top 16 bits contain
+ * the NMI source vector and the bottom 16 bits contain NMI_VECTOR (2).
+ * The top 16 bits are always zero when the NMI source feature is not
+ * enabled or the caller does not use NMI source.
+ */
+#define NMI_VECTOR_WITH_SOURCE(src)	(NMI_VECTOR | ((src) << 16))
+#define NMI_SOURCE_VEC_MASK		GENMASK(15, 0)
+
 #ifdef CONFIG_X86_LOCAL_APIC
 #define FIRST_SYSTEM_VECTOR		POSTED_MSI_NOTIFICATION_VECTOR
 #else
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 9f0125d3b8b0..f73ca95d961e 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -20,6 +20,7 @@
 #include <linux/nmi.h>
 #include <linux/init.h>
 #include <linux/delay.h>
+#include <asm/irq_vectors.h>
 
 #include "local.h"
 
@@ -33,7 +34,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
 #ifdef arch_trigger_cpumask_backtrace
 static void nmi_raise_cpu_backtrace(cpumask_t *mask)
 {
-	__apic_send_IPI_mask(mask, NMI_VECTOR);
+	__apic_send_IPI_mask(mask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_BT));
 }
 
 void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 5da693d633b7..9d2b18e58758 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -157,7 +157,7 @@ static void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
 	 * issues where otherwise the system hangs when the panic CPU tries
 	 * to stop the others before launching the kdump kernel.
 	 */
-	if (unlikely(vector == NMI_VECTOR))
+	if (unlikely(is_nmi_vector(vector)))
 		apic_mem_wait_icr_idle_timeout();
 	else
 		apic_mem_wait_icr_idle();
@@ -174,7 +174,7 @@ void __default_send_IPI_dest_field(unsigned int dest_mask, int vector,
 				   unsigned int dest_mode)
 {
 	/* See comment in __default_send_IPI_shortcut() */
-	if (unlikely(vector == NMI_VECTOR))
+	if (unlikely(is_nmi_vector(vector)))
 		apic_mem_wait_icr_idle_timeout();
 	else
 		apic_mem_wait_icr_idle();
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index 842fe28496be..60e90b7bf058 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -12,6 +12,7 @@
 
 #include <asm/irq_vectors.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 
 /* X2APIC */
 void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
@@ -26,19 +27,24 @@ extern u32 x2apic_max_apicid;
 
 DECLARE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);
 
+static inline bool is_nmi_vector(int vector)
+{
+	return (vector & NMI_SOURCE_VEC_MASK) == NMI_VECTOR;
+}
+
 static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
 					 unsigned int dest)
 {
 	unsigned int icr = shortcut | dest;
 
-	switch (vector) {
-	default:
-		icr |= APIC_DM_FIXED | vector;
-		break;
-	case NMI_VECTOR:
+	if (is_nmi_vector(vector)) {
 		icr |= APIC_DM_NMI;
-		break;
+		if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
+			icr |= vector >> 16;
+	} else {
+		icr |= APIC_DM_FIXED | vector;
 	}
+
 	return icr;
 }
 
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 365a03f11d06..07bc6c29bd83 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -270,7 +270,7 @@ static void __maybe_unused raise_mce(struct mce *m)
 					mce_irq_ipi, NULL, 0);
 				preempt_enable();
 			} else if (m->inject_flags & MCJ_NMI_BROADCAST)
-				__apic_send_IPI_mask(mce_inject_cpumask, NMI_VECTOR);
+				__apic_send_IPI_mask(mce_inject_cpumask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_MCE));
 		}
 		start = jiffies;
 		while (!cpumask_empty(mce_inject_cpumask)) {
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index d167eb23cf13..02198cf9fe21 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -416,7 +416,7 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
  */
 void kgdb_roundup_cpus(void)
 {
-	apic_send_IPI_allbutself(NMI_VECTOR);
+	apic_send_IPI_allbutself(NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_KGDB));
 }
 #endif
 
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index f014c8a66b0c..5aa122d3368c 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -76,7 +76,7 @@ static void __init test_nmi_ipi(struct cpumask *mask)
 	/* sync above data before sending NMI */
 	wmb();
 
-	__apic_send_IPI_mask(mask, NMI_VECTOR);
+	__apic_send_IPI_mask(mask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_TEST));
 
 	/* Don't wait longer than a second */
 	timeout = USEC_PER_SEC;
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index acc19c1d3b4f..fb63bc0d6a0f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -918,7 +918,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	 */
 	wmb();
 
-	apic_send_IPI_allbutself(NMI_VECTOR);
+	apic_send_IPI_allbutself(NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_REBOOT));
 
 	/* Kick CPUs looping in NMI context. */
 	WRITE_ONCE(crash_ipi_issued, 1);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index f27469e40141..b79e78762a73 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -217,7 +217,7 @@ static void native_stop_other_cpus(int wait)
 			pr_emerg("Shutting down cpus with NMI\n");
 
 			for_each_cpu(cpu, &cpus_stop_mask)
-				__apic_send_IPI(cpu, NMI_VECTOR);
+				__apic_send_IPI(cpu, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_SMP_STOP));
 		}
 		/*
 		 * Don't wait longer than 10 ms if the caller didn't
-- 
2.25.1
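
Editorial sketch (not part of the patch): the encoding described in the
irq_vectors.h comment can be modelled in plain userspace C. The constants
below are assumptions for illustration: NMI_VECTOR, APIC_DM_FIXED and
APIC_DM_NMI use the values I believe the kernel headers define, the
has_nmi_source flag stands in for cpu_feature_enabled(), and the mask name
NMI_VECTOR_MASK is my own. It only shows how the packed value round-trips
through an ICR build like the reworked __prepare_ICR().

```c
/* Userspace model of the packed NMI vector encoding; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NMI_VECTOR		0x02u
#define APIC_DM_FIXED		0x00000u	/* assumed to match apicdef.h */
#define APIC_DM_NMI		0x00400u	/* assumed to match apicdef.h */
#define NMI_SOURCE_VEC_IPI_TEST	8u		/* value shown in the patch context */
#define NMI_VECTOR_MASK		0xffffu		/* same bits as GENMASK(15, 0) */

/* Low 16 bits carry NMI_VECTOR, high 16 bits carry the source vector. */
static inline uint32_t nmi_vector_with_source(uint32_t src)
{
	assert(src <= 0xffffu);
	return NMI_VECTOR | (src << 16);
}

/* True for a bare NMI_VECTOR and for any packed NMI vector. */
static inline bool is_nmi_vector(uint32_t vector)
{
	return (vector & NMI_VECTOR_MASK) == NMI_VECTOR;
}

/* Build an ICR value; the source vector is used only when the feature is on. */
static inline uint32_t prepare_icr(uint32_t shortcut, uint32_t vector,
				   uint32_t dest, bool has_nmi_source)
{
	uint32_t icr = shortcut | dest;

	if (is_nmi_vector(vector)) {
		icr |= APIC_DM_NMI;
		if (has_nmi_source)
			icr |= vector >> 16;
	} else {
		icr |= APIC_DM_FIXED | vector;
	}

	return icr;
}
```

Note that a caller passing a plain NMI_VECTOR still takes the NMI path and
contributes a zero source vector, which matches the "top 16 bits are always
zero" rule in the comment.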


* Re: [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler
  2024-06-11 16:54 ` [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler Jacob Pan
@ 2024-06-11 18:41   ` H. Peter Anvin
  2024-06-12 21:54     ` Jacob Pan
  2024-06-24 23:53   ` Sohil Mehta
  1 sibling, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2024-06-11 18:41 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li

On 6/11/24 09:54, Jacob Pan wrote:
> +
> +	source_bitmask = fred_event_data(regs);
> +	if (!source_bitmask) {
> +		pr_warn_ratelimited("NMI without source information! Disable source reporting.\n");
> +		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
> +		return 0;
> +	}

Is setup_clear_cpu_cap() even meaningful here?

> +
> +	/*
> +	 * Per NMI source specification, there is no guarantee that a valid
> +	 * NMI vector is always delivered, even when the source specified
> +	 * one. It is software's responsibility to check all available NMI
> +	 * sources when bit 0 is set in the NMI source bitmap. i.e. we have
> +	 * to call every handler as if we have no NMI source.
> +	 * On the other hand, if we do get non-zero vectors, we know exactly
> +	 * what the sources are. So we only call the handlers with the bit set.
> +	 */
> +	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> +		pr_warn_ratelimited("NMI received with unknown source\n");
> +		return 0;
> +	}
> +

You can still dispatch the known NMI handlers early before doing the 
polling.

> +	rcu_read_lock();
> +	/* Bit 0 is for unknown NMI sources, skip it. */
> +	for_each_set_bit_from(vec, &source_bitmask, NR_NMI_SOURCE_VECTORS) {
> +		a = rcu_dereference(nmiaction_src_table[vec]);
> +		if (!a) {
> +			pr_warn_ratelimited("NMI received %d no handler", vec);
> +			continue;
> +		}
> +		handled += do_handle_nmi(a, regs, type);
> +	}
> +	rcu_read_unlock();
> +	return handled;
> +}
> +

That would mean that you would also need to return a bitmask of which 
source vectors need to be handled with polling.

	-hpa
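
Editorial sketch (not from the thread): one way to read the two comments
above is "dispatch what the bitmap identifies, then report what still needs
polling". The userspace model below assumes a hypothetical handler type, a
hypothetical table nmi_src_table[] and a demo handler; only
NR_NMI_SOURCE_VECTORS (9) and the meaning of bit 0 come from the series. It
deliberately ignores the all-zero bitmap case, which the patch treats
separately.

```c
/* Userspace model of "dispatch known sources, then poll the rest". */
#include <assert.h>
#include <stdint.h>

#define NR_NMI_SOURCE_VECTORS	9	/* from the series */
#define NMI_SOURCE_UNKNOWN_BIT	1u	/* bit 0: at least one source is unknown */
#define ALL_SOURCE_BITS		(((1u << NR_NMI_SOURCE_VECTORS) - 1) & ~1u)

typedef int (*nmi_src_handler_t)(void);	/* hypothetical; returns 1 if handled */

static nmi_src_handler_t nmi_src_table[NR_NMI_SOURCE_VECTORS];
static int demo_calls;

/* Hypothetical handler used only to exercise the sketch. */
static inline int demo_handler(void)
{
	demo_calls++;
	return 1;
}

/*
 * Call the handler for every identified source. On return, *poll_mask holds
 * the source vectors that still need polling: nothing when the bitmap is
 * fully trusted, everything not yet dispatched when bit 0 is set.
 */
static inline int dispatch_known_sources(uint32_t bitmap, uint32_t *poll_mask)
{
	uint32_t dispatched = 0;
	int handled = 0;

	for (unsigned int vec = 1; vec < NR_NMI_SOURCE_VECTORS; vec++) {
		if (!(bitmap & (1u << vec)) || !nmi_src_table[vec])
			continue;
		handled += nmi_src_table[vec]();
		dispatched |= 1u << vec;
	}

	*poll_mask = (bitmap & NMI_SOURCE_UNKNOWN_BIT) ?
		     (ALL_SOURCE_BITS & ~dispatched) : 0;
	return handled;
}
```

The cost in the common case is one extra mask update per dispatched handler;
whether that is acceptable on the NMI path is a judgment call for the series.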

* Re: [PATCH v2 5/6] perf/x86: Enable NMI source reporting for perfmon
  2024-06-11 16:54 ` [PATCH v2 5/6] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
@ 2024-06-11 19:10   ` H. Peter Anvin
  2024-06-12 20:27     ` Jacob Pan
  0 siblings, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2024-06-11 19:10 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li, Zeng Guang

On 6/11/24 09:54, Jacob Pan wrote:
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 1ef2201e48ac..db8c30881f5c 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -46,6 +46,7 @@
>   
>   struct x86_pmu x86_pmu __read_mostly;
>   static struct pmu pmu;
> +u32 apic_perfmon_ctr = APIC_DM_NMI;
>   
>   DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
>   	.enabled = 1,
> @@ -1680,7 +1681,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>   	 * This generic handler doesn't seem to have any issues where the
>   	 * unmasking occurs so it was left at the top.
>   	 */
> -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> +	apic_write(APIC_LVTPC, apic_perfmon_ctr);
>   
>   	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
>   		if (!test_bit(idx, cpuc->active_mask))
> @@ -1723,7 +1724,10 @@ void perf_events_lapic_init(void)
>   	/*
>   	 * Always use NMI for PMU
>   	 */
> -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> +	if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
> +		apic_perfmon_ctr |= NMI_SOURCE_VEC_PMI;
> +
> +	apic_write(APIC_LVTPC, apic_perfmon_ctr);
>   }
>
There really is no reason not to do this unconditionally. If NMI source
is not supported it is simply a no-op.

	-hpa
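
Editorial sketch (not from the thread): the unconditional variant amounts to
always OR-ing the source vector into the LVTPC value. The userspace model
below assumes APIC_DM_NMI is 0x400 and that the LVT vector field is the low
8 bits; DEMO_NMI_SOURCE_VEC_PMI is a made-up value, since the real
NMI_SOURCE_VEC_PMI is defined elsewhere in the series. It only shows that
the vector field and the delivery mode field do not overlap.

```c
/* Userspace model of building the LVTPC value; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_DM_NMI		0x00400u	/* assumed to match apicdef.h */
#define DEMO_VECTOR_MASK	0x000ffu	/* LVT vector field, bits 7:0 */
#define DEMO_MODE_MASK		0x00700u	/* LVT delivery mode, bits 10:8 */
#define DEMO_NMI_SOURCE_VEC_PMI	1u		/* hypothetical value */

/* Always include the source vector; no feature check is needed here. */
static inline uint32_t lvtpc_value(uint32_t src_vec)
{
	assert(src_vec <= DEMO_VECTOR_MASK);
	return APIC_DM_NMI | src_vec;
}

/* The delivery mode stays NMI regardless of the vector field contents. */
static inline bool lvt_is_nmi(uint32_t lvt)
{
	return (lvt & DEMO_MODE_MASK) == APIC_DM_NMI;
}
```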


* Re:
  2024-06-11 16:54 Jacob Pan
                   ` (5 preceding siblings ...)
  2024-06-11 16:54 ` [PATCH v2 6/6] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
@ 2024-06-12  2:04 ` Sean Christopherson
  2024-06-12  2:55   ` Re: Xin Li
  6 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2024-06-12  2:04 UTC (permalink / raw)
  To: Jacob Pan
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra,
	Andi Kleen, Xin Li

On Tue, Jun 11, 2024, Jacob Pan wrote:
> To tackle these challenges, Intel introduced NMI source reporting as a part
> of the FRED specification (detailed in Chapter 9). 

Chapter 9 of the linked spec is "VMX Interactions with FRED Transitions".  I
spent a minute or so poking around the spec and didn't find anything that describes
how "NMI source reporting" works.

> 1.	Performance monitoring.
> 2.	Inter-Processor Interrupts (IPIs) for functions like CPU backtrace,
> 	machine check, Kernel GNU Debugger (KGDB), reboot, panic stop, and
> 	self-test.
> 
> Other NMI sources will continue to be handled as previously when the NMI
> source is not utilized or remains unidentified.
> 
> Next steps:
> 1. KVM support

I can't tell for sure since I can't find the relevant spec info, but doesn't KVM
support need to land before this gets enabled?  Otherwise the source would get
lost if the NMI arrived while the CPU was in non-root mode, no?  E.g. I don't
see any changes to fred_entry_from_kvm() in this series.

* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-11 16:54 ` [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
@ 2024-06-12  2:32   ` Xin Li
  2024-06-12  2:50     ` H. Peter Anvin
  2024-06-21 23:00     ` Sohil Mehta
  2024-06-21 22:23   ` Sohil Mehta
  1 sibling, 2 replies; 27+ messages in thread
From: Xin Li @ 2024-06-12  2:32 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, linux-perf-users,
	Peter Zijlstra
  Cc: Andi Kleen, Xin Li

On 6/11/2024 9:54 AM, Jacob Pan wrote:
> The lack of a mechanism to pinpoint the origins of Non-Maskable Interrupts
> (NMIs) necessitates that the NMI vector 2 handler consults each NMI source
> handler individually. This approach leads to inefficiencies, delays, and
> the occurrence of unnecessary NMIs, thereby also constraining the potential
> applications of NMIs.
> 
> A new CPU feature, known as NMI source reporting, has been introduced as
> part of the Flexible Return and Event Delivery (FRED) spec. This feature
> enables the NMI vector 2 handler to directly obtain information about the
> NMI source from the FRED event data.
> 
> The functionality of NMI source reporting is tied to the FRED. Although it
> is enumerated by a unique CPUID feature bit, it cannot be turned off
> independently once FRED is activated.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v2: Removed NMI source from static CPU ID dependency table (HPA)
> ---
>   arch/x86/Kconfig                         | 9 +++++++++
>   arch/x86/include/asm/cpufeatures.h       | 1 +
>   arch/x86/include/asm/disabled-features.h | 8 +++++++-
>   arch/x86/kernel/traps.c                  | 4 +++-
>   4 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 1d7122a1883e..b8b15f20b94e 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -511,12 +511,21 @@ config X86_CPU_RESCTRL
>   config X86_FRED
>   	bool "Flexible Return and Event Delivery"
>   	depends on X86_64
> +	select X86_NMI_SOURCE
>   	help
>   	  When enabled, try to use Flexible Return and Event Delivery
>   	  instead of the legacy SYSCALL/SYSENTER/IDT architecture for
>   	  ring transitions and exception/interrupt handling if the
>   	  system supports it.
>   
> +config X86_NMI_SOURCE

Let's reuse X86_FRED instead of adding another hard config option. See
below.

> +	def_bool n
> +	help
> +	  Once enabled, information on NMI originator/source can be provided
> +	  via FRED event data. This makes NMI processing more efficient in that
> +	  NMI handler does not need to check for every possible source at
> +	  runtime when NMI is delivered.
> +
>   config X86_BIGSMP
>   	bool "Support for big SMP systems with more than 8 CPUs"
>   	depends on SMP && X86_32

...

> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
> index c492bdc97b05..3856c4737d65 100644
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -123,6 +123,12 @@
>   # define DISABLE_FRED	(1 << (X86_FEATURE_FRED & 31))
>   #endif
>   
> +#ifdef CONFIG_X86_NMI_SOURCE
> +# define DISABLE_NMI_SOURCE	0
> +#else
> +# define DISABLE_NMI_SOURCE	(1 << (X86_FEATURE_NMI_SOURCE & 31))
> +#endif
> +
>   #ifdef CONFIG_KVM_AMD_SEV
>   #define DISABLE_SEV_SNP		0
>   #else
> @@ -145,7 +151,7 @@
>   #define DISABLED_MASK10	0
>   #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
>   			 DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
> -#define DISABLED_MASK12	(DISABLE_FRED|DISABLE_LAM)
> +#define DISABLED_MASK12	(DISABLE_FRED|DISABLE_LAM|DISABLE_NMI_SOURCE)
>   #define DISABLED_MASK13	0
>   #define DISABLED_MASK14	0
>   #define DISABLED_MASK15	0
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 4fa0b17e5043..465f04e4a79f 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1427,8 +1427,10 @@ early_param("fred", fred_setup);
>   
>   void __init trap_init(void)
>   {
> -	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred)
> +	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred) {
>   		setup_clear_cpu_cap(X86_FEATURE_FRED);
> +		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
> +	}

With this, there is no need to add DISABLE_NMI_SOURCE to disabled-features.h:

1) If FRED is not available, NMI source won't be either.
2) If FRED is available but not enabled, all features relying on FRED
should be cleared. We should probably move the feature-bit clearing
code into a static function when more such features are added in the
future.

>   
>   	/* Init cpu_entry_area before IST entries are set up */
>   	setup_cpu_entry_areas();

Thanks!
     Xin

* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-12  2:32   ` Xin Li
@ 2024-06-12  2:50     ` H. Peter Anvin
  2024-06-12  3:04       ` Xin Li
  2024-06-21 23:00     ` Sohil Mehta
  1 sibling, 1 reply; 27+ messages in thread
From: H. Peter Anvin @ 2024-06-12  2:50 UTC (permalink / raw)
  To: Xin Li, Jacob Pan, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra
  Cc: Andi Kleen, Xin Li

On June 11, 2024 7:32:54 PM PDT, Xin Li <xin@zytor.com> wrote:
>On 6/11/2024 9:54 AM, Jacob Pan wrote:
>> The lack of a mechanism to pinpoint the origins of Non-Maskable Interrupts
>> (NMIs) necessitates that the NMI vector 2 handler consults each NMI source
>> handler individually. This approach leads to inefficiencies, delays, and
>> the occurrence of unnecessary NMIs, thereby also constraining the potential
>> applications of NMIs.
>> 
>> A new CPU feature, known as NMI source reporting, has been introduced as
>> part of the Flexible Return and Event Delivery (FRED) spec. This feature
>> enables the NMI vector 2 handler to directly obtain information about the
>> NMI source from the FRED event data.
>> 
>> The functionality of NMI source reporting is tied to the FRED. Although it
>> is enumerated by a unique CPUID feature bit, it cannot be turned off
>> independently once FRED is activated.
>> 
>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> ---
>> v2: Removed NMI source from static CPU ID dependency table (HPA)
>> ---
>>   arch/x86/Kconfig                         | 9 +++++++++
>>   arch/x86/include/asm/cpufeatures.h       | 1 +
>>   arch/x86/include/asm/disabled-features.h | 8 +++++++-
>>   arch/x86/kernel/traps.c                  | 4 +++-
>>   4 files changed, 20 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 1d7122a1883e..b8b15f20b94e 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -511,12 +511,21 @@ config X86_CPU_RESCTRL
>>   config X86_FRED
>>   	bool "Flexible Return and Event Delivery"
>>   	depends on X86_64
>> +	select X86_NMI_SOURCE
>>   	help
>>   	  When enabled, try to use Flexible Return and Event Delivery
>>   	  instead of the legacy SYSCALL/SYSENTER/IDT architecture for
>>   	  ring transitions and exception/interrupt handling if the
>>   	  system supports it.
>>
>> +config X86_NMI_SOURCE
>
>Let's reuse X86_FRED instead of adding another hard config option. See
>below.
>
>> +	def_bool n
>> +	help
>> +	  Once enabled, information on NMI originator/source can be provided
>> +	  via FRED event data. This makes NMI processing more efficient in that
>> +	  NMI handler does not need to check for every possible source at
>> +	  runtime when NMI is delivered.
>> +
>>   config X86_BIGSMP
>>   	bool "Support for big SMP systems with more than 8 CPUs"
>>   	depends on SMP && X86_32
>
>...
>
>> diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
>> index c492bdc97b05..3856c4737d65 100644
>> --- a/arch/x86/include/asm/disabled-features.h
>> +++ b/arch/x86/include/asm/disabled-features.h
>> @@ -123,6 +123,12 @@
>>   # define DISABLE_FRED	(1 << (X86_FEATURE_FRED & 31))
>>   #endif
>>
>> +#ifdef CONFIG_X86_NMI_SOURCE
>> +# define DISABLE_NMI_SOURCE	0
>> +#else
>> +# define DISABLE_NMI_SOURCE	(1 << (X86_FEATURE_NMI_SOURCE & 31))
>> +#endif
>> +
>>   #ifdef CONFIG_KVM_AMD_SEV
>>   #define DISABLE_SEV_SNP		0
>>   #else
>> @@ -145,7 +151,7 @@
>>   #define DISABLED_MASK10	0
>>   #define DISABLED_MASK11	(DISABLE_RETPOLINE|DISABLE_RETHUNK|DISABLE_UNRET| \
>>   			 DISABLE_CALL_DEPTH_TRACKING|DISABLE_USER_SHSTK)
>> -#define DISABLED_MASK12	(DISABLE_FRED|DISABLE_LAM)
>> +#define DISABLED_MASK12	(DISABLE_FRED|DISABLE_LAM|DISABLE_NMI_SOURCE)
>>   #define DISABLED_MASK13	0
>>   #define DISABLED_MASK14	0
>>   #define DISABLED_MASK15	0
>> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>> index 4fa0b17e5043..465f04e4a79f 100644
>> --- a/arch/x86/kernel/traps.c
>> +++ b/arch/x86/kernel/traps.c
>> @@ -1427,8 +1427,10 @@ early_param("fred", fred_setup);
>>
>>   void __init trap_init(void)
>>   {
>> -	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred)
>> +	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred) {
>>   		setup_clear_cpu_cap(X86_FEATURE_FRED);
>> +		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
>> +	}
>
>With this, there is no need to add DISABLE_NMI_SOURCE to disabled-features.h:
>
>1) If FRED is not available, NMI source won't be either.
>2) If FRED is available but not enabled, all features relying on FRED
>should be cleared. We should probably move the feature-bit clearing
>code into a static function when more such features are added in the
>future.
>
>>
>>   	/* Init cpu_entry_area before IST entries are set up */
>>   	setup_cpu_entry_areas();
>
>Thanks!
>    Xin

And even if we did, FRED should not *select* NMI_SOURCE; the dependency goes the other way.

* Re:
  2024-06-12  2:04 ` Sean Christopherson
@ 2024-06-12  2:55   ` Xin Li
  0 siblings, 0 replies; 27+ messages in thread
From: Xin Li @ 2024-06-12  2:55 UTC (permalink / raw)
  To: Sean Christopherson, Jacob Pan
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra,
	Andi Kleen, Xin Li

On 6/11/2024 7:04 PM, Sean Christopherson wrote:
> On Tue, Jun 11, 2024, Jacob Pan wrote:
>> To tackle these challenges, Intel introduced NMI source reporting as a part
>> of the FRED specification (detailed in Chapter 9).
> 
> Chapter 9 of the linked spec is "VMX Interactions with FRED Transitions".  I
> spent a minute or so poking around the spec and didn't find anything that describes
> how "NMI source reporting" works.

I did the same thing when I saw NMI source was added to the spec :)

> 
>> 1.	Performance monitoring.
>> 2.	Inter-Processor Interrupts (IPIs) for functions like CPU backtrace,
>> 	machine check, Kernel GNU Debugger (KGDB), reboot, panic stop, and
>> 	self-test.
>>
>> Other NMI sources will continue to be handled as previously when the NMI
>> source is not utilized or remains unidentified.
>>
>> Next steps:
>> 1. KVM support
> 
> I can't tell for sure since I can't find the relevant spec info, but doesn't KVM
> support need to land before this gets enabled?  Otherwise the source would get
> lost if the NMI arrived while the CPU was in non-root mode, no?  E.g. I don't
> see any changes to fred_entry_from_kvm() in this series.

You're absolutely right!

There is a patch for this in the NMI source KVM series, but as you
mentioned, it has to be in this native NMI source series instead.

Thanks!
     Xin

* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-12  2:50     ` H. Peter Anvin
@ 2024-06-12  3:04       ` Xin Li
  0 siblings, 0 replies; 27+ messages in thread
From: Xin Li @ 2024-06-12  3:04 UTC (permalink / raw)
  To: H. Peter Anvin, Jacob Pan, X86 Kernel, LKML, Thomas Gleixner,
	Dave Hansen, Ingo Molnar, Borislav Petkov, linux-perf-users,
	Peter Zijlstra
  Cc: Andi Kleen, Xin Li

On 6/11/2024 7:50 PM, H. Peter Anvin wrote:
> On June 11, 2024 7:32:54 PM PDT, Xin Li <xin@zytor.com> wrote:
>> On 6/11/2024 9:54 AM, Jacob Pan wrote:
>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>> index 1d7122a1883e..b8b15f20b94e 100644
>>> --- a/arch/x86/Kconfig
>>> +++ b/arch/x86/Kconfig
>>> @@ -511,12 +511,21 @@ config X86_CPU_RESCTRL
>>>    config X86_FRED
>>>    	bool "Flexible Return and Event Delivery"
>>>    	depends on X86_64
>>> +	select X86_NMI_SOURCE
>>>    	help
>>>    	  When enabled, try to use Flexible Return and Event Delivery
>>>    	  instead of the legacy SYSCALL/SYSENTER/IDT architecture for
>>>    	  ring transitions and exception/interrupt handling if the
>>>    	  system supports it.
>>>
>>> +config X86_NMI_SOURCE
>>
>> Let's reuse X86_FRED instead of adding another hard config option. See
>> below.

<snip>

>>
>> With this, there is no need to add DISABLE_NMI_SOURCE to disabled-features.h:
>>
>> 1) If FRED is not available, NMI source won't be either.
>> 2) If FRED is available but not enabled, all features relying on FRED
>> should be cleared. We should probably move the feature-bit clearing
>> code into a static function when more such features are added in the
>> future.
>>
>>>
>>>    	/* Init cpu_entry_area before IST entries are set up */
>>>    	setup_cpu_entry_areas();
>>
>> Thanks!
>>     Xin
> 
> And even if we did, FRED should not *select* NMI_SOURCE; the dependency goes the other way.

Right, I was a bit confused, but I was focusing on why we need this.


* Re: [PATCH v2 5/6] perf/x86: Enable NMI source reporting for perfmon
  2024-06-11 19:10   ` H. Peter Anvin
@ 2024-06-12 20:27     ` Jacob Pan
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2024-06-12 20:27 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, Ingo Molnar,
	Borislav Petkov, linux-perf-users, Peter Zijlstra, Andi Kleen,
	Xin Li, Zeng Guang, jacob.jun.pan

Hi H.,

On Tue, 11 Jun 2024 12:10:52 -0700, "H. Peter Anvin" <hpa@zytor.com> wrote:

> On 6/11/24 09:54, Jacob Pan wrote:
> > 
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index 1ef2201e48ac..db8c30881f5c 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -46,6 +46,7 @@
> >   
> >   struct x86_pmu x86_pmu __read_mostly;
> >   static struct pmu pmu;
> > +u32 apic_perfmon_ctr = APIC_DM_NMI;
> >   
> >   DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
> >   	.enabled = 1,
> > @@ -1680,7 +1681,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> >   	 * This generic handler doesn't seem to have any issues where the
> >   	 * unmasking occurs so it was left at the top.
> >   	 */
> > -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> > +	apic_write(APIC_LVTPC, apic_perfmon_ctr);
> >   
> >   	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
> >   		if (!test_bit(idx, cpuc->active_mask))
> > @@ -1723,7 +1724,10 @@ void perf_events_lapic_init(void)
> >   	/*
> >   	 * Always use NMI for PMU
> >   	 */
> > -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> > +	if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
> > +		apic_perfmon_ctr |= NMI_SOURCE_VEC_PMI;
> > +
> > +	apic_write(APIC_LVTPC, apic_perfmon_ctr);
> >   }
> >  
> There really is no reason not to do this unconditionally. If NMI source
> is not supported it is simply a no-op.
Yes, will do.

I was being paranoid in case some old CPUs don't ignore the vector field.

Thanks,

Jacob

* Re: [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler
  2024-06-11 18:41   ` H. Peter Anvin
@ 2024-06-12 21:54     ` Jacob Pan
  2024-06-24 23:38       ` Sohil Mehta
  0 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2024-06-12 21:54 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, Ingo Molnar,
	Borislav Petkov, linux-perf-users, Peter Zijlstra, Andi Kleen,
	Xin Li, jacob.jun.pan

Hi H.,

On Tue, 11 Jun 2024 11:41:07 -0700, "H. Peter Anvin" <hpa@zytor.com> wrote:

> On 6/11/24 09:54, Jacob Pan wrote:
> > +
> > +	source_bitmask = fred_event_data(regs);
> > +	if (!source_bitmask) {
> > +		pr_warn_ratelimited("NMI without source information! Disable source reporting.\n");
> > +		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
> > +		return 0;
> > +	}  
> 
> Is setup_clear_cpu_cap() even meaningful here?
Right, alternative patching doesn't work here. Let me use a separate flag.

> 
> > +
> > +	/*
> > +	 * Per NMI source specification, there is no guarantee that a valid
> > +	 * NMI vector is always delivered, even when the source specified
> > +	 * one. It is software's responsibility to check all available NMI
> > +	 * sources when bit 0 is set in the NMI source bitmap. i.e. we have
> > +	 * to call every handler as if we have no NMI source.
> > +	 * On the other hand, if we do get non-zero vectors, we know exactly
> > +	 * what the sources are. So we only call the handlers with the bit set.
> > +	 */
> > +	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> > +		pr_warn_ratelimited("NMI received with unknown source\n");
> > +		return 0;
> > +	}
> > +  
> 
> You can still dispatch the known NMI handlers early before doing the 
> polling.

True, my thinking was based on two conditions:
1. unknown NMI source is a rare/unlikely case
2. when unknown source does get set, it is due to deep CPU idle where
performance optimization is not productive.

So I think any optimization of the unlikely case should not add cost to the
common case, and tracking which handlers were dispatched early/directly does
add such cost. Below is my attempt; there must be a better way.

static int nmi_handle_src(unsigned int type, struct pt_regs *regs, unsigned long *handled_mask)
{
	static bool nmi_source_disabled = false;
	bool has_unknown_src = false;
	unsigned long source_bitmask;
	struct nmiaction *a;
	int handled = 0;
	int vec = 1;

	if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) ||
	    type != NMI_LOCAL || nmi_source_disabled)
		return 0;

	source_bitmask = fred_event_data(regs);
	if (!source_bitmask) {
		pr_warn("NMI received without source information! Disable source reporting.\n");
		nmi_source_disabled = true;
		return 0;
	}

	/*
	 * Per NMI source specification, there is no guarantee that a valid
	 * NMI vector is always delivered, even when the source specified
	 * one. It is software's responsibility to check all available NMI
	 * sources when bit 0 is set in the NMI source bitmap. i.e. we have
	 * to call every handler as if we have no NMI source.
	 * On the other hand, if we do get non-zero vectors, we know exactly
	 * what the sources are. So we only call the handlers with the bit set.
	 */
	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
		pr_warn_ratelimited("NMI received with unknown source\n");
		has_unknown_src = true;
	}

	rcu_read_lock();
	/* Bit 0 is for unknown NMI sources, skip it. */
	for_each_set_bit_from(vec, &source_bitmask, NR_NMI_SOURCE_VECTORS) {
		a = rcu_dereference(nmiaction_src_table[vec]);
		if (!a) {
			pr_warn_ratelimited("NMI received %d no handler\n", vec);
			continue;
		}
		handled += do_handle_nmi(a, regs, type);
		/*
		 * Needs polling if unknown source bit is set, handled_mask is
		 * used to tell the polling code which NMIs can be skipped.
		 */
		if (has_unknown_src)
			*handled_mask |= BIT(vec);
	}
	rcu_read_unlock();

	return handled;
}

static int nmi_handle(unsigned int type, struct pt_regs *regs)
{
	struct nmi_desc *desc = nmi_to_desc(type);
	unsigned long handled_mask = 0;
	struct nmiaction *a;
	int handled = 0;

	/*
	 * Check if the NMI source handling is complete, otherwise polling is
	 * still required. handled_mask is non-zero if NMI source handling is
	 * partial due to unknown NMI sources.
	 */
	handled = nmi_handle_src(type, regs, &handled_mask);
	if (handled && !handled_mask)
		return handled;

	rcu_read_lock();
	/*
	 * NMIs are edge-triggered, which means if you have enough
	 * of them concurrently, you can lose some because only one
	 * can be latched at any given time.  Walk the whole list
	 * to handle those situations.
	 */
	list_for_each_entry_rcu(a, &desc->head, list) {
		/* Skip NMIs handled earlier with source info */
		if (BIT(a->source_vec) & handled_mask)
			continue;
		handled += do_handle_nmi(a, regs, type);
	}
	rcu_read_unlock();

	/* return total number of NMI events handled */
	return handled;
}
NOKPROBE_SYMBOL(nmi_handle);


> > +	rcu_read_lock();
> > +	/* Bit 0 is for unknown NMI sources, skip it. */
> > +	for_each_set_bit_from(vec, &source_bitmask,
> > NR_NMI_SOURCE_VECTORS) {
> > +		a = rcu_dereference(nmiaction_src_table[vec]);
> > +		if (!a) {
> > +			pr_warn_ratelimited("NMI received %d no
> > handler", vec);
> > +			continue;
> > +		}
> > +		handled += do_handle_nmi(a, regs, type);
> > +	}
> > +	rcu_read_unlock();
> > +	return handled;
> > +}
> > +  
> 
> That would mean that you would also need to return a bitmask of which 
> source vectors need to be handled with polling.

Shouldn't it rather be the bitmask of vectors that polling can skip? See
handled_mask in the code above.
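
A userspace sketch of the two-phase flow discussed above may help when reasoning about the mask semantics. It is not kernel code: `struct handler`, `handle_nmi()`, the `pending[]` flags and the three sample handlers are hypothetical stand-ins, the table lookup is a plain linear search rather than the RCU-protected `nmiaction_src_table`, and the "disable source reporting on an empty bitmap" path is left out. The mask holds the vectors that polling may skip, which is how `handled_mask` is used in the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SRC_VECTORS	16	/* assumption: 16-bit source bitmap */
#define SRC_VEC_UNKNOWN	0	/* bit 0 means "source unknown" */

struct handler {
	unsigned int source_vec;	/* 0: handler has no source vector */
	int (*fn)(void);		/* returns 1 if it consumed an event */
};

/* Hypothetical event sources: pending[i] is set when source i fired. */
static int pending[3], calls[3];

static int run(int i)
{
	calls[i]++;
	if (!pending[i])
		return 0;
	pending[i] = 0;
	return 1;
}

static int h_perf(void)   { return run(0); }
static int h_ipi(void)    { return run(1); }
static int h_legacy(void) { return run(2); }

static const struct handler handlers[] = {
	{ 1, h_perf }, { 2, h_ipi }, { 0, h_legacy },
};
#define NR_HANDLERS (sizeof(handlers) / sizeof(handlers[0]))

/* Phase 1: call only the handlers whose source bit is set. */
static int dispatch_by_source(uint16_t bitmap, uint16_t *skip_mask)
{
	int handled = 0;

	assert(skip_mask != NULL);
	for (unsigned int vec = 1; vec < NR_SRC_VECTORS; vec++) {
		if (!(bitmap & (1u << vec)))
			continue;
		for (size_t i = 0; i < NR_HANDLERS; i++) {
			if (handlers[i].source_vec != vec)
				continue;
			handled += handlers[i].fn();
			/* Record the vector only if polling will follow. */
			if (bitmap & (1u << SRC_VEC_UNKNOWN))
				*skip_mask |= (uint16_t)(1u << vec);
		}
	}
	return handled;
}

/* Phase 2: poll the rest unless phase 1 fully explained the NMI. */
static int handle_nmi(uint16_t bitmap)
{
	uint16_t skip_mask = 0;
	int handled = dispatch_by_source(bitmap, &skip_mask);

	if (handled && !skip_mask)
		return handled;

	for (size_t i = 0; i < NR_HANDLERS; i++) {
		if ((1u << handlers[i].source_vec) & skip_mask)
			continue;	/* already called in phase 1 */
		handled += handlers[i].fn();
	}
	return handled;
}
```

With only bit 1 set, `handle_nmi()` calls `h_perf` and nothing else. With bits 0 and 1 set, `h_perf` runs once in phase 1 and the polling pass then visits `h_ipi` and `h_legacy` while skipping `h_perf`.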



Thanks,

Jacob

* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-11 16:54 ` [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
  2024-06-12  2:32   ` Xin Li
@ 2024-06-21 22:23   ` Sohil Mehta
  2024-06-21 23:46     ` Jacob Pan
  1 sibling, 1 reply; 27+ messages in thread
From: Sohil Mehta @ 2024-06-21 22:23 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, linux-perf-users,
	Peter Zijlstra
  Cc: Andi Kleen, Xin Li

Hi Jacob,

On 6/11/2024 9:54 AM, Jacob Pan wrote:

> 
> The functionality of NMI source reporting is tied to the FRED. Although it
> is enumerated by a unique CPUID feature bit, it cannot be turned off
> independently once FRED is activated.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v2: Removed NMI source from static CPU ID dependency table (HPA)

I am not sure if this would work in all scenarios. See below.
Sorry, I couldn't chime in during the v1 review when this was suggested.


> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 4fa0b17e5043..465f04e4a79f 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1427,8 +1427,10 @@ early_param("fred", fred_setup);
>  
>  void __init trap_init(void)
>  {
> -	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred)
> +	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred) {
>  		setup_clear_cpu_cap(X86_FEATURE_FRED);
> +		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
> +	}
>  
>  	/* Init cpu_entry_area before IST entries are set up */
>  	setup_cpu_entry_areas();

I think this relies on the fact that whenever X86_FEATURE_NMI_SOURCE is
set, X86_FEATURE_FRED will also be set by the hardware. Though this
might be the expected behavior, hardware sometimes messes up and the
dependency entry in the static table would probably help catch that.

IIUC, when X86_FEATURE_NMI_SOURCE is set and X86_FEATURE_FRED is
cleared, cpu_feature_enabled(X86_FEATURE_FRED) will fail and the above
check would not end up clearing X86_FEATURE_NMI_SOURCE.

Isn't the following entry necessary to detect a misconfiguration or is
the purpose of the cpuid_deps table something else?

diff --git a/arch/x86/kernel/cpu/cpuid-deps.c
b/arch/x86/kernel/cpu/cpuid-deps.c
index b7d9f530ae16..39526041e91a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
        { X86_FEATURE_SHSTK,                    X86_FEATURE_XSAVES    },
        { X86_FEATURE_FRED,                     X86_FEATURE_LKGS      },
        { X86_FEATURE_FRED,                     X86_FEATURE_WRMSRNS   },
+       { X86_FEATURE_NMI_SOURCE,		X86_FEATURE_FRED      },
        {}
 };




* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-12  2:32   ` Xin Li
  2024-06-12  2:50     ` H. Peter Anvin
@ 2024-06-21 23:00     ` Sohil Mehta
  2024-06-28  5:00       ` Jacob Pan
  1 sibling, 1 reply; 27+ messages in thread
From: Sohil Mehta @ 2024-06-21 23:00 UTC (permalink / raw)
  To: Xin Li, Jacob Pan, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, linux-perf-users,
	Peter Zijlstra
  Cc: Andi Kleen, Xin Li


>> +config X86_NMI_SOURCE
> 
> Lets reuse X86_FRED instead of adding another hard config option. See
> below.
> 

I mostly agree with the suggestion here but there seems to be a bit of
confusion regarding feature availability and feature activation.

Availability and activation of X86_FEATURE_NMI_SOURCE depends on FRED
but not the other way around.

In other words, CONFIG_X86_NMI_SOURCE would only be useful if someone
wants to disable NMI_SOURCE even if both X86_FEATURE_FRED and
X86_FEATURE_NMI_SOURCE are available on a platform.

This seems unlikely to me. Reusing CONFIG_X86_FRED seems reasonable.

Sohil

* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-21 22:23   ` Sohil Mehta
@ 2024-06-21 23:46     ` Jacob Pan
  2024-06-22  1:08       ` Sohil Mehta
  0 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2024-06-21 23:46 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra,
	Andi Kleen, Xin Li, jacob.jun.pan


On Fri, 21 Jun 2024 15:23:51 -0700, Sohil Mehta <sohil.mehta@intel.com>
wrote:

> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 4fa0b17e5043..465f04e4a79f 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -1427,8 +1427,10 @@ early_param("fred", fred_setup);
> >  
> >  void __init trap_init(void)
> >  {
> > -	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred)
> > +	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred) {
> >  		setup_clear_cpu_cap(X86_FEATURE_FRED);
> > +		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
> > +	}
> >  
> >  	/* Init cpu_entry_area before IST entries are set up */
> >  	setup_cpu_entry_areas();  
> 
> I think this relies on the fact that whenever X86_FEATURE_NMI_SOURCE is
> set, X86_FEATURE_FRED will also be set by the hardware. Though this
> might be the expected behavior, hardware sometimes messes up and the
> dependency entry in the static table would probably help catch that.
> 
> IIUC, when X86_FEATURE_NMI_SOURCE is set and X86_FEATURE_FRED is
> cleared, cpu_feature_enabled(X86_FEATURE_FRED) will fail and the above
> check would not end up clearing X86_FEATURE_NMI_SOURCE.
> 
> Isn't the following entry necessary to detect a misconfiguration or is
> the purpose of the cpuid_deps table something else?
My understanding is that cpuid_deps exists to ensure CPU features are
cleared according to their dependency chain, not to catch HW bugs/quirks.

> 
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c
> b/arch/x86/kernel/cpu/cpuid-deps.c
> index b7d9f530ae16..39526041e91a 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>         { X86_FEATURE_SHSTK,                    X86_FEATURE_XSAVES    },
>         { X86_FEATURE_FRED,                     X86_FEATURE_LKGS      },
>         { X86_FEATURE_FRED,                     X86_FEATURE_WRMSRNS   },
> +       { X86_FEATURE_NMI_SOURCE,		X86_FEATURE_FRED      },
>         {}
>  };
If FRED is never reported by CPUID, then there would not be any calls to
setup_clear_cpu_cap(X86_FEATURE_FRED), so this table does not help clear
the dependent NMI_SOURCE, right?

In the next version, I will add a runtime disable for the case where HW
malfunctions, i.e. delivers no valid bitmask.

Maybe we can also add a big WARN_ON like this:
if (WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_FRED) &&
		 cpu_feature_enabled(X86_FEATURE_NMI_SOURCE)))
	setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
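
The gap under discussion, and what the extra check would close, can be modelled in a few lines of userspace C. This is only a sketch: the `CAP_*` bit positions and both function names are made up for illustration, and a plain `uint32_t` stands in for the kernel's capability bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define CAP_FRED	(1u << 0)
#define CAP_NMI_SOURCE	(1u << 1)

/* Model of the v2 trap_init() hunk: both bits go when FRED is opted out. */
static uint32_t trap_init_caps(uint32_t caps, bool enable_fred)
{
	if ((caps & CAP_FRED) && !enable_fred)
		caps &= ~(CAP_FRED | CAP_NMI_SOURCE);
	return caps;
}

/* Model of the extra check: NMI_SOURCE without FRED is inconsistent. */
static uint32_t sanitize_nmi_source(uint32_t caps, bool *inconsistent)
{
	assert(inconsistent);
	*inconsistent = !(caps & CAP_FRED) && (caps & CAP_NMI_SOURCE);
	if (*inconsistent)
		caps &= ~CAP_NMI_SOURCE;
	return caps;
}
```

Calling `trap_init_caps(CAP_NMI_SOURCE, false)` leaves `CAP_NMI_SOURCE` set because the FRED test fails first, whereas `sanitize_nmi_source()` on the same input clears it and reports the inconsistency.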

Thanks,

Jacob

* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-21 23:46     ` Jacob Pan
@ 2024-06-22  1:08       ` Sohil Mehta
  2024-06-27 22:23         ` Jacob Pan
  0 siblings, 1 reply; 27+ messages in thread
From: Sohil Mehta @ 2024-06-22  1:08 UTC (permalink / raw)
  To: Jacob Pan
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra,
	Andi Kleen, Xin Li


>> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c
>> b/arch/x86/kernel/cpu/cpuid-deps.c
>> index b7d9f530ae16..39526041e91a 100644
>> --- a/arch/x86/kernel/cpu/cpuid-deps.c
>> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
>> @@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
>>         { X86_FEATURE_SHSTK,                    X86_FEATURE_XSAVES    },
>>         { X86_FEATURE_FRED,                     X86_FEATURE_LKGS      },
>>         { X86_FEATURE_FRED,                     X86_FEATURE_WRMSRNS   },
>> +       { X86_FEATURE_NMI_SOURCE,		X86_FEATURE_FRED      },
>>         {}
>>  };
> If FRED is never reported by CPUID, then there would not be any calls to
> setup_clear_cpu_cap(X86_FEATURE_FRED), so this table does not help clear
> the dependent NMI_SOURCE, right?
> 

I thought there was a common function for all features. I expected it to
go through each feature and clear the ones whose dependency is missing.
But I can't find it easily. Maybe someone else knows this better.

However, anytime do_clear_cpu_cap() is called for any feature it does
the below and scans the cpuid_deps table to clear all features with
missing dependencies. That would cause X86_FEATURE_NMI_SOURCE to be
cleared one way or another.


	/* Loop until we get a stable state. */
	do {
		changed = false;
		for (d = cpuid_deps; d->feature; d++) {
			if (!test_bit(d->depends, disable))
				continue;
			if (__test_and_set_bit(d->feature, disable))
				continue;

			changed = true;
			clear_feature(c, d->feature);
		}
	} while (changed);


> In the next version, I will add runtime disable if HW malfunctions. i.e. no
> valid bitmask.
> 

I don't think we do this for other features that have a missing
dependency. It doesn't seem NMI source is any different from them.

> Maybe we can also add a big WARN_ON like this:
> if (WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_FRED) &&
> 		cpu_feature_enabled(X86_FEATURE_NMI_SOURCE)) 
> 	setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
> 
> Thanks,
> 
> Jacob


* Re: [PATCH v2 2/6] x86/irq: Extend NMI handler registration interface to include source
  2024-06-11 16:54 ` [PATCH v2 2/6] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
@ 2024-06-24 23:16   ` Sohil Mehta
  2024-06-28  4:56     ` Jacob Pan
  0 siblings, 1 reply; 27+ messages in thread
From: Sohil Mehta @ 2024-06-24 23:16 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, linux-perf-users,
	Peter Zijlstra
  Cc: Andi Kleen, Xin Li

On 6/11/2024 9:54 AM, Jacob Pan wrote:
> Add a source vector argument to register_nmi_handler() such that designated
> NMI originators can leverage NMI source reporting feature. For those who
> do not use NMI source reporting, 0 (unknown) is used as the source vector. NMI
> source vectors (up to 16) are pre-defined.
> 

What determines whether a given source supports the new reporting? It
might be useful to add that reasoning to the commit message as well.

I am guessing there is some connection to NMI_LOCAL based on
use_nmi_source() definition but I am not sure.

Also, would it be worthwhile to split this patch into 2? One part that
extends the register_nmi_handler() API and another that allocates the
source vectors to certain sources.


> +static inline bool use_nmi_source(unsigned int type, struct nmiaction *a)
> +{
> +	return (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) &&
> +		type == NMI_LOCAL && a->source_vec);
> +}
> +

Sohil

* Re: [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler
  2024-06-12 21:54     ` Jacob Pan
@ 2024-06-24 23:38       ` Sohil Mehta
  0 siblings, 0 replies; 27+ messages in thread
From: Sohil Mehta @ 2024-06-24 23:38 UTC (permalink / raw)
  To: Jacob Pan, H. Peter Anvin
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, Ingo Molnar,
	Borislav Petkov, linux-perf-users, Peter Zijlstra, Andi Kleen,
	Xin Li


On 6/12/2024 2:54 PM, Jacob Pan wrote:
> Hi H.,
> 
> On Tue, 11 Jun 2024 11:41:07 -0700, "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
>> On 6/11/24 09:54, Jacob Pan wrote:
>>> +
>>> +	source_bitmask = fred_event_data(regs);
>>> +	if (!source_bitmask) {
>>> +		pr_warn_ratelimited("NMI without source information!
>>> Disable source reporting.\n");
>>> +		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
>>> +		return 0;
>>> +	}  
>>
>> Is setup_clear_cpu_cap() even meaningful here?
> Right, alternative patching doesn't work here. Let me use a separate flag.
> 

You mentioned this somewhere:
"The functionality of NMI source reporting is tied to the FRED. Although
it is enumerated by a unique CPUID feature bit, it cannot be turned off
independently once FRED is activated."

Does this have any implication here? What does disable source reporting
mean if it cannot be turned off?



* Re: [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler
  2024-06-11 16:54 ` [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler Jacob Pan
  2024-06-11 18:41   ` H. Peter Anvin
@ 2024-06-24 23:53   ` Sohil Mehta
  1 sibling, 0 replies; 27+ messages in thread
From: Sohil Mehta @ 2024-06-24 23:53 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, linux-perf-users,
	Peter Zijlstra
  Cc: Andi Kleen, Xin Li


> +	/*
> +	 * Per NMI source specification, there is no guarantee that a valid
> +	 * NMI vector is always delivered, even when the source specified
> +	 * one. It is software's responsibility to check all available NMI
> +	 * sources when bit 0 is set in the NMI source bitmap. i.e. we have
> +	 * to call every handler as if we have no NMI source.
> +	 * On the other hand, if we do get non-zero vectors, we know exactly
> +	 * what the sources are. So we only call the handlers with the bit set.
> +	 */

The use of "we" here can be a bit confusing. Writing this in an
imperative mood might make it easier to follow.

> +	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> +		pr_warn_ratelimited("NMI received with unknown source\n");
> +		return 0;
> +	}
> +

IIUC, bit 0 will be set for out-of-bounds vectors (>= 16) as well. I am
not sure how realistic that is, or whether it is even possible to detect.
I am wondering if there should be an explicit error message when such a
scenario happens.
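
To make the concern concrete, here is a small sketch of the vector-to-bitmap mapping as described in this message. The mapping itself is an assumption taken from the "IIUC" reading above, not something this thread confirms against the specification, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SRC_VECTORS	16	/* assumption: 16-bit source bitmap */

/* Assumed mapping: vectors 1..15 get their own bit, the rest share bit 0. */
static uint16_t source_vec_to_bit(unsigned int vec)
{
	if (vec >= 1 && vec < NR_SRC_VECTORS)
		return (uint16_t)(1u << vec);
	return (uint16_t)(1u << 0);
}

/* Fold several pending source vectors into one reported bitmap. */
static uint16_t source_bitmap(const unsigned int *vecs, size_t n)
{
	uint16_t bitmap = 0;

	assert(vecs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		bitmap |= source_vec_to_bit(vecs[i]);
	return bitmap;
}
```

Under this model an out-of-range vector such as 200 produces the same bit as vector 0, so software looking only at the bitmap cannot tell "unknown source" apart from "vector out of bounds".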


> +	rcu_read_lock();
> +	/* Bit 0 is for unknown NMI sources, skip it. */
> +	for_each_set_bit_from(vec, &source_bitmask, NR_NMI_SOURCE_VECTORS) {
> +		a = rcu_dereference(nmiaction_src_table[vec]);
> +		if (!a) {
> +			pr_warn_ratelimited("NMI received %d no handler", vec);
> +			continue;
> +		}
> +		handled += do_handle_nmi(a, regs, type);
> +	}
> +	rcu_read_unlock();
> +	return handled;
> +}
> +




* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-22  1:08       ` Sohil Mehta
@ 2024-06-27 22:23         ` Jacob Pan
  2024-06-27 23:20           ` Sohil Mehta
  0 siblings, 1 reply; 27+ messages in thread
From: Jacob Pan @ 2024-06-27 22:23 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra,
	Andi Kleen, Xin Li, jacob.jun.pan


On Fri, 21 Jun 2024 18:08:14 -0700, Sohil Mehta <sohil.mehta@intel.com>
wrote:

> >> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c
> >> b/arch/x86/kernel/cpu/cpuid-deps.c
> >> index b7d9f530ae16..39526041e91a 100644
> >> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> >> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> >> @@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
> > >>         { X86_FEATURE_SHSTK,                    X86_FEATURE_XSAVES    },
> > >>         { X86_FEATURE_FRED,                     X86_FEATURE_LKGS      },
> > >>         { X86_FEATURE_FRED,                     X86_FEATURE_WRMSRNS   },
> > >> +       { X86_FEATURE_NMI_SOURCE,		X86_FEATURE_FRED      },
> > >>         {}
> >>  };  
> > If FRED is never reported by CPUID, then there would not be any calls to
> > setup_clear_cpu_cap(X86_FEATURE_FRED), so this table does not help clear
> > the dependent NMI_SOURCE, right?
> >   
> 
> I thought there was a common function for all features. I expected it to
> go through each feature and clear the ones whose dependency is missing.
> But I can't find it easily. Maybe someone else knows this better.
> 
> However, anytime do_clear_cpu_cap() is called for any feature it does
> the below and scans the cpuid_deps table to clear all features with
> missing dependencies. That would cause X86_FEATURE_NMI_SOURCE to be
> cleared one way or another.
I don't think this is true. For a simplified example:
cpuid_deps has the following feature-depends pairs.
[1, 3]
[2, 3]
now, do_clear_cpu_cap(c, 2)

Before the loop below, __set_bit(feature, disable) sets bit 2.

Since no other feature depends on 2, the loop below will not clear
any other features, no?

> 
> 
> 	/* Loop until we get a stable state. */
> 	do {
> 		changed = false;
> 		for (d = cpuid_deps; d->feature; d++) {
> 			if (!test_bit(d->depends, disable))
> 				continue;
> 			if (__test_and_set_bit(d->feature, disable))
> 				continue;
> 
> 			changed = true;
> 			clear_feature(c, d->feature);
> 		}
> 	} while (changed);
> 
> 
> > In the next version, I will add runtime disable if HW malfunctions.
> > i.e. no valid bitmask.
> >   
> 
> I don't think we do this for other features that have a missing
> dependency. It doesn't seem NMI source is any different from them.
> 
NMI source is an optimization with a fallback path *always* available. In
that sense, it can be disabled at runtime without losing functionality.

The closest analogy I can think of are timers for clocksources where we use
higher ranking/cheaper timers first, and only resort to other timers in
case the primary/optimal one fails.

e.g.
root@984fee003c4f:~/jacob# cat
/sys/devices/system/clocksource/clocksource0/available_clocksource  
tsc hpet acpi_pm


Thanks,

Jacob

* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-27 22:23         ` Jacob Pan
@ 2024-06-27 23:20           ` Sohil Mehta
  0 siblings, 0 replies; 27+ messages in thread
From: Sohil Mehta @ 2024-06-27 23:20 UTC (permalink / raw)
  To: Jacob Pan
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra,
	Andi Kleen, Xin Li

On 6/27/2024 3:23 PM, Jacob Pan wrote:

> I don't think this is true. For a simplified example:
> cpuid_deps has the following feature-depends pairs.
> [1, 3]
> [2, 3]
> now, do_clear_cpu_cap(c, 2)
> 
> Before the loop below __set_bit(feature, disable), bit 2 is set. 
> 
> Since there is no other features depend on 2, the loop below will not clear
> any other features. no?
> 

You are right. The table-scan only processes the dependencies of the
feature that is being disabled but not of any other features that might
have a missing dependency.
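
The behaviour agreed on here can be checked with a userspace model of the quoted loop. This is a sketch, not the kernel function: `struct dep` and `clear_cap()` are illustrative names, a `uint64_t` replaces the kernel bitmaps, and feature numbers are limited to 1..63 with 0 acting as the table terminator (as the `{}` entry does).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dep {
	unsigned int feature;	/* this feature ... */
	unsigned int depends;	/* ... requires this one */
};

/*
 * Clear @feature from @caps, then keep clearing anything that depends on
 * an already-disabled feature until nothing changes any more.
 */
static uint64_t clear_cap(uint64_t caps, const struct dep *deps,
			  unsigned int feature)
{
	uint64_t disable;
	bool changed;

	assert(feature > 0 && feature < 64);
	disable = 1ULL << feature;
	caps &= ~disable;

	do {
		changed = false;
		for (const struct dep *d = deps; d->feature; d++) {
			if (!(disable & (1ULL << d->depends)))
				continue;	/* dependency not being disabled */
			if (disable & (1ULL << d->feature))
				continue;	/* already handled */
			disable |= 1ULL << d->feature;
			caps &= ~(1ULL << d->feature);
			changed = true;
		}
	} while (changed);

	return caps;
}
```

With the table `{1,3}, {2,3}` and features 1 to 3 all present, clearing 2 removes only bit 2, while clearing 3 removes all three. If bit 3 was never set to begin with, clearing 2 still leaves bit 1 in place, which is the missing-dependency case the scan does not cover.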

It might be useful to have a common function that scans through
all the dependencies at boot. It would mainly help detect hardware (or
VMM) inconsistencies sooner rather than later.
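
One possible shape for such a scan, again as a userspace sketch under assumptions: `scan_deps()` is a hypothetical helper that does not exist in the kernel, `struct dep` mirrors the two fields used by the quoted loop, and feature numbers are limited to 1..63 with 0 terminating the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dep {
	unsigned int feature;	/* this feature ... */
	unsigned int depends;	/* ... requires this one */
};

/*
 * Drop every feature in *caps whose dependency is absent, repeating until
 * stable so that chains (A needs B, B needs C) are resolved as well.
 * Returns the mask of features that were dropped.
 */
static uint64_t scan_deps(uint64_t *caps, const struct dep *deps)
{
	uint64_t dropped = 0;
	bool changed;

	assert(caps);
	do {
		changed = false;
		for (const struct dep *d = deps; d->feature; d++) {
			if (!(*caps & (1ULL << d->feature)))
				continue;	/* feature not present */
			if (*caps & (1ULL << d->depends))
				continue;	/* dependency satisfied */
			*caps &= ~(1ULL << d->feature);
			dropped |= 1ULL << d->feature;
			changed = true;
		}
	} while (changed);

	return dropped;
}
```

A non-zero return value is the place where a boot-time warning about inconsistent hardware or VMM enumeration could be emitted.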


* Re: [PATCH v2 2/6] x86/irq: Extend NMI handler registration interface to include source
  2024-06-24 23:16   ` Sohil Mehta
@ 2024-06-28  4:56     ` Jacob Pan
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2024-06-28  4:56 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
	Ingo Molnar, Borislav Petkov, linux-perf-users, Peter Zijlstra,
	Andi Kleen, Xin Li, jacob.jun.pan


On Mon, 24 Jun 2024 16:16:52 -0700, Sohil Mehta <sohil.mehta@intel.com>
wrote:

> On 6/11/2024 9:54 AM, Jacob Pan wrote:
> > Add a source vector argument to register_nmi_handler() such that
> > designated NMI originators can leverage NMI source reporting feature.
> > For those who do not use NMI source reporting, 0 (unknown) is used as
> > the source vector. NMI source vectors (up to 16) are pre-defined.
> >   
> 
> What determines whether a source supports the new reporting vs some that
> don't? It might be useful to add that reasoning to the commit message as
> well.
> 
> I am guessing there is some connection to NMI_LOCAL based on
> use_nmi_source() definition but I am not sure.
Yes, this patch only enables NMI source reporting for local interrupts.
There is no use of MSIs delivered as NMI so far. Will add the rationale
here.

> Also, would it be worthwhile to split this patch into 2? One part that
> extents the register_nmi_handler() API and another that allocates the
> source vectors to certain sources.
Good point, will do.
> 
> > +static inline bool use_nmi_source(unsigned int type, struct nmiaction *a)
> > +{
> > +	return (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) &&
> > +		type == NMI_LOCAL && a->source_vec);
> > +}
> > +  
> 
> Sohil


Thanks,

Jacob

* Re: [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-21 23:00     ` Sohil Mehta
@ 2024-06-28  5:00       ` Jacob Pan
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Pan @ 2024-06-28  5:00 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: Xin Li, X86 Kernel, LKML, Thomas Gleixner, Dave Hansen,
	H. Peter Anvin, Ingo Molnar, Borislav Petkov, linux-perf-users,
	Peter Zijlstra, Andi Kleen, Xin Li, jacob.jun.pan


On Fri, 21 Jun 2024 16:00:47 -0700, Sohil Mehta <sohil.mehta@intel.com>
wrote:

> >> +config X86_NMI_SOURCE  
> > 
> > Lets reuse X86_FRED instead of adding another hard config option. See
> > below.
> >   
> 
> I mostly agree with the suggestion here but there seems to be a bit of
> confusion regarding feature availability and feature activation.
> 
> Availability and activation of X86_FEATURE_NMI_SOURCE depends on FRED
> but not the other way around.
> 
> In other words, CONFIG_X86_NMI_SOURCE would only be useful if someone
> wants to disable NMI_SOURCE even if both X86_FEATURE_FRED and
> X86_FEATURE_NMI_SOURCE are available on a platform.
> 
> This seems unlikely to me. Reusing CONFIG_X86_FRED seems reasonable.
Agreed, will remove CONFIG_X86_NMI_SOURCE.


Thanks,

Jacob

Thread overview: 27+ messages
2024-06-11 16:54 Jacob Pan
2024-06-11 16:54 ` [PATCH v2 1/6] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
2024-06-12  2:32   ` Xin Li
2024-06-12  2:50     ` H. Peter Anvin
2024-06-12  3:04       ` Xin Li
2024-06-21 23:00     ` Sohil Mehta
2024-06-28  5:00       ` Jacob Pan
2024-06-21 22:23   ` Sohil Mehta
2024-06-21 23:46     ` Jacob Pan
2024-06-22  1:08       ` Sohil Mehta
2024-06-27 22:23         ` Jacob Pan
2024-06-27 23:20           ` Sohil Mehta
2024-06-11 16:54 ` [PATCH v2 2/6] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
2024-06-24 23:16   ` Sohil Mehta
2024-06-28  4:56     ` Jacob Pan
2024-06-11 16:54 ` [PATCH v2 3/6] x86/irq: Factor out common NMI handling code Jacob Pan
2024-06-11 16:54 ` [PATCH v2 4/6] x86/irq: Process nmi sources in NMI handler Jacob Pan
2024-06-11 18:41   ` H. Peter Anvin
2024-06-12 21:54     ` Jacob Pan
2024-06-24 23:38       ` Sohil Mehta
2024-06-24 23:53   ` Sohil Mehta
2024-06-11 16:54 ` [PATCH v2 5/6] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
2024-06-11 19:10   ` H. Peter Anvin
2024-06-12 20:27     ` Jacob Pan
2024-06-11 16:54 ` [PATCH v2 6/6] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
2024-06-12  2:04 ` Sean Christopherson
2024-06-12  2:55   ` Re: Xin Li
