linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/11] Add support for NMI source reporting
@ 2024-06-28 20:18 Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
                   ` (10 more replies)
  0 siblings, 11 replies; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan

Hi Thomas and all,

Non-Maskable Interrupts (NMIs) are routed to the local Advanced Programmable
Interrupt Controller (APIC) using vector #2. Before the advent of the
Flexible Return and Event Delivery (FRED)[1], the vector information set by
the NMI initiator was disregarded or lost within the hardware, compelling
system software to poll every registered NMI handler to pinpoint the source
of the NMI[2]. This approach led to several issues:

1.	Inefficiency due to the CPU's time spent polling all handlers.
2.	Increased latency from the additional time taken to poll all handlers.
3.	The occurrence of unnecessary NMIs if they are triggered shortly
	after being processed by a different source.

To tackle these challenges, Intel introduced NMI source reporting as a part
of the FRED specification (detailed in Chapter 9). This CPU feature ensures
that while all NMI sources are still aggregated into NMI vector (#2) for
delivery, the source of the NMI is now conveyed through FRED event data
(a 16-bit bitmap on the stack). This allows for the selective dispatch
of the NMI source handler based on the bitmap, eliminating the need to
invoke all NMI source handlers indiscriminately.

In line with the hardware architecture, various interrupt sources can
generate NMIs by encoding an NMI delivery mode. However, this patchset
activates only the local NMI sources that are currently utilized by the
Linux kernel, which includes:

1.	Performance monitoring.
2.	Inter-Processor Interrupts (IPIs) for functions like CPU backtrace,
	machine check, Kernel GNU Debugger (KGDB), reboot, panic stop, and
	self-test.

Other NMI sources will continue to be handled as previously when the NMI
source is not utilized or remains unidentified.

Next steps:
1. KVM support
2. Optimization to reuse IDT NMI vector 2 as NMI source for "known" source.
Link:https://lore.kernel.org/lkml/746fecd5-4c79-42f9-919e-912ec415e73f@zytor.com/


[1] https://www.intel.com/content/www/us/en/content-details/779982/flexible-return-and-event-delivery-fred-specification.html
[2] https://lore.kernel.org/lkml/171011362209.2468526.15187874627966416701.tglx@xen13/


Thanks,

Jacob

---
V3:
	- Added KVM VMX patches to handle NMI exits (Sean)
	- Clean up in KVM for code reuse in PV IPI (patch 10 and 11)
	- Misc fixes based on reviews from HPA, Li Xin, and Sohil
	
Change logs are in individual patches.




Jacob Pan (9):
  x86/irq: Add enumeration of NMI source reporting CPU feature
  x86/irq: Define NMI source vectors
  x86/irq: Extend NMI handler registration interface to include source
  x86/irq: Factor out common NMI handling code
  x86/irq: Process nmi sources in NMI handler
  perf/x86: Enable NMI source reporting for perfmon
  x86/irq: Enable NMI source on IPIs delivered as NMI
  x86/irq: Move __prepare_ICR to x86 common header
  KVM: X86: Use common code for PV IPIs in linux guest

Zeng Guang (2):
  KVM: VMX: Expand FRED kvm entry with event data
  KVM: VMX: Handle NMI Source report in VM exit

 arch/x86/entry/entry_64_fred.S     |   2 +-
 arch/x86/events/amd/ibs.c          |   2 +-
 arch/x86/events/core.c             |   9 ++-
 arch/x86/events/intel/core.c       |   6 +-
 arch/x86/include/asm/apic.h        |  22 ++++++
 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/fred.h        |   8 +-
 arch/x86/include/asm/irq_vectors.h |  38 ++++++++++
 arch/x86/include/asm/nmi.h         |   4 +-
 arch/x86/kernel/apic/hw_nmi.c      |   5 +-
 arch/x86/kernel/apic/ipi.c         |   4 +-
 arch/x86/kernel/apic/local.h       |  16 ----
 arch/x86/kernel/cpu/mce/inject.c   |   4 +-
 arch/x86/kernel/cpu/mshyperv.c     |   2 +-
 arch/x86/kernel/kgdb.c             |   6 +-
 arch/x86/kernel/kvm.c              |  10 +--
 arch/x86/kernel/nmi.c              | 117 ++++++++++++++++++++++++++---
 arch/x86/kernel/nmi_selftest.c     |   7 +-
 arch/x86/kernel/reboot.c           |   4 +-
 arch/x86/kernel/smp.c              |   4 +-
 arch/x86/kernel/traps.c            |   4 +-
 arch/x86/kvm/vmx/vmx.c             |  13 +++-
 arch/x86/platform/uv/uv_nmi.c      |   4 +-
 drivers/acpi/apei/ghes.c           |   2 +-
 drivers/char/ipmi/ipmi_watchdog.c  |   2 +-
 drivers/edac/igen6_edac.c          |   2 +-
 drivers/watchdog/hpwdt.c           |   6 +-
 27 files changed, 224 insertions(+), 80 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 02/11] x86/irq: Define NMI source vectors Jacob Pan
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan

The lack of a mechanism to pinpoint the origins of Non-Maskable Interrupts
(NMIs) necessitates that the NMI vector 2 handler consults each NMI source
handler individually. This approach leads to inefficiencies, delays, and
the occurrence of unnecessary NMIs, thereby also constraining the potential
applications of NMIs.

A new CPU feature, known as NMI source reporting, has been introduced as
part of the Flexible Return and Event Delivery (FRED) spec. This feature
enables the NMI vector 2 handler to directly obtain information about the
NMI source from the FRED event data.

The functionality of NMI source reporting is tied to the FRED. Although it
is enumerated by a unique CPUID feature bit, it cannot be turned off
independently once FRED is activated.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v3: Removed CONFIG_X86_NMI_SOURCE (Li Xin, HPA, Sohil)
v2: Removed NMI source from static CPU ID dependency table (HPA)
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/traps.c            | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..ec78d361e685 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -327,6 +327,7 @@
 #define X86_FEATURE_FRED		(12*32+17) /* Flexible Return and Event Delivery */
 #define X86_FEATURE_LKGS		(12*32+18) /* "" Load "kernel" (userspace) GS */
 #define X86_FEATURE_WRMSRNS		(12*32+19) /* "" Non-serializing WRMSR */
+#define X86_FEATURE_NMI_SOURCE		(12*32+20) /* NMI source reporting */
 #define X86_FEATURE_AMX_FP16		(12*32+21) /* "" AMX fp16 Support */
 #define X86_FEATURE_AVX_IFMA            (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
 #define X86_FEATURE_LAM			(12*32+26) /* Linear Address Masking */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4fa0b17e5043..465f04e4a79f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1427,8 +1427,10 @@ early_param("fred", fred_setup);
 
 void __init trap_init(void)
 {
-	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred)
+	if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred) {
 		setup_clear_cpu_cap(X86_FEATURE_FRED);
+		setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
+	}
 
 	/* Init cpu_entry_area before IST entries are set up */
 	setup_cpu_entry_areas();
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 02/11] x86/irq: Define NMI source vectors
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-29 18:32   ` Xin Li
  2024-06-28 20:18 ` [PATCH v3 03/11] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan

When NMI-source reporting is supported, each logical processor maintains
a 16-bit NMI-source bitmap. It is up to the system software to assign NMI
sources for their matching vector (bit position) in the bitmap.

Notice that NMI source vector is in a different namespace than the IDT
vectors. Though they share the same programming interface/field in the
NMI originator.

This initial allocation of the NMI sources are limited to local NMIs in
that there is no external device NMI usage yet.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/irq_vectors.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 13aea8fc3d45..e4cd33bc4fef 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -105,6 +105,34 @@
 
 #define NR_VECTORS			 256
 
+/*
+ * The NMI senders specify the NMI source vector as an 8bit integer in their
+ * vector field with NMI delivery mode. A local APIC receiving an NMI will
+ * set the corresponding bit in a 16bit bitmask, which is accumulated until
+ * the NMI is delivered.
+ * When a sender didn't specify an NMI source vector the source vector will
+ * be 0, which will result in bit 0 of the bitmask being set. For out of
+ * bounds vectors >= 16 bit 0 will also be set.
+ * When bit 0 is set, system software must invoke all registered NMI handlers
+ * as if NMI source feature is not enabled.
+ *
+ * Vector 2 is reserved for matching IDT NMI vector where it may be hardcoded
+ * by some external devices.
+ *
+ * The NMI source vectors are sorted by descending priority with the exceptions
+ * of 0 and 2.
+ */
+#define NMI_SOURCE_VEC_UNKNOWN		0
+#define NMI_SOURCE_VEC_IPI_REBOOT	1	/* Crash reboot */
+#define NMI_SOURCE_VEC_IDT_NMI		2	/* Match IDT NMI vector 2 */
+#define NMI_SOURCE_VEC_IPI_SMP_STOP	3	/* Panic stop CPU */
+#define NMI_SOURCE_VEC_IPI_BT		4	/* CPU backtrace */
+#define NMI_SOURCE_VEC_PMI		5	/* PerfMon counters */
+#define NMI_SOURCE_VEC_IPI_KGDB		6	/* KGDB */
+#define NMI_SOURCE_VEC_IPI_MCE		7	/* MCE injection */
+#define NMI_SOURCE_VEC_IPI_TEST		8	/* For remote and local IPIs */
+#define NR_NMI_SOURCE_VECTORS		9
+
 #ifdef CONFIG_X86_LOCAL_APIC
 #define FIRST_SYSTEM_VECTOR		POSTED_MSI_NOTIFICATION_VECTOR
 #else
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 03/11] x86/irq: Extend NMI handler registration interface to include source
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 02/11] x86/irq: Define NMI source vectors Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 04/11] x86/irq: Factor out common NMI handling code Jacob Pan
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan

Add a source vector argument to register_nmi_handler() such that designated
NMI originators can leverage NMI source reporting feature. For those who
do not use NMI source reporting, 0 (unknown) is used as the source vector. NMI
source vectors (up to 16) are pre-defined.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

---
v3:
   - Move NMI source vector definitions to a separate patch (Sohil)

v2:(address review comments from HPA, not including optimizations"
   - Reserve IDT NMI vector 2 in case of devices use hardcoded vector 2
   - Sort NMI source vector by priority in descending order

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/events/amd/ibs.c         |  2 +-
 arch/x86/events/core.c            |  3 ++-
 arch/x86/include/asm/nmi.h        |  4 +++-
 arch/x86/kernel/apic/hw_nmi.c     |  2 +-
 arch/x86/kernel/cpu/mce/inject.c  |  2 +-
 arch/x86/kernel/cpu/mshyperv.c    |  2 +-
 arch/x86/kernel/kgdb.c            |  4 ++--
 arch/x86/kernel/nmi.c             | 22 ++++++++++++++++++++++
 arch/x86/kernel/nmi_selftest.c    |  5 +++--
 arch/x86/kernel/reboot.c          |  2 +-
 arch/x86/kernel/smp.c             |  2 +-
 arch/x86/platform/uv/uv_nmi.c     |  4 ++--
 drivers/acpi/apei/ghes.c          |  2 +-
 drivers/char/ipmi/ipmi_watchdog.c |  2 +-
 drivers/edac/igen6_edac.c         |  2 +-
 drivers/watchdog/hpwdt.c          |  6 +++---
 16 files changed, 46 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d62..20989071f59a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1246,7 +1246,7 @@ static __init int perf_event_ibs_init(void)
 	if (ret)
 		goto err_op;
 
-	ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
+	ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs", 0);
 	if (ret)
 		goto err_nmi;
 
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07b1ef1..1ef2201e48ac 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2100,7 +2100,8 @@ static int __init init_hw_perf_events(void)
 		x86_pmu.intel_ctrl = (1 << x86_pmu.num_counters) - 1;
 
 	perf_events_lapic_init();
-	register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI");
+
+	register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI", NMI_SOURCE_VEC_PMI);
 
 	unconstrained = (struct event_constraint)
 		__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 41a0ebb699ec..6fe26fea30eb 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -39,15 +39,17 @@ struct nmiaction {
 	u64			max_duration;
 	unsigned long		flags;
 	const char		*name;
+	unsigned int		source_vec;
 };
 
-#define register_nmi_handler(t, fn, fg, n, init...)	\
+#define register_nmi_handler(t, fn, fg, n, src, init...)	\
 ({							\
 	static struct nmiaction init fn##_na = {	\
 		.list = LIST_HEAD_INIT(fn##_na.list),	\
 		.handler = (fn),			\
 		.name = (n),				\
 		.flags = (fg),				\
+		.source_vec = (src),			\
 	};						\
 	__register_nmi_handler((t), &fn##_na);		\
 })
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 45af535c44a0..9f0125d3b8b0 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -54,7 +54,7 @@ NOKPROBE_SYMBOL(nmi_cpu_backtrace_handler);
 static int __init register_nmi_cpu_backtrace_handler(void)
 {
 	register_nmi_handler(NMI_LOCAL, nmi_cpu_backtrace_handler,
-				0, "arch_bt");
+				0, "arch_bt", NMI_SOURCE_VEC_IPI_BT);
 	return 0;
 }
 early_initcall(register_nmi_cpu_backtrace_handler);
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 94953d749475..365a03f11d06 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -769,7 +769,7 @@ static int __init inject_init(void)
 
 	debugfs_init();
 
-	register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0, "mce_notify");
+	register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0, "mce_notify", NMI_SOURCE_VEC_IPI_MCE);
 	mce_register_injector_chain(&inject_nb);
 
 	setup_inj_struct(&i_mce);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0fd57a8ba84..2fb9408a8ba9 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -486,7 +486,7 @@ static void __init ms_hyperv_init_platform(void)
 	}
 
 	register_nmi_handler(NMI_UNKNOWN, hv_nmi_unknown, NMI_FLAG_FIRST,
-			     "hv_nmi_unknown");
+			     "hv_nmi_unknown", 0);
 #endif
 
 #ifdef CONFIG_X86_IO_APIC
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 9c9faa1634fb..d167eb23cf13 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -603,12 +603,12 @@ int kgdb_arch_init(void)
 		goto out;
 
 	retval = register_nmi_handler(NMI_LOCAL, kgdb_nmi_handler,
-					0, "kgdb");
+					0, "kgdb", NMI_SOURCE_VEC_IPI_KGDB);
 	if (retval)
 		goto out1;
 
 	retval = register_nmi_handler(NMI_UNKNOWN, kgdb_nmi_handler,
-					0, "kgdb");
+					0, "kgdb", 0);
 
 	if (retval)
 		goto out2;
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index ed163c8c8604..1ebe93edba7a 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -86,6 +86,12 @@ static DEFINE_PER_CPU(struct nmi_stats, nmi_stats);
 
 static int ignore_nmis __read_mostly;
 
+/*
+ * Contains all actions registered by originators with source vector,
+ * excluding UNKNOWN NMI source vector 0.
+ */
+static struct nmiaction *nmiaction_src_table[NR_NMI_SOURCE_VECTORS - 1];
+
 int unknown_nmi_panic;
 /*
  * Prevent NMI reason port (0x61) being accessed simultaneously, can
@@ -163,6 +169,12 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
 }
 NOKPROBE_SYMBOL(nmi_handle);
 
+static inline bool use_nmi_source(unsigned int type, struct nmiaction *a)
+{
+	return (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) &&
+		type == NMI_LOCAL && a->source_vec);
+}
+
 int __register_nmi_handler(unsigned int type, struct nmiaction *action)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
@@ -173,6 +185,11 @@ int __register_nmi_handler(unsigned int type, struct nmiaction *action)
 
 	raw_spin_lock_irqsave(&desc->lock, flags);
 
+	if (use_nmi_source(type, action)) {
+		rcu_assign_pointer(nmiaction_src_table[action->source_vec], action);
+		pr_info("NMI source %d registered for %s\n", action->source_vec, action->name);
+	}
+
 	/*
 	 * Indicate if there are multiple registrations on the
 	 * internal NMI handler call chains (SERR and IO_CHECK).
@@ -210,6 +227,11 @@ void unregister_nmi_handler(unsigned int type, const char *name)
 		if (!strcmp(n->name, name)) {
 			WARN(in_nmi(),
 				"Trying to free NMI (%s) from NMI context!\n", n->name);
+			if (use_nmi_source(type, n)) {
+				rcu_assign_pointer(nmiaction_src_table[n->source_vec], NULL);
+				pr_info("NMI source %d unregistered for %s\n", n->source_vec, n->name);
+			}
+
 			list_del_rcu(&n->list);
 			found = n;
 			break;
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index e93a8545c74d..f014c8a66b0c 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -44,7 +44,7 @@ static void __init init_nmi_testsuite(void)
 {
 	/* trap all the unknown NMIs we may generate */
 	register_nmi_handler(NMI_UNKNOWN, nmi_unk_cb, 0, "nmi_selftest_unk",
-			__initdata);
+			0, __initdata);
 }
 
 static void __init cleanup_nmi_testsuite(void)
@@ -67,7 +67,8 @@ static void __init test_nmi_ipi(struct cpumask *mask)
 	unsigned long timeout;
 
 	if (register_nmi_handler(NMI_LOCAL, test_nmi_ipi_callback,
-				 NMI_FLAG_FIRST, "nmi_selftest", __initdata)) {
+				 NMI_FLAG_FIRST, "nmi_selftest", NMI_SOURCE_VEC_IPI_TEST,
+				 __initdata)) {
 		nmi_fail = FAILURE;
 		return;
 	}
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f3130f762784..acc19c1d3b4f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -910,7 +910,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
 	/* Would it be better to replace the trap vector here? */
 	if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
-				 NMI_FLAG_FIRST, "crash"))
+				 NMI_FLAG_FIRST, "crash", NMI_SOURCE_VEC_IPI_REBOOT))
 		return;		/* Return what? */
 	/*
 	 * Ensure the new callback function is set before sending
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 18266cc3d98c..f27469e40141 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -143,7 +143,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 static int register_stop_handler(void)
 {
 	return register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
-				    NMI_FLAG_FIRST, "smp_stop");
+				    NMI_FLAG_FIRST, "smp_stop", NMI_SOURCE_VEC_IPI_SMP_STOP);
 }
 
 static void native_stop_other_cpus(int wait)
diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 5c50e550ab63..473c34eb264c 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -1029,10 +1029,10 @@ static int uv_handle_nmi_ping(unsigned int reason, struct pt_regs *regs)
 
 static void uv_register_nmi_notifier(void)
 {
-	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
+	if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv", 0))
 		pr_warn("UV: NMI handler failed to register\n");
 
-	if (register_nmi_handler(NMI_LOCAL, uv_handle_nmi_ping, 0, "uvping"))
+	if (register_nmi_handler(NMI_LOCAL, uv_handle_nmi_ping, 0, "uvping", 0))
 		pr_warn("UV: PING NMI handler failed to register\n");
 }
 
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 623cc0cb4a65..393dca95d2b3 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1318,7 +1318,7 @@ static void ghes_nmi_add(struct ghes *ghes)
 {
 	mutex_lock(&ghes_list_mutex);
 	if (list_empty(&ghes_nmi))
-		register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
+		register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes", 0);
 	list_add_rcu(&ghes->list, &ghes_nmi);
 	mutex_unlock(&ghes_list_mutex);
 }
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index 9a459257489f..61bb5dcade5a 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -1272,7 +1272,7 @@ static void check_parms(void)
 	}
 	if (do_nmi && !nmi_handler_registered) {
 		rv = register_nmi_handler(NMI_UNKNOWN, ipmi_nmi, 0,
-						"ipmi");
+						"ipmi", 0);
 		if (rv) {
 			pr_warn("Can't register nmi handler\n");
 			return;
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index dbe9fe5f2ca6..891278245d8b 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1321,7 +1321,7 @@ static int register_err_handler(void)
 	}
 
 	rc = register_nmi_handler(NMI_SERR, ecclog_nmi_handler,
-				  0, IGEN6_NMI_NAME);
+				  0, IGEN6_NMI_NAME, 0);
 	if (rc) {
 		igen6_printk(KERN_ERR, "Failed to register NMI handler\n");
 		return rc;
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index ae30e394d176..5246706afcf6 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -242,13 +242,13 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
 	/*
 	 * Only one function can register for NMI_UNKNOWN
 	 */
-	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout, 0, "hpwdt");
+	retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout, 0, "hpwdt", 0);
 	if (retval)
 		goto error;
-	retval = register_nmi_handler(NMI_SERR, hpwdt_pretimeout, 0, "hpwdt");
+	retval = register_nmi_handler(NMI_SERR, hpwdt_pretimeout, 0, "hpwdt", 0);
 	if (retval)
 		goto error1;
-	retval = register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt");
+	retval = register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt", 0);
 	if (retval)
 		goto error2;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 04/11] x86/irq: Factor out common NMI handling code
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
                   ` (2 preceding siblings ...)
  2024-06-28 20:18 ` [PATCH v3 03/11] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-29  0:31   ` Xin Li
  2024-06-28 20:18 ` [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler Jacob Pan
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan

In preparation for handling NMIs with explicit source reporting, factor
out common code for reuse.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/kernel/nmi.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 1ebe93edba7a..639a34e78bc9 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -135,6 +135,20 @@ static void nmi_check_duration(struct nmiaction *action, u64 duration)
 		action->handler, duration, decimal_msecs);
 }
 
+static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsigned int type)
+{
+	int thishandled;
+	u64 delta;
+
+	delta = sched_clock();
+	thishandled = a->handler(type, regs);
+	delta = sched_clock() - delta;
+	trace_nmi_handler(a->handler, (int)delta, thishandled);
+	nmi_check_duration(a, delta);
+
+	return thishandled;
+}
+
 static int nmi_handle(unsigned int type, struct pt_regs *regs)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
@@ -149,18 +163,8 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
 	 * can be latched at any given time.  Walk the whole list
 	 * to handle those situations.
 	 */
-	list_for_each_entry_rcu(a, &desc->head, list) {
-		int thishandled;
-		u64 delta;
-
-		delta = sched_clock();
-		thishandled = a->handler(type, regs);
-		handled += thishandled;
-		delta = sched_clock() - delta;
-		trace_nmi_handler(a->handler, (int)delta, thishandled);
-
-		nmi_check_duration(a, delta);
-	}
+	list_for_each_entry_rcu(a, &desc->head, list)
+		handled += do_handle_nmi(a, regs, type);
 
 	rcu_read_unlock();
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
                   ` (3 preceding siblings ...)
  2024-06-28 20:18 ` [PATCH v3 04/11] x86/irq: Factor out common NMI handling code Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-29  3:39   ` Xin Li
  2024-07-01 14:31   ` Nikolay Borisov
  2024-06-28 20:18 ` [PATCH v3 06/11] KVM: VMX: Expand FRED kvm entry with event data Jacob Pan
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan

With NMI source reporting enabled, NMI handler can prioritize the
handling of sources reported explicitly. If the source is unknown, then
resume the existing processing flow. i.e. invoke all NMI handlers.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

---
v3:
   - Use a static flag to disable NMIs in case of HW failure
   - Optimize the case when unknown NMIs are mixed with known NMIs(HPA)
v2:
   - Disable NMI source reporting once garbage data is given in FRED
return stack. (HPA)
---
 arch/x86/kernel/nmi.c | 73 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 70 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 639a34e78bc9..c3a10af7f26b 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -149,23 +149,90 @@ static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsig
 	return thishandled;
 }
 
+static int nmi_handle_src(unsigned int type, struct pt_regs *regs, unsigned long *handled_mask)
+{
+	static bool nmi_source_disabled;
+	bool has_unknown_src = false;
+	unsigned long source_bitmask;
+	struct nmiaction *a;
+	int handled = 0;
+	int vec = 1;
+
+	if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) ||
+	    type != NMI_LOCAL || nmi_source_disabled)
+		return 0;
+
+	source_bitmask = fred_event_data(regs);
+	if (!source_bitmask) {
+		pr_warn("NMI received without source information! Disable source reporting.\n");
+		nmi_source_disabled = true;
+		return 0;
+	}
+
+	/*
+	 * Per NMI source specification, there is no guarantee that a valid
+	 * NMI vector is always delivered, even when the source specified
+	 * one. It is software's responsibility to check all available NMI
+	 * sources when bit 0 is set in the NMI source bitmap. i.e. we have
+	 * to call every handler as if we have no NMI source.
+	 * On the other hand, if we do get non-zero vectors, we know exactly
+	 * what the sources are. So we only call the handlers with the bit set.
+	 */
+	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
+		pr_warn_ratelimited("NMI received with unknown source\n");
+		has_unknown_src = true;
+	}
+
+	rcu_read_lock();
+	/* Bit 0 is for unknown NMI sources, skip it. */
+	for_each_set_bit_from(vec, &source_bitmask, NR_NMI_SOURCE_VECTORS) {
+		a = rcu_dereference(nmiaction_src_table[vec]);
+		if (!a) {
+			pr_warn_ratelimited("NMI received %d no handler", vec);
+			continue;
+		}
+		handled += do_handle_nmi(a, regs, type);
+		/*
+		 * Needs polling if unknown source bit is set, handled_mask is
+		 * used to tell the polling code which NMIs can be skipped.
+		 */
+		if (has_unknown_src)
+			*handled_mask |= BIT(vec);
+	}
+	rcu_read_unlock();
+
+	return handled;
+}
+
 static int nmi_handle(unsigned int type, struct pt_regs *regs)
 {
 	struct nmi_desc *desc = nmi_to_desc(type);
+	unsigned long handled_mask = 0;
 	struct nmiaction *a;
 	int handled=0;
 
-	rcu_read_lock();
+	/*
+	 * Check if the NMI source handling is complete, otherwise polling is
+	 * still required. handled_mask is non-zero if NMI source handling is
+	 * partial due to unknown NMI sources.
+	 */
+	handled = nmi_handle_src(type, regs, &handled_mask);
+	if (handled && !handled_mask)
+		return handled;
 
+	rcu_read_lock();
 	/*
 	 * NMIs are edge-triggered, which means if you have enough
 	 * of them concurrently, you can lose some because only one
 	 * can be latched at any given time.  Walk the whole list
 	 * to handle those situations.
 	 */
-	list_for_each_entry_rcu(a, &desc->head, list)
+	list_for_each_entry_rcu(a, &desc->head, list) {
+		/* Skip NMIs handled earlier with source info */
+		if (BIT(a->source_vec) & handled_mask)
+			continue;
 		handled += do_handle_nmi(a, regs, type);
-
+	}
 	rcu_read_unlock();
 
 	/* return total number of NMI events handled */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 06/11] KVM: VMX: Expand FRED kvm entry with event data
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
                   ` (4 preceding siblings ...)
  2024-06-28 20:18 ` [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-29  4:01   ` Xin Li
  2024-06-28 20:18 ` [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Zeng Guang, Jacob Pan

From: Zeng Guang <guang.zeng@intel.com>

For VM exits caused by events (NMI, #DB, and #PF) delivered by FRED, the
event data is saved in the exit-qualification field. (FRED spec. 10.6.2)

Expand FRED KVM entry interface to include the event data obtained from
the exit qualification.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/fred.h | 8 ++++----
 arch/x86/kvm/vmx/vmx.c      | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index e86c7ba32435..15f5d2eabd1d 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -63,14 +63,14 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
 
 void asm_fred_entrypoint_user(void);
 void asm_fred_entrypoint_kernel(void);
-void asm_fred_entry_from_kvm(struct fred_ss);
+void asm_fred_entry_from_kvm(struct fred_ss, unsigned long edata);
 
 __visible void fred_entry_from_user(struct pt_regs *regs);
 __visible void fred_entry_from_kernel(struct pt_regs *regs);
 __visible void __fred_entry_from_kvm(struct pt_regs *regs);
 
 /* Can be called from noinstr code, thus __always_inline */
-static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector)
+static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector, unsigned long edata)
 {
 	struct fred_ss ss = {
 		.ss     =__KERNEL_DS,
@@ -80,7 +80,7 @@ static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int
 		.lm     = 1,
 	};
 
-	asm_fred_entry_from_kvm(ss);
+	asm_fred_entry_from_kvm(ss, edata);
 }
 
 void cpu_init_fred_exceptions(void);
@@ -90,7 +90,7 @@ void fred_complete_exception_setup(void);
 static __always_inline unsigned long fred_event_data(struct pt_regs *regs) { return 0; }
 static inline void cpu_init_fred_exceptions(void) { }
 static inline void fred_complete_exception_setup(void) { }
-static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector) { }
+static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector, unsigned long edata) { }
 #endif /* CONFIG_X86_FRED */
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b3c83c06f826..4e7b36081b76 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7024,7 +7024,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
 
 	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
 	if (cpu_feature_enabled(X86_FEATURE_FRED))
-		fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
+		fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector, 0);
 	else
 		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
 	kvm_after_interrupt(vcpu);
@@ -7332,7 +7332,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	    is_nmi(vmx_get_intr_info(vcpu))) {
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
 		if (cpu_feature_enabled(X86_FEATURE_FRED))
-			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR);
+			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 0);
 		else
 			vmx_do_nmi_irqoff();
 		kvm_after_interrupt(vcpu);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
                   ` (5 preceding siblings ...)
  2024-06-28 20:18 ` [PATCH v3 06/11] KVM: VMX: Expand FRED kvm entry with event data Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-29  4:07   ` Xin Li
  2024-06-30 13:04   ` Zeng Guang
  2024-06-28 20:18 ` [PATCH v3 08/11] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Zeng Guang, Jacob Pan

From: Zeng Guang <guang.zeng@intel.com>

If the "NMI exiting" VM-execution control is 1, the value of the 16-bit NMI
source vector is saved in the exit-qualification field in the VMCS when VM
exits occur on CPUs that support NMI source.

KVM that is aware of NMI-source reporting will push the bitmask of NMI source
vectors as the exceptoin event data field on the stack for then entry of FRED
exception. Subsequently, the host NMI exception handler is invoked which
will process NMI source information in the event data. This operation is
independent of vCPU FRED enabling status.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/entry/entry_64_fred.S |  2 +-
 arch/x86/kvm/vmx/vmx.c         | 11 ++++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index a02bc6f3d2e6..0d934a3fcaf8 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -92,7 +92,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
 	 * +--------+-----------------+
 	 */
 	push $0				/* Reserved, must be 0 */
-	push $0				/* Event data, 0 for IRQ/NMI */
+	push %rsi			/* Event data for IRQ/NMI */
 	push %rdi			/* fred_ss handed in by the caller */
 	push %rbp
 	pushf
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4e7b36081b76..6719c598fa5f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7331,10 +7331,15 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
 	    is_nmi(vmx_get_intr_info(vcpu))) {
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-		if (cpu_feature_enabled(X86_FEATURE_FRED))
-			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 0);
-		else
+		if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+			unsigned long edata = 0;
+
+			if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
+				edata = vmx_get_exit_qual(vcpu);
+			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, edata);
+		} else {
 			vmx_do_nmi_irqoff();
+		}
 		kvm_after_interrupt(vcpu);
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 08/11] perf/x86: Enable NMI source reporting for perfmon
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
                   ` (6 preceding siblings ...)
  2024-06-28 20:18 ` [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-07-04 14:44   ` Liang, Kan
  2024-06-28 20:18 ` [PATCH v3 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan, Zeng Guang

Program the designated NMI source vector into the performance monitoring
interrupt (PMI) of the local vector table. PMI handler will be directly
invoked when its NMI is generated. This avoids the latency of calling all
NMI handlers blindly.

Co-developed-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

---
v3: Program NMI source vector in PVTPC unconditionally (HPA)
v2: Fix a compile error apic_perfmon_ctr is undefined in i386 config
---
 arch/x86/events/core.c       | 6 ++++--
 arch/x86/events/intel/core.c | 6 +++---
 arch/x86/include/asm/apic.h  | 1 +
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1ef2201e48ac..be75bdcdd400 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -46,6 +46,7 @@
 
 struct x86_pmu x86_pmu __read_mostly;
 static struct pmu pmu;
+u32 apic_perfmon_ctr = APIC_DM_NMI;
 
 DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
 	.enabled = 1,
@@ -1680,7 +1681,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
 	 * This generic handler doesn't seem to have any issues where the
 	 * unmasking occurs so it was left at the top.
 	 */
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	apic_write(APIC_LVTPC, apic_perfmon_ctr);
 
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
 		if (!test_bit(idx, cpuc->active_mask))
@@ -1723,7 +1724,8 @@ void perf_events_lapic_init(void)
 	/*
 	 * Always use NMI for PMU
 	 */
-	apic_write(APIC_LVTPC, APIC_DM_NMI);
+	apic_perfmon_ctr |= NMI_SOURCE_VEC_PMI;
+	apic_write(APIC_LVTPC, apic_perfmon_ctr);
 }
 
 static int
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 38c1b1f1deaa..b4a70457c678 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3093,7 +3093,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	 * NMI handler.
 	 */
 	if (!late_ack && !mid_ack)
-		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		apic_write(APIC_LVTPC, apic_perfmon_ctr);
 	intel_bts_disable_local();
 	cpuc->enabled = 0;
 	__intel_pmu_disable_all(true);
@@ -3130,7 +3130,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 
 done:
 	if (mid_ack)
-		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		apic_write(APIC_LVTPC, apic_perfmon_ctr);
 	/* Only restore PMU state when it's active. See x86_pmu_disable(). */
 	cpuc->enabled = pmu_enabled;
 	if (pmu_enabled)
@@ -3143,7 +3143,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	 * Haswell CPUs.
 	 */
 	if (late_ack)
-		apic_write(APIC_LVTPC, APIC_DM_NMI);
+		apic_write(APIC_LVTPC, apic_perfmon_ctr);
 	return handled;
 }
 
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 9327eb00e96d..bcf8d17240c8 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -49,6 +49,7 @@ static inline void x86_32_probe_apic(void) { }
 #endif
 
 extern u32 cpuid_to_apicid[];
+extern u32 apic_perfmon_ctr;
 
 #define CPU_ACPIID_INVALID	U32_MAX
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
                   ` (7 preceding siblings ...)
  2024-06-28 20:18 ` [PATCH v3 08/11] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 10/11] x86/irq: Move __prepare_ICR to x86 common header Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 11/11] KVM: X86: Use common code for PV IPIs in linux guest Jacob Pan
  10 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan

Program designated NMI source vectors for all NMI delivered IPIs
such that their handlers can be selectively invoked.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/irq_vectors.h | 10 ++++++++++
 arch/x86/kernel/apic/hw_nmi.c      |  3 ++-
 arch/x86/kernel/apic/ipi.c         |  4 ++--
 arch/x86/kernel/apic/local.h       | 18 ++++++++++++------
 arch/x86/kernel/cpu/mce/inject.c   |  2 +-
 arch/x86/kernel/kgdb.c             |  2 +-
 arch/x86/kernel/nmi_selftest.c     |  2 +-
 arch/x86/kernel/reboot.c           |  2 +-
 arch/x86/kernel/smp.c              |  2 +-
 9 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index e4cd33bc4fef..4cedebdc1afb 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -133,6 +133,16 @@
 #define NMI_SOURCE_VEC_IPI_TEST		8	/* For remote and local IPIs */
 #define NR_NMI_SOURCE_VECTORS		9
 
+/*
+ * When programming the local APIC, IDT NMI vector and NMI source vector
+ * are encoded in a single 32 bit variable. The top 16 bits contain
+ * the NMI source vector and the bottom 16 bits contain NMI_VECTOR (2)
+ * The top 16 bits are always zero when NMI source feature is not enabled
+ * or the caller does not use NMI source.
+ */
+#define NMI_VECTOR_WITH_SOURCE(src)	(NMI_VECTOR | (src << 16))
+#define NMI_SOURCE_VEC_MASK		GENMASK(15, 0)
+
 #ifdef CONFIG_X86_LOCAL_APIC
 #define FIRST_SYSTEM_VECTOR		POSTED_MSI_NOTIFICATION_VECTOR
 #else
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 9f0125d3b8b0..f73ca95d961e 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -20,6 +20,7 @@
 #include <linux/nmi.h>
 #include <linux/init.h>
 #include <linux/delay.h>
+#include <asm/irq_vectors.h>
 
 #include "local.h"
 
@@ -33,7 +34,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
 #ifdef arch_trigger_cpumask_backtrace
 static void nmi_raise_cpu_backtrace(cpumask_t *mask)
 {
-	__apic_send_IPI_mask(mask, NMI_VECTOR);
+	__apic_send_IPI_mask(mask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_BT));
 }
 
 void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 5da693d633b7..9d2b18e58758 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -157,7 +157,7 @@ static void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
 	 * issues where otherwise the system hangs when the panic CPU tries
 	 * to stop the others before launching the kdump kernel.
 	 */
-	if (unlikely(vector == NMI_VECTOR))
+	if (unlikely(is_nmi_vector(vector)))
 		apic_mem_wait_icr_idle_timeout();
 	else
 		apic_mem_wait_icr_idle();
@@ -174,7 +174,7 @@ void __default_send_IPI_dest_field(unsigned int dest_mask, int vector,
 				   unsigned int dest_mode)
 {
 	/* See comment in __default_send_IPI_shortcut() */
-	if (unlikely(vector == NMI_VECTOR))
+	if (unlikely(is_nmi_vector(vector)))
 		apic_mem_wait_icr_idle_timeout();
 	else
 		apic_mem_wait_icr_idle();
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index 842fe28496be..60e90b7bf058 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -12,6 +12,7 @@
 
 #include <asm/irq_vectors.h>
 #include <asm/apic.h>
+#include <asm/nmi.h>
 
 /* X2APIC */
 void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
@@ -26,19 +27,24 @@ extern u32 x2apic_max_apicid;
 
 DECLARE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);
 
+static inline bool is_nmi_vector(int vector)
+{
+	return (vector & NMI_SOURCE_VEC_MASK) == NMI_VECTOR;
+}
+
 static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
 					 unsigned int dest)
 {
 	unsigned int icr = shortcut | dest;
 
-	switch (vector) {
-	default:
-		icr |= APIC_DM_FIXED | vector;
-		break;
-	case NMI_VECTOR:
+	if (is_nmi_vector(vector)) {
 		icr |= APIC_DM_NMI;
-		break;
+		if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
+			icr |= vector >> 16;
+	} else {
+		icr |= APIC_DM_FIXED | vector;
 	}
+
 	return icr;
 }
 
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 365a03f11d06..07bc6c29bd83 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -270,7 +270,7 @@ static void __maybe_unused raise_mce(struct mce *m)
 					mce_irq_ipi, NULL, 0);
 				preempt_enable();
 			} else if (m->inject_flags & MCJ_NMI_BROADCAST)
-				__apic_send_IPI_mask(mce_inject_cpumask, NMI_VECTOR);
+				__apic_send_IPI_mask(mce_inject_cpumask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_MCE));
 		}
 		start = jiffies;
 		while (!cpumask_empty(mce_inject_cpumask)) {
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index d167eb23cf13..02198cf9fe21 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -416,7 +416,7 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
  */
 void kgdb_roundup_cpus(void)
 {
-	apic_send_IPI_allbutself(NMI_VECTOR);
+	apic_send_IPI_allbutself(NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_KGDB));
 }
 #endif
 
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index f014c8a66b0c..5aa122d3368c 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -76,7 +76,7 @@ static void __init test_nmi_ipi(struct cpumask *mask)
 	/* sync above data before sending NMI */
 	wmb();
 
-	__apic_send_IPI_mask(mask, NMI_VECTOR);
+	__apic_send_IPI_mask(mask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_TEST));
 
 	/* Don't wait longer than a second */
 	timeout = USEC_PER_SEC;
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index acc19c1d3b4f..fb63bc0d6a0f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -918,7 +918,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
 	 */
 	wmb();
 
-	apic_send_IPI_allbutself(NMI_VECTOR);
+	apic_send_IPI_allbutself(NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_REBOOT));
 
 	/* Kick CPUs looping in NMI context. */
 	WRITE_ONCE(crash_ipi_issued, 1);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index f27469e40141..b79e78762a73 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -217,7 +217,7 @@ static void native_stop_other_cpus(int wait)
 			pr_emerg("Shutting down cpus with NMI\n");
 
 			for_each_cpu(cpu, &cpus_stop_mask)
-				__apic_send_IPI(cpu, NMI_VECTOR);
+				__apic_send_IPI(cpu, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_SMP_STOP));
 		}
 		/*
 		 * Don't wait longer than 10 ms if the caller didn't
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 10/11] x86/irq: Move __prepare_ICR to x86 common header
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
                   ` (8 preceding siblings ...)
  2024-06-28 20:18 ` [PATCH v3 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-28 20:18 ` [PATCH v3 11/11] KVM: X86: Use common code for PV IPIs in linux guest Jacob Pan
  10 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan

To reuse __prepare_ICR() outside APIC local code, move it to the x86
common header. e.g. It can be used by the KVM PV IPI code.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/include/asm/apic.h  | 21 +++++++++++++++++++++
 arch/x86/kernel/apic/local.h | 22 ----------------------
 2 files changed, 21 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index bcf8d17240c8..7fb4c3dae569 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -476,6 +476,27 @@ static __always_inline bool apic_id_valid(u32 apic_id)
 	return apic_id <= apic->max_apic_id;
 }
 
+static inline bool is_nmi_vector(int vector)
+{
+	return (vector & NMI_SOURCE_VEC_MASK) == NMI_VECTOR;
+}
+
+static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
+					 unsigned int dest)
+{
+	unsigned int icr = shortcut | dest;
+
+	if (is_nmi_vector(vector)) {
+		icr |= APIC_DM_NMI;
+		if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
+			icr |= vector >> 16;
+	} else {
+		icr |= APIC_DM_FIXED | vector;
+	}
+
+	return icr;
+}
+
 #else /* CONFIG_X86_LOCAL_APIC */
 
 static inline u32 apic_read(u32 reg) { return 0; }
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index 60e90b7bf058..8b1fe152cd2d 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -12,7 +12,6 @@
 
 #include <asm/irq_vectors.h>
 #include <asm/apic.h>
-#include <asm/nmi.h>
 
 /* X2APIC */
 void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
@@ -27,27 +26,6 @@ extern u32 x2apic_max_apicid;
 
 DECLARE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);
 
-static inline bool is_nmi_vector(int vector)
-{
-	return (vector & NMI_SOURCE_VEC_MASK) == NMI_VECTOR;
-}
-
-static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
-					 unsigned int dest)
-{
-	unsigned int icr = shortcut | dest;
-
-	if (is_nmi_vector(vector)) {
-		icr |= APIC_DM_NMI;
-		if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
-			icr |= vector >> 16;
-	} else {
-		icr |= APIC_DM_FIXED | vector;
-	}
-
-	return icr;
-}
-
 void default_init_apic_ldr(void);
 
 void apic_mem_wait_icr_idle(void);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 11/11] KVM: X86: Use common code for PV IPIs in linux guest
  2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
                   ` (9 preceding siblings ...)
  2024-06-28 20:18 ` [PATCH v3 10/11] x86/irq: Move __prepare_ICR to x86 common header Jacob Pan
@ 2024-06-28 20:18 ` Jacob Pan
  2024-06-29 18:38   ` Xin Li
  10 siblings, 1 reply; 33+ messages in thread
From: Jacob Pan @ 2024-06-28 20:18 UTC (permalink / raw)
  To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Jacob Pan, Zeng Guang

Paravirtual apic hooks to enable PV IPIs for KVM if the "send IPI"
hypercall is available. Reuse common code for ICR preparation which
covers NMI-source reporting if in effect.

Originally-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 arch/x86/kernel/kvm.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 263f8aed4e2c..a45d60aa0302 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -516,15 +516,7 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
 
 	local_irq_save(flags);
 
-	switch (vector) {
-	default:
-		icr = APIC_DM_FIXED | vector;
-		break;
-	case NMI_VECTOR:
-		icr = APIC_DM_NMI;
-		break;
-	}
-
+	icr = __prepare_ICR(0, vector, 0);
 	for_each_cpu(cpu, mask) {
 		apic_id = per_cpu(x86_cpu_to_apicid, cpu);
 		if (!ipi_bitmap) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 04/11] x86/irq: Factor out common NMI handling code
  2024-06-28 20:18 ` [PATCH v3 04/11] x86/irq: Factor out common NMI handling code Jacob Pan
@ 2024-06-29  0:31   ` Xin Li
  2024-07-03 23:10     ` Jacob Pan
  0 siblings, 1 reply; 33+ messages in thread
From: Xin Li @ 2024-06-29  0:31 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil

On 6/28/2024 1:18 PM, Jacob Pan wrote:
> In preparation for handling NMIs with explicit source reporting, factor
> out common code for reuse.
> 

My read is that this patch has no functional change, right?

If yes, please add "No functional change intended."

> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   arch/x86/kernel/nmi.c | 28 ++++++++++++++++------------
>   1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 1ebe93edba7a..639a34e78bc9 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -135,6 +135,20 @@ static void nmi_check_duration(struct nmiaction *action, u64 duration)
>   		action->handler, duration, decimal_msecs);
>   }
>   
> +static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsigned int type)
> +{
> +	int thishandled;
> +	u64 delta;
> +
> +	delta = sched_clock();
> +	thishandled = a->handler(type, regs);
> +	delta = sched_clock() - delta;
> +	trace_nmi_handler(a->handler, (int)delta, thishandled);
> +	nmi_check_duration(a, delta);
> +
> +	return thishandled;
> +}
> +
>   static int nmi_handle(unsigned int type, struct pt_regs *regs)
>   {
>   	struct nmi_desc *desc = nmi_to_desc(type);
> @@ -149,18 +163,8 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
>   	 * can be latched at any given time.  Walk the whole list
>   	 * to handle those situations.
>   	 */
> -	list_for_each_entry_rcu(a, &desc->head, list) {
> -		int thishandled;
> -		u64 delta;
> -
> -		delta = sched_clock();
> -		thishandled = a->handler(type, regs);
> -		handled += thishandled;
> -		delta = sched_clock() - delta;
> -		trace_nmi_handler(a->handler, (int)delta, thishandled);
> -
> -		nmi_check_duration(a, delta);
> -	}
> +	list_for_each_entry_rcu(a, &desc->head, list)
> +		handled += do_handle_nmi(a, regs, type);
>   
>   	rcu_read_unlock();
>   

As this is a preparation patch, better move it earlier before any actual 
NMI source changes, maybe the first patch of this series.

Thanks!
     Xin

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler
  2024-06-28 20:18 ` [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler Jacob Pan
@ 2024-06-29  3:39   ` Xin Li
  2024-07-07  3:48     ` Jacob Pan
  2024-07-01 14:31   ` Nikolay Borisov
  1 sibling, 1 reply; 33+ messages in thread
From: Xin Li @ 2024-06-29  3:39 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil

On 6/28/2024 1:18 PM, Jacob Pan wrote:
> With NMI source reporting enabled, NMI handler can prioritize the
> handling of sources reported explicitly. If the source is unknown, then
> resume the existing processing flow. i.e. invoke all NMI handlers.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>

The code looks good to me, however please improve coding styles and
comments, see below.

> 
> ---
> v3:
>     - Use a static flag to disable NMIs in case of HW failure
>     - Optimize the case when unknown NMIs are mixed with known NMIs(HPA)
> v2:
>     - Disable NMI source reporting once garbage data is given in FRED
> return stack. (HPA)
> ---
>   arch/x86/kernel/nmi.c | 73 +++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 70 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 639a34e78bc9..c3a10af7f26b 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -149,23 +149,90 @@ static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsig
>   	return thishandled;
>   }
>   
> +static int nmi_handle_src(unsigned int type, struct pt_regs *regs, unsigned long *handled_mask)
> +{
> +	static bool nmi_source_disabled;
> +	bool has_unknown_src = false;
> +	unsigned long source_bitmask;
> +	struct nmiaction *a;
> +	int handled = 0;
> +	int vec = 1;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) ||
> +	    type != NMI_LOCAL || nmi_source_disabled)

Harder to read, no need to break into 2 lines.

> +		return 0;
> +
> +	source_bitmask = fred_event_data(regs);
> +	if (!source_bitmask) {

unlikely()?

> +		pr_warn("NMI received without source information! Disable source reporting.\n");

It sounds you're disabling some hardware functionality. Better to say,
maybe:

Buggy hardware? Disable NMI source handling.

> +		nmi_source_disabled = true;
> +		return 0;
> +	}
> +
> +	/*
> +	 * Per NMI source specification, there is no guarantee that a valid
> +	 * NMI vector is always delivered, even when the source specified
> +	 * one. It is software's responsibility to check all available NMI
> +	 * sources when bit 0 is set in the NMI source bitmap. i.e. we have

s/i.e./I.e.,/

> +	 * to call every handler as if we have no NMI source.

This comment is misleading, because you do skip NMI handlers with source
bits set in polling.

And add an empty line to ease review.

> +	 * On the other hand, if we do get non-zero vectors, we know exactly
> +	 * what the sources are. So we only call the handlers with the bit set.
> +	 */
> +	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> +		pr_warn_ratelimited("NMI received with unknown source\n");

s/source/sources/

> +		has_unknown_src = true;
> +	}
> +
> +	rcu_read_lock();

Add an empty line.

> +	/* Bit 0 is for unknown NMI sources, skip it. */

Put "vec = 1 " close to this comment.

> +	for_each_set_bit_from(vec, &source_bitmask, NR_NMI_SOURCE_VECTORS) {
> +		a = rcu_dereference(nmiaction_src_table[vec]);
> +		if (!a) {
> +			pr_warn_ratelimited("NMI received %d no handler", vec);

Use a better log message.

> +			continue;
> +		}

Empty line again.

> +		handled += do_handle_nmi(a, regs, type);

Ditto.

> +		/*
> +		 * Needs polling if unknown source bit is set, handled_mask is

                                    ^the

> +		 * used to tell the polling code which NMIs can be skipped.
> +		 */
> +		if (has_unknown_src)
> +			*handled_mask |= BIT(vec);
> +	}

empty line please.

> +	rcu_read_unlock();
> +
> +	return handled;
> +}
> +
>   static int nmi_handle(unsigned int type, struct pt_regs *regs)
>   {
>   	struct nmi_desc *desc = nmi_to_desc(type);
> +	unsigned long handled_mask = 0;
>   	struct nmiaction *a;
>   	int handled=0;
>   
> -	rcu_read_lock();
> +	/*
> +	 * Check if the NMI source handling is complete, otherwise polling is
> +	 * still required. handled_mask is non-zero if NMI source handling is
> +	 * partial due to unknown NMI sources.
> +	 */
> +	handled = nmi_handle_src(type, regs, &handled_mask);
> +	if (handled && !handled_mask)
> +		return handled;
>   
> +	rcu_read_lock();

keep original empty lines around it.

>   	/*
>   	 * NMIs are edge-triggered, which means if you have enough
>   	 * of them concurrently, you can lose some because only one
>   	 * can be latched at any given time.  Walk the whole list
>   	 * to handle those situations.
>   	 */
> -	list_for_each_entry_rcu(a, &desc->head, list)
> +	list_for_each_entry_rcu(a, &desc->head, list) {
> +		/* Skip NMIs handled earlier with source info */
> +		if (BIT(a->source_vec) & handled_mask)
> +			continue;
>   		handled += do_handle_nmi(a, regs, type);
> -
> +	}
>   	rcu_read_unlock();

keep original empty lines around it.

>   
>   	/* return total number of NMI events handled */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 06/11] KVM: VMX: Expand FRED kvm entry with event data
  2024-06-28 20:18 ` [PATCH v3 06/11] KVM: VMX: Expand FRED kvm entry with event data Jacob Pan
@ 2024-06-29  4:01   ` Xin Li
  2024-07-01 15:39     ` Jacob Pan
  0 siblings, 1 reply; 33+ messages in thread
From: Xin Li @ 2024-06-29  4:01 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Zeng Guang

On 6/28/2024 1:18 PM, Jacob Pan wrote:
> From: Zeng Guang <guang.zeng@intel.com>
> 
> For VM exits caused by events (NMI, #DB, and #PF) delivered by FRED, the
> event data is saved in the exit-qualification field. (FRED spec. 10.6.2)

I don't like mentioning #DB/#PF here, they belong to the guest that was 
running, and KVM handles them.

While NMIs belong to host, and the host NMI handler needs event data
saved in VMCS in NMI induced VM exits.

Or you paint a full picture.

> Expand FRED KVM entry interface to include the event data obtained from
> the exit qualification.
> 
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit
  2024-06-28 20:18 ` [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
@ 2024-06-29  4:07   ` Xin Li
  2024-07-01 15:45     ` Jacob Pan
  2024-06-30 13:04   ` Zeng Guang
  1 sibling, 1 reply; 33+ messages in thread
From: Xin Li @ 2024-06-29  4:07 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Zeng Guang

On 6/28/2024 1:18 PM, Jacob Pan wrote:
> From: Zeng Guang <guang.zeng@intel.com>
> 
> If the "NMI exiting" VM-execution control is 1, the value of the 16-bit NMI
> source vector is saved in the exit-qualification field in the VMCS when VM
> exits occur on CPUs that support NMI source.
> 
> KVM that is aware of NMI-source reporting will push the bitmask of NMI source
> vectors as the exceptoin event data field on the stack for then entry of FRED
> exception. Subsequently, the host NMI exception handler is invoked which
> will process NMI source information in the event data. This operation is
> independent of vCPU FRED enabling status.
> 
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   arch/x86/entry/entry_64_fred.S |  2 +-
>   arch/x86/kvm/vmx/vmx.c         | 11 ++++++++---
>   2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> index a02bc6f3d2e6..0d934a3fcaf8 100644
> --- a/arch/x86/entry/entry_64_fred.S
> +++ b/arch/x86/entry/entry_64_fred.S
> @@ -92,7 +92,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
>   	 * +--------+-----------------+
>   	 */
>   	push $0				/* Reserved, must be 0 */
> -	push $0				/* Event data, 0 for IRQ/NMI */
> +	push %rsi			/* Event data for IRQ/NMI */
>   	push %rdi			/* fred_ss handed in by the caller */
>   	push %rbp
>   	pushf

This belongs to the previous patch, it is part of the host changes.

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4e7b36081b76..6719c598fa5f 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7331,10 +7331,15 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>   	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
>   	    is_nmi(vmx_get_intr_info(vcpu))) {
>   		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> -		if (cpu_feature_enabled(X86_FEATURE_FRED))
> -			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 0);
> -		else
> +		if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> +			unsigned long edata = 0;
> +
> +			if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
> +				edata = vmx_get_exit_qual(vcpu);
> +			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, edata);

Simply do fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 
vmx_get_exit_qual(vcpu))?

> +		} else {
>   			vmx_do_nmi_irqoff();
> +		}
>   		kvm_after_interrupt(vcpu);
>   	}
>   


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 02/11] x86/irq: Define NMI source vectors
  2024-06-28 20:18 ` [PATCH v3 02/11] x86/irq: Define NMI source vectors Jacob Pan
@ 2024-06-29 18:32   ` Xin Li
  2024-07-01 17:16     ` Jacob Pan
  0 siblings, 1 reply; 33+ messages in thread
From: Xin Li @ 2024-06-29 18:32 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil

On 6/28/2024 1:18 PM, Jacob Pan wrote:
> When NMI-source reporting is supported, each logical processor maintains
> a 16-bit NMI-source bitmap. It is up to the system software to assign NMI
> sources for their matching vector (bit position) in the bitmap.
> 
> Notice that NMI source vector is in a different namespace than the IDT
> vectors. Though they share the same programming interface/field in the
> NMI originator.
> 
> This initial allocation of the NMI sources are limited to local NMIs in
> that there is no external device NMI usage yet.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   arch/x86/include/asm/irq_vectors.h | 28 ++++++++++++++++++++++++++++
>   1 file changed, 28 insertions(+)
> 
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 13aea8fc3d45..e4cd33bc4fef 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -105,6 +105,34 @@
>   
>   #define NR_VECTORS			 256
>   
> +/*
> + * The NMI senders specify the NMI source vector as an 8bit integer in their

s/The NMI senders/NMI senders/

> + * vector field with NMI delivery mode. A local APIC receiving an NMI will
> + * set the corresponding bit in a 16bit bitmask, which is accumulated until
> + * the NMI is delivered.
> + * When a sender didn't specify an NMI source vector the source vector will
> + * be 0, which will result in bit 0 of the bitmask being set. For out of
> + * bounds vectors >= 16 bit 0 will also be set.
> + * When bit 0 is set, system software must invoke all registered NMI handlers
> + * as if NMI source feature is not enabled.

Add empty lines in the above paragraph.

> + *
> + * Vector 2 is reserved for matching IDT NMI vector where it may be hardcoded
> + * by some external devices.
> + *
> + * The NMI source vectors are sorted by descending priority with the exceptions
> + * of 0 and 2.
> + */
> +#define NMI_SOURCE_VEC_UNKNOWN		0
> +#define NMI_SOURCE_VEC_IPI_REBOOT	1	/* Crash reboot */
> +#define NMI_SOURCE_VEC_IDT_NMI		2	/* Match IDT NMI vector 2 */
> +#define NMI_SOURCE_VEC_IPI_SMP_STOP	3	/* Panic stop CPU */
> +#define NMI_SOURCE_VEC_IPI_BT		4	/* CPU backtrace */
> +#define NMI_SOURCE_VEC_PMI		5	/* PerfMon counters */
> +#define NMI_SOURCE_VEC_IPI_KGDB		6	/* KGDB */
> +#define NMI_SOURCE_VEC_IPI_MCE		7	/* MCE injection */
> +#define NMI_SOURCE_VEC_IPI_TEST		8	/* For remote and local IPIs */
> +#define NR_NMI_SOURCE_VECTORS		9

Fix alignment.

> +
>   #ifdef CONFIG_X86_LOCAL_APIC
>   #define FIRST_SYSTEM_VECTOR		POSTED_MSI_NOTIFICATION_VECTOR
>   #else


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 11/11] KVM: X86: Use common code for PV IPIs in linux guest
  2024-06-28 20:18 ` [PATCH v3 11/11] KVM: X86: Use common code for PV IPIs in linux guest Jacob Pan
@ 2024-06-29 18:38   ` Xin Li
  2024-07-01 16:38     ` Jacob Pan
  0 siblings, 1 reply; 33+ messages in thread
From: Xin Li @ 2024-06-29 18:38 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil, Zeng Guang

On 6/28/2024 1:18 PM, Jacob Pan wrote:
> Paravirtual apic hooks to enable PV IPIs for KVM if the "send IPI"

s/Paravirtual apic/Paravirtualize APIC/

> hypercall is available. Reuse common code for ICR preparation which
> covers NMI-source reporting if in effect.

I see a lot of "NMI source". Should we use "NMI-source" in all places?

> 
> Originally-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   arch/x86/kernel/kvm.c | 10 +---------
>   1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 263f8aed4e2c..a45d60aa0302 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -516,15 +516,7 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
>   
>   	local_irq_save(flags);
>   
> -	switch (vector) {
> -	default:
> -		icr = APIC_DM_FIXED | vector;
> -		break;
> -	case NMI_VECTOR:
> -		icr = APIC_DM_NMI;
> -		break;
> -	}
> -
> +	icr = __prepare_ICR(0, vector, 0);
>   	for_each_cpu(cpu, mask) {
>   		apic_id = per_cpu(x86_cpu_to_apicid, cpu);
>   		if (!ipi_bitmap) {


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit
  2024-06-28 20:18 ` [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
  2024-06-29  4:07   ` Xin Li
@ 2024-06-30 13:04   ` Zeng Guang
  2024-07-01 15:46     ` Jacob Pan
  1 sibling, 1 reply; 33+ messages in thread
From: Zeng Guang @ 2024-06-30 13:04 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil


On 6/29/2024 4:18 AM, Jacob Pan wrote:
> From: Zeng Guang <guang.zeng@intel.com>
>
> If the "NMI exiting" VM-execution control is 1, the value of the 16-bit NMI
> source vector is saved in the exit-qualification field in the VMCS when VM
> exits occur on CPUs that support NMI source.
>
> KVM that is aware of NMI-source reporting will push the bitmask of NMI source
> vectors as the exceptoin event data field on the stack for then entry of FRED
> exception. Subsequently, the host NMI exception handler is invoked which
> will process NMI source information in the event data. This operation is
> independent of vCPU FRED enabling status.
>
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   arch/x86/entry/entry_64_fred.S |  2 +-
>   arch/x86/kvm/vmx/vmx.c         | 11 ++++++++---
>   2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> index a02bc6f3d2e6..0d934a3fcaf8 100644
> --- a/arch/x86/entry/entry_64_fred.S
> +++ b/arch/x86/entry/entry_64_fred.S
> @@ -92,7 +92,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
>   	 * +--------+-----------------+
>   	 */
>   	push $0				/* Reserved, must be 0 */
> -	push $0				/* Event data, 0 for IRQ/NMI */
> +	push %rsi			/* Event data for IRQ/NMI */
>   	push %rdi			/* fred_ss handed in by the caller */
>   	push %rbp
>   	pushf
Move this part to previous patch as it changes the common FRED api and 
prepares for nmi handling in case of nmi
source enabled in this patch.
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4e7b36081b76..6719c598fa5f 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7331,10 +7331,15 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>   	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
>   	    is_nmi(vmx_get_intr_info(vcpu))) {
>   		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> -		if (cpu_feature_enabled(X86_FEATURE_FRED))
> -			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 0);
> -		else
> +		if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> +			unsigned long edata = 0;
> +
> +			if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
> +				edata = vmx_get_exit_qual(vcpu);
> +			fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, edata);
> +		} else {
>   			vmx_do_nmi_irqoff();
> +		}
>   		kvm_after_interrupt(vcpu);
>   	}
>   

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler
  2024-06-28 20:18 ` [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler Jacob Pan
  2024-06-29  3:39   ` Xin Li
@ 2024-07-01 14:31   ` Nikolay Borisov
  2024-07-01 15:36     ` Jacob Pan
  1 sibling, 1 reply; 33+ messages in thread
From: Nikolay Borisov @ 2024-07-01 14:31 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
	Andi Kleen, Mehta, Sohil



On 28.06.24 г. 23:18 ч., Jacob Pan wrote:
> With NMI source reporting enabled, NMI handler can prioritize the
> handling of sources reported explicitly. If the source is unknown, then
> resume the existing processing flow. i.e. invoke all NMI handlers.
> 
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> 
> ---
> v3:
>     - Use a static flag to disable NMIs in case of HW failure
>     - Optimize the case when unknown NMIs are mixed with known NMIs(HPA)
> v2:
>     - Disable NMI source reporting once garbage data is given in FRED
> return stack. (HPA)
> ---
>   arch/x86/kernel/nmi.c | 73 +++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 70 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index 639a34e78bc9..c3a10af7f26b 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -149,23 +149,90 @@ static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsig
>   	return thishandled;
>   }
>   
> +static int nmi_handle_src(unsigned int type, struct pt_regs *regs, unsigned long *handled_mask)
> +{
> +	static bool nmi_source_disabled;
> +	bool has_unknown_src = false;
> +	unsigned long source_bitmask;
> +	struct nmiaction *a;
> +	int handled = 0;
> +	int vec = 1;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) ||
> +	    type != NMI_LOCAL || nmi_source_disabled)
> +		return 0;
> +
> +	source_bitmask = fred_event_data(regs);
> +	if (!source_bitmask) {
> +		pr_warn("NMI received without source information! Disable source reporting.\n");
> +		nmi_source_disabled = true;
> +		return 0;
> +	}
> +
> +	/*
> +	 * Per NMI source specification, there is no guarantee that a valid
> +	 * NMI vector is always delivered, even when the source specified
> +	 * one. It is software's responsibility to check all available NMI
> +	 * sources when bit 0 is set in the NMI source bitmap. i.e. we have
> +	 * to call every handler as if we have no NMI source.
> +	 * On the other hand, if we do get non-zero vectors, we know exactly
> +	 * what the sources are. So we only call the handlers with the bit set.
> +	 */
> +	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> +		pr_warn_ratelimited("NMI received with unknown source\n");
> +		has_unknown_src = true;
> +	}
> +
> +	rcu_read_lock();
> +	/* Bit 0 is for unknown NMI sources, skip it. */
> +	for_each_set_bit_from(vec, &source_bitmask, NR_NMI_SOURCE_VECTORS) {
> +		a = rcu_dereference(nmiaction_src_table[vec]);
> +		if (!a) {
> +			pr_warn_ratelimited("NMI received %d no handler", vec);
> +			continue;
> +		}
> +		handled += do_handle_nmi(a, regs, type);
> +		/*
> +		 * Needs polling if unknown source bit is set, handled_mask is
> +		 * used to tell the polling code which NMIs can be skipped.
> +		 */
> +		if (has_unknown_src)
> +			*handled_mask |= BIT(vec);
> +	}
> +	rcu_read_unlock();
> +
> +	return handled;
> +}
> +
>   static int nmi_handle(unsigned int type, struct pt_regs *regs)
>   {
>   	struct nmi_desc *desc = nmi_to_desc(type);
> +	unsigned long handled_mask = 0;
>   	struct nmiaction *a;
>   	int handled=0;
>   
> -	rcu_read_lock();
> +	/*
> +	 * Check if the NMI source handling is complete, otherwise polling is
> +	 * still required. handled_mask is non-zero if NMI source handling is
> +	 * partial due to unknown NMI sources.
> +	 */
> +	handled = nmi_handle_src(type, regs, &handled_mask);
> +	if (handled && !handled_mask)
> +		return handled;

How about renaming handled_mask to "partial_handled_mask" ? Because in 
addition to it being a mask it's also used as a boolean to signal that 
an unknown NMI source was encountered.

>   
> +	rcu_read_lock();
>   	/*
>   	 * NMIs are edge-triggered, which means if you have enough
>   	 * of them concurrently, you can lose some because only one
>   	 * can be latched at any given time.  Walk the whole list
>   	 * to handle those situations.
>   	 */
> -	list_for_each_entry_rcu(a, &desc->head, list)
> +	list_for_each_entry_rcu(a, &desc->head, list) {
> +		/* Skip NMIs handled earlier with source info */
> +		if (BIT(a->source_vec) & handled_mask)
> +			continue;
>   		handled += do_handle_nmi(a, regs, type);
> -
> +	}
>   	rcu_read_unlock();
>   
>   	/* return total number of NMI events handled */

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler
  2024-07-01 14:31   ` Nikolay Borisov
@ 2024-07-01 15:36     ` Jacob Pan
  0 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-07-01 15:36 UTC (permalink / raw)
  To: Nikolay Borisov
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	jacob.jun.pan


On Mon, 1 Jul 2024 17:31:48 +0300, Nikolay Borisov <nik.borisov@suse.com>
wrote:

> On 28.06.24 г. 23:18 ч., Jacob Pan wrote:
> > With NMI source reporting enabled, NMI handler can prioritize the
> > handling of sources reported explicitly. If the source is unknown, then
> > resume the existing processing flow. i.e. invoke all NMI handlers.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > 
> > ---
> > v3:
> >     - Use a static flag to disable NMIs in case of HW failure
> >     - Optimize the case when unknown NMIs are mixed with known NMIs(HPA)
> > v2:
> >     - Disable NMI source reporting once garbage data is given in FRED
> > return stack. (HPA)
> > ---
> >   arch/x86/kernel/nmi.c | 73 +++++++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 70 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 639a34e78bc9..c3a10af7f26b 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -149,23 +149,90 @@ static inline int do_handle_nmi(struct nmiaction
> > *a, struct pt_regs *regs, unsig return thishandled;
> >   }
> >   
> > +static int nmi_handle_src(unsigned int type, struct pt_regs *regs,
> > unsigned long *handled_mask) +{
> > +	static bool nmi_source_disabled;
> > +	bool has_unknown_src = false;
> > +	unsigned long source_bitmask;
> > +	struct nmiaction *a;
> > +	int handled = 0;
> > +	int vec = 1;
> > +
> > +	if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) ||
> > +	    type != NMI_LOCAL || nmi_source_disabled)
> > +		return 0;
> > +
> > +	source_bitmask = fred_event_data(regs);
> > +	if (!source_bitmask) {
> > +		pr_warn("NMI received without source information!
> > Disable source reporting.\n");
> > +		nmi_source_disabled = true;
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * Per NMI source specification, there is no guarantee that a
> > valid
> > +	 * NMI vector is always delivered, even when the source
> > specified
> > +	 * one. It is software's responsibility to check all available
> > NMI
> > +	 * sources when bit 0 is set in the NMI source bitmap. i.e. we
> > have
> > +	 * to call every handler as if we have no NMI source.
> > +	 * On the other hand, if we do get non-zero vectors, we know
> > exactly
> > +	 * what the sources are. So we only call the handlers with the
> > bit set.
> > +	 */
> > +	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> > +		pr_warn_ratelimited("NMI received with unknown
> > source\n");
> > +		has_unknown_src = true;
> > +	}
> > +
> > +	rcu_read_lock();
> > +	/* Bit 0 is for unknown NMI sources, skip it. */
> > +	for_each_set_bit_from(vec, &source_bitmask,
> > NR_NMI_SOURCE_VECTORS) {
> > +		a = rcu_dereference(nmiaction_src_table[vec]);
> > +		if (!a) {
> > +			pr_warn_ratelimited("NMI received %d no
> > handler", vec);
> > +			continue;
> > +		}
> > +		handled += do_handle_nmi(a, regs, type);
> > +		/*
> > +		 * Needs polling if unknown source bit is set,
> > handled_mask is
> > +		 * used to tell the polling code which NMIs can be
> > skipped.
> > +		 */
> > +		if (has_unknown_src)
> > +			*handled_mask |= BIT(vec);
> > +	}
> > +	rcu_read_unlock();
> > +
> > +	return handled;
> > +}
> > +
> >   static int nmi_handle(unsigned int type, struct pt_regs *regs)
> >   {
> >   	struct nmi_desc *desc = nmi_to_desc(type);
> > +	unsigned long handled_mask = 0;
> >   	struct nmiaction *a;
> >   	int handled=0;
> >   
> > -	rcu_read_lock();
> > +	/*
> > +	 * Check if the NMI source handling is complete, otherwise
> > polling is
> > +	 * still required. handled_mask is non-zero if NMI source
> > handling is
> > +	 * partial due to unknown NMI sources.
> > +	 */
> > +	handled = nmi_handle_src(type, regs, &handled_mask);
> > +	if (handled && !handled_mask)
> > +		return handled;  
> 
> How about renaming handled_mask to "partial_handled_mask" ? Because in 
> addition to it being a mask it's also used as a boolean to signal that 
> an unknown NMI source was encountered.
yeah, that is better. will do.

> >   
> > +	rcu_read_lock();
> >   	/*
> >   	 * NMIs are edge-triggered, which means if you have enough
> >   	 * of them concurrently, you can lose some because only one
> >   	 * can be latched at any given time.  Walk the whole list
> >   	 * to handle those situations.
> >   	 */
> > -	list_for_each_entry_rcu(a, &desc->head, list)
> > +	list_for_each_entry_rcu(a, &desc->head, list) {
> > +		/* Skip NMIs handled earlier with source info */
> > +		if (BIT(a->source_vec) & handled_mask)
> > +			continue;
> >   		handled += do_handle_nmi(a, regs, type);
> > -
> > +	}
> >   	rcu_read_unlock();
> >   
> >   	/* return total number of NMI events handled */  


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 06/11] KVM: VMX: Expand FRED kvm entry with event data
  2024-06-29  4:01   ` Xin Li
@ 2024-07-01 15:39     ` Jacob Pan
  0 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-07-01 15:39 UTC (permalink / raw)
  To: Xin Li
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	Zeng Guang, jacob.jun.pan


On Fri, 28 Jun 2024 21:01:28 -0700, Xin Li <xin@zytor.com> wrote:

> On 6/28/2024 1:18 PM, Jacob Pan wrote:
> > From: Zeng Guang <guang.zeng@intel.com>
> > 
> > For VM exits caused by events (NMI, #DB, and #PF) delivered by FRED, the
> > event data is saved in the exit-qualification field. (FRED spec.
> > 10.6.2)  
> 
> I don't like mentioning #DB/#PF here, they belong to the guest that was 
> running, and KVM handles them.
It is part of the spec.

> While NMIs belong to host, and the host NMI handler needs event data
> saved in VMCS in NMI induced VM exits.
> 
> Or you paint a full picture.
I will add your explanation that #DB/#PF belong to the guest so readers can
get a full picture of both usage and spec.
> 
> > Expand FRED KVM entry interface to include the event data obtained from
> > the exit qualification.
> > 
> > Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>  
> 


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit
  2024-06-29  4:07   ` Xin Li
@ 2024-07-01 15:45     ` Jacob Pan
  2024-07-01 17:03       ` Xin Li
  0 siblings, 1 reply; 33+ messages in thread
From: Jacob Pan @ 2024-07-01 15:45 UTC (permalink / raw)
  To: Xin Li
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	Zeng Guang, jacob.jun.pan


On Fri, 28 Jun 2024 21:07:04 -0700, Xin Li <xin@zytor.com> wrote:

> On 6/28/2024 1:18 PM, Jacob Pan wrote:
> > From: Zeng Guang <guang.zeng@intel.com>
> > 
> > If the "NMI exiting" VM-execution control is 1, the value of the 16-bit
> > NMI source vector is saved in the exit-qualification field in the VMCS
> > when VM exits occur on CPUs that support NMI source.
> > 
> > KVM that is aware of NMI-source reporting will push the bitmask of NMI
> > source vectors as the exceptoin event data field on the stack for then
> > entry of FRED exception. Subsequently, the host NMI exception handler
> > is invoked which will process NMI source information in the event data.
> > This operation is independent of vCPU FRED enabling status.
> > 
> > Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   arch/x86/entry/entry_64_fred.S |  2 +-
> >   arch/x86/kvm/vmx/vmx.c         | 11 ++++++++---
> >   2 files changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/entry/entry_64_fred.S
> > b/arch/x86/entry/entry_64_fred.S index a02bc6f3d2e6..0d934a3fcaf8 100644
> > --- a/arch/x86/entry/entry_64_fred.S
> > +++ b/arch/x86/entry/entry_64_fred.S
> > @@ -92,7 +92,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> >   	 * +--------+-----------------+
> >   	 */
> >   	push $0				/* Reserved, must be 0
> > */
> > -	push $0				/* Event data, 0 for
> > IRQ/NMI */
> > +	push %rsi			/* Event data for IRQ/NMI */
> >   	push %rdi			/* fred_ss handed in by the
> > caller */ push %rbp
> >   	pushf  
> 
> This belongs to the previous patch, it is part of the host changes.
right, will do.

> 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 4e7b36081b76..6719c598fa5f 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7331,10 +7331,15 @@ static noinstr void vmx_vcpu_enter_exit(struct
> > kvm_vcpu *vcpu, if ((u16)vmx->exit_reason.basic ==
> > EXIT_REASON_EXCEPTION_NMI && is_nmi(vmx_get_intr_info(vcpu))) {
> >   		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> > -		if (cpu_feature_enabled(X86_FEATURE_FRED))
> > -			fred_entry_from_kvm(EVENT_TYPE_NMI,
> > NMI_VECTOR, 0);
> > -		else
> > +		if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> > +			unsigned long edata = 0;
> > +
> > +			if
> > (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
> > +				edata = vmx_get_exit_qual(vcpu);
> > +			fred_entry_from_kvm(EVENT_TYPE_NMI,
> > NMI_VECTOR, edata);  
> 
> Simply do fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 
> vmx_get_exit_qual(vcpu))?
I don't have strong preference but having a local variable improves
readability IMHO.

> > +		} else {
> >   			vmx_do_nmi_irqoff();
> > +		}
> >   		kvm_after_interrupt(vcpu);
> >   	}
> >     
> 


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit
  2024-06-30 13:04   ` Zeng Guang
@ 2024-07-01 15:46     ` Jacob Pan
  0 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-07-01 15:46 UTC (permalink / raw)
  To: Zeng Guang
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	jacob.jun.pan


On Sun, 30 Jun 2024 21:04:18 +0800, Zeng Guang <guang.zeng@intel.com> wrote:

> On 6/29/2024 4:18 AM, Jacob Pan wrote:
> > From: Zeng Guang <guang.zeng@intel.com>
> >
> > If the "NMI exiting" VM-execution control is 1, the value of the 16-bit
> > NMI source vector is saved in the exit-qualification field in the VMCS
> > when VM exits occur on CPUs that support NMI source.
> >
> > KVM that is aware of NMI-source reporting will push the bitmask of NMI
> > source vectors as the exceptoin event data field on the stack for then
> > entry of FRED exception. Subsequently, the host NMI exception handler
> > is invoked which will process NMI source information in the event data.
> > This operation is independent of vCPU FRED enabling status.
> >
> > Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   arch/x86/entry/entry_64_fred.S |  2 +-
> >   arch/x86/kvm/vmx/vmx.c         | 11 ++++++++---
> >   2 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/entry/entry_64_fred.S
> > b/arch/x86/entry/entry_64_fred.S index a02bc6f3d2e6..0d934a3fcaf8 100644
> > --- a/arch/x86/entry/entry_64_fred.S
> > +++ b/arch/x86/entry/entry_64_fred.S
> > @@ -92,7 +92,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> >   	 * +--------+-----------------+
> >   	 */
> >   	push $0				/* Reserved, must be 0
> > */
> > -	push $0				/* Event data, 0 for
> > IRQ/NMI */
> > +	push %rsi			/* Event data for IRQ/NMI */
> >   	push %rdi			/* fred_ss handed in by the
> > caller */ push %rbp
> >   	pushf  
> Move this part to previous patch as it changes the common FRED api and 
> prepares for nmi handling in case of nmi
> source enabled in this patch.
You are right, will do.
>  [...]  


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 11/11] KVM: X86: Use common code for PV IPIs in linux guest
  2024-06-29 18:38   ` Xin Li
@ 2024-07-01 16:38     ` Jacob Pan
  2024-07-01 17:13       ` Xin Li
  0 siblings, 1 reply; 33+ messages in thread
From: Jacob Pan @ 2024-07-01 16:38 UTC (permalink / raw)
  To: Xin Li
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	Zeng Guang, jacob.jun.pan


On Sat, 29 Jun 2024 11:38:10 -0700, Xin Li <xin@zytor.com> wrote:

> On 6/28/2024 1:18 PM, Jacob Pan wrote:
> > Paravirtual apic hooks to enable PV IPIs for KVM if the "send IPI"  
> 
> s/Paravirtual apic/Paravirtualize APIC/

Paravirtual APIC makes sense to me. This is also the same language used in
previous commits.

How about:

"The paravirtual APIC hooks in KVM, some of which are used for sending PV
IPIs, can reuse common code for ICR preparation. This shared code also
encompasses NMI-source reporting when in effect."

> > hypercall is available. Reuse common code for ICR preparation which
> > covers NMI-source reporting if in effect.  
> 
> I see a lot of "NMI source". Should we use "NMI-source" in all places?
Not really, here NMI-source is a compound modifier before noun "reporting".

For other places, hyphen(-) is not needed if it is just a noun. e.g.
"partial due to unknown NMI sources"

I will go through the patchset to make sure they are consistent.

> > 
> > Originally-by: Zeng Guang <guang.zeng@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   arch/x86/kernel/kvm.c | 10 +---------
> >   1 file changed, 1 insertion(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > index 263f8aed4e2c..a45d60aa0302 100644
> > --- a/arch/x86/kernel/kvm.c
> > +++ b/arch/x86/kernel/kvm.c
> > @@ -516,15 +516,7 @@ static void __send_ipi_mask(const struct cpumask
> > *mask, int vector) 
> >   	local_irq_save(flags);
> >   
> > -	switch (vector) {
> > -	default:
> > -		icr = APIC_DM_FIXED | vector;
> > -		break;
> > -	case NMI_VECTOR:
> > -		icr = APIC_DM_NMI;
> > -		break;
> > -	}
> > -
> > +	icr = __prepare_ICR(0, vector, 0);
> >   	for_each_cpu(cpu, mask) {
> >   		apic_id = per_cpu(x86_cpu_to_apicid, cpu);
> >   		if (!ipi_bitmap) {  
> 


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit
  2024-07-01 15:45     ` Jacob Pan
@ 2024-07-01 17:03       ` Xin Li
  2024-07-01 18:00         ` Jacob Pan
  0 siblings, 1 reply; 33+ messages in thread
From: Xin Li @ 2024-07-01 17:03 UTC (permalink / raw)
  To: Jacob Pan
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	Zeng Guang

On 7/1/2024 8:45 AM, Jacob Pan wrote:
> 
> On Fri, 28 Jun 2024 21:07:04 -0700, Xin Li <xin@zytor.com> wrote:
> 
>> On 6/28/2024 1:18 PM, Jacob Pan wrote:
>>> From: Zeng Guang <guang.zeng@intel.com>
>>>
>>> If the "NMI exiting" VM-execution control is 1, the value of the 16-bit
>>> NMI source vector is saved in the exit-qualification field in the VMCS
>>> when VM exits occur on CPUs that support NMI source.
>>>
>>> KVM that is aware of NMI-source reporting will push the bitmask of NMI
>>> source vectors as the exceptoin event data field on the stack for then
>>> entry of FRED exception. Subsequently, the host NMI exception handler
>>> is invoked which will process NMI source information in the event data.
>>> This operation is independent of vCPU FRED enabling status.
>>>
>>> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
>>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>>> ---
>>>    arch/x86/entry/entry_64_fred.S |  2 +-
>>>    arch/x86/kvm/vmx/vmx.c         | 11 ++++++++---
>>>    2 files changed, 9 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/entry_64_fred.S
>>> b/arch/x86/entry/entry_64_fred.S index a02bc6f3d2e6..0d934a3fcaf8 100644
>>> --- a/arch/x86/entry/entry_64_fred.S
>>> +++ b/arch/x86/entry/entry_64_fred.S
>>> @@ -92,7 +92,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
>>>    	 * +--------+-----------------+
>>>    	 */
>>>    	push $0				/* Reserved, must be 0
>>> */
>>> -	push $0				/* Event data, 0 for
>>> IRQ/NMI */
>>> +	push %rsi			/* Event data for IRQ/NMI */
>>>    	push %rdi			/* fred_ss handed in by the
>>> caller */ push %rbp
>>>    	pushf
>>
>> This belongs to the previous patch, it is part of the host changes.
> right, will do.
> 
>>
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index 4e7b36081b76..6719c598fa5f 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -7331,10 +7331,15 @@ static noinstr void vmx_vcpu_enter_exit(struct
>>> kvm_vcpu *vcpu, if ((u16)vmx->exit_reason.basic ==
>>> EXIT_REASON_EXCEPTION_NMI && is_nmi(vmx_get_intr_info(vcpu))) {
>>>    		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
>>> -		if (cpu_feature_enabled(X86_FEATURE_FRED))
>>> -			fred_entry_from_kvm(EVENT_TYPE_NMI,
>>> NMI_VECTOR, 0);
>>> -		else
>>> +		if (cpu_feature_enabled(X86_FEATURE_FRED)) {
>>> +			unsigned long edata = 0;
>>> +
>>> +			if
>>> (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
>>> +				edata = vmx_get_exit_qual(vcpu);
>>> +			fred_entry_from_kvm(EVENT_TYPE_NMI,
>>> NMI_VECTOR, edata);
>>
>> Simply do fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR,
>> vmx_get_exit_qual(vcpu))?
> I don't have strong preference but having a local variable improves
> readability IMHO.

My point was, do we actually need this check:
     (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE)?

I don't have a strong preference either.

> 
>>> +		} else {
>>>    			vmx_do_nmi_irqoff();
>>> +		}
>>>    		kvm_after_interrupt(vcpu);
>>>    	}
>>>      
>>
> 
> 
> Thanks,
> 
> Jacob
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 11/11] KVM: X86: Use common code for PV IPIs in linux guest
  2024-07-01 16:38     ` Jacob Pan
@ 2024-07-01 17:13       ` Xin Li
  0 siblings, 0 replies; 33+ messages in thread
From: Xin Li @ 2024-07-01 17:13 UTC (permalink / raw)
  To: Jacob Pan
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	Zeng Guang

On 7/1/2024 9:38 AM, Jacob Pan wrote:
> 
> On Sat, 29 Jun 2024 11:38:10 -0700, Xin Li <xin@zytor.com> wrote:
> 
>> On 6/28/2024 1:18 PM, Jacob Pan wrote:
>>> Paravirtual apic hooks to enable PV IPIs for KVM if the "send IPI"
>>
>> s/Paravirtual apic/Paravirtualize APIC/
> 
> Paravirtual APIC makes sense to me. This is also the same language used in
> previous commits.

"Paravirtual apic hooks to enable..."

It needs to start with a verb, I can't read it as adj.

> 
> How about:
> 
> "The paravirtual APIC hooks in KVM, some of which are used for sending PV
> IPIs, can reuse common code for ICR preparation. This shared code also
> encompasses NMI-source reporting when in effect."

LGTM.

> 
>>> hypercall is available. Reuse common code for ICR preparation which
>>> covers NMI-source reporting if in effect.
>>
>> I see a lot of "NMI source". Should we use "NMI-source" in all places?
> Not really, here NMI-source is a compound modifier before noun "reporting".
> 
> For other places, hyphen(-) is not needed if it is just a noun. e.g.
> "partial due to unknown NMI sources"
> 
> I will go through the patchset to make sure they are consistent.

Right, make them consistent.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 02/11] x86/irq: Define NMI source vectors
  2024-06-29 18:32   ` Xin Li
@ 2024-07-01 17:16     ` Jacob Pan
  0 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-07-01 17:16 UTC (permalink / raw)
  To: Xin Li
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	jacob.jun.pan


On Sat, 29 Jun 2024 11:32:10 -0700, Xin Li <xin@zytor.com> wrote:

> On 6/28/2024 1:18 PM, Jacob Pan wrote:
> > When NMI-source reporting is supported, each logical processor maintains
> > a 16-bit NMI-source bitmap. It is up to the system software to assign
> > NMI sources for their matching vector (bit position) in the bitmap.
> > 
> > Notice that NMI source vector is in a different namespace than the IDT
> > vectors. Though they share the same programming interface/field in the
> > NMI originator.
> > 
> > This initial allocation of the NMI sources are limited to local NMIs in
> > that there is no external device NMI usage yet.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   arch/x86/include/asm/irq_vectors.h | 28 ++++++++++++++++++++++++++++
> >   1 file changed, 28 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/irq_vectors.h
> > b/arch/x86/include/asm/irq_vectors.h index 13aea8fc3d45..e4cd33bc4fef
> > 100644 --- a/arch/x86/include/asm/irq_vectors.h
> > +++ b/arch/x86/include/asm/irq_vectors.h
> > @@ -105,6 +105,34 @@
> >   
> >   #define NR_VECTORS			 256
> >   
> > +/*
> > + * The NMI senders specify the NMI source vector as an 8bit integer in
> > their  
> 
> s/The NMI senders/NMI senders/
will do

> > + * vector field with NMI delivery mode. A local APIC receiving an NMI
> > will
> > + * set the corresponding bit in a 16bit bitmask, which is accumulated
> > until
> > + * the NMI is delivered.
> > + * When a sender didn't specify an NMI source vector the source vector
> > will
> > + * be 0, which will result in bit 0 of the bitmask being set. For out
> > of
> > + * bounds vectors >= 16 bit 0 will also be set.
> > + * When bit 0 is set, system software must invoke all registered NMI
> > handlers
> > + * as if NMI source feature is not enabled.  
> 
> Add empty lines in the above paragraph.
sounds good.

> > + *
> > + * Vector 2 is reserved for matching IDT NMI vector where it may be
> > hardcoded
> > + * by some external devices.
> > + *
> > + * The NMI source vectors are sorted by descending priority with the
> > exceptions
> > + * of 0 and 2.
> > + */
> > +#define NMI_SOURCE_VEC_UNKNOWN		0
> > +#define NMI_SOURCE_VEC_IPI_REBOOT	1	/* Crash reboot */
> > +#define NMI_SOURCE_VEC_IDT_NMI		2	/* Match IDT
> > NMI vector 2 */ +#define NMI_SOURCE_VEC_IPI_SMP_STOP	3	/*
> > Panic stop CPU */ +#define NMI_SOURCE_VEC_IPI_BT
> > 4	/* CPU backtrace */ +#define NMI_SOURCE_VEC_PMI
> > 5	/* PerfMon counters */ +#define NMI_SOURCE_VEC_IPI_KGDB
> > 	6	/* KGDB */ +#define NMI_SOURCE_VEC_IPI_MCE
> > 	7	/* MCE injection */ +#define
> > NMI_SOURCE_VEC_IPI_TEST		8	/* For remote and local
> > IPIs */ +#define NR_NMI_SOURCE_VECTORS		9  
> 
> Fix alignment.
Do you see alignment problem after the patch is applied? Alignment looks
good with vi, emacs, etc. passed checkpatch --strict.

> > +
> >   #ifdef CONFIG_X86_LOCAL_APIC
> >   #define FIRST_SYSTEM_VECTOR
> > POSTED_MSI_NOTIFICATION_VECTOR #else  
> 


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit
  2024-07-01 17:03       ` Xin Li
@ 2024-07-01 18:00         ` Jacob Pan
  0 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-07-01 18:00 UTC (permalink / raw)
  To: Xin Li
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	Zeng Guang, jacob.jun.pan


On Mon, 1 Jul 2024 10:03:58 -0700, Xin Li <xin@zytor.com> wrote:

> On 7/1/2024 8:45 AM, Jacob Pan wrote:
> > 
> > On Fri, 28 Jun 2024 21:07:04 -0700, Xin Li <xin@zytor.com> wrote:
> >   
> >> On 6/28/2024 1:18 PM, Jacob Pan wrote:  
> >>> From: Zeng Guang <guang.zeng@intel.com>
> >>>
> >>> If the "NMI exiting" VM-execution control is 1, the value of the
> >>> 16-bit NMI source vector is saved in the exit-qualification field in
> >>> the VMCS when VM exits occur on CPUs that support NMI source.
> >>>
> >>> KVM that is aware of NMI-source reporting will push the bitmask of NMI
> >>> source vectors as the exceptoin event data field on the stack for then
> >>> entry of FRED exception. Subsequently, the host NMI exception handler
> >>> is invoked which will process NMI source information in the event
> >>> data. This operation is independent of vCPU FRED enabling status.
> >>>
> >>> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> >>> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> >>> ---
> >>>    arch/x86/entry/entry_64_fred.S |  2 +-
> >>>    arch/x86/kvm/vmx/vmx.c         | 11 ++++++++---
> >>>    2 files changed, 9 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/x86/entry/entry_64_fred.S
> >>> b/arch/x86/entry/entry_64_fred.S index a02bc6f3d2e6..0d934a3fcaf8
> >>> 100644 --- a/arch/x86/entry/entry_64_fred.S
> >>> +++ b/arch/x86/entry/entry_64_fred.S
> >>> @@ -92,7 +92,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> >>>    	 * +--------+-----------------+
> >>>    	 */
> >>>    	push $0				/* Reserved, must
> >>> be 0 */
> >>> -	push $0				/* Event data, 0 for
> >>> IRQ/NMI */
> >>> +	push %rsi			/* Event data for IRQ/NMI */
> >>>    	push %rdi			/* fred_ss handed in by
> >>> the caller */ push %rbp
> >>>    	pushf  
> >>
> >> This belongs to the previous patch, it is part of the host changes.  
> > right, will do.
> >   
> >>  
> >>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >>> index 4e7b36081b76..6719c598fa5f 100644
> >>> --- a/arch/x86/kvm/vmx/vmx.c
> >>> +++ b/arch/x86/kvm/vmx/vmx.c
> >>> @@ -7331,10 +7331,15 @@ static noinstr void vmx_vcpu_enter_exit(struct
> >>> kvm_vcpu *vcpu, if ((u16)vmx->exit_reason.basic ==
> >>> EXIT_REASON_EXCEPTION_NMI && is_nmi(vmx_get_intr_info(vcpu))) {
> >>>    		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> >>> -		if (cpu_feature_enabled(X86_FEATURE_FRED))
> >>> -			fred_entry_from_kvm(EVENT_TYPE_NMI,
> >>> NMI_VECTOR, 0);
> >>> -		else
> >>> +		if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> >>> +			unsigned long edata = 0;
> >>> +
> >>> +			if
> >>> (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
> >>> +				edata = vmx_get_exit_qual(vcpu);
> >>> +			fred_entry_from_kvm(EVENT_TYPE_NMI,
> >>> NMI_VECTOR, edata);  
> >>
> >> Simply do fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR,
> >> vmx_get_exit_qual(vcpu))?  
> > I don't have strong preference but having a local variable improves
> > readability IMHO.  
> 
> My point was, do we actually need this check:
>      (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE)?
I also pondered about the value of this CPUID bit if it is consistently
linked with the FRED bit. But since the architecture provided this
additional enumeration, as noted in Chapter 9 of the FRED specification,
which states: 

"Processors that support FRED *may* also support a related feature called
NMI-source reporting".

The use of "may" suggests that there are scenarios where FRED might exist
without NMI-source reporting. To ensure future compatibility, I believe
it is still valid to maintain this check.


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 04/11] x86/irq: Factor out common NMI handling code
  2024-06-29  0:31   ` Xin Li
@ 2024-07-03 23:10     ` Jacob Pan
  0 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-07-03 23:10 UTC (permalink / raw)
  To: Xin Li
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	jacob.jun.pan


On Fri, 28 Jun 2024 17:31:50 -0700, Xin Li <xin@zytor.com> wrote:

> On 6/28/2024 1:18 PM, Jacob Pan wrote:
> > In preparation for handling NMIs with explicit source reporting, factor
> > out common code for reuse.
> >   
> 
> My read is that this patch has no functional change, right?
> 
> If yes, please add "No functional change intended."
will do.

> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---
> >   arch/x86/kernel/nmi.c | 28 ++++++++++++++++------------
> >   1 file changed, 16 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 1ebe93edba7a..639a34e78bc9 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -135,6 +135,20 @@ static void nmi_check_duration(struct nmiaction
> > *action, u64 duration) action->handler, duration, decimal_msecs);
> >   }
> >   
> > +static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs
> > *regs, unsigned int type) +{
> > +	int thishandled;
> > +	u64 delta;
> > +
> > +	delta = sched_clock();
> > +	thishandled = a->handler(type, regs);
> > +	delta = sched_clock() - delta;
> > +	trace_nmi_handler(a->handler, (int)delta, thishandled);
> > +	nmi_check_duration(a, delta);
> > +
> > +	return thishandled;
> > +}
> > +
> >   static int nmi_handle(unsigned int type, struct pt_regs *regs)
> >   {
> >   	struct nmi_desc *desc = nmi_to_desc(type);
> > @@ -149,18 +163,8 @@ static int nmi_handle(unsigned int type, struct
> > pt_regs *regs)
> >   	 * can be latched at any given time.  Walk the whole list
> >   	 * to handle those situations.
> >   	 */
> > -	list_for_each_entry_rcu(a, &desc->head, list) {
> > -		int thishandled;
> > -		u64 delta;
> > -
> > -		delta = sched_clock();
> > -		thishandled = a->handler(type, regs);
> > -		handled += thishandled;
> > -		delta = sched_clock() - delta;
> > -		trace_nmi_handler(a->handler, (int)delta, thishandled);
> > -
> > -		nmi_check_duration(a, delta);
> > -	}
> > +	list_for_each_entry_rcu(a, &desc->head, list)
> > +		handled += do_handle_nmi(a, regs, type);
> >   
> >   	rcu_read_unlock();
> >     
> 
> As this is a preparation patch, better move it earlier before any actual 
> NMI source changes, maybe the first patch of this series.
This preparatory patch is utilized immediately by the subsequent patch,
enhancing the narrative flow, in my opinion.

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 08/11] perf/x86: Enable NMI source reporting for perfmon
  2024-06-28 20:18 ` [PATCH v3 08/11] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
@ 2024-07-04 14:44   ` Liang, Kan
  2024-07-06 22:49     ` Jacob Pan
  0 siblings, 1 reply; 33+ messages in thread
From: Liang, Kan @ 2024-07-04 14:44 UTC (permalink / raw)
  To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra
  Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, Andi Kleen,
	Mehta, Sohil, Zeng Guang



On 2024-06-28 4:18 p.m., Jacob Pan wrote:
> Program the designated NMI source vector into the performance monitoring
> interrupt (PMI) of the local vector table. PMI handler will be directly
> invoked when its NMI is generated. This avoids the latency of calling all
> NMI handlers blindly.
> 
> Co-developed-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> 
> ---
> v3: Program NMI source vector in PVTPC unconditionally (HPA)
> v2: Fix a compile error apic_perfmon_ctr is undefined in i386 config
> ---
>  arch/x86/events/core.c       | 6 ++++--
>  arch/x86/events/intel/core.c | 6 +++---
>  arch/x86/include/asm/apic.h  | 1 +
>  3 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 1ef2201e48ac..be75bdcdd400 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -46,6 +46,7 @@
>  
>  struct x86_pmu x86_pmu __read_mostly;
>  static struct pmu pmu;
> +u32 apic_perfmon_ctr = APIC_DM_NMI;>
>  DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
>  	.enabled = 1,
> @@ -1680,7 +1681,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
>  	 * This generic handler doesn't seem to have any issues where the
>  	 * unmasking occurs so it was left at the top.
>  	 */
> -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> +	apic_write(APIC_LVTPC, apic_perfmon_ctr);
>  
>  	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
>  		if (!test_bit(idx, cpuc->active_mask))
> @@ -1723,7 +1724,8 @@ void perf_events_lapic_init(void)
>  	/*
>  	 * Always use NMI for PMU
>  	 */
> -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> +	apic_perfmon_ctr |= NMI_SOURCE_VEC_PMI;
> +	apic_write(APIC_LVTPC, apic_perfmon_ctr);


It looks like the same value is written unconditionally.

Why not use a macro, e.g., APIC_DM_NMI_WITH_SOURCE, to replace the variable?

Thanks,
Kan

>  }
>  
>  static int
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 38c1b1f1deaa..b4a70457c678 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3093,7 +3093,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
>  	 * NMI handler.
>  	 */
>  	if (!late_ack && !mid_ack)
> -		apic_write(APIC_LVTPC, APIC_DM_NMI);
> +		apic_write(APIC_LVTPC, apic_perfmon_ctr);
>  	intel_bts_disable_local();
>  	cpuc->enabled = 0;
>  	__intel_pmu_disable_all(true);
> @@ -3130,7 +3130,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
>  
>  done:
>  	if (mid_ack)
> -		apic_write(APIC_LVTPC, APIC_DM_NMI);
> +		apic_write(APIC_LVTPC, apic_perfmon_ctr);
>  	/* Only restore PMU state when it's active. See x86_pmu_disable(). */
>  	cpuc->enabled = pmu_enabled;
>  	if (pmu_enabled)
> @@ -3143,7 +3143,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
>  	 * Haswell CPUs.
>  	 */
>  	if (late_ack)
> -		apic_write(APIC_LVTPC, APIC_DM_NMI);
> +		apic_write(APIC_LVTPC, apic_perfmon_ctr);
>  	return handled;
>  }
>  
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index 9327eb00e96d..bcf8d17240c8 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -49,6 +49,7 @@ static inline void x86_32_probe_apic(void) { }
>  #endif
>  
>  extern u32 cpuid_to_apicid[];
> +extern u32 apic_perfmon_ctr;
>  
>  #define CPU_ACPIID_INVALID	U32_MAX
>  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 08/11] perf/x86: Enable NMI source reporting for perfmon
  2024-07-04 14:44   ` Liang, Kan
@ 2024-07-06 22:49     ` Jacob Pan
  0 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-07-06 22:49 UTC (permalink / raw)
  To: Liang, Kan
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, Andi Kleen, Mehta, Sohil, Zeng Guang,
	jacob.jun.pan


On Thu, 4 Jul 2024 10:44:23 -0400, "Liang, Kan" <kan.liang@linux.intel.com>
wrote:

> On 2024-06-28 4:18 p.m., Jacob Pan wrote:
> > Program the designated NMI source vector into the performance monitoring
> > interrupt (PMI) of the local vector table. PMI handler will be directly
> > invoked when its NMI is generated. This avoids the latency of calling
> > all NMI handlers blindly.
> > 
> > Co-developed-by: Zeng Guang <guang.zeng@intel.com>
> > Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > 
> > ---
> > v3: Program NMI source vector in PVTPC unconditionally (HPA)
> > v2: Fix a compile error apic_perfmon_ctr is undefined in i386 config
> > ---
> >  arch/x86/events/core.c       | 6 ++++--
> >  arch/x86/events/intel/core.c | 6 +++---
> >  arch/x86/include/asm/apic.h  | 1 +
> >  3 files changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > index 1ef2201e48ac..be75bdcdd400 100644
> > --- a/arch/x86/events/core.c
> > +++ b/arch/x86/events/core.c
> > @@ -46,6 +46,7 @@
> >  
> >  struct x86_pmu x86_pmu __read_mostly;
> >  static struct pmu pmu;
> > +u32 apic_perfmon_ctr = APIC_DM_NMI;>
> >  DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
> >  	.enabled = 1,
> > @@ -1680,7 +1681,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> >  	 * This generic handler doesn't seem to have any issues where
> > the
> >  	 * unmasking occurs so it was left at the top.
> >  	 */
> > -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> > +	apic_write(APIC_LVTPC, apic_perfmon_ctr);
> >  
> >  	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
> >  		if (!test_bit(idx, cpuc->active_mask))
> > @@ -1723,7 +1724,8 @@ void perf_events_lapic_init(void)
> >  	/*
> >  	 * Always use NMI for PMU
> >  	 */
> > -	apic_write(APIC_LVTPC, APIC_DM_NMI);
> > +	apic_perfmon_ctr |= NMI_SOURCE_VEC_PMI;
> > +	apic_write(APIC_LVTPC, apic_perfmon_ctr);  
> 
> 
> It looks like the same value is written unconditionally.
> 
> Why not use a macro, e.g., APIC_DM_NMI_WITH_SOURCE, to replace the
> variable?
> 
yes, it is unconditional now. I will use the following:

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -30,6 +30,8 @@
 #define APIC_EXTNMI_ALL                1
 #define APIC_EXTNMI_NONE       2

+#define APIC_PERF_NMI          (APIC_DM_NMI | NMI_SOURCE_VEC_PMI)


Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler
  2024-06-29  3:39   ` Xin Li
@ 2024-07-07  3:48     ` Jacob Pan
  0 siblings, 0 replies; 33+ messages in thread
From: Jacob Pan @ 2024-07-07  3:48 UTC (permalink / raw)
  To: Xin Li
  Cc: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
	linux-perf-users, Peter Zijlstra, Paolo Bonzini, Tony Luck,
	Andy Lutomirski, acme, kan.liang, Andi Kleen, Mehta, Sohil,
	jacob.jun.pan


On Fri, 28 Jun 2024 20:39:24 -0700, Xin Li <xin@zytor.com> wrote:

> On 6/28/2024 1:18 PM, Jacob Pan wrote:
> > With NMI source reporting enabled, NMI handler can prioritize the
> > handling of sources reported explicitly. If the source is unknown, then
> > resume the existing processing flow. i.e. invoke all NMI handlers.
> > 
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>  
> 
> The code looks good to me, however please improve coding styles and
> comments, see below.
> 
> > 
> > ---
> > v3:
> >     - Use a static flag to disable NMIs in case of HW failure
> >     - Optimize the case when unknown NMIs are mixed with known NMIs(HPA)
> > v2:
> >     - Disable NMI source reporting once garbage data is given in FRED
> > return stack. (HPA)
> > ---
> >   arch/x86/kernel/nmi.c | 73 +++++++++++++++++++++++++++++++++++++++++--
> >   1 file changed, 70 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index 639a34e78bc9..c3a10af7f26b 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -149,23 +149,90 @@ static inline int do_handle_nmi(struct nmiaction
> > *a, struct pt_regs *regs, unsig return thishandled;
> >   }
> >   
> > +static int nmi_handle_src(unsigned int type, struct pt_regs *regs,
> > unsigned long *handled_mask) +{
> > +	static bool nmi_source_disabled;
> > +	bool has_unknown_src = false;
> > +	unsigned long source_bitmask;
> > +	struct nmiaction *a;
> > +	int handled = 0;
> > +	int vec = 1;
> > +
> > +	if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) ||
> > +	    type != NMI_LOCAL || nmi_source_disabled)  
> 
> Harder to read, no need to break into 2 lines.
ok.
 
> > +		return 0;
> > +
> > +	source_bitmask = fred_event_data(regs);
> > +	if (!source_bitmask) {  
> 
> unlikely()?
yes.

> 
> > +		pr_warn("NMI received without source information!
> > Disable source reporting.\n");  
> 
> It sounds you're disabling some hardware functionality. Better to say,
> maybe:
> 
> Buggy hardware? Disable NMI source handling.
> 
yes, this is more clear.

> > +		nmi_source_disabled = true;
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * Per NMI source specification, there is no guarantee that a
> > valid
> > +	 * NMI vector is always delivered, even when the source
> > specified
> > +	 * one. It is software's responsibility to check all available
> > NMI
> > +	 * sources when bit 0 is set in the NMI source bitmap. i.e. we
> > have  
> 
> s/i.e./I.e.,/
will fix

> 
> > +	 * to call every handler as if we have no NMI source.  
> 
> This comment is misleading, because you do skip NMI handlers with source
> bits set in polling.
At a high level, it is still accurate to say that every handler is invoked,
whether by the NMI-source handling code or subsequently in the polling code.

How about this:
	 * in the NMI-source reporting bitmap. I.e. we have to call every
	 * handler as if there is no NMI-source reporting feature enabled.
	 *
	 * In this case, handlers for the known NMI sources will be called
	 * first, followed by the remaining handlers, which are called
	 * during the subsequent polling code.


> 
> And add an empty line to ease review.
will do

> 
> > +	 * On the other hand, if we do get non-zero vectors, we know
> > exactly
> > +	 * what the sources are. So we only call the handlers with the
> > bit set.
> > +	 */
> > +	if (source_bitmask & BIT(NMI_SOURCE_VEC_UNKNOWN)) {
> > +		pr_warn_ratelimited("NMI received with unknown
> > source\n");  
> 
> s/source/sources/
> 
> > +		has_unknown_src = true;
> > +	}
> > +
> > +	rcu_read_lock();  
> 
> Add an empty line.
> 
> > +	/* Bit 0 is for unknown NMI sources, skip it. */  
> 
> Put "vec = 1 " close to this comment.
yes

> 
> > +	for_each_set_bit_from(vec, &source_bitmask,
> > NR_NMI_SOURCE_VECTORS) {
> > +		a = rcu_dereference(nmiaction_src_table[vec]);
> > +		if (!a) {
> > +			pr_warn_ratelimited("NMI received %d no
> > handler", vec);  
> 
> Use a better log message.
pr_warn_ratelimited("NMI-source vector %d has no handler!", vec);

> 
> > +			continue;
> > +		}  
> 
> Empty line again.
will add

> 
> > +		handled += do_handle_nmi(a, regs, type);  
> 
> Ditto.
yes

> 
> > +		/*
> > +		 * Needs polling if unknown source bit is set,
> > handled_mask is  
> 
>                                     ^the
got that.

> 
> > +		 * used to tell the polling code which NMIs can be
> > skipped.
> > +		 */
> > +		if (has_unknown_src)
> > +			*handled_mask |= BIT(vec);
> > +	}  
> 
> empty line please.
will add

> 
> > +	rcu_read_unlock();
> > +
> > +	return handled;
> > +}
> > +
> >   static int nmi_handle(unsigned int type, struct pt_regs *regs)
> >   {
> >   	struct nmi_desc *desc = nmi_to_desc(type);
> > +	unsigned long handled_mask = 0;
> >   	struct nmiaction *a;
> >   	int handled=0;
> >   
> > -	rcu_read_lock();
> > +	/*
> > +	 * Check if the NMI source handling is complete, otherwise
> > polling is
> > +	 * still required. handled_mask is non-zero if NMI source
> > handling is
> > +	 * partial due to unknown NMI sources.
> > +	 */
> > +	handled = nmi_handle_src(type, regs, &handled_mask);
> > +	if (handled && !handled_mask)
> > +		return handled;
> >   
> > +	rcu_read_lock();  
> 
> keep original empty lines around it.
ok

> 
> >   	/*
> >   	 * NMIs are edge-triggered, which means if you have enough
> >   	 * of them concurrently, you can lose some because only one
> >   	 * can be latched at any given time.  Walk the whole list
> >   	 * to handle those situations.
> >   	 */
> > -	list_for_each_entry_rcu(a, &desc->head, list)
> > +	list_for_each_entry_rcu(a, &desc->head, list) {
> > +		/* Skip NMIs handled earlier with source info */
> > +		if (BIT(a->source_vec) & handled_mask)
> > +			continue;
> >   		handled += do_handle_nmi(a, regs, type);
> > -
> > +	}
> >   	rcu_read_unlock();  
> 
> keep original empty lines around it.
ok

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-07-07  3:43 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-28 20:18 [PATCH v3 00/11] Add support for NMI source reporting Jacob Pan
2024-06-28 20:18 ` [PATCH v3 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
2024-06-28 20:18 ` [PATCH v3 02/11] x86/irq: Define NMI source vectors Jacob Pan
2024-06-29 18:32   ` Xin Li
2024-07-01 17:16     ` Jacob Pan
2024-06-28 20:18 ` [PATCH v3 03/11] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
2024-06-28 20:18 ` [PATCH v3 04/11] x86/irq: Factor out common NMI handling code Jacob Pan
2024-06-29  0:31   ` Xin Li
2024-07-03 23:10     ` Jacob Pan
2024-06-28 20:18 ` [PATCH v3 05/11] x86/irq: Process nmi sources in NMI handler Jacob Pan
2024-06-29  3:39   ` Xin Li
2024-07-07  3:48     ` Jacob Pan
2024-07-01 14:31   ` Nikolay Borisov
2024-07-01 15:36     ` Jacob Pan
2024-06-28 20:18 ` [PATCH v3 06/11] KVM: VMX: Expand FRED kvm entry with event data Jacob Pan
2024-06-29  4:01   ` Xin Li
2024-07-01 15:39     ` Jacob Pan
2024-06-28 20:18 ` [PATCH v3 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
2024-06-29  4:07   ` Xin Li
2024-07-01 15:45     ` Jacob Pan
2024-07-01 17:03       ` Xin Li
2024-07-01 18:00         ` Jacob Pan
2024-06-30 13:04   ` Zeng Guang
2024-07-01 15:46     ` Jacob Pan
2024-06-28 20:18 ` [PATCH v3 08/11] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
2024-07-04 14:44   ` Liang, Kan
2024-07-06 22:49     ` Jacob Pan
2024-06-28 20:18 ` [PATCH v3 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
2024-06-28 20:18 ` [PATCH v3 10/11] x86/irq: Move __prepare_ICR to x86 common header Jacob Pan
2024-06-28 20:18 ` [PATCH v3 11/11] KVM: X86: Use common code for PV IPIs in linux guest Jacob Pan
2024-06-29 18:38   ` Xin Li
2024-07-01 16:38     ` Jacob Pan
2024-07-01 17:13       ` Xin Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).