* [PATCH v4 00/11] Add support for NMI-source reporting
@ 2024-07-09 14:38 Jacob Pan
2024-07-09 14:38 ` [PATCH v4 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
` (10 more replies)
0 siblings, 11 replies; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:38 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan
Hi Thomas and all,
Non-Maskable Interrupts (NMIs) are routed to the local Advanced Programmable
Interrupt Controller (APIC) using vector #2. Before the advent of the
Flexible Return and Event Delivery (FRED)[1], the vector information set by
the NMI initiator was disregarded or lost within the hardware, compelling
system software to poll every registered NMI handler to pinpoint the source
of the NMI[2]. This approach led to several issues:
1. Inefficiency due to the CPU's time spent polling all handlers.
2. Increased latency from the additional time taken to poll all handlers.
3. The occurrence of unnecessary NMIs if they are triggered shortly
after being processed by a different source.
To tackle these challenges, Intel introduced NMI source reporting as a part
of the FRED specification (detailed in Chapter 9). This CPU feature ensures
that while all NMI sources are still aggregated into NMI vector (#2) for
delivery, the source of the NMI is now conveyed through FRED event data
(a 16-bit bitmap on the stack). This allows for the selective dispatch
of the NMI source handler based on the bitmap, eliminating the need to
invoke all NMI source handlers indiscriminately.
In line with the hardware architecture, various interrupt sources can
generate NMIs by encoding an NMI delivery mode. However, this patchset
activates only the local NMI sources that are currently utilized by the
Linux kernel, which includes:
1. Performance monitoring.
2. Inter-Processor Interrupts (IPIs) for functions like CPU backtrace,
machine check, Kernel GNU Debugger (KGDB), reboot, panic stop, and
self-test.
Other NMI sources will continue to be handled as previously when the NMI
source is not utilized or remains unidentified.
Next steps:
1. KVM support
2. Optimization to reuse IDT NMI vector 2 as NMI source for "known" source.
Link:https://lore.kernel.org/lkml/746fecd5-4c79-42f9-919e-912ec415e73f@zytor.com/
[1] https://www.intel.com/content/www/us/en/content-details/779982/flexible-return-and-event-delivery-fred-specification.html
[2] https://lore.kernel.org/lkml/171011362209.2468526.15187874627966416701.tglx@xen13/
Thanks,
Jacob
---
V4:
- Mostly coding style, comments, and naming changes (Li Xin, Nikolay, Kan)
V3:
- Added KVM VMX patches to handle NMI exits (Sean)
- Clean up in KVM for code reuse in PV IPI (patch 10 and 11)
- Misc fixes based on reviews from HPA, Li Xin, and Sohil
Change logs are in individual patches.
Thanks,
Jacob
Jacob Pan (9):
x86/irq: Add enumeration of NMI source reporting CPU feature
x86/irq: Define NMI source vectors
x86/irq: Extend NMI handler registration interface to include source
x86/irq: Factor out common NMI handling code
x86/irq: Process nmi sources in NMI handler
perf/x86: Enable NMI source reporting for perfmon
x86/irq: Enable NMI source on IPIs delivered as NMI
x86/irq: Move __prepare_ICR to x86 common header
KVM: X86: Use common code for PV IPIs in linux guest
Zeng Guang (2):
KVM: VMX: Expand FRED kvm entry with event data
KVM: VMX: Handle NMI Source report in VM exit
arch/x86/entry/entry_64_fred.S | 2 +-
arch/x86/events/amd/ibs.c | 2 +-
arch/x86/events/core.c | 7 +-
arch/x86/events/intel/core.c | 6 +-
arch/x86/include/asm/apic.h | 23 ++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/fred.h | 8 +-
arch/x86/include/asm/irq_vectors.h | 40 +++++++++
arch/x86/include/asm/nmi.h | 4 +-
arch/x86/kernel/apic/hw_nmi.c | 5 +-
arch/x86/kernel/apic/ipi.c | 4 +-
arch/x86/kernel/apic/local.h | 16 ----
arch/x86/kernel/cpu/mce/inject.c | 4 +-
arch/x86/kernel/cpu/mshyperv.c | 2 +-
arch/x86/kernel/kgdb.c | 6 +-
arch/x86/kernel/kvm.c | 10 +--
arch/x86/kernel/nmi.c | 127 ++++++++++++++++++++++++++---
arch/x86/kernel/nmi_selftest.c | 7 +-
arch/x86/kernel/reboot.c | 4 +-
arch/x86/kernel/smp.c | 4 +-
arch/x86/kernel/traps.c | 4 +-
arch/x86/kvm/vmx/vmx.c | 13 ++-
arch/x86/platform/uv/uv_nmi.c | 4 +-
drivers/acpi/apei/ghes.c | 2 +-
drivers/char/ipmi/ipmi_watchdog.c | 2 +-
drivers/edac/igen6_edac.c | 2 +-
drivers/watchdog/hpwdt.c | 6 +-
27 files changed, 237 insertions(+), 78 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v4 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
@ 2024-07-09 14:38 ` Jacob Pan
2024-07-09 14:38 ` [PATCH v4 02/11] x86/irq: Define NMI source vectors Jacob Pan
` (9 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:38 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan
The lack of a mechanism to pinpoint the origins of Non-Maskable Interrupts
(NMIs) necessitates that the NMI vector 2 handler consults each NMI source
handler individually. This approach leads to inefficiencies, delays, and
the occurrence of unnecessary NMIs, thereby also constraining the potential
applications of NMIs.
A new CPU feature, known as NMI source reporting, has been introduced as
part of the Flexible Return and Event Delivery (FRED) spec. This feature
enables the NMI vector 2 handler to directly obtain information about the
NMI source from the FRED event data.
The functionality of NMI source reporting is tied to the FRED. Although it
is enumerated by a unique CPUID feature bit, it cannot be turned off
independently once FRED is activated.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v3: Removed CONFIG_X86_NMI_SOURCE (Li Xin, HPA, Sohil)
v2: Removed NMI source from static CPU ID dependency table (HPA)
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/traps.c | 4 +++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..ec78d361e685 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -327,6 +327,7 @@
#define X86_FEATURE_FRED (12*32+17) /* Flexible Return and Event Delivery */
#define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" (userspace) GS */
#define X86_FEATURE_WRMSRNS (12*32+19) /* "" Non-serializing WRMSR */
+#define X86_FEATURE_NMI_SOURCE (12*32+20) /* NMI source reporting */
#define X86_FEATURE_AMX_FP16 (12*32+21) /* "" AMX fp16 Support */
#define X86_FEATURE_AVX_IFMA (12*32+23) /* "" Support for VPMADD52[H,L]UQ */
#define X86_FEATURE_LAM (12*32+26) /* Linear Address Masking */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4fa0b17e5043..465f04e4a79f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1427,8 +1427,10 @@ early_param("fred", fred_setup);
void __init trap_init(void)
{
- if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred)
+ if (cpu_feature_enabled(X86_FEATURE_FRED) && !enable_fred) {
setup_clear_cpu_cap(X86_FEATURE_FRED);
+ setup_clear_cpu_cap(X86_FEATURE_NMI_SOURCE);
+ }
/* Init cpu_entry_area before IST entries are set up */
setup_cpu_entry_areas();
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 02/11] x86/irq: Define NMI source vectors
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
2024-07-09 14:38 ` [PATCH v4 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
@ 2024-07-09 14:38 ` Jacob Pan
2024-07-09 14:38 ` [PATCH v4 03/11] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
` (8 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:38 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan
When NMI-source reporting is supported, each logical processor maintains
a 16-bit NMI-source bitmap. It is up to the system software to assign NMI
sources for their matching vector (bit position) in the bitmap.
Notice that NMI source vector is in a different namespace than the IDT
vectors. Though they share the same programming interface/field in the
NMI originator.
This initial allocation of the NMI sources are limited to local NMIs in
that there is no external device NMI usage yet.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/include/asm/irq_vectors.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 13aea8fc3d45..4f767c3940d6 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -105,6 +105,36 @@
#define NR_VECTORS 256
+/*
+ * NMI senders specify the NMI-source vector as an 8-bit integer in their
+ * vector field with NMI delivery mode. A local APIC receiving an NMI will
+ * set the corresponding bit in a 16-bit bitmask, which is accumulated until
+ * the NMI is delivered.
+ *
+ * When a sender didn't specify an NMI-source vector the source vector will
+ * be 0, which will result in bit 0 of the bitmask being set. For out of
+ * bounds vectors >= 16, bit 0 will also be set.
+ *
+ * When bit 0 is set, system software must invoke all registered NMI handlers
+ * as if NMI-source reporting feature is not enabled.
+ *
+ * Vector 2 is reserved for matching IDT NMI vector where it may be hardcoded
+ * by some external devices.
+ *
+ * The NMI-source vectors are sorted by descending priority with the exceptions
+ * of 0 and 2.
+ */
+#define NMI_SOURCE_VEC_UNKNOWN 0
+#define NMI_SOURCE_VEC_IPI_REBOOT 1 /* Crash reboot */
+#define NMI_SOURCE_VEC_IDT_NMI 2 /* Match IDT NMI vector 2 */
+#define NMI_SOURCE_VEC_IPI_SMP_STOP 3 /* Panic stop CPU */
+#define NMI_SOURCE_VEC_IPI_BT 4 /* CPU backtrace */
+#define NMI_SOURCE_VEC_PMI 5 /* PerfMon counters */
+#define NMI_SOURCE_VEC_IPI_KGDB 6 /* KGDB */
+#define NMI_SOURCE_VEC_IPI_MCE 7 /* MCE injection */
+#define NMI_SOURCE_VEC_IPI_TEST 8 /* For remote and local IPIs */
+#define NR_NMI_SOURCE_VECTORS 9
+
#ifdef CONFIG_X86_LOCAL_APIC
#define FIRST_SYSTEM_VECTOR POSTED_MSI_NOTIFICATION_VECTOR
#else
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 03/11] x86/irq: Extend NMI handler registration interface to include source
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
2024-07-09 14:38 ` [PATCH v4 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
2024-07-09 14:38 ` [PATCH v4 02/11] x86/irq: Define NMI source vectors Jacob Pan
@ 2024-07-09 14:38 ` Jacob Pan
2024-07-09 14:38 ` [PATCH v4 04/11] x86/irq: Factor out common NMI handling code Jacob Pan
` (7 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:38 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan
Add a source vector argument to register_nmi_handler() such that designated
NMI originators can leverage NMI source reporting feature. For those who
do not use NMI-source reporting, 0 (unknown) is used as the source vector.
NMI-source vectors (up to 16) are pre-defined.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4:
- Tweak commit message
v3:
- Move NMI source vector definitions to a separate patch (Sohil)
v2:(address review comments from HPA, not including optimizations"
- Reserve IDT NMI vector 2 in case of devices use hardcoded vector 2
- Sort NMI source vector by priority in descending order
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/events/amd/ibs.c | 2 +-
arch/x86/events/core.c | 3 ++-
arch/x86/include/asm/nmi.h | 4 +++-
arch/x86/kernel/apic/hw_nmi.c | 2 +-
arch/x86/kernel/cpu/mce/inject.c | 2 +-
arch/x86/kernel/cpu/mshyperv.c | 2 +-
arch/x86/kernel/kgdb.c | 4 ++--
arch/x86/kernel/nmi.c | 22 ++++++++++++++++++++++
arch/x86/kernel/nmi_selftest.c | 5 +++--
arch/x86/kernel/reboot.c | 2 +-
arch/x86/kernel/smp.c | 2 +-
arch/x86/platform/uv/uv_nmi.c | 4 ++--
drivers/acpi/apei/ghes.c | 2 +-
drivers/char/ipmi/ipmi_watchdog.c | 2 +-
drivers/edac/igen6_edac.c | 2 +-
drivers/watchdog/hpwdt.c | 6 +++---
16 files changed, 46 insertions(+), 20 deletions(-)
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index e91970b01d62..20989071f59a 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1246,7 +1246,7 @@ static __init int perf_event_ibs_init(void)
if (ret)
goto err_op;
- ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
+ ret = register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs", 0);
if (ret)
goto err_nmi;
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 5b0dd07b1ef1..1ef2201e48ac 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2100,7 +2100,8 @@ static int __init init_hw_perf_events(void)
x86_pmu.intel_ctrl = (1 << x86_pmu.num_counters) - 1;
perf_events_lapic_init();
- register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI");
+
+ register_nmi_handler(NMI_LOCAL, perf_event_nmi_handler, 0, "PMI", NMI_SOURCE_VEC_PMI);
unconstrained = (struct event_constraint)
__EVENT_CONSTRAINT(0, (1ULL << x86_pmu.num_counters) - 1,
diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 41a0ebb699ec..6fe26fea30eb 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -39,15 +39,17 @@ struct nmiaction {
u64 max_duration;
unsigned long flags;
const char *name;
+ unsigned int source_vec;
};
-#define register_nmi_handler(t, fn, fg, n, init...) \
+#define register_nmi_handler(t, fn, fg, n, src, init...) \
({ \
static struct nmiaction init fn##_na = { \
.list = LIST_HEAD_INIT(fn##_na.list), \
.handler = (fn), \
.name = (n), \
.flags = (fg), \
+ .source_vec = (src), \
}; \
__register_nmi_handler((t), &fn##_na); \
})
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 45af535c44a0..9f0125d3b8b0 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -54,7 +54,7 @@ NOKPROBE_SYMBOL(nmi_cpu_backtrace_handler);
static int __init register_nmi_cpu_backtrace_handler(void)
{
register_nmi_handler(NMI_LOCAL, nmi_cpu_backtrace_handler,
- 0, "arch_bt");
+ 0, "arch_bt", NMI_SOURCE_VEC_IPI_BT);
return 0;
}
early_initcall(register_nmi_cpu_backtrace_handler);
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 94953d749475..365a03f11d06 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -769,7 +769,7 @@ static int __init inject_init(void)
debugfs_init();
- register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0, "mce_notify");
+ register_nmi_handler(NMI_LOCAL, mce_raise_notify, 0, "mce_notify", NMI_SOURCE_VEC_IPI_MCE);
mce_register_injector_chain(&inject_nb);
setup_inj_struct(&i_mce);
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0fd57a8ba84..2fb9408a8ba9 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -486,7 +486,7 @@ static void __init ms_hyperv_init_platform(void)
}
register_nmi_handler(NMI_UNKNOWN, hv_nmi_unknown, NMI_FLAG_FIRST,
- "hv_nmi_unknown");
+ "hv_nmi_unknown", 0);
#endif
#ifdef CONFIG_X86_IO_APIC
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 9c9faa1634fb..d167eb23cf13 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -603,12 +603,12 @@ int kgdb_arch_init(void)
goto out;
retval = register_nmi_handler(NMI_LOCAL, kgdb_nmi_handler,
- 0, "kgdb");
+ 0, "kgdb", NMI_SOURCE_VEC_IPI_KGDB);
if (retval)
goto out1;
retval = register_nmi_handler(NMI_UNKNOWN, kgdb_nmi_handler,
- 0, "kgdb");
+ 0, "kgdb", 0);
if (retval)
goto out2;
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index ed163c8c8604..5491b460f32b 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -86,6 +86,12 @@ static DEFINE_PER_CPU(struct nmi_stats, nmi_stats);
static int ignore_nmis __read_mostly;
+/*
+ * Contains all actions registered by originators with source vector,
+ * excluding UNKNOWN NMI-source vector 0.
+ */
+static struct nmiaction *nmiaction_src_table[NR_NMI_SOURCE_VECTORS - 1];
+
int unknown_nmi_panic;
/*
* Prevent NMI reason port (0x61) being accessed simultaneously, can
@@ -163,6 +169,12 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
}
NOKPROBE_SYMBOL(nmi_handle);
+static inline bool use_nmi_source(unsigned int type, struct nmiaction *a)
+{
+ return (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) &&
+ type == NMI_LOCAL && a->source_vec);
+}
+
int __register_nmi_handler(unsigned int type, struct nmiaction *action)
{
struct nmi_desc *desc = nmi_to_desc(type);
@@ -173,6 +185,11 @@ int __register_nmi_handler(unsigned int type, struct nmiaction *action)
raw_spin_lock_irqsave(&desc->lock, flags);
+ if (use_nmi_source(type, action)) {
+ rcu_assign_pointer(nmiaction_src_table[action->source_vec], action);
+ pr_info("NMI source %d registered for %s\n", action->source_vec, action->name);
+ }
+
/*
* Indicate if there are multiple registrations on the
* internal NMI handler call chains (SERR and IO_CHECK).
@@ -210,6 +227,11 @@ void unregister_nmi_handler(unsigned int type, const char *name)
if (!strcmp(n->name, name)) {
WARN(in_nmi(),
"Trying to free NMI (%s) from NMI context!\n", n->name);
+ if (use_nmi_source(type, n)) {
+ rcu_assign_pointer(nmiaction_src_table[n->source_vec], NULL);
+ pr_info("NMI source %d unregistered for %s\n", n->source_vec, n->name);
+ }
+
list_del_rcu(&n->list);
found = n;
break;
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index e93a8545c74d..f014c8a66b0c 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -44,7 +44,7 @@ static void __init init_nmi_testsuite(void)
{
/* trap all the unknown NMIs we may generate */
register_nmi_handler(NMI_UNKNOWN, nmi_unk_cb, 0, "nmi_selftest_unk",
- __initdata);
+ 0, __initdata);
}
static void __init cleanup_nmi_testsuite(void)
@@ -67,7 +67,8 @@ static void __init test_nmi_ipi(struct cpumask *mask)
unsigned long timeout;
if (register_nmi_handler(NMI_LOCAL, test_nmi_ipi_callback,
- NMI_FLAG_FIRST, "nmi_selftest", __initdata)) {
+ NMI_FLAG_FIRST, "nmi_selftest", NMI_SOURCE_VEC_IPI_TEST,
+ __initdata)) {
nmi_fail = FAILURE;
return;
}
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f3130f762784..acc19c1d3b4f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -910,7 +910,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
/* Would it be better to replace the trap vector here? */
if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
- NMI_FLAG_FIRST, "crash"))
+ NMI_FLAG_FIRST, "crash", NMI_SOURCE_VEC_IPI_REBOOT))
return; /* Return what? */
/*
* Ensure the new callback function is set before sending
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 18266cc3d98c..f27469e40141 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -143,7 +143,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
static int register_stop_handler(void)
{
return register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
- NMI_FLAG_FIRST, "smp_stop");
+ NMI_FLAG_FIRST, "smp_stop", NMI_SOURCE_VEC_IPI_SMP_STOP);
}
static void native_stop_other_cpus(int wait)
diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 5c50e550ab63..473c34eb264c 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -1029,10 +1029,10 @@ static int uv_handle_nmi_ping(unsigned int reason, struct pt_regs *regs)
static void uv_register_nmi_notifier(void)
{
- if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv"))
+ if (register_nmi_handler(NMI_UNKNOWN, uv_handle_nmi, 0, "uv", 0))
pr_warn("UV: NMI handler failed to register\n");
- if (register_nmi_handler(NMI_LOCAL, uv_handle_nmi_ping, 0, "uvping"))
+ if (register_nmi_handler(NMI_LOCAL, uv_handle_nmi_ping, 0, "uvping", 0))
pr_warn("UV: PING NMI handler failed to register\n");
}
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 623cc0cb4a65..393dca95d2b3 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -1318,7 +1318,7 @@ static void ghes_nmi_add(struct ghes *ghes)
{
mutex_lock(&ghes_list_mutex);
if (list_empty(&ghes_nmi))
- register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes");
+ register_nmi_handler(NMI_LOCAL, ghes_notify_nmi, 0, "ghes", 0);
list_add_rcu(&ghes->list, &ghes_nmi);
mutex_unlock(&ghes_list_mutex);
}
diff --git a/drivers/char/ipmi/ipmi_watchdog.c b/drivers/char/ipmi/ipmi_watchdog.c
index 9a459257489f..61bb5dcade5a 100644
--- a/drivers/char/ipmi/ipmi_watchdog.c
+++ b/drivers/char/ipmi/ipmi_watchdog.c
@@ -1272,7 +1272,7 @@ static void check_parms(void)
}
if (do_nmi && !nmi_handler_registered) {
rv = register_nmi_handler(NMI_UNKNOWN, ipmi_nmi, 0,
- "ipmi");
+ "ipmi", 0);
if (rv) {
pr_warn("Can't register nmi handler\n");
return;
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index dbe9fe5f2ca6..891278245d8b 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1321,7 +1321,7 @@ static int register_err_handler(void)
}
rc = register_nmi_handler(NMI_SERR, ecclog_nmi_handler,
- 0, IGEN6_NMI_NAME);
+ 0, IGEN6_NMI_NAME, 0);
if (rc) {
igen6_printk(KERN_ERR, "Failed to register NMI handler\n");
return rc;
diff --git a/drivers/watchdog/hpwdt.c b/drivers/watchdog/hpwdt.c
index ae30e394d176..5246706afcf6 100644
--- a/drivers/watchdog/hpwdt.c
+++ b/drivers/watchdog/hpwdt.c
@@ -242,13 +242,13 @@ static int hpwdt_init_nmi_decoding(struct pci_dev *dev)
/*
* Only one function can register for NMI_UNKNOWN
*/
- retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout, 0, "hpwdt");
+ retval = register_nmi_handler(NMI_UNKNOWN, hpwdt_pretimeout, 0, "hpwdt", 0);
if (retval)
goto error;
- retval = register_nmi_handler(NMI_SERR, hpwdt_pretimeout, 0, "hpwdt");
+ retval = register_nmi_handler(NMI_SERR, hpwdt_pretimeout, 0, "hpwdt", 0);
if (retval)
goto error1;
- retval = register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt");
+ retval = register_nmi_handler(NMI_IO_CHECK, hpwdt_pretimeout, 0, "hpwdt", 0);
if (retval)
goto error2;
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 04/11] x86/irq: Factor out common NMI handling code
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
` (2 preceding siblings ...)
2024-07-09 14:38 ` [PATCH v4 03/11] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
@ 2024-07-09 14:38 ` Jacob Pan
2024-07-09 14:39 ` [PATCH v4 05/11] x86/irq: Process nmi sources in NMI handler Jacob Pan
` (6 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:38 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan
In preparation for handling NMIs with explicit source reporting, factor
out common code for reuse. No functional change.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/kernel/nmi.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 5491b460f32b..b96667eed106 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -135,6 +135,20 @@ static void nmi_check_duration(struct nmiaction *action, u64 duration)
action->handler, duration, decimal_msecs);
}
+static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsigned int type)
+{
+ int thishandled;
+ u64 delta;
+
+ delta = sched_clock();
+ thishandled = a->handler(type, regs);
+ delta = sched_clock() - delta;
+ trace_nmi_handler(a->handler, (int)delta, thishandled);
+ nmi_check_duration(a, delta);
+
+ return thishandled;
+}
+
static int nmi_handle(unsigned int type, struct pt_regs *regs)
{
struct nmi_desc *desc = nmi_to_desc(type);
@@ -149,18 +163,8 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
* can be latched at any given time. Walk the whole list
* to handle those situations.
*/
- list_for_each_entry_rcu(a, &desc->head, list) {
- int thishandled;
- u64 delta;
-
- delta = sched_clock();
- thishandled = a->handler(type, regs);
- handled += thishandled;
- delta = sched_clock() - delta;
- trace_nmi_handler(a->handler, (int)delta, thishandled);
-
- nmi_check_duration(a, delta);
- }
+ list_for_each_entry_rcu(a, &desc->head, list)
+ handled += do_handle_nmi(a, regs, type);
rcu_read_unlock();
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 05/11] x86/irq: Process nmi sources in NMI handler
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
` (3 preceding siblings ...)
2024-07-09 14:38 ` [PATCH v4 04/11] x86/irq: Factor out common NMI handling code Jacob Pan
@ 2024-07-09 14:39 ` Jacob Pan
2024-07-09 14:39 ` [PATCH v4 06/11] KVM: VMX: Expand FRED kvm entry with event data Jacob Pan
` (5 subsequent siblings)
10 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:39 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan
When NMI-source reporting is enabled, the vector 2 NMI handler can
prioritize the handling of explicitly reported sources. If the source
is unknown, it will continue with the existing processing flow, meaning
all NMI handlers will be invoked.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4:
- Coding style changes (Li Xin)
- Renamed handled_mask to partial_bitmap (Nikolay)
v3:
- Use a static flag to disable NMIs in case of HW failure
- Optimize the case when unknown NMIs are mixed with known NMIs(HPA)
v2:
- Disable NMI source reporting once garbage data is given in FRED
return stack. (HPA)
process nmi
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/kernel/nmi.c | 83 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 82 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index b96667eed106..edb169289a1d 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -149,12 +149,89 @@ static inline int do_handle_nmi(struct nmiaction *a, struct pt_regs *regs, unsig
return thishandled;
}
+static int nmi_handle_src(unsigned int type, struct pt_regs *regs, unsigned long *partial_bitmap)
+{
+ static bool nmi_source_disabled;
+ bool has_unknown_src = false;
+ unsigned long source_bitmap;
+ struct nmiaction *a;
+ int handled = 0;
+ int vec;
+
+ if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE) || type != NMI_LOCAL || nmi_source_disabled)
+ return 0;
+
+ source_bitmap = fred_event_data(regs);
+ if (unlikely(!source_bitmap)) {
+ pr_warn("Buggy hardware! Disable NMI-source handling.\n");
+ nmi_source_disabled = true;
+ return 0;
+ }
+
+ /*
+ * There is no guarantee that a valid NMI-source vector is always
+ * delivered, even when the originator specified one. It is software's
+ * responsibility to check all available NMI sources when bit 0 is set
+ * in the NMI-source reporting bitmap. I.e. we have to call every
+ * handler as if there is no NMI-source reporting feature enabled.
+ *
+ * In this case, handlers for the known NMI sources will be called
+ * first, followed by the remaining handlers, which are called
+ * during the subsequent polling code.
+ *
+ * Conversely, if non-zero vectors appear in the source bitmap, we
+ * can precisely identify the sources. Therefore, we only invoke the
+ * handlers for which the corresponding bits are set.
+ */
+ if (unlikely(source_bitmap & BIT(NMI_SOURCE_VEC_UNKNOWN))) {
+ pr_warn_ratelimited("NMI received with unknown sources\n");
+ has_unknown_src = true;
+ }
+
+ rcu_read_lock();
+
+ /* Bit 0 is for unknown NMI sources, skip it. */
+ vec = 1;
+ for_each_set_bit_from(vec, &source_bitmap, NR_NMI_SOURCE_VECTORS) {
+ a = rcu_dereference(nmiaction_src_table[vec]);
+ if (!a) {
+ pr_warn_ratelimited("NMI-source vector %d has no handler!", vec);
+ continue;
+ }
+
+ handled += do_handle_nmi(a, regs, type);
+
+ /*
+ * Needs polling if the unknown source bit is set.
+ * partial_bitmap is used to tell the polling code which
+ * NMIs have already been handled based on explicit source
+ * thus can be skipped.
+ */
+ if (has_unknown_src)
+ *partial_bitmap |= BIT(vec);
+ }
+
+ rcu_read_unlock();
+
+ return handled;
+}
+
static int nmi_handle(unsigned int type, struct pt_regs *regs)
{
struct nmi_desc *desc = nmi_to_desc(type);
+ unsigned long partial_bitmap = 0;
struct nmiaction *a;
int handled=0;
+ /*
+ * Check if the NMI source handling is complete, otherwise polling is
+ * still required. partial_bitmap is non-zero if NMI source handling is
+ * partial due to unknown NMI sources.
+ */
+ handled = nmi_handle_src(type, regs, &partial_bitmap);
+ if (handled && !partial_bitmap)
+ return handled;
+
rcu_read_lock();
/*
@@ -163,8 +240,12 @@ static int nmi_handle(unsigned int type, struct pt_regs *regs)
* can be latched at any given time. Walk the whole list
* to handle those situations.
*/
- list_for_each_entry_rcu(a, &desc->head, list)
+ list_for_each_entry_rcu(a, &desc->head, list) {
+ /* Skip NMIs handled earlier with source info */
+ if (BIT(a->source_vec) & partial_bitmap)
+ continue;
handled += do_handle_nmi(a, regs, type);
+ }
rcu_read_unlock();
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 06/11] KVM: VMX: Expand FRED kvm entry with event data
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
` (4 preceding siblings ...)
2024-07-09 14:39 ` [PATCH v4 05/11] x86/irq: Process nmi sources in NMI handler Jacob Pan
@ 2024-07-09 14:39 ` Jacob Pan
2024-08-16 14:12 ` Sean Christopherson
2024-07-09 14:39 ` [PATCH v4 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
` (4 subsequent siblings)
10 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:39 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Zeng Guang, Jacob Pan
From: Zeng Guang <guang.zeng@intel.com>
According to FRED specification 10.6.2, VM exits triggered by events such as
NMI, #DB, and #PF will have their event data stored in the exit-qualification
field.
However, #DB and #PF are owned by the running guest, which is managed by KVM.
NMIs belong to the host, and the host NMI handler requires the event data
stored in the VMCS for NMI-induced VM exits.
This patch enhances the FRED KVM entry interface to include the event data
derived from the exit qualification. Currently, it is used exclusively for
NMI-source reporting
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/entry/entry_64_fred.S | 2 +-
arch/x86/include/asm/fred.h | 8 ++++----
arch/x86/kvm/vmx/vmx.c | 4 ++--
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index a02bc6f3d2e6..0d934a3fcaf8 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -92,7 +92,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
* +--------+-----------------+
*/
push $0 /* Reserved, must be 0 */
- push $0 /* Event data, 0 for IRQ/NMI */
+ push %rsi /* Event data for IRQ/NMI */
push %rdi /* fred_ss handed in by the caller */
push %rbp
pushf
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index e86c7ba32435..15f5d2eabd1d 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -63,14 +63,14 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
void asm_fred_entrypoint_user(void);
void asm_fred_entrypoint_kernel(void);
-void asm_fred_entry_from_kvm(struct fred_ss);
+void asm_fred_entry_from_kvm(struct fred_ss, unsigned long edata);
__visible void fred_entry_from_user(struct pt_regs *regs);
__visible void fred_entry_from_kernel(struct pt_regs *regs);
__visible void __fred_entry_from_kvm(struct pt_regs *regs);
/* Can be called from noinstr code, thus __always_inline */
-static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector)
+static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector, unsigned long edata)
{
struct fred_ss ss = {
.ss =__KERNEL_DS,
@@ -80,7 +80,7 @@ static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int
.lm = 1,
};
- asm_fred_entry_from_kvm(ss);
+ asm_fred_entry_from_kvm(ss, edata);
}
void cpu_init_fred_exceptions(void);
@@ -90,7 +90,7 @@ void fred_complete_exception_setup(void);
static __always_inline unsigned long fred_event_data(struct pt_regs *regs) { return 0; }
static inline void cpu_init_fred_exceptions(void) { }
static inline void fred_complete_exception_setup(void) { }
-static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector) { }
+static __always_inline void fred_entry_from_kvm(unsigned int type, unsigned int vector, unsigned long edata) { }
#endif /* CONFIG_X86_FRED */
#endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b3c83c06f826..4e7b36081b76 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7024,7 +7024,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
if (cpu_feature_enabled(X86_FEATURE_FRED))
- fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
+ fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector, 0);
else
vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
kvm_after_interrupt(vcpu);
@@ -7332,7 +7332,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
is_nmi(vmx_get_intr_info(vcpu))) {
kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
if (cpu_feature_enabled(X86_FEATURE_FRED))
- fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR);
+ fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 0);
else
vmx_do_nmi_irqoff();
kvm_after_interrupt(vcpu);
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 07/11] KVM: VMX: Handle NMI Source report in VM exit
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
` (5 preceding siblings ...)
2024-07-09 14:39 ` [PATCH v4 06/11] KVM: VMX: Expand FRED kvm entry with event data Jacob Pan
@ 2024-07-09 14:39 ` Jacob Pan
2024-08-16 14:05 ` Sean Christopherson
2024-07-09 14:39 ` [PATCH v4 08/11] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
` (3 subsequent siblings)
10 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:39 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Zeng Guang, Jacob Pan
From: Zeng Guang <guang.zeng@intel.com>
If the "NMI exiting" VM-execution control is 1, the value of the 16-bit NMI
source vector is saved in the exit-qualification field in the VMCS when VM
exits occur on CPUs that support NMI source.
KVM that is aware of NMI-source reporting will push the bitmask of NMI source
vectors as the exceptoin event data field on the stack for then entry of FRED
exception. Subsequently, the host NMI exception handler is invoked which
will process NMI source information in the event data. This operation is
independent of vCPU FRED enabling status.
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/kvm/vmx/vmx.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4e7b36081b76..6719c598fa5f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7331,10 +7331,15 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
is_nmi(vmx_get_intr_info(vcpu))) {
kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
- if (cpu_feature_enabled(X86_FEATURE_FRED))
- fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 0);
- else
+ if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+ unsigned long edata = 0;
+
+ if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
+ edata = vmx_get_exit_qual(vcpu);
+ fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, edata);
+ } else {
vmx_do_nmi_irqoff();
+ }
kvm_after_interrupt(vcpu);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 08/11] perf/x86: Enable NMI source reporting for perfmon
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
` (6 preceding siblings ...)
2024-07-09 14:39 ` [PATCH v4 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
@ 2024-07-09 14:39 ` Jacob Pan
2024-07-09 15:04 ` Liang, Kan
2024-07-09 14:39 ` [PATCH v4 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
` (2 subsequent siblings)
10 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:39 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan, Zeng Guang
Program the designated NMI source vector into the performance monitoring
interrupt (PMI) of the local vector table. PMI handler will be directly
invoked when its NMI is generated. This avoids the latency of calling all
NMI handlers blindly.
Co-developed-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4: Use a macro for programming PVTPC unconditionally (Kan)
v3: Program NMI source vector in PVTPC unconditionally (HPA)
v2: Fix a compile error apic_perfmon_ctr is undefined in i386 config
---
arch/x86/events/core.c | 4 ++--
arch/x86/events/intel/core.c | 6 +++---
arch/x86/include/asm/apic.h | 2 ++
3 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1ef2201e48ac..e69c52f9d662 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1680,7 +1680,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
* This generic handler doesn't seem to have any issues where the
* unmasking occurs so it was left at the top.
*/
- apic_write(APIC_LVTPC, APIC_DM_NMI);
+ apic_write(APIC_LVTPC, APIC_PERF_NMI);
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
if (!test_bit(idx, cpuc->active_mask))
@@ -1723,7 +1723,7 @@ void perf_events_lapic_init(void)
/*
* Always use NMI for PMU
*/
- apic_write(APIC_LVTPC, APIC_DM_NMI);
+ apic_write(APIC_LVTPC, APIC_PERF_NMI);
}
static int
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 38c1b1f1deaa..e7e114616e24 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3093,7 +3093,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
* NMI handler.
*/
if (!late_ack && !mid_ack)
- apic_write(APIC_LVTPC, APIC_DM_NMI);
+ apic_write(APIC_LVTPC, APIC_PERF_NMI);
intel_bts_disable_local();
cpuc->enabled = 0;
__intel_pmu_disable_all(true);
@@ -3130,7 +3130,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
done:
if (mid_ack)
- apic_write(APIC_LVTPC, APIC_DM_NMI);
+ apic_write(APIC_LVTPC, APIC_PERF_NMI);
/* Only restore PMU state when it's active. See x86_pmu_disable(). */
cpuc->enabled = pmu_enabled;
if (pmu_enabled)
@@ -3143,7 +3143,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
* Haswell CPUs.
*/
if (late_ack)
- apic_write(APIC_LVTPC, APIC_DM_NMI);
+ apic_write(APIC_LVTPC, APIC_PERF_NMI);
return handled;
}
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 9327eb00e96d..d284eff7849c 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -30,6 +30,8 @@
#define APIC_EXTNMI_ALL 1
#define APIC_EXTNMI_NONE 2
+#define APIC_PERF_NMI (APIC_DM_NMI | NMI_SOURCE_VEC_PMI)
+
/*
* Define the default level of output to be very little
* This can be turned up by using apic=verbose for more
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
` (7 preceding siblings ...)
2024-07-09 14:39 ` [PATCH v4 08/11] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
@ 2024-07-09 14:39 ` Jacob Pan
2024-08-16 13:45 ` Sean Christopherson
2024-07-09 14:39 ` [PATCH v4 10/11] x86/irq: Move __prepare_ICR to x86 common header Jacob Pan
2024-07-09 14:39 ` [PATCH v4 11/11] KVM: X86: Use common code for PV IPIs in linux guest Jacob Pan
10 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:39 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan
Program designated NMI source vectors for all NMI delivered IPIs
such that their handlers can be selectively invoked.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4: Enhance comments, no functional changes (Li Xin)
---
arch/x86/include/asm/irq_vectors.h | 10 ++++++++++
arch/x86/kernel/apic/hw_nmi.c | 3 ++-
arch/x86/kernel/apic/ipi.c | 4 ++--
arch/x86/kernel/apic/local.h | 18 ++++++++++++------
arch/x86/kernel/cpu/mce/inject.c | 2 +-
arch/x86/kernel/kgdb.c | 2 +-
arch/x86/kernel/nmi_selftest.c | 2 +-
arch/x86/kernel/reboot.c | 2 +-
arch/x86/kernel/smp.c | 2 +-
9 files changed, 31 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 4f767c3940d6..9b7241e7faa3 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -135,6 +135,16 @@
#define NMI_SOURCE_VEC_IPI_TEST 8 /* For remote and local IPIs */
#define NR_NMI_SOURCE_VECTORS 9
+/*
+ * When programming the local APIC, IDT NMI vector and NMI-source vector
+ * are encoded in a single 32 bit variable. The top 16 bits contain
+ * the NMI-source vector and the bottom 16 bits contain NMI_VECTOR (2)
+ * The top 16 bits are always zero when NMI-source reporting feature
+ * is not enabled or the caller does not use NMI-source reporting.
+ */
+#define NMI_VECTOR_WITH_SOURCE(src) (NMI_VECTOR | (src << 16))
+#define NMI_SOURCE_VEC_MASK GENMASK(15, 0)
+
#ifdef CONFIG_X86_LOCAL_APIC
#define FIRST_SYSTEM_VECTOR POSTED_MSI_NOTIFICATION_VECTOR
#else
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 9f0125d3b8b0..f73ca95d961e 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -20,6 +20,7 @@
#include <linux/nmi.h>
#include <linux/init.h>
#include <linux/delay.h>
+#include <asm/irq_vectors.h>
#include "local.h"
@@ -33,7 +34,7 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
#ifdef arch_trigger_cpumask_backtrace
static void nmi_raise_cpu_backtrace(cpumask_t *mask)
{
- __apic_send_IPI_mask(mask, NMI_VECTOR);
+ __apic_send_IPI_mask(mask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_BT));
}
void arch_trigger_cpumask_backtrace(const cpumask_t *mask, int exclude_cpu)
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 5da693d633b7..9d2b18e58758 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -157,7 +157,7 @@ static void __default_send_IPI_shortcut(unsigned int shortcut, int vector)
* issues where otherwise the system hangs when the panic CPU tries
* to stop the others before launching the kdump kernel.
*/
- if (unlikely(vector == NMI_VECTOR))
+ if (unlikely(is_nmi_vector(vector)))
apic_mem_wait_icr_idle_timeout();
else
apic_mem_wait_icr_idle();
@@ -174,7 +174,7 @@ void __default_send_IPI_dest_field(unsigned int dest_mask, int vector,
unsigned int dest_mode)
{
/* See comment in __default_send_IPI_shortcut() */
- if (unlikely(vector == NMI_VECTOR))
+ if (unlikely(is_nmi_vector(vector)))
apic_mem_wait_icr_idle_timeout();
else
apic_mem_wait_icr_idle();
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index 842fe28496be..60e90b7bf058 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -12,6 +12,7 @@
#include <asm/irq_vectors.h>
#include <asm/apic.h>
+#include <asm/nmi.h>
/* X2APIC */
void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
@@ -26,19 +27,24 @@ extern u32 x2apic_max_apicid;
DECLARE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);
+static inline bool is_nmi_vector(int vector)
+{
+ return (vector & NMI_SOURCE_VEC_MASK) == NMI_VECTOR;
+}
+
static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
unsigned int dest)
{
unsigned int icr = shortcut | dest;
- switch (vector) {
- default:
- icr |= APIC_DM_FIXED | vector;
- break;
- case NMI_VECTOR:
+ if (is_nmi_vector(vector)) {
icr |= APIC_DM_NMI;
- break;
+ if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
+ icr |= vector >> 16;
+ } else {
+ icr |= APIC_DM_FIXED | vector;
}
+
return icr;
}
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 365a03f11d06..07bc6c29bd83 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -270,7 +270,7 @@ static void __maybe_unused raise_mce(struct mce *m)
mce_irq_ipi, NULL, 0);
preempt_enable();
} else if (m->inject_flags & MCJ_NMI_BROADCAST)
- __apic_send_IPI_mask(mce_inject_cpumask, NMI_VECTOR);
+ __apic_send_IPI_mask(mce_inject_cpumask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_MCE));
}
start = jiffies;
while (!cpumask_empty(mce_inject_cpumask)) {
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index d167eb23cf13..02198cf9fe21 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -416,7 +416,7 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
*/
void kgdb_roundup_cpus(void)
{
- apic_send_IPI_allbutself(NMI_VECTOR);
+ apic_send_IPI_allbutself(NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_KGDB));
}
#endif
diff --git a/arch/x86/kernel/nmi_selftest.c b/arch/x86/kernel/nmi_selftest.c
index f014c8a66b0c..5aa122d3368c 100644
--- a/arch/x86/kernel/nmi_selftest.c
+++ b/arch/x86/kernel/nmi_selftest.c
@@ -76,7 +76,7 @@ static void __init test_nmi_ipi(struct cpumask *mask)
/* sync above data before sending NMI */
wmb();
- __apic_send_IPI_mask(mask, NMI_VECTOR);
+ __apic_send_IPI_mask(mask, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_TEST));
/* Don't wait longer than a second */
timeout = USEC_PER_SEC;
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index acc19c1d3b4f..fb63bc0d6a0f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -918,7 +918,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
*/
wmb();
- apic_send_IPI_allbutself(NMI_VECTOR);
+ apic_send_IPI_allbutself(NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_REBOOT));
/* Kick CPUs looping in NMI context. */
WRITE_ONCE(crash_ipi_issued, 1);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index f27469e40141..b79e78762a73 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -217,7 +217,7 @@ static void native_stop_other_cpus(int wait)
pr_emerg("Shutting down cpus with NMI\n");
for_each_cpu(cpu, &cpus_stop_mask)
- __apic_send_IPI(cpu, NMI_VECTOR);
+ __apic_send_IPI(cpu, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_SMP_STOP));
}
/*
* Don't wait longer than 10 ms if the caller didn't
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 10/11] x86/irq: Move __prepare_ICR to x86 common header
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
` (8 preceding siblings ...)
2024-07-09 14:39 ` [PATCH v4 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
@ 2024-07-09 14:39 ` Jacob Pan
2024-07-09 14:39 ` [PATCH v4 11/11] KVM: X86: Use common code for PV IPIs in linux guest Jacob Pan
10 siblings, 0 replies; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:39 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan
To reuse __prepare_ICR() outside APIC local code, move it to the x86
common header. e.g. It can be used by the KVM PV IPI code.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
arch/x86/include/asm/apic.h | 21 +++++++++++++++++++++
arch/x86/kernel/apic/local.h | 22 ----------------------
2 files changed, 21 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index d284eff7849c..6c98c51dc1b7 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -477,6 +477,27 @@ static __always_inline bool apic_id_valid(u32 apic_id)
return apic_id <= apic->max_apic_id;
}
+static inline bool is_nmi_vector(int vector)
+{
+ return (vector & NMI_SOURCE_VEC_MASK) == NMI_VECTOR;
+}
+
+static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
+ unsigned int dest)
+{
+ unsigned int icr = shortcut | dest;
+
+ if (is_nmi_vector(vector)) {
+ icr |= APIC_DM_NMI;
+ if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
+ icr |= vector >> 16;
+ } else {
+ icr |= APIC_DM_FIXED | vector;
+ }
+
+ return icr;
+}
+
#else /* CONFIG_X86_LOCAL_APIC */
static inline u32 apic_read(u32 reg) { return 0; }
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index 60e90b7bf058..8b1fe152cd2d 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -12,7 +12,6 @@
#include <asm/irq_vectors.h>
#include <asm/apic.h>
-#include <asm/nmi.h>
/* X2APIC */
void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
@@ -27,27 +26,6 @@ extern u32 x2apic_max_apicid;
DECLARE_STATIC_KEY_FALSE(apic_use_ipi_shorthand);
-static inline bool is_nmi_vector(int vector)
-{
- return (vector & NMI_SOURCE_VEC_MASK) == NMI_VECTOR;
-}
-
-static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
- unsigned int dest)
-{
- unsigned int icr = shortcut | dest;
-
- if (is_nmi_vector(vector)) {
- icr |= APIC_DM_NMI;
- if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
- icr |= vector >> 16;
- } else {
- icr |= APIC_DM_FIXED | vector;
- }
-
- return icr;
-}
-
void default_init_apic_ldr(void);
void apic_mem_wait_icr_idle(void);
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 11/11] KVM: X86: Use common code for PV IPIs in linux guest
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
` (9 preceding siblings ...)
2024-07-09 14:39 ` [PATCH v4 10/11] x86/irq: Move __prepare_ICR to x86 common header Jacob Pan
@ 2024-07-09 14:39 ` Jacob Pan
2024-08-16 13:52 ` Sean Christopherson
10 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2024-07-09 14:39 UTC (permalink / raw)
To: X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, kan.liang,
Andi Kleen, Nikolay Borisov, Mehta, Sohil, Jacob Pan, Zeng Guang
The paravirtual APIC hooks in KVM, some of which are used for sending PV
IPIs, can reuse common code for ICR preparation. This shared code also
encompasses NMI-source reporting when in effect.
Originally-by: Zeng Guang <guang.zeng@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
v4: Refine comments, no functional change.
---
arch/x86/kernel/kvm.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 263f8aed4e2c..a45d60aa0302 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -516,15 +516,7 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
local_irq_save(flags);
- switch (vector) {
- default:
- icr = APIC_DM_FIXED | vector;
- break;
- case NMI_VECTOR:
- icr = APIC_DM_NMI;
- break;
- }
-
+ icr = __prepare_ICR(0, vector, 0);
for_each_cpu(cpu, mask) {
apic_id = per_cpu(x86_cpu_to_apicid, cpu);
if (!ipi_bitmap) {
--
2.25.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 08/11] perf/x86: Enable NMI source reporting for perfmon
2024-07-09 14:39 ` [PATCH v4 08/11] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
@ 2024-07-09 15:04 ` Liang, Kan
0 siblings, 0 replies; 17+ messages in thread
From: Liang, Kan @ 2024-07-09 15:04 UTC (permalink / raw)
To: Jacob Pan, X86 Kernel, Sean Christopherson, LKML, Thomas Gleixner,
Dave Hansen, H. Peter Anvin, Ingo Molnar, Borislav Petkov, Xin Li,
linux-perf-users, Peter Zijlstra
Cc: Paolo Bonzini, Tony Luck, Andy Lutomirski, acme, Andi Kleen,
Nikolay Borisov, Mehta, Sohil, Zeng Guang
On 2024-07-09 10:39 a.m., Jacob Pan wrote:
> Program the designated NMI source vector into the performance monitoring
> interrupt (PMI) of the local vector table. PMI handler will be directly
> invoked when its NMI is generated. This avoids the latency of calling all
> NMI handlers blindly.
>
> Co-developed-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Thanks,
Kan
> ---
> v4: Use a macro for programming PVTPC unconditionally (Kan)
> v3: Program NMI source vector in PVTPC unconditionally (HPA)
> v2: Fix a compile error apic_perfmon_ctr is undefined in i386 config
> ---
> arch/x86/events/core.c | 4 ++--
> arch/x86/events/intel/core.c | 6 +++---
> arch/x86/include/asm/apic.h | 2 ++
> 3 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 1ef2201e48ac..e69c52f9d662 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1680,7 +1680,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> * This generic handler doesn't seem to have any issues where the
> * unmasking occurs so it was left at the top.
> */
> - apic_write(APIC_LVTPC, APIC_DM_NMI);
> + apic_write(APIC_LVTPC, APIC_PERF_NMI);
>
> for (idx = 0; idx < x86_pmu.num_counters; idx++) {
> if (!test_bit(idx, cpuc->active_mask))
> @@ -1723,7 +1723,7 @@ void perf_events_lapic_init(void)
> /*
> * Always use NMI for PMU
> */
> - apic_write(APIC_LVTPC, APIC_DM_NMI);
> + apic_write(APIC_LVTPC, APIC_PERF_NMI);
> }
>
> static int
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 38c1b1f1deaa..e7e114616e24 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3093,7 +3093,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
> * NMI handler.
> */
> if (!late_ack && !mid_ack)
> - apic_write(APIC_LVTPC, APIC_DM_NMI);
> + apic_write(APIC_LVTPC, APIC_PERF_NMI);
> intel_bts_disable_local();
> cpuc->enabled = 0;
> __intel_pmu_disable_all(true);
> @@ -3130,7 +3130,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
>
> done:
> if (mid_ack)
> - apic_write(APIC_LVTPC, APIC_DM_NMI);
> + apic_write(APIC_LVTPC, APIC_PERF_NMI);
> /* Only restore PMU state when it's active. See x86_pmu_disable(). */
> cpuc->enabled = pmu_enabled;
> if (pmu_enabled)
> @@ -3143,7 +3143,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
> * Haswell CPUs.
> */
> if (late_ack)
> - apic_write(APIC_LVTPC, APIC_DM_NMI);
> + apic_write(APIC_LVTPC, APIC_PERF_NMI);
> return handled;
> }
>
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index 9327eb00e96d..d284eff7849c 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -30,6 +30,8 @@
> #define APIC_EXTNMI_ALL 1
> #define APIC_EXTNMI_NONE 2
>
> +#define APIC_PERF_NMI (APIC_DM_NMI | NMI_SOURCE_VEC_PMI)
> +
> /*
> * Define the default level of output to be very little
> * This can be turned up by using apic=verbose for more
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI
2024-07-09 14:39 ` [PATCH v4 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
@ 2024-08-16 13:45 ` Sean Christopherson
0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2024-08-16 13:45 UTC (permalink / raw)
To: Jacob Pan
Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
Ingo Molnar, Borislav Petkov, Xin Li, linux-perf-users,
Peter Zijlstra, Paolo Bonzini, Tony Luck, Andy Lutomirski, acme,
kan.liang, Andi Kleen, Nikolay Borisov, Sohil Mehta
On Tue, Jul 09, 2024, Jacob Pan wrote:
> Program designated NMI source vectors for all NMI delivered IPIs
> such that their handlers can be selectively invoked.
>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v4: Enhance comments, no functional changes (Li Xin)
> ---
> arch/x86/include/asm/irq_vectors.h | 10 ++++++++++
> arch/x86/kernel/apic/hw_nmi.c | 3 ++-
> arch/x86/kernel/apic/ipi.c | 4 ++--
> arch/x86/kernel/apic/local.h | 18 ++++++++++++------
> arch/x86/kernel/cpu/mce/inject.c | 2 +-
> arch/x86/kernel/kgdb.c | 2 +-
> arch/x86/kernel/nmi_selftest.c | 2 +-
> arch/x86/kernel/reboot.c | 2 +-
> arch/x86/kernel/smp.c | 2 +-
> 9 files changed, 31 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 4f767c3940d6..9b7241e7faa3 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -135,6 +135,16 @@
> #define NMI_SOURCE_VEC_IPI_TEST 8 /* For remote and local IPIs */
> #define NR_NMI_SOURCE_VECTORS 9
>
> +/*
> + * When programming the local APIC, IDT NMI vector and NMI-source vector
> + * are encoded in a single 32 bit variable. The top 16 bits contain
> + * the NMI-source vector and the bottom 16 bits contain NMI_VECTOR (2)
> + * The top 16 bits are always zero when NMI-source reporting feature
> + * is not enabled or the caller does not use NMI-source reporting.
This is *extremely* misleading, bordering on being an outright lie. The vectors
are not encoded in a single 32-bit variable when programming the _local APIC_,
that is 100% an arbitrary software construct. The actually write to APIC.ICR
morphs bits 15:0 into the TYPE, and the bits 31:16 into the VECTOR.
> + */
> +#define NMI_VECTOR_WITH_SOURCE(src) (NMI_VECTOR | (src << 16))
Why require callers to use NMI_VECTOR_WITH_SOURCE instead of having macros to
directly encode each source NMI, a la APIC_PERF_NMI?
To me, this:
__apic_send_IPI(cpu, NMI_VECTOR_WITH_SOURCE(NMI_SOURCE_VEC_IPI_SMP_STOP));
is *way* harder to read/parse than:
__apic_send_IPI(cpu, NMI_VECTOR_SMP_STOP);
especially since that first one blasts way past 80 chars (yeah, I know 80 is now
a soft limit, but it's still nice to keep line lengths short when possible).
> +#define NMI_SOURCE_VEC_MASK GENMASK(15, 0)
IMO, this is an absolutely awful encoding scheme. Vectors are 8-bit values, so
why on earth use 16 bits? And @vector is passed along as a _signed_ integer,
which means 32-bit kernels could end up observing negative values, which probably
isn't problematic in practice, but it's unnecessarily confusing.
All this FRED stuff is hard enough to follow given the specs have been rolled out
piecemeal (someone at Intel must get paid based on how many specs they publish),
using a software-defined scheme when FRED is already overloading a decades old
hardware-defined encoding is just mean.
Why not encode APIC_DM_NMI straightaway? You're already making it a hard
requirement that the backend (__prepare_ICR()) be able to handle a @vector that
has bits 31:8!=0. I don't see how the above scheme provides any value.
Side topic, APIC_DM_FIXED_MASK should really be APIC_DM_MASK, that "FIXED" part
is completely wrong.
E.g.
static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
unsigned int dest)
{
unsigned int icr = shortcut | dest;
/*
* Callers are allowed to encode the NMI delivery mode directly, which
* allows using the vector field to provide the NMI source (FRED only).
*/
if ((vector & APIC_DM_MASK) == APIC_DM_NMI) {
icr |= vector;
/*
* Pre-FRED, the actual vector is ignored for NMIs, but zero it
* if NMI source reporting is unsupported so as not to risk
* breakage on misbehaving hardware/hypervisors.
*/
if (!cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
icr &= ~APIC_VECTOR_MASK;
} else if (vector == NMI_VECTOR) {
icr |= APIC_DM_NMI;
} else {
icr |= APIC_DM_FIXED | vector;
}
return icr;
}
and then NMI_VECTOR_SMP_STOP would be (APIC_DM_NMI | NMI_SOURCE_VEC_IPI_SMP_STOP).
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 11/11] KVM: X86: Use common code for PV IPIs in linux guest
2024-07-09 14:39 ` [PATCH v4 11/11] KVM: X86: Use common code for PV IPIs in linux guest Jacob Pan
@ 2024-08-16 13:52 ` Sean Christopherson
0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2024-08-16 13:52 UTC (permalink / raw)
To: Jacob Pan
Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
Ingo Molnar, Borislav Petkov, Xin Li, linux-perf-users,
Peter Zijlstra, Paolo Bonzini, Tony Luck, Andy Lutomirski, acme,
kan.liang, Andi Kleen, Nikolay Borisov, Sohil Mehta, Zeng Guang
"x86/kvm:" for the scope. "KVM: x86:" is for host-side KVM, this is guest code.
On Tue, Jul 09, 2024, Jacob Pan wrote:
> The paravirtual APIC hooks in KVM, some of which are used for sending PV
> IPIs, can reuse common code for ICR preparation. This shared code also
> encompasses NMI-source reporting when in effect.
Please state what the patch actually does, not what it can do. For folks that
aren't intimately familiar with FRED (read: me), that second sentence in particular
is wildly unhelpful. I had to download yet another version of the FRED spec, and
decipher the poorly documented software-defined encoding scheme introduced by this
series just to understand what this patch does.
And the order of patches in this series is broken. Overloading the vector *before*
switching the PV IPI code to __prepare_ICR() will result in KVM sending garbage
to the host. I.e. _all_ IPI implementations need to be made safe before the NMI
source reporting code can be introduced.
> Originally-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> v4: Refine comments, no functional change.
> ---
> arch/x86/kernel/kvm.c | 10 +---------
> 1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 263f8aed4e2c..a45d60aa0302 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -516,15 +516,7 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
>
> local_irq_save(flags);
>
> - switch (vector) {
> - default:
> - icr = APIC_DM_FIXED | vector;
> - break;
> - case NMI_VECTOR:
> - icr = APIC_DM_NMI;
> - break;
> - }
> -
> + icr = __prepare_ICR(0, vector, 0);
Rather than force KVM to throw in junk dest+shorthand, what about adding a
__prepare_ICR_vector()? Then KVM doesn't need to arbitrarily pass zeroes, and
even __prepare_ICR() itself benefits (IMO), e.g. this is nice and easy to read:
static inline unsigned int __prepare_ICR(unsigned int shortcut, int vector,
unsigned int dest)
{
return shortcut | dest | __prepare_ICR_vector(vector);
}
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 07/11] KVM: VMX: Handle NMI Source report in VM exit
2024-07-09 14:39 ` [PATCH v4 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
@ 2024-08-16 14:05 ` Sean Christopherson
0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2024-08-16 14:05 UTC (permalink / raw)
To: Jacob Pan
Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
Ingo Molnar, Borislav Petkov, Xin Li, linux-perf-users,
Peter Zijlstra, Paolo Bonzini, Tony Luck, Andy Lutomirski, acme,
kan.liang, Andi Kleen, Nikolay Borisov, Sohil Mehta, Zeng Guang
On Tue, Jul 09, 2024, Jacob Pan wrote:
> From: Zeng Guang <guang.zeng@intel.com>
>
> If the "NMI exiting" VM-execution control is 1, the value of the 16-bit NMI
> source vector is saved in the exit-qualification field in the VMCS when VM
> exits occur on CPUs that support NMI source.
>
> KVM that is aware of NMI-source reporting will push the bitmask of NMI source
> vectors as the exceptoin event data field on the stack for then entry of FRED
> exception. Subsequently, the host NMI exception handler is invoked which
> will process NMI source information in the event data. This operation is
> independent of vCPU FRED enabling status.
>
> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 11 ++++++++---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 4e7b36081b76..6719c598fa5f 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7331,10 +7331,15 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
> is_nmi(vmx_get_intr_info(vcpu))) {
> kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> - if (cpu_feature_enabled(X86_FEATURE_FRED))
> - fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, 0);
> - else
> + if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> + unsigned long edata = 0;
> +
> + if (cpu_feature_enabled(X86_FEATURE_NMI_SOURCE))
> + edata = vmx_get_exit_qual(vcpu);
> + fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR, edata);
Eh, I would just east the extra VMREAD if KVM happens to run on a CPU that supports
FRED but not NMI source reporting. Per spec 7.0, the EXIT_QUALIFICATION is 0 if
NMI source isn't supported, i.e. it's guaranteed to not cause functional problems.
I assume there won't be many (any?) CPUs with FRED but not NMI_SOURCE (and if there
are, why!?!?!). On the other hand, gating this on NMI_SOURCE means KVM will need
another update if some other event data is ever added for NMIs. And readers don't
have to hunt through the FRED spec with a fine-toothed comb to suss out that
EXIT_QUALIFICATION == event data, but that KVM is getting cute and only reading
it when it can be non-zero (as far as KVM knows).
As a bonus, the code is less ugly:
if (cpu_feature_enabled(X86_FEATURE_FRED))
fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR,
vmx_get_exit_qual(vcpu));
else
vmx_do_nmi_irqoff();
> + } else {
> vmx_do_nmi_irqoff();
> + }
> kvm_after_interrupt(vcpu);
> }
>
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 06/11] KVM: VMX: Expand FRED kvm entry with event data
2024-07-09 14:39 ` [PATCH v4 06/11] KVM: VMX: Expand FRED kvm entry with event data Jacob Pan
@ 2024-08-16 14:12 ` Sean Christopherson
0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2024-08-16 14:12 UTC (permalink / raw)
To: Jacob Pan
Cc: X86 Kernel, LKML, Thomas Gleixner, Dave Hansen, H. Peter Anvin,
Ingo Molnar, Borislav Petkov, Xin Li, linux-perf-users,
Peter Zijlstra, Paolo Bonzini, Tony Luck, Andy Lutomirski, acme,
kan.liang, Andi Kleen, Nikolay Borisov, Sohil Mehta, Zeng Guang
On Tue, Jul 09, 2024, Jacob Pan wrote:
> From: Zeng Guang <guang.zeng@intel.com>
>
> According to FRED specification 10.6.2, VM exits triggered by events such as
> NMI, #DB, and #PF will have their event data stored in the exit-qualification
> field.
>
> However, #DB and #PF are owned by the running guest, which is managed by KVM.
This is wrong. #DB and #PF _may_ be guest #DBs/#PFs. When host userspace is
debugging, #DBs may be the host's. And when KVM is using shadow paging, or the
VM is running with a MAXPHYADDR smaller than the host's, #PF may be the host's
responsiblity.
> NMIs belong to the host, and the host NMI handler requires the event data
> stored in the VMCS for NMI-induced VM exits.
>
> This patch enhances the FRED KVM entry interface to include the event data
^^^^^^^^^^
Documentation/process/maintainer-tip.rst
> derived from the exit qualification.
It's not "derived" from the exit qualification, it's stored there in the exact
format that's used by FRED.
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-08-16 14:12 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-09 14:38 [PATCH v4 00/11] Add support for NMI-source reporting Jacob Pan
2024-07-09 14:38 ` [PATCH v4 01/11] x86/irq: Add enumeration of NMI source reporting CPU feature Jacob Pan
2024-07-09 14:38 ` [PATCH v4 02/11] x86/irq: Define NMI source vectors Jacob Pan
2024-07-09 14:38 ` [PATCH v4 03/11] x86/irq: Extend NMI handler registration interface to include source Jacob Pan
2024-07-09 14:38 ` [PATCH v4 04/11] x86/irq: Factor out common NMI handling code Jacob Pan
2024-07-09 14:39 ` [PATCH v4 05/11] x86/irq: Process nmi sources in NMI handler Jacob Pan
2024-07-09 14:39 ` [PATCH v4 06/11] KVM: VMX: Expand FRED kvm entry with event data Jacob Pan
2024-08-16 14:12 ` Sean Christopherson
2024-07-09 14:39 ` [PATCH v4 07/11] KVM: VMX: Handle NMI Source report in VM exit Jacob Pan
2024-08-16 14:05 ` Sean Christopherson
2024-07-09 14:39 ` [PATCH v4 08/11] perf/x86: Enable NMI source reporting for perfmon Jacob Pan
2024-07-09 15:04 ` Liang, Kan
2024-07-09 14:39 ` [PATCH v4 09/11] x86/irq: Enable NMI source on IPIs delivered as NMI Jacob Pan
2024-08-16 13:45 ` Sean Christopherson
2024-07-09 14:39 ` [PATCH v4 10/11] x86/irq: Move __prepare_ICR to x86 common header Jacob Pan
2024-07-09 14:39 ` [PATCH v4 11/11] KVM: X86: Use common code for PV IPIs in linux guest Jacob Pan
2024-08-16 13:52 ` Sean Christopherson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).