Linux-HyperV List
 help / color / mirror / Atom feed
* [PATCH v9 17/36] x86/fred: Define a common function type fred_handler
From: Xin Li @ 2023-07-31  6:32 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

FRED event delivery establishes a full supervisor context by saving
the essential information about an event to a FRED stack frame, e.g.,
the faulting linear address of a #PF is saved as event data of a FRED
stack frame. Thus a struct pt_regs has all the needed data to handle
an event and it's the only input argument of a FRED event handler.

Define fred_handler, a common function type used in the FRED event
dispatch framework, which makes it easier to find the entry points
(via grep), allows the prototype to change if necessary without
requiring changing changes everywhere, and makes sure that all the
entry points have the proper decorations (currently noinstr, but
could change in the future.)

Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/fred.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index d76e681a806f..b45c1bea5b7f 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -68,6 +68,19 @@
 #define FRED_SSX_64_BIT_MODE_BIT	57
 #define FRED_SSX_64_BIT_MODE		_BITUL(FRED_SSX_64_BIT_MODE_BIT)
 
+/*
+ * FRED event delivery establishes a full supervisor context by
+ * saving the essential information about an event to a FRED
+ * stack frame, e.g., the faulting linear address of a #PF is
+ * saved as event data of a FRED #PF stack frame. Thus a struct
+ * pt_regs has all the needed data to handle an event and it's
+ * the only input argument of a FRED event handler.
+ *
+ * FRED handlers need to be placed in the noinstr text section.
+ */
+#define DECLARE_FRED_HANDLER(f) void f (struct pt_regs *regs)
+#define DEFINE_FRED_HANDLER(f) noinstr DECLARE_FRED_HANDLER(f)
+
 #ifdef CONFIG_X86_FRED
 
 #ifndef __ASSEMBLY__
@@ -97,6 +110,8 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
 	return fred_info(regs)->edata;
 }
 
+typedef DECLARE_FRED_HANDLER((*fred_handler));
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* CONFIG_X86_FRED */
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 18/36] x86/fred: Add a page fault entry stub for FRED
From: Xin Li @ 2023-07-31  6:32 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

Add a page fault entry stub for FRED.

On a FRED system, the faulting address (CR2) is passed on the stack,
to avoid the problem of transient state. Thus we get the page fault
address from the stack instead of CR2.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/fred.h |  2 ++
 arch/x86/mm/fault.c         | 18 ++++++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index b45c1bea5b7f..fb8e7b4f2d38 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -112,6 +112,8 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
 
 typedef DECLARE_FRED_HANDLER((*fred_handler));
 
+DECLARE_FRED_HANDLER(fred_exc_page_fault);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* CONFIG_X86_FRED */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e8711b2cafaf..dd3df092d0f2 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -34,6 +34,7 @@
 #include <asm/kvm_para.h>		/* kvm_handle_async_pf		*/
 #include <asm/vdso.h>			/* fixup_vdso_exception()	*/
 #include <asm/irq_stack.h>
+#include <asm/fred.h>
 
 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -1495,9 +1496,10 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 	}
 }
 
-DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
+static __always_inline void page_fault_common(struct pt_regs *regs,
+					      unsigned int error_code,
+					      unsigned long address)
 {
-	unsigned long address = read_cr2();
 	irqentry_state_t state;
 
 	prefetchw(&current->mm->mmap_lock);
@@ -1544,3 +1546,15 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 
 	irqentry_exit(regs, state);
 }
+
+DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
+{
+	page_fault_common(regs, error_code, read_cr2());
+}
+
+#ifdef CONFIG_X86_FRED
+DEFINE_FRED_HANDLER(fred_exc_page_fault)
+{
+	page_fault_common(regs, regs->orig_ax, fred_event_data(regs));
+}
+#endif
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 19/36] x86/fred: Add a debug fault entry stub for FRED
From: Xin Li @ 2023-07-31  6:33 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

Add a debug fault entry stub for FRED.

On a FRED system, the debug trap status information (DR6) is passed
on the stack, to avoid the problem of transient state. Furthermore,
FRED transitions avoid a lot of ugly corner cases the handling of which
can, and should be, skipped.

The FRED debug trap status information saved on the stack differs from DR6
in both stickiness and polarity; it is exactly what debug_read_clear_dr6()
returns, and exc_debug_user()/exc_debug_kernel() expect.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v1:
* call irqentry_nmi_{enter,exit}() in both IDT and FRED debug fault kernel
  handler (Peter Zijlstra).
---
 arch/x86/include/asm/fred.h |  1 +
 arch/x86/kernel/traps.c     | 56 +++++++++++++++++++++++++++----------
 2 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index fb8e7b4f2d38..ad7b79130b1e 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -112,6 +112,7 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
 
 typedef DECLARE_FRED_HANDLER((*fred_handler));
 
+DECLARE_FRED_HANDLER(fred_exc_debug);
 DECLARE_FRED_HANDLER(fred_exc_page_fault);
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4a817d20ce3b..b10464966a81 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -47,6 +47,7 @@
 #include <asm/debugreg.h>
 #include <asm/realmode.h>
 #include <asm/text-patching.h>
+#include <asm/fred.h>
 #include <asm/ftrace.h>
 #include <asm/traps.h>
 #include <asm/desc.h>
@@ -1021,21 +1022,9 @@ static bool notify_debug(struct pt_regs *regs, unsigned long *dr6)
 	return false;
 }
 
-static __always_inline void exc_debug_kernel(struct pt_regs *regs,
-					     unsigned long dr6)
+static __always_inline void debug_kernel_common(struct pt_regs *regs,
+						unsigned long dr6)
 {
-	/*
-	 * Disable breakpoints during exception handling; recursive exceptions
-	 * are exceedingly 'fun'.
-	 *
-	 * Since this function is NOKPROBE, and that also applies to
-	 * HW_BREAKPOINT_X, we can't hit a breakpoint before this (XXX except a
-	 * HW_BREAKPOINT_W on our stack)
-	 *
-	 * Entry text is excluded for HW_BP_X and cpu_entry_area, which
-	 * includes the entry stack is excluded for everything.
-	 */
-	unsigned long dr7 = local_db_save();
 	irqentry_state_t irq_state = irqentry_nmi_enter(regs);
 	instrumentation_begin();
 
@@ -1063,7 +1052,8 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 	 * Catch SYSENTER with TF set and clear DR_STEP. If this hit a
 	 * watchpoint at the same time then that will still be handled.
 	 */
-	if ((dr6 & DR_STEP) && is_sysenter_singlestep(regs))
+	if (!cpu_feature_enabled(X86_FEATURE_FRED) &&
+	    (dr6 & DR_STEP) && is_sysenter_singlestep(regs))
 		dr6 &= ~DR_STEP;
 
 	/*
@@ -1091,7 +1081,25 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
 out:
 	instrumentation_end();
 	irqentry_nmi_exit(regs, irq_state);
+}
 
+static __always_inline void exc_debug_kernel(struct pt_regs *regs,
+					     unsigned long dr6)
+{
+	/*
+	 * Disable breakpoints during exception handling; recursive exceptions
+	 * are exceedingly 'fun'.
+	 *
+	 * Since this function is NOKPROBE, and that also applies to
+	 * HW_BREAKPOINT_X, we can't hit a breakpoint before this (XXX except a
+	 * HW_BREAKPOINT_W on our stack)
+	 *
+	 * Entry text is excluded for HW_BP_X and cpu_entry_area, which
+	 * includes the entry stack is excluded for everything.
+	 */
+	unsigned long dr7 = local_db_save();
+
+	debug_kernel_common(regs, dr6);
 	local_db_restore(dr7);
 }
 
@@ -1180,6 +1188,24 @@ DEFINE_IDTENTRY_DEBUG_USER(exc_debug)
 {
 	exc_debug_user(regs, debug_read_clear_dr6());
 }
+
+# ifdef CONFIG_X86_FRED
+DEFINE_FRED_HANDLER(fred_exc_debug)
+{
+	/*
+	 * The FRED debug information saved onto stack differs from
+	 * DR6 in both stickiness and polarity; it is exactly what
+	 * debug_read_clear_dr6() returns.
+	 */
+	unsigned long dr6 = fred_event_data(regs);
+
+	if (user_mode(regs))
+		exc_debug_user(regs, dr6);
+	else
+		debug_kernel_common(regs, dr6);
+}
+# endif /* CONFIG_X86_FRED */
+
 #else
 /* 32 bit does not have separate entry points. */
 DEFINE_IDTENTRY_RAW(exc_debug)
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 20/36] x86/fred: Add a NMI entry stub for FRED
From: Xin Li @ 2023-07-31  6:33 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

On a FRED system, NMIs nest both with themselves and faults, transient
information is saved into the stack frame, and NMI unblocking only
happens when the stack frame indicates that so should happen.

Thus, the NMI entry stub for FRED is really quite small...

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/fred.h |  1 +
 arch/x86/kernel/nmi.c       | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index ad7b79130b1e..2a7c47dfd733 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -112,6 +112,7 @@ static __always_inline unsigned long fred_event_data(struct pt_regs *regs)
 
 typedef DECLARE_FRED_HANDLER((*fred_handler));
 
+DECLARE_FRED_HANDLER(fred_exc_nmi);
 DECLARE_FRED_HANDLER(fred_exc_debug);
 DECLARE_FRED_HANDLER(fred_exc_page_fault);
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a0c551846b35..f803e2bcd024 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -34,6 +34,7 @@
 #include <asm/cache.h>
 #include <asm/nospec-branch.h>
 #include <asm/sev.h>
+#include <asm/fred.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/nmi.h>
@@ -643,6 +644,24 @@ void nmi_backtrace_stall_check(const struct cpumask *btp)
 
 #endif
 
+#ifdef CONFIG_X86_FRED
+DEFINE_FRED_HANDLER(fred_exc_nmi)
+{
+	/*
+	 * With FRED, CR2 and DR6 are pushed atomically on faults,
+	 * so we don't have to worry about saving and restoring them.
+	 * Breakpoint faults nest, so assume it is OK to leave DR7
+	 * enabled.
+	 */
+	irqentry_state_t irq_state = irqentry_nmi_enter(regs);
+
+	inc_irq_stat(__nmi_count);
+	default_do_nmi(regs);
+
+	irqentry_nmi_exit(regs, irq_state);
+}
+#endif
+
 void stop_nmi(void)
 {
 	ignore_nmis++;
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 21/36] x86/fred: Add a machine check entry stub for FRED
From: Xin Li @ 2023-07-31  6:33 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

Add a machine check entry stub for FRED.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v5:
* Disallow #DB inside #MCE for robustness sake (Peter Zijlstra).
---
 arch/x86/include/asm/fred.h    |  1 +
 arch/x86/kernel/cpu/mce/core.c | 15 +++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 2a7c47dfd733..f559dd9dc4f2 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -115,6 +115,7 @@ typedef DECLARE_FRED_HANDLER((*fred_handler));
 DECLARE_FRED_HANDLER(fred_exc_nmi);
 DECLARE_FRED_HANDLER(fred_exc_debug);
 DECLARE_FRED_HANDLER(fred_exc_page_fault);
+DECLARE_FRED_HANDLER(fred_exc_machine_check);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b8ad5a5b4026..98456e20f155 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -52,6 +52,7 @@
 #include <asm/mce.h>
 #include <asm/msr.h>
 #include <asm/reboot.h>
+#include <asm/fred.h>
 
 #include "internal.h"
 
@@ -2118,6 +2119,20 @@ DEFINE_IDTENTRY_MCE_USER(exc_machine_check)
 	exc_machine_check_user(regs);
 	local_db_restore(dr7);
 }
+
+#ifdef CONFIG_X86_FRED
+DEFINE_FRED_HANDLER(fred_exc_machine_check)
+{
+	unsigned long dr7;
+
+	dr7 = local_db_save();
+	if (user_mode(regs))
+		exc_machine_check_user(regs);
+	else
+		exc_machine_check_kernel(regs);
+	local_db_restore(dr7);
+}
+#endif
 #else
 /* 32bit unified entry point */
 DEFINE_IDTENTRY_RAW(exc_machine_check)
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 22/36] x86/fred: Add a double fault entry stub for FRED
From: Xin Li @ 2023-07-31  6:33 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

The IDT event delivery of a double fault pushes an error code into the
orig_ax member of the pt_regs structure, and the error code is passed
as the second argument of its C-handler exc_double_fault(), although
the pt_regs structure is already passed as the first argument.

The existing IDT double fault asm entry code does the following

  movq ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
  movq $-1, ORIG_RAX(%rsp)	/* no syscall to restart */

to set the orig_ax member to -1 just before calling the C-handler.

X86_TRAP_TS, X86_TRAP_NP, X86_TRAP_SS, X86_TRAP_GP, X86_TRAP_AC and
X86_TRAP_CP are all handled in the same way because the IDT event
delivery pushes an error code into their stack frame for them.

The commit d99015b1abbad ("x86: move entry_64.S register saving out of
the macros") introduced the changes to set orig_ax to -1, but I can't
see why. Our tests with FRED seem fine if orig_ax is left unchanged
instead of set to -1. It's probably cleaner and simpler to remove the
second argument from exc_double_fault() while leave orig_ax unchanged
to pass the error code inside the first argument, at least on native
x86_64. That would be a separate, pre-FRED, patch.

For now just add a double fault entry stub for FRED, which simply
calls the existing exc_double_fault().

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/fred.h | 1 +
 arch/x86/kernel/traps.c     | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index f559dd9dc4f2..bd701ac87528 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -116,6 +116,7 @@ DECLARE_FRED_HANDLER(fred_exc_nmi);
 DECLARE_FRED_HANDLER(fred_exc_debug);
 DECLARE_FRED_HANDLER(fred_exc_page_fault);
 DECLARE_FRED_HANDLER(fred_exc_machine_check);
+DECLARE_FRED_HANDLER(fred_exc_double_fault);
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b10464966a81..49dd92458eb0 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -555,6 +555,13 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
 	instrumentation_end();
 }
 
+#ifdef CONFIG_X86_FRED
+DEFINE_FRED_HANDLER(fred_exc_double_fault)
+{
+	exc_double_fault(regs, regs->orig_ax);
+}
+#endif
+
 DEFINE_IDTENTRY(exc_bounds)
 {
 	if (notify_die(DIE_TRAP, "bounds", regs, 0,
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 23/36] x86/entry: Remove idtentry_sysvec from entry_{32,64}.S
From: Xin Li @ 2023-07-31  6:33 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

idtentry_sysvec is really just DECLARE_IDTENTRY defined in
<asm/idtentry.h>, no need to define it separately.

Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/entry/entry_32.S       | 4 ----
 arch/x86/entry/entry_64.S       | 8 --------
 arch/x86/include/asm/idtentry.h | 2 +-
 3 files changed, 1 insertion(+), 13 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 6e6af42e044a..e0f22ad8ff7e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -649,10 +649,6 @@ SYM_CODE_START_LOCAL(asm_\cfunc)
 SYM_CODE_END(asm_\cfunc)
 .endm
 
-.macro idtentry_sysvec vector cfunc
-	idtentry \vector asm_\cfunc \cfunc has_error_code=0
-.endm
-
 /*
  * Include the defines which emit the idt entries which are shared
  * shared between 32 and 64 bit and emit the __irqentry_text_* markers
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 8069151176f2..44f14b990597 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -438,14 +438,6 @@ SYM_CODE_END(\asmsym)
 	idtentry \vector asm_\cfunc \cfunc has_error_code=1
 .endm
 
-/*
- * System vectors which invoke their handlers directly and are not
- * going through the regular common device interrupt handling code.
- */
-.macro idtentry_sysvec vector cfunc
-	idtentry \vector asm_\cfunc \cfunc has_error_code=0
-.endm
-
 /**
  * idtentry_mce_db - Macro to generate entry stubs for #MC and #DB
  * @vector:		Vector number
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index cd5c10a74071..6817c0f8e323 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -447,7 +447,7 @@ __visible noinstr void func(struct pt_regs *regs,			\
 
 /* System vector entries */
 #define DECLARE_IDTENTRY_SYSVEC(vector, func)				\
-	idtentry_sysvec vector func
+	DECLARE_IDTENTRY(vector, func)
 
 #ifdef CONFIG_X86_64
 # define DECLARE_IDTENTRY_MCE(vector, func)				\
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 24/36] x86/idtentry: Incorporate definitions/declarations of the FRED external interrupt handler type
From: Xin Li @ 2023-07-31  6:33 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

FRED operates differently from IDT in terms of interrupt handling.
Instead of directly dispatching an interrupt to its handler based
on the interrupt vector, FRED requires the software to dispatch
an event to its handler based on both the event's type and vector.
Therefore, an event dispatch framework must be implemented to
facilitate the event-to-handler dispatch process.

The FRED event dispatch framework assumes control once an event is
delivered, starting from two FRED entry points, after which several
event dispatch tables are introduced to facilitate the dispatching.
The first level dispatching is event type based, and two tables need
to be defined, one for ring 3 event dispatching, and the other for
ring 0. The second level dispatching is event vector based, and
several tables need to be defined, e.g., an exception handler table
for exception dispatching.

Handlers in these tables are typically noinstr. However for external
interrupt dispatching, irqentry_{enter,exit}() and
instrumentation_{begin,end}() can be extracted from respective interrupt
handler to the dispatch framework. As a result, FRED external interrupt
handlers don't need to be noinstr.

Incorporate definitions/declarations of FRED external interrupt handler
types into the IDT entry macros.

It is probably better to rename idtentry as event_entry.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Put IDTENTRY changes in a separate patch (Thomas Gleixner).
---
 arch/x86/include/asm/idtentry.h | 91 +++++++++++++++++++++++++++++----
 1 file changed, 82 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 6817c0f8e323..e67d111bf932 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -167,17 +167,22 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
 
 /**
  * DECLARE_IDTENTRY_IRQ - Declare functions for device interrupt IDT entry
- *			  points (common/spurious)
+ *			  points (common/spurious) and their corresponding
+ *			  software based dispatch handlers in the non-noinstr
+ *			  text section
  * @vector:	Vector number (ignored for C)
  * @func:	Function name of the entry point
  *
- * Maps to DECLARE_IDTENTRY_ERRORCODE()
+ * Maps to DECLARE_IDTENTRY_ERRORCODE(), plus a dispatch function prototype
  */
 #define DECLARE_IDTENTRY_IRQ(vector, func)				\
-	DECLARE_IDTENTRY_ERRORCODE(vector, func)
+	DECLARE_IDTENTRY_ERRORCODE(vector, func);			\
+	void dispatch_##func(struct pt_regs *regs, unsigned long error_code)
 
 /**
  * DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
+ *			 and their corresponding software based dispatch
+ *			 handlers in the non-noinstr text section
  * @func:	Function name of the entry point
  *
  * The vector number is pushed by the low level entry stub and handed
@@ -187,6 +192,11 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
  * irq_enter/exit_rcu() are invoked before the function body and the
  * KVM L1D flush request is set. Stack switching to the interrupt stack
  * has to be done in the function body if necessary.
+ *
+ * dispatch_func() is a software based dispatch handler in the non-noinstr
+ * text section, assuming the irqentry_{enter,exit}() and
+ * instrumentation_{begin,end}() helpers are invoked in the external
+ * interrupt dispatch framework before and after dispatch_func().
  */
 #define DEFINE_IDTENTRY_IRQ(func)					\
 static void __##func(struct pt_regs *regs, u32 vector);			\
@@ -204,31 +214,68 @@ __visible noinstr void func(struct pt_regs *regs,			\
 	irqentry_exit(regs, state);					\
 }									\
 									\
+__visible void dispatch_##func(struct pt_regs *regs,			\
+			       unsigned long error_code)		\
+{									\
+	u32 vector = (u32)(u8)error_code;				\
+									\
+	kvm_set_cpu_l1tf_flush_l1d();					\
+	run_irq_on_irqstack_cond(__##func, regs, vector);		\
+}									\
+									\
 static noinline void __##func(struct pt_regs *regs, u32 vector)
 
+/*
+ * Define a function type system_interrupt_handler as the element type of
+ * the table system_interrupt_handlers.
+ *
+ * System interrupt handlers don't take any interrupt vector number, or
+ * any interrupt error code as arguments, as a system interrupt handler
+ * is defined to handle a specific interrupt vector, and no error code
+ * is defined for external interrupts. It takes only one argument of type
+ * struct pt_regs *.
+ */
+#define DECLARE_SYSTEM_INTERRUPT_HANDLER(f) \
+	void f (struct pt_regs *regs)
+#define DEFINE_SYSTEM_INTERRUPT_HANDLER(f) \
+	__visible DECLARE_SYSTEM_INTERRUPT_HANDLER(f)
+typedef DECLARE_SYSTEM_INTERRUPT_HANDLER((*system_interrupt_handler));
+
 /**
  * DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
+ *			     and their corresponding software based dispatch
+ *			     handlers in the non-noinstr text section
  * @vector:	Vector number (ignored for C)
  * @func:	Function name of the entry point
  *
- * Declares three functions:
+ * Declares four functions:
  * - The ASM entry point: asm_##func
  * - The XEN PV trap entry point: xen_##func (maybe unused)
  * - The C handler called from the ASM entry point
+ * - The C handler used in the system interrupt handler table
  *
- * Maps to DECLARE_IDTENTRY().
+ * Maps to DECLARE_IDTENTRY(), plus a dispatch table function prototype
  */
 #define DECLARE_IDTENTRY_SYSVEC(vector, func)				\
-	DECLARE_IDTENTRY(vector, func)
+	DECLARE_IDTENTRY(vector, func);					\
+	DECLARE_SYSTEM_INTERRUPT_HANDLER(dispatch_table_##func)
 
 /**
  * DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
+ *			    and their corresponding software based dispatch
+ *			    handlers in the non-noinstr text section
  * @func:	Function name of the entry point
  *
  * irqentry_enter/exit() and irq_enter/exit_rcu() are invoked before the
  * function body. KVM L1D flush request is set.
  *
- * Runs the function on the interrupt stack if the entry hit kernel mode
+ * Runs the function on the interrupt stack if the entry hit kernel mode.
+ *
+ * dispatch_table_func() is used to fill the system interrupt handler table
+ * for system interrupts dispatching, assuming the irqentry_{enter,exit}()
+ * and instrumentation_{begin,end}() helpers are invoked in the external
+ * interrupt dispatch framework before and after dispatch_table_func(),
+ * thus in the non-noinstr text section.
  */
 #define DEFINE_IDTENTRY_SYSVEC(func)					\
 static void __##func(struct pt_regs *regs);				\
@@ -244,11 +291,19 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	irqentry_exit(regs, state);					\
 }									\
 									\
+DEFINE_SYSTEM_INTERRUPT_HANDLER(dispatch_table_##func)			\
+{									\
+	kvm_set_cpu_l1tf_flush_l1d();					\
+	run_sysvec_on_irqstack_cond(__##func, regs);			\
+}									\
+									\
 static noinline void __##func(struct pt_regs *regs)
 
 /**
  * DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
- *				   entry points
+ *				   entry points and their corresponding
+ *				   software based dispatch handlers in
+ *				   the non-noinstr text section
  * @func:	Function name of the entry point
  *
  * Runs the function on the interrupted stack. No switch to IRQ stack and
@@ -256,6 +311,12 @@ static noinline void __##func(struct pt_regs *regs)
  *
  * Only use for 'empty' vectors like reschedule IPI and KVM posted
  * interrupt vectors.
+ *
+ * dispatch_table_func() is used to fill the system interrupt handler table
+ * for system interrupts dispatching, assuming the irqentry_{enter,exit}()
+ * and instrumentation_{begin,end}() helpers are invoked in the external
+ * interrupt dispatch framework before and after dispatch_table_func(),
+ * thus in the non-noinstr text section.
  */
 #define DEFINE_IDTENTRY_SYSVEC_SIMPLE(func)				\
 static __always_inline void __##func(struct pt_regs *regs);		\
@@ -273,6 +334,14 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	irqentry_exit(regs, state);					\
 }									\
 									\
+DEFINE_SYSTEM_INTERRUPT_HANDLER(dispatch_table_##func)			\
+{									\
+	__irq_enter_raw();						\
+	kvm_set_cpu_l1tf_flush_l1d();					\
+	__##func (regs);						\
+	__irq_exit_raw();						\
+}									\
+									\
 static __always_inline void __##func(struct pt_regs *regs)
 
 /**
@@ -647,7 +716,11 @@ DECLARE_IDTENTRY_SYSVEC(X86_PLATFORM_IPI_VECTOR,	sysvec_x86_platform_ipi);
 #endif
 
 #ifdef CONFIG_SMP
-DECLARE_IDTENTRY(RESCHEDULE_VECTOR,			sysvec_reschedule_ipi);
+/*
+ * Use DECLARE_IDTENTRY_SYSVEC instead of DECLARE_IDTENTRY to add a
+ * software based dispatch handler declaration for RESCHEDULE_VECTOR.
+ */
+DECLARE_IDTENTRY_SYSVEC(RESCHEDULE_VECTOR,		sysvec_reschedule_ipi);
 DECLARE_IDTENTRY_SYSVEC(REBOOT_VECTOR,			sysvec_reboot);
 DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_SINGLE_VECTOR,	sysvec_call_function_single);
 DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_VECTOR,		sysvec_call_function);
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 25/36] x86/traps: Add a system interrupt handler table for system interrupt dispatch
From: Xin Li @ 2023-07-31  6:33 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-1-xin3.li@intel.com>

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

On x86, external interrupts can be categorized into two groups:
  1) System interrupts
  2) External device interrupts

All external device interrupts are directed to the common_interrupt(),
which, in turn, dispatches these external device interrupts using a
per-CPU external device interrupt dispatch table vector_irq.

To handle system interrupts, a system interrupt handler table needs to
be introduced. This table enables the direct dispatching of a system
interrupt to its corresponding handler. As a result, a software-based
dispatch function will be implemented as:

  void external_interrupt(struct pt_regs *regs)
  {
    u8 vector = regs->vector;
    if (is_system_interrupt(vector))
      system_interrupt_handlers[vector_to_sysvec(vector)](regs);
    else /* external device interrupt */
      common_interrupt(regs);
  }

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Co-developed-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Remove junk code that assumes no local APIC on x86_64 (Thomas Gleixner).

Changes since v5:
* Initialize system_interrupt_handlers with dispatch_table_spurious_interrupt()
  instead of NULL to get rid of any NULL check (Peter Zijlstra).
---
 arch/x86/kernel/traps.c | 50 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 49dd92458eb0..e430a8c47931 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1488,6 +1488,56 @@ DEFINE_IDTENTRY_SW(iret_error)
 }
 #endif
 
+#ifdef CONFIG_X86_64
+
+static void dispatch_table_spurious_interrupt(struct pt_regs *regs)
+{
+	dispatch_spurious_interrupt(regs, regs->vector);
+}
+
+#define SYSV(x,y) [(x) - FIRST_SYSTEM_VECTOR] = y
+
+static system_interrupt_handler system_interrupt_handlers[NR_SYSTEM_VECTORS] = {
+	[0 ... NR_SYSTEM_VECTORS-1]		= dispatch_table_spurious_interrupt,
+#ifdef CONFIG_SMP
+	SYSV(RESCHEDULE_VECTOR,			dispatch_table_sysvec_reschedule_ipi),
+	SYSV(CALL_FUNCTION_VECTOR,		dispatch_table_sysvec_call_function),
+	SYSV(CALL_FUNCTION_SINGLE_VECTOR,	dispatch_table_sysvec_call_function_single),
+	SYSV(REBOOT_VECTOR,			dispatch_table_sysvec_reboot),
+#endif
+
+#ifdef CONFIG_X86_THERMAL_VECTOR
+	SYSV(THERMAL_APIC_VECTOR,		dispatch_table_sysvec_thermal),
+#endif
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+	SYSV(THRESHOLD_APIC_VECTOR,		dispatch_table_sysvec_threshold),
+#endif
+
+#ifdef CONFIG_X86_MCE_AMD
+	SYSV(DEFERRED_ERROR_VECTOR,		dispatch_table_sysvec_deferred_error),
+#endif
+
+#ifdef CONFIG_X86_LOCAL_APIC
+	SYSV(LOCAL_TIMER_VECTOR,		dispatch_table_sysvec_apic_timer_interrupt),
+	SYSV(X86_PLATFORM_IPI_VECTOR,		dispatch_table_sysvec_x86_platform_ipi),
+# ifdef CONFIG_HAVE_KVM
+	SYSV(POSTED_INTR_VECTOR,		dispatch_table_sysvec_kvm_posted_intr_ipi),
+	SYSV(POSTED_INTR_WAKEUP_VECTOR,		dispatch_table_sysvec_kvm_posted_intr_wakeup_ipi),
+	SYSV(POSTED_INTR_NESTED_VECTOR,		dispatch_table_sysvec_kvm_posted_intr_nested_ipi),
+# endif
+# ifdef CONFIG_IRQ_WORK
+	SYSV(IRQ_WORK_VECTOR,			dispatch_table_sysvec_irq_work),
+# endif
+	SYSV(SPURIOUS_APIC_VECTOR,		dispatch_table_sysvec_spurious_apic_interrupt),
+	SYSV(ERROR_APIC_VECTOR,			dispatch_table_sysvec_error_interrupt),
+#endif
+};
+
+#undef SYSV
+
+#endif /* CONFIG_X86_64 */
+
 void __init trap_init(void)
 {
 	/* Init cpu_entry_area before IST entries are set up */
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 26/36] x86/traps: Add sysvec_install() to install a system interrupt handler
From: Xin Li @ 2023-07-31  6:40 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy

Add sysvec_install() to install a system interrupt handler into both
the IDT and system_interrupt_handlers. The latter is used to dispatch
system interrupts to their respective handlers when FRED is enabled.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Introduce a macro sysvec_install() to derive the asm handler name from
  a C handler, which simplifies the code and avoids an ugly typecast
  (Thomas Gleixner).
---
 arch/x86/include/asm/traps.h     | 19 +++++++++++++++++++
 arch/x86/kernel/cpu/acrn.c       |  5 +++--
 arch/x86/kernel/cpu/mshyperv.c   | 16 ++++++++--------
 arch/x86/kernel/kvm.c            |  2 +-
 arch/x86/kernel/traps.c          |  6 ++++++
 drivers/xen/events/events_base.c |  3 ++-
 6 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..cba3e4dfc329 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -47,4 +47,23 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs,
 				      struct stack_info *info);
 #endif
 
+#ifdef CONFIG_X86_64
+inline void set_sysvec_handler(unsigned int i, system_interrupt_handler func);
+
+static inline void sysvec_setup_fred(unsigned int vector, system_interrupt_handler func)
+{
+	BUG_ON(vector < FIRST_SYSTEM_VECTOR);
+	set_sysvec_handler(vector - FIRST_SYSTEM_VECTOR, func);
+}
+#else
+static inline void sysvec_setup_fred(unsigned int vector, system_interrupt_handler func)
+{
+}
+#endif
+
+#define sysvec_install(vector, func) {					\
+	sysvec_setup_fred(vector, func);				\
+	alloc_intr_gate(vector, asm_##func);				\
+}
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index 485441b7f030..a879b4b87740 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -18,6 +18,7 @@
 #include <asm/hypervisor.h>
 #include <asm/idtentry.h>
 #include <asm/irq_regs.h>
+#include <asm/traps.h>
 
 static u32 __init acrn_detect(void)
 {
@@ -26,8 +27,8 @@ static u32 __init acrn_detect(void)
 
 static void __init acrn_init_platform(void)
 {
-	/* Setup the IDT for ACRN hypervisor callback */
-	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_acrn_hv_callback);
+	/* Install system interrupt handler for ACRN hypervisor callback */
+	sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_acrn_hv_callback);
 
 	x86_platform.calibrate_tsc = acrn_get_tsc_khz;
 	x86_platform.calibrate_cpu = acrn_get_tsc_khz;
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index c7969e806c64..134830a7f575 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -28,6 +28,7 @@
 #include <asm/i8259.h>
 #include <asm/apic.h>
 #include <asm/timer.h>
+#include <asm/traps.h>
 #include <asm/reboot.h>
 #include <asm/nmi.h>
 #include <clocksource/hyperv_timer.h>
@@ -480,19 +481,18 @@ static void __init ms_hyperv_init_platform(void)
 	 */
 	x86_platform.apic_post_init = hyperv_init;
 	hyperv_setup_mmu_ops();
-	/* Setup the IDT for hypervisor callback */
-	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_hyperv_callback);
 
-	/* Setup the IDT for reenlightenment notifications */
+	/* Install system interrupt handler for hypervisor callback */
+	sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_hyperv_callback);
+
+	/* Install system interrupt handler for reenlightenment notifications */
 	if (ms_hyperv.features & HV_ACCESS_REENLIGHTENMENT) {
-		alloc_intr_gate(HYPERV_REENLIGHTENMENT_VECTOR,
-				asm_sysvec_hyperv_reenlightenment);
+		sysvec_install(HYPERV_REENLIGHTENMENT_VECTOR, sysvec_hyperv_reenlightenment);
 	}
 
-	/* Setup the IDT for stimer0 */
+	/* Install system interrupt handler for stimer0 */
 	if (ms_hyperv.misc_features & HV_STIMER_DIRECT_MODE_AVAILABLE) {
-		alloc_intr_gate(HYPERV_STIMER0_VECTOR,
-				asm_sysvec_hyperv_stimer0);
+		sysvec_install(HYPERV_STIMER0_VECTOR, sysvec_hyperv_stimer0);
 	}
 
 # ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1cceac5984da..12c799412c5d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -829,7 +829,7 @@ static void __init kvm_guest_init(void)
 
 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
 		static_branch_enable(&kvm_async_pf_enabled);
-		alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_kvm_asyncpf_interrupt);
+		sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_kvm_asyncpf_interrupt);
 	}
 
 #ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index e430a8c47931..9040c7f01c93 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1536,6 +1536,12 @@ static system_interrupt_handler system_interrupt_handlers[NR_SYSTEM_VECTORS] = {
 
 #undef SYSV
 
+void set_sysvec_handler(unsigned int i, system_interrupt_handler func)
+{
+	BUG_ON(i >= NR_SYSTEM_VECTORS);
+	system_interrupt_handlers[i] = func;
+}
+
 #endif /* CONFIG_X86_64 */
 
 void __init trap_init(void)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c7715f8bd452..16d51338e1f8 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -45,6 +45,7 @@
 #include <asm/irq.h>
 #include <asm/io_apic.h>
 #include <asm/i8259.h>
+#include <asm/traps.h>
 #include <asm/xen/cpuid.h>
 #include <asm/xen/pci.h>
 #endif
@@ -2249,7 +2250,7 @@ static __init void xen_alloc_callback_vector(void)
 		return;
 
 	pr_info("Xen HVM callback vector for event delivery is enabled\n");
-	alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_xen_hvm_callback);
+	sysvec_install(HYPERVISOR_CALLBACK_VECTOR, sysvec_xen_hvm_callback);
 }
 #else
 void xen_setup_callback_vector(void) {}
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 27/36] x86/traps: Add external_interrupt() to dispatch external interrupts
From: Xin Li @ 2023-07-31  6:40 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

external_interrupt() dispatches all external interrupts: it checks if an
external interrupt is a system interrupt, if yes it dipatches it through
the system_interrupt_handlers table, otherwise to
dispatch_common_interrupt().

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Co-developed-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Reword the patch description, which was confusing (Thomas Gleixner).

Changes since v5:
* Initialize system_interrupt_handlers with dispatch_table_spurious_interrupt()
  instead of NULL to get rid of a branch (Peter Zijlstra).
---
 arch/x86/include/asm/traps.h |  2 ++
 arch/x86/kernel/traps.c      | 18 ++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index cba3e4dfc329..48daa78ee88c 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -66,4 +66,6 @@ static inline void sysvec_setup_fred(unsigned int vector, system_interrupt_handl
 	alloc_intr_gate(vector, asm_##func);				\
 }
 
+int external_interrupt(struct pt_regs *regs);
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9040c7f01c93..90fdfcccee7a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1542,6 +1542,24 @@ void set_sysvec_handler(unsigned int i, system_interrupt_handler func)
 	system_interrupt_handlers[i] = func;
 }
 
+int external_interrupt(struct pt_regs *regs)
+{
+	unsigned int vector = regs->vector;
+	unsigned int sysvec = vector - FIRST_SYSTEM_VECTOR;
+
+	if (unlikely(vector < FIRST_EXTERNAL_VECTOR)) {
+		pr_err("invalid external interrupt vector %d\n", vector);
+		return -EINVAL;
+	}
+
+	if (sysvec < NR_SYSTEM_VECTORS)
+		system_interrupt_handlers[sysvec](regs);
+	else
+		dispatch_common_interrupt(regs, vector);
+
+	return 0;
+}
+
 #endif /* CONFIG_X86_64 */
 
 void __init trap_init(void)
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 28/36] x86/idtentry: Incorporate declaration/definition of the FRED exception handler type
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy

The existing IDT exception C-handlers of X86_TRAP_TS, X86_TRAP_NP,
X86_TRAP_SS, X86_TRAP_GP, X86_TRAP_AC and X86_TRAP_CP take an error
code as the second argument, thus their FRED version handlers simply
call the corresponding existing IDT handlers with orig_ax from the
pt_regs structure as the second argument.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/idtentry.h | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index e67d111bf932..3b743c3fbe91 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -11,6 +11,7 @@
 #include <linux/entry-common.h>
 #include <linux/hardirq.h>
 
+#include <asm/fred.h>
 #include <asm/irq_stack.h>
 
 /**
@@ -67,13 +68,16 @@ static __always_inline void __##func(struct pt_regs *regs)
 
 /**
  * DECLARE_IDTENTRY_ERRORCODE - Declare functions for simple IDT entry points
+ *				and their corresponding software based
+ *				dispatch handler
  *				Error code pushed by hardware
  * @vector:	Vector number (ignored for C)
  * @func:	Function name of the entry point
  *
- * Declares three functions:
+ * Declares four functions:
  * - The ASM entry point: asm_##func
  * - The XEN PV trap entry point: xen_##func (maybe unused)
+ * - The C handler called from the FRED event dispatch framework
  * - The C handler called from the ASM entry point
  *
  * Same as DECLARE_IDTENTRY, but has an extra error_code argument for the
@@ -82,14 +86,19 @@ static __always_inline void __##func(struct pt_regs *regs)
 #define DECLARE_IDTENTRY_ERRORCODE(vector, func)			\
 	asmlinkage void asm_##func(void);				\
 	asmlinkage void xen_asm_##func(void);				\
+	__visible DECLARE_FRED_HANDLER(fred_##func);			\
 	__visible void func(struct pt_regs *regs, unsigned long error_code)
 
 /**
  * DEFINE_IDTENTRY_ERRORCODE - Emit code for simple IDT entry points
+ *			       and their corresponding software based
+ *			       dispatch handler
  *			       Error code pushed by hardware
  * @func:	Function name of the entry point
  *
- * Same as DEFINE_IDTENTRY, but has an extra error_code argument
+ * Same as DEFINE_IDTENTRY, but has an extra error_code argument. The
+ * fred_func() simply calls func() with passing orig_ax as its second
+ * argument.
  */
 #define DEFINE_IDTENTRY_ERRORCODE(func)					\
 static __always_inline void __##func(struct pt_regs *regs,		\
@@ -106,6 +115,11 @@ __visible noinstr void func(struct pt_regs *regs,			\
 	irqentry_exit(regs, state);					\
 }									\
 									\
+__visible DEFINE_FRED_HANDLER(fred_##func)				\
+{									\
+	func(regs, regs->orig_ax);					\
+}									\
+									\
 static __always_inline void __##func(struct pt_regs *regs,		\
 				     unsigned long error_code)
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 29/36] x86/fred: FRED entry/exit and dispatch code
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

The code to actually handle kernel and event entry/exit using
FRED. It is split up into two files thus:

- entry_64_fred.S contains the actual entrypoints and exit code, and
  saves and restores registers.
- entry_fred.c contains the two-level event dispatch code for FRED.
  The first-level dispatch is on the event type, and the second-level
  is on the event vector.

Originally-by: Megha Dey <megha.dey@intel.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Co-developed-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Don't do syscall early out in fred_entry_from_user() before there are
  proper performance numbers and justifications (Thomas Gleixner).
* Add the control exception handler to the FRED exception handler table
  (Thomas Gleixner).
* Add ENDBR to the FRED_ENTER asm macro.
* Reflect the FRED spec 5.0 change that ERETS and ERETU add 8 to %rsp
  before popping the return context from the stack.

Changes since v1:
* Initialize a FRED exception handler to fred_bad_event() instead of NULL
  if no FRED handler defined for an exception vector (Peter Zijlstra).
* Push calling irqentry_{enter,exit}() and instrumentation_{begin,end}()
  down into individual FRED exception handlers, instead of in the dispatch
  framework (Peter Zijlstra).
---
 arch/x86/entry/Makefile               |   5 +-
 arch/x86/entry/entry_64_fred.S        |  53 +++++++
 arch/x86/entry/entry_fred.c           | 220 ++++++++++++++++++++++++++
 arch/x86/include/asm/asm-prototypes.h |   1 +
 arch/x86/include/asm/fred.h           |   4 +
 arch/x86/include/asm/idtentry.h       |   4 +
 6 files changed, 286 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/entry/entry_64_fred.S
 create mode 100644 arch/x86/entry/entry_fred.c

diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index ca2fe186994b..c93e7f5c2a06 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -18,6 +18,9 @@ obj-y				+= vdso/
 obj-y				+= vsyscall/
 
 obj-$(CONFIG_PREEMPTION)	+= thunk_$(BITS).o
+CFLAGS_entry_fred.o		+= -fno-stack-protector
+CFLAGS_REMOVE_entry_fred.o	+= -pg $(CC_FLAGS_FTRACE)
+obj-$(CONFIG_X86_FRED)		+= entry_64_fred.o entry_fred.o
+
 obj-$(CONFIG_IA32_EMULATION)	+= entry_64_compat.o syscall_32.o
 obj-$(CONFIG_X86_X32_ABI)	+= syscall_x32.o
-
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
new file mode 100644
index 000000000000..4ae12d557db3
--- /dev/null
+++ b/arch/x86/entry/entry_64_fred.S
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * The actual FRED entry points.
+ */
+
+#include <asm/fred.h>
+
+#include "calling.h"
+
+	.code64
+	.section ".noinstr.text", "ax"
+
+.macro FRED_ENTER
+	UNWIND_HINT_END_OF_STACK
+	ENDBR
+	PUSH_AND_CLEAR_REGS
+	movq	%rsp, %rdi	/* %rdi -> pt_regs */
+.endm
+
+.macro FRED_EXIT
+	UNWIND_HINT_REGS
+	POP_REGS
+.endm
+
+/*
+ * The new RIP value that FRED event delivery establishes is
+ * IA32_FRED_CONFIG & ~FFFH for events that occur in ring 3.
+ * Thus the FRED ring 3 entry point must be 4K page aligned.
+ */
+	.align 4096
+
+SYM_CODE_START_NOALIGN(fred_entrypoint_user)
+	FRED_ENTER
+	call	fred_entry_from_user
+SYM_INNER_LABEL(fred_exit_user, SYM_L_GLOBAL)
+	FRED_EXIT
+	ERETU
+SYM_CODE_END(fred_entrypoint_user)
+
+.fill fred_entrypoint_kernel - ., 1, 0xcc
+
+/*
+ * The new RIP value that FRED event delivery establishes is
+ * (IA32_FRED_CONFIG & ~FFFH) + 256 for events that occur in
+ * ring 0, i.e., fred_entrypoint_user + 256.
+ */
+	.org fred_entrypoint_user+256
+SYM_CODE_START_NOALIGN(fred_entrypoint_kernel)
+	FRED_ENTER
+	call	fred_entry_from_kernel
+	FRED_EXIT
+	ERETS
+SYM_CODE_END(fred_entrypoint_kernel)
diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
new file mode 100644
index 000000000000..1688e7e09370
--- /dev/null
+++ b/arch/x86/entry/entry_fred.c
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This contains the dispatch functions called from the entry point
+ * assembly.
+ */
+
+#include <linux/kernel.h>
+#include <linux/kdebug.h>
+#include <linux/nospec.h>
+
+#include <asm/fred.h>
+#include <asm/idtentry.h>
+#include <asm/syscall.h>
+#include <asm/trapnr.h>
+#include <asm/traps.h>
+#include <asm/kdebug.h>
+
+static DEFINE_FRED_HANDLER(fred_bad_event)
+{
+	irqentry_state_t irq_state = irqentry_nmi_enter(regs);
+
+	instrumentation_begin();
+
+	/* Panic on events from a high stack level */
+	if (regs->sl > 0) {
+		pr_emerg("PANIC: invalid or fatal FRED event; event type %u "
+			 "vector %u error 0x%lx aux 0x%lx at %04x:%016lx\n",
+			 regs->type, regs->vector, regs->orig_ax,
+			 fred_event_data(regs), regs->cs, regs->ip);
+		die("invalid or fatal FRED event", regs, regs->orig_ax);
+		panic("invalid or fatal FRED event");
+	} else {
+		unsigned long flags = oops_begin();
+		int sig = SIGKILL;
+
+		pr_alert("BUG: invalid or fatal FRED event; event type %u "
+			 "vector %u error 0x%lx aux 0x%lx at %04x:%016lx\n",
+			 regs->type, regs->vector, regs->orig_ax,
+			 fred_event_data(regs), regs->cs, regs->ip);
+
+		if (__die("Invalid or fatal FRED event", regs, regs->orig_ax))
+			sig = 0;
+
+		oops_end(flags, regs, sig);
+	}
+
+	instrumentation_end();
+	irqentry_nmi_exit(regs, irq_state);
+}
+
+static DEFINE_FRED_HANDLER(fred_exception)
+{
+	/*
+	 * Exceptions that cannot happen on FRED h/w are set to fred_bad_event().
+	 */
+	static const fred_handler exception_handlers[NUM_EXCEPTION_VECTORS] = {
+		[0 ... NUM_EXCEPTION_VECTORS-1] = fred_bad_event,
+
+		[X86_TRAP_DE]		= exc_divide_error,
+		[X86_TRAP_DB]		= fred_exc_debug,
+		[X86_TRAP_BP]		= exc_int3,
+		[X86_TRAP_OF]		= exc_overflow,
+		[X86_TRAP_BR]		= exc_bounds,
+		[X86_TRAP_UD]		= exc_invalid_op,
+		[X86_TRAP_NM]		= exc_device_not_available,
+		[X86_TRAP_DF]		= fred_exc_double_fault,
+		[X86_TRAP_TS]		= fred_exc_invalid_tss,
+		[X86_TRAP_NP]		= fred_exc_segment_not_present,
+		[X86_TRAP_SS]		= fred_exc_stack_segment,
+		[X86_TRAP_GP]		= fred_exc_general_protection,
+		[X86_TRAP_PF]		= fred_exc_page_fault,
+		[X86_TRAP_MF]		= exc_coprocessor_error,
+		[X86_TRAP_AC]		= fred_exc_alignment_check,
+		[X86_TRAP_MC]		= fred_exc_machine_check,
+		[X86_TRAP_XF]		= exc_simd_coprocessor_error,
+		[X86_TRAP_CP]		= fred_exc_control_protection,
+	};
+
+	exception_handlers[regs->vector](regs);
+}
+
+static __always_inline void fred_emulate_trap(struct pt_regs *regs)
+{
+	regs->orig_ax = 0;
+	fred_exception(regs);
+}
+
+static __always_inline void fred_emulate_fault(struct pt_regs *regs)
+{
+	regs->ip -= regs->instr_len;
+	fred_emulate_trap(regs);
+}
+
+static DEFINE_FRED_HANDLER(fred_sw_interrupt_user)
+{
+	/*
+	 * In compat mode INT $0x80 (32bit system call) is
+	 * performance-critical. Handle it first.
+	 */
+	if (IS_ENABLED(CONFIG_IA32_EMULATION) &&
+	    likely(regs->vector == IA32_SYSCALL_VECTOR)) {
+		regs->orig_ax = regs->ax;
+		regs->ax = -ENOSYS;
+		return do_int80_syscall_32(regs);
+	}
+
+	/*
+	 * Some software exceptions can also be triggered as
+	 * int instructions, for historical reasons.
+	 */
+	switch (regs->vector) {
+	case X86_TRAP_BP:
+	case X86_TRAP_OF:
+		fred_emulate_trap(regs);
+		break;
+	default:
+		regs->vector = X86_TRAP_GP;
+		fred_emulate_fault(regs);
+		break;
+	}
+}
+
+static DEFINE_FRED_HANDLER(fred_other_default)
+{
+	regs->vector = X86_TRAP_UD;
+	fred_emulate_fault(regs);
+}
+
+static DEFINE_FRED_HANDLER(fred_syscall)
+{
+	regs->orig_ax = regs->ax;
+	regs->ax = -ENOSYS;
+	do_syscall_64(regs, regs->orig_ax);
+}
+
+#if IS_ENABLED(CONFIG_IA32_EMULATION)
+/*
+ * Emulate SYSENTER if applicable. This is not the preferred system
+ * call in 32-bit mode under FRED, rather int $0x80 is preferred and
+ * exported in the vdso.
+ */
+static DEFINE_FRED_HANDLER(fred_sysenter)
+{
+	regs->orig_ax = regs->ax;
+	regs->ax = -ENOSYS;
+	do_fast_syscall_32(regs);
+}
+#else
+#define fred_sysenter fred_other_default
+#endif
+
+static DEFINE_FRED_HANDLER(fred_other)
+{
+	static const fred_handler user_other_handlers[FRED_NUM_OTHER_VECTORS] =
+	{
+		/*
+		 * Vector 0 of the other event type is not used
+		 * per FRED spec 5.0.
+		 */
+		[0]		= fred_other_default,
+		[FRED_SYSCALL]	= fred_syscall,
+		[FRED_SYSENTER]	= fred_sysenter
+	};
+
+	user_other_handlers[regs->vector](regs);
+}
+
+static DEFINE_FRED_HANDLER(fred_hw_interrupt)
+{
+	irqentry_state_t state = irqentry_enter(regs);
+
+	instrumentation_begin();
+	external_interrupt(regs);
+	instrumentation_end();
+	irqentry_exit(regs, state);
+}
+
+__visible noinstr void fred_entry_from_user(struct pt_regs *regs)
+{
+	static const fred_handler user_handlers[FRED_EVENT_TYPE_COUNT] =
+	{
+		[EVENT_TYPE_HWINT]	= fred_hw_interrupt,
+		[EVENT_TYPE_RESERVED]	= fred_bad_event,
+		[EVENT_TYPE_NMI]	= fred_exc_nmi,
+		[EVENT_TYPE_SWINT]	= fred_sw_interrupt_user,
+		[EVENT_TYPE_HWFAULT]	= fred_exception,
+		[EVENT_TYPE_SWFAULT]	= fred_exception,
+		[EVENT_TYPE_PRIVSW]	= fred_exception,
+		[EVENT_TYPE_OTHER]	= fred_other
+	};
+
+	/*
+	 * FRED employs a two-level event dispatch mechanism, with the
+	 * first-level on the type of an event and the second-level on
+	 * its vector. Here is the first-level dispatch for ring 3 events.
+	 */
+	user_handlers[regs->type](regs);
+}
+
+__visible noinstr void fred_entry_from_kernel(struct pt_regs *regs)
+{
+	static const fred_handler kernel_handlers[FRED_EVENT_TYPE_COUNT] =
+	{
+		[EVENT_TYPE_HWINT]	= fred_hw_interrupt,
+		[EVENT_TYPE_RESERVED]	= fred_bad_event,
+		[EVENT_TYPE_NMI]	= fred_exc_nmi,
+		[EVENT_TYPE_SWINT]	= fred_bad_event,
+		[EVENT_TYPE_HWFAULT]	= fred_exception,
+		[EVENT_TYPE_SWFAULT]	= fred_exception,
+		[EVENT_TYPE_PRIVSW]	= fred_exception,
+		[EVENT_TYPE_OTHER]	= fred_bad_event
+	};
+
+	/*
+	 * FRED employs a two-level event dispatch mechanism, with the
+	 * first-level on the type of an event and the second-level on
+	 * its vector. Here is the first-level dispatch for ring 0 events.
+	 */
+	kernel_handlers[regs->type](regs);
+}
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index b1a98fa38828..36505b991f88 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -13,6 +13,7 @@
 #include <asm/preempt.h>
 #include <asm/asm.h>
 #include <asm/gsseg.h>
+#include <asm/fred.h>
 
 #ifndef CONFIG_X86_CMPXCHG64
 extern void cmpxchg8b_emu(void);
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index bd701ac87528..3c91f0eae62e 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -118,6 +118,10 @@ DECLARE_FRED_HANDLER(fred_exc_page_fault);
 DECLARE_FRED_HANDLER(fred_exc_machine_check);
 DECLARE_FRED_HANDLER(fred_exc_double_fault);
 
+/* The actual assembly entry point for ring 3 and 0 */
+extern asmlinkage __visible void fred_entrypoint_user(void);
+extern asmlinkage __visible void fred_entrypoint_kernel(void);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* CONFIG_X86_FRED */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 3b743c3fbe91..0df3a3cc7e0f 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -661,6 +661,8 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	exc_machine_check);
 #ifdef CONFIG_XEN_PV
 DECLARE_IDTENTRY_RAW(X86_TRAP_MC,	xenpv_exc_machine_check);
 #endif
+#else
+#define fred_exc_machine_check		fred_bad_event
 #endif
 
 /* NMI */
@@ -699,6 +701,8 @@ DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_DF,	xenpv_exc_double_fault);
 /* #CP */
 #ifdef CONFIG_X86_KERNEL_IBT
 DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_CP,	exc_control_protection);
+#else
+#define fred_exc_control_protection	fred_bad_event
 #endif
 
 /* #VC */
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 30/36] x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy

If the stack frame contains an invalid user context (e.g. due to invalid SS,
a non-canonical RIP, etc.) the ERETU instruction will trap (#SS or #GP).

From a Linux point of view, this really should be considered a user space
failure, so use the standard fault fixup mechanism to intercept the fault,
fix up the exception frame, and redirect execution to fred_entrypoint_user.
The end result is that it appears just as if the hardware had taken the
exception immediately after completing the transition to user space.

Suggested-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Reflect the FRED spec 5.0 change that ERETS and ERETU add 8 to %rsp
  before popping the return context from the stack.

Changes since v6:
* Add a comment to explain why it is safe to write to the previous FRED stack
  frame. (Lai Jiangshan).

Changes since v5:
* Move the NMI bit from an invalid stack frame, which caused ERETU to fault,
  to the fault handler's stack frame, thus to unblock NMI ASAP if NMI is blocked
  (Lai Jiangshan).
---
 arch/x86/entry/entry_64_fred.S             |  5 +-
 arch/x86/include/asm/extable_fixup_types.h |  4 +-
 arch/x86/mm/extable.c                      | 79 ++++++++++++++++++++++
 3 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 4ae12d557db3..d24bf7f10ac8 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -3,6 +3,7 @@
  * The actual FRED entry points.
  */
 
+#include <asm/asm.h>
 #include <asm/fred.h>
 
 #include "calling.h"
@@ -34,7 +35,9 @@ SYM_CODE_START_NOALIGN(fred_entrypoint_user)
 	call	fred_entry_from_user
 SYM_INNER_LABEL(fred_exit_user, SYM_L_GLOBAL)
 	FRED_EXIT
-	ERETU
+1:	ERETU
+
+	_ASM_EXTABLE_TYPE(1b, fred_entrypoint_user, EX_TYPE_ERETU)
 SYM_CODE_END(fred_entrypoint_user)
 
 .fill fred_entrypoint_kernel - ., 1, 0xcc
diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
index 991e31cfde94..1585c798a02f 100644
--- a/arch/x86/include/asm/extable_fixup_types.h
+++ b/arch/x86/include/asm/extable_fixup_types.h
@@ -64,6 +64,8 @@
 #define	EX_TYPE_UCOPY_LEN4		(EX_TYPE_UCOPY_LEN | EX_DATA_IMM(4))
 #define	EX_TYPE_UCOPY_LEN8		(EX_TYPE_UCOPY_LEN | EX_DATA_IMM(8))
 
-#define EX_TYPE_ZEROPAD			20 /* longword load with zeropad on fault */
+#define	EX_TYPE_ZEROPAD			20 /* longword load with zeropad on fault */
+
+#define	EX_TYPE_ERETU			21
 
 #endif
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 271dcb2deabc..0874f29e85ef 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -6,6 +6,7 @@
 #include <xen/xen.h>
 
 #include <asm/fpu/api.h>
+#include <asm/fred.h>
 #include <asm/sev.h>
 #include <asm/traps.h>
 #include <asm/kdebug.h>
@@ -223,6 +224,80 @@ static bool ex_handler_ucopy_len(const struct exception_table_entry *fixup,
 	return ex_handler_uaccess(fixup, regs, trapnr, fault_address);
 }
 
+#ifdef CONFIG_X86_FRED
+static bool ex_handler_eretu(const struct exception_table_entry *fixup,
+			     struct pt_regs *regs, unsigned long error_code)
+{
+	struct pt_regs *uregs = (struct pt_regs *)
+		(regs->sp - offsetof(struct pt_regs, orig_ax));
+	unsigned short ss = uregs->ss;
+	unsigned short cs = uregs->cs;
+
+	/*
+	 * Move the NMI bit from the invalid stack frame, which caused ERETU
+	 * to fault, to the fault handler's stack frame, thus to unblock NMI
+	 * with the fault handler's ERETS instruction ASAP if NMI is blocked.
+	 */
+	regs->nmi = uregs->nmi;
+
+	/*
+	 * Sync event information to uregs, i.e., the ERETU return frame, but
+	 * is it safe to write to the ERETU return frame which is just above
+	 * current event stack frame?
+	 *
+	 * The RSP used by FRED to push a stack frame is not the value in %rsp,
+	 * it is calculated from %rsp with the following 2 steps:
+	 * 1) RSP = %rsp - (IA32_FRED_CONFIG & 0x1c0)	// Reserve N*64 bytes
+	 * 2) RSP = RSP & ~0x3f		// Align to a 64-byte cache line
+	 * when an event delivery doesn't trigger a stack level change.
+	 *
+	 * Here is an example with N*64 (N=1) bytes reserved:
+	 *
+	 *  64-byte cache line ==>  ______________
+	 *                         |___Reserved___|
+	 *                         |__Event_data__|
+	 *                         |_____SS_______|
+	 *                         |_____RSP______|
+	 *                         |_____FLAGS____|
+	 *                         |_____CS_______|
+	 *                         |_____IP_______|
+	 *  64-byte cache line ==> |__Error_code__| <== ERETU return frame
+	 *                         |______________|
+	 *                         |______________|
+	 *                         |______________|
+	 *                         |______________|
+	 *                         |______________|
+	 *                         |______________|
+	 *                         |______________|
+	 *  64-byte cache line ==> |______________| <== RSP after step 1) and 2)
+	 *                         |___Reserved___|
+	 *                         |__Event_data__|
+	 *                         |_____SS_______|
+	 *                         |_____RSP______|
+	 *                         |_____FLAGS____|
+	 *                         |_____CS_______|
+	 *                         |_____IP_______|
+	 *  64-byte cache line ==> |__Error_code__| <== ERETS return frame
+	 *
+	 * Thus a new FRED stack frame will always be pushed below a previous
+	 * FRED stack frame ((N*64) bytes may be reserved between), and it is
+	 * safe to write to a previous FRED stack frame as they never overlap.
+	 */
+	fred_info(uregs)->edata = fred_event_data(regs);
+	uregs->ssx = regs->ssx;
+	uregs->ss = ss;
+	/* The NMI bit was moved away above */
+	uregs->nmi = 0;
+	uregs->csx = regs->csx;
+	uregs->sl = 0;
+	uregs->wfe = 0;
+	uregs->cs = cs;
+	uregs->orig_ax = error_code;
+
+	return ex_handler_default(fixup, regs);
+}
+#endif
+
 int ex_get_fixup_type(unsigned long ip)
 {
 	const struct exception_table_entry *e = search_exception_tables(ip);
@@ -300,6 +375,10 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
 		return ex_handler_ucopy_len(e, regs, trapnr, fault_addr, reg, imm);
 	case EX_TYPE_ZEROPAD:
 		return ex_handler_zeropad(e, regs, fault_addr);
+#ifdef CONFIG_X86_FRED
+	case EX_TYPE_ERETU:
+		return ex_handler_eretu(e, regs, error_code);
+#endif
 	}
 	BUG();
 }
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 31/36] x86/traps: Export external_interrupt() for handling IRQ in IRQ induced VM exits
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731064133.3881-1-xin3.li@intel.com>

Export external_interrupt() for handling IRQ in IRQ induced VM exits.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kernel/traps.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 90fdfcccee7a..6143ad56008e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1560,6 +1560,11 @@ int external_interrupt(struct pt_regs *regs)
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+/* For KVM VMX to handle IRQs in IRQ induced VM exits. */
+EXPORT_SYMBOL_GPL(external_interrupt);
+#endif
+
 #endif /* CONFIG_X86_64 */
 
 void __init trap_init(void)
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 32/36] x86/fred: Export fred_entrypoint_kernel() for handling NMI in NMI induced VM exits
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731064133.3881-1-xin3.li@intel.com>

Export fred_entrypoint_kernel() for handling NMI in NMI induced VM exits.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/entry/entry_64_fred.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index d24bf7f10ac8..12063267d2ac 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -4,6 +4,7 @@
  */
 
 #include <asm/asm.h>
+#include <asm/export.h>
 #include <asm/fred.h>
 
 #include "calling.h"
@@ -54,3 +55,4 @@ SYM_CODE_START_NOALIGN(fred_entrypoint_kernel)
 	FRED_EXIT
 	ERETS
 SYM_CODE_END(fred_entrypoint_kernel)
+EXPORT_SYMBOL(fred_entrypoint_kernel)
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 36/36] x86/fred: Disable FRED by default in its early stage
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731064133.3881-1-xin3.li@intel.com>

Disable FRED by default in its early stage.

To enable FRED, a new kernel command line option "fred" needs to be added.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v7:
* Add a log message when FRED is enabled.
---
 Documentation/admin-guide/kernel-parameters.txt | 4 ++++
 arch/x86/kernel/cpu/common.c                    | 3 +++
 arch/x86/kernel/fred.c                          | 3 +++
 3 files changed, 10 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a1457995fd41..cb12decfcdc0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1513,6 +1513,10 @@
 			Warning: use of this parameter will taint the kernel
 			and may cause unknown problems.
 
+	fred
+			Forcefully enable flexible return and event delivery,
+			which is otherwise disabled by default.
+
 	ftrace=[tracer]
 			[FTRACE] will set and start the specified tracer
 			as early as possible in order to facilitate early
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index b34a8a138755..38cf4f64a56e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1451,6 +1451,9 @@ static void __init cpu_parse_early_param(void)
 	char *argptr = arg, *opt;
 	int arglen, taint = 0;
 
+	if (!cmdline_find_option_bool(boot_command_line, "fred"))
+		setup_clear_cpu_cap(X86_FEATURE_FRED);
+
 #ifdef CONFIG_X86_32
 	if (cmdline_find_option_bool(boot_command_line, "no387"))
 #ifdef CONFIG_MATH_EMULATION
diff --git a/arch/x86/kernel/fred.c b/arch/x86/kernel/fred.c
index 7fdf79c964a8..a4a726ea9fc2 100644
--- a/arch/x86/kernel/fred.c
+++ b/arch/x86/kernel/fred.c
@@ -8,6 +8,9 @@
 
 void cpu_init_fred_exceptions(void)
 {
+	/* When FRED is enabled by default, this log message may not needed */
+	pr_info("Initialize FRED on CPU%d\n", smp_processor_id());
+
 	wrmsrl(MSR_IA32_FRED_CONFIG,
 	       /* Reserve for CALL emulation */
 	       FRED_CONFIG_REDZONE |
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 35/36] x86/fred: FRED initialization code
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731064133.3881-1-xin3.li@intel.com>

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

The code to initialize FRED when it's available and _not_ disabled.

cpu_init_fred_exceptions() is the core function to initialize FRED,
which
  1. Sets up FRED entrypoints for events happening in ring 0 and 3.
  2. Sets up a default stack for event handling.
  3. Sets up dedicated event stacks for DB/NMI/MC/DF, equivalent to
     the IDT IST stacks.
  4. Forces 32-bit system calls to use "int $0x80" only.
  5. Enables FRED and invalidtes IDT.

When the FRED is used, cpu_init_exception_handling() initializes FRED
through calling cpu_init_fred_exceptions(), otherwise it sets up TSS
IST and loads IDT.

As FRED uses the ring 3 FRED entrypoint for SYSCALL and SYSENTER,
it skips setting up SYSCALL/SYSENTER related MSRs, e.g., MSR_LSTAR.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Co-developed-by: Xin Li <xin3.li@intel.com>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Move this patch after all required changes are in place (Thomas
  Gleixner).

Changes since v5:
* Add a comment for FRED stack level settings (Lai Jiangshan).
* Define #DB/NMI/#MC/#DF stack levels using macros.
---
 arch/x86/include/asm/fred.h  | 28 ++++++++++++++++
 arch/x86/include/asm/traps.h |  4 ++-
 arch/x86/kernel/Makefile     |  1 +
 arch/x86/kernel/cpu/common.c | 28 +++++++++++++---
 arch/x86/kernel/fred.c       | 64 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/irqinit.c    |  7 +++-
 arch/x86/kernel/traps.c      | 11 ++++++-
 7 files changed, 135 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/kernel/fred.c

diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 3c91f0eae62e..6031138b778c 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -68,6 +68,19 @@
 #define FRED_SSX_64_BIT_MODE_BIT	57
 #define FRED_SSX_64_BIT_MODE		_BITUL(FRED_SSX_64_BIT_MODE_BIT)
 
+/* #DB in the kernel would imply the use of a kernel debugger. */
+#define FRED_DB_STACK_LEVEL		1
+#define FRED_NMI_STACK_LEVEL		2
+#define FRED_MC_STACK_LEVEL		2
+/*
+ * #DF is the highest level because a #DF means "something went wrong
+ * *while delivering an exception*." The number of cases for which that
+ * can happen with FRED is drastically reduced and basically amounts to
+ * "the stack you pointed me to is broken." Thus, always change stacks
+ * on #DF, which means it should be at the highest level.
+ */
+#define FRED_DF_STACK_LEVEL		3
+
 /*
  * FRED event delivery establishes a full supervisor context by
  * saving the essential information about an event to a FRED
@@ -122,8 +135,23 @@ DECLARE_FRED_HANDLER(fred_exc_double_fault);
 extern asmlinkage __visible void fred_entrypoint_user(void);
 extern asmlinkage __visible void fred_entrypoint_kernel(void);
 
+void cpu_init_fred_exceptions(void);
+void fred_setup_apic(void);
+
 #endif /* __ASSEMBLY__ */
 
+#else
+#ifndef __ASSEMBLY__
+static inline void cpu_init_fred_exceptions(void)
+{
+	BUG();
+}
+
+static inline void fred_setup_apic(void)
+{
+	BUG();
+}
+#endif
 #endif /* CONFIG_X86_FRED */
 
 #endif /* ASM_X86_FRED_H */
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 48daa78ee88c..da7e8ab1d66d 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -49,6 +49,7 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs,
 
 #ifdef CONFIG_X86_64
 inline void set_sysvec_handler(unsigned int i, system_interrupt_handler func);
+bool is_sysvec_used(unsigned int i);
 
 static inline void sysvec_setup_fred(unsigned int vector, system_interrupt_handler func)
 {
@@ -63,7 +64,8 @@ static inline void sysvec_setup_fred(unsigned int vector, system_interrupt_handl
 
 #define sysvec_install(vector, func) {					\
 	sysvec_setup_fred(vector, func);				\
-	alloc_intr_gate(vector, asm_##func);				\
+	if (!cpu_feature_enabled(X86_FEATURE_FRED))			\
+		alloc_intr_gate(vector, asm_##func);			\
 }
 
 int external_interrupt(struct pt_regs *regs);
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4070a01c11b7..46d8daa11c17 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -48,6 +48,7 @@ obj-y			+= platform-quirks.o
 obj-y			+= process_$(BITS).o signal.o signal_$(BITS).o
 obj-y			+= traps.o idt.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
 obj-y			+= time.o ioport.o dumpstack.o nmi.o
+obj-$(CONFIG_X86_FRED)	+= fred.o
 obj-$(CONFIG_MODIFY_LDT_SYSCALL)	+= ldt.o
 obj-y			+= setup.o x86_init.o i8259.o irqinit.o
 obj-$(CONFIG_JUMP_LABEL)	+= jump_label.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index bb03dacc5fb8..b34a8a138755 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -62,6 +62,7 @@
 #include <asm/microcode_intel.h>
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
+#include <asm/fred.h>
 #include <asm/uv/uv.h>
 #include <asm/set_memory.h>
 #include <asm/traps.h>
@@ -2062,13 +2063,24 @@ static inline void idt_syscall_init(void)
 	       X86_EFLAGS_AC|X86_EFLAGS_ID);
 }
 
+static inline void fred_syscall_init(void)
+{
+	/* Both sysexit and sysret cause #UD when FRED is enabled */
+	wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)GDT_ENTRY_INVALID_SEG);
+	wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
+}
+
 /* May not be marked __init: used by software suspend */
 void syscall_init(void)
 {
 	/* The default user and kernel segments */
 	wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
 
-	idt_syscall_init();
+	if (cpu_feature_enabled(X86_FEATURE_FRED))
+		fred_syscall_init();
+	else
+		idt_syscall_init();
 }
 
 #else	/* CONFIG_X86_64 */
@@ -2184,8 +2196,6 @@ void cpu_init_exception_handling(void)
 	/* paranoid_entry() gets the CPU number from the GDT */
 	setup_getcpu(cpu);
 
-	/* IST vectors need TSS to be set up. */
-	tss_setup_ist(tss);
 	tss_setup_io_bitmap(tss);
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 
@@ -2194,8 +2204,16 @@ void cpu_init_exception_handling(void)
 	/* GHCB needs to be setup to handle #VC. */
 	setup_ghcb();
 
-	/* Finally load the IDT */
-	load_current_idt();
+	if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+		/* Set up FRED exception handling */
+		cpu_init_fred_exceptions();
+	} else {
+		/* IST vectors need TSS to be set up. */
+		tss_setup_ist(tss);
+
+		/* Finally load the IDT */
+		load_current_idt();
+	}
 }
 
 /*
diff --git a/arch/x86/kernel/fred.c b/arch/x86/kernel/fred.c
new file mode 100644
index 000000000000..7fdf79c964a8
--- /dev/null
+++ b/arch/x86/kernel/fred.c
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/kernel.h>
+
+#include <asm/desc.h>
+#include <asm/fred.h>
+#include <asm/tlbflush.h>
+#include <asm/traps.h>
+
+void cpu_init_fred_exceptions(void)
+{
+	wrmsrl(MSR_IA32_FRED_CONFIG,
+	       /* Reserve for CALL emulation */
+	       FRED_CONFIG_REDZONE |
+	       FRED_CONFIG_INT_STKLVL(0) |
+	       FRED_CONFIG_ENTRYPOINT(fred_entrypoint_user));
+
+	/*
+	 * The purpose of separate stacks for NMI, #DB and #MC *in the kernel*
+	 * (remember that user space faults are always taken on stack level 0)
+	 * is to avoid overflowing the kernel stack.
+	 */
+	wrmsrl(MSR_IA32_FRED_STKLVLS,
+	       FRED_STKLVL(X86_TRAP_DB,  FRED_DB_STACK_LEVEL) |
+	       FRED_STKLVL(X86_TRAP_NMI, FRED_NMI_STACK_LEVEL) |
+	       FRED_STKLVL(X86_TRAP_MC,  FRED_MC_STACK_LEVEL) |
+	       FRED_STKLVL(X86_TRAP_DF,  FRED_DF_STACK_LEVEL));
+
+	/* The FRED equivalents to IST stacks... */
+	wrmsrl(MSR_IA32_FRED_RSP1, __this_cpu_ist_top_va(DB));
+	wrmsrl(MSR_IA32_FRED_RSP2, __this_cpu_ist_top_va(NMI));
+	wrmsrl(MSR_IA32_FRED_RSP3, __this_cpu_ist_top_va(DF));
+
+	/* Not used with FRED */
+	wrmsrl(MSR_LSTAR, 0ULL);
+	wrmsrl(MSR_CSTAR, 0ULL);
+	wrmsrl_safe(MSR_IA32_SYSENTER_CS,  (u64)GDT_ENTRY_INVALID_SEG);
+	wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL);
+	wrmsrl_safe(MSR_IA32_SYSENTER_EIP, 0ULL);
+
+	/* Enable FRED */
+	cr4_set_bits(X86_CR4_FRED);
+	/* Any further IDT use is a bug */
+	idt_invalidate();
+
+	/* Use int $0x80 for 32-bit system calls in FRED mode */
+	setup_clear_cpu_cap(X86_FEATURE_SYSENTER32);
+	setup_clear_cpu_cap(X86_FEATURE_SYSCALL32);
+}
+
+/*
+ * Initialize system vectors from a FRED perspective, so
+ * lapic_assign_system_vectors() can do its job.
+ */
+void __init fred_setup_apic(void)
+{
+	int i;
+
+	for (i = 0; i < FIRST_EXTERNAL_VECTOR; i++)
+		set_bit(i, system_vectors);
+
+	for (i = 0; i < NR_SYSTEM_VECTORS; i++)
+		if (is_sysvec_used(i))
+			set_bit(i + FIRST_SYSTEM_VECTOR, system_vectors);
+}
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index c683666876f1..2a510f72dd11 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -28,6 +28,7 @@
 #include <asm/setup.h>
 #include <asm/i8259.h>
 #include <asm/traps.h>
+#include <asm/fred.h>
 #include <asm/prom.h>
 
 /*
@@ -96,7 +97,11 @@ void __init native_init_IRQ(void)
 	/* Execute any quirks before the call gates are initialised: */
 	x86_init.irqs.pre_vector_init();
 
-	idt_setup_apic_and_irq_gates();
+	if (cpu_feature_enabled(X86_FEATURE_FRED))
+		fred_setup_apic();
+	else
+		idt_setup_apic_and_irq_gates();
+
 	lapic_assign_system_vectors();
 
 	if (!acpi_ioapic && !of_ioapic && nr_legacy_irqs()) {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 6143ad56008e..21eeba7b188f 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1542,6 +1542,12 @@ void set_sysvec_handler(unsigned int i, system_interrupt_handler func)
 	system_interrupt_handlers[i] = func;
 }
 
+bool is_sysvec_used(unsigned int i)
+{
+	BUG_ON(i >= NR_SYSTEM_VECTORS);
+	return system_interrupt_handlers[i] != dispatch_table_spurious_interrupt;
+}
+
 int external_interrupt(struct pt_regs *regs)
 {
 	unsigned int vector = regs->vector;
@@ -1577,7 +1583,10 @@ void __init trap_init(void)
 
 	/* Initialize TSS before setting up traps so ISTs work */
 	cpu_init_exception_handling();
+
 	/* Setup traps as cpu_init() might #GP */
-	idt_setup_traps();
+	if (!cpu_feature_enabled(X86_FEATURE_FRED))
+		idt_setup_traps();
+
 	cpu_init();
 }
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 33/36] KVM: VMX: Add VMX_DO_FRED_EVENT_IRQOFF for IRQ/NMI handling
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731064133.3881-1-xin3.li@intel.com>

Compared to an IDT stack frame, a FRED stack frame has extra 16 bytes of
information pushed at the regular stack top and 8 bytes of error code _always_
pushed at the regular stack bottom, add VMX_DO_FRED_EVENT_IRQOFF to generate
FRED stack frames with event type and vector properly set. Thus, IRQ/NMI can
be handled with the existing approach when FRED is enabled.

For IRQ handling, general purpose registers are pushed to the stack to form
a pt_regs structure, which is then used to call external_interrupt(). As a
result, IRQ handling no longer re-enters the noinstr code.

Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---

Changes since v8:
* Add a new macro VMX_DO_FRED_EVENT_IRQOFF for FRED instead of refactoring
  VMX_DO_EVENT_IRQOFF (Sean Christopherson).
* Do NOT use a trampoline, just LEA+PUSH the return RIP, PUSH the error code,
  and jump to the FRED kernel entry point for NMI or call external_interrupt()
  for IRQs (Sean Christopherson).
* Call external_interrupt() only when FRED is enabled, and convert the non-FRED
  handling to external_interrupt() after FRED lands (Sean Christopherson).
---
 arch/x86/kvm/vmx/vmenter.S | 88 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c     | 19 ++++++--
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 07e927d4d099..5ee6a57b59a5 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -2,12 +2,14 @@
 #include <linux/linkage.h>
 #include <asm/asm.h>
 #include <asm/bitsperlong.h>
+#include <asm/fred.h>
 #include <asm/kvm_vcpu_regs.h>
 #include <asm/nospec-branch.h>
 #include <asm/percpu.h>
 #include <asm/segment.h>
 #include "kvm-asm-offsets.h"
 #include "run_flags.h"
+#include "../../entry/calling.h"
 
 #define WORD_SIZE (BITS_PER_LONG / 8)
 
@@ -31,6 +33,80 @@
 #define VCPU_R15	__VCPU_REGS_R15 * WORD_SIZE
 #endif
 
+#ifdef CONFIG_X86_FRED
+.macro VMX_DO_FRED_EVENT_IRQOFF branch_insn branch_target nmi=0
+	/*
+	 * Unconditionally create a stack frame, getting the correct RSP on the
+	 * stack (for x86-64) would take two instructions anyways, and RBP can
+	 * be used to restore RSP to make objtool happy (see below).
+	 */
+	push %_ASM_BP
+	mov %_ASM_SP, %_ASM_BP
+
+	/*
+	 * Don't check the FRED stack level, the call stack leading to this
+	 * helper is effectively constant and shallow (relatively speaking).
+	 *
+	 * Emulate the FRED-defined redzone and stack alignment.
+	 */
+	sub $(FRED_CONFIG_REDZONE_AMOUNT << 6), %rsp
+	and $FRED_STACK_FRAME_RSP_MASK, %rsp
+
+	/*
+	 * A FRED stack frame has extra 16 bytes of information pushed at the
+	 * regular stack top compared to an IDT stack frame.
+	 */
+	push $0		/* Reserved by FRED, must be 0 */
+	push $0		/* FRED event data, 0 for NMI and external interrupts */
+
+	shl $32, %rdi				/* FRED event type and vector */
+	.if \nmi
+	bts $FRED_SSX_NMI_BIT, %rdi		/* Set the NMI bit */
+	.endif
+	bts $FRED_SSX_64_BIT_MODE_BIT, %rdi	/* Set the 64-bit mode */
+	or $__KERNEL_DS, %rdi
+	push %rdi
+	push %rbp
+	pushf
+	mov $__KERNEL_CS, %rax
+	push %rax
+
+	/*
+	 * Unlike the IDT event delivery, FRED _always_ pushes an error code
+	 * after pushing the return RIP, thus the CALL instruction CANNOT be
+	 * used here to push the return RIP, otherwise there is no chance to
+	 * push an error code before invoking the IRQ/NMI handler.
+	 *
+	 * Use LEA to get the return RIP and push it, then push an error code.
+	 */
+	lea 1f(%rip), %rax
+	push %rax
+	push $0		/* FRED error code, 0 for NMI and external interrupts */
+
+	.if \nmi == 0
+	PUSH_REGS
+	mov %rsp, %rdi
+	.endif
+
+	\branch_insn \branch_target
+
+	.if \nmi == 0
+	POP_REGS
+	.endif
+
+1:
+	/*
+	 * "Restore" RSP from RBP, even though IRET has already unwound RSP to
+	 * the correct value.  objtool doesn't know the callee will IRET and,
+	 * without the explicit restore, thinks the stack is getting walloped.
+	 * Using an unwind hint is problematic due to x86-64's dynamic alignment.
+	 */
+	mov %_ASM_BP, %_ASM_SP
+	pop %_ASM_BP
+	RET
+.endm
+#endif
+
 .macro VMX_DO_EVENT_IRQOFF call_insn call_target
 	/*
 	 * Unconditionally create a stack frame, getting the correct RSP on the
@@ -299,6 +375,12 @@ SYM_INNER_LABEL_ALIGN(vmx_vmexit, SYM_L_GLOBAL)
 
 SYM_FUNC_END(__vmx_vcpu_run)
 
+#ifdef CONFIG_X86_FRED
+SYM_FUNC_START(vmx_do_fred_nmi_irqoff)
+	VMX_DO_FRED_EVENT_IRQOFF jmp fred_entrypoint_kernel nmi=1
+SYM_FUNC_END(vmx_do_fred_nmi_irqoff)
+#endif
+
 SYM_FUNC_START(vmx_do_nmi_irqoff)
 	VMX_DO_EVENT_IRQOFF call asm_exc_nmi_kvm_vmx
 SYM_FUNC_END(vmx_do_nmi_irqoff)
@@ -357,6 +439,12 @@ SYM_FUNC_START(vmread_error_trampoline)
 SYM_FUNC_END(vmread_error_trampoline)
 #endif
 
+#ifdef CONFIG_X86_FRED
+SYM_FUNC_START(vmx_do_fred_interrupt_irqoff)
+	VMX_DO_FRED_EVENT_IRQOFF call external_interrupt
+SYM_FUNC_END(vmx_do_fred_interrupt_irqoff)
+#endif
+
 SYM_FUNC_START(vmx_do_interrupt_irqoff)
 	VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
 SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0ecf4be2c6af..4e90c69a92bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6890,6 +6890,14 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
 	memset(vmx->pi_desc.pir, 0, sizeof(vmx->pi_desc.pir));
 }
 
+#ifdef CONFIG_X86_FRED
+void vmx_do_fred_interrupt_irqoff(unsigned int vector);
+void vmx_do_fred_nmi_irqoff(unsigned int vector);
+#else
+#define vmx_do_fred_interrupt_irqoff(x) BUG()
+#define vmx_do_fred_nmi_irqoff(x) BUG()
+#endif
+
 void vmx_do_interrupt_irqoff(unsigned long entry);
 void vmx_do_nmi_irqoff(void);
 
@@ -6932,14 +6940,16 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 {
 	u32 intr_info = vmx_get_intr_info(vcpu);
 	unsigned int vector = intr_info & INTR_INFO_VECTOR_MASK;
-	gate_desc *desc = (gate_desc *)host_idt_base + vector;
 
 	if (KVM_BUG(!is_external_intr(intr_info), vcpu->kvm,
 	    "unexpected VM-Exit interrupt info: 0x%x", intr_info))
 		return;
 
 	kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
-	vmx_do_interrupt_irqoff(gate_offset(desc));
+	if (cpu_feature_enabled(X86_FEATURE_FRED))
+		vmx_do_fred_interrupt_irqoff(vector);	/* Event type is 0 */
+	else
+		vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
 	kvm_after_interrupt(vcpu);
 
 	vcpu->arch.at_instruction_boundary = true;
@@ -7225,7 +7235,10 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	if ((u16)vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
 	    is_nmi(vmx_get_intr_info(vcpu))) {
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
-		vmx_do_nmi_irqoff();
+		if (cpu_feature_enabled(X86_FEATURE_FRED))
+			vmx_do_fred_nmi_irqoff((EVENT_TYPE_NMI << 16) | NMI_VECTOR);
+		else
+			vmx_do_nmi_irqoff();
 		kvm_after_interrupt(vcpu);
 	}
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH v9 34/36] x86/syscall: Split IDT syscall setup code into idt_syscall_init()
From: Xin Li @ 2023-07-31  6:41 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel
  Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Xin Li, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731064133.3881-1-xin3.li@intel.com>

Split IDT syscall setup code into idt_syscall_init() to make it
cleaner to add FRED syscall setup code.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/kernel/cpu/common.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 331b06d19f7f..bb03dacc5fb8 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2027,10 +2027,8 @@ static void wrmsrl_cstar(unsigned long val)
 		wrmsrl(MSR_CSTAR, val);
 }
 
-/* May not be marked __init: used by software suspend */
-void syscall_init(void)
+static inline void idt_syscall_init(void)
 {
-	wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
 	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
 
 #ifdef CONFIG_IA32_EMULATION
@@ -2064,6 +2062,15 @@ void syscall_init(void)
 	       X86_EFLAGS_AC|X86_EFLAGS_ID);
 }
 
+/* May not be marked __init: used by software suspend */
+void syscall_init(void)
+{
+	/* The default user and kernel segments */
+	wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
+
+	idt_syscall_init();
+}
+
 #else	/* CONFIG_X86_64 */
 
 #ifdef CONFIG_STACKPROTECTOR
-- 
2.34.1


^ permalink raw reply related

* W sprawie samochodu
From:  Radosław Grabowski  @ 2023-07-31  8:35 UTC (permalink / raw)
  To: linux-hyperv

Dzień dobry,

chcielibyśmy zapewnić Państwu kompleksowe rozwiązania, jeśli chodzi o system monitoringu GPS.

Precyzyjne monitorowanie pojazdów na mapach cyfrowych, śledzenie ich parametrów eksploatacyjnych w czasie rzeczywistym oraz kontrola paliwa to kluczowe funkcjonalności naszego systemu. 

Organizowanie pracy pracowników jest dzięki temu prostsze i bardziej efektywne, a oszczędności i optymalizacja w zakresie ponoszonych kosztów, mają dla każdego przedsiębiorcy ogromne znaczenie.

Dopasujemy naszą ofertę do Państwa oczekiwań i potrzeb organizacji. Czy moglibyśmy porozmawiać o naszej propozycji?


Pozdrawiam
Radosław Grabowski

^ permalink raw reply

* Re: [PATCH v9 05/36] x86/opcode: Add ERETU, ERETS instructions to x86-opcode-map
From: Masami Hiramatsu @ 2023-07-31  9:43 UTC (permalink / raw)
  To: Xin Li
  Cc: linux-doc, linux-kernel, linux-edac, linux-hyperv, kvm, xen-devel,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, Andy Lutomirski, Oleg Nesterov,
	Tony Luck, K . Y . Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov, Sean Christopherson,
	Peter Zijlstra, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Josh Poimboeuf, Paul E . McKenney,
	Catalin Marinas, Randy Dunlap, Steven Rostedt, Kim Phillips,
	Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Reinette Chatre, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masami Hiramatsu, Masahiro Yamada,
	Ze Gao, Fei Li, Conghui, Ashok Raj, Jason A . Donenfeld,
	Mark Rutland, Jacob Pan, Jiapeng Chong, Jane Malalane,
	David Woodhouse, Boris Ostrovsky, Arnaldo Carvalho de Melo,
	Yantengsi, Christophe Leroy, Sathvika Vasireddy
In-Reply-To: <20230731063317.3720-6-xin3.li@intel.com>

On Sun, 30 Jul 2023 23:32:46 -0700
Xin Li <xin3.li@intel.com> wrote:

> From: "H. Peter Anvin (Intel)" <hpa@zytor.com>
> 
> Add instruction opcodes used by FRED ERETU/ERETS to x86-opcode-map.
> 
> Opcode numbers are per FRED spec v5.0.
> 
> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> Tested-by: Shan Kang <shan.kang@intel.com>
> Signed-off-by: Xin Li <xin3.li@intel.com>

This looks good to me. (ERETS has the opcode F2 0F 01 CA, ERETU
has the opcode F3 0F 01 CA)

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thank you,

> ---
>  arch/x86/lib/x86-opcode-map.txt       | 2 +-
>  tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index 5168ee0360b2..7a269e269dc0 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -1052,7 +1052,7 @@ EndTable
>  
>  GrpTable: Grp7
>  0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
> -1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B)
> +1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
>  2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
>  3: LIDT Ms
>  4: SMSW Mw/Rv
> diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
> index 5168ee0360b2..7a269e269dc0 100644
> --- a/tools/arch/x86/lib/x86-opcode-map.txt
> +++ b/tools/arch/x86/lib/x86-opcode-map.txt
> @@ -1052,7 +1052,7 @@ EndTable
>  
>  GrpTable: Grp7
>  0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
> -1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B)
> +1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
>  2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
>  3: LIDT Ms
>  4: SMSW Mw/Rv
> -- 
> 2.34.1
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* RE: [PATCH v9 05/36] x86/opcode: Add ERETU, ERETS instructions to x86-opcode-map
From: Li, Xin3 @ 2023-07-31 16:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-edac@vger.kernel.org, linux-hyperv@vger.kernel.org,
	kvm@vger.kernel.org, xen-devel@lists.xenproject.org,
	Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86@kernel.org, H . Peter Anvin, Lutomirski, Andy,
	Oleg Nesterov, Luck, Tony, K . Y . Srinivasan, Haiyang Zhang,
	Wei Liu, Cui, Dexuan, Paolo Bonzini, Wanpeng Li, Vitaly Kuznetsov,
	Christopherson,, Sean, Peter Zijlstra, Gross, Jurgen,
	Stefano Stabellini, Oleksandr Tyshchenko, Josh Poimboeuf,
	Paul E . McKenney, Catalin Marinas, Randy Dunlap, Steven Rostedt,
	Kim Phillips, Hyeonggon Yoo, Liam R . Howlett, Sebastian Reichel,
	Kirill A . Shutemov, Suren Baghdasaryan, Pawan Gupta, Jiaxi Chen,
	Babu Moger, Jim Mattson, Sandipan Das, Lai Jiangshan,
	Hans de Goede, Chatre, Reinette, Daniel Sneddon, Breno Leitao,
	Nikunj A Dadhania, Brian Gerst, Sami Tolvanen,
	Alexander Potapenko, Andrew Morton, Arnd Bergmann,
	Eric W . Biederman, Kees Cook, Masahiro Yamada, Ze Gao, Li, Fei1,
	Conghui, Raj, Ashok, Jason A . Donenfeld, Mark Rutland, Jacob Pan,
	Jiapeng Chong, Jane Malalane, Woodhouse, David, Ostrovsky, Boris,
	Arnaldo Carvalho de Melo, Yantengsi, Christophe Leroy,
	Sathvika Vasireddy
In-Reply-To: <20230731184309.9888dfd44fa1a5fd69c779cd@kernel.org>

> > Add instruction opcodes used by FRED ERETU/ERETS to x86-opcode-map.
> >
> > Opcode numbers are per FRED spec v5.0.
> >
> > Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
> > Tested-by: Shan Kang <shan.kang@intel.com>
> > Signed-off-by: Xin Li <xin3.li@intel.com>
> 
> This looks good to me. (ERETS has the opcode F2 0F 01 CA, ERETU has the opcode
> F3 0F 01 CA)
> 
> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> 
> Thank you,

Thanks! Will add your RB.
  Xin


^ permalink raw reply

* Re: [PATCH 1/1] scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
From: Martin K. Petersen @ 2023-07-31 19:46 UTC (permalink / raw)
  To: kys, longli, wei.liu, decui, jejb, linux-hyperv, linux-kernel,
	linux-scsi, Michael Kelley
  Cc: Martin K . Petersen, stable
In-Reply-To: <1690606764-79669-1-git-send-email-mikelley@microsoft.com>

On Fri, 28 Jul 2023 21:59:24 -0700, Michael Kelley wrote:

> Hyper-V provides the ability to connect Fibre Channel LUNs to the host
> system and present them in a guest VM as a SCSI device. I/O to the vFC
> device is handled by the storvsc driver. The storvsc driver includes
> a partial integration with the FC transport implemented in the generic
> portion of the Linux SCSI subsystem so that FC attributes can be
> displayed in /sys.  However, the partial integration means that some
> aspects of vFC don't work properly. Unfortunately, a full and correct
> integration isn't practical because of limitations in what Hyper-V
> provides to the guest.
> 
> [...]

Applied to 6.5/scsi-fixes, thanks!

[1/1] scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
      https://git.kernel.org/mkp/scsi/c/175544ad48cb

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH] hv_balloon: Update the balloon driver to use the SBRM API
From: Boqun Feng @ 2023-07-31 21:25 UTC (permalink / raw)
  To: levymitchell0
  Cc: K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	linux-hyperv, linux-kernel, mikelly, peterz
In-Reply-To: <20230726-master-v1-1-b2ce6a4538db@gmail.com>

Hi Mitchell,

On Wed, Jul 26, 2023 at 12:23:31AM +0000, Mitchell Levy via B4 Relay wrote:
> From: Mitchell Levy <levymitchell0@gmail.com>
> 
> 
> 
> ---

I don't know whether it's a tool issue or something else, but all words
after the "---" line in the email will be discarded from a commit log.
You can try to apply this patch yourself and see the result:

	b4 shazam 20230726-master-v1-1-b2ce6a4538db@gmail.com 

> This patch is intended as a proof-of-concept for the new SBRM
> machinery[1]. For some brief background, the idea behind SBRM is using
> the __cleanup__ attribute to automatically unlock locks (or otherwise
> release resources) when they go out of scope, similar to C++ style RAII.
> This promises some benefits such as making code simpler (particularly
> where you have lots of goto fail; type constructs) as well as reducing
> the surface area for certain kinds of bugs.
> 
> The changes in this patch should not result in any difference in how the
> code actually runs (i.e., it's purely an exercise in this new syntax
> sugar). In one instance SBRM was not appropriate, so I left that part
> alone, but all other locking/unlocking is handled automatically in this
> patch.
> 
> Link: https://lore.kernel.org/all/20230626125726.GU4253@hirez.programming.kicks-ass.net/ [1]
> 
> Suggested-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: "Mitchell Levy (Microsoft)" <levymitchell0@gmail.com>

Beside the above format issue, the code looks good to me, nice job!

Feel free to add:

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun

> ---
>  drivers/hv/hv_balloon.c | 82 +++++++++++++++++++++++--------------------------
>  1 file changed, 38 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index dffcc894f117..2812601e84da 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -8,6 +8,7 @@
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> +#include <linux/cleanup.h>
>  #include <linux/kernel.h>
>  #include <linux/jiffies.h>
>  #include <linux/mman.h>
> @@ -646,7 +647,7 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
>  			      void *v)
>  {
>  	struct memory_notify *mem = (struct memory_notify *)v;
> -	unsigned long flags, pfn_count;
> +	unsigned long pfn_count;
>  
>  	switch (val) {
>  	case MEM_ONLINE:
> @@ -655,21 +656,22 @@ static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
>  		break;
>  
>  	case MEM_OFFLINE:
> -		spin_lock_irqsave(&dm_device.ha_lock, flags);
> -		pfn_count = hv_page_offline_check(mem->start_pfn,
> -						  mem->nr_pages);
> -		if (pfn_count <= dm_device.num_pages_onlined) {
> -			dm_device.num_pages_onlined -= pfn_count;
> -		} else {
> -			/*
> -			 * We're offlining more pages than we managed to online.
> -			 * This is unexpected. In any case don't let
> -			 * num_pages_onlined wrap around zero.
> -			 */
> -			WARN_ON_ONCE(1);
> -			dm_device.num_pages_onlined = 0;
> +		scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> +			pfn_count = hv_page_offline_check(mem->start_pfn,
> +							  mem->nr_pages);
> +			if (pfn_count <= dm_device.num_pages_onlined) {
> +				dm_device.num_pages_onlined -= pfn_count;
> +			} else {
> +				/*
> +				 * We're offlining more pages than we
> +				 * managed to online. This is
> +				 * unexpected. In any case don't let
> +				 * num_pages_onlined wrap around zero.
> +				 */
> +				WARN_ON_ONCE(1);
> +				dm_device.num_pages_onlined = 0;
> +			}
>  		}
> -		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  		break;
>  	case MEM_GOING_ONLINE:
>  	case MEM_GOING_OFFLINE:
> @@ -721,24 +723,23 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  	unsigned long start_pfn;
>  	unsigned long processed_pfn;
>  	unsigned long total_pfn = pfn_count;
> -	unsigned long flags;
>  
>  	for (i = 0; i < (size/HA_CHUNK); i++) {
>  		start_pfn = start + (i * HA_CHUNK);
>  
> -		spin_lock_irqsave(&dm_device.ha_lock, flags);
> -		has->ha_end_pfn +=  HA_CHUNK;
> +		scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> +			has->ha_end_pfn +=  HA_CHUNK;
>  
> -		if (total_pfn > HA_CHUNK) {
> -			processed_pfn = HA_CHUNK;
> -			total_pfn -= HA_CHUNK;
> -		} else {
> -			processed_pfn = total_pfn;
> -			total_pfn = 0;
> -		}
> +			if (total_pfn > HA_CHUNK) {
> +				processed_pfn = HA_CHUNK;
> +				total_pfn -= HA_CHUNK;
> +			} else {
> +				processed_pfn = total_pfn;
> +				total_pfn = 0;
> +			}
>  
> -		has->covered_end_pfn +=  processed_pfn;
> -		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
> +			has->covered_end_pfn +=  processed_pfn;
> +		}
>  
>  		reinit_completion(&dm_device.ol_waitevent);
>  
> @@ -758,10 +759,10 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  				 */
>  				do_hot_add = false;
>  			}
> -			spin_lock_irqsave(&dm_device.ha_lock, flags);
> -			has->ha_end_pfn -= HA_CHUNK;
> -			has->covered_end_pfn -=  processed_pfn;
> -			spin_unlock_irqrestore(&dm_device.ha_lock, flags);
> +			scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> +				has->ha_end_pfn -= HA_CHUNK;
> +				has->covered_end_pfn -=  processed_pfn;
> +			}
>  			break;
>  		}
>  
> @@ -781,10 +782,9 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  static void hv_online_page(struct page *pg, unsigned int order)
>  {
>  	struct hv_hotadd_state *has;
> -	unsigned long flags;
>  	unsigned long pfn = page_to_pfn(pg);
>  
> -	spin_lock_irqsave(&dm_device.ha_lock, flags);
> +	guard(spinlock_irqsave)(&dm_device.ha_lock);
>  	list_for_each_entry(has, &dm_device.ha_region_list, list) {
>  		/* The page belongs to a different HAS. */
>  		if ((pfn < has->start_pfn) ||
> @@ -794,7 +794,6 @@ static void hv_online_page(struct page *pg, unsigned int order)
>  		hv_bring_pgs_online(has, pfn, 1UL << order);
>  		break;
>  	}
> -	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  }
>  
>  static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
> @@ -803,9 +802,8 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
>  	struct hv_hotadd_gap *gap;
>  	unsigned long residual, new_inc;
>  	int ret = 0;
> -	unsigned long flags;
>  
> -	spin_lock_irqsave(&dm_device.ha_lock, flags);
> +	guard(spinlock_irqsave)(&dm_device.ha_lock);
>  	list_for_each_entry(has, &dm_device.ha_region_list, list) {
>  		/*
>  		 * If the pfn range we are dealing with is not in the current
> @@ -852,7 +850,6 @@ static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
>  		ret = 1;
>  		break;
>  	}
> -	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  
>  	return ret;
>  }
> @@ -947,7 +944,6 @@ static unsigned long process_hot_add(unsigned long pg_start,
>  {
>  	struct hv_hotadd_state *ha_region = NULL;
>  	int covered;
> -	unsigned long flags;
>  
>  	if (pfn_cnt == 0)
>  		return 0;
> @@ -979,9 +975,9 @@ static unsigned long process_hot_add(unsigned long pg_start,
>  		ha_region->covered_end_pfn = pg_start;
>  		ha_region->end_pfn = rg_start + rg_size;
>  
> -		spin_lock_irqsave(&dm_device.ha_lock, flags);
> -		list_add_tail(&ha_region->list, &dm_device.ha_region_list);
> -		spin_unlock_irqrestore(&dm_device.ha_lock, flags);
> +		scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
> +			list_add_tail(&ha_region->list, &dm_device.ha_region_list);
> +		}
>  	}
>  
>  do_pg_range:
> @@ -2047,7 +2043,6 @@ static void balloon_remove(struct hv_device *dev)
>  	struct hv_dynmem_device *dm = hv_get_drvdata(dev);
>  	struct hv_hotadd_state *has, *tmp;
>  	struct hv_hotadd_gap *gap, *tmp_gap;
> -	unsigned long flags;
>  
>  	if (dm->num_pages_ballooned != 0)
>  		pr_warn("Ballooned pages: %d\n", dm->num_pages_ballooned);
> @@ -2073,7 +2068,7 @@ static void balloon_remove(struct hv_device *dev)
>  #endif
>  	}
>  
> -	spin_lock_irqsave(&dm_device.ha_lock, flags);
> +	guard(spinlock_irqsave)(&dm_device.ha_lock);
>  	list_for_each_entry_safe(has, tmp, &dm->ha_region_list, list) {
>  		list_for_each_entry_safe(gap, tmp_gap, &has->gap_list, list) {
>  			list_del(&gap->list);
> @@ -2082,7 +2077,6 @@ static void balloon_remove(struct hv_device *dev)
>  		list_del(&has->list);
>  		kfree(has);
>  	}
> -	spin_unlock_irqrestore(&dm_device.ha_lock, flags);
>  }
>  
>  static int balloon_suspend(struct hv_device *hv_dev)
> 
> ---
> base-commit: 3f01e9fed8454dcd89727016c3e5b2fbb8f8e50c
> change-id: 20230725-master-bbcd9205758b
> 
> Best regards,
> -- 
> Mitchell Levy <levymitchell0@gmail.com>
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox