From: Xin Li <xin3.li@intel.com>
To: linux-kernel@vger.kernel.org, x86@kernel.org, kvm@vger.kernel.org
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, peterz@infradead.org,
	andrew.cooper3@citrix.com, seanjc@google.com,
	pbonzini@redhat.com, ravi.v.shankar@intel.com
Subject: [RFC PATCH 02/32] x86/traps: add a system interrupt table for system interrupt dispatch
Date: Mon, 19 Dec 2022 22:36:28 -0800
Message-ID: <20221220063658.19271-3-xin3.li@intel.com>
In-Reply-To: <20221220063658.19271-1-xin3.li@intel.com>

From: "H. Peter Anvin (Intel)" <hpa@zytor.com>

Upon receiving an external interrupt, KVM VMX reinjects it by calling
the interrupt handler in its IDT descriptor on the current kernel
stack, which essentially uses the IDT as an interrupt dispatch table.

However, the IDT is one of the lowest-level critical data structures
between an x86 CPU and the Linux kernel, so we should avoid using it
*directly* whenever possible, especially in a software-defined manner.

On x86, external interrupts are divided into the following groups:
  1) system interrupts
  2) external device interrupts
With the IDT, system interrupts are dispatched through the IDT
directly, while external device interrupts are all routed to the
external interrupt dispatch function common_interrupt(), which
dispatches external device interrupts through a per-CPU external
interrupt dispatch table vector_irq.

To eliminate dispatching external interrupts through the IDT, add
a system interrupt handler table for dispatching a system interrupt
to its corresponding handler directly. Thus a software based dispatch
function will be:

  void external_interrupt(struct pt_regs *regs, u8 vector)
  {
    if (is_system_interrupt(vector))
      system_interrupt_handlers[vector_to_sysvec(vector)](regs);
    else /* external device interrupt */
      common_interrupt(regs, vector);
  }

What's more, with the Intel FRED (Flexible Return and Event Delivery)
architecture, the IDT, the hardware based event dispatch table, is
gone, and the Linux kernel needs to dispatch events to their handlers
using vector to handler mappings, so the dispatch function
external_interrupt() is needed there as well.
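
For readers who want to try the idea outside the kernel, the following
is a minimal user-space sketch of table-based dispatch, not the code in
this series. Every demo_* name and every DEMO_* constant is
hypothetical and chosen only for illustration. The handling of an
unpopulated table slot (counting it as unhandled) is also an
assumption; this patch does not define that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector layout, for illustration only. */
#define DEMO_NR_VECTORS          256
#define DEMO_FIRST_SYSTEM_VECTOR 0xec
#define DEMO_NR_SYSTEM_VECTORS   (DEMO_NR_VECTORS - DEMO_FIRST_SYSTEM_VECTOR)
#define DEMO_LOCAL_TIMER_VECTOR  0xec
#define DEMO_RESCHEDULE_VECTOR   0xfd

/* Stand-in for struct pt_regs. */
struct demo_regs { unsigned long ip; };

typedef void (*demo_handler_t)(struct demo_regs *regs);

/* Counters that record which path handled each interrupt. */
static int demo_timer_calls, demo_resched_calls;
static int demo_device_calls, demo_unhandled_calls;
static uint8_t demo_last_device_vector;

static void demo_sysvec_timer(struct demo_regs *regs)
{
	(void)regs;
	demo_timer_calls++;
}

static void demo_sysvec_reschedule(struct demo_regs *regs)
{
	(void)regs;
	demo_resched_calls++;
}

/* Stand-in for common_interrupt(): handles all device vectors. */
static void demo_common_interrupt(struct demo_regs *regs, uint8_t vector)
{
	(void)regs;
	demo_device_calls++;
	demo_last_device_vector = vector;
}

/* Index the table by (vector - first system vector), as SYSV() does. */
#define DEMO_SYSV(x, y) [(x) - DEMO_FIRST_SYSTEM_VECTOR] = (y)

static const demo_handler_t demo_handlers[DEMO_NR_SYSTEM_VECTORS] = {
	DEMO_SYSV(DEMO_LOCAL_TIMER_VECTOR, demo_sysvec_timer),
	DEMO_SYSV(DEMO_RESCHEDULE_VECTOR,  demo_sysvec_reschedule),
};

#undef DEMO_SYSV

/* Software dispatch: system vectors use the table, the rest go common. */
static void demo_external_interrupt(struct demo_regs *regs, uint8_t vector)
{
	if (vector >= DEMO_FIRST_SYSTEM_VECTOR) {
		size_t idx = (size_t)(vector - DEMO_FIRST_SYSTEM_VECTOR);
		demo_handler_t handler;

		assert(idx < DEMO_NR_SYSTEM_VECTORS);
		handler = demo_handlers[idx];
		if (handler)
			handler(regs);
		else
			demo_unhandled_calls++;	/* assumption: just count it */
		return;
	}

	demo_common_interrupt(regs, vector);
}
```

Slots that are not named in the initializer are NULL, so the sketch
checks the pointer before calling through it.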

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Co-developed-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
---
 arch/x86/include/asm/idtentry.h | 56 +++++++++++++++++++++++++++------
 arch/x86/include/asm/traps.h    |  7 +++++
 arch/x86/kernel/traps.c         | 40 +++++++++++++++++++++++
 3 files changed, 93 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 72184b0b2219..966d720046f1 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -167,18 +167,24 @@ __visible noinstr void func(struct pt_regs *regs, unsigned long error_code)
 
 /**
  * DECLARE_IDTENTRY_IRQ - Declare functions for device interrupt IDT entry
- *			  points (common/spurious)
+ *			  points (common/spurious) and their corresponding
+ *			  software based dispatch handlers in non-noinstr
+ *			  text section
  * @vector:	Vector number (ignored for C)
  * @func:	Function name of the entry point
  *
  * Maps to DECLARE_IDTENTRY_ERRORCODE()
  */
 #define DECLARE_IDTENTRY_IRQ(vector, func)				\
-	DECLARE_IDTENTRY_ERRORCODE(vector, func)
+	DECLARE_IDTENTRY_ERRORCODE(vector, func);			\
+	void dispatch_##func(struct pt_regs *regs, unsigned long error_code)
 
 /**
  * DEFINE_IDTENTRY_IRQ - Emit code for device interrupt IDT entry points
- * @func:	Function name of the entry point
+ *			 and their corresponding software based dispatch
+ *			 handlers in non-noinstr text section.
+ * @func:		Function name of the IDT entry point
+ * @dispatch_func:	Function name of the software based dispatch handler
  *
  * The vector number is pushed by the low level entry stub and handed
  * to the function as error_code argument which needs to be truncated
@@ -204,10 +210,20 @@ __visible noinstr void func(struct pt_regs *regs,			\
 	irqentry_exit(regs, state);					\
 }									\
 									\
+void dispatch_##func(struct pt_regs *regs, unsigned long error_code)	\
+{									\
+	u32 vector = (u32)(u8)error_code;				\
+									\
+	kvm_set_cpu_l1tf_flush_l1d();					\
+	run_irq_on_irqstack_cond(__##func, regs, vector);		\
+}									\
+									\
 static noinline void __##func(struct pt_regs *regs, u32 vector)
 
 /**
  * DECLARE_IDTENTRY_SYSVEC - Declare functions for system vector entry points
+ *			     and their corresponding software based dispatch
+ *			     handlers in non-noinstr text section
  * @vector:	Vector number (ignored for C)
  * @func:	Function name of the entry point
  *
@@ -215,15 +231,20 @@ static noinline void __##func(struct pt_regs *regs, u32 vector)
  * - The ASM entry point: asm_##func
  * - The XEN PV trap entry point: xen_##func (maybe unused)
  * - The C handler called from the ASM entry point
+ * - The C handler used in the system interrupt handler table
  *
  * Maps to DECLARE_IDTENTRY().
  */
 #define DECLARE_IDTENTRY_SYSVEC(vector, func)				\
-	DECLARE_IDTENTRY(vector, func)
+	DECLARE_IDTENTRY(vector, func);					\
+	void dispatch_table_##func(struct pt_regs *regs)
 
 /**
  * DEFINE_IDTENTRY_SYSVEC - Emit code for system vector IDT entry points
- * @func:	Function name of the entry point
+ *			    and their corresponding software based dispatch
+ *			    handlers in non-noinstr text section
+ * @func:		Function name of the IDT entry point
+ * @dispatch_table_func:Function name of the software based dispatch handler
  *
  * irqentry_enter/exit() and irq_enter/exit_rcu() are invoked before the
  * function body. KVM L1D flush request is set.
@@ -244,12 +265,21 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	irqentry_exit(regs, state);					\
 }									\
 									\
+void dispatch_table_##func(struct pt_regs *regs)			\
+{									\
+	kvm_set_cpu_l1tf_flush_l1d();					\
+	run_sysvec_on_irqstack_cond(__##func, regs);			\
+}									\
+									\
 static noinline void __##func(struct pt_regs *regs)
 
 /**
  * DEFINE_IDTENTRY_SYSVEC_SIMPLE - Emit code for simple system vector IDT
- *				   entry points
- * @func:	Function name of the entry point
+ *				   entry points and their corresponding
+ *				   software based dispatch handlers in
+ *				   non-noinstr text section
+ * @func:		Function name of the IDT entry point
+ * @dispatch_table_func:Function name of the software based dispatch handler
  *
  * Runs the function on the interrupted stack. No switch to IRQ stack and
  * only the minimal __irq_enter/exit() handling.
@@ -273,6 +303,14 @@ __visible noinstr void func(struct pt_regs *regs)			\
 	irqentry_exit(regs, state);					\
 }									\
 									\
+void dispatch_table_##func(struct pt_regs *regs)			\
+{									\
+	__irq_enter_raw();						\
+	kvm_set_cpu_l1tf_flush_l1d();					\
+	__##func (regs);						\
+	__irq_exit_raw();						\
+}									\
+									\
 static __always_inline void __##func(struct pt_regs *regs)
 
 /**
@@ -638,9 +676,7 @@ DECLARE_IDTENTRY(X86_TRAP_VE,		exc_virtualization_exception);
 
 /* Device interrupts common/spurious */
 DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	common_interrupt);
-#ifdef CONFIG_X86_LOCAL_APIC
 DECLARE_IDTENTRY_IRQ(X86_TRAP_OTHER,	spurious_interrupt);
-#endif
 
 /* System vector entry points */
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -651,7 +687,7 @@ DECLARE_IDTENTRY_SYSVEC(X86_PLATFORM_IPI_VECTOR,	sysvec_x86_platform_ipi);
 #endif
 
 #ifdef CONFIG_SMP
-DECLARE_IDTENTRY(RESCHEDULE_VECTOR,			sysvec_reschedule_ipi);
+DECLARE_IDTENTRY_SYSVEC(RESCHEDULE_VECTOR,		sysvec_reschedule_ipi);
 DECLARE_IDTENTRY_SYSVEC(IRQ_MOVE_CLEANUP_VECTOR,	sysvec_irq_move_cleanup);
 DECLARE_IDTENTRY_SYSVEC(REBOOT_VECTOR,			sysvec_reboot);
 DECLARE_IDTENTRY_SYSVEC(CALL_FUNCTION_SINGLE_VECTOR,	sysvec_call_function_single);
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 47ecfff2c83d..28c8ba5fd81c 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -47,4 +47,11 @@ void __noreturn handle_stack_overflow(struct pt_regs *regs,
 				      struct stack_info *info);
 #endif
 
+/*
+ * How system interrupt handlers are called.
+ */
+#define DECLARE_SYSTEM_INTERRUPT_HANDLER(f)	\
+	void f (struct pt_regs *regs)
+typedef DECLARE_SYSTEM_INTERRUPT_HANDLER((*system_interrupt_handler));
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index d3fdec706f1d..8f751c06c052 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1451,6 +1451,46 @@ DEFINE_IDTENTRY_SW(iret_error)
 }
 #endif
 
+#define SYSV(x,y) [(x) - FIRST_SYSTEM_VECTOR] = y
+
+static system_interrupt_handler system_interrupt_handlers[NR_SYSTEM_VECTORS] = {
+#ifdef CONFIG_SMP
+	SYSV(RESCHEDULE_VECTOR,			dispatch_table_sysvec_reschedule_ipi),
+	SYSV(CALL_FUNCTION_VECTOR,		dispatch_table_sysvec_call_function),
+	SYSV(CALL_FUNCTION_SINGLE_VECTOR,	dispatch_table_sysvec_call_function_single),
+	SYSV(REBOOT_VECTOR,			dispatch_table_sysvec_reboot),
+#endif
+
+#ifdef CONFIG_X86_THERMAL_VECTOR
+	SYSV(THERMAL_APIC_VECTOR,		dispatch_table_sysvec_thermal),
+#endif
+
+#ifdef CONFIG_X86_MCE_THRESHOLD
+	SYSV(THRESHOLD_APIC_VECTOR,		dispatch_table_sysvec_threshold),
+#endif
+
+#ifdef CONFIG_X86_MCE_AMD
+	SYSV(DEFERRED_ERROR_VECTOR,		dispatch_table_sysvec_deferred_error),
+#endif
+
+#ifdef CONFIG_X86_LOCAL_APIC
+	SYSV(LOCAL_TIMER_VECTOR,		dispatch_table_sysvec_apic_timer_interrupt),
+	SYSV(X86_PLATFORM_IPI_VECTOR,		dispatch_table_sysvec_x86_platform_ipi),
+# ifdef CONFIG_HAVE_KVM
+	SYSV(POSTED_INTR_VECTOR,		dispatch_table_sysvec_kvm_posted_intr_ipi),
+	SYSV(POSTED_INTR_WAKEUP_VECTOR,		dispatch_table_sysvec_kvm_posted_intr_wakeup_ipi),
+	SYSV(POSTED_INTR_NESTED_VECTOR,		dispatch_table_sysvec_kvm_posted_intr_nested_ipi),
+# endif
+# ifdef CONFIG_IRQ_WORK
+	SYSV(IRQ_WORK_VECTOR,			dispatch_table_sysvec_irq_work),
+# endif
+	SYSV(SPURIOUS_APIC_VECTOR,		dispatch_table_sysvec_spurious_apic_interrupt),
+	SYSV(ERROR_APIC_VECTOR,			dispatch_table_sysvec_error_interrupt),
+#endif
+};
+
+#undef SYSV
+
 void __init trap_init(void)
 {
 	/* Init cpu_entry_area before IST entries are set up */
-- 
2.34.1

