The Linux Kernel Mailing List
* [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts
@ 2025-11-25 21:50 Thomas Gleixner
  2025-11-25 21:50 ` [patch V2 1/3] x86/msi: Make irq_retrigger() functional for posted MSI Thomas Gleixner
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Thomas Gleixner @ 2025-11-25 21:50 UTC (permalink / raw)
  To: LKML; +Cc: x86, Luigi Rizzo

A small update to V1 which can be found here:

  https://lore.kernel.org/lkml/20251125101912.564125647@linutronix.de

Luigi reported that the retrigger mechanism for posted MSI interrupts is
broken. That happens because retrigger sends an IPI to the actual allocated
vector, which is handled correctly, but lacks an EOI. That leaves a stale
APIC ISR bit around.

The following series addresses this and does some related cleanups in that
area on top.

Changes vs. V1:

   Use __this_cpu_*() - Luigi

Thanks,

	tglx
---
 arch/x86/include/asm/irq_remapping.h |   12 ++++++-
 arch/x86/kernel/irq.c                |   54 +++++++++++++++++++++++------------
 drivers/iommu/intel/irq_remapping.c  |   12 +++----
 3 files changed, 52 insertions(+), 26 deletions(-)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [patch V2 1/3] x86/msi: Make irq_retrigger() functional for posted MSI
  2025-11-25 21:50 [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Thomas Gleixner
@ 2025-11-25 21:50 ` Thomas Gleixner
  2025-12-17 17:48   ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
  2025-11-25 21:50 ` [patch V2 2/3] x86/irq: Cleanup posted MSI code Thomas Gleixner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2025-11-25 21:50 UTC (permalink / raw)
  To: LKML; +Cc: x86, Luigi Rizzo, stable

From: Thomas Gleixner <tglx@linutronix.de>

Luigi reported that retriggering a posted MSI interrupt does not work
correctly.

The reason is that the retrigger happens at the vector domain by sending an
IPI to the actual vector on the target CPU. That works correctly exactly
once because the posted MSI interrupt chip does not issue an EOI as that's
only required for the posted MSI notification vector itself.

As a consequence the vector becomes stale in the ISR, which not only
affects this vector but also any lower priority vector in the affected
APIC because the ISR bit is not cleared.

Luigi proposed to set the vector in the remap PIR bitmap and raise the
posted MSI notification vector. That works, but that still does not cure a
related problem:

  If there is ever a stray interrupt on such a vector, then the related
  APIC ISR bit becomes stale due to the lack of EOI as described above.
  Unlikely to happen, but if it happens it's not debuggable at all.

So instead of playing games with the PIR, this can actually be solved
for both cases by:

 1) Keeping track of the posted interrupt vector handler state

 2) Implementing a posted MSI specific irq_ack() callback which checks that
    state. If the posted vector handler is inactive it issues an EOI,
    otherwise it delegates that to the posted handler.

This is correct versus affinity changes and concurrent events on the posted
vector as the actual handler invocation is serialized through the interrupt
descriptor lock.
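
The two-step scheme above can be illustrated with a small user-space
model. This is only a sketch under simplifying assumptions: the APIC
in-service register is reduced to a counter of vectors in service, and
every name in it (cpu_model, model_ack(), model_posted_notification(),
model_direct_vector()) is hypothetical and not a kernel API. It shows
that the ack callback issues an EOI only when it is not nested inside the
posted MSI notification handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU; none of these names exist in the kernel. */
struct cpu_model {
	bool posted_handler_active;	/* stands in for posted_msi_handler_active */
	unsigned int isr_bits;		/* number of vectors currently in service */
	unsigned int eoi_count;		/* number of EOIs issued so far */
};

/* An EOI retires one in-service vector. */
static void model_eoi(struct cpu_model *cpu)
{
	assert(cpu->isr_bits > 0);
	cpu->isr_bits--;
	cpu->eoi_count++;
}

/* Ack callback: EOI only if the posted handler is not running. */
static void model_ack(struct cpu_model *cpu)
{
	if (!cpu->posted_handler_active)
		model_eoi(cpu);
}

/* Notification vector: demultiplex N device interrupts, then EOI once. */
static void model_posted_notification(struct cpu_model *cpu, int nr_device_irqs)
{
	cpu->isr_bits++;			/* notification vector in service */
	cpu->posted_handler_active = true;
	for (int i = 0; i < nr_device_irqs; i++)
		model_ack(cpu);			/* must not EOI here */
	model_eoi(cpu);				/* single EOI for the notification */
	cpu->posted_handler_active = false;
}

/* Retrigger IPI or stray interrupt arriving directly on the device vector. */
static void model_direct_vector(struct cpu_model *cpu)
{
	cpu->isr_bits++;			/* device vector in service */
	model_ack(cpu);				/* handler inactive, so this EOIs */
}
```

In this model a notification carrying three device interrupts ends with
exactly one EOI, while a direct delivery on the device vector is
acknowledged by the ack callback itself, so no in-service entry is left
behind in either case.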

Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs")
Reported-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Luigi Rizzo <lrizzo@google.com>
Cc: stable@vger.kernel.org
Closes: https://lore.kernel.org/lkml/20251124104836.3685533-1-lrizzo@google.com
---
V2: Use __this_cpu...() - Luigi
---
 arch/x86/include/asm/irq_remapping.h |    7 +++++++
 arch/x86/kernel/irq.c                |   23 +++++++++++++++++++++++
 drivers/iommu/intel/irq_remapping.c  |    8 ++++----
 3 files changed, 34 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -87,4 +87,11 @@ static inline void panic_if_irq_remap(co
 }
 
 #endif /* CONFIG_IRQ_REMAP */
+
+#ifdef CONFIG_X86_POSTED_MSI
+void intel_ack_posted_msi_irq(struct irq_data *irqd);
+#else
+#define intel_ack_posted_msi_irq	NULL
+#endif
+
 #endif /* __X86_IRQ_REMAPPING_H */
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -396,6 +396,7 @@ DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm
 
 /* Posted Interrupt Descriptors for coalesced MSIs to be posted */
 DEFINE_PER_CPU_ALIGNED(struct pi_desc, posted_msi_pi_desc);
+static DEFINE_PER_CPU_CACHE_HOT(bool, posted_msi_handler_active);
 
 void intel_posted_msi_init(void)
 {
@@ -413,6 +414,25 @@ void intel_posted_msi_init(void)
 	this_cpu_write(posted_msi_pi_desc.ndst, destination);
 }
 
+void intel_ack_posted_msi_irq(struct irq_data *irqd)
+{
+	irq_move_irq(irqd);
+
+	/*
+	 * Handle the rare case that irq_retrigger() raised the actual
+	 * assigned vector on the target CPU, which means that it was not
+	 * invoked via the posted MSI handler below. In that case APIC EOI
+	 * is required as otherwise the ISR entry becomes stale and lower
+	 * priority interrupts are never going to be delivered after that.
+	 *
+	 * If the posted handler invoked the device interrupt handler then
+	 * the EOI would be premature because it would acknowledge the
+	 * posted vector.
+	 */
+	if (unlikely(!__this_cpu_read(posted_msi_handler_active)))
+		apic_eoi();
+}
+
 static __always_inline bool handle_pending_pir(unsigned long *pir, struct pt_regs *regs)
 {
 	unsigned long pir_copy[NR_PIR_WORDS];
@@ -445,6 +465,8 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi
 
 	pid = this_cpu_ptr(&posted_msi_pi_desc);
 
+	/* Mark the handler active for intel_ack_posted_msi_irq() */
+	__this_cpu_write(posted_msi_handler_active, true);
 	inc_irq_stat(posted_msi_notification_count);
 	irq_enter();
 
@@ -473,6 +495,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi
 
 	apic_eoi();
 	irq_exit();
+	__this_cpu_write(posted_msi_handler_active, false);
 	set_irq_regs(old_regs);
 }
 #endif /* X86_POSTED_MSI */
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1303,17 +1303,17 @@ static struct irq_chip intel_ir_chip = {
  *	irq_enter();
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				irq_move_irq(); // No EOI
+ *				intel_ack_posted_msi_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				irq_move_irq(); // No EOI
+ *				intel_ack_posted_msi_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				irq_move_irq(); // No EOI
+ *				intel_ack_posted_msi_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *	apic_eoi()
@@ -1322,7 +1322,7 @@ static struct irq_chip intel_ir_chip = {
  */
 static struct irq_chip intel_ir_chip_post_msi = {
 	.name			= "INTEL-IR-POST",
-	.irq_ack		= irq_move_irq,
+	.irq_ack		= intel_ack_posted_msi_irq,
 	.irq_set_affinity	= intel_ir_set_affinity,
 	.irq_compose_msi_msg	= intel_ir_compose_msi_msg,
 	.irq_set_vcpu_affinity	= intel_ir_set_vcpu_affinity,



* [patch V2 2/3] x86/irq: Cleanup posted MSI code
  2025-11-25 21:50 [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Thomas Gleixner
  2025-11-25 21:50 ` [patch V2 1/3] x86/msi: Make irq_retrigger() functional for posted MSI Thomas Gleixner
@ 2025-11-25 21:50 ` Thomas Gleixner
  2025-12-17 17:48   ` [tip: x86/irq] " tip-bot2 for Thomas Gleixner
  2025-12-18 22:03   ` tip-bot2 for Thomas Gleixner
  2025-11-25 21:50 ` [patch V2 3/3] x86/irq_remapping: Sanitize posted_msi_supported() Thomas Gleixner
  2025-12-17 17:03 ` [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Luigi Rizzo
  3 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2025-11-25 21:50 UTC (permalink / raw)
  To: LKML; +Cc: x86, Luigi Rizzo

Make code and comments readable and use the __this_cpu_*() accessors, as
this code is guaranteed to be invoked with interrupts disabled.
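
The loop conversion in this patch is meant to be behavior-neutral. The
following user-space sketch compares the old while (++i < MAX) form with
the new for loop. MODEL_MAX_LOOP, fake_handle_pending() and the two
count_*() helpers are hypothetical stand-ins for illustration, with
fake_handle_pending() playing the role of handle_pending_pir() by
returning false once nothing is pending.

```c
#include <stdbool.h>

#define MODEL_MAX_LOOP 3	/* mirrors MAX_POSTED_MSI_COALESCING_LOOP */

/* Stand-in for handle_pending_pir(): consumes one pending round if any. */
static bool fake_handle_pending(int *pending)
{
	if (*pending == 0)
		return false;
	(*pending)--;
	return true;
}

/* Old form: returns how often the handler stand-in was called. */
static int count_while(int pending)
{
	int i = 0, calls = 0;

	while (++i < MODEL_MAX_LOOP) {
		calls++;
		if (!fake_handle_pending(&pending))
			break;
	}
	return calls;
}

/* New form: same bound, loop variable scoped to the loop. */
static int count_for(int pending)
{
	int calls = 0;

	for (int i = 1; i < MODEL_MAX_LOOP; i++) {
		calls++;
		if (!fake_handle_pending(&pending))
			break;
	}
	return calls;
}
```

Both helpers call the stand-in at most MODEL_MAX_LOOP - 1 times, which
leaves room for the final invocation after the ON bit is cleared.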

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Use __this_cpu...() - Luigi
---
 arch/x86/kernel/irq.c |   31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -400,11 +400,9 @@ static DEFINE_PER_CPU_CACHE_HOT(bool, po
 
 void intel_posted_msi_init(void)
 {
-	u32 destination;
-	u32 apic_id;
+	u32 destination, apic_id;
 
 	this_cpu_write(posted_msi_pi_desc.nv, POSTED_MSI_NOTIFICATION_VECTOR);
-
 	/*
 	 * APIC destination ID is stored in bit 8:15 while in XAPIC mode.
 	 * VT-d spec. CH 9.11
@@ -448,8 +446,8 @@ static __always_inline bool handle_pendi
 }
 
 /*
- * Performance data shows that 3 is good enough to harvest 90+% of the benefit
- * on high IRQ rate workload.
+ * Performance data shows that 3 is good enough to harvest 90+% of the
+ * benefit on high interrupt rate workloads.
  */
 #define MAX_POSTED_MSI_COALESCING_LOOP 3
 
@@ -459,11 +457,8 @@ static __always_inline bool handle_pendi
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
 {
+	struct pi_desc *pid = __this_cpu_ptr(&posted_msi_pi_desc);
 	struct pt_regs *old_regs = set_irq_regs(regs);
-	struct pi_desc *pid;
-	int i = 0;
-
-	pid = this_cpu_ptr(&posted_msi_pi_desc);
 
 	/* Mark the handler active for intel_ack_posted_msi_irq() */
 	__this_cpu_write(posted_msi_handler_active, true);
@@ -471,25 +466,25 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi
 	irq_enter();
 
 	/*
-	 * Max coalescing count includes the extra round of handle_pending_pir
-	 * after clearing the outstanding notification bit. Hence, at most
-	 * MAX_POSTED_MSI_COALESCING_LOOP - 1 loops are executed here.
+	 * Loop only MAX_POSTED_MSI_COALESCING_LOOP - 1 times here to take
+	 * the final handle_pending_pir() invocation after clearing the
+	 * outstanding notification bit into account.
 	 */
-	while (++i < MAX_POSTED_MSI_COALESCING_LOOP) {
+	for (int i = 1; i < MAX_POSTED_MSI_COALESCING_LOOP; i++) {
 		if (!handle_pending_pir(pid->pir, regs))
 			break;
 	}
 
 	/*
-	 * Clear outstanding notification bit to allow new IRQ notifications,
-	 * do this last to maximize the window of interrupt coalescing.
+	 * Clear the outstanding notification bit to rearm the notification
+	 * mechanism.
 	 */
 	pi_clear_on(pid);
 
 	/*
-	 * There could be a race of PI notification and the clearing of ON bit,
-	 * process PIR bits one last time such that handling the new interrupts
-	 * are not delayed until the next IRQ.
+	 * Clearing the ON bit can race with a notification. Process the
+	 * PIR bits one last time so that handling the new interrupts is
+	 * not delayed until the next notification happens.
 	 */
 	handle_pending_pir(pid->pir, regs);
 



* [patch V2 3/3] x86/irq_remapping: Sanitize posted_msi_supported()
  2025-11-25 21:50 [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Thomas Gleixner
  2025-11-25 21:50 ` [patch V2 1/3] x86/msi: Make irq_retrigger() functional for posted MSI Thomas Gleixner
  2025-11-25 21:50 ` [patch V2 2/3] x86/irq: Cleanup posted MSI code Thomas Gleixner
@ 2025-11-25 21:50 ` Thomas Gleixner
  2025-12-17 17:48   ` [tip: x86/irq] " tip-bot2 for Thomas Gleixner
  2025-12-18 22:03   ` tip-bot2 for Thomas Gleixner
  2025-12-17 17:03 ` [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Luigi Rizzo
  3 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2025-11-25 21:50 UTC (permalink / raw)
  To: LKML; +Cc: x86, Luigi Rizzo

posted_msi_supported() is a misnomer as it actually checks whether posted
MSI is enabled or not. Aside from that, it does not take
CONFIG_X86_POSTED_MSI into account, which is required to actually use it.

Rename it to posted_msi_enabled() and make the return value depend on
CONFIG_X86_POSTED_MSI, which allows the compiler to eliminate the related
dead code and data if disabled:

  text	   data	    bss	    dec	    hex	filename
  10046	    701	   3296	  14043	   36db	drivers/iommu/intel/irq_remapping.o
   9904	    413	   3296	  13613	   352d	drivers/iommu/intel/irq_remapping.o
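
Why a compile-time constant at the front of the condition lets the
compiler drop the dependent code can be sketched in plain C. This is a
simplified model, not the kernel's IS_ENABLED() implementation:
MODEL_CONFIG_POSTED_MSI, model_enable_posted_msi,
model_irq_remapping_cap() and model_pick_chip() are hypothetical names
chosen for illustration.

```c
#include <stdbool.h>

/* Stand-in for IS_ENABLED(CONFIG_X86_POSTED_MSI); 0 models "disabled". */
#ifndef MODEL_CONFIG_POSTED_MSI
#define MODEL_CONFIG_POSTED_MSI 0
#endif

static bool model_enable_posted_msi = true;	/* runtime switch */
static int model_cap_calls;			/* counts capability queries */

/* Stand-in for irq_remapping_cap(IRQ_POSTING_CAP). */
static bool model_irq_remapping_cap(void)
{
	model_cap_calls++;
	return true;
}

/*
 * With the constant being 0 the whole expression is constant false, so
 * the runtime checks are never evaluated and can be optimized away.
 */
static bool model_posted_msi_enabled(void)
{
	return MODEL_CONFIG_POSTED_MSI &&
		model_enable_posted_msi && model_irq_remapping_cap();
}

/* Returns 1 for the posted MSI chip, 0 for the regular one. */
static int model_pick_chip(void)
{
	if (model_posted_msi_enabled())
		return 1;	/* unreachable when the constant is 0 */
	return 0;
}
```

Because the branch in model_pick_chip() is provably dead when the
constant is 0, an optimizing compiler is free to discard it together with
any data that only that branch references, which is the effect the size
comparison above reflects.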

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/irq_remapping.h |    5 +++--
 drivers/iommu/intel/irq_remapping.c  |    4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -67,9 +67,10 @@ static inline struct irq_domain *arch_ge
 
 extern bool enable_posted_msi;
 
-static inline bool posted_msi_supported(void)
+static inline bool posted_msi_enabled(void)
 {
-	return enable_posted_msi && irq_remapping_cap(IRQ_POSTING_CAP);
+	return IS_ENABLED(CONFIG_X86_POSTED_MSI) &&
+		enable_posted_msi && irq_remapping_cap(IRQ_POSTING_CAP);
 }
 
 #else  /* CONFIG_IRQ_REMAP */
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1368,7 +1368,7 @@ static void intel_irq_remapping_prepare_
 		break;
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		if (posted_msi_supported()) {
+		if (posted_msi_enabled()) {
 			prepare_irte_posted(irte);
 			data->irq_2_iommu.posted_msi = 1;
 		}
@@ -1460,7 +1460,7 @@ static int intel_irq_remapping_alloc(str
 
 		irq_data->hwirq = (index << 16) + i;
 		irq_data->chip_data = ird;
-		if (posted_msi_supported() &&
+		if (posted_msi_enabled() &&
 		    ((info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI) ||
 		     (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX)))
 			irq_data->chip = &intel_ir_chip_post_msi;



* Re: [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts
  2025-11-25 21:50 [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Thomas Gleixner
                   ` (2 preceding siblings ...)
  2025-11-25 21:50 ` [patch V2 3/3] x86/irq_remapping: Sanitize posted_msi_supported() Thomas Gleixner
@ 2025-12-17 17:03 ` Luigi Rizzo
  2025-12-17 17:37   ` Thomas Gleixner
  3 siblings, 1 reply; 11+ messages in thread
From: Luigi Rizzo @ 2025-12-17 17:03 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, x86

On Tue, Nov 25, 2025 at 10:50 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> A small update to V1 which can be found here:
>
>   https://lore.kernel.org/lkml/20251125101912.564125647@linutronix.de
>
> Luigi reported that the retrigger mechanism for posted MSI interrupts is
> broken. That happens because retrigger sends an IPI to the actual allocated
> vector, which is handled correctly, but lacks an EOI. That leaves a stale
> APIC ISR bit around.
>
> The following series addresses this and does some related cleanups in that
> area on top.

What is happening with this series? Is there any blocker?

cheers
luigi


* Re: [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts
  2025-12-17 17:03 ` [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Luigi Rizzo
@ 2025-12-17 17:37   ` Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2025-12-17 17:37 UTC (permalink / raw)
  To: Luigi Rizzo; +Cc: LKML, x86

On Wed, Dec 17 2025 at 18:03, Luigi Rizzo wrote:
> On Tue, Nov 25, 2025 at 10:50 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> A small update to V1 which can be found here:
>>
>>   https://lore.kernel.org/lkml/20251125101912.564125647@linutronix.de
>>
>> Luigi reported that the retrigger mechanism for posted MSI interrupts is
>> broken. That happens because retrigger sends an IPI to the actual allocated
>> vector, which is handled correctly, but lacks an EOI. That leaves a stale
>> APIC ISR bit around.
>>
>> The following series addresses this and does some related cleanups in that
>> area on top.
>
> What is happening with this series? Is there any blocker?

I don't think so. It's probably because people were traveling to Japan
or distracted otherwise. I'll take care of it now.

Thanks,

        tglx




* [tip: x86/urgent] x86/msi: Make irq_retrigger() functional for posted MSI
  2025-11-25 21:50 ` [patch V2 1/3] x86/msi: Make irq_retrigger() functional for posted MSI Thomas Gleixner
@ 2025-12-17 17:48   ` tip-bot2 for Thomas Gleixner
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-12-17 17:48 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Luigi Rizzo, Thomas Gleixner, stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     0edc78b82bea85e1b2165d8e870a5c3535919695
Gitweb:        https://git.kernel.org/tip/0edc78b82bea85e1b2165d8e870a5c3535919695
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 25 Nov 2025 22:50:45 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 17 Dec 2025 18:41:52 +01:00

x86/msi: Make irq_retrigger() functional for posted MSI

Luigi reported that retriggering a posted MSI interrupt does not work
correctly.

The reason is that the retrigger happens at the vector domain by sending an
IPI to the actual vector on the target CPU. That works correctly exactly
once because the posted MSI interrupt chip does not issue an EOI as that's
only required for the posted MSI notification vector itself.

As a consequence the vector becomes stale in the ISR, which not only
affects this vector but also any lower priority vector in the affected
APIC because the ISR bit is not cleared.

Luigi proposed to set the vector in the remap PIR bitmap and raise the
posted MSI notification vector. That works, but that still does not cure a
related problem:

  If there is ever a stray interrupt on such a vector, then the related
  APIC ISR bit becomes stale due to the lack of EOI as described above.
  Unlikely to happen, but if it happens it's not debuggable at all.

So instead of playing games with the PIR, this can actually be solved
for both cases by:

 1) Keeping track of the posted interrupt vector handler state

 2) Implementing a posted MSI specific irq_ack() callback which checks that
    state. If the posted vector handler is inactive it issues an EOI,
    otherwise it delegates that to the posted handler.

This is correct versus affinity changes and concurrent events on the posted
vector as the actual handler invocation is serialized through the interrupt
descriptor lock.

Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs")
Reported-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Luigi Rizzo <lrizzo@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251125214631.044440658@linutronix.de
Closes: https://lore.kernel.org/lkml/20251124104836.3685533-1-lrizzo@google.com
---
 arch/x86/include/asm/irq_remapping.h |  7 +++++++
 arch/x86/kernel/irq.c                | 23 +++++++++++++++++++++++
 drivers/iommu/intel/irq_remapping.c  |  8 ++++----
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 5a0d424..4e55d17 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -87,4 +87,11 @@ static inline void panic_if_irq_remap(const char *msg)
 }
 
 #endif /* CONFIG_IRQ_REMAP */
+
+#ifdef CONFIG_X86_POSTED_MSI
+void intel_ack_posted_msi_irq(struct irq_data *irqd);
+#else
+#define intel_ack_posted_msi_irq	NULL
+#endif
+
 #endif /* __X86_IRQ_REMAPPING_H */
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 86f4e57..b2fe618 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -397,6 +397,7 @@ DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_kvm_posted_intr_nested_ipi)
 
 /* Posted Interrupt Descriptors for coalesced MSIs to be posted */
 DEFINE_PER_CPU_ALIGNED(struct pi_desc, posted_msi_pi_desc);
+static DEFINE_PER_CPU_CACHE_HOT(bool, posted_msi_handler_active);
 
 void intel_posted_msi_init(void)
 {
@@ -414,6 +415,25 @@ void intel_posted_msi_init(void)
 	this_cpu_write(posted_msi_pi_desc.ndst, destination);
 }
 
+void intel_ack_posted_msi_irq(struct irq_data *irqd)
+{
+	irq_move_irq(irqd);
+
+	/*
+	 * Handle the rare case that irq_retrigger() raised the actual
+	 * assigned vector on the target CPU, which means that it was not
+	 * invoked via the posted MSI handler below. In that case APIC EOI
+	 * is required as otherwise the ISR entry becomes stale and lower
+	 * priority interrupts are never going to be delivered after that.
+	 *
+	 * If the posted handler invoked the device interrupt handler then
+	 * the EOI would be premature because it would acknowledge the
+	 * posted vector.
+	 */
+	if (unlikely(!__this_cpu_read(posted_msi_handler_active)))
+		apic_eoi();
+}
+
 static __always_inline bool handle_pending_pir(unsigned long *pir, struct pt_regs *regs)
 {
 	unsigned long pir_copy[NR_PIR_WORDS];
@@ -446,6 +466,8 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
 
 	pid = this_cpu_ptr(&posted_msi_pi_desc);
 
+	/* Mark the handler active for intel_ack_posted_msi_irq() */
+	__this_cpu_write(posted_msi_handler_active, true);
 	inc_irq_stat(posted_msi_notification_count);
 	irq_enter();
 
@@ -474,6 +496,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
 
 	apic_eoi();
 	irq_exit();
+	__this_cpu_write(posted_msi_handler_active, false);
 	set_irq_regs(old_regs);
 }
 #endif /* X86_POSTED_MSI */
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 4f9b01d..8bcbfe3 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1303,17 +1303,17 @@ static struct irq_chip intel_ir_chip = {
  *	irq_enter();
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				irq_move_irq(); // No EOI
+ *				intel_ack_posted_msi_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				irq_move_irq(); // No EOI
+ *				intel_ack_posted_msi_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *		handle_edge_irq()
  *			irq_chip_ack_parent()
- *				irq_move_irq(); // No EOI
+ *				intel_ack_posted_msi_irq(); // No EOI
  *			handle_irq_event()
  *				driver_handler()
  *	apic_eoi()
@@ -1322,7 +1322,7 @@ static struct irq_chip intel_ir_chip = {
  */
 static struct irq_chip intel_ir_chip_post_msi = {
 	.name			= "INTEL-IR-POST",
-	.irq_ack		= irq_move_irq,
+	.irq_ack		= intel_ack_posted_msi_irq,
 	.irq_set_affinity	= intel_ir_set_affinity,
 	.irq_compose_msi_msg	= intel_ir_compose_msi_msg,
 	.irq_set_vcpu_affinity	= intel_ir_set_vcpu_affinity,


* [tip: x86/irq] x86/irq_remapping: Sanitize posted_msi_supported()
  2025-11-25 21:50 ` [patch V2 3/3] x86/irq_remapping: Sanitize posted_msi_supported() Thomas Gleixner
@ 2025-12-17 17:48   ` tip-bot2 for Thomas Gleixner
  2025-12-18 22:03   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-12-17 17:48 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the x86/irq branch of tip:

Commit-ID:     64d4c88270cf90089434e3db67ed443fd982a9a2
Gitweb:        https://git.kernel.org/tip/64d4c88270cf90089434e3db67ed443fd982a9a2
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 25 Nov 2025 22:50:49 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 17 Dec 2025 18:44:17 +01:00

x86/irq_remapping: Sanitize posted_msi_supported()

posted_msi_supported() is a misnomer as it actually checks whether posted
MSI is enabled or not. Aside from that, it does not take
CONFIG_X86_POSTED_MSI into account, which is required to actually use it.

Rename it to posted_msi_enabled() and make the return value depend on
CONFIG_X86_POSTED_MSI, which allows the compiler to eliminate the related
dead code and data if disabled:

  text	   data	    bss	    dec	    hex	filename
  10046	    701	   3296	  14043	   36db	drivers/iommu/intel/irq_remapping.o
   9904	    413	   3296	  13613	   352d	drivers/iommu/intel/irq_remapping.o

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251125214631.170499997@linutronix.de
---
 arch/x86/include/asm/irq_remapping.h | 5 +++--
 drivers/iommu/intel/irq_remapping.c  | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 4e55d17..37b94f4 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -67,9 +67,10 @@ static inline struct irq_domain *arch_get_ir_parent_domain(void)
 
 extern bool enable_posted_msi;
 
-static inline bool posted_msi_supported(void)
+static inline bool posted_msi_enabled(void)
 {
-	return enable_posted_msi && irq_remapping_cap(IRQ_POSTING_CAP);
+	return IS_ENABLED(CONFIG_X86_POSTED_MSI) &&
+		enable_posted_msi && irq_remapping_cap(IRQ_POSTING_CAP);
 }
 
 #else  /* CONFIG_IRQ_REMAP */
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 8bcbfe3..ecb591e 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1368,7 +1368,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 		break;
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		if (posted_msi_supported()) {
+		if (posted_msi_enabled()) {
 			prepare_irte_posted(irte);
 			data->irq_2_iommu.posted_msi = 1;
 		}
@@ -1460,7 +1460,7 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
 
 		irq_data->hwirq = (index << 16) + i;
 		irq_data->chip_data = ird;
-		if (posted_msi_supported() &&
+		if (posted_msi_enabled() &&
 		    ((info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI) ||
 		     (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX)))
 			irq_data->chip = &intel_ir_chip_post_msi;


* [tip: x86/irq] x86/irq: Cleanup posted MSI code
  2025-11-25 21:50 ` [patch V2 2/3] x86/irq: Cleanup posted MSI code Thomas Gleixner
@ 2025-12-17 17:48   ` tip-bot2 for Thomas Gleixner
  2025-12-18 22:03   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-12-17 17:48 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the x86/irq branch of tip:

Commit-ID:     329e2051476858f264e2c217c6db4e68e203d5db
Gitweb:        https://git.kernel.org/tip/329e2051476858f264e2c217c6db4e68e203d5db
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 25 Nov 2025 22:50:47 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 17 Dec 2025 18:44:16 +01:00

x86/irq: Cleanup posted MSI code

Make code and comments readable and use the __this_cpu_*() accessors, as
this code is guaranteed to be invoked with interrupts disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251125214631.108458942@linutronix.de
---
 arch/x86/kernel/irq.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index b2fe618..7bc640d 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -401,11 +401,9 @@ static DEFINE_PER_CPU_CACHE_HOT(bool, posted_msi_handler_active);
 
 void intel_posted_msi_init(void)
 {
-	u32 destination;
-	u32 apic_id;
+	u32 destination, apic_id;
 
 	this_cpu_write(posted_msi_pi_desc.nv, POSTED_MSI_NOTIFICATION_VECTOR);
-
 	/*
 	 * APIC destination ID is stored in bit 8:15 while in XAPIC mode.
 	 * VT-d spec. CH 9.11
@@ -449,8 +447,8 @@ static __always_inline bool handle_pending_pir(unsigned long *pir, struct pt_reg
 }
 
 /*
- * Performance data shows that 3 is good enough to harvest 90+% of the benefit
- * on high IRQ rate workload.
+ * Performance data shows that 3 is good enough to harvest 90+% of the
+ * benefit on high interrupt rate workloads.
  */
 #define MAX_POSTED_MSI_COALESCING_LOOP 3
 
@@ -460,11 +458,8 @@ static __always_inline bool handle_pending_pir(unsigned long *pir, struct pt_reg
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
 {
+	struct pi_desc *pid = __this_cpu_ptr(&posted_msi_pi_desc);
 	struct pt_regs *old_regs = set_irq_regs(regs);
-	struct pi_desc *pid;
-	int i = 0;
-
-	pid = this_cpu_ptr(&posted_msi_pi_desc);
 
 	/* Mark the handler active for intel_ack_posted_msi_irq() */
 	__this_cpu_write(posted_msi_handler_active, true);
@@ -472,25 +467,25 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
 	irq_enter();
 
 	/*
-	 * Max coalescing count includes the extra round of handle_pending_pir
-	 * after clearing the outstanding notification bit. Hence, at most
-	 * MAX_POSTED_MSI_COALESCING_LOOP - 1 loops are executed here.
+	 * Loop only MAX_POSTED_MSI_COALESCING_LOOP - 1 times here to take
+	 * the final handle_pending_pir() invocation after clearing the
+	 * outstanding notification bit into account.
 	 */
-	while (++i < MAX_POSTED_MSI_COALESCING_LOOP) {
+	for (int i = 1; i < MAX_POSTED_MSI_COALESCING_LOOP; i++) {
 		if (!handle_pending_pir(pid->pir, regs))
 			break;
 	}
 
 	/*
-	 * Clear outstanding notification bit to allow new IRQ notifications,
-	 * do this last to maximize the window of interrupt coalescing.
+	 * Clear the outstanding notification bit to rearm the notification
+	 * mechanism.
 	 */
 	pi_clear_on(pid);
 
 	/*
-	 * There could be a race of PI notification and the clearing of ON bit,
-	 * process PIR bits one last time such that handling the new interrupts
-	 * are not delayed until the next IRQ.
+	 * Clearing the ON bit can race with a notification. Process the
+	 * PIR bits one last time so that handling the new interrupts is
+	 * not delayed until the next notification happens.
 	 */
 	handle_pending_pir(pid->pir, regs);
 


* [tip: x86/irq] x86/irq_remapping: Sanitize posted_msi_supported()
  2025-11-25 21:50 ` [patch V2 3/3] x86/irq_remapping: Sanitize posted_msi_supported() Thomas Gleixner
  2025-12-17 17:48   ` [tip: x86/irq] " tip-bot2 for Thomas Gleixner
@ 2025-12-18 22:03   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-12-18 22:03 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the x86/irq branch of tip:

Commit-ID:     d441e38a2c87824afc7e656e634e55141d015307
Gitweb:        https://git.kernel.org/tip/d441e38a2c87824afc7e656e634e55141d015307
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 25 Nov 2025 22:50:49 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Dec 2025 22:59:40 +01:00

x86/irq_remapping: Sanitize posted_msi_supported()

posted_msi_supported() is a misnomer as it actually checks whether posted
MSI is enabled or not. Aside from that, it does not take
CONFIG_X86_POSTED_MSI into account, which is required to actually use it.

Rename it to posted_msi_enabled() and make the return value depend on
CONFIG_X86_POSTED_MSI, which allows the compiler to eliminate the related
dead code and data if disabled:

  text	   data	    bss	    dec	    hex	filename
  10046	    701	   3296	  14043	   36db	drivers/iommu/intel/irq_remapping.o
   9904	    413	   3296	  13613	   352d	drivers/iommu/intel/irq_remapping.o

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251125214631.170499997@linutronix.de
---
 arch/x86/include/asm/irq_remapping.h | 5 +++--
 drivers/iommu/intel/irq_remapping.c  | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index 4e55d17..37b94f4 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -67,9 +67,10 @@ static inline struct irq_domain *arch_get_ir_parent_domain(void)
 
 extern bool enable_posted_msi;
 
-static inline bool posted_msi_supported(void)
+static inline bool posted_msi_enabled(void)
 {
-	return enable_posted_msi && irq_remapping_cap(IRQ_POSTING_CAP);
+	return IS_ENABLED(CONFIG_X86_POSTED_MSI) &&
+		enable_posted_msi && irq_remapping_cap(IRQ_POSTING_CAP);
 }
 
 #else  /* CONFIG_IRQ_REMAP */
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 8bcbfe3..ecb591e 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1368,7 +1368,7 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data,
 		break;
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		if (posted_msi_supported()) {
+		if (posted_msi_enabled()) {
 			prepare_irte_posted(irte);
 			data->irq_2_iommu.posted_msi = 1;
 		}
@@ -1460,7 +1460,7 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain,
 
 		irq_data->hwirq = (index << 16) + i;
 		irq_data->chip_data = ird;
-		if (posted_msi_supported() &&
+		if (posted_msi_enabled() &&
 		    ((info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI) ||
 		     (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX)))
 			irq_data->chip = &intel_ir_chip_post_msi;
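
The dead code elimination described in the changelog can be illustrated in
userspace. This is only a sketch, not the kernel implementation:
DEMO_CONFIG_POSTED_MSI is a hypothetical stand-in for
IS_ENABLED(CONFIG_X86_POSTED_MSI), and every demo_* name is invented for
illustration. The point is that a compile-time constant 0 at the front of
the && chain lets the compiler fold the whole check to false and drop the
guarded branch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for IS_ENABLED(CONFIG_X86_POSTED_MSI): a plain
 * compile-time constant which is 0 when the feature is configured out.
 */
#ifndef DEMO_CONFIG_POSTED_MSI
#define DEMO_CONFIG_POSTED_MSI 0
#endif

/* Runtime knob, loosely analogous to enable_posted_msi (assumption) */
static bool demo_enable_posted_msi = true;

/* Loosely analogous to irq_remapping_cap(IRQ_POSTING_CAP) (assumption) */
static bool demo_irq_posting_cap(void)
{
	return true;
}

static inline bool demo_posted_msi_enabled(void)
{
	/* Constant first: with 0 the whole expression folds to false */
	return DEMO_CONFIG_POSTED_MSI &&
		demo_enable_posted_msi && demo_irq_posting_cap();
}

/* Returns 1 for the posted MSI path, 0 for the regular remapped path */
static int demo_select_chip(void)
{
	if (demo_posted_msi_enabled())
		return 1;	/* unreachable when the constant is 0 */
	return 0;
}

/* With the constant at 0 the posted path must never be selected */
static void demo_check(void)
{
	if (!DEMO_CONFIG_POSTED_MSI)
		assert(demo_select_chip() == 0);
}
```

Building with -DDEMO_CONFIG_POSTED_MSI=1 turns the check back into a pure
runtime decision, which roughly corresponds to a kernel built with the
config option enabled.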

* [tip: x86/irq] x86/irq: Cleanup posted MSI code
  2025-11-25 21:50 ` [patch V2 2/3] x86/irq: Cleanup posted MSI code Thomas Gleixner
  2025-12-17 17:48   ` [tip: x86/irq] " tip-bot2 for Thomas Gleixner
@ 2025-12-18 22:03   ` tip-bot2 for Thomas Gleixner
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-12-18 22:03 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the x86/irq branch of tip:

Commit-ID:     4021a6dad720273a95ac3c0816fc48e35e77dace
Gitweb:        https://git.kernel.org/tip/4021a6dad720273a95ac3c0816fc48e35e77dace
Author:        Thomas Gleixner <tglx@linutronix.de>
AuthorDate:    Tue, 25 Nov 2025 22:50:47 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 18 Dec 2025 22:59:40 +01:00

x86/irq: Cleanup posted MSI code

Make code and comments readable and use __this_cpu..() as this is
guaranteed to be invoked with interrupts disabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251125214631.108458942@linutronix.de
---
 arch/x86/kernel/irq.c | 31 +++++++++++++------------------
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index b2fe618..d817feb 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -401,11 +401,9 @@ static DEFINE_PER_CPU_CACHE_HOT(bool, posted_msi_handler_active);
 
 void intel_posted_msi_init(void)
 {
-	u32 destination;
-	u32 apic_id;
+	u32 destination, apic_id;
 
 	this_cpu_write(posted_msi_pi_desc.nv, POSTED_MSI_NOTIFICATION_VECTOR);
-
 	/*
 	 * APIC destination ID is stored in bit 8:15 while in XAPIC mode.
 	 * VT-d spec. CH 9.11
@@ -449,8 +447,8 @@ static __always_inline bool handle_pending_pir(unsigned long *pir, struct pt_reg
 }
 
 /*
- * Performance data shows that 3 is good enough to harvest 90+% of the benefit
- * on high IRQ rate workload.
+ * Performance data shows that 3 is good enough to harvest 90+% of the
+ * benefit on high interrupt rate workloads.
  */
 #define MAX_POSTED_MSI_COALESCING_LOOP 3
 
@@ -460,11 +458,8 @@ static __always_inline bool handle_pending_pir(unsigned long *pir, struct pt_reg
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
 {
+	struct pi_desc *pid = this_cpu_ptr(&posted_msi_pi_desc);
 	struct pt_regs *old_regs = set_irq_regs(regs);
-	struct pi_desc *pid;
-	int i = 0;
-
-	pid = this_cpu_ptr(&posted_msi_pi_desc);
 
 	/* Mark the handler active for intel_ack_posted_msi_irq() */
 	__this_cpu_write(posted_msi_handler_active, true);
@@ -472,25 +467,25 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
 	irq_enter();
 
 	/*
-	 * Max coalescing count includes the extra round of handle_pending_pir
-	 * after clearing the outstanding notification bit. Hence, at most
-	 * MAX_POSTED_MSI_COALESCING_LOOP - 1 loops are executed here.
+	 * Loop only MAX_POSTED_MSI_COALESCING_LOOP - 1 times here to take
+	 * the final handle_pending_pir() invocation after clearing the
+	 * outstanding notification bit into account.
 	 */
-	while (++i < MAX_POSTED_MSI_COALESCING_LOOP) {
+	for (int i = 1; i < MAX_POSTED_MSI_COALESCING_LOOP; i++) {
 		if (!handle_pending_pir(pid->pir, regs))
 			break;
 	}
 
 	/*
-	 * Clear outstanding notification bit to allow new IRQ notifications,
-	 * do this last to maximize the window of interrupt coalescing.
+	 * Clear the outstanding notification bit to rearm the notification
+	 * mechanism.
 	 */
 	pi_clear_on(pid);
 
 	/*
-	 * There could be a race of PI notification and the clearing of ON bit,
-	 * process PIR bits one last time such that handling the new interrupts
-	 * are not delayed until the next IRQ.
+	 * Clearing the ON bit can race with a notification. Process the
+	 * PIR bits one last time so that handling the new interrupts is
+	 * not delayed until the next notification happens.
 	 */
 	handle_pending_pir(pid->pir, regs);
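
The control flow of the notification handler above can be sketched in
userspace as follows. This is a simplified model under stated assumptions:
struct demo_pi_desc, demo_handle_pending_pir() and the other demo_* names
are hypothetical, the PIR is reduced to a single word, and the plain loads
and stores stand in for the atomic operations the real code needs because
hardware posts bits concurrently.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors MAX_POSTED_MSI_COALESCING_LOOP from the patch */
#define DEMO_MAX_COALESCING_LOOP 3

/* Hypothetical, simplified descriptor: one pending word plus the ON bit */
struct demo_pi_desc {
	unsigned long pir;	/* pending interrupt bits */
	bool on;		/* outstanding notification bit */
};

/* Counts how many pending bits were "dispatched" */
static unsigned int demo_handled;

/* Harvest and count all pending bits. Returns true if any were pending. */
static bool demo_handle_pending_pir(struct demo_pi_desc *pid)
{
	unsigned long pending = pid->pir;

	pid->pir = 0;
	if (!pending)
		return false;

	/* Clear the lowest set bit per iteration, one "interrupt" each */
	for (; pending; pending &= pending - 1)
		demo_handled++;
	return true;
}

static void demo_notification_handler(struct demo_pi_desc *pid)
{
	assert(pid);

	/* At most DEMO_MAX_COALESCING_LOOP - 1 rounds in the loop */
	for (int i = 1; i < DEMO_MAX_COALESCING_LOOP; i++) {
		if (!demo_handle_pending_pir(pid))
			break;
	}

	/* Rearm the notification mechanism */
	pid->on = false;

	/* Final round catches bits which raced with clearing ON */
	demo_handle_pending_pir(pid);
}
```

Starting the loop counter at 1 keeps the total number of
demo_handle_pending_pir() invocations, including the final one, at or
below DEMO_MAX_COALESCING_LOOP.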
 

Thread overview: 11+ messages
2025-11-25 21:50 [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Thomas Gleixner
2025-11-25 21:50 ` [patch V2 1/3] x86/msi: Make irq_retrigger() functional for posted MSI Thomas Gleixner
2025-12-17 17:48   ` [tip: x86/urgent] " tip-bot2 for Thomas Gleixner
2025-11-25 21:50 ` [patch V2 2/3] x86/irq: Cleanup posted MSI code Thomas Gleixner
2025-12-17 17:48   ` [tip: x86/irq] " tip-bot2 for Thomas Gleixner
2025-12-18 22:03   ` tip-bot2 for Thomas Gleixner
2025-11-25 21:50 ` [patch V2 3/3] x86/irq_remapping: Sanitize posted_msi_supported() Thomas Gleixner
2025-12-17 17:48   ` [tip: x86/irq] " tip-bot2 for Thomas Gleixner
2025-12-18 22:03   ` tip-bot2 for Thomas Gleixner
2025-12-17 17:03 ` [patch V2 0/3] x86/irq: Bugfix and cleanup for posted MSI interrupts Luigi Rizzo
2025-12-17 17:37   ` Thomas Gleixner
