linux-pci.vger.kernel.org archive mirror
* [PATCH AUTOSEL 6.16-6.6] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU
       [not found] <20250805130945.471732-1-sashal@kernel.org>
@ 2025-08-05 13:09 ` Sasha Levin
  2025-08-05 13:20   ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Sasha Levin @ 2025-08-05 13:09 UTC
  To: patches, stable
  Cc: Marc Zyngier, Lorenzo Pieralisi, Bjorn Helgaas, Sasha Levin, toan,
	kwilczynski, mani, linux-pci, linux-arm-kernel

From: Marc Zyngier <maz@kernel.org>

[ Upstream commit 3cc8f625e4c6a0e9f936da6b94166e62e387fe1d ]

Since changing the affinity of an MSI really is about changing
the target address and that it isn't possible to mask an individual
MSI, it is completely possible for an interrupt to race with itself,
usually resulting in a lost interrupt.

Paper over the design blunder by informing the core code of this
sad state of affairs.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20250708173404.1278635-11-maz@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
detailed analysis:

## Key Evidence for Backporting:

1. **Fixes a Real Bug - Lost Interrupts**: The commit addresses a race
   condition where MSI interrupts can be lost when CPU affinity changes
   occur. The commit message explicitly states this results in "a lost
   interrupt" which is a serious issue that can cause system malfunction
   or hangs.

2. **Minimal and Contained Change**: The fix consists of a single line
   addition:
   ```c
   irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
   ```
   This is an extremely small, targeted fix that only affects the
   xgene-msi driver's interrupt allocation path.

3. **Part of a Broader Fix Pattern**: This commit follows the same
   pattern established for other interrupt controllers:
   - Commit 9c15eeb5362c4 introduced this mechanism for fasteoi handlers
     in 2023
   - Commit 2c9e7f857400 extended it to handle_simple_irq() flow
   - This commit applies the same proven workaround to the xgene-msi
     driver

4. **Addresses Hardware Design Limitation**: The commit message
   describes this as papering over a "design blunder" - the hardware
   inability to mask individual MSIs combined with affinity changes
   creates an unavoidable race condition. This is not a new feature but
   a critical workaround for existing hardware defects.

5. **History of Race Conditions in This Driver**: The xgene-msi driver
   has had race condition issues before (commit a93c00e5f975 fixed a
   different race in 2021), indicating this subsystem needs these types
   of fixes for stability.

6. **No Architecture Changes**: The fix uses existing kernel
   infrastructure (irqd_set_resend_when_in_progress) without introducing
   new APIs or changing kernel architecture.

7. **Low Risk of Regression**: The change only affects the specific
   xgene-msi driver and only adds a resend mechanism when interrupts are
   already in progress - it doesn't change the normal interrupt handling
   path.

## Stable Tree Criteria Met:
- ✅ Fixes a real bug (lost interrupts)
- ✅ Small change (1 line)
- ✅ Not a new feature
- ✅ Isolated to specific driver
- ✅ Uses established kernel mechanisms
- ✅ Addresses hardware limitation that affects deployed systems

The fix prevents interrupt loss during CPU affinity changes on X-Gene
MSI controllers, which is exactly the type of bug fix that stable
kernels should receive to maintain system reliability.
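
To make the race in the commit message concrete, here is a minimal
user-space sketch of the idea. It is not the kernel's genirq code: the
struct and every model_* name are hypothetical, and the two CPUs are
simulated as sequential calls. It only shows that an instance arriving
while the handler is still in progress is dropped unless the
descriptor asks for a resend, in which case it is recorded and
replayed once the first handler completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an interrupt descriptor. */
struct model_irq {
	bool in_progress;             /* a handler is running on some CPU */
	bool resend_when_in_progress; /* models the flag the patch sets */
	bool pending;                 /* a racing instance was recorded */
	int handled;                  /* completed handler invocations */
	int lost;                     /* instances silently dropped */
};

/* An instance of the interrupt is delivered to some CPU. */
static void model_irq_arrive(struct model_irq *irq)
{
	if (irq->in_progress) {
		/* The same interrupt is already being handled elsewhere. */
		if (irq->resend_when_in_progress)
			irq->pending = true; /* remember it for later */
		else
			irq->lost++;         /* nothing records it */
		return;
	}
	irq->in_progress = true;
}

/* The CPU that owns the running handler finishes it. */
static void model_irq_complete(struct model_irq *irq)
{
	assert(irq->in_progress);
	irq->handled++;
	irq->in_progress = false;

	if (irq->pending) {
		/* Replay the instance that raced with the handler. */
		irq->pending = false;
		model_irq_arrive(irq);
		model_irq_complete(irq);
	}
}

/* Simulate one MSI firing on CPU A and again on CPU B mid-handler. */
static int model_race(bool resend)
{
	struct model_irq irq = { .resend_when_in_progress = resend };

	model_irq_arrive(&irq);   /* CPU A starts handling */
	model_irq_arrive(&irq);   /* CPU B gets it after the retarget */
	model_irq_complete(&irq); /* CPU A finishes */

	return irq.handled;
}
```

In this model, model_race(false) handles one of the two instances and
model_race(true) handles both. Whether that alone is sufficient on
real hardware depends on the rest of the driver, which the model does
not attempt to capture.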

 drivers/pci/controller/pci-xgene-msi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index b05ec8b0bb93..50647fa14e69 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -200,6 +200,7 @@ static int xgene_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 	irq_domain_set_info(domain, virq, msi_irq,
 			    &xgene_msi_bottom_irq_chip, domain->host_data,
 			    handle_simple_irq, NULL, NULL);
+	irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
 
 	return 0;
 }
-- 
2.39.5



* Re: [PATCH AUTOSEL 6.16-6.6] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU
  2025-08-05 13:09 ` [PATCH AUTOSEL 6.16-6.6] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU Sasha Levin
@ 2025-08-05 13:20   ` Marc Zyngier
  2025-08-05 13:59     ` Sasha Levin
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Zyngier @ 2025-08-05 13:20 UTC
  To: Sasha Levin
  Cc: patches, stable, Lorenzo Pieralisi, Bjorn Helgaas, toan,
	kwilczynski, mani, linux-pci, linux-arm-kernel

On Tue, 05 Aug 2025 14:09:34 +0100,
Sasha Levin <sashal@kernel.org> wrote:
> 
> From: Marc Zyngier <maz@kernel.org>
> 
> [ Upstream commit 3cc8f625e4c6a0e9f936da6b94166e62e387fe1d ]
> 
> Since changing the affinity of an MSI really is about changing
> the target address and that it isn't possible to mask an individual
> MSI, it is completely possible for an interrupt to race with itself,
> usually resulting in a lost interrupt.
> 
> Paper over the design blunder by informing the core code of this
> sad state of affairs.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> Link: https://lore.kernel.org/r/20250708173404.1278635-11-maz@kernel.org
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> 
> LLM Generated explanations, may be completely bogus:

s/may be//. It is an amusing read though, specially when quoting
totally unrelated patches, so thumbs up for the comical value.

But I'm not even going to entertain explaining *why* backporting this
patch on its own is nonsense. Reading the original series should be
enlightening enough.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH AUTOSEL 6.16-6.6] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU
  2025-08-05 13:20   ` Marc Zyngier
@ 2025-08-05 13:59     ` Sasha Levin
  2025-08-05 18:09       ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Sasha Levin @ 2025-08-05 13:59 UTC
  To: Marc Zyngier
  Cc: patches, stable, Lorenzo Pieralisi, Bjorn Helgaas, toan,
	kwilczynski, mani, linux-pci, linux-arm-kernel

On Tue, Aug 05, 2025 at 02:20:52PM +0100, Marc Zyngier wrote:
>On Tue, 05 Aug 2025 14:09:34 +0100,
>Sasha Levin <sashal@kernel.org> wrote:
>>
>> From: Marc Zyngier <maz@kernel.org>
>>
>> [ Upstream commit 3cc8f625e4c6a0e9f936da6b94166e62e387fe1d ]
>>
>> Since changing the affinity of an MSI really is about changing
>> the target address and that it isn't possible to mask an individual
>> MSI, it is completely possible for an interrupt to race with itself,
>> usually resulting in a lost interrupt.
>>
>> Paper over the design blunder by informing the core code of this
>> sad state of affairs.
>>
>> Signed-off-by: Marc Zyngier <maz@kernel.org>
>> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>> Link: https://lore.kernel.org/r/20250708173404.1278635-11-maz@kernel.org
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>> ---
>>
>> LLM Generated explanations, may be completely bogus:
>
>s/may be//. It is an amusing read though, specially when quoting
>totally unrelated patches, so thumbs up for the comical value.

Yeah, it's still very much at the "junior engineer" level, but honestly
I think that just the boolean yes/no answers out of it provide a better
signal-to-noise ratio than the older AUTOSEL.

>But I'm not even going to entertain explaining *why* backporting this
>patch on its own is nonsense. Reading the original series should be
>enlightening enough.

Sadly it doesn't have the context to understand that that specific
commit is part of a larger series. That information just disappears when
patches are applied into git.

I'll drop it, thanks!

-- 
Thanks,
Sasha


* Re: [PATCH AUTOSEL 6.16-6.6] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU
  2025-08-05 13:59     ` Sasha Levin
@ 2025-08-05 18:09       ` Marc Zyngier
  0 siblings, 0 replies; 4+ messages in thread
From: Marc Zyngier @ 2025-08-05 18:09 UTC
  To: Sasha Levin
  Cc: patches, stable, Lorenzo Pieralisi, Bjorn Helgaas, toan,
	kwilczynski, mani, linux-pci, linux-arm-kernel

On Tue, 05 Aug 2025 14:59:27 +0100,
Sasha Levin <sashal@kernel.org> wrote:
> 
> On Tue, Aug 05, 2025 at 02:20:52PM +0100, Marc Zyngier wrote:
> > On Tue, 05 Aug 2025 14:09:34 +0100,
> > Sasha Levin <sashal@kernel.org> wrote:
> >> 
> >> From: Marc Zyngier <maz@kernel.org>
> >> 
> >> [ Upstream commit 3cc8f625e4c6a0e9f936da6b94166e62e387fe1d ]
> >> 
> >> Since changing the affinity of an MSI really is about changing
> >> the target address and that it isn't possible to mask an individual
> >> MSI, it is completely possible for an interrupt to race with itself,
> >> usually resulting in a lost interrupt.
> >> 
> >> Paper over the design blunder by informing the core code of this
> >> sad state of affairs.
> >> 
> >> Signed-off-by: Marc Zyngier <maz@kernel.org>
> >> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
> >> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> >> Link: https://lore.kernel.org/r/20250708173404.1278635-11-maz@kernel.org
> >> Signed-off-by: Sasha Levin <sashal@kernel.org>
> >> ---
> >> 
> >> LLM Generated explanations, may be completely bogus:
> > 
> > s/may be//. It is an amusing read though, specially when quoting
> > totally unrelated patches, so thumbs up for the comical value.
> 
> Yeah, it's still very much at the "junior engineer" level

It's not, and that's the main issue. A junior engineer would get into
the rabbit hole of backporting too much, as they would be unable to
separate the essential logic from the surrounding fluff. There would
be a lot of noise, but it would be OK.

Your "thing" is very much at the "Senior Marketroid" level, in the
sense that it manages to drag some semi-relevant information from
various sources, and condenses it into an advertisement for snake oil.

I think I know which of the two I want to work with.

	M.

-- 
Jazz isn't dead. It just smells funny.
