public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi: aacraid: Stop using PCI_IRQ_AFFINITY
@ 2025-07-15 11:15 John Garry
  2025-07-15 19:18 ` John Meneghini
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: John Garry @ 2025-07-15 11:15 UTC (permalink / raw)
  To: jejb, martin.petersen, sagar.biradar
  Cc: linux-scsi, jmeneghi, hare, ming.lei, John Garry

When PCI_IRQ_AFFINITY is set for calling pci_alloc_irq_vectors(), it
means interrupts are spread around the available CPUs. It also means that
the interrupts become managed, which means that an interrupt is shutdown
when all the CPUs in the interrupt affinity mask go offline.

Using managed interrupts in this way means that we should ensure that
completions should not occur on HW queues where the associated interrupt
is shutdown. This is typically achieved by ensuring only CPUs which are
online can generate IO completion traffic to the HW queue which they are
mapped to (so that they can also serve completion interrupts for that
HW queue).

The problem in the driver is that a CPU can generate completions to
a HW queue whose interrupt may be shutdown, as the CPUs in the HW queue
interrupt affinity mask may be offline. This can cause IOs to never
complete and hang the system. The driver maintains its own CPU <-> HW
queue mapping for submissions, see aac_fib_vector_assign(), but this
does not reflect the CPU <-> HW queue interrupt affinity mapping.

Commit 9dc704dcc09e ("scsi: aacraid: Reply queue mapping to CPUs based on
IRQ affinity") tried to remedy this issue may mapping CPUs properly to
HW queue interrupts. However this was later reverted in commit c5becf57dd56
("Revert "scsi: aacraid: Reply queue mapping to CPUs based on IRQ
affinity") - it seems that there were other reports of hangs. I guess that
this was due to some implementation issue in the original commit or
maybe a HW issue.

Fix the very original hang by just not using managed interrupts by not
setting PCI_IRQ_AFFINITY.  In this way, all CPUs will be in each HW
queue affinity mask, so should not create completion problems if any
CPUs go offline.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
build tested only

diff --git a/drivers/scsi/aacraid/comminit.c b/drivers/scsi/aacraid/comminit.c
index 28cf18955a08..726c8531b7d3 100644
--- a/drivers/scsi/aacraid/comminit.c
+++ b/drivers/scsi/aacraid/comminit.c
@@ -481,8 +481,7 @@ void aac_define_int_mode(struct aac_dev *dev)
 	    pci_find_capability(dev->pdev, PCI_CAP_ID_MSIX)) {
 		min_msix = 2;
 		i = pci_alloc_irq_vectors(dev->pdev,
-					  min_msix, msi_count,
-					  PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
+					  min_msix, msi_count, PCI_IRQ_MSIX);
 		if (i > 0) {
 			dev->msi_enabled = 1;
 			msi_count = i;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-07-31  4:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-15 11:15 [PATCH] scsi: aacraid: Stop using PCI_IRQ_AFFINITY John Garry
2025-07-15 19:18 ` John Meneghini
2025-07-24 17:23 ` John Meneghini
2025-07-25  8:24   ` John Garry
2025-07-24 17:29 ` John Meneghini
2025-07-25  1:18 ` Martin K. Petersen
2025-07-31  4:44 ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox