linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity
@ 2025-06-18 19:24 John Meneghini
  2025-06-18 19:37 ` John Meneghini
  2025-06-19 10:46 ` John Garry
  0 siblings, 2 replies; 5+ messages in thread
From: John Meneghini @ 2025-06-18 19:24 UTC (permalink / raw)
  To: James.Bottomley, martin.petersen, linux-scsi, aacraid, corbet
  Cc: linux-doc, linux-kernel, thenzl, Scott.Benesh, Don.Brace,
	Tom.White, Abhinav.Kuchibhotla, sagar.biradar, mpatalan

From: Sagar Biradar <sagar.biradar@microchip.com>

From: Sagar Biradar <sagar.biradar@microchip.com>

This patch fixes a bug in the original path that caused I/O hangs. The
I/O hangs were because of an MSIx vector not having a mapped online CPU
upon receiving completion.

This patch enables Multi-Q support in the aacriad driver. Multi-Q support
in the driver is needed to support CPU offlining.

SCSI cmds use the mq_map to get the vector_no via blk_mq_unique_tag()
and blk_mq_unique_tag_to_hwq() - which are setup during the blk_mq init.
For reserved cmds, or the ones before the blk_mq init, use the vector_no
0, which is the norm since don't yet have a proper mapping to the queues.

Note that this change can cause a drop in performance in some
configurations. To address any concerns about performance the
CONFIG_SCSI_AACRAID_MULTIQ option has been added.

The CONFIG_SCSI_AACRAID_MULTIQ option is on by default to ensure that
CPU offlining with MultiQ support is enabled. To disable MultiQ support
compile the kernel with CONFIG_SCSI_AACRAID_MULTIQ=N. Disabling MultiQ
support should not be done if your application uses CPU offling.

Closes: https://lore.kernel.org/linux-scsi/20250130173314.608836-1-sagar.biradar@microchip.com/

Fixes: c5becf57dd56 ("Revert "scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity"")
Signed-off-by: Sagar Biradar <sagar.biradar@microchip.com>
[jmeneghi: replace aac_cpu_offline_feature with Kconfig option]
Co-developed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Gilbert Wu <gilbert.wu@microchip.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Tested-by: Marco Patalano <mpatalan@redhat.com>
---
 Documentation/scsi/aacraid.rst | 11 +++++++++++
 MAINTAINERS                    |  1 +
 drivers/scsi/Kconfig           |  2 +-
 drivers/scsi/aacraid/Kconfig   | 20 ++++++++++++++++++++
 drivers/scsi/aacraid/aachba.c  |  1 -
 drivers/scsi/aacraid/aacraid.h |  3 +++
 drivers/scsi/aacraid/commsup.c | 16 ++++++++++++++--
 drivers/scsi/aacraid/linit.c   | 23 ++++++++++++++++++++++-
 drivers/scsi/aacraid/src.c     | 28 +++++++++++++++++++++++++++-
 9 files changed, 99 insertions(+), 6 deletions(-)
 create mode 100644 drivers/scsi/aacraid/Kconfig

diff --git a/Documentation/scsi/aacraid.rst b/Documentation/scsi/aacraid.rst
index 1904674b94f3..0fc35edd0ac9 100644
--- a/Documentation/scsi/aacraid.rst
+++ b/Documentation/scsi/aacraid.rst
@@ -129,6 +129,17 @@ Supported Cards/Chipsets
 People
 ======
 
+Sagar Biradar <Sagar.Biradar@microchip.com>
+
+ - Added support for CPU offlining and updated the driver to support MultiQ.
+ - Introduced the option CONFIG_SCSI_AACRAID_MULTIQ to control this feature.
+
+By default, MultiQ support is disabled to provide optimal I/O performance.
+
+Enable CONFIG_SCSI_AACRAID_MULTIQ only if CPU offlining support is required,
+as enabling it may result in reduced performance in some configurations.
+Note : Disabling MultiQ while still offlining CPUs may lead to I/O hangs.
+
 Alan Cox <alan@lxorguk.ukuu.org.uk>
 
 Christoph Hellwig <hch@infradead.org>
diff --git a/MAINTAINERS b/MAINTAINERS
index a92290fffa16..996c90959caa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -184,6 +184,7 @@ L:	linux-scsi@vger.kernel.org
 S:	Supported
 W:	http://www.adaptec.com/
 F:	Documentation/scsi/aacraid.rst
+F:	drivers/scsi/aacraid/Kconfig
 F:	drivers/scsi/aacraid/
 
 AAEON UPBOARD FPGA MFD DRIVER
diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index 5522310bab8d..a6739cd94db7 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -456,7 +456,7 @@ config SCSI_AACRAID
 	  To compile this driver as a module, choose M here: the module
 	  will be called aacraid.
 
-
+source "drivers/scsi/aacraid/Kconfig"
 source "drivers/scsi/aic7xxx/Kconfig.aic7xxx"
 source "drivers/scsi/aic7xxx/Kconfig.aic79xx"
 source "drivers/scsi/aic94xx/Kconfig"
diff --git a/drivers/scsi/aacraid/Kconfig b/drivers/scsi/aacraid/Kconfig
new file mode 100644
index 000000000000..c69e1a7a77d2
--- /dev/null
+++ b/drivers/scsi/aacraid/Kconfig
@@ -0,0 +1,20 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Kernel configuration file for the aacraid driver.
+#
+# Copyright (c) 2025 Microchip Technology Inc. and its subsidiaries
+#  (mailto:storagedev@microchip.com)
+#
+config SCSI_AACRAID_MULTIQ
+	bool "AACRAID Multiq support"
+	depends on SCSI_AACRAID
+	default n
+	help
+	  This option enables MultiQ support in the aacraid driver.
+
+	  Enabling MultiQ allows the driver to safely handle CPU offlining.
+	  However, it may cause a performance drop in certain configurations.
+
+	  Enable this option only if your system requires CPU offlining support.
+
+	  If unsure, say N.
diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 0be719f38377..db9bef348834 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -328,7 +328,6 @@ MODULE_PARM_DESC(wwn, "Select a WWN type for the arrays:\n"
 	"\t1 - Array Meta Data Signature (default)\n"
 	"\t2 - Adapter Serial Number");
 
-
 static inline int aac_valid_context(struct scsi_cmnd *scsicmd,
 		struct fib *fibptr) {
 	struct scsi_device *device;
diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
index 0a5888b53d6d..557f9fc50012 100644
--- a/drivers/scsi/aacraid/aacraid.h
+++ b/drivers/scsi/aacraid/aacraid.h
@@ -1672,6 +1672,9 @@ struct aac_dev
 	u32			handle_pci_error;
 	bool			init_reset;
 	u8			soft_reset_support;
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+	u8			use_map_queue;
+#endif
 };
 
 #define aac_adapter_interrupt(dev) \
diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
index 7d9a4dce236b..309a166ce6e4 100644
--- a/drivers/scsi/aacraid/commsup.c
+++ b/drivers/scsi/aacraid/commsup.c
@@ -215,8 +215,17 @@ int aac_fib_setup(struct aac_dev * dev)
 struct fib *aac_fib_alloc_tag(struct aac_dev *dev, struct scsi_cmnd *scmd)
 {
 	struct fib *fibptr;
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+	u32 blk_tag;
+	int i;
 
+	blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
+	i = blk_mq_unique_tag_to_tag(blk_tag);
+	fibptr = &dev->fibs[i];
+#else
 	fibptr = &dev->fibs[scsi_cmd_to_rq(scmd)->tag];
+#endif
+
 	/*
 	 *	Null out fields that depend on being zero at the start of
 	 *	each I/O
@@ -242,14 +251,17 @@ struct fib *aac_fib_alloc(struct aac_dev *dev)
 {
 	struct fib * fibptr;
 	unsigned long flags;
+
 	spin_lock_irqsave(&dev->fib_lock, flags);
+	/*  Management FIB allocation: use free list within reserved range */
 	fibptr = dev->free_fib;
-	if(!fibptr){
+	if (!fibptr) {
 		spin_unlock_irqrestore(&dev->fib_lock, flags);
-		return fibptr;
+		return NULL;
 	}
 	dev->free_fib = fibptr->next;
 	spin_unlock_irqrestore(&dev->fib_lock, flags);
+
 	/*
 	 *	Set the proper node type code and node byte size
 	 */
diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
index 4b12e6dd8f07..8c56579a8efc 100644
--- a/drivers/scsi/aacraid/linit.c
+++ b/drivers/scsi/aacraid/linit.c
@@ -506,6 +506,17 @@ static int aac_sdev_configure(struct scsi_device *sdev,
 	return 0;
 }
 
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+static void aac_map_queues(struct Scsi_Host *shost)
+{
+	struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
+	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+
+	blk_mq_map_hw_queues(qmap, &aac->pdev->dev, 0);
+	aac->use_map_queue = true;
+}
+#endif
+
 /**
  *	aac_change_queue_depth		-	alter queue depths
  *	@sdev:	SCSI device we are considering
@@ -1490,6 +1501,9 @@ static const struct scsi_host_template aac_driver_template = {
 	.bios_param			= aac_biosparm,
 	.shost_groups			= aac_host_groups,
 	.sdev_configure			= aac_sdev_configure,
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+	.map_queues			= aac_map_queues,
+#endif
 	.change_queue_depth		= aac_change_queue_depth,
 	.sdev_groups			= aac_dev_groups,
 	.eh_abort_handler		= aac_eh_abort,
@@ -1777,7 +1791,11 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
 	shost->max_lun = AAC_MAX_LUN;
 
 	pci_set_drvdata(pdev, shost);
-
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+	shost->nr_hw_queues = aac->max_msix;
+	shost->can_queue = min_t(int, aac->vector_cap, shost->can_queue);
+	shost->host_tagset = 1;
+#endif
 	error = scsi_add_host(shost, &pdev->dev);
 	if (error)
 		goto out_deinit;
@@ -1908,6 +1926,9 @@ static void aac_remove_one(struct pci_dev *pdev)
 	struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
 
 	aac_cancel_rescan_worker(aac);
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+	aac->use_map_queue = false;
+#endif
 	scsi_remove_host(shost);
 
 	__aac_shutdown(aac);
diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 28115ed637e8..5881bce3ccfc 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -493,6 +493,12 @@ static int aac_src_deliver_message(struct fib *fib)
 #endif
 
 	u16 vector_no;
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+	struct scsi_cmnd *scmd;
+	u32 blk_tag;
+	struct Scsi_Host *shost = dev->scsi_host_ptr;
+	struct blk_mq_queue_map *qmap;
+#endif
 
 	atomic_inc(&q->numpending);
 
@@ -505,8 +511,28 @@ static int aac_src_deliver_message(struct fib *fib)
 		if ((dev->comm_interface == AAC_COMM_MESSAGE_TYPE3)
 			&& dev->sa_firmware)
 			vector_no = aac_get_vector(dev);
-		else
+		else {
+#ifdef CONFIG_SCSI_AACRAID_MULTIQ
+			if (!fib->vector_no || !fib->callback_data) {
+				if (shost && dev->use_map_queue) {
+					qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
+					vector_no = qmap->mq_map[raw_smp_processor_id()];
+				}
+				/*
+				 *	We hardcode the vector_no for reserved commands
+				 *	as a valid shost is absent during the init.
+				 */
+				else
+					vector_no = 0;
+			} else {
+				scmd = (struct scsi_cmnd *)fib->callback_data;
+				blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
+				vector_no = blk_mq_unique_tag_to_hwq(blk_tag);
+			}
+#else
 			vector_no = fib->vector_no;
+#endif
+		}
 
 		if (native_hba) {
 			if (fib->flags & FIB_CONTEXT_FLAG_NATIVE_HBA_TMF) {
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity
  2025-06-18 19:24 [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity John Meneghini
@ 2025-06-18 19:37 ` John Meneghini
  2025-06-19 10:46 ` John Garry
  1 sibling, 0 replies; 5+ messages in thread
From: John Meneghini @ 2025-06-18 19:37 UTC (permalink / raw)
  To: martin.petersen
  Cc: linux-doc, linux-kernel, thenzl, Scott.Benesh, Don.Brace,
	Tom.White, Abhinav.Kuchibhotla, sagar.biradar, mpatalan,
	James.Bottomley, linux-scsi, aacraid, corbet

Martin,

This is the AACRAID patch we talked about at LSFMM this spring.  This change has undergone extensive internal review and testing at Red Hat while working closely with Microchip. We'd really like to get this merged into v6.16 as a fix because the current upstream aacraid driver is badly broken when CPU offline is used.

I've based this patch on scsi/6.16/scsi-fixes.

Thanks,

/John

On 6/18/25 3:24 PM, John Meneghini wrote:
> From: Sagar Biradar <sagar.biradar@microchip.com>
> 
> From: Sagar Biradar <sagar.biradar@microchip.com>
> 
> This patch fixes a bug in the original path that caused I/O hangs. The
> I/O hangs were because of an MSIx vector not having a mapped online CPU
> upon receiving completion.
> 
> This patch enables Multi-Q support in the aacriad driver. Multi-Q support
> in the driver is needed to support CPU offlining.
> 
> SCSI cmds use the mq_map to get the vector_no via blk_mq_unique_tag()
> and blk_mq_unique_tag_to_hwq() - which are setup during the blk_mq init.
> For reserved cmds, or the ones before the blk_mq init, use the vector_no
> 0, which is the norm since don't yet have a proper mapping to the queues.
> 
> Note that this change can cause a drop in performance in some
> configurations. To address any concerns about performance the
> CONFIG_SCSI_AACRAID_MULTIQ option has been added.
> 
> The CONFIG_SCSI_AACRAID_MULTIQ option is on by default to ensure that
> CPU offlining with MultiQ support is enabled. To disable MultiQ support
> compile the kernel with CONFIG_SCSI_AACRAID_MULTIQ=N. Disabling MultiQ
> support should not be done if your application uses CPU offling.
> 
> Closes: https://lore.kernel.org/linux-scsi/20250130173314.608836-1-sagar.biradar@microchip.com/
> 
> Fixes: c5becf57dd56 ("Revert "scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity"")
> Signed-off-by: Sagar Biradar <sagar.biradar@microchip.com>
> [jmeneghi: replace aac_cpu_offline_feature with Kconfig option]
> Co-developed-by: John Meneghini <jmeneghi@redhat.com>
> Signed-off-by: John Meneghini <jmeneghi@redhat.com>
> Reviewed-by: Gilbert Wu <gilbert.wu@microchip.com>
> Reviewed-by: Tomas Henzl <thenzl@redhat.com>
> Tested-by: Marco Patalano <mpatalan@redhat.com>
> ---
>   Documentation/scsi/aacraid.rst | 11 +++++++++++
>   MAINTAINERS                    |  1 +
>   drivers/scsi/Kconfig           |  2 +-
>   drivers/scsi/aacraid/Kconfig   | 20 ++++++++++++++++++++
>   drivers/scsi/aacraid/aachba.c  |  1 -
>   drivers/scsi/aacraid/aacraid.h |  3 +++
>   drivers/scsi/aacraid/commsup.c | 16 ++++++++++++++--
>   drivers/scsi/aacraid/linit.c   | 23 ++++++++++++++++++++++-
>   drivers/scsi/aacraid/src.c     | 28 +++++++++++++++++++++++++++-
>   9 files changed, 99 insertions(+), 6 deletions(-)
>   create mode 100644 drivers/scsi/aacraid/Kconfig
> 
> diff --git a/Documentation/scsi/aacraid.rst b/Documentation/scsi/aacraid.rst
> index 1904674b94f3..0fc35edd0ac9 100644
> --- a/Documentation/scsi/aacraid.rst
> +++ b/Documentation/scsi/aacraid.rst
> @@ -129,6 +129,17 @@ Supported Cards/Chipsets
>   People
>   ======
>   
> +Sagar Biradar <Sagar.Biradar@microchip.com>
> +
> + - Added support for CPU offlining and updated the driver to support MultiQ.
> + - Introduced the option CONFIG_SCSI_AACRAID_MULTIQ to control this feature.
> +
> +By default, MultiQ support is disabled to provide optimal I/O performance.
> +
> +Enable CONFIG_SCSI_AACRAID_MULTIQ only if CPU offlining support is required,
> +as enabling it may result in reduced performance in some configurations.
> +Note : Disabling MultiQ while still offlining CPUs may lead to I/O hangs.
> +
>   Alan Cox <alan@lxorguk.ukuu.org.uk>
>   
>   Christoph Hellwig <hch@infradead.org>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a92290fffa16..996c90959caa 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -184,6 +184,7 @@ L:	linux-scsi@vger.kernel.org
>   S:	Supported
>   W:	http://www.adaptec.com/
>   F:	Documentation/scsi/aacraid.rst
> +F:	drivers/scsi/aacraid/Kconfig
>   F:	drivers/scsi/aacraid/
>   
>   AAEON UPBOARD FPGA MFD DRIVER
> diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
> index 5522310bab8d..a6739cd94db7 100644
> --- a/drivers/scsi/Kconfig
> +++ b/drivers/scsi/Kconfig
> @@ -456,7 +456,7 @@ config SCSI_AACRAID
>   	  To compile this driver as a module, choose M here: the module
>   	  will be called aacraid.
>   
> -
> +source "drivers/scsi/aacraid/Kconfig"
>   source "drivers/scsi/aic7xxx/Kconfig.aic7xxx"
>   source "drivers/scsi/aic7xxx/Kconfig.aic79xx"
>   source "drivers/scsi/aic94xx/Kconfig"
> diff --git a/drivers/scsi/aacraid/Kconfig b/drivers/scsi/aacraid/Kconfig
> new file mode 100644
> index 000000000000..c69e1a7a77d2
> --- /dev/null
> +++ b/drivers/scsi/aacraid/Kconfig
> @@ -0,0 +1,20 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Kernel configuration file for the aacraid driver.
> +#
> +# Copyright (c) 2025 Microchip Technology Inc. and its subsidiaries
> +#  (mailto:storagedev@microchip.com)
> +#
> +config SCSI_AACRAID_MULTIQ
> +	bool "AACRAID Multiq support"
> +	depends on SCSI_AACRAID
> +	default n
> +	help
> +	  This option enables MultiQ support in the aacraid driver.
> +
> +	  Enabling MultiQ allows the driver to safely handle CPU offlining.
> +	  However, it may cause a performance drop in certain configurations.
> +
> +	  Enable this option only if your system requires CPU offlining support.
> +
> +	  If unsure, say N.
> diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
> index 0be719f38377..db9bef348834 100644
> --- a/drivers/scsi/aacraid/aachba.c
> +++ b/drivers/scsi/aacraid/aachba.c
> @@ -328,7 +328,6 @@ MODULE_PARM_DESC(wwn, "Select a WWN type for the arrays:\n"
>   	"\t1 - Array Meta Data Signature (default)\n"
>   	"\t2 - Adapter Serial Number");
>   
> -
>   static inline int aac_valid_context(struct scsi_cmnd *scsicmd,
>   		struct fib *fibptr) {
>   	struct scsi_device *device;
> diff --git a/drivers/scsi/aacraid/aacraid.h b/drivers/scsi/aacraid/aacraid.h
> index 0a5888b53d6d..557f9fc50012 100644
> --- a/drivers/scsi/aacraid/aacraid.h
> +++ b/drivers/scsi/aacraid/aacraid.h
> @@ -1672,6 +1672,9 @@ struct aac_dev
>   	u32			handle_pci_error;
>   	bool			init_reset;
>   	u8			soft_reset_support;
> +#ifdef CONFIG_SCSI_AACRAID_MULTIQ
> +	u8			use_map_queue;
> +#endif
>   };
>   
>   #define aac_adapter_interrupt(dev) \
> diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c
> index 7d9a4dce236b..309a166ce6e4 100644
> --- a/drivers/scsi/aacraid/commsup.c
> +++ b/drivers/scsi/aacraid/commsup.c
> @@ -215,8 +215,17 @@ int aac_fib_setup(struct aac_dev * dev)
>   struct fib *aac_fib_alloc_tag(struct aac_dev *dev, struct scsi_cmnd *scmd)
>   {
>   	struct fib *fibptr;
> +#ifdef CONFIG_SCSI_AACRAID_MULTIQ
> +	u32 blk_tag;
> +	int i;
>   
> +	blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
> +	i = blk_mq_unique_tag_to_tag(blk_tag);
> +	fibptr = &dev->fibs[i];
> +#else
>   	fibptr = &dev->fibs[scsi_cmd_to_rq(scmd)->tag];
> +#endif
> +
>   	/*
>   	 *	Null out fields that depend on being zero at the start of
>   	 *	each I/O
> @@ -242,14 +251,17 @@ struct fib *aac_fib_alloc(struct aac_dev *dev)
>   {
>   	struct fib * fibptr;
>   	unsigned long flags;
> +
>   	spin_lock_irqsave(&dev->fib_lock, flags);
> +	/*  Management FIB allocation: use free list within reserved range */
>   	fibptr = dev->free_fib;
> -	if(!fibptr){
> +	if (!fibptr) {
>   		spin_unlock_irqrestore(&dev->fib_lock, flags);
> -		return fibptr;
> +		return NULL;
>   	}
>   	dev->free_fib = fibptr->next;
>   	spin_unlock_irqrestore(&dev->fib_lock, flags);
> +
>   	/*
>   	 *	Set the proper node type code and node byte size
>   	 */
> diff --git a/drivers/scsi/aacraid/linit.c b/drivers/scsi/aacraid/linit.c
> index 4b12e6dd8f07..8c56579a8efc 100644
> --- a/drivers/scsi/aacraid/linit.c
> +++ b/drivers/scsi/aacraid/linit.c
> @@ -506,6 +506,17 @@ static int aac_sdev_configure(struct scsi_device *sdev,
>   	return 0;
>   }
>   
> +#ifdef CONFIG_SCSI_AACRAID_MULTIQ
> +static void aac_map_queues(struct Scsi_Host *shost)
> +{
> +	struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
> +	struct blk_mq_queue_map *qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> +
> +	blk_mq_map_hw_queues(qmap, &aac->pdev->dev, 0);
> +	aac->use_map_queue = true;
> +}
> +#endif
> +
>   /**
>    *	aac_change_queue_depth		-	alter queue depths
>    *	@sdev:	SCSI device we are considering
> @@ -1490,6 +1501,9 @@ static const struct scsi_host_template aac_driver_template = {
>   	.bios_param			= aac_biosparm,
>   	.shost_groups			= aac_host_groups,
>   	.sdev_configure			= aac_sdev_configure,
> +#ifdef CONFIG_SCSI_AACRAID_MULTIQ
> +	.map_queues			= aac_map_queues,
> +#endif
>   	.change_queue_depth		= aac_change_queue_depth,
>   	.sdev_groups			= aac_dev_groups,
>   	.eh_abort_handler		= aac_eh_abort,
> @@ -1777,7 +1791,11 @@ static int aac_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
>   	shost->max_lun = AAC_MAX_LUN;
>   
>   	pci_set_drvdata(pdev, shost);
> -
> +#ifdef CONFIG_SCSI_AACRAID_MULTIQ
> +	shost->nr_hw_queues = aac->max_msix;
> +	shost->can_queue = min_t(int, aac->vector_cap, shost->can_queue);
> +	shost->host_tagset = 1;
> +#endif
>   	error = scsi_add_host(shost, &pdev->dev);
>   	if (error)
>   		goto out_deinit;
> @@ -1908,6 +1926,9 @@ static void aac_remove_one(struct pci_dev *pdev)
>   	struct aac_dev *aac = (struct aac_dev *)shost->hostdata;
>   
>   	aac_cancel_rescan_worker(aac);
> +#ifdef CONFIG_SCSI_AACRAID_MULTIQ
> +	aac->use_map_queue = false;
> +#endif
>   	scsi_remove_host(shost);
>   
>   	__aac_shutdown(aac);
> diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
> index 28115ed637e8..5881bce3ccfc 100644
> --- a/drivers/scsi/aacraid/src.c
> +++ b/drivers/scsi/aacraid/src.c
> @@ -493,6 +493,12 @@ static int aac_src_deliver_message(struct fib *fib)
>   #endif
>   
>   	u16 vector_no;
> +#ifdef CONFIG_SCSI_AACRAID_MULTIQ
> +	struct scsi_cmnd *scmd;
> +	u32 blk_tag;
> +	struct Scsi_Host *shost = dev->scsi_host_ptr;
> +	struct blk_mq_queue_map *qmap;
> +#endif
>   
>   	atomic_inc(&q->numpending);
>   
> @@ -505,8 +511,28 @@ static int aac_src_deliver_message(struct fib *fib)
>   		if ((dev->comm_interface == AAC_COMM_MESSAGE_TYPE3)
>   			&& dev->sa_firmware)
>   			vector_no = aac_get_vector(dev);
> -		else
> +		else {
> +#ifdef CONFIG_SCSI_AACRAID_MULTIQ
> +			if (!fib->vector_no || !fib->callback_data) {
> +				if (shost && dev->use_map_queue) {
> +					qmap = &shost->tag_set.map[HCTX_TYPE_DEFAULT];
> +					vector_no = qmap->mq_map[raw_smp_processor_id()];
> +				}
> +				/*
> +				 *	We hardcode the vector_no for reserved commands
> +				 *	as a valid shost is absent during the init.
> +				 */
> +				else
> +					vector_no = 0;
> +			} else {
> +				scmd = (struct scsi_cmnd *)fib->callback_data;
> +				blk_tag = blk_mq_unique_tag(scsi_cmd_to_rq(scmd));
> +				vector_no = blk_mq_unique_tag_to_hwq(blk_tag);
> +			}
> +#else
>   			vector_no = fib->vector_no;
> +#endif
> +		}
>   
>   		if (native_hba) {
>   			if (fib->flags & FIB_CONTEXT_FLAG_NATIVE_HBA_TMF) {


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity
  2025-06-18 19:24 [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity John Meneghini
  2025-06-18 19:37 ` John Meneghini
@ 2025-06-19 10:46 ` John Garry
  2025-07-10  6:49   ` Sagar.Biradar
  1 sibling, 1 reply; 5+ messages in thread
From: John Garry @ 2025-06-19 10:46 UTC (permalink / raw)
  To: John Meneghini, James.Bottomley, martin.petersen, linux-scsi,
	aacraid, corbet
  Cc: linux-doc, linux-kernel, thenzl, Scott.Benesh, Don.Brace,
	Tom.White, Abhinav.Kuchibhotla, sagar.biradar, mpatalan

On 18/06/2025 20:24, John Meneghini wrote:
> From: Sagar Biradar <sagar.biradar@microchip.com>
> 
> From: Sagar Biradar <sagar.biradar@microchip.com>
> 
> This patch fixes a bug in the original path that caused I/O hangs. The
> I/O hangs were because of an MSIx vector not having a mapped online CPU
> upon receiving completion.
> 
> This patch enables Multi-Q support in the aacriad driver. Multi-Q support
> in the driver is needed to support CPU offlining.

I assume that you mean "safe" CPU offlining.

It seems to me that in all cases we use queue interrupt affinity 
spreading and managed interrupts for MSIX, right?

See aac_define_int_mode() -> pci_alloc_irq_vectors(..., PCI_IRQ_MSIX | 
PCI_IRQ_AFFINITY);

But then for this non- Multi-Q support, the queue seems to be chosen 
based on a round-robin approach in the driver. That round-robin comes 
from how fib.vector_no is assigned in aac_fib_vector_assign(). If this 
is the case, then why are managed interrupts being used for this non- 
Multi-Q support at all?

I may be wrong about this. That driver is hard to understand with so 
many knobs.





^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity
  2025-06-19 10:46 ` John Garry
@ 2025-07-10  6:49   ` Sagar.Biradar
  2025-07-10  8:12     ` John Garry
  0 siblings, 1 reply; 5+ messages in thread
From: Sagar.Biradar @ 2025-07-10  6:49 UTC (permalink / raw)
  To: john.g.garry, jmeneghi, James.Bottomley, martin.petersen,
	linux-scsi, aacraid, corbet
  Cc: linux-doc, linux-kernel, thenzl, Scott.Benesh, Don.Brace,
	Tom.White, Abhinav.Kuchibhotla, mpatalan, Raja.VS



> -----Original Message-----
> From: John Garry <john.g.garry@oracle.com>
> Sent: Thursday, June 19, 2025 3:47 AM
> To: John Meneghini <jmeneghi@redhat.com>;
> James.Bottomley@HansenPartnership.com; martin.petersen@oracle.com;
> linux-scsi@vger.kernel.org; aacraid@microsemi.com; corbet@lwn.net
> Cc: linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org;
> thenzl@redhat.com; Scott Benesh - C33703 <Scott.Benesh@microchip.com>;
> Don Brace - C33706 <Don.Brace@microchip.com>; Tom White - C33503
> <Tom.White@microchip.com>; Abhinav Kuchibhotla - C70322
> <Abhinav.Kuchibhotla@microchip.com>; Sagar Biradar - C34249
> <Sagar.Biradar@microchip.com>; mpatalan@redhat.com
> Subject: Re: [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based
> on IRQ affinity
> 
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the
> content is safe
> 
> On 18/06/2025 20:24, John Meneghini wrote:
> > From: Sagar Biradar <sagar.biradar@microchip.com>
> >
> > From: Sagar Biradar <sagar.biradar@microchip.com>
> >
> > This patch fixes a bug in the original path that caused I/O hangs. The
> > I/O hangs were because of an MSIx vector not having a mapped online CPU
> > upon receiving completion.
> >
> > This patch enables Multi-Q support in the aacriad driver. Multi-Q support
> > in the driver is needed to support CPU offlining.
> 
> I assume that you mean "safe" CPU offlining.
> 
> It seems to me that in all cases we use queue interrupt affinity
> spreading and managed interrupts for MSIX, right?
> 
> See aac_define_int_mode() -> pci_alloc_irq_vectors(..., PCI_IRQ_MSIX |
> PCI_IRQ_AFFINITY);
> 
> But then for this non- Multi-Q support, the queue seems to be chosen
> based on a round-robin approach in the driver. That round-robin comes
> from how fib.vector_no is assigned in aac_fib_vector_assign(). If this
> is the case, then why are managed interrupts being used for this non-
> Multi-Q support at all?
> 
> I may be wrong about this. That driver is hard to understand with so
> many knobs.
> 


Thank you very much for raising this — you're right that using PCI_IRQ_AFFINITY in non-MultiQ mode doesn’t offer real value, since the driver doesn’t utilize the affinity mapping.
That said, the current implementation is functionally correct and consistent with the driver’s historical behavior. 
To keep the patch focused and avoid scope creep, I’d prefer to leave the affinity flag logic as is for now.

I’d be happy to follow up with a small cleanup patch, sometime in future, to improve this if you think it would help.

Thanks

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity
  2025-07-10  6:49   ` Sagar.Biradar
@ 2025-07-10  8:12     ` John Garry
  0 siblings, 0 replies; 5+ messages in thread
From: John Garry @ 2025-07-10  8:12 UTC (permalink / raw)
  To: Sagar.Biradar, jmeneghi, James.Bottomley, martin.petersen,
	linux-scsi, aacraid, corbet, hch, dwagner
  Cc: linux-doc, linux-kernel, thenzl, Scott.Benesh, Don.Brace,
	Tom.White, Abhinav.Kuchibhotla, mpatalan, Raja.VS

On 10/07/2025 07:49, Sagar.Biradar@microchip.com wrote:

+ Daniel, hch

>>> This patch fixes a bug in the original path that caused I/O hangs. The
>>> I/O hangs were because of an MSIx vector not having a mapped online CPU
>>> upon receiving completion.
>>>
>>> This patch enables Multi-Q support in the aacriad driver. Multi-Q support
>>> in the driver is needed to support CPU offlining.
>> I assume that you mean "safe" CPU offlining.
>>
>> It seems to me that in all cases we use queue interrupt affinity
>> spreading and managed interrupts for MSIX, right?
>>
>> See aac_define_int_mode() -> pci_alloc_irq_vectors(..., PCI_IRQ_MSIX |
>> PCI_IRQ_AFFINITY);
>>
>> But then for this non- Multi-Q support, the queue seems to be chosen
>> based on a round-robin approach in the driver. That round-robin comes
>> from how fib.vector_no is assigned in aac_fib_vector_assign(). If this
>> is the case, then why are managed interrupts being used for this non-
>> Multi-Q support at all?
>>
>> I may be wrong about this. That driver is hard to understand with so
>> many knobs.
>>
> 
> Thank you very much for raising this — you're right that using PCI_IRQ_AFFINITY in non-MultiQ mode doesn’t offer real value,
 > since the driver doesn’t utilize the affinity mapping.> That said, 
the current implementation is functionally correct and consistent with 
the driver’s historical behavior.

For some time now this driver has been having issues in this area, so it 
is strange to say that behavior is correct.

Indeed, this patch is trying to fix the broken behaviour Re. CPU 
hotplug, right?

> To keep the patch focused and avoid scope creep, I’d prefer to leave the affinity flag logic as is for now.

To me, first it would be good to stop using PCI_IRQ_AFFINITY - that 
should fix any broken behaviour regarding CPU hotplug.

Then, if you still want to use PCI_IRQ_AFFINITY, then do it like it is 
done in this patch.

> 
> I’d be happy to follow up with a small cleanup patch, sometime in future, to improve this if you think it would help.

Daniel (cc'ed) is working on a method to isolate CPUs so that CPUs using 
for queue mapping can be configured for tuning performance, so that you 
don't need Kconfig options like in this patch.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-07-10  8:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-18 19:24 [PATCH v3] scsi: aacraid: Fix reply queue mapping to CPUs based on IRQ affinity John Meneghini
2025-06-18 19:37 ` John Meneghini
2025-06-19 10:46 ` John Garry
2025-07-10  6:49   ` Sagar.Biradar
2025-07-10  8:12     ` John Garry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).