linux-pci.vger.kernel.org archive mirror
* [PATCH 0/2] Allow dyn pci vector allocation of MANA
@ 2025-04-16 15:35 Shradha Gupta
  2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta
  2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta
  0 siblings, 2 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-16 15:35 UTC (permalink / raw)
  To: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li,
	Thomas Gleixner, Bjorn Helgaas, Rob Herring,
	Manivannan Sadhasivam, Krzysztof Wilczyński,
	Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang,
	K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela,
	Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm
  Cc: Shradha Gupta

In this patchset we enable the MANA driver to allocate PCI MSI-X vectors
dynamically.

The first patch contains the changes required to enable support for
dynamic vector allocation in the pci-hyperv PCI controller. It also
includes changes in the PCI tree to allow PCI controllers to cleanly
support dynamic allocation of IRQ vectors.

The second patch has the changes in the MANA driver to allocate PCI
vectors dynamically when the infrastructure supports it. If that support
does not exist, the driver falls back to the older behavior.
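
For illustration, the split between the two paths looks roughly like the
following sketch against the generic PCI/MSI-X API (this is not code from
the patches themselves; hwc_query_max_vectors() and MAX_QUEUE_IRQS are
placeholders invented for the sketch):

#include <linux/pci.h>
#include <linux/msi.h>

#define MAX_QUEUE_IRQS	64	/* placeholder upper bound for the sketch */

/* Placeholder for "ask the HWC how many vectors the device supports". */
static int hwc_query_max_vectors(struct pci_dev *pdev);

static int example_alloc_vectors(struct pci_dev *pdev)
{
	struct msi_map map;
	int nvec, i;

	if (!pci_msix_can_alloc_dyn(pdev)) {
		/* Older behavior: allocate everything up front. */
		nvec = pci_alloc_irq_vectors(pdev, 2, MAX_QUEUE_IRQS,
					     PCI_IRQ_MSIX);
		return nvec < 0 ? nvec : 0;
	}

	/* Dynamic path: one vector is enough to bring up the HWC. */
	nvec = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSIX);
	if (nvec < 0)
		return nvec;

	/* The HWC is created here and reports the supported vector count. */
	nvec = hwc_query_max_vectors(pdev);

	/* Add the remaining vectors one at a time at explicit indices. */
	for (i = 1; i < nvec; i++) {
		map = pci_msix_alloc_irq_at(pdev, i, NULL);
		if (!map.virq)
			return map.index;	/* negative error code */
	}

	return 0;
}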

For this submission, I wasn't sure whether it should go through the
net-next or the PCI tree. Please let me know what the recommendation is.

Shradha Gupta (2):
  PCI: hv: enable pci_hyperv to allow dynamic vector allocation
  net: mana: Allow MANA driver to allocate PCI vector dynamically

 .../net/ethernet/microsoft/mana/gdma_main.c   | 306 ++++++++++++++----
 drivers/pci/controller/pci-hyperv.c           |   7 +-
 drivers/pci/msi/irqdomain.c                   |   5 +-
 include/linux/msi.h                           |   2 +
 include/net/mana/gdma.h                       |   5 +-
 5 files changed, 260 insertions(+), 65 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation
  2025-04-16 15:35 [PATCH 0/2] Allow dyn pci vector allocation of MANA Shradha Gupta
@ 2025-04-16 15:36 ` Shradha Gupta
  2025-04-16 18:30   ` Bjorn Helgaas
  2025-04-17 10:00   ` Thomas Gleixner
  2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta
  1 sibling, 2 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-16 15:36 UTC (permalink / raw)
  To: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li,
	Thomas Gleixner, Bjorn Helgaas, Rob Herring,
	Manivannan Sadhasivam, Krzysztof Wilczyński,
	Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang,
	K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela,
	Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm
  Cc: Shradha Gupta

For supporting dynamic MSI vector allocation by pci controllers, enabling
the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc()
to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci
controllers like pci-hyperv to support dynamic MSI vector allocation.
Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use
pci_msix_prepare_desc() in pci_hyperv controller

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 7 +++++--
 drivers/pci/msi/irqdomain.c         | 5 +++--
 include/linux/msi.h                 | 2 ++
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index ac27bda5ba26..f2fa6bdb6bb8 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -598,7 +598,8 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data)
 	return cfg->vector;
 }
 
-#define hv_msi_prepare		pci_msi_prepare
+#define hv_msi_prepare			pci_msi_prepare
+#define hv_msix_prepare_desc		pci_msix_prepare_desc
 
 /**
  * hv_arch_irq_unmask() - "Unmask" the IRQ by setting its current
@@ -727,6 +728,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 #define FLOW_HANDLER		NULL
 #define FLOW_NAME		NULL
 #define hv_msi_prepare		NULL
+#define hv_msix_prepare_desc	NULL
 
 struct hv_pci_chip_data {
 	DECLARE_BITMAP(spi_map, HV_PCI_MSI_SPI_NR);
@@ -2063,6 +2065,7 @@ static struct irq_chip hv_msi_irq_chip = {
 static struct msi_domain_ops hv_msi_ops = {
 	.msi_prepare	= hv_msi_prepare,
 	.msi_free	= hv_msi_free,
+	.prepare_desc	= hv_msix_prepare_desc,
 };
 
 /**
@@ -2084,7 +2087,7 @@ static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus)
 	hbus->msi_info.ops = &hv_msi_ops;
 	hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS |
 		MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI |
-		MSI_FLAG_PCI_MSIX);
+		MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN);
 	hbus->msi_info.handler = FLOW_HANDLER;
 	hbus->msi_info.handler_name = FLOW_NAME;
 	hbus->msi_info.data = hbus;
diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index d7ba8795d60f..43129aa6d6c7 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -222,13 +222,14 @@ static void pci_irq_unmask_msix(struct irq_data *data)
 	pci_msix_unmask(irq_data_get_msi_desc(data));
 }
 
-static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
-				  struct msi_desc *desc)
+void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
+			   struct msi_desc *desc)
 {
 	/* Don't fiddle with preallocated MSI descriptors */
 	if (!desc->pci.mask_base)
 		msix_prepare_msi_desc(to_pci_dev(desc->dev), desc);
 }
+EXPORT_SYMBOL_GPL(pci_msix_prepare_desc);
 
 static const struct msi_domain_template pci_msix_template = {
 	.chip = {
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 86e42742fd0f..d5864d5e75c2 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -691,6 +691,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 					     struct irq_domain *parent);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
+void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
+			   struct msi_desc *desc);
 #else /* CONFIG_PCI_MSI */
 static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-16 15:35 [PATCH 0/2] Allow dyn pci vector allocation of MANA Shradha Gupta
  2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta
@ 2025-04-16 15:36 ` Shradha Gupta
  2025-04-16 17:22   ` Yury Norov
                     ` (3 more replies)
  1 sibling, 4 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-16 15:36 UTC (permalink / raw)
  To: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li,
	Thomas Gleixner, Bjorn Helgaas, Rob Herring,
	Manivannan Sadhasivam, Krzysztof Wilczyński,
	Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang,
	K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela,
	Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm
  Cc: Shradha Gupta

Currently, the MANA driver allocates pci vector statically based on
MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
up allocating more vectors than it needs.
This is because, by this time we do not have a HW channel and do not know
how many IRQs should be allocated.
To avoid this, we allocate 1 IRQ vector during the creation of HWC and
after getting the value supported by hardware, dynamically add the
remaining vectors.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 306 ++++++++++++++----
 include/net/mana/gdma.h                       |   5 +-
 2 files changed, 250 insertions(+), 61 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 4ffaf7588885..3e3b5854b736 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -6,6 +6,9 @@
 #include <linux/pci.h>
 #include <linux/utsname.h>
 #include <linux/version.h>
+#include <linux/msi.h>
+#include <linux/irqdomain.h>
+#include <linux/list.h>
 
 #include <net/mana/mana.h>
 
@@ -80,8 +83,15 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
 		return err ? err : -EPROTO;
 	}
 
-	if (gc->num_msix_usable > resp.max_msix)
-		gc->num_msix_usable = resp.max_msix;
+	if (!pci_msix_can_alloc_dyn(pdev)) {
+		if (gc->num_msix_usable > resp.max_msix)
+			gc->num_msix_usable = resp.max_msix;
+	} else {
+		/* If dynamic allocation is enabled we have already allocated
+		 * hwc msi
+		 */
+		gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1);
+	}
 
 	if (gc->num_msix_usable <= 1)
 		return -ENOSPC;
@@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
 	struct gdma_irq_context *gic;
 	struct gdma_context *gc;
 	unsigned int msi_index;
-	unsigned long flags;
+	struct list_head *pos;
+	unsigned long flags, flag_irq;
 	struct device *dev;
-	int err = 0;
+	int err = 0, count;
 
 	gc = gd->gdma_context;
 	dev = gc->dev;
@@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
 	}
 
 	queue->eq.msix_index = msi_index;
-	gic = &gc->irq_contexts[msi_index];
+
+	/* get the msi_index value from the list*/
+	count = 0;
+	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
+	list_for_each(pos, &gc->irq_contexts) {
+		if (count == msi_index) {
+			gic = list_entry(pos, struct gdma_irq_context, gic_list);
+			break;
+		}
+
+		count++;
+	}
+	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
+
+	if (!gic)
+		return -1;
 
 	spin_lock_irqsave(&gic->lock, flags);
 	list_add_rcu(&queue->entry, &gic->eq_list);
@@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
 	struct gdma_irq_context *gic;
 	struct gdma_context *gc;
 	unsigned int msix_index;
-	unsigned long flags;
+	struct list_head *pos;
+	unsigned long flags, flag_irq;
 	struct gdma_queue *eq;
+	int count;
 
 	gc = gd->gdma_context;
 
@@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
 	if (WARN_ON(msix_index >= gc->num_msix_usable))
 		return;
 
-	gic = &gc->irq_contexts[msix_index];
+	/* get the msi_index value from the list*/
+	count = 0;
+	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
+	list_for_each(pos, &gc->irq_contexts) {
+		if (count == msix_index) {
+			gic = list_entry(pos, struct gdma_irq_context, gic_list);
+			break;
+		}
+
+		count++;
+	}
+	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
+
+	if (!gic)
+		return;
+
 	spin_lock_irqsave(&gic->lock, flags);
 	list_for_each_entry_rcu(eq, &gic->eq_list, entry) {
 		if (queue == eq) {
@@ -1288,11 +1331,11 @@ void mana_gd_free_res_map(struct gdma_resource *r)
 	r->size = 0;
 }
 
-static int irq_setup(unsigned int *irqs, unsigned int len, int node)
+static int irq_setup(unsigned int *irqs, unsigned int len, int node, int skip_cpu)
 {
 	const struct cpumask *next, *prev = cpu_none_mask;
 	cpumask_var_t cpus __free(free_cpumask_var);
-	int cpu, weight;
+	int cpu, weight, i = 0;
 
 	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
 		return -ENOMEM;
@@ -1303,9 +1346,21 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
 		while (weight > 0) {
 			cpumask_andnot(cpus, next, prev);
 			for_each_cpu(cpu, cpus) {
+				/* If the call is made for irqs which are dynamically
+				 * added and the num of vcpus is more or equal to
+				 * allocated msix, we need to skip the first
+				 * set of cpus, since they are already affinitized
+				 * to HWC IRQ
+				 */
+				if (skip_cpu && !i) {
+					i = 1;
+					goto next_cpumask;
+				}
 				if (len-- == 0)
 					goto done;
+
 				irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
+next_cpumask:
 				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
 				--weight;
 			}
@@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
 	return 0;
 }
 
-static int mana_gd_setup_irqs(struct pci_dev *pdev)
+static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 {
 	struct gdma_context *gc = pci_get_drvdata(pdev);
-	unsigned int max_queues_per_port;
 	struct gdma_irq_context *gic;
-	unsigned int max_irqs, cpu;
-	int start_irq_index = 1;
-	int nvec, *irqs, irq;
+	int *irqs, irq, skip_first_cpu = 0;
+	unsigned long flags;
 	int err, i = 0, j;
 
 	cpus_read_lock();
-	max_queues_per_port = num_online_cpus();
-	if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
-		max_queues_per_port = MANA_MAX_NUM_QUEUES;
+	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
+	irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL);
+	if (!irqs) {
+		err = -ENOMEM;
+		goto free_irq_vector;
+	}
 
-	/* Need 1 interrupt for the Hardware communication Channel (HWC) */
-	max_irqs = max_queues_per_port + 1;
+	for (i = 0; i < nvec; i++) {
+		gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL);
+		if (!gic) {
+			err = -ENOMEM;
+			goto free_irq;
+		}
+		gic->handler = mana_gd_process_eq_events;
+		INIT_LIST_HEAD(&gic->eq_list);
+		spin_lock_init(&gic->lock);
 
-	nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX);
-	if (nvec < 0) {
-		cpus_read_unlock();
-		return nvec;
+		snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
+			 i, pci_name(pdev));
+
+		/* one pci vector is already allocated for HWC */
+		irqs[i] = pci_irq_vector(pdev, i + 1);
+		if (irqs[i] < 0) {
+			err = irqs[i];
+			goto free_current_gic;
+		}
+
+		err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic);
+		if (err)
+			goto free_current_gic;
+
+		list_add_tail(&gic->gic_list, &gc->irq_contexts);
+	}
+
+	if (gc->num_msix_usable <= num_online_cpus())
+		skip_first_cpu = 1;
+
+	err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu);
+	if (err)
+		goto free_irq;
+
+	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
+	cpus_read_unlock();
+	kfree(irqs);
+	return 0;
+
+free_current_gic:
+	kfree(gic);
+free_irq:
+	for (j = i - 1; j >= 0; j--) {
+		irq = pci_irq_vector(pdev, j + 1);
+		gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list);
+		irq_update_affinity_hint(irq, NULL);
+		free_irq(irq, gic);
+		list_del(&gic->gic_list);
+		kfree(gic);
 	}
+	kfree(irqs);
+free_irq_vector:
+	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
+	cpus_read_unlock();
+	return err;
+}
+
+static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
+{
+	struct gdma_context *gc = pci_get_drvdata(pdev);
+	struct gdma_irq_context *gic;
+	int start_irq_index = 1;
+	unsigned long flags;
+	unsigned int cpu;
+	int *irqs, irq;
+	int err, i = 0, j;
+
+	cpus_read_lock();
+	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
+
 	if (nvec <= num_online_cpus())
 		start_irq_index = 0;
 
@@ -1349,15 +1467,12 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
 		goto free_irq_vector;
 	}
 
-	gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context),
-				   GFP_KERNEL);
-	if (!gc->irq_contexts) {
-		err = -ENOMEM;
-		goto free_irq_array;
-	}
-
 	for (i = 0; i < nvec; i++) {
-		gic = &gc->irq_contexts[i];
+		gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL);
+		if (!gic) {
+			err = -ENOMEM;
+			goto free_irq;
+		}
 		gic->handler = mana_gd_process_eq_events;
 		INIT_LIST_HEAD(&gic->eq_list);
 		spin_lock_init(&gic->lock);
@@ -1372,22 +1487,14 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
 		irq = pci_irq_vector(pdev, i);
 		if (irq < 0) {
 			err = irq;
-			goto free_irq;
+			goto free_current_gic;
 		}
 
 		if (!i) {
 			err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
 			if (err)
-				goto free_irq;
-
-			/* If number of IRQ is one extra than number of online CPUs,
-			 * then we need to assign IRQ0 (hwc irq) and IRQ1 to
-			 * same CPU.
-			 * Else we will use different CPUs for IRQ0 and IRQ1.
-			 * Also we are using cpumask_local_spread instead of
-			 * cpumask_first for the node, because the node can be
-			 * mem only.
-			 */
+				goto free_current_gic;
+
 			if (start_irq_index) {
 				cpu = cpumask_local_spread(i, gc->numa_node);
 				irq_set_affinity_and_hint(irq, cpumask_of(cpu));
@@ -1399,36 +1506,104 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
 			err = request_irq(irqs[i - start_irq_index], mana_gd_intr, 0,
 					  gic->name, gic);
 			if (err)
-				goto free_irq;
+				goto free_current_gic;
 		}
+
+		list_add_tail(&gic->gic_list, &gc->irq_contexts);
 	}
 
-	err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node);
+	err = irq_setup(irqs, nvec - start_irq_index, gc->numa_node, 0);
 	if (err)
 		goto free_irq;
 
-	gc->max_num_msix = nvec;
-	gc->num_msix_usable = nvec;
+	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
 	cpus_read_unlock();
 	kfree(irqs);
 	return 0;
 
+free_current_gic:
+	kfree(gic);
 free_irq:
 	for (j = i - 1; j >= 0; j--) {
 		irq = pci_irq_vector(pdev, j);
-		gic = &gc->irq_contexts[j];
-
+		gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list);
 		irq_update_affinity_hint(irq, NULL);
 		free_irq(irq, gic);
+		list_del(&gic->gic_list);
+		kfree(gic);
 	}
-
-	kfree(gc->irq_contexts);
-	gc->irq_contexts = NULL;
-free_irq_array:
 	kfree(irqs);
 free_irq_vector:
+	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
 	cpus_read_unlock();
-	pci_free_irq_vectors(pdev);
+	return err;
+}
+
+static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
+{
+	struct gdma_context *gc = pci_get_drvdata(pdev);
+	unsigned int max_irqs, min_irqs;
+	int max_queues_per_port;
+	int nvec, err;
+
+	if (pci_msix_can_alloc_dyn(pdev)) {
+		max_irqs = 1;
+		min_irqs = 1;
+	} else {
+		max_queues_per_port = num_online_cpus();
+		if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
+			max_queues_per_port = MANA_MAX_NUM_QUEUES;
+		/* Need 1 interrupt for the Hardware communication Channel (HWC) */
+		max_irqs = max_queues_per_port + 1;
+		min_irqs = 2;
+	}
+
+	nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX);
+	if (nvec < 0)
+		return nvec;
+
+	err = mana_gd_setup_irqs(pdev, nvec);
+	if (err) {
+		pci_free_irq_vectors(pdev);
+		return err;
+	}
+
+	gc->num_msix_usable = nvec;
+	gc->max_num_msix = nvec;
+
+	return err;
+}
+
+static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev)
+{
+	struct gdma_context *gc = pci_get_drvdata(pdev);
+	int max_irqs, i, err = 0;
+	struct msi_map irq_map;
+
+	if (!pci_msix_can_alloc_dyn(pdev))
+		/* remain irqs are already allocated with HWC IRQ */
+		return 0;
+
+	/* allocate only remaining IRQs*/
+	max_irqs = gc->num_msix_usable - 1;
+
+	for (i = 1; i <= max_irqs; i++) {
+		irq_map = pci_msix_alloc_irq_at(pdev, i, NULL);
+		if (!irq_map.virq) {
+			err = irq_map.index;
+			/* caller will handle cleaning up all allocated
+			 * irqs, after HWC is destroyed
+			 */
+			return err;
+		}
+	}
+
+	err = mana_gd_setup_dyn_irqs(pdev, max_irqs);
+	if (err)
+		return err;
+
+	gc->max_num_msix = gc->max_num_msix + max_irqs;
+
 	return err;
 }
 
@@ -1436,29 +1611,34 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev)
 {
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 	struct gdma_irq_context *gic;
-	int irq, i;
+	struct list_head *pos, *n;
+	unsigned long flags;
+	int irq, i = 0;
 
 	if (gc->max_num_msix < 1)
 		return;
 
-	for (i = 0; i < gc->max_num_msix; i++) {
+	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
+	list_for_each_safe(pos, n, &gc->irq_contexts) {
 		irq = pci_irq_vector(pdev, i);
 		if (irq < 0)
 			continue;
 
-		gic = &gc->irq_contexts[i];
+		gic = list_entry(pos, struct gdma_irq_context, gic_list);
 
 		/* Need to clear the hint before free_irq */
 		irq_update_affinity_hint(irq, NULL);
 		free_irq(irq, gic);
+		list_del(pos);
+		kfree(gic);
+		i++;
 	}
+	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
 
 	pci_free_irq_vectors(pdev);
 
 	gc->max_num_msix = 0;
 	gc->num_msix_usable = 0;
-	kfree(gc->irq_contexts);
-	gc->irq_contexts = NULL;
 }
 
 static int mana_gd_setup(struct pci_dev *pdev)
@@ -1469,9 +1649,9 @@ static int mana_gd_setup(struct pci_dev *pdev)
 	mana_gd_init_registers(pdev);
 	mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base);
 
-	err = mana_gd_setup_irqs(pdev);
+	err = mana_gd_setup_hwc_irqs(pdev);
 	if (err) {
-		dev_err(gc->dev, "Failed to setup IRQs: %d\n", err);
+		dev_err(gc->dev, "Failed to setup IRQs for HWC creation: %d\n", err);
 		return err;
 	}
 
@@ -1487,6 +1667,10 @@ static int mana_gd_setup(struct pci_dev *pdev)
 	if (err)
 		goto destroy_hwc;
 
+	err = mana_gd_setup_remaining_irqs(pdev);
+	if (err)
+		goto destroy_hwc;
+
 	err = mana_gd_detect_devices(pdev);
 	if (err)
 		goto destroy_hwc;
@@ -1563,6 +1747,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	gc->is_pf = mana_is_pf(pdev->device);
 	gc->bar0_va = bar0_va;
 	gc->dev = &pdev->dev;
+	INIT_LIST_HEAD(&gc->irq_contexts);
+	spin_lock_init(&gc->irq_ctxs_lock);
 
 	if (gc->is_pf)
 		gc->mana_pci_debugfs = debugfs_create_dir("0", mana_debugfs_root);
diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
index 228603bf03f2..eae38d7302fe 100644
--- a/include/net/mana/gdma.h
+++ b/include/net/mana/gdma.h
@@ -363,6 +363,7 @@ struct gdma_irq_context {
 	spinlock_t lock;
 	struct list_head eq_list;
 	char name[MANA_IRQ_NAME_SZ];
+	struct list_head gic_list;
 };
 
 struct gdma_context {
@@ -373,7 +374,9 @@ struct gdma_context {
 	unsigned int		max_num_queues;
 	unsigned int		max_num_msix;
 	unsigned int		num_msix_usable;
-	struct gdma_irq_context	*irq_contexts;
+	struct list_head	irq_contexts;
+	/* Protect the irq_contexts list */
+	spinlock_t		irq_ctxs_lock;
 
 	/* L2 MTU */
 	u16 adapter_mtu;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta
@ 2025-04-16 17:22   ` Yury Norov
  2025-04-17  7:32     ` Shradha Gupta
  2025-04-22 12:09     ` Shradha Gupta
  2025-04-16 18:32   ` Bjorn Helgaas
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Yury Norov @ 2025-04-16 17:22 UTC (permalink / raw)
  To: Shradha Gupta
  Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Leon Romanovsky, Maxim Levitsky,
	Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma,
	Paul Rosswurm, Shradha Gupta

On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote:
> Currently, the MANA driver allocates pci vector statically based on
> MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
> up allocating more vectors than it needs.
> This is because, by this time we do not have a HW channel and do not know
> how many IRQs should be allocated.
> To avoid this, we allocate 1 IRQ vector during the creation of HWC and
> after getting the value supported by hardware, dynamically add the
> remaining vectors.
> 
> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
>  .../net/ethernet/microsoft/mana/gdma_main.c   | 306 ++++++++++++++----
>  include/net/mana/gdma.h                       |   5 +-
>  2 files changed, 250 insertions(+), 61 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 4ffaf7588885..3e3b5854b736 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -6,6 +6,9 @@
>  #include <linux/pci.h>
>  #include <linux/utsname.h>
>  #include <linux/version.h>
> +#include <linux/msi.h>
> +#include <linux/irqdomain.h>
> +#include <linux/list.h>
>  
>  #include <net/mana/mana.h>
>  
> @@ -80,8 +83,15 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
>  		return err ? err : -EPROTO;
>  	}
>  
> -	if (gc->num_msix_usable > resp.max_msix)
> -		gc->num_msix_usable = resp.max_msix;
> +	if (!pci_msix_can_alloc_dyn(pdev)) {
> +		if (gc->num_msix_usable > resp.max_msix)
> +			gc->num_msix_usable = resp.max_msix;
> +	} else {
> +		/* If dynamic allocation is enabled we have already allocated
> +		 * hwc msi
> +		 */
> +		gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1);
> +	}
>  
>  	if (gc->num_msix_usable <= 1)
>  		return -ENOSPC;
> @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
>  	struct gdma_irq_context *gic;
>  	struct gdma_context *gc;
>  	unsigned int msi_index;
> -	unsigned long flags;
> +	struct list_head *pos;
> +	unsigned long flags, flag_irq;
>  	struct device *dev;
> -	int err = 0;
> +	int err = 0, count;
>  
>  	gc = gd->gdma_context;
>  	dev = gc->dev;
> @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
>  	}
>  
>  	queue->eq.msix_index = msi_index;
> -	gic = &gc->irq_contexts[msi_index];
> +
> +	/* get the msi_index value from the list*/
> +	count = 0;
> +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> +	list_for_each(pos, &gc->irq_contexts) {
> +		if (count == msi_index) {
> +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> +			break;
> +		}
> +
> +		count++;
> +	}
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> +
> +	if (!gic)
> +		return -1;
>  
>  	spin_lock_irqsave(&gic->lock, flags);
>  	list_add_rcu(&queue->entry, &gic->eq_list);
> @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
>  	struct gdma_irq_context *gic;
>  	struct gdma_context *gc;
>  	unsigned int msix_index;
> -	unsigned long flags;
> +	struct list_head *pos;
> +	unsigned long flags, flag_irq;
>  	struct gdma_queue *eq;
> +	int count;
>  
>  	gc = gd->gdma_context;
>  
> @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
>  	if (WARN_ON(msix_index >= gc->num_msix_usable))
>  		return;
>  
> -	gic = &gc->irq_contexts[msix_index];
> +	/* get the msi_index value from the list*/
> +	count = 0;
> +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> +	list_for_each(pos, &gc->irq_contexts) {
> +		if (count == msix_index) {
> +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> +			break;
> +		}
> +
> +		count++;
> +	}
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> +
> +	if (!gic)
> +		return;
> +
>  	spin_lock_irqsave(&gic->lock, flags);
>  	list_for_each_entry_rcu(eq, &gic->eq_list, entry) {
>  		if (queue == eq) {
> @@ -1288,11 +1331,11 @@ void mana_gd_free_res_map(struct gdma_resource *r)
>  	r->size = 0;
>  }
>  
> -static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> +static int irq_setup(unsigned int *irqs, unsigned int len, int node, int skip_cpu)
>  {
>  	const struct cpumask *next, *prev = cpu_none_mask;
>  	cpumask_var_t cpus __free(free_cpumask_var);
> -	int cpu, weight;
> +	int cpu, weight, i = 0;
>  
>  	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
>  		return -ENOMEM;
> @@ -1303,9 +1346,21 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
>  		while (weight > 0) {
>  			cpumask_andnot(cpus, next, prev);
>  			for_each_cpu(cpu, cpus) {
> +				/* If the call is made for irqs which are dynamically
> +				 * added and the num of vcpus is more or equal to
> +				 * allocated msix, we need to skip the first
> +				 * set of cpus, since they are already affinitized

Can you replace the 'set of cpus' with a 'sibling group'?

> +				 * to HWC IRQ
> +				 */

This comment should not be here. This is a helper function. User may
want to skip 1st CPU for whatever reason. Please put the comment in
mana_gd_setup_dyn_irqs().

> +				if (skip_cpu && !i) {
> +					i = 1;
> +					goto next_cpumask;
> +				}

The 'skip_cpu' variable should be a boolean, and should have a more
specific name. And you don't need the local 'i' to implement your logic:

        			if (unlikely(skip_first_cpu)) {
                                        skip_first_cpu = false;
        				goto next_sibling;
        			}

>  				if (len-- == 0)
>  					goto done;

This check should go before the one you're adding here.

> +
>  				irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> +next_cpumask:
>  				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
>  				--weight;
>  			}
> @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
>  	return 0;
>  }
>  
> -static int mana_gd_setup_irqs(struct pci_dev *pdev)
> +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
>  {
>  	struct gdma_context *gc = pci_get_drvdata(pdev);
> -	unsigned int max_queues_per_port;
>  	struct gdma_irq_context *gic;
> -	unsigned int max_irqs, cpu;
> -	int start_irq_index = 1;
> -	int nvec, *irqs, irq;
> +	int *irqs, irq, skip_first_cpu = 0;
> +	unsigned long flags;
>  	int err, i = 0, j;
>  
>  	cpus_read_lock();
> -	max_queues_per_port = num_online_cpus();
> -	if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> -		max_queues_per_port = MANA_MAX_NUM_QUEUES;
> +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> +	irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL);
> +	if (!irqs) {
> +		err = -ENOMEM;
> +		goto free_irq_vector;
> +	}
>  
> -	/* Need 1 interrupt for the Hardware communication Channel (HWC) */
> -	max_irqs = max_queues_per_port + 1;
> +	for (i = 0; i < nvec; i++) {
> +		gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL);
> +		if (!gic) {
> +			err = -ENOMEM;
> +			goto free_irq;
> +		}
> +		gic->handler = mana_gd_process_eq_events;
> +		INIT_LIST_HEAD(&gic->eq_list);
> +		spin_lock_init(&gic->lock);
>  
> -	nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX);
> -	if (nvec < 0) {
> -		cpus_read_unlock();
> -		return nvec;
> +		snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
> +			 i, pci_name(pdev));
> +
> +		/* one pci vector is already allocated for HWC */
> +		irqs[i] = pci_irq_vector(pdev, i + 1);
> +		if (irqs[i] < 0) {
> +			err = irqs[i];
> +			goto free_current_gic;
> +		}
> +
> +		err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic);
> +		if (err)
> +			goto free_current_gic;
> +
> +		list_add_tail(&gic->gic_list, &gc->irq_contexts);
> +	}
> +
> +	if (gc->num_msix_usable <= num_online_cpus())
> +		skip_first_cpu = 1;
> +
> +	err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu);
> +	if (err)
> +		goto free_irq;
> +
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> +	cpus_read_unlock();
> +	kfree(irqs);
> +	return 0;
> +
> +free_current_gic:
> +	kfree(gic);
> +free_irq:
> +	for (j = i - 1; j >= 0; j--) {
> +		irq = pci_irq_vector(pdev, j + 1);
> +		gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list);
> +		irq_update_affinity_hint(irq, NULL);
> +		free_irq(irq, gic);
> +		list_del(&gic->gic_list);
> +		kfree(gic);
>  	}
> +	kfree(irqs);
> +free_irq_vector:
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> +	cpus_read_unlock();
> +	return err;
> +}
> +
> +static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
> +{
> +	struct gdma_context *gc = pci_get_drvdata(pdev);
> +	struct gdma_irq_context *gic;
> +	int start_irq_index = 1;
> +	unsigned long flags;
> +	unsigned int cpu;
> +	int *irqs, irq;
> +	int err, i = 0, j;
> +
> +	cpus_read_lock();
> +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> +
>  	if (nvec <= num_online_cpus())
>  		start_irq_index = 0;
>  
> @@ -1349,15 +1467,12 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
>  		goto free_irq_vector;
>  	}
>  
> -	gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context),
> -				   GFP_KERNEL);
> -	if (!gc->irq_contexts) {
> -		err = -ENOMEM;
> -		goto free_irq_array;
> -	}
> -
>  	for (i = 0; i < nvec; i++) {
> -		gic = &gc->irq_contexts[i];
> +		gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL);
> +		if (!gic) {
> +			err = -ENOMEM;
> +			goto free_irq;
> +		}
>  		gic->handler = mana_gd_process_eq_events;
>  		INIT_LIST_HEAD(&gic->eq_list);
>  		spin_lock_init(&gic->lock);
> @@ -1372,22 +1487,14 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
>  		irq = pci_irq_vector(pdev, i);
>  		if (irq < 0) {
>  			err = irq;
> -			goto free_irq;
> +			goto free_current_gic;
>  		}
>  
>  		if (!i) {
>  			err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
>  			if (err)
> -				goto free_irq;
> -
> -			/* If number of IRQ is one extra than number of online CPUs,
> -			 * then we need to assign IRQ0 (hwc irq) and IRQ1 to
> -			 * same CPU.
> -			 * Else we will use different CPUs for IRQ0 and IRQ1.
> -			 * Also we are using cpumask_local_spread instead of
> -			 * cpumask_first for the node, because the node can be
> -			 * mem only.
> -			 */
> +				goto free_current_gic;
> +
>  			if (start_irq_index) {
>  				cpu = cpumask_local_spread(i, gc->numa_node);
>  				irq_set_affinity_and_hint(irq, cpumask_of(cpu));
> @@ -1399,36 +1506,104 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
>  			err = request_irq(irqs[i - start_irq_index], mana_gd_intr, 0,
>  					  gic->name, gic);
>  			if (err)
> -				goto free_irq;
> +				goto free_current_gic;
>  		}
> +
> +		list_add_tail(&gic->gic_list, &gc->irq_contexts);
>  	}
>  
> -	err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node);
> +	err = irq_setup(irqs, nvec - start_irq_index, gc->numa_node, 0);
>  	if (err)
>  		goto free_irq;
>  
> -	gc->max_num_msix = nvec;
> -	gc->num_msix_usable = nvec;
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
>  	cpus_read_unlock();
>  	kfree(irqs);
>  	return 0;
>  
> +free_current_gic:
> +	kfree(gic);
>  free_irq:
>  	for (j = i - 1; j >= 0; j--) {
>  		irq = pci_irq_vector(pdev, j);
> -		gic = &gc->irq_contexts[j];
> -
> +		gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list);
>  		irq_update_affinity_hint(irq, NULL);
>  		free_irq(irq, gic);
> +		list_del(&gic->gic_list);
> +		kfree(gic);
>  	}
> -
> -	kfree(gc->irq_contexts);
> -	gc->irq_contexts = NULL;
> -free_irq_array:
>  	kfree(irqs);
>  free_irq_vector:
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
>  	cpus_read_unlock();
> -	pci_free_irq_vectors(pdev);
> +	return err;
> +}
> +
> +static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
> +{
> +	struct gdma_context *gc = pci_get_drvdata(pdev);
> +	unsigned int max_irqs, min_irqs;
> +	int max_queues_per_port;
> +	int nvec, err;
> +
> +	if (pci_msix_can_alloc_dyn(pdev)) {
> +		max_irqs = 1;
> +		min_irqs = 1;
> +	} else {
> +		max_queues_per_port = num_online_cpus();
> +		if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> +			max_queues_per_port = MANA_MAX_NUM_QUEUES;
> +		/* Need 1 interrupt for the Hardware communication Channel (HWC) */
> +		max_irqs = max_queues_per_port + 1;
> +		min_irqs = 2;
> +	}
> +
> +	nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX);
> +	if (nvec < 0)
> +		return nvec;
> +
> +	err = mana_gd_setup_irqs(pdev, nvec);
> +	if (err) {
> +		pci_free_irq_vectors(pdev);
> +		return err;
> +	}
> +
> +	gc->num_msix_usable = nvec;
> +	gc->max_num_msix = nvec;
> +
> +	return err;
> +}
> +
> +static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev)
> +{
> +	struct gdma_context *gc = pci_get_drvdata(pdev);
> +	int max_irqs, i, err = 0;
> +	struct msi_map irq_map;
> +
> +	if (!pci_msix_can_alloc_dyn(pdev))
> +		/* remain irqs are already allocated with HWC IRQ */
> +		return 0;
> +
> +	/* allocate only remaining IRQs*/
> +	max_irqs = gc->num_msix_usable - 1;
> +
> +	for (i = 1; i <= max_irqs; i++) {
> +		irq_map = pci_msix_alloc_irq_at(pdev, i, NULL);
> +		if (!irq_map.virq) {
> +			err = irq_map.index;
> +			/* caller will handle cleaning up all allocated
> +			 * irqs, after HWC is destroyed
> +			 */
> +			return err;
> +		}
> +	}
> +
> +	err = mana_gd_setup_dyn_irqs(pdev, max_irqs);
> +	if (err)
> +		return err;
> +
> +	gc->max_num_msix = gc->max_num_msix + max_irqs;
> +
>  	return err;
>  }
>  
> @@ -1436,29 +1611,34 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev)
>  {
>  	struct gdma_context *gc = pci_get_drvdata(pdev);
>  	struct gdma_irq_context *gic;
> -	int irq, i;
> +	struct list_head *pos, *n;
> +	unsigned long flags;
> +	int irq, i = 0;
>  
>  	if (gc->max_num_msix < 1)
>  		return;
>  
> -	for (i = 0; i < gc->max_num_msix; i++) {
> +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> +	list_for_each_safe(pos, n, &gc->irq_contexts) {
>  		irq = pci_irq_vector(pdev, i);
>  		if (irq < 0)
>  			continue;
>  
> -		gic = &gc->irq_contexts[i];
> +		gic = list_entry(pos, struct gdma_irq_context, gic_list);
>  
>  		/* Need to clear the hint before free_irq */
>  		irq_update_affinity_hint(irq, NULL);
>  		free_irq(irq, gic);
> +		list_del(pos);
> +		kfree(gic);
> +		i++;
>  	}
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
>  
>  	pci_free_irq_vectors(pdev);
>  
>  	gc->max_num_msix = 0;
>  	gc->num_msix_usable = 0;
> -	kfree(gc->irq_contexts);
> -	gc->irq_contexts = NULL;
>  }
>  
>  static int mana_gd_setup(struct pci_dev *pdev)
> @@ -1469,9 +1649,9 @@ static int mana_gd_setup(struct pci_dev *pdev)
>  	mana_gd_init_registers(pdev);
>  	mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base);
>  
> -	err = mana_gd_setup_irqs(pdev);
> +	err = mana_gd_setup_hwc_irqs(pdev);
>  	if (err) {
> -		dev_err(gc->dev, "Failed to setup IRQs: %d\n", err);
> +		dev_err(gc->dev, "Failed to setup IRQs for HWC creation: %d\n", err);
>  		return err;
>  	}
>  
> @@ -1487,6 +1667,10 @@ static int mana_gd_setup(struct pci_dev *pdev)
>  	if (err)
>  		goto destroy_hwc;
>  
> +	err = mana_gd_setup_remaining_irqs(pdev);
> +	if (err)
> +		goto destroy_hwc;
> +
>  	err = mana_gd_detect_devices(pdev);
>  	if (err)
>  		goto destroy_hwc;
> @@ -1563,6 +1747,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	gc->is_pf = mana_is_pf(pdev->device);
>  	gc->bar0_va = bar0_va;
>  	gc->dev = &pdev->dev;
> +	INIT_LIST_HEAD(&gc->irq_contexts);
> +	spin_lock_init(&gc->irq_ctxs_lock);
>  
>  	if (gc->is_pf)
>  		gc->mana_pci_debugfs = debugfs_create_dir("0", mana_debugfs_root);
> diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> index 228603bf03f2..eae38d7302fe 100644
> --- a/include/net/mana/gdma.h
> +++ b/include/net/mana/gdma.h
> @@ -363,6 +363,7 @@ struct gdma_irq_context {
>  	spinlock_t lock;
>  	struct list_head eq_list;
>  	char name[MANA_IRQ_NAME_SZ];
> +	struct list_head gic_list;
>  };
>  
>  struct gdma_context {
> @@ -373,7 +374,9 @@ struct gdma_context {
>  	unsigned int		max_num_queues;
>  	unsigned int		max_num_msix;
>  	unsigned int		num_msix_usable;
> -	struct gdma_irq_context	*irq_contexts;
> +	struct list_head	irq_contexts;
> +	/* Protect the irq_contexts list */
> +	spinlock_t		irq_ctxs_lock;
>  
>  	/* L2 MTU */
>  	u16 adapter_mtu;
> -- 
> 2.34.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation
  2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta
@ 2025-04-16 18:30   ` Bjorn Helgaas
  2025-04-17  7:29     ` Shradha Gupta
  2025-04-17 10:00   ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2025-04-16 18:30 UTC (permalink / raw)
  To: Shradha Gupta, Thomas Gleixner
  Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shivamurthy Shastri, Kevin Tian, Long Li, Bjorn Helgaas,
	Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński,
	Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang,
	K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela,
	Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta

[+to Thomas]

On Wed, Apr 16, 2025 at 08:36:06AM -0700, Shradha Gupta wrote:
> For supporting dynamic MSI vector allocation by pci controllers, enabling
> the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc()
> to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci
> controllers like pci-hyperv to support dynamic MSI vector allocation.
> Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use
> pci_msix_prepare_desc() in pci_hyperv controller

Follow capitalization convention for subject line.  Probably remove
"pci_hyperv" since it already contains "PCI: hv" and add something
about MSI-X.

s/pci/PCI/

s/MSI vector/MSI-X vector/ since the context says you're concerned
with MSI-X.

This requires an ack from Thomas; moved him to "To:" line.

> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: Long Li <longli@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 7 +++++--
>  drivers/pci/msi/irqdomain.c         | 5 +++--
>  include/linux/msi.h                 | 2 ++
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index ac27bda5ba26..f2fa6bdb6bb8 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -598,7 +598,8 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data)
>  	return cfg->vector;
>  }
>  
> -#define hv_msi_prepare		pci_msi_prepare
> +#define hv_msi_prepare			pci_msi_prepare
> +#define hv_msix_prepare_desc		pci_msix_prepare_desc
>  
>  /**
>   * hv_arch_irq_unmask() - "Unmask" the IRQ by setting its current
> @@ -727,6 +728,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
>  #define FLOW_HANDLER		NULL
>  #define FLOW_NAME		NULL
>  #define hv_msi_prepare		NULL
> +#define hv_msix_prepare_desc	NULL
>  
>  struct hv_pci_chip_data {
>  	DECLARE_BITMAP(spi_map, HV_PCI_MSI_SPI_NR);
> @@ -2063,6 +2065,7 @@ static struct irq_chip hv_msi_irq_chip = {
>  static struct msi_domain_ops hv_msi_ops = {
>  	.msi_prepare	= hv_msi_prepare,
>  	.msi_free	= hv_msi_free,
> +	.prepare_desc	= hv_msix_prepare_desc,
>  };
>  
>  /**
> @@ -2084,7 +2087,7 @@ static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus)
>  	hbus->msi_info.ops = &hv_msi_ops;
>  	hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS |
>  		MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI |
> -		MSI_FLAG_PCI_MSIX);
> +		MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN);
>  	hbus->msi_info.handler = FLOW_HANDLER;
>  	hbus->msi_info.handler_name = FLOW_NAME;
>  	hbus->msi_info.data = hbus;
> diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
> index d7ba8795d60f..43129aa6d6c7 100644
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -222,13 +222,14 @@ static void pci_irq_unmask_msix(struct irq_data *data)
>  	pci_msix_unmask(irq_data_get_msi_desc(data));
>  }
>  
> -static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
> -				  struct msi_desc *desc)
> +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
> +			   struct msi_desc *desc)
>  {
>  	/* Don't fiddle with preallocated MSI descriptors */
>  	if (!desc->pci.mask_base)
>  		msix_prepare_msi_desc(to_pci_dev(desc->dev), desc);
>  }
> +EXPORT_SYMBOL_GPL(pci_msix_prepare_desc);
>  
>  static const struct msi_domain_template pci_msix_template = {
>  	.chip = {
> diff --git a/include/linux/msi.h b/include/linux/msi.h
> index 86e42742fd0f..d5864d5e75c2 100644
> --- a/include/linux/msi.h
> +++ b/include/linux/msi.h
> @@ -691,6 +691,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
>  					     struct irq_domain *parent);
>  u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
>  struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
> +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
> +			   struct msi_desc *desc);
>  #else /* CONFIG_PCI_MSI */
>  static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
>  {
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta
  2025-04-16 17:22   ` Yury Norov
@ 2025-04-16 18:32   ` Bjorn Helgaas
  2025-04-17  7:33     ` Shradha Gupta
  2025-04-24 16:57   ` Simon Horman
  2025-05-07 16:28   ` kernel test robot
  3 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2025-04-16 18:32 UTC (permalink / raw)
  To: Shradha Gupta
  Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Leon Romanovsky, Maxim Levitsky,
	Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma,
	Paul Rosswurm, Shradha Gupta

On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote:
> Currently, the MANA driver allocates pci vector statically based on
> MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
> up allocating more vectors than it needs.
> This is because, by this time we do not have a HW channel and do not know
> how many IRQs should be allocated.
> To avoid this, we allocate 1 IRQ vector during the creation of HWC and
> after getting the value supported by hardware, dynamically add the
> remaining vectors.

Add blank lines between paragraphs.

s/pci vector/MSI-X vectors/

Maybe also update subject to mention "MSI-X vectors" instead of "PCI
vector", since "PCI vector" is not really a common term.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation
  2025-04-16 18:30   ` Bjorn Helgaas
@ 2025-04-17  7:29     ` Shradha Gupta
  0 siblings, 0 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-17  7:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Thomas Gleixner, linux-hyperv, linux-pci, linux-kernel,
	Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron,
	Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Leon Romanovsky, Maxim Levitsky,
	Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma,
	Paul Rosswurm, Shradha Gupta

On Wed, Apr 16, 2025 at 01:30:42PM -0500, Bjorn Helgaas wrote:
> [+to Thomas]
> 
> On Wed, Apr 16, 2025 at 08:36:06AM -0700, Shradha Gupta wrote:
> > For supporting dynamic MSI vector allocation by pci controllers, enabling
> > the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc()
> > to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci
> > controllers like pci-hyperv to support dynamic MSI vector allocation.
> > Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use
> > pci_msix_prepare_desc() in pci_hyperv controller
> 
> Follow capitalization convention for subject line.  Probably remove
> "pci_hyperv" since it already contains "PCI: hv" and add something
> about MSI-X.
> 
> s/pci/PCI/
> 
> s/MSI vector/MSI-X vector/ since the context says you're concerned
> with MSI-X.
> 
> This requires an ack from Thomas; moved him to "To:" line.
>
Thanks Bjorn. I will work on addressing these.

Regards,
Shradha. 
> > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> > Reviewed-by: Long Li <longli@microsoft.com>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 7 +++++--
> >  drivers/pci/msi/irqdomain.c         | 5 +++--
> >  include/linux/msi.h                 | 2 ++
> >  3 files changed, 10 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index ac27bda5ba26..f2fa6bdb6bb8 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -598,7 +598,8 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data)
> >  	return cfg->vector;
> >  }
> >  
> > -#define hv_msi_prepare		pci_msi_prepare
> > +#define hv_msi_prepare			pci_msi_prepare
> > +#define hv_msix_prepare_desc		pci_msix_prepare_desc
> >  
> >  /**
> >   * hv_arch_irq_unmask() - "Unmask" the IRQ by setting its current
> > @@ -727,6 +728,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
> >  #define FLOW_HANDLER		NULL
> >  #define FLOW_NAME		NULL
> >  #define hv_msi_prepare		NULL
> > +#define hv_msix_prepare_desc	NULL
> >  
> >  struct hv_pci_chip_data {
> >  	DECLARE_BITMAP(spi_map, HV_PCI_MSI_SPI_NR);
> > @@ -2063,6 +2065,7 @@ static struct irq_chip hv_msi_irq_chip = {
> >  static struct msi_domain_ops hv_msi_ops = {
> >  	.msi_prepare	= hv_msi_prepare,
> >  	.msi_free	= hv_msi_free,
> > +	.prepare_desc	= hv_msix_prepare_desc,
> >  };
> >  
> >  /**
> > @@ -2084,7 +2087,7 @@ static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus)
> >  	hbus->msi_info.ops = &hv_msi_ops;
> >  	hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS |
> >  		MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI |
> > -		MSI_FLAG_PCI_MSIX);
> > +		MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN);
> >  	hbus->msi_info.handler = FLOW_HANDLER;
> >  	hbus->msi_info.handler_name = FLOW_NAME;
> >  	hbus->msi_info.data = hbus;
> > diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
> > index d7ba8795d60f..43129aa6d6c7 100644
> > --- a/drivers/pci/msi/irqdomain.c
> > +++ b/drivers/pci/msi/irqdomain.c
> > @@ -222,13 +222,14 @@ static void pci_irq_unmask_msix(struct irq_data *data)
> >  	pci_msix_unmask(irq_data_get_msi_desc(data));
> >  }
> >  
> > -static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
> > -				  struct msi_desc *desc)
> > +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
> > +			   struct msi_desc *desc)
> >  {
> >  	/* Don't fiddle with preallocated MSI descriptors */
> >  	if (!desc->pci.mask_base)
> >  		msix_prepare_msi_desc(to_pci_dev(desc->dev), desc);
> >  }
> > +EXPORT_SYMBOL_GPL(pci_msix_prepare_desc);
> >  
> >  static const struct msi_domain_template pci_msix_template = {
> >  	.chip = {
> > diff --git a/include/linux/msi.h b/include/linux/msi.h
> > index 86e42742fd0f..d5864d5e75c2 100644
> > --- a/include/linux/msi.h
> > +++ b/include/linux/msi.h
> > @@ -691,6 +691,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
> >  					     struct irq_domain *parent);
> >  u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
> >  struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
> > +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg,
> > +			   struct msi_desc *desc);
> >  #else /* CONFIG_PCI_MSI */
> >  static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
> >  {
> > -- 
> > 2.34.1
> > 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-16 17:22   ` Yury Norov
@ 2025-04-17  7:32     ` Shradha Gupta
  2025-04-22 12:09     ` Shradha Gupta
  1 sibling, 0 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-17  7:32 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Leon Romanovsky, Maxim Levitsky,
	Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma,
	Paul Rosswurm, Shradha Gupta

On Wed, Apr 16, 2025 at 01:22:04PM -0400, Yury Norov wrote:
> On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote:
> > Currently, the MANA driver allocates pci vector statically based on
> > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
> > up allocating more vectors than it needs.
> > This is because, by this time we do not have a HW channel and do not know
> > how many IRQs should be allocated.
> > To avoid this, we allocate 1 IRQ vector during the creation of HWC and
> > after getting the value supported by hardware, dynamically add the
> > remaining vectors.
> > 
> > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> > ---
> >  .../net/ethernet/microsoft/mana/gdma_main.c   | 306 ++++++++++++++----
> >  include/net/mana/gdma.h                       |   5 +-
> >  2 files changed, 250 insertions(+), 61 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 4ffaf7588885..3e3b5854b736 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -6,6 +6,9 @@
> >  #include <linux/pci.h>
> >  #include <linux/utsname.h>
> >  #include <linux/version.h>
> > +#include <linux/msi.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/list.h>
> >  
> >  #include <net/mana/mana.h>
> >  
> > @@ -80,8 +83,15 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
> >  		return err ? err : -EPROTO;
> >  	}
> >  
> > -	if (gc->num_msix_usable > resp.max_msix)
> > -		gc->num_msix_usable = resp.max_msix;
> > +	if (!pci_msix_can_alloc_dyn(pdev)) {
> > +		if (gc->num_msix_usable > resp.max_msix)
> > +			gc->num_msix_usable = resp.max_msix;
> > +	} else {
> > +		/* If dynamic allocation is enabled we have already allocated
> > +		 * hwc msi
> > +		 */
> > +		gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1);
> > +	}
> >  
> >  	if (gc->num_msix_usable <= 1)
> >  		return -ENOSPC;
> > @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
> >  	struct gdma_irq_context *gic;
> >  	struct gdma_context *gc;
> >  	unsigned int msi_index;
> > -	unsigned long flags;
> > +	struct list_head *pos;
> > +	unsigned long flags, flag_irq;
> >  	struct device *dev;
> > -	int err = 0;
> > +	int err = 0, count;
> >  
> >  	gc = gd->gdma_context;
> >  	dev = gc->dev;
> > @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
> >  	}
> >  
> >  	queue->eq.msix_index = msi_index;
> > -	gic = &gc->irq_contexts[msi_index];
> > +
> > +	/* get the msi_index value from the list*/
> > +	count = 0;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> > +	list_for_each(pos, &gc->irq_contexts) {
> > +		if (count == msi_index) {
> > +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> > +			break;
> > +		}
> > +
> > +		count++;
> > +	}
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> > +
> > +	if (!gic)
> > +		return -1;
> >  
> >  	spin_lock_irqsave(&gic->lock, flags);
> >  	list_add_rcu(&queue->entry, &gic->eq_list);
> > @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
> >  	struct gdma_irq_context *gic;
> >  	struct gdma_context *gc;
> >  	unsigned int msix_index;
> > -	unsigned long flags;
> > +	struct list_head *pos;
> > +	unsigned long flags, flag_irq;
> >  	struct gdma_queue *eq;
> > +	int count;
> >  
> >  	gc = gd->gdma_context;
> >  
> > @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
> >  	if (WARN_ON(msix_index >= gc->num_msix_usable))
> >  		return;
> >  
> > -	gic = &gc->irq_contexts[msix_index];
> > +	/* get the msi_index value from the list*/
> > +	count = 0;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> > +	list_for_each(pos, &gc->irq_contexts) {
> > +		if (count == msix_index) {
> > +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> > +			break;
> > +		}
> > +
> > +		count++;
> > +	}
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> > +
> > +	if (!gic)
> > +		return;
> > +
> >  	spin_lock_irqsave(&gic->lock, flags);
> >  	list_for_each_entry_rcu(eq, &gic->eq_list, entry) {
> >  		if (queue == eq) {
> > @@ -1288,11 +1331,11 @@ void mana_gd_free_res_map(struct gdma_resource *r)
> >  	r->size = 0;
> >  }
> >  
> > -static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> > +static int irq_setup(unsigned int *irqs, unsigned int len, int node, int skip_cpu)
> >  {
> >  	const struct cpumask *next, *prev = cpu_none_mask;
> >  	cpumask_var_t cpus __free(free_cpumask_var);
> > -	int cpu, weight;
> > +	int cpu, weight, i = 0;
> >  
> >  	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
> >  		return -ENOMEM;
> > @@ -1303,9 +1346,21 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> >  		while (weight > 0) {
> >  			cpumask_andnot(cpus, next, prev);
> >  			for_each_cpu(cpu, cpus) {
> > +				/* If the call is made for irqs which are dynamically
> > +				 * added and the num of vcpus is more or equal to
> > +				 * allocated msix, we need to skip the first
> > +				 * set of cpus, since they are already affinitized
> 
> Can you replace the 'set of cpus' with a 'sibling group'?
> 
> > +				 * to HWC IRQ
> > +				 */
> 
> This comment should not be here. This is a helper function. User may
> want to skip 1st CPU for whatever reason. Please put the comment in
> mana_gd_setup_dyn_irqs().
> 
> > +				if (skip_cpu && !i) {
> > +					i = 1;
> > +					goto next_cpumask;
> > +				}
> 
> The 'skip_cpu' variable should be a boolean, and have a more specific
> name. And you don't need the local 'i' to implement your logic:
> 
>         			if (unlikely(skip_first_cpu)) {
>                                         skip_first_cpu = false;
>         				goto next_sibling;
>         			}
> 
> >  				if (len-- == 0)
> >  					goto done;
> 
> This check should go before the one you're adding here.

Thank you for all the comments, Yury. I will fix these.

Regards,
Shradha.
> 
> > +
> >  				irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> > +next_cpumask:
> >  				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
> >  				--weight;
> >  			}
> > @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> >  	return 0;
> >  }
> >  
> > -static int mana_gd_setup_irqs(struct pci_dev *pdev)
> > +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
> >  {
> >  	struct gdma_context *gc = pci_get_drvdata(pdev);
> > -	unsigned int max_queues_per_port;
> >  	struct gdma_irq_context *gic;
> > -	unsigned int max_irqs, cpu;
> > -	int start_irq_index = 1;
> > -	int nvec, *irqs, irq;
> > +	int *irqs, irq, skip_first_cpu = 0;
> > +	unsigned long flags;
> >  	int err, i = 0, j;
> >  
> >  	cpus_read_lock();
> > -	max_queues_per_port = num_online_cpus();
> > -	if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> > -		max_queues_per_port = MANA_MAX_NUM_QUEUES;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> > +	irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL);
> > +	if (!irqs) {
> > +		err = -ENOMEM;
> > +		goto free_irq_vector;
> > +	}
> >  
> > -	/* Need 1 interrupt for the Hardware communication Channel (HWC) */
> > -	max_irqs = max_queues_per_port + 1;
> > +	for (i = 0; i < nvec; i++) {
> > +		gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL);
> > +		if (!gic) {
> > +			err = -ENOMEM;
> > +			goto free_irq;
> > +		}
> > +		gic->handler = mana_gd_process_eq_events;
> > +		INIT_LIST_HEAD(&gic->eq_list);
> > +		spin_lock_init(&gic->lock);
> >  
> > -	nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX);
> > -	if (nvec < 0) {
> > -		cpus_read_unlock();
> > -		return nvec;
> > +		snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s",
> > +			 i, pci_name(pdev));
> > +
> > +		/* one pci vector is already allocated for HWC */
> > +		irqs[i] = pci_irq_vector(pdev, i + 1);
> > +		if (irqs[i] < 0) {
> > +			err = irqs[i];
> > +			goto free_current_gic;
> > +		}
> > +
> > +		err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic);
> > +		if (err)
> > +			goto free_current_gic;
> > +
> > +		list_add_tail(&gic->gic_list, &gc->irq_contexts);
> > +	}
> > +
> > +	if (gc->num_msix_usable <= num_online_cpus())
> > +		skip_first_cpu = 1;
> > +
> > +	err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu);
> > +	if (err)
> > +		goto free_irq;
> > +
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> > +	cpus_read_unlock();
> > +	kfree(irqs);
> > +	return 0;
> > +
> > +free_current_gic:
> > +	kfree(gic);
> > +free_irq:
> > +	for (j = i - 1; j >= 0; j--) {
> > +		irq = pci_irq_vector(pdev, j + 1);
> > +		gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list);
> > +		irq_update_affinity_hint(irq, NULL);
> > +		free_irq(irq, gic);
> > +		list_del(&gic->gic_list);
> > +		kfree(gic);
> >  	}
> > +	kfree(irqs);
> > +free_irq_vector:
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> > +	cpus_read_unlock();
> > +	return err;
> > +}
> > +
> > +static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec)
> > +{
> > +	struct gdma_context *gc = pci_get_drvdata(pdev);
> > +	struct gdma_irq_context *gic;
> > +	int start_irq_index = 1;
> > +	unsigned long flags;
> > +	unsigned int cpu;
> > +	int *irqs, irq;
> > +	int err, i = 0, j;
> > +
> > +	cpus_read_lock();
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> > +
> >  	if (nvec <= num_online_cpus())
> >  		start_irq_index = 0;
> >  
> > @@ -1349,15 +1467,12 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
> >  		goto free_irq_vector;
> >  	}
> >  
> > -	gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context),
> > -				   GFP_KERNEL);
> > -	if (!gc->irq_contexts) {
> > -		err = -ENOMEM;
> > -		goto free_irq_array;
> > -	}
> > -
> >  	for (i = 0; i < nvec; i++) {
> > -		gic = &gc->irq_contexts[i];
> > +		gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL);
> > +		if (!gic) {
> > +			err = -ENOMEM;
> > +			goto free_irq;
> > +		}
> >  		gic->handler = mana_gd_process_eq_events;
> >  		INIT_LIST_HEAD(&gic->eq_list);
> >  		spin_lock_init(&gic->lock);
> > @@ -1372,22 +1487,14 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
> >  		irq = pci_irq_vector(pdev, i);
> >  		if (irq < 0) {
> >  			err = irq;
> > -			goto free_irq;
> > +			goto free_current_gic;
> >  		}
> >  
> >  		if (!i) {
> >  			err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
> >  			if (err)
> > -				goto free_irq;
> > -
> > -			/* If number of IRQ is one extra than number of online CPUs,
> > -			 * then we need to assign IRQ0 (hwc irq) and IRQ1 to
> > -			 * same CPU.
> > -			 * Else we will use different CPUs for IRQ0 and IRQ1.
> > -			 * Also we are using cpumask_local_spread instead of
> > -			 * cpumask_first for the node, because the node can be
> > -			 * mem only.
> > -			 */
> > +				goto free_current_gic;
> > +
> >  			if (start_irq_index) {
> >  				cpu = cpumask_local_spread(i, gc->numa_node);
> >  				irq_set_affinity_and_hint(irq, cpumask_of(cpu));
> > @@ -1399,36 +1506,104 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
> >  			err = request_irq(irqs[i - start_irq_index], mana_gd_intr, 0,
> >  					  gic->name, gic);
> >  			if (err)
> > -				goto free_irq;
> > +				goto free_current_gic;
> >  		}
> > +
> > +		list_add_tail(&gic->gic_list, &gc->irq_contexts);
> >  	}
> >  
> > -	err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node);
> > +	err = irq_setup(irqs, nvec - start_irq_index, gc->numa_node, 0);
> >  	if (err)
> >  		goto free_irq;
> >  
> > -	gc->max_num_msix = nvec;
> > -	gc->num_msix_usable = nvec;
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> >  	cpus_read_unlock();
> >  	kfree(irqs);
> >  	return 0;
> >  
> > +free_current_gic:
> > +	kfree(gic);
> >  free_irq:
> >  	for (j = i - 1; j >= 0; j--) {
> >  		irq = pci_irq_vector(pdev, j);
> > -		gic = &gc->irq_contexts[j];
> > -
> > +		gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list);
> >  		irq_update_affinity_hint(irq, NULL);
> >  		free_irq(irq, gic);
> > +		list_del(&gic->gic_list);
> > +		kfree(gic);
> >  	}
> > -
> > -	kfree(gc->irq_contexts);
> > -	gc->irq_contexts = NULL;
> > -free_irq_array:
> >  	kfree(irqs);
> >  free_irq_vector:
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> >  	cpus_read_unlock();
> > -	pci_free_irq_vectors(pdev);
> > +	return err;
> > +}
> > +
> > +static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev)
> > +{
> > +	struct gdma_context *gc = pci_get_drvdata(pdev);
> > +	unsigned int max_irqs, min_irqs;
> > +	int max_queues_per_port;
> > +	int nvec, err;
> > +
> > +	if (pci_msix_can_alloc_dyn(pdev)) {
> > +		max_irqs = 1;
> > +		min_irqs = 1;
> > +	} else {
> > +		max_queues_per_port = num_online_cpus();
> > +		if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> > +			max_queues_per_port = MANA_MAX_NUM_QUEUES;
> > +		/* Need 1 interrupt for the Hardware communication Channel (HWC) */
> > +		max_irqs = max_queues_per_port + 1;
> > +		min_irqs = 2;
> > +	}
> > +
> > +	nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX);
> > +	if (nvec < 0)
> > +		return nvec;
> > +
> > +	err = mana_gd_setup_irqs(pdev, nvec);
> > +	if (err) {
> > +		pci_free_irq_vectors(pdev);
> > +		return err;
> > +	}
> > +
> > +	gc->num_msix_usable = nvec;
> > +	gc->max_num_msix = nvec;
> > +
> > +	return err;
> > +}
> > +
> > +static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev)
> > +{
> > +	struct gdma_context *gc = pci_get_drvdata(pdev);
> > +	int max_irqs, i, err = 0;
> > +	struct msi_map irq_map;
> > +
> > +	if (!pci_msix_can_alloc_dyn(pdev))
> > +		/* remain irqs are already allocated with HWC IRQ */
> > +		return 0;
> > +
> > +	/* allocate only remaining IRQs*/
> > +	max_irqs = gc->num_msix_usable - 1;
> > +
> > +	for (i = 1; i <= max_irqs; i++) {
> > +		irq_map = pci_msix_alloc_irq_at(pdev, i, NULL);
> > +		if (!irq_map.virq) {
> > +			err = irq_map.index;
> > +			/* caller will handle cleaning up all allocated
> > +			 * irqs, after HWC is destroyed
> > +			 */
> > +			return err;
> > +		}
> > +	}
> > +
> > +	err = mana_gd_setup_dyn_irqs(pdev, max_irqs);
> > +	if (err)
> > +		return err;
> > +
> > +	gc->max_num_msix = gc->max_num_msix + max_irqs;
> > +
> >  	return err;
> >  }
> >  
> > @@ -1436,29 +1611,34 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev)
> >  {
> >  	struct gdma_context *gc = pci_get_drvdata(pdev);
> >  	struct gdma_irq_context *gic;
> > -	int irq, i;
> > +	struct list_head *pos, *n;
> > +	unsigned long flags;
> > +	int irq, i = 0;
> >  
> >  	if (gc->max_num_msix < 1)
> >  		return;
> >  
> > -	for (i = 0; i < gc->max_num_msix; i++) {
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> > +	list_for_each_safe(pos, n, &gc->irq_contexts) {
> >  		irq = pci_irq_vector(pdev, i);
> >  		if (irq < 0)
> >  			continue;
> >  
> > -		gic = &gc->irq_contexts[i];
> > +		gic = list_entry(pos, struct gdma_irq_context, gic_list);
> >  
> >  		/* Need to clear the hint before free_irq */
> >  		irq_update_affinity_hint(irq, NULL);
> >  		free_irq(irq, gic);
> > +		list_del(pos);
> > +		kfree(gic);
> > +		i++;
> >  	}
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> >  
> >  	pci_free_irq_vectors(pdev);
> >  
> >  	gc->max_num_msix = 0;
> >  	gc->num_msix_usable = 0;
> > -	kfree(gc->irq_contexts);
> > -	gc->irq_contexts = NULL;
> >  }
> >  
> >  static int mana_gd_setup(struct pci_dev *pdev)
> > @@ -1469,9 +1649,9 @@ static int mana_gd_setup(struct pci_dev *pdev)
> >  	mana_gd_init_registers(pdev);
> >  	mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base);
> >  
> > -	err = mana_gd_setup_irqs(pdev);
> > +	err = mana_gd_setup_hwc_irqs(pdev);
> >  	if (err) {
> > -		dev_err(gc->dev, "Failed to setup IRQs: %d\n", err);
> > +		dev_err(gc->dev, "Failed to setup IRQs for HWC creation: %d\n", err);
> >  		return err;
> >  	}
> >  
> > @@ -1487,6 +1667,10 @@ static int mana_gd_setup(struct pci_dev *pdev)
> >  	if (err)
> >  		goto destroy_hwc;
> >  
> > +	err = mana_gd_setup_remaining_irqs(pdev);
> > +	if (err)
> > +		goto destroy_hwc;
> > +
> >  	err = mana_gd_detect_devices(pdev);
> >  	if (err)
> >  		goto destroy_hwc;
> > @@ -1563,6 +1747,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> >  	gc->is_pf = mana_is_pf(pdev->device);
> >  	gc->bar0_va = bar0_va;
> >  	gc->dev = &pdev->dev;
> > +	INIT_LIST_HEAD(&gc->irq_contexts);
> > +	spin_lock_init(&gc->irq_ctxs_lock);
> >  
> >  	if (gc->is_pf)
> >  		gc->mana_pci_debugfs = debugfs_create_dir("0", mana_debugfs_root);
> > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> > index 228603bf03f2..eae38d7302fe 100644
> > --- a/include/net/mana/gdma.h
> > +++ b/include/net/mana/gdma.h
> > @@ -363,6 +363,7 @@ struct gdma_irq_context {
> >  	spinlock_t lock;
> >  	struct list_head eq_list;
> >  	char name[MANA_IRQ_NAME_SZ];
> > +	struct list_head gic_list;
> >  };
> >  
> >  struct gdma_context {
> > @@ -373,7 +374,9 @@ struct gdma_context {
> >  	unsigned int		max_num_queues;
> >  	unsigned int		max_num_msix;
> >  	unsigned int		num_msix_usable;
> > -	struct gdma_irq_context	*irq_contexts;
> > +	struct list_head	irq_contexts;
> > +	/* Protect the irq_contexts list */
> > +	spinlock_t		irq_ctxs_lock;
> >  
> >  	/* L2 MTU */
> >  	u16 adapter_mtu;
> > -- 
> > 2.34.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-16 18:32   ` Bjorn Helgaas
@ 2025-04-17  7:33     ` Shradha Gupta
  0 siblings, 0 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-17  7:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Leon Romanovsky, Maxim Levitsky,
	Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma,
	Paul Rosswurm, Shradha Gupta

On Wed, Apr 16, 2025 at 01:32:49PM -0500, Bjorn Helgaas wrote:
> On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote:
> > Currently, the MANA driver allocates pci vector statically based on
> > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
> > up allocating more vectors than it needs.
> > This is because, by this time we do not have a HW channel and do not know
> > how many IRQs should be allocated.
> > To avoid this, we allocate 1 IRQ vector during the creation of HWC and
> > after getting the value supported by hardware, dynamically add the
> > remaining vectors.
> 
> Add blank lines between paragraphs.
> 
> s/pci vector/MSI-X vectors/
> 
> Maybe also update subject to mention "MSI-X vectors" instead of "PCI
> vector", since "PCI vector" is not really a common term.

Understood, thanks Bjorn.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation
  2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta
  2025-04-16 18:30   ` Bjorn Helgaas
@ 2025-04-17 10:00   ` Thomas Gleixner
  2025-04-21  6:33     ` Shradha Gupta
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2025-04-17 10:00 UTC (permalink / raw)
  To: Shradha Gupta, linux-hyperv, linux-pci, linux-kernel, Nipun Gupta,
	Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Simon Horman, Leon Romanovsky, Maxim Levitsky,
	Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma,
	Paul Rosswurm
  Cc: Shradha Gupta

On Wed, Apr 16 2025 at 08:36, Shradha Gupta wrote:
> For supporting dynamic MSI vector allocation by pci controllers, enabling
> the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc()
> to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci
> controllers like pci-hyperv to support dynamic MSI vector allocation.
> Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use
> pci_msix_prepare_desc() in pci_hyperv controller

Seems your newline key is broken. Please structure changelogs properly
in paragraphs:

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

>  drivers/pci/controller/pci-hyperv.c | 7 +++++--
>  drivers/pci/msi/irqdomain.c         | 5 +++--
>  include/linux/msi.h                 | 2 ++

This wants to be split into a patch which exports
pci_msix_prepare_desc() and one which enables the functionality in
PCI/HV.
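
A rough sketch of the kind of change being discussed (illustration only; the
actual symbol names and flag set used in pci-hyperv.c may differ), assuming the
controller already registers a struct msi_domain_info / struct msi_domain_ops:

	/* In the controller's struct msi_domain_ops, reuse the exported helper: */
	.prepare_desc	= pci_msix_prepare_desc,

	/* And additionally set in the matching struct msi_domain_info ->flags: */
	MSI_FLAG_PCI_MSIX_ALLOC_DYN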

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation
  2025-04-17 10:00   ` Thomas Gleixner
@ 2025-04-21  6:33     ` Shradha Gupta
  0 siblings, 0 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-21  6:33 UTC (permalink / raw)
  To: Thomas Gleixner, Shradha Gupta, linux-hyperv@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Gupta, Nipun, Yury Norov, Jason Gunthorpe, Jonathan Cameron,
	Anna-Maria Behnsen, Shradha Gupta, Shivamurthy Shastri,
	Kevin Tian, Long Li, Bjorn Helgaas, Rob Herring,
	Manivannan Sadhasivam, Krzysztof Wilczyński,
	Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang,
	KY Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela,
	Peter Zijlstra, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, Paul Rosswurm

>On Wed, Apr 16 2025 at 08:36, Shradha Gupta wrote:
>> For supporting dynamic MSI vector allocation by pci controllers, enabling
>> the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc()
>> to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci
>> controllers like pci-hyperv to support dynamic MSI vector allocation.
>>Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use
>> pci_msix_prepare_desc() in pci_hyperv controller
>
>Seems your newline key is broken. Please structure changelogs properly
>in paragraphs:
>
>https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

>>  drivers/pci/controller/pci-hyperv.c | 7 +++++--
>> drivers/pci/msi/irqdomain.c         | 5 +++--
>> include/linux/msi.h                 | 2 ++
>
>This wants to be split into a patch which exports
>pci_msix_prepare_desc() and one which enables the functionality in
>PCI/HV.
>
>Thanks,
>
>      tglx

Thanks for all the comments, Thomas. I will incorporate them and send out the next version.
We are facing some issues with mutt, hence I am replying through Outlook. Apologies
for the formatting issues in this reply. :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-16 17:22   ` Yury Norov
  2025-04-17  7:32     ` Shradha Gupta
@ 2025-04-22 12:09     ` Shradha Gupta
  1 sibling, 0 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-22 12:09 UTC (permalink / raw)
  To: Yury Norov, Shradha Gupta
  Cc: linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, Gupta, Nipun, Jason Gunthorpe,
	Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri,
	Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring,
	Manivannan Sadhasivam, Krzysztof Wilczyński,
	Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang,
	KY Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela,
	Peter Zijlstra, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, Paul Rosswurm

> On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote:
> > Currently, the MANA driver allocates pci vector statically based on
> > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases
> > ends up allocating more vectors than it needs.
> > This is because, by this time we do not have a HW channel and do not
> > know how many IRQs should be allocated.
> > To avoid this, we allocate 1 IRQ vector during the creation of HWC and
> > after getting the value supported by hardware, dynamically add the
> > remaining vectors.
> >
> > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> > ---
> >  .../net/ethernet/microsoft/mana/gdma_main.c   | 306 ++++++++++++++----
> >  include/net/mana/gdma.h                       |   5 +-
> >  2 files changed, 250 insertions(+), 61 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 4ffaf7588885..3e3b5854b736 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -6,6 +6,9 @@
> >  #include <linux/pci.h>
> >  #include <linux/utsname.h>
> >  #include <linux/version.h>
> > +#include <linux/msi.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/list.h>
> >
> >  #include <net/mana/mana.h>
> >
> > @@ -80,8 +83,15 @@ static int mana_gd_query_max_resources(struct pci_dev
> *pdev)
> >  		return err ? err : -EPROTO;
> >  	}
> >
> > -	if (gc->num_msix_usable > resp.max_msix)
> > -		gc->num_msix_usable = resp.max_msix;
> > +	if (!pci_msix_can_alloc_dyn(pdev)) {
> > +		if (gc->num_msix_usable > resp.max_msix)
> > +			gc->num_msix_usable = resp.max_msix;
> > +	} else {
> > +		/* If dynamic allocation is enabled we have already allocated
> > +		 * hwc msi
> > +		 */
> > +		gc->num_msix_usable = min(resp.max_msix, num_online_cpus()
> + 1);
> > +	}
> >
> >  	if (gc->num_msix_usable <= 1)
> >  		return -ENOSPC;
> > @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue
> *queue,
> >  	struct gdma_irq_context *gic;
> >  	struct gdma_context *gc;
> >  	unsigned int msi_index;
> > -	unsigned long flags;
> > +	struct list_head *pos;
> > +	unsigned long flags, flag_irq;
> >  	struct device *dev;
> > -	int err = 0;
> > +	int err = 0, count;
> >
> >  	gc = gd->gdma_context;
> >  	dev = gc->dev;
> > @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue
> *queue,
> >  	}
> >
> >  	queue->eq.msix_index = msi_index;
> > -	gic = &gc->irq_contexts[msi_index];
> > +
> > +	/* get the msi_index value from the list*/
> > +	count = 0;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> > +	list_for_each(pos, &gc->irq_contexts) {
> > +		if (count == msi_index) {
> > +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> > +			break;
> > +		}
> > +
> > +		count++;
> > +	}
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> > +
> > +	if (!gic)
> > +		return -1;
> >
> >  	spin_lock_irqsave(&gic->lock, flags);
> >  	list_add_rcu(&queue->entry, &gic->eq_list); @@ -497,8 +523,10 @@
> > static void mana_gd_deregiser_irq(struct gdma_queue *queue)
> >  	struct gdma_irq_context *gic;
> >  	struct gdma_context *gc;
> >  	unsigned int msix_index;
> > -	unsigned long flags;
> > +	struct list_head *pos;
> > +	unsigned long flags, flag_irq;
> >  	struct gdma_queue *eq;
> > +	int count;
> >
> >  	gc = gd->gdma_context;
> >
> > @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct
> gdma_queue *queue)
> >  	if (WARN_ON(msix_index >= gc->num_msix_usable))
> >  		return;
> >
> > -	gic = &gc->irq_contexts[msix_index];
> > +	/* get the msi_index value from the list*/
> > +	count = 0;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> > +	list_for_each(pos, &gc->irq_contexts) {
> > +		if (count == msix_index) {
> > +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> > +			break;
> > +		}
> > +
> > +		count++;
> > +	}
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> > +
> > +	if (!gic)
> > +		return;
> > +
> >  	spin_lock_irqsave(&gic->lock, flags);
> >  	list_for_each_entry_rcu(eq, &gic->eq_list, entry) {
> >  		if (queue == eq) {
> > @@ -1288,11 +1331,11 @@ void mana_gd_free_res_map(struct
> gdma_resource *r)
> >  	r->size = 0;
> >  }
> >
> > -static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> > +static int irq_setup(unsigned int *irqs, unsigned int len, int node,
> > +int skip_cpu)
> >  {
> >  	const struct cpumask *next, *prev = cpu_none_mask;
> >  	cpumask_var_t cpus __free(free_cpumask_var);
> > -	int cpu, weight;
> > +	int cpu, weight, i = 0;
> >
> >  	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
> >  		return -ENOMEM;
> > @@ -1303,9 +1346,21 @@ static int irq_setup(unsigned int *irqs, unsigned int
> len, int node)
> >  		while (weight > 0) {
> >  			cpumask_andnot(cpus, next, prev);
> >  			for_each_cpu(cpu, cpus) {
> > +				/* If the call is made for irqs which are
> dynamically
> > +				 * added and the num of vcpus is more or equal
> to
> > +				 * allocated msix, we need to skip the first
> > +				 * set of cpus, since they are already affinitized
> 
> Can you replace the 'set of cpus' with a 'sibling group'?
> 
> > +				 * to HWC IRQ
> > +				 */
> 
> This comment should not be here. This is a helper function. User may want to skip
> 1st CPU for whatever reason. Please put the comment in
> mana_gd_setup_dyn_irqs().
> 
> > +				if (skip_cpu && !i) {
> > +					i = 1;
> > +					goto next_cpumask;
> > +				}
> 
> The 'skip_cpu' variable should be a boolean, and have a more specific name. And
> you don't need the local 'i' to implement your logic:
> 
>         			if (unlikely(skip_first_cpu)) {
>                                         skip_first_cpu = false;
>         				goto next_sibling;
>         			}
> 
> >  				if (len-- == 0)
> >  					goto done;
> 
> This check should go before the one you're adding here.

Hi Yury,
While preparing v2 of this patch, I realized that moving the 'if (len-- == 0)' check
above 'if (unlikely(skip_first_cpu))' is not needed, as we deliberately want to skip the
'len--' decrement in the iteration where skip_first_cpu is true.
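
A rough sketch of the ordering being described (illustration only, using the
skip_first_cpu boolean suggested above; not the final v2 code):

	for_each_cpu(cpu, cpus) {
		/* Skip the first sibling group without consuming a 'len'
		 * slot, since its CPUs are already affinitized to the HWC IRQ.
		 */
		if (unlikely(skip_first_cpu)) {
			skip_first_cpu = false;
			goto next_sibling;
		}

		if (len-- == 0)
			goto done;

		irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
next_sibling:
		cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
		--weight;
	}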

Regards,
Shradha 

> 
> > +
> >  				irq_set_affinity_and_hint(*irqs++,
> > topology_sibling_cpumask(cpu));
> > +next_cpumask:
> >  				cpumask_andnot(cpus, cpus,
> topology_sibling_cpumask(cpu));
> >  				--weight;
> >  			}
> > @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int
> len, int node)
> >  	return 0;
> >  }
> >
> > -static int mana_gd_setup_irqs(struct pci_dev *pdev)
> > +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
> >  {
> >  	struct gdma_context *gc = pci_get_drvdata(pdev);
> > -	unsigned int max_queues_per_port;
> >  	struct gdma_irq_context *gic;
> > -	unsigned int max_irqs, cpu;
> > -	int start_irq_index = 1;
> > -	int nvec, *irqs, irq;
> > +	int *irqs, irq, skip_first_cpu = 0;
> > +	unsigned long flags;
> >  	int err, i = 0, j;
> >
> >  	cpus_read_lock();
> > -	max_queues_per_port = num_online_cpus();
> > -	if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> > -		max_queues_per_port = MANA_MAX_NUM_QUEUES;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> > +	irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL);
> > +	if (!irqs) {
> > +		err = -ENOMEM;
> > +		goto free_irq_vector;
> > +	}
> >
> > -	/* Need 1 interrupt for the Hardware communication Channel (HWC) */
> > -	max_irqs = max_queues_per_port + 1;
> > +	for (i = 0; i < nvec; i++) {
> > +		gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL);
> > +		if (!gic) {
> > +			err = -ENOMEM;
> > +			goto free_irq;
> > +		}
> > +		gic->handler = mana_gd_process_eq_events;
> > +		INIT_LIST_HEAD(&gic->eq_list);
> > +		spin_lock_init(&gic->lock);
> >
> > -	nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX);
> > -	if (nvec < 0) {
> > -		cpus_read_unlock();
> > -		return nvec;
> > +		snprintf(gic->name, MANA_IRQ_NAME_SZ,
> "mana_q%d@pci:%s",
> > +			 i, pci_name(pdev));
> > +
> > +		/* one pci vector is already allocated for HWC */
> > +		irqs[i] = pci_irq_vector(pdev, i + 1);
> > +		if (irqs[i] < 0) {
> > +			err = irqs[i];
> > +			goto free_current_gic;
> > +		}
> > +
> > +		err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic);
> > +		if (err)
> > +			goto free_current_gic;
> > +
> > +		list_add_tail(&gic->gic_list, &gc->irq_contexts);
> > +	}
> > +
> > +	if (gc->num_msix_usable <= num_online_cpus())
> > +		skip_first_cpu = 1;
> > +
> > +	err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu);
> > +	if (err)
> > +		goto free_irq;
> > +
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> > +	cpus_read_unlock();
> > +	kfree(irqs);
> > +	return 0;
> > +
> > +free_current_gic:
> > +	kfree(gic);
> > +free_irq:
> > +	for (j = i - 1; j >= 0; j--) {
> > +		irq = pci_irq_vector(pdev, j + 1);
> > +		gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context,
> gic_list);
> > +		irq_update_affinity_hint(irq, NULL);
> > +		free_irq(irq, gic);
> > +		list_del(&gic->gic_list);
> > +		kfree(gic);
> >  	}
> > +	kfree(irqs);
> > +free_irq_vector:
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> > +	cpus_read_unlock();
> > +	return err;
> > +}
> > +
> > +static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec) {
> > +	struct gdma_context *gc = pci_get_drvdata(pdev);
> > +	struct gdma_irq_context *gic;
> > +	int start_irq_index = 1;
> > +	unsigned long flags;
> > +	unsigned int cpu;
> > +	int *irqs, irq;
> > +	int err, i = 0, j;
> > +
> > +	cpus_read_lock();
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> > +
> >  	if (nvec <= num_online_cpus())
> >  		start_irq_index = 0;
> >
> > @@ -1349,15 +1467,12 @@ static int mana_gd_setup_irqs(struct pci_dev
> *pdev)
> >  		goto free_irq_vector;
> >  	}
> >
> > -	gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context),
> > -				   GFP_KERNEL);
> > -	if (!gc->irq_contexts) {
> > -		err = -ENOMEM;
> > -		goto free_irq_array;
> > -	}
> > -
> >  	for (i = 0; i < nvec; i++) {
> > -		gic = &gc->irq_contexts[i];
> > +		gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL);
> > +		if (!gic) {
> > +			err = -ENOMEM;
> > +			goto free_irq;
> > +		}
> >  		gic->handler = mana_gd_process_eq_events;
> >  		INIT_LIST_HEAD(&gic->eq_list);
> >  		spin_lock_init(&gic->lock);
> > @@ -1372,22 +1487,14 @@ static int mana_gd_setup_irqs(struct pci_dev
> *pdev)
> >  		irq = pci_irq_vector(pdev, i);
> >  		if (irq < 0) {
> >  			err = irq;
> > -			goto free_irq;
> > +			goto free_current_gic;
> >  		}
> >
> >  		if (!i) {
> >  			err = request_irq(irq, mana_gd_intr, 0, gic->name, gic);
> >  			if (err)
> > -				goto free_irq;
> > -
> > -			/* If number of IRQ is one extra than number of online
> CPUs,
> > -			 * then we need to assign IRQ0 (hwc irq) and IRQ1 to
> > -			 * same CPU.
> > -			 * Else we will use different CPUs for IRQ0 and IRQ1.
> > -			 * Also we are using cpumask_local_spread instead of
> > -			 * cpumask_first for the node, because the node can be
> > -			 * mem only.
> > -			 */
> > +				goto free_current_gic;
> > +
> >  			if (start_irq_index) {
> >  				cpu = cpumask_local_spread(i, gc-
> >numa_node);
> >  				irq_set_affinity_and_hint(irq, cpumask_of(cpu));
> @@ -1399,36
> > +1506,104 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
> >  			err = request_irq(irqs[i - start_irq_index], mana_gd_intr,
> 0,
> >  					  gic->name, gic);
> >  			if (err)
> > -				goto free_irq;
> > +				goto free_current_gic;
> >  		}
> > +
> > +		list_add_tail(&gic->gic_list, &gc->irq_contexts);
> >  	}
> >
> > -	err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node);
> > +	err = irq_setup(irqs, nvec - start_irq_index, gc->numa_node, 0);
> >  	if (err)
> >  		goto free_irq;
> >
> > -	gc->max_num_msix = nvec;
> > -	gc->num_msix_usable = nvec;
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> >  	cpus_read_unlock();
> >  	kfree(irqs);
> >  	return 0;
> >
> > +free_current_gic:
> > +	kfree(gic);
> >  free_irq:
> >  	for (j = i - 1; j >= 0; j--) {
> >  		irq = pci_irq_vector(pdev, j);
> > -		gic = &gc->irq_contexts[j];
> > -
> > +		gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context,
> > +gic_list);
> >  		irq_update_affinity_hint(irq, NULL);
> >  		free_irq(irq, gic);
> > +		list_del(&gic->gic_list);
> > +		kfree(gic);
> >  	}
> > -
> > -	kfree(gc->irq_contexts);
> > -	gc->irq_contexts = NULL;
> > -free_irq_array:
> >  	kfree(irqs);
> >  free_irq_vector:
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> >  	cpus_read_unlock();
> > -	pci_free_irq_vectors(pdev);
> > +	return err;
> > +}
> > +
> > +static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev) {
> > +	struct gdma_context *gc = pci_get_drvdata(pdev);
> > +	unsigned int max_irqs, min_irqs;
> > +	int max_queues_per_port;
> > +	int nvec, err;
> > +
> > +	if (pci_msix_can_alloc_dyn(pdev)) {
> > +		max_irqs = 1;
> > +		min_irqs = 1;
> > +	} else {
> > +		max_queues_per_port = num_online_cpus();
> > +		if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> > +			max_queues_per_port = MANA_MAX_NUM_QUEUES;
> > +		/* Need 1 interrupt for the Hardware communication Channel
> (HWC) */
> > +		max_irqs = max_queues_per_port + 1;
> > +		min_irqs = 2;
> > +	}
> > +
> > +	nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX);
> > +	if (nvec < 0)
> > +		return nvec;
> > +
> > +	err = mana_gd_setup_irqs(pdev, nvec);
> > +	if (err) {
> > +		pci_free_irq_vectors(pdev);
> > +		return err;
> > +	}
> > +
> > +	gc->num_msix_usable = nvec;
> > +	gc->max_num_msix = nvec;
> > +
> > +	return err;
> > +}
> > +
> > +static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev) {
> > +	struct gdma_context *gc = pci_get_drvdata(pdev);
> > +	int max_irqs, i, err = 0;
> > +	struct msi_map irq_map;
> > +
> > +	if (!pci_msix_can_alloc_dyn(pdev))
> > +		/* remain irqs are already allocated with HWC IRQ */
> > +		return 0;
> > +
> > +	/* allocate only remaining IRQs*/
> > +	max_irqs = gc->num_msix_usable - 1;
> > +
> > +	for (i = 1; i <= max_irqs; i++) {
> > +		irq_map = pci_msix_alloc_irq_at(pdev, i, NULL);
> > +		if (!irq_map.virq) {
> > +			err = irq_map.index;
> > +			/* caller will handle cleaning up all allocated
> > +			 * irqs, after HWC is destroyed
> > +			 */
> > +			return err;
> > +		}
> > +	}
> > +
> > +	err = mana_gd_setup_dyn_irqs(pdev, max_irqs);
> > +	if (err)
> > +		return err;
> > +
> > +	gc->max_num_msix = gc->max_num_msix + max_irqs;
> > +
> >  	return err;
> >  }
> >
> > @@ -1436,29 +1611,34 @@ static void mana_gd_remove_irqs(struct pci_dev
> > *pdev)  {
> >  	struct gdma_context *gc = pci_get_drvdata(pdev);
> >  	struct gdma_irq_context *gic;
> > -	int irq, i;
> > +	struct list_head *pos, *n;
> > +	unsigned long flags;
> > +	int irq, i = 0;
> >
> >  	if (gc->max_num_msix < 1)
> >  		return;
> >
> > -	for (i = 0; i < gc->max_num_msix; i++) {
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> > +	list_for_each_safe(pos, n, &gc->irq_contexts) {
> >  		irq = pci_irq_vector(pdev, i);
> >  		if (irq < 0)
> >  			continue;
> >
> > -		gic = &gc->irq_contexts[i];
> > +		gic = list_entry(pos, struct gdma_irq_context, gic_list);
> >
> >  		/* Need to clear the hint before free_irq */
> >  		irq_update_affinity_hint(irq, NULL);
> >  		free_irq(irq, gic);
> > +		list_del(pos);
> > +		kfree(gic);
> > +		i++;
> >  	}
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags);
> >
> >  	pci_free_irq_vectors(pdev);
> >
> >  	gc->max_num_msix = 0;
> >  	gc->num_msix_usable = 0;
> > -	kfree(gc->irq_contexts);
> > -	gc->irq_contexts = NULL;
> >  }
> >
> >  static int mana_gd_setup(struct pci_dev *pdev) @@ -1469,9 +1649,9 @@
> > static int mana_gd_setup(struct pci_dev *pdev)
> >  	mana_gd_init_registers(pdev);
> >  	mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base);
> >
> > -	err = mana_gd_setup_irqs(pdev);
> > +	err = mana_gd_setup_hwc_irqs(pdev);
> >  	if (err) {
> > -		dev_err(gc->dev, "Failed to setup IRQs: %d\n", err);
> > +		dev_err(gc->dev, "Failed to setup IRQs for HWC creation: %d\n",
> > +err);
> >  		return err;
> >  	}
> >
> > @@ -1487,6 +1667,10 @@ static int mana_gd_setup(struct pci_dev *pdev)
> >  	if (err)
> >  		goto destroy_hwc;
> >
> > +	err = mana_gd_setup_remaining_irqs(pdev);
> > +	if (err)
> > +		goto destroy_hwc;
> > +
> >  	err = mana_gd_detect_devices(pdev);
> >  	if (err)
> >  		goto destroy_hwc;
> > @@ -1563,6 +1747,8 @@ static int mana_gd_probe(struct pci_dev *pdev,
> const struct pci_device_id *ent)
> >  	gc->is_pf = mana_is_pf(pdev->device);
> >  	gc->bar0_va = bar0_va;
> >  	gc->dev = &pdev->dev;
> > +	INIT_LIST_HEAD(&gc->irq_contexts);
> > +	spin_lock_init(&gc->irq_ctxs_lock);
> >
> >  	if (gc->is_pf)
> >  		gc->mana_pci_debugfs = debugfs_create_dir("0",
> mana_debugfs_root);
> > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index
> > 228603bf03f2..eae38d7302fe 100644
> > --- a/include/net/mana/gdma.h
> > +++ b/include/net/mana/gdma.h
> > @@ -363,6 +363,7 @@ struct gdma_irq_context {
> >  	spinlock_t lock;
> >  	struct list_head eq_list;
> >  	char name[MANA_IRQ_NAME_SZ];
> > +	struct list_head gic_list;
> >  };
> >
> >  struct gdma_context {
> > @@ -373,7 +374,9 @@ struct gdma_context {
> >  	unsigned int		max_num_queues;
> >  	unsigned int		max_num_msix;
> >  	unsigned int		num_msix_usable;
> > -	struct gdma_irq_context	*irq_contexts;
> > +	struct list_head	irq_contexts;
> > +	/* Protect the irq_contexts list */
> > +	spinlock_t		irq_ctxs_lock;
> >
> >  	/* L2 MTU */
> >  	u16 adapter_mtu;
> > --
> > 2.34.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta
  2025-04-16 17:22   ` Yury Norov
  2025-04-16 18:32   ` Bjorn Helgaas
@ 2025-04-24 16:57   ` Simon Horman
  2025-04-25  7:29     ` Shradha Gupta
  2025-05-07 16:28   ` kernel test robot
  3 siblings, 1 reply; 15+ messages in thread
From: Simon Horman @ 2025-04-24 16:57 UTC (permalink / raw)
  To: Shradha Gupta
  Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela,
	Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta

On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote:
> Currently, the MANA driver allocates pci vector statically based on
> MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
> up allocating more vectors than it needs.
> This is because, by this time we do not have a HW channel and do not know
> how many IRQs should be allocated.
> To avoid this, we allocate 1 IRQ vector during the creation of HWC and
> after getting the value supported by hardware, dynamically add the
> remaining vectors.
> 
> Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

Hi Shradha,

Some minor nits from my side.

...

> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c

...

> @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
>  	struct gdma_irq_context *gic;
>  	struct gdma_context *gc;
>  	unsigned int msi_index;
> -	unsigned long flags;
> +	struct list_head *pos;
> +	unsigned long flags, flag_irq;
>  	struct device *dev;
> -	int err = 0;
> +	int err = 0, count;

As this is Networking code, please preserve the arrangement of local
variables in reverse xmas tree order - longest line to shortest.

Edward Cree's tool can be useful in this area:
https://github.com/ecree-solarflare/xmastree
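
For illustration only, the declarations quoted above could then read as below
(ordered purely by line length, longest first):

	unsigned long flags, flag_irq;
	struct gdma_irq_context *gic;
	struct gdma_context *gc;
	unsigned int msi_index;
	struct list_head *pos;
	struct device *dev;
	int err = 0, count;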

>  
>  	gc = gd->gdma_context;
>  	dev = gc->dev;
> @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
>  	}
>  
>  	queue->eq.msix_index = msi_index;
> -	gic = &gc->irq_contexts[msi_index];
> +
> +	/* get the msi_index value from the list*/
> +	count = 0;
> +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> +	list_for_each(pos, &gc->irq_contexts) {
> +		if (count == msi_index) {
> +			gic = list_entry(pos, struct gdma_irq_context, gic_list);

Please consider line wrapping to 80 columns or less, as is still preferred
in Networking code.

Likewise elsewhere in this patch.

checkpatch.pl --max-line-length=80
can be helpful here.

> +			break;
> +		}
> +
> +		count++;
> +	}
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> +
> +	if (!gic)
> +		return -1;
>  
>  	spin_lock_irqsave(&gic->lock, flags);
>  	list_add_rcu(&queue->entry, &gic->eq_list);
> @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
>  	struct gdma_irq_context *gic;
>  	struct gdma_context *gc;
>  	unsigned int msix_index;
> -	unsigned long flags;
> +	struct list_head *pos;
> +	unsigned long flags, flag_irq;
>  	struct gdma_queue *eq;
> +	int count;

Reverse xmas tree here too.

>  
>  	gc = gd->gdma_context;
>  
> @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
>  	if (WARN_ON(msix_index >= gc->num_msix_usable))
>  		return;
>  
> -	gic = &gc->irq_contexts[msix_index];
> +	/* get the msi_index value from the list*/
> +	count = 0;
> +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> +	list_for_each(pos, &gc->irq_contexts) {
> +		if (count == msix_index) {
> +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> +			break;
> +		}
> +
> +		count++;
> +	}
> +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> +

Does gic need to be initialised to NULL before the list_for_each loop
to ensure that it is always initialised here?

Flagged by Clang 20.1.2 KCFLAGS=-Wsometimes-uninitialized builds, and Smatch
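
For illustration, one minimal way to address this (assuming the linear lookup
is kept as-is) would be to initialise the pointer before walking the list:

	struct gdma_irq_context *gic = NULL;	/* stays NULL if not found */

	count = 0;
	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
	list_for_each(pos, &gc->irq_contexts) {
		if (count == msix_index) {
			gic = list_entry(pos, struct gdma_irq_context,
					 gic_list);
			break;
		}
		count++;
	}
	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);

	/* gic is now either NULL or a valid list entry */
	if (!gic)
		return;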

> +	if (!gic)
> +		return;
> +
>  	spin_lock_irqsave(&gic->lock, flags);
>  	list_for_each_entry_rcu(eq, &gic->eq_list, entry) {
>  		if (queue == eq) {

...

> @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
>  	return 0;
>  }
>  
> -static int mana_gd_setup_irqs(struct pci_dev *pdev)
> +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
>  {
>  	struct gdma_context *gc = pci_get_drvdata(pdev);
> -	unsigned int max_queues_per_port;
>  	struct gdma_irq_context *gic;
> -	unsigned int max_irqs, cpu;
> -	int start_irq_index = 1;
> -	int nvec, *irqs, irq;
> +	int *irqs, irq, skip_first_cpu = 0;
> +	unsigned long flags;
>  	int err, i = 0, j;

Reverse xmas tree here too.

>  
>  	cpus_read_lock();
> -	max_queues_per_port = num_online_cpus();
> -	if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> -		max_queues_per_port = MANA_MAX_NUM_QUEUES;
> +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> +	irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL);
> +	if (!irqs) {
> +		err = -ENOMEM;
> +		goto free_irq_vector;
> +	}

...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-24 16:57   ` Simon Horman
@ 2025-04-25  7:29     ` Shradha Gupta
  0 siblings, 0 replies; 15+ messages in thread
From: Shradha Gupta @ 2025-04-25  7:29 UTC (permalink / raw)
  To: Simon Horman
  Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov,
	Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela,
	Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta

On Thu, Apr 24, 2025 at 05:57:43PM +0100, Simon Horman wrote:
> On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote:
> > Currently, the MANA driver allocates pci vector statically based on
> > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
> > up allocating more vectors than it needs.
> > This is because, by this time we do not have a HW channel and do not know
> > how many IRQs should be allocated.
> > To avoid this, we allocate 1 IRQ vector during the creation of HWC and
> > after getting the value supported by hardware, dynamically add the
> > remaining vectors.
> > 
> > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> 
> Hi Shradha,
> 
> Some minor nits from my side.
> 
> ...
> 
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> 
> ...
> 
> > @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
> >  	struct gdma_irq_context *gic;
> >  	struct gdma_context *gc;
> >  	unsigned int msi_index;
> > -	unsigned long flags;
> > +	struct list_head *pos;
> > +	unsigned long flags, flag_irq;
> >  	struct device *dev;
> > -	int err = 0;
> > +	int err = 0, count;
> 
> As this is Networking code, please preserve the arrangement of local
> variables in reverse xmas tree order - longest line to shortest.
> 
> Edward Cree's tool can be useful in this area:
> https://github.com/ecree-solarflare/xmastree
> 
> >  
> >  	gc = gd->gdma_context;
> >  	dev = gc->dev;
> > @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue,
> >  	}
> >  
> >  	queue->eq.msix_index = msi_index;
> > -	gic = &gc->irq_contexts[msi_index];
> > +
> > +	/* get the msi_index value from the list*/
> > +	count = 0;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> > +	list_for_each(pos, &gc->irq_contexts) {
> > +		if (count == msi_index) {
> > +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> 
> Please consider line wrapping to 80 columns or less, as is still preferred
> in Networking code.
> 
> Likewise elsewhere in this patch.
> 
> checkpatch.pl --max-line-length=80
> can be helpful here.
> 
> > +			break;
> > +		}
> > +
> > +		count++;
> > +	}
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> > +
> > +	if (!gic)
> > +		return -1;
> >  
> >  	spin_lock_irqsave(&gic->lock, flags);
> >  	list_add_rcu(&queue->entry, &gic->eq_list);
> > @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
> >  	struct gdma_irq_context *gic;
> >  	struct gdma_context *gc;
> >  	unsigned int msix_index;
> > -	unsigned long flags;
> > +	struct list_head *pos;
> > +	unsigned long flags, flag_irq;
> >  	struct gdma_queue *eq;
> > +	int count;
> 
> Reverse xmas tree here too.
> 
> >  
> >  	gc = gd->gdma_context;
> >  
> > @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue)
> >  	if (WARN_ON(msix_index >= gc->num_msix_usable))
> >  		return;
> >  
> > -	gic = &gc->irq_contexts[msix_index];
> > +	/* get the msi_index value from the list*/
> > +	count = 0;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
> > +	list_for_each(pos, &gc->irq_contexts) {
> > +		if (count == msix_index) {
> > +			gic = list_entry(pos, struct gdma_irq_context, gic_list);
> > +			break;
> > +		}
> > +
> > +		count++;
> > +	}
> > +	spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
> > +
> 
> Does gic need to be initialised to NULL before the list_for_each loop
> to ensure that it is always initialised here?
> 
> Flagged by Clang 20.1.2 KCFLAGS=-Wsometimes-uninitialized builds, and Smatch
> 
> > +	if (!gic)
> > +		return;
> > +
> >  	spin_lock_irqsave(&gic->lock, flags);
> >  	list_for_each_entry_rcu(eq, &gic->eq_list, entry) {
> >  		if (queue == eq) {
> 
> ...
> 
> > @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> >  	return 0;
> >  }
> >  
> > -static int mana_gd_setup_irqs(struct pci_dev *pdev)
> > +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
> >  {
> >  	struct gdma_context *gc = pci_get_drvdata(pdev);
> > -	unsigned int max_queues_per_port;
> >  	struct gdma_irq_context *gic;
> > -	unsigned int max_irqs, cpu;
> > -	int start_irq_index = 1;
> > -	int nvec, *irqs, irq;
> > +	int *irqs, irq, skip_first_cpu = 0;
> > +	unsigned long flags;
> >  	int err, i = 0, j;
> 
> Reverse xmas tree here too.
> 
> >  
> >  	cpus_read_lock();
> > -	max_queues_per_port = num_online_cpus();
> > -	if (max_queues_per_port > MANA_MAX_NUM_QUEUES)
> > -		max_queues_per_port = MANA_MAX_NUM_QUEUES;
> > +	spin_lock_irqsave(&gc->irq_ctxs_lock, flags);
> > +	irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL);
> > +	if (!irqs) {
> > +		err = -ENOMEM;
> > +		goto free_irq_vector;
> > +	}
> 
> ...

Hi Simon,

Thank you so much for all the comments; I will have them incorporated
in the next version. :)

Regards,
Shradha.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
  2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta
                     ` (2 preceding siblings ...)
  2025-04-24 16:57   ` Simon Horman
@ 2025-05-07 16:28   ` kernel test robot
  3 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2025-05-07 16:28 UTC (permalink / raw)
  To: Shradha Gupta, linux-hyperv, linux-pci, linux-kernel, Nipun Gupta,
	Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen,
	Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner,
	Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam,
	Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu,
	Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov,
	Simon Horman
  Cc: llvm, oe-kbuild-all, netdev

Hi Shradha,

kernel test robot noticed the following build warnings:

[auto build test WARNING on pci/next]
[also build test WARNING on pci/for-linus linus/master v6.15-rc5 next-20250507]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shradha-Gupta/PCI-hv-enable-pci_hyperv-to-allow-dynamic-vector-allocation/20250416-233828
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/1744817781-3243-1-git-send-email-shradhagupta%40linux.microsoft.com
patch subject: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20250508/202505080049.7AvfzOGc-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250508/202505080049.7AvfzOGc-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505080049.7AvfzOGc-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/net/ethernet/microsoft/mana/gdma_main.c:500:2: warning: variable 'gic' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized]
     500 |         list_for_each(pos, &gc->irq_contexts) {
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/list.h:687:27: note: expanded from macro 'list_for_each'
     687 |         for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/microsoft/mana/gdma_main.c:510:7: note: uninitialized use occurs here
     510 |         if (!gic)
         |              ^~~
   drivers/net/ethernet/microsoft/mana/gdma_main.c:500:2: note: remove the condition if it is always true
     500 |         list_for_each(pos, &gc->irq_contexts) {
         |         ^
   include/linux/list.h:687:27: note: expanded from macro 'list_for_each'
     687 |         for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next)
         |                                  ^
   drivers/net/ethernet/microsoft/mana/gdma_main.c:475:30: note: initialize the variable 'gic' to silence this warning
     475 |         struct gdma_irq_context *gic;
         |                                     ^
         |                                      = NULL
   drivers/net/ethernet/microsoft/mana/gdma_main.c:541:2: warning: variable 'gic' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized]
     541 |         list_for_each(pos, &gc->irq_contexts) {
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/list.h:687:27: note: expanded from macro 'list_for_each'
     687 |         for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next)
         |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/microsoft/mana/gdma_main.c:551:7: note: uninitialized use occurs here
     551 |         if (!gic)
         |              ^~~
   drivers/net/ethernet/microsoft/mana/gdma_main.c:541:2: note: remove the condition if it is always true
     541 |         list_for_each(pos, &gc->irq_contexts) {
         |         ^
   include/linux/list.h:687:27: note: expanded from macro 'list_for_each'
     687 |         for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next)
         |                                  ^
   drivers/net/ethernet/microsoft/mana/gdma_main.c:523:30: note: initialize the variable 'gic' to silence this warning
     523 |         struct gdma_irq_context *gic;
         |                                     ^
         |                                      = NULL
   2 warnings generated.


vim +500 drivers/net/ethernet/microsoft/mana/gdma_main.c

   470	
   471	static int mana_gd_register_irq(struct gdma_queue *queue,
   472					const struct gdma_queue_spec *spec)
   473	{
   474		struct gdma_dev *gd = queue->gdma_dev;
   475		struct gdma_irq_context *gic;
   476		struct gdma_context *gc;
   477		unsigned int msi_index;
   478		struct list_head *pos;
   479		unsigned long flags, flag_irq;
   480		struct device *dev;
   481		int err = 0, count;
   482	
   483		gc = gd->gdma_context;
   484		dev = gc->dev;
   485		msi_index = spec->eq.msix_index;
   486	
   487		if (msi_index >= gc->num_msix_usable) {
   488			err = -ENOSPC;
   489			dev_err(dev, "Register IRQ err:%d, msi:%u nMSI:%u",
   490				err, msi_index, gc->num_msix_usable);
   491	
   492			return err;
   493		}
   494	
   495		queue->eq.msix_index = msi_index;
   496	
   497		/* get the msi_index value from the list*/
   498		count = 0;
   499		spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
 > 500		list_for_each(pos, &gc->irq_contexts) {
   501			if (count == msi_index) {
   502				gic = list_entry(pos, struct gdma_irq_context, gic_list);
   503				break;
   504			}
   505	
   506			count++;
   507		}
   508		spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);
   509	
   510		if (!gic)
   511			return -1;
   512	
   513		spin_lock_irqsave(&gic->lock, flags);
   514		list_add_rcu(&queue->entry, &gic->eq_list);
   515		spin_unlock_irqrestore(&gic->lock, flags);
   516	
   517		return 0;
   518	}
   519	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-05-07 16:29 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-16 15:35 [PATCH 0/2] Allow dyn pci vector allocation of MANA Shradha Gupta
2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta
2025-04-16 18:30   ` Bjorn Helgaas
2025-04-17  7:29     ` Shradha Gupta
2025-04-17 10:00   ` Thomas Gleixner
2025-04-21  6:33     ` Shradha Gupta
2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta
2025-04-16 17:22   ` Yury Norov
2025-04-17  7:32     ` Shradha Gupta
2025-04-22 12:09     ` Shradha Gupta
2025-04-16 18:32   ` Bjorn Helgaas
2025-04-17  7:33     ` Shradha Gupta
2025-04-24 16:57   ` Simon Horman
2025-04-25  7:29     ` Shradha Gupta
2025-05-07 16:28   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).