* [PATCH 0/2] Allow dyn pci vector allocation of MANA @ 2025-04-16 15:35 Shradha Gupta 2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta 2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta 0 siblings, 2 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-16 15:35 UTC (permalink / raw) To: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm Cc: Shradha Gupta In this patchset we enable the MANA driver to allocate MSI-X vectors in PCI dynamically. The first patch contains the changes required to support dynamic vector allocation in the pci-hyperv PCI controller. It also includes the changes in the PCI tree needed to cleanly support dynamic allocation of IRQ vectors by PCI controllers. The second patch changes the MANA driver to allocate PCI vectors dynamically when the infrastructure supports it; if the support does not exist, it falls back to the older behavior. For this submission, I wasn't sure whether to target the net-next or the pci tree. Please let me know what the recommendation is. Shradha Gupta (2): PCI: hv: enable pci_hyperv to allow dynamic vector allocation net: mana: Allow MANA driver to allocate PCI vector dynamically .../net/ethernet/microsoft/mana/gdma_main.c | 306 ++++++++++++++---- drivers/pci/controller/pci-hyperv.c | 7 +- drivers/pci/msi/irqdomain.c | 5 +- include/linux/msi.h | 2 + include/net/mana/gdma.h | 5 +- 5 files changed, 260 insertions(+), 65 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
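For readers who have not used the dynamic MSI-X API this series builds on, the two-phase flow described in the cover letter maps onto the generic PCI interfaces roughly as below. This is only a sketch: the function name and the hw_max parameter are invented for the example, and it assumes the PCI controller advertises MSI_FLAG_PCI_MSIX_ALLOC_DYN (which is what patch 1 adds for pci-hyperv).

#include <linux/msi.h>
#include <linux/pci.h>

/* Illustrative only -- not part of the series. */
static int example_two_phase_msix(struct pci_dev *pdev, unsigned int hw_max)
{
        struct msi_map map;
        unsigned int i;
        int nvec;

        if (!pci_msix_can_alloc_dyn(pdev))
                return -EOPNOTSUPP;     /* caller falls back to static allocation */

        /* Phase 1: one MSI-X vector is enough to bring up the HW channel. */
        nvec = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSIX);
        if (nvec < 0)
                return nvec;

        /* ... create the HWC and learn hw_max (max_msix) from the device ... */

        /* Phase 2: grow the allocation one vector at a time, after enable. */
        for (i = 1; i < hw_max; i++) {
                map = pci_msix_alloc_irq_at(pdev, i, NULL);
                if (!map.virq)
                        return map.index;       /* negative errno on failure */
        }

        return 0;
}

The point of the exercise is that MSI-X vectors can be added after the device is already enabled, so the driver no longer has to guess an upper bound before it can talk to the hardware.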
* [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation 2025-04-16 15:35 [PATCH 0/2] Allow dyn pci vector allocation of MANA Shradha Gupta @ 2025-04-16 15:36 ` Shradha Gupta 2025-04-16 18:30 ` Bjorn Helgaas 2025-04-17 10:00 ` Thomas Gleixner 2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta 1 sibling, 2 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-16 15:36 UTC (permalink / raw) To: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm Cc: Shradha Gupta For supporting dynamic MSI vector allocation by pci controllers, enabling the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc() to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci controllers like pci-hyperv to support dynamic MSI vector allocation. Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use pci_msix_prepare_desc() in pci_hyperv controller Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> --- drivers/pci/controller/pci-hyperv.c | 7 +++++-- drivers/pci/msi/irqdomain.c | 5 +++-- include/linux/msi.h | 2 ++ 3 files changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index ac27bda5ba26..f2fa6bdb6bb8 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -598,7 +598,8 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data) return cfg->vector; } -#define hv_msi_prepare pci_msi_prepare +#define hv_msi_prepare pci_msi_prepare +#define hv_msix_prepare_desc pci_msix_prepare_desc /** * hv_arch_irq_unmask() - "Unmask" the IRQ by setting its current @@ -727,6 +728,7 @@ static void hv_arch_irq_unmask(struct irq_data *data) #define FLOW_HANDLER NULL #define FLOW_NAME NULL #define hv_msi_prepare NULL +#define hv_msix_prepare_desc NULL struct hv_pci_chip_data { DECLARE_BITMAP(spi_map, HV_PCI_MSI_SPI_NR); @@ -2063,6 +2065,7 @@ static struct irq_chip hv_msi_irq_chip = { static struct msi_domain_ops hv_msi_ops = { .msi_prepare = hv_msi_prepare, .msi_free = hv_msi_free, + .prepare_desc = hv_msix_prepare_desc, }; /** @@ -2084,7 +2087,7 @@ static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus) hbus->msi_info.ops = &hv_msi_ops; hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI | - MSI_FLAG_PCI_MSIX); + MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN); hbus->msi_info.handler = FLOW_HANDLER; hbus->msi_info.handler_name = FLOW_NAME; hbus->msi_info.data = hbus; diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c index d7ba8795d60f..43129aa6d6c7 100644 --- a/drivers/pci/msi/irqdomain.c +++ b/drivers/pci/msi/irqdomain.c @@ -222,13 +222,14 @@ static void pci_irq_unmask_msix(struct irq_data *data) pci_msix_unmask(irq_data_get_msi_desc(data)); } -static void pci_msix_prepare_desc(struct irq_domain *domain, 
msi_alloc_info_t *arg, - struct msi_desc *desc) +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, + struct msi_desc *desc) { /* Don't fiddle with preallocated MSI descriptors */ if (!desc->pci.mask_base) msix_prepare_msi_desc(to_pci_dev(desc->dev), desc); } +EXPORT_SYMBOL_GPL(pci_msix_prepare_desc); static const struct msi_domain_template pci_msix_template = { .chip = { diff --git a/include/linux/msi.h b/include/linux/msi.h index 86e42742fd0f..d5864d5e75c2 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -691,6 +691,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode, struct irq_domain *parent); u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev); struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev); +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, + struct msi_desc *desc); #else /* CONFIG_PCI_MSI */ static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev) { -- 2.34.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
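The controller-side change above boils down to two pieces: advertising MSI_FLAG_PCI_MSIX_ALLOC_DYN, and pointing the MSI domain's .prepare_desc hook at the newly exported pci_msix_prepare_desc() so that descriptors created after MSI-X has been enabled are still prepared via msix_prepare_msi_desc(). A hypothetical non-Hyper-V controller driver with its own MSI irqdomain would consume the export in much the same way (sketch only; the foo_* names are made up):

static struct msi_domain_ops foo_msi_ops = {
        /* .msi_prepare, .msi_free, ... as the controller already defines them */
        .prepare_desc   = pci_msix_prepare_desc,        /* exported by this patch */
};

static struct msi_domain_info foo_msi_info = {
        .flags  = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
                  MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN,
        .ops    = &foo_msi_ops,
        /* .chip, .handler, .data, ... as usual for the controller */
};

Without the .prepare_desc hook, a vector allocated through pci_msix_alloc_irq_at() would reach the device with an unprepared descriptor, which is why the commit message says that setting the flag alone is not enough.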
* Re: [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation 2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta @ 2025-04-16 18:30 ` Bjorn Helgaas 2025-04-17 7:29 ` Shradha Gupta 2025-04-17 10:00 ` Thomas Gleixner 1 sibling, 1 reply; 15+ messages in thread From: Bjorn Helgaas @ 2025-04-16 18:30 UTC (permalink / raw) To: Shradha Gupta, Thomas Gleixner Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta [+to Thomas] On Wed, Apr 16, 2025 at 08:36:06AM -0700, Shradha Gupta wrote: > For supporting dynamic MSI vector allocation by pci controllers, enabling > the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc() > to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci > controllers like pci-hyperv to support dynamic MSI vector allocation. > Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use > pci_msix_prepare_desc() in pci_hyperv controller Follow capitalization convention for subject line. Probably remove "pci_hyperv" since it already contains "PCI: hv" and add something about MSI-X. s/pci/PCI/ s/MSI vector/MSI-X vector/ since the context says you're concerned with MSI-X. This requires an ack from Thomas; moved him to "To:" line. > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> > Reviewed-by: Long Li <longli@microsoft.com> > --- > drivers/pci/controller/pci-hyperv.c | 7 +++++-- > drivers/pci/msi/irqdomain.c | 5 +++-- > include/linux/msi.h | 2 ++ > 3 files changed, 10 insertions(+), 4 deletions(-) > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c > index ac27bda5ba26..f2fa6bdb6bb8 100644 > --- a/drivers/pci/controller/pci-hyperv.c > +++ b/drivers/pci/controller/pci-hyperv.c > @@ -598,7 +598,8 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data) > return cfg->vector; > } > > -#define hv_msi_prepare pci_msi_prepare > +#define hv_msi_prepare pci_msi_prepare > +#define hv_msix_prepare_desc pci_msix_prepare_desc > > /** > * hv_arch_irq_unmask() - "Unmask" the IRQ by setting its current > @@ -727,6 +728,7 @@ static void hv_arch_irq_unmask(struct irq_data *data) > #define FLOW_HANDLER NULL > #define FLOW_NAME NULL > #define hv_msi_prepare NULL > +#define hv_msix_prepare_desc NULL > > struct hv_pci_chip_data { > DECLARE_BITMAP(spi_map, HV_PCI_MSI_SPI_NR); > @@ -2063,6 +2065,7 @@ static struct irq_chip hv_msi_irq_chip = { > static struct msi_domain_ops hv_msi_ops = { > .msi_prepare = hv_msi_prepare, > .msi_free = hv_msi_free, > + .prepare_desc = hv_msix_prepare_desc, > }; > > /** > @@ -2084,7 +2087,7 @@ static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus) > hbus->msi_info.ops = &hv_msi_ops; > hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS | > MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI | > - MSI_FLAG_PCI_MSIX); > + MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN); > hbus->msi_info.handler = FLOW_HANDLER; > hbus->msi_info.handler_name = FLOW_NAME; > 
hbus->msi_info.data = hbus; > diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c > index d7ba8795d60f..43129aa6d6c7 100644 > --- a/drivers/pci/msi/irqdomain.c > +++ b/drivers/pci/msi/irqdomain.c > @@ -222,13 +222,14 @@ static void pci_irq_unmask_msix(struct irq_data *data) > pci_msix_unmask(irq_data_get_msi_desc(data)); > } > > -static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, > - struct msi_desc *desc) > +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, > + struct msi_desc *desc) > { > /* Don't fiddle with preallocated MSI descriptors */ > if (!desc->pci.mask_base) > msix_prepare_msi_desc(to_pci_dev(desc->dev), desc); > } > +EXPORT_SYMBOL_GPL(pci_msix_prepare_desc); > > static const struct msi_domain_template pci_msix_template = { > .chip = { > diff --git a/include/linux/msi.h b/include/linux/msi.h > index 86e42742fd0f..d5864d5e75c2 100644 > --- a/include/linux/msi.h > +++ b/include/linux/msi.h > @@ -691,6 +691,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode, > struct irq_domain *parent); > u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev); > struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev); > +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, > + struct msi_desc *desc); > #else /* CONFIG_PCI_MSI */ > static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev) > { > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation 2025-04-16 18:30 ` Bjorn Helgaas @ 2025-04-17 7:29 ` Shradha Gupta 0 siblings, 0 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-17 7:29 UTC (permalink / raw) To: Bjorn Helgaas Cc: Thomas Gleixner, linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczy??ski, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta On Wed, Apr 16, 2025 at 01:30:42PM -0500, Bjorn Helgaas wrote: > [+to Thomas] > > On Wed, Apr 16, 2025 at 08:36:06AM -0700, Shradha Gupta wrote: > > For supporting dynamic MSI vector allocation by pci controllers, enabling > > the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc() > > to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci > > controllers like pci-hyperv to support dynamic MSI vector allocation. > > Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use > > pci_msix_prepare_desc() in pci_hyperv controller > > Follow capitalization convention for subject line. Probably remove > "pci_hyperv" since it already contains "PCI: hv" and add something > about MSI-X. > > s/pci/PCI/ > > s/MSI vector/MSI-X vector/ since the context says you're concerned > with MSI-X. > > This requires an ack from Thomas; moved him to "To:" line. > Thanks Bjorn. I will work on addressing these. Regards, Shradha. 
> > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> > > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> > > Reviewed-by: Long Li <longli@microsoft.com> > > --- > > drivers/pci/controller/pci-hyperv.c | 7 +++++-- > > drivers/pci/msi/irqdomain.c | 5 +++-- > > include/linux/msi.h | 2 ++ > > 3 files changed, 10 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c > > index ac27bda5ba26..f2fa6bdb6bb8 100644 > > --- a/drivers/pci/controller/pci-hyperv.c > > +++ b/drivers/pci/controller/pci-hyperv.c > > @@ -598,7 +598,8 @@ static unsigned int hv_msi_get_int_vector(struct irq_data *data) > > return cfg->vector; > > } > > > > -#define hv_msi_prepare pci_msi_prepare > > +#define hv_msi_prepare pci_msi_prepare > > +#define hv_msix_prepare_desc pci_msix_prepare_desc > > > > /** > > * hv_arch_irq_unmask() - "Unmask" the IRQ by setting its current > > @@ -727,6 +728,7 @@ static void hv_arch_irq_unmask(struct irq_data *data) > > #define FLOW_HANDLER NULL > > #define FLOW_NAME NULL > > #define hv_msi_prepare NULL > > +#define hv_msix_prepare_desc NULL > > > > struct hv_pci_chip_data { > > DECLARE_BITMAP(spi_map, HV_PCI_MSI_SPI_NR); > > @@ -2063,6 +2065,7 @@ static struct irq_chip hv_msi_irq_chip = { > > static struct msi_domain_ops hv_msi_ops = { > > .msi_prepare = hv_msi_prepare, > > .msi_free = hv_msi_free, > > + .prepare_desc = hv_msix_prepare_desc, > > }; > > > > /** > > @@ -2084,7 +2087,7 @@ static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus) > > hbus->msi_info.ops = &hv_msi_ops; > > hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS | > > MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI | > > - MSI_FLAG_PCI_MSIX); > > + MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN); > > hbus->msi_info.handler = FLOW_HANDLER; > > hbus->msi_info.handler_name = FLOW_NAME; > > hbus->msi_info.data = hbus; > > diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c > > index d7ba8795d60f..43129aa6d6c7 100644 > > --- a/drivers/pci/msi/irqdomain.c > > +++ b/drivers/pci/msi/irqdomain.c > > @@ -222,13 +222,14 @@ static void pci_irq_unmask_msix(struct irq_data *data) > > pci_msix_unmask(irq_data_get_msi_desc(data)); > > } > > > > -static void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, > > - struct msi_desc *desc) > > +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, > > + struct msi_desc *desc) > > { > > /* Don't fiddle with preallocated MSI descriptors */ > > if (!desc->pci.mask_base) > > msix_prepare_msi_desc(to_pci_dev(desc->dev), desc); > > } > > +EXPORT_SYMBOL_GPL(pci_msix_prepare_desc); > > > > static const struct msi_domain_template pci_msix_template = { > > .chip = { > > diff --git a/include/linux/msi.h b/include/linux/msi.h > > index 86e42742fd0f..d5864d5e75c2 100644 > > --- a/include/linux/msi.h > > +++ b/include/linux/msi.h > > @@ -691,6 +691,8 @@ struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode, > > struct irq_domain *parent); > > u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev); > > struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev); > > +void pci_msix_prepare_desc(struct irq_domain *domain, msi_alloc_info_t *arg, > > + struct msi_desc *desc); > > #else /* CONFIG_PCI_MSI */ > > static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev) > > { > > -- > > 2.34.1 > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation 2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta 2025-04-16 18:30 ` Bjorn Helgaas @ 2025-04-17 10:00 ` Thomas Gleixner 2025-04-21 6:33 ` Shradha Gupta 1 sibling, 1 reply; 15+ messages in thread From: Thomas Gleixner @ 2025-04-17 10:00 UTC (permalink / raw) To: Shradha Gupta, linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm Cc: Shradha Gupta On Wed, Apr 16 2025 at 08:36, Shradha Gupta wrote: > For supporting dynamic MSI vector allocation by pci controllers, enabling > the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc() > to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci > controllers like pci-hyperv to support dynamic MSI vector allocation. > Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use > pci_msix_prepare_desc() in pci_hyperv controller Seems your newline key is broken. Please structure changelogs properly in paragraphs: https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog > drivers/pci/controller/pci-hyperv.c | 7 +++++-- > drivers/pci/msi/irqdomain.c | 5 +++-- > include/linux/msi.h | 2 ++ This wants to be split into a patch which exports pci_msix_prepare_desc() and one which enables the functionality in PCI/HV. Thanks, tglx ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation 2025-04-17 10:00 ` Thomas Gleixner @ 2025-04-21 6:33 ` Shradha Gupta 0 siblings, 0 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-21 6:33 UTC (permalink / raw) To: Thomas Gleixner, Shradha Gupta, linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Gupta, Nipun, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, KY Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, Paul Rosswurm >On Wed, Apr 16 2025 at 08:36, Shradha Gupta wrote: >> For supporting dynamic MSI vector allocation by pci controllers, enabling >> the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN is not enough, msix_prepare_msi_desc() >> to prepare the desc is needed. Export pci_msix_prepare_desc() to allow pci >> controllers like pci-hyperv to support dynamic MSI vector allocation. >>Also, add the flag MSI_FLAG_PCI_MSIX_ALLOC_DYN and use >> pci_msix_prepare_desc() in pci_hyperv controller > >Seems your newline key is broken. Please structure changelogs properly > >in paragraphs: > > >https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog >> drivers/pci/controller/pci-hyperv.c | 7 +++++-- >> drivers/pci/msi/irqdomain.c | 5 +++-- >> include/linux/msi.h | 2 ++ > >This wants to be split into a patch which exports >pci_msix_prepare_desc() and one which enables the functionality in >PCI/HV. > >Thanks, > > tglx Thanks for all the comments, Thomas. I will incorporate all these and send out the next version. We are facing some issues with mutt, hence I am replying through outlook. Apologies for the issues with formatting for this reply. :) ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-16 15:35 [PATCH 0/2] Allow dyn pci vector allocation of MANA Shradha Gupta 2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta @ 2025-04-16 15:36 ` Shradha Gupta 2025-04-16 17:22 ` Yury Norov ` (3 more replies) 1 sibling, 4 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-16 15:36 UTC (permalink / raw) To: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shradha Gupta, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm Cc: Shradha Gupta Currently, the MANA driver allocates pci vector statically based on MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends up allocating more vectors than it needs. This is because, by this time we do not have a HW channel and do not know how many IRQs should be allocated. To avoid this, we allocate 1 IRQ vector during the creation of HWC and after getting the value supported by hardware, dynamically add the remaining vectors. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> --- .../net/ethernet/microsoft/mana/gdma_main.c | 306 ++++++++++++++---- include/net/mana/gdma.h | 5 +- 2 files changed, 250 insertions(+), 61 deletions(-) diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c index 4ffaf7588885..3e3b5854b736 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -6,6 +6,9 @@ #include <linux/pci.h> #include <linux/utsname.h> #include <linux/version.h> +#include <linux/msi.h> +#include <linux/irqdomain.h> +#include <linux/list.h> #include <net/mana/mana.h> @@ -80,8 +83,15 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev) return err ? 
err : -EPROTO; } - if (gc->num_msix_usable > resp.max_msix) - gc->num_msix_usable = resp.max_msix; + if (!pci_msix_can_alloc_dyn(pdev)) { + if (gc->num_msix_usable > resp.max_msix) + gc->num_msix_usable = resp.max_msix; + } else { + /* If dynamic allocation is enabled we have already allocated + * hwc msi + */ + gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1); + } if (gc->num_msix_usable <= 1) return -ENOSPC; @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue, struct gdma_irq_context *gic; struct gdma_context *gc; unsigned int msi_index; - unsigned long flags; + struct list_head *pos; + unsigned long flags, flag_irq; struct device *dev; - int err = 0; + int err = 0, count; gc = gd->gdma_context; dev = gc->dev; @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue, } queue->eq.msix_index = msi_index; - gic = &gc->irq_contexts[msi_index]; + + /* get the msi_index value from the list*/ + count = 0; + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); + list_for_each(pos, &gc->irq_contexts) { + if (count == msi_index) { + gic = list_entry(pos, struct gdma_irq_context, gic_list); + break; + } + + count++; + } + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); + + if (!gic) + return -1; spin_lock_irqsave(&gic->lock, flags); list_add_rcu(&queue->entry, &gic->eq_list); @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) struct gdma_irq_context *gic; struct gdma_context *gc; unsigned int msix_index; - unsigned long flags; + struct list_head *pos; + unsigned long flags, flag_irq; struct gdma_queue *eq; + int count; gc = gd->gdma_context; @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) if (WARN_ON(msix_index >= gc->num_msix_usable)) return; - gic = &gc->irq_contexts[msix_index]; + /* get the msi_index value from the list*/ + count = 0; + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); + list_for_each(pos, &gc->irq_contexts) { + if (count == msix_index) { + gic = list_entry(pos, struct gdma_irq_context, gic_list); + break; + } + + count++; + } + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); + + if (!gic) + return; + spin_lock_irqsave(&gic->lock, flags); list_for_each_entry_rcu(eq, &gic->eq_list, entry) { if (queue == eq) { @@ -1288,11 +1331,11 @@ void mana_gd_free_res_map(struct gdma_resource *r) r->size = 0; } -static int irq_setup(unsigned int *irqs, unsigned int len, int node) +static int irq_setup(unsigned int *irqs, unsigned int len, int node, int skip_cpu) { const struct cpumask *next, *prev = cpu_none_mask; cpumask_var_t cpus __free(free_cpumask_var); - int cpu, weight; + int cpu, weight, i = 0; if (!alloc_cpumask_var(&cpus, GFP_KERNEL)) return -ENOMEM; @@ -1303,9 +1346,21 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node) while (weight > 0) { cpumask_andnot(cpus, next, prev); for_each_cpu(cpu, cpus) { + /* If the call is made for irqs which are dynamically + * added and the num of vcpus is more or equal to + * allocated msix, we need to skip the first + * set of cpus, since they are already affinitized + * to HWC IRQ + */ + if (skip_cpu && !i) { + i = 1; + goto next_cpumask; + } if (len-- == 0) goto done; + irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu)); +next_cpumask: cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu)); --weight; } @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node) return 0; } -static int mana_gd_setup_irqs(struct pci_dev *pdev) +static int 
mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) { struct gdma_context *gc = pci_get_drvdata(pdev); - unsigned int max_queues_per_port; struct gdma_irq_context *gic; - unsigned int max_irqs, cpu; - int start_irq_index = 1; - int nvec, *irqs, irq; + int *irqs, irq, skip_first_cpu = 0; + unsigned long flags; int err, i = 0, j; cpus_read_lock(); - max_queues_per_port = num_online_cpus(); - if (max_queues_per_port > MANA_MAX_NUM_QUEUES) - max_queues_per_port = MANA_MAX_NUM_QUEUES; + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); + irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL); + if (!irqs) { + err = -ENOMEM; + goto free_irq_vector; + } - /* Need 1 interrupt for the Hardware communication Channel (HWC) */ - max_irqs = max_queues_per_port + 1; + for (i = 0; i < nvec; i++) { + gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL); + if (!gic) { + err = -ENOMEM; + goto free_irq; + } + gic->handler = mana_gd_process_eq_events; + INIT_LIST_HEAD(&gic->eq_list); + spin_lock_init(&gic->lock); - nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX); - if (nvec < 0) { - cpus_read_unlock(); - return nvec; + snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s", + i, pci_name(pdev)); + + /* one pci vector is already allocated for HWC */ + irqs[i] = pci_irq_vector(pdev, i + 1); + if (irqs[i] < 0) { + err = irqs[i]; + goto free_current_gic; + } + + err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic); + if (err) + goto free_current_gic; + + list_add_tail(&gic->gic_list, &gc->irq_contexts); + } + + if (gc->num_msix_usable <= num_online_cpus()) + skip_first_cpu = 1; + + err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu); + if (err) + goto free_irq; + + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); + cpus_read_unlock(); + kfree(irqs); + return 0; + +free_current_gic: + kfree(gic); +free_irq: + for (j = i - 1; j >= 0; j--) { + irq = pci_irq_vector(pdev, j + 1); + gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list); + irq_update_affinity_hint(irq, NULL); + free_irq(irq, gic); + list_del(&gic->gic_list); + kfree(gic); } + kfree(irqs); +free_irq_vector: + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); + cpus_read_unlock(); + return err; +} + +static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec) +{ + struct gdma_context *gc = pci_get_drvdata(pdev); + struct gdma_irq_context *gic; + int start_irq_index = 1; + unsigned long flags; + unsigned int cpu; + int *irqs, irq; + int err, i = 0, j; + + cpus_read_lock(); + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); + if (nvec <= num_online_cpus()) start_irq_index = 0; @@ -1349,15 +1467,12 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) goto free_irq_vector; } - gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context), - GFP_KERNEL); - if (!gc->irq_contexts) { - err = -ENOMEM; - goto free_irq_array; - } - for (i = 0; i < nvec; i++) { - gic = &gc->irq_contexts[i]; + gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL); + if (!gic) { + err = -ENOMEM; + goto free_irq; + } gic->handler = mana_gd_process_eq_events; INIT_LIST_HEAD(&gic->eq_list); spin_lock_init(&gic->lock); @@ -1372,22 +1487,14 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) irq = pci_irq_vector(pdev, i); if (irq < 0) { err = irq; - goto free_irq; + goto free_current_gic; } if (!i) { err = request_irq(irq, mana_gd_intr, 0, gic->name, gic); if (err) - goto free_irq; - - /* If number of IRQ is one extra than number of online CPUs, - * then we need to assign IRQ0 (hwc irq) and IRQ1 to - * same CPU. 
- * Else we will use different CPUs for IRQ0 and IRQ1. - * Also we are using cpumask_local_spread instead of - * cpumask_first for the node, because the node can be - * mem only. - */ + goto free_current_gic; + if (start_irq_index) { cpu = cpumask_local_spread(i, gc->numa_node); irq_set_affinity_and_hint(irq, cpumask_of(cpu)); @@ -1399,36 +1506,104 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) err = request_irq(irqs[i - start_irq_index], mana_gd_intr, 0, gic->name, gic); if (err) - goto free_irq; + goto free_current_gic; } + + list_add_tail(&gic->gic_list, &gc->irq_contexts); } - err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node); + err = irq_setup(irqs, nvec - start_irq_index, gc->numa_node, 0); if (err) goto free_irq; - gc->max_num_msix = nvec; - gc->num_msix_usable = nvec; + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); cpus_read_unlock(); kfree(irqs); return 0; +free_current_gic: + kfree(gic); free_irq: for (j = i - 1; j >= 0; j--) { irq = pci_irq_vector(pdev, j); - gic = &gc->irq_contexts[j]; - + gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list); irq_update_affinity_hint(irq, NULL); free_irq(irq, gic); + list_del(&gic->gic_list); + kfree(gic); } - - kfree(gc->irq_contexts); - gc->irq_contexts = NULL; -free_irq_array: kfree(irqs); free_irq_vector: + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); cpus_read_unlock(); - pci_free_irq_vectors(pdev); + return err; +} + +static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev) +{ + struct gdma_context *gc = pci_get_drvdata(pdev); + unsigned int max_irqs, min_irqs; + int max_queues_per_port; + int nvec, err; + + if (pci_msix_can_alloc_dyn(pdev)) { + max_irqs = 1; + min_irqs = 1; + } else { + max_queues_per_port = num_online_cpus(); + if (max_queues_per_port > MANA_MAX_NUM_QUEUES) + max_queues_per_port = MANA_MAX_NUM_QUEUES; + /* Need 1 interrupt for the Hardware communication Channel (HWC) */ + max_irqs = max_queues_per_port + 1; + min_irqs = 2; + } + + nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX); + if (nvec < 0) + return nvec; + + err = mana_gd_setup_irqs(pdev, nvec); + if (err) { + pci_free_irq_vectors(pdev); + return err; + } + + gc->num_msix_usable = nvec; + gc->max_num_msix = nvec; + + return err; +} + +static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev) +{ + struct gdma_context *gc = pci_get_drvdata(pdev); + int max_irqs, i, err = 0; + struct msi_map irq_map; + + if (!pci_msix_can_alloc_dyn(pdev)) + /* remain irqs are already allocated with HWC IRQ */ + return 0; + + /* allocate only remaining IRQs*/ + max_irqs = gc->num_msix_usable - 1; + + for (i = 1; i <= max_irqs; i++) { + irq_map = pci_msix_alloc_irq_at(pdev, i, NULL); + if (!irq_map.virq) { + err = irq_map.index; + /* caller will handle cleaning up all allocated + * irqs, after HWC is destroyed + */ + return err; + } + } + + err = mana_gd_setup_dyn_irqs(pdev, max_irqs); + if (err) + return err; + + gc->max_num_msix = gc->max_num_msix + max_irqs; + return err; } @@ -1436,29 +1611,34 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev) { struct gdma_context *gc = pci_get_drvdata(pdev); struct gdma_irq_context *gic; - int irq, i; + struct list_head *pos, *n; + unsigned long flags; + int irq, i = 0; if (gc->max_num_msix < 1) return; - for (i = 0; i < gc->max_num_msix; i++) { + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); + list_for_each_safe(pos, n, &gc->irq_contexts) { irq = pci_irq_vector(pdev, i); if (irq < 0) continue; - gic = &gc->irq_contexts[i]; + gic = list_entry(pos, struct 
gdma_irq_context, gic_list); /* Need to clear the hint before free_irq */ irq_update_affinity_hint(irq, NULL); free_irq(irq, gic); + list_del(pos); + kfree(gic); + i++; } + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); pci_free_irq_vectors(pdev); gc->max_num_msix = 0; gc->num_msix_usable = 0; - kfree(gc->irq_contexts); - gc->irq_contexts = NULL; } static int mana_gd_setup(struct pci_dev *pdev) @@ -1469,9 +1649,9 @@ static int mana_gd_setup(struct pci_dev *pdev) mana_gd_init_registers(pdev); mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base); - err = mana_gd_setup_irqs(pdev); + err = mana_gd_setup_hwc_irqs(pdev); if (err) { - dev_err(gc->dev, "Failed to setup IRQs: %d\n", err); + dev_err(gc->dev, "Failed to setup IRQs for HWC creation: %d\n", err); return err; } @@ -1487,6 +1667,10 @@ static int mana_gd_setup(struct pci_dev *pdev) if (err) goto destroy_hwc; + err = mana_gd_setup_remaining_irqs(pdev); + if (err) + goto destroy_hwc; + err = mana_gd_detect_devices(pdev); if (err) goto destroy_hwc; @@ -1563,6 +1747,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent) gc->is_pf = mana_is_pf(pdev->device); gc->bar0_va = bar0_va; gc->dev = &pdev->dev; + INIT_LIST_HEAD(&gc->irq_contexts); + spin_lock_init(&gc->irq_ctxs_lock); if (gc->is_pf) gc->mana_pci_debugfs = debugfs_create_dir("0", mana_debugfs_root); diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index 228603bf03f2..eae38d7302fe 100644 --- a/include/net/mana/gdma.h +++ b/include/net/mana/gdma.h @@ -363,6 +363,7 @@ struct gdma_irq_context { spinlock_t lock; struct list_head eq_list; char name[MANA_IRQ_NAME_SZ]; + struct list_head gic_list; }; struct gdma_context { @@ -373,7 +374,9 @@ struct gdma_context { unsigned int max_num_queues; unsigned int max_num_msix; unsigned int num_msix_usable; - struct gdma_irq_context *irq_contexts; + struct list_head irq_contexts; + /* Protect the irq_contexts list */ + spinlock_t irq_ctxs_lock; /* L2 MTU */ u16 adapter_mtu; -- 2.34.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
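Two aspects of this patch are easier to follow when pulled out of the diff. First, the ordering in mana_gd_setup(): request only the HWC vector up front (mana_gd_setup_hwc_irqs()), create the HW channel, learn max_msix in mana_gd_query_max_resources(), and only then add the remaining vectors in mana_gd_setup_remaining_irqs(). Second, both mana_gd_register_irq() and mana_gd_deregiser_irq() now walk the new gc->irq_contexts list to translate an MSI index into a gdma_irq_context. That walk, written as a single helper, looks roughly like this (illustrative only; the patch open-codes the loop in both callers and the helper name is invented):

/* Hypothetical helper; assumes gc->irq_ctxs_lock is held by the caller. */
static struct gdma_irq_context *mana_gd_find_gic(struct gdma_context *gc,
                                                 unsigned int msi_index)
{
        struct gdma_irq_context *gic;
        unsigned int count = 0;

        lockdep_assert_held(&gc->irq_ctxs_lock);

        list_for_each_entry(gic, &gc->irq_contexts, gic_list) {
                if (count++ == msi_index)
                        return gic;
        }

        return NULL;
}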
* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta @ 2025-04-16 17:22 ` Yury Norov 2025-04-17 7:32 ` Shradha Gupta 2025-04-22 12:09 ` Shradha Gupta 2025-04-16 18:32 ` Bjorn Helgaas ` (2 subsequent siblings) 3 siblings, 2 replies; 15+ messages in thread From: Yury Norov @ 2025-04-16 17:22 UTC (permalink / raw) To: Shradha Gupta Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote: > Currently, the MANA driver allocates pci vector statically based on > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends > up allocating more vectors than it needs. > This is because, by this time we do not have a HW channel and do not know > how many IRQs should be allocated. > To avoid this, we allocate 1 IRQ vector during the creation of HWC and > after getting the value supported by hardware, dynamically add the > remaining vectors. > > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> > --- > .../net/ethernet/microsoft/mana/gdma_main.c | 306 ++++++++++++++---- > include/net/mana/gdma.h | 5 +- > 2 files changed, 250 insertions(+), 61 deletions(-) > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c > index 4ffaf7588885..3e3b5854b736 100644 > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c > @@ -6,6 +6,9 @@ > #include <linux/pci.h> > #include <linux/utsname.h> > #include <linux/version.h> > +#include <linux/msi.h> > +#include <linux/irqdomain.h> > +#include <linux/list.h> > > #include <net/mana/mana.h> > > @@ -80,8 +83,15 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev) > return err ? 
err : -EPROTO; > } > > - if (gc->num_msix_usable > resp.max_msix) > - gc->num_msix_usable = resp.max_msix; > + if (!pci_msix_can_alloc_dyn(pdev)) { > + if (gc->num_msix_usable > resp.max_msix) > + gc->num_msix_usable = resp.max_msix; > + } else { > + /* If dynamic allocation is enabled we have already allocated > + * hwc msi > + */ > + gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1); > + } > > if (gc->num_msix_usable <= 1) > return -ENOSPC; > @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue, > struct gdma_irq_context *gic; > struct gdma_context *gc; > unsigned int msi_index; > - unsigned long flags; > + struct list_head *pos; > + unsigned long flags, flag_irq; > struct device *dev; > - int err = 0; > + int err = 0, count; > > gc = gd->gdma_context; > dev = gc->dev; > @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue, > } > > queue->eq.msix_index = msi_index; > - gic = &gc->irq_contexts[msi_index]; > + > + /* get the msi_index value from the list*/ > + count = 0; > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > + list_for_each(pos, &gc->irq_contexts) { > + if (count == msi_index) { > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > + break; > + } > + > + count++; > + } > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > + > + if (!gic) > + return -1; > > spin_lock_irqsave(&gic->lock, flags); > list_add_rcu(&queue->entry, &gic->eq_list); > @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) > struct gdma_irq_context *gic; > struct gdma_context *gc; > unsigned int msix_index; > - unsigned long flags; > + struct list_head *pos; > + unsigned long flags, flag_irq; > struct gdma_queue *eq; > + int count; > > gc = gd->gdma_context; > > @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) > if (WARN_ON(msix_index >= gc->num_msix_usable)) > return; > > - gic = &gc->irq_contexts[msix_index]; > + /* get the msi_index value from the list*/ > + count = 0; > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > + list_for_each(pos, &gc->irq_contexts) { > + if (count == msix_index) { > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > + break; > + } > + > + count++; > + } > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > + > + if (!gic) > + return; > + > spin_lock_irqsave(&gic->lock, flags); > list_for_each_entry_rcu(eq, &gic->eq_list, entry) { > if (queue == eq) { > @@ -1288,11 +1331,11 @@ void mana_gd_free_res_map(struct gdma_resource *r) > r->size = 0; > } > > -static int irq_setup(unsigned int *irqs, unsigned int len, int node) > +static int irq_setup(unsigned int *irqs, unsigned int len, int node, int skip_cpu) > { > const struct cpumask *next, *prev = cpu_none_mask; > cpumask_var_t cpus __free(free_cpumask_var); > - int cpu, weight; > + int cpu, weight, i = 0; > > if (!alloc_cpumask_var(&cpus, GFP_KERNEL)) > return -ENOMEM; > @@ -1303,9 +1346,21 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node) > while (weight > 0) { > cpumask_andnot(cpus, next, prev); > for_each_cpu(cpu, cpus) { > + /* If the call is made for irqs which are dynamically > + * added and the num of vcpus is more or equal to > + * allocated msix, we need to skip the first > + * set of cpus, since they are already affinitized Can you replace the 'set of cpus' with a 'sibling group'? > + * to HWC IRQ > + */ This comment should not be here. This is a helper function. User may want to skip 1st CPU for whatever reason. 
Please put the comment in mana_gd_setup_dyn_irqs(). > + if (skip_cpu && !i) { > + i = 1; > + goto next_cpumask; > + } The 'skip_cpu' variable should be a boolean, and has more a specific name. And you don't need the local 'i' to implement your logic: if (unlikely(skip_first_cpu)) { skip_first_cpu = false; goto next_sibling; } > if (len-- == 0) > goto done; This check should go before the one you're adding here. > + > irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu)); > +next_cpumask: > cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu)); > --weight; > } > @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node) > return 0; > } > > -static int mana_gd_setup_irqs(struct pci_dev *pdev) > +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) > { > struct gdma_context *gc = pci_get_drvdata(pdev); > - unsigned int max_queues_per_port; > struct gdma_irq_context *gic; > - unsigned int max_irqs, cpu; > - int start_irq_index = 1; > - int nvec, *irqs, irq; > + int *irqs, irq, skip_first_cpu = 0; > + unsigned long flags; > int err, i = 0, j; > > cpus_read_lock(); > - max_queues_per_port = num_online_cpus(); > - if (max_queues_per_port > MANA_MAX_NUM_QUEUES) > - max_queues_per_port = MANA_MAX_NUM_QUEUES; > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > + irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL); > + if (!irqs) { > + err = -ENOMEM; > + goto free_irq_vector; > + } > > - /* Need 1 interrupt for the Hardware communication Channel (HWC) */ > - max_irqs = max_queues_per_port + 1; > + for (i = 0; i < nvec; i++) { > + gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL); > + if (!gic) { > + err = -ENOMEM; > + goto free_irq; > + } > + gic->handler = mana_gd_process_eq_events; > + INIT_LIST_HEAD(&gic->eq_list); > + spin_lock_init(&gic->lock); > > - nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX); > - if (nvec < 0) { > - cpus_read_unlock(); > - return nvec; > + snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s", > + i, pci_name(pdev)); > + > + /* one pci vector is already allocated for HWC */ > + irqs[i] = pci_irq_vector(pdev, i + 1); > + if (irqs[i] < 0) { > + err = irqs[i]; > + goto free_current_gic; > + } > + > + err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic); > + if (err) > + goto free_current_gic; > + > + list_add_tail(&gic->gic_list, &gc->irq_contexts); > + } > + > + if (gc->num_msix_usable <= num_online_cpus()) > + skip_first_cpu = 1; > + > + err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu); > + if (err) > + goto free_irq; > + > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > + cpus_read_unlock(); > + kfree(irqs); > + return 0; > + > +free_current_gic: > + kfree(gic); > +free_irq: > + for (j = i - 1; j >= 0; j--) { > + irq = pci_irq_vector(pdev, j + 1); > + gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list); > + irq_update_affinity_hint(irq, NULL); > + free_irq(irq, gic); > + list_del(&gic->gic_list); > + kfree(gic); > } > + kfree(irqs); > +free_irq_vector: > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > + cpus_read_unlock(); > + return err; > +} > + > +static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec) > +{ > + struct gdma_context *gc = pci_get_drvdata(pdev); > + struct gdma_irq_context *gic; > + int start_irq_index = 1; > + unsigned long flags; > + unsigned int cpu; > + int *irqs, irq; > + int err, i = 0, j; > + > + cpus_read_lock(); > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > + > if (nvec <= num_online_cpus()) 
> start_irq_index = 0; > > @@ -1349,15 +1467,12 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) > goto free_irq_vector; > } > > - gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context), > - GFP_KERNEL); > - if (!gc->irq_contexts) { > - err = -ENOMEM; > - goto free_irq_array; > - } > - > for (i = 0; i < nvec; i++) { > - gic = &gc->irq_contexts[i]; > + gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL); > + if (!gic) { > + err = -ENOMEM; > + goto free_irq; > + } > gic->handler = mana_gd_process_eq_events; > INIT_LIST_HEAD(&gic->eq_list); > spin_lock_init(&gic->lock); > @@ -1372,22 +1487,14 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) > irq = pci_irq_vector(pdev, i); > if (irq < 0) { > err = irq; > - goto free_irq; > + goto free_current_gic; > } > > if (!i) { > err = request_irq(irq, mana_gd_intr, 0, gic->name, gic); > if (err) > - goto free_irq; > - > - /* If number of IRQ is one extra than number of online CPUs, > - * then we need to assign IRQ0 (hwc irq) and IRQ1 to > - * same CPU. > - * Else we will use different CPUs for IRQ0 and IRQ1. > - * Also we are using cpumask_local_spread instead of > - * cpumask_first for the node, because the node can be > - * mem only. > - */ > + goto free_current_gic; > + > if (start_irq_index) { > cpu = cpumask_local_spread(i, gc->numa_node); > irq_set_affinity_and_hint(irq, cpumask_of(cpu)); > @@ -1399,36 +1506,104 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) > err = request_irq(irqs[i - start_irq_index], mana_gd_intr, 0, > gic->name, gic); > if (err) > - goto free_irq; > + goto free_current_gic; > } > + > + list_add_tail(&gic->gic_list, &gc->irq_contexts); > } > > - err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node); > + err = irq_setup(irqs, nvec - start_irq_index, gc->numa_node, 0); > if (err) > goto free_irq; > > - gc->max_num_msix = nvec; > - gc->num_msix_usable = nvec; > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > cpus_read_unlock(); > kfree(irqs); > return 0; > > +free_current_gic: > + kfree(gic); > free_irq: > for (j = i - 1; j >= 0; j--) { > irq = pci_irq_vector(pdev, j); > - gic = &gc->irq_contexts[j]; > - > + gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list); > irq_update_affinity_hint(irq, NULL); > free_irq(irq, gic); > + list_del(&gic->gic_list); > + kfree(gic); > } > - > - kfree(gc->irq_contexts); > - gc->irq_contexts = NULL; > -free_irq_array: > kfree(irqs); > free_irq_vector: > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > cpus_read_unlock(); > - pci_free_irq_vectors(pdev); > + return err; > +} > + > +static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev) > +{ > + struct gdma_context *gc = pci_get_drvdata(pdev); > + unsigned int max_irqs, min_irqs; > + int max_queues_per_port; > + int nvec, err; > + > + if (pci_msix_can_alloc_dyn(pdev)) { > + max_irqs = 1; > + min_irqs = 1; > + } else { > + max_queues_per_port = num_online_cpus(); > + if (max_queues_per_port > MANA_MAX_NUM_QUEUES) > + max_queues_per_port = MANA_MAX_NUM_QUEUES; > + /* Need 1 interrupt for the Hardware communication Channel (HWC) */ > + max_irqs = max_queues_per_port + 1; > + min_irqs = 2; > + } > + > + nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX); > + if (nvec < 0) > + return nvec; > + > + err = mana_gd_setup_irqs(pdev, nvec); > + if (err) { > + pci_free_irq_vectors(pdev); > + return err; > + } > + > + gc->num_msix_usable = nvec; > + gc->max_num_msix = nvec; > + > + return err; > +} > + > +static int mana_gd_setup_remaining_irqs(struct pci_dev 
*pdev) > +{ > + struct gdma_context *gc = pci_get_drvdata(pdev); > + int max_irqs, i, err = 0; > + struct msi_map irq_map; > + > + if (!pci_msix_can_alloc_dyn(pdev)) > + /* remain irqs are already allocated with HWC IRQ */ > + return 0; > + > + /* allocate only remaining IRQs*/ > + max_irqs = gc->num_msix_usable - 1; > + > + for (i = 1; i <= max_irqs; i++) { > + irq_map = pci_msix_alloc_irq_at(pdev, i, NULL); > + if (!irq_map.virq) { > + err = irq_map.index; > + /* caller will handle cleaning up all allocated > + * irqs, after HWC is destroyed > + */ > + return err; > + } > + } > + > + err = mana_gd_setup_dyn_irqs(pdev, max_irqs); > + if (err) > + return err; > + > + gc->max_num_msix = gc->max_num_msix + max_irqs; > + > return err; > } > > @@ -1436,29 +1611,34 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev) > { > struct gdma_context *gc = pci_get_drvdata(pdev); > struct gdma_irq_context *gic; > - int irq, i; > + struct list_head *pos, *n; > + unsigned long flags; > + int irq, i = 0; > > if (gc->max_num_msix < 1) > return; > > - for (i = 0; i < gc->max_num_msix; i++) { > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > + list_for_each_safe(pos, n, &gc->irq_contexts) { > irq = pci_irq_vector(pdev, i); > if (irq < 0) > continue; > > - gic = &gc->irq_contexts[i]; > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > /* Need to clear the hint before free_irq */ > irq_update_affinity_hint(irq, NULL); > free_irq(irq, gic); > + list_del(pos); > + kfree(gic); > + i++; > } > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > pci_free_irq_vectors(pdev); > > gc->max_num_msix = 0; > gc->num_msix_usable = 0; > - kfree(gc->irq_contexts); > - gc->irq_contexts = NULL; > } > > static int mana_gd_setup(struct pci_dev *pdev) > @@ -1469,9 +1649,9 @@ static int mana_gd_setup(struct pci_dev *pdev) > mana_gd_init_registers(pdev); > mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base); > > - err = mana_gd_setup_irqs(pdev); > + err = mana_gd_setup_hwc_irqs(pdev); > if (err) { > - dev_err(gc->dev, "Failed to setup IRQs: %d\n", err); > + dev_err(gc->dev, "Failed to setup IRQs for HWC creation: %d\n", err); > return err; > } > > @@ -1487,6 +1667,10 @@ static int mana_gd_setup(struct pci_dev *pdev) > if (err) > goto destroy_hwc; > > + err = mana_gd_setup_remaining_irqs(pdev); > + if (err) > + goto destroy_hwc; > + > err = mana_gd_detect_devices(pdev); > if (err) > goto destroy_hwc; > @@ -1563,6 +1747,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent) > gc->is_pf = mana_is_pf(pdev->device); > gc->bar0_va = bar0_va; > gc->dev = &pdev->dev; > + INIT_LIST_HEAD(&gc->irq_contexts); > + spin_lock_init(&gc->irq_ctxs_lock); > > if (gc->is_pf) > gc->mana_pci_debugfs = debugfs_create_dir("0", mana_debugfs_root); > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h > index 228603bf03f2..eae38d7302fe 100644 > --- a/include/net/mana/gdma.h > +++ b/include/net/mana/gdma.h > @@ -363,6 +363,7 @@ struct gdma_irq_context { > spinlock_t lock; > struct list_head eq_list; > char name[MANA_IRQ_NAME_SZ]; > + struct list_head gic_list; > }; > > struct gdma_context { > @@ -373,7 +374,9 @@ struct gdma_context { > unsigned int max_num_queues; > unsigned int max_num_msix; > unsigned int num_msix_usable; > - struct gdma_irq_context *irq_contexts; > + struct list_head irq_contexts; > + /* Protect the irq_contexts list */ > + spinlock_t irq_ctxs_lock; > > /* L2 MTU */ > u16 adapter_mtu; > -- > 2.34.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
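Folding these irq_setup() comments into code, the helper might end up looking roughly like the sketch below. The outer NUMA-hop walk is reproduced from the current upstream helper and is untouched by the patch; the only changes are the boolean parameter and the skip branch, and the HWC-specific justification stays with the caller as requested above. The ordering of the skip and the length test is a judgment call: here the skip is taken first so that the skipped sibling group does not consume one of the len slots. A sketch, not a tested replacement for the hunk in the patch:

static int irq_setup(unsigned int *irqs, unsigned int len, int node,
                     bool skip_first_cpu)
{
        const struct cpumask *next, *prev = cpu_none_mask;
        cpumask_var_t cpus __free(free_cpumask_var);
        int cpu, weight;

        if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
                return -ENOMEM;

        rcu_read_lock();
        for_each_numa_hop_mask(next, node) {
                weight = cpumask_weight_andnot(next, prev);
                while (weight > 0) {
                        cpumask_andnot(cpus, next, prev);
                        for_each_cpu(cpu, cpus) {
                                /* Caller asked us to leave the first sibling group alone */
                                if (unlikely(skip_first_cpu)) {
                                        skip_first_cpu = false;
                                        goto next_sibling;
                                }

                                if (len-- == 0)
                                        goto done;

                                irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
next_sibling:
                                cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
                                --weight;
                        }
                }
                prev = next;
        }
done:
        rcu_read_unlock();
        return 0;
}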
* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-16 17:22 ` Yury Norov @ 2025-04-17 7:32 ` Shradha Gupta 2025-04-22 12:09 ` Shradha Gupta 1 sibling, 0 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-17 7:32 UTC (permalink / raw) To: Yury Norov Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczy??ski, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta On Wed, Apr 16, 2025 at 01:22:04PM -0400, Yury Norov wrote: > On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote: > > Currently, the MANA driver allocates pci vector statically based on > > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends > > up allocating more vectors than it needs. > > This is because, by this time we do not have a HW channel and do not know > > how many IRQs should be allocated. > > To avoid this, we allocate 1 IRQ vector during the creation of HWC and > > after getting the value supported by hardware, dynamically add the > > remaining vectors. > > > > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> > > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> > > --- > > .../net/ethernet/microsoft/mana/gdma_main.c | 306 ++++++++++++++---- > > include/net/mana/gdma.h | 5 +- > > 2 files changed, 250 insertions(+), 61 deletions(-) > > > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c > > index 4ffaf7588885..3e3b5854b736 100644 > > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c > > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c > > @@ -6,6 +6,9 @@ > > #include <linux/pci.h> > > #include <linux/utsname.h> > > #include <linux/version.h> > > +#include <linux/msi.h> > > +#include <linux/irqdomain.h> > > +#include <linux/list.h> > > > > #include <net/mana/mana.h> > > > > @@ -80,8 +83,15 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev) > > return err ? 
err : -EPROTO; > > } > > > > - if (gc->num_msix_usable > resp.max_msix) > > - gc->num_msix_usable = resp.max_msix; > > + if (!pci_msix_can_alloc_dyn(pdev)) { > > + if (gc->num_msix_usable > resp.max_msix) > > + gc->num_msix_usable = resp.max_msix; > > + } else { > > + /* If dynamic allocation is enabled we have already allocated > > + * hwc msi > > + */ > > + gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1); > > + } > > > > if (gc->num_msix_usable <= 1) > > return -ENOSPC; > > @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue, > > struct gdma_irq_context *gic; > > struct gdma_context *gc; > > unsigned int msi_index; > > - unsigned long flags; > > + struct list_head *pos; > > + unsigned long flags, flag_irq; > > struct device *dev; > > - int err = 0; > > + int err = 0, count; > > > > gc = gd->gdma_context; > > dev = gc->dev; > > @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue, > > } > > > > queue->eq.msix_index = msi_index; > > - gic = &gc->irq_contexts[msi_index]; > > + > > + /* get the msi_index value from the list*/ > > + count = 0; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > > + list_for_each(pos, &gc->irq_contexts) { > > + if (count == msi_index) { > > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > + break; > > + } > > + > > + count++; > > + } > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > > + > > + if (!gic) > > + return -1; > > > > spin_lock_irqsave(&gic->lock, flags); > > list_add_rcu(&queue->entry, &gic->eq_list); > > @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) > > struct gdma_irq_context *gic; > > struct gdma_context *gc; > > unsigned int msix_index; > > - unsigned long flags; > > + struct list_head *pos; > > + unsigned long flags, flag_irq; > > struct gdma_queue *eq; > > + int count; > > > > gc = gd->gdma_context; > > > > @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) > > if (WARN_ON(msix_index >= gc->num_msix_usable)) > > return; > > > > - gic = &gc->irq_contexts[msix_index]; > > + /* get the msi_index value from the list*/ > > + count = 0; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > > + list_for_each(pos, &gc->irq_contexts) { > > + if (count == msix_index) { > > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > + break; > > + } > > + > > + count++; > > + } > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > > + > > + if (!gic) > > + return; > > + > > spin_lock_irqsave(&gic->lock, flags); > > list_for_each_entry_rcu(eq, &gic->eq_list, entry) { > > if (queue == eq) { > > @@ -1288,11 +1331,11 @@ void mana_gd_free_res_map(struct gdma_resource *r) > > r->size = 0; > > } > > > > -static int irq_setup(unsigned int *irqs, unsigned int len, int node) > > +static int irq_setup(unsigned int *irqs, unsigned int len, int node, int skip_cpu) > > { > > const struct cpumask *next, *prev = cpu_none_mask; > > cpumask_var_t cpus __free(free_cpumask_var); > > - int cpu, weight; > > + int cpu, weight, i = 0; > > > > if (!alloc_cpumask_var(&cpus, GFP_KERNEL)) > > return -ENOMEM; > > @@ -1303,9 +1346,21 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node) > > while (weight > 0) { > > cpumask_andnot(cpus, next, prev); > > for_each_cpu(cpu, cpus) { > > + /* If the call is made for irqs which are dynamically > > + * added and the num of vcpus is more or equal to > > + * allocated msix, we need to skip the first > > + * set of cpus, since they are already 
affinitized > > Can you replace the 'set of cpus' with a 'sibling group'? > > > + * to HWC IRQ > > + */ > > This comment should not be here. This is a helper function. User may > want to skip 1st CPU for whatever reason. Please put the comment in > mana_gd_setup_dyn_irqs(). > > > + if (skip_cpu && !i) { > > + i = 1; > > + goto next_cpumask; > > + } > > The 'skip_cpu' variable should be a boolean, and has more a specific > name. And you don't need the local 'i' to implement your logic: > > if (unlikely(skip_first_cpu)) { > skip_first_cpu = false; > goto next_sibling; > } > > > if (len-- == 0) > > goto done; > > This check should go before the one you're adding here. Thank you for all the comments, Yury. I will fix these. Regards, Shradha. > > > + > > irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu)); > > +next_cpumask: > > cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu)); > > --weight; > > } > > @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node) > > return 0; > > } > > > > -static int mana_gd_setup_irqs(struct pci_dev *pdev) > > +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) > > { > > struct gdma_context *gc = pci_get_drvdata(pdev); > > - unsigned int max_queues_per_port; > > struct gdma_irq_context *gic; > > - unsigned int max_irqs, cpu; > > - int start_irq_index = 1; > > - int nvec, *irqs, irq; > > + int *irqs, irq, skip_first_cpu = 0; > > + unsigned long flags; > > int err, i = 0, j; > > > > cpus_read_lock(); > > - max_queues_per_port = num_online_cpus(); > > - if (max_queues_per_port > MANA_MAX_NUM_QUEUES) > > - max_queues_per_port = MANA_MAX_NUM_QUEUES; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > > + irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL); > > + if (!irqs) { > > + err = -ENOMEM; > > + goto free_irq_vector; > > + } > > > > - /* Need 1 interrupt for the Hardware communication Channel (HWC) */ > > - max_irqs = max_queues_per_port + 1; > > + for (i = 0; i < nvec; i++) { > > + gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL); > > + if (!gic) { > > + err = -ENOMEM; > > + goto free_irq; > > + } > > + gic->handler = mana_gd_process_eq_events; > > + INIT_LIST_HEAD(&gic->eq_list); > > + spin_lock_init(&gic->lock); > > > > - nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX); > > - if (nvec < 0) { > > - cpus_read_unlock(); > > - return nvec; > > + snprintf(gic->name, MANA_IRQ_NAME_SZ, "mana_q%d@pci:%s", > > + i, pci_name(pdev)); > > + > > + /* one pci vector is already allocated for HWC */ > > + irqs[i] = pci_irq_vector(pdev, i + 1); > > + if (irqs[i] < 0) { > > + err = irqs[i]; > > + goto free_current_gic; > > + } > > + > > + err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic); > > + if (err) > > + goto free_current_gic; > > + > > + list_add_tail(&gic->gic_list, &gc->irq_contexts); > > + } > > + > > + if (gc->num_msix_usable <= num_online_cpus()) > > + skip_first_cpu = 1; > > + > > + err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu); > > + if (err) > > + goto free_irq; > > + > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > + cpus_read_unlock(); > > + kfree(irqs); > > + return 0; > > + > > +free_current_gic: > > + kfree(gic); > > +free_irq: > > + for (j = i - 1; j >= 0; j--) { > > + irq = pci_irq_vector(pdev, j + 1); > > + gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list); > > + irq_update_affinity_hint(irq, NULL); > > + free_irq(irq, gic); > > + list_del(&gic->gic_list); > > + kfree(gic); > > } > > + 
kfree(irqs); > > +free_irq_vector: > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > + cpus_read_unlock(); > > + return err; > > +} > > + > > +static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec) > > +{ > > + struct gdma_context *gc = pci_get_drvdata(pdev); > > + struct gdma_irq_context *gic; > > + int start_irq_index = 1; > > + unsigned long flags; > > + unsigned int cpu; > > + int *irqs, irq; > > + int err, i = 0, j; > > + > > + cpus_read_lock(); > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > > + > > if (nvec <= num_online_cpus()) > > start_irq_index = 0; > > > > @@ -1349,15 +1467,12 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) > > goto free_irq_vector; > > } > > > > - gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context), > > - GFP_KERNEL); > > - if (!gc->irq_contexts) { > > - err = -ENOMEM; > > - goto free_irq_array; > > - } > > - > > for (i = 0; i < nvec; i++) { > > - gic = &gc->irq_contexts[i]; > > + gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL); > > + if (!gic) { > > + err = -ENOMEM; > > + goto free_irq; > > + } > > gic->handler = mana_gd_process_eq_events; > > INIT_LIST_HEAD(&gic->eq_list); > > spin_lock_init(&gic->lock); > > @@ -1372,22 +1487,14 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) > > irq = pci_irq_vector(pdev, i); > > if (irq < 0) { > > err = irq; > > - goto free_irq; > > + goto free_current_gic; > > } > > > > if (!i) { > > err = request_irq(irq, mana_gd_intr, 0, gic->name, gic); > > if (err) > > - goto free_irq; > > - > > - /* If number of IRQ is one extra than number of online CPUs, > > - * then we need to assign IRQ0 (hwc irq) and IRQ1 to > > - * same CPU. > > - * Else we will use different CPUs for IRQ0 and IRQ1. > > - * Also we are using cpumask_local_spread instead of > > - * cpumask_first for the node, because the node can be > > - * mem only. 
> > - */ > > + goto free_current_gic; > > + > > if (start_irq_index) { > > cpu = cpumask_local_spread(i, gc->numa_node); > > irq_set_affinity_and_hint(irq, cpumask_of(cpu)); > > @@ -1399,36 +1506,104 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) > > err = request_irq(irqs[i - start_irq_index], mana_gd_intr, 0, > > gic->name, gic); > > if (err) > > - goto free_irq; > > + goto free_current_gic; > > } > > + > > + list_add_tail(&gic->gic_list, &gc->irq_contexts); > > } > > > > - err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node); > > + err = irq_setup(irqs, nvec - start_irq_index, gc->numa_node, 0); > > if (err) > > goto free_irq; > > > > - gc->max_num_msix = nvec; > > - gc->num_msix_usable = nvec; > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > cpus_read_unlock(); > > kfree(irqs); > > return 0; > > > > +free_current_gic: > > + kfree(gic); > > free_irq: > > for (j = i - 1; j >= 0; j--) { > > irq = pci_irq_vector(pdev, j); > > - gic = &gc->irq_contexts[j]; > > - > > + gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, gic_list); > > irq_update_affinity_hint(irq, NULL); > > free_irq(irq, gic); > > + list_del(&gic->gic_list); > > + kfree(gic); > > } > > - > > - kfree(gc->irq_contexts); > > - gc->irq_contexts = NULL; > > -free_irq_array: > > kfree(irqs); > > free_irq_vector: > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > cpus_read_unlock(); > > - pci_free_irq_vectors(pdev); > > + return err; > > +} > > + > > +static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev) > > +{ > > + struct gdma_context *gc = pci_get_drvdata(pdev); > > + unsigned int max_irqs, min_irqs; > > + int max_queues_per_port; > > + int nvec, err; > > + > > + if (pci_msix_can_alloc_dyn(pdev)) { > > + max_irqs = 1; > > + min_irqs = 1; > > + } else { > > + max_queues_per_port = num_online_cpus(); > > + if (max_queues_per_port > MANA_MAX_NUM_QUEUES) > > + max_queues_per_port = MANA_MAX_NUM_QUEUES; > > + /* Need 1 interrupt for the Hardware communication Channel (HWC) */ > > + max_irqs = max_queues_per_port + 1; > > + min_irqs = 2; > > + } > > + > > + nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX); > > + if (nvec < 0) > > + return nvec; > > + > > + err = mana_gd_setup_irqs(pdev, nvec); > > + if (err) { > > + pci_free_irq_vectors(pdev); > > + return err; > > + } > > + > > + gc->num_msix_usable = nvec; > > + gc->max_num_msix = nvec; > > + > > + return err; > > +} > > + > > +static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev) > > +{ > > + struct gdma_context *gc = pci_get_drvdata(pdev); > > + int max_irqs, i, err = 0; > > + struct msi_map irq_map; > > + > > + if (!pci_msix_can_alloc_dyn(pdev)) > > + /* remain irqs are already allocated with HWC IRQ */ > > + return 0; > > + > > + /* allocate only remaining IRQs*/ > > + max_irqs = gc->num_msix_usable - 1; > > + > > + for (i = 1; i <= max_irqs; i++) { > > + irq_map = pci_msix_alloc_irq_at(pdev, i, NULL); > > + if (!irq_map.virq) { > > + err = irq_map.index; > > + /* caller will handle cleaning up all allocated > > + * irqs, after HWC is destroyed > > + */ > > + return err; > > + } > > + } > > + > > + err = mana_gd_setup_dyn_irqs(pdev, max_irqs); > > + if (err) > > + return err; > > + > > + gc->max_num_msix = gc->max_num_msix + max_irqs; > > + > > return err; > > } > > > > @@ -1436,29 +1611,34 @@ static void mana_gd_remove_irqs(struct pci_dev *pdev) > > { > > struct gdma_context *gc = pci_get_drvdata(pdev); > > struct gdma_irq_context *gic; > > - int irq, i; > > + struct list_head *pos, *n; > > + 
unsigned long flags; > > + int irq, i = 0; > > > > if (gc->max_num_msix < 1) > > return; > > > > - for (i = 0; i < gc->max_num_msix; i++) { > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > > + list_for_each_safe(pos, n, &gc->irq_contexts) { > > irq = pci_irq_vector(pdev, i); > > if (irq < 0) > > continue; > > > > - gic = &gc->irq_contexts[i]; > > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > > > /* Need to clear the hint before free_irq */ > > irq_update_affinity_hint(irq, NULL); > > free_irq(irq, gic); > > + list_del(pos); > > + kfree(gic); > > + i++; > > } > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > > > pci_free_irq_vectors(pdev); > > > > gc->max_num_msix = 0; > > gc->num_msix_usable = 0; > > - kfree(gc->irq_contexts); > > - gc->irq_contexts = NULL; > > } > > > > static int mana_gd_setup(struct pci_dev *pdev) > > @@ -1469,9 +1649,9 @@ static int mana_gd_setup(struct pci_dev *pdev) > > mana_gd_init_registers(pdev); > > mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base); > > > > - err = mana_gd_setup_irqs(pdev); > > + err = mana_gd_setup_hwc_irqs(pdev); > > if (err) { > > - dev_err(gc->dev, "Failed to setup IRQs: %d\n", err); > > + dev_err(gc->dev, "Failed to setup IRQs for HWC creation: %d\n", err); > > return err; > > } > > > > @@ -1487,6 +1667,10 @@ static int mana_gd_setup(struct pci_dev *pdev) > > if (err) > > goto destroy_hwc; > > > > + err = mana_gd_setup_remaining_irqs(pdev); > > + if (err) > > + goto destroy_hwc; > > + > > err = mana_gd_detect_devices(pdev); > > if (err) > > goto destroy_hwc; > > @@ -1563,6 +1747,8 @@ static int mana_gd_probe(struct pci_dev *pdev, const struct pci_device_id *ent) > > gc->is_pf = mana_is_pf(pdev->device); > > gc->bar0_va = bar0_va; > > gc->dev = &pdev->dev; > > + INIT_LIST_HEAD(&gc->irq_contexts); > > + spin_lock_init(&gc->irq_ctxs_lock); > > > > if (gc->is_pf) > > gc->mana_pci_debugfs = debugfs_create_dir("0", mana_debugfs_root); > > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h > > index 228603bf03f2..eae38d7302fe 100644 > > --- a/include/net/mana/gdma.h > > +++ b/include/net/mana/gdma.h > > @@ -363,6 +363,7 @@ struct gdma_irq_context { > > spinlock_t lock; > > struct list_head eq_list; > > char name[MANA_IRQ_NAME_SZ]; > > + struct list_head gic_list; > > }; > > > > struct gdma_context { > > @@ -373,7 +374,9 @@ struct gdma_context { > > unsigned int max_num_queues; > > unsigned int max_num_msix; > > unsigned int num_msix_usable; > > - struct gdma_irq_context *irq_contexts; > > + struct list_head irq_contexts; > > + /* Protect the irq_contexts list */ > > + spinlock_t irq_ctxs_lock; > > > > /* L2 MTU */ > > u16 adapter_mtu; > > -- > > 2.34.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
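For reference, below is a minimal sketch of what irq_setup() could look like with the rework Yury asks for above: a bool skip_first_cpu parameter instead of the int parameter plus local counter, and a next_sibling label. The parts not visible in the quoted hunks (the rcu_read_lock()/for_each_numa_hop_mask() walk) are assumed to match the existing helper, and the length check is kept after the skip, an ordering the 2025-04-22 follow-up further down argues is deliberate. This is a sketch of the suggestion, not the v2 patch itself.

        static int irq_setup(unsigned int *irqs, unsigned int len, int node,
                             bool skip_first_cpu)
        {
                const struct cpumask *next, *prev = cpu_none_mask;
                cpumask_var_t cpus __free(free_cpumask_var);
                int cpu, weight;

                if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
                        return -ENOMEM;

                rcu_read_lock();
                for_each_numa_hop_mask(next, node) {
                        weight = cpumask_weight_andnot(next, prev);
                        while (weight > 0) {
                                cpumask_andnot(cpus, next, prev);
                                for_each_cpu(cpu, cpus) {
                                        if (unlikely(skip_first_cpu)) {
                                                /* this sibling group already serves the HWC IRQ */
                                                skip_first_cpu = false;
                                                goto next_sibling;
                                        }

                                        if (len-- == 0)
                                                goto done;

                                        irq_set_affinity_and_hint(*irqs++,
                                                                  topology_sibling_cpumask(cpu));
        next_sibling:
                                        cpumask_andnot(cpus, cpus,
                                                       topology_sibling_cpumask(cpu));
                                        --weight;
                                }
                        }
                        prev = next;
                }
        done:
                rcu_read_unlock();
                return 0;
        }

Skipping before the length check means the sibling group that already serves the HWC IRQ consumes a CPU group but not one of the len slots in irqs[].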
* [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-16 17:22 ` Yury Norov 2025-04-17 7:32 ` Shradha Gupta @ 2025-04-22 12:09 ` Shradha Gupta 1 sibling, 0 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-22 12:09 UTC (permalink / raw) To: Yury Norov, Shradha Gupta Cc: linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Gupta, Nipun, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, KY Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev@vger.kernel.org, linux-rdma@vger.kernel.org, Paul Rosswurm > On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote: > > Currently, the MANA driver allocates pci vector statically based on > > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases > > ends up allocating more vectors than it needs. > > This is because, by this time we do not have a HW channel and do not > > know how many IRQs should be allocated. > > To avoid this, we allocate 1 IRQ vector during the creation of HWC and > > after getting the value supported by hardware, dynamically add the > > remaining vectors. > > > > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> > > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> > > --- > > .../net/ethernet/microsoft/mana/gdma_main.c | 306 ++++++++++++++---- > > include/net/mana/gdma.h | 5 +- > > 2 files changed, 250 insertions(+), 61 deletions(-) > > > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c > > b/drivers/net/ethernet/microsoft/mana/gdma_main.c > > index 4ffaf7588885..3e3b5854b736 100644 > > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c > > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c > > @@ -6,6 +6,9 @@ > > #include <linux/pci.h> > > #include <linux/utsname.h> > > #include <linux/version.h> > > +#include <linux/msi.h> > > +#include <linux/irqdomain.h> > > +#include <linux/list.h> > > > > #include <net/mana/mana.h> > > > > @@ -80,8 +83,15 @@ static int mana_gd_query_max_resources(struct pci_dev > *pdev) > > return err ? 
err : -EPROTO; > > } > > > > - if (gc->num_msix_usable > resp.max_msix) > > - gc->num_msix_usable = resp.max_msix; > > + if (!pci_msix_can_alloc_dyn(pdev)) { > > + if (gc->num_msix_usable > resp.max_msix) > > + gc->num_msix_usable = resp.max_msix; > > + } else { > > + /* If dynamic allocation is enabled we have already allocated > > + * hwc msi > > + */ > > + gc->num_msix_usable = min(resp.max_msix, num_online_cpus() > + 1); > > + } > > > > if (gc->num_msix_usable <= 1) > > return -ENOSPC; > > @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue > *queue, > > struct gdma_irq_context *gic; > > struct gdma_context *gc; > > unsigned int msi_index; > > - unsigned long flags; > > + struct list_head *pos; > > + unsigned long flags, flag_irq; > > struct device *dev; > > - int err = 0; > > + int err = 0, count; > > > > gc = gd->gdma_context; > > dev = gc->dev; > > @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue > *queue, > > } > > > > queue->eq.msix_index = msi_index; > > - gic = &gc->irq_contexts[msi_index]; > > + > > + /* get the msi_index value from the list*/ > > + count = 0; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > > + list_for_each(pos, &gc->irq_contexts) { > > + if (count == msi_index) { > > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > + break; > > + } > > + > > + count++; > > + } > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > > + > > + if (!gic) > > + return -1; > > > > spin_lock_irqsave(&gic->lock, flags); > > list_add_rcu(&queue->entry, &gic->eq_list); @@ -497,8 +523,10 @@ > > static void mana_gd_deregiser_irq(struct gdma_queue *queue) > > struct gdma_irq_context *gic; > > struct gdma_context *gc; > > unsigned int msix_index; > > - unsigned long flags; > > + struct list_head *pos; > > + unsigned long flags, flag_irq; > > struct gdma_queue *eq; > > + int count; > > > > gc = gd->gdma_context; > > > > @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct > gdma_queue *queue) > > if (WARN_ON(msix_index >= gc->num_msix_usable)) > > return; > > > > - gic = &gc->irq_contexts[msix_index]; > > + /* get the msi_index value from the list*/ > > + count = 0; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > > + list_for_each(pos, &gc->irq_contexts) { > > + if (count == msix_index) { > > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > + break; > > + } > > + > > + count++; > > + } > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > > + > > + if (!gic) > > + return; > > + > > spin_lock_irqsave(&gic->lock, flags); > > list_for_each_entry_rcu(eq, &gic->eq_list, entry) { > > if (queue == eq) { > > @@ -1288,11 +1331,11 @@ void mana_gd_free_res_map(struct > gdma_resource *r) > > r->size = 0; > > } > > > > -static int irq_setup(unsigned int *irqs, unsigned int len, int node) > > +static int irq_setup(unsigned int *irqs, unsigned int len, int node, > > +int skip_cpu) > > { > > const struct cpumask *next, *prev = cpu_none_mask; > > cpumask_var_t cpus __free(free_cpumask_var); > > - int cpu, weight; > > + int cpu, weight, i = 0; > > > > if (!alloc_cpumask_var(&cpus, GFP_KERNEL)) > > return -ENOMEM; > > @@ -1303,9 +1346,21 @@ static int irq_setup(unsigned int *irqs, unsigned int > len, int node) > > while (weight > 0) { > > cpumask_andnot(cpus, next, prev); > > for_each_cpu(cpu, cpus) { > > + /* If the call is made for irqs which are > dynamically > > + * added and the num of vcpus is more or equal > to > > + * allocated msix, we need to skip the first > > + * set of cpus, since they 
are already affinitized > > Can you replace the 'set of cpus' with a 'sibling group'? > > > + * to HWC IRQ > > + */ > > This comment should not be here. This is a helper function. User may want to skip > 1st CPU for whatever reason. Please put the comment in > mana_gd_setup_dyn_irqs(). > > > + if (skip_cpu && !i) { > > + i = 1; > > + goto next_cpumask; > > + } > > The 'skip_cpu' variable should be a boolean, and has more a specific name. And > you don't need the local 'i' to implement your logic: > > if (unlikely(skip_first_cpu)) { > skip_first_cpu = false; > goto next_sibling; > } > > > if (len-- == 0) > > goto done; > > This check should go before the one you're adding here. Hi Yury, While preparing for the v2 for this patch, I realized that the movement of this 'if(len-- == 0)' check above 'if(unlikely(skip_first_cpu))' is not needed as we are deliberately trying to skip 'len--' in the iteration where skip_first_cpu is true. Regards, Shradha > > > + > > irq_set_affinity_and_hint(*irqs++, > > topology_sibling_cpumask(cpu)); > > +next_cpumask: > > cpumask_andnot(cpus, cpus, > topology_sibling_cpumask(cpu)); > > --weight; > > } > > @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int > len, int node) > > return 0; > > } > > > > -static int mana_gd_setup_irqs(struct pci_dev *pdev) > > +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) > > { > > struct gdma_context *gc = pci_get_drvdata(pdev); > > - unsigned int max_queues_per_port; > > struct gdma_irq_context *gic; > > - unsigned int max_irqs, cpu; > > - int start_irq_index = 1; > > - int nvec, *irqs, irq; > > + int *irqs, irq, skip_first_cpu = 0; > > + unsigned long flags; > > int err, i = 0, j; > > > > cpus_read_lock(); > > - max_queues_per_port = num_online_cpus(); > > - if (max_queues_per_port > MANA_MAX_NUM_QUEUES) > > - max_queues_per_port = MANA_MAX_NUM_QUEUES; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > > + irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL); > > + if (!irqs) { > > + err = -ENOMEM; > > + goto free_irq_vector; > > + } > > > > - /* Need 1 interrupt for the Hardware communication Channel (HWC) */ > > - max_irqs = max_queues_per_port + 1; > > + for (i = 0; i < nvec; i++) { > > + gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL); > > + if (!gic) { > > + err = -ENOMEM; > > + goto free_irq; > > + } > > + gic->handler = mana_gd_process_eq_events; > > + INIT_LIST_HEAD(&gic->eq_list); > > + spin_lock_init(&gic->lock); > > > > - nvec = pci_alloc_irq_vectors(pdev, 2, max_irqs, PCI_IRQ_MSIX); > > - if (nvec < 0) { > > - cpus_read_unlock(); > > - return nvec; > > + snprintf(gic->name, MANA_IRQ_NAME_SZ, > "mana_q%d@pci:%s", > > + i, pci_name(pdev)); > > + > > + /* one pci vector is already allocated for HWC */ > > + irqs[i] = pci_irq_vector(pdev, i + 1); > > + if (irqs[i] < 0) { > > + err = irqs[i]; > > + goto free_current_gic; > > + } > > + > > + err = request_irq(irqs[i], mana_gd_intr, 0, gic->name, gic); > > + if (err) > > + goto free_current_gic; > > + > > + list_add_tail(&gic->gic_list, &gc->irq_contexts); > > + } > > + > > + if (gc->num_msix_usable <= num_online_cpus()) > > + skip_first_cpu = 1; > > + > > + err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu); > > + if (err) > > + goto free_irq; > > + > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > + cpus_read_unlock(); > > + kfree(irqs); > > + return 0; > > + > > +free_current_gic: > > + kfree(gic); > > +free_irq: > > + for (j = i - 1; j >= 0; j--) { > > + irq = pci_irq_vector(pdev, j + 1); > 
> + gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, > gic_list); > > + irq_update_affinity_hint(irq, NULL); > > + free_irq(irq, gic); > > + list_del(&gic->gic_list); > > + kfree(gic); > > } > > + kfree(irqs); > > +free_irq_vector: > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > + cpus_read_unlock(); > > + return err; > > +} > > + > > +static int mana_gd_setup_irqs(struct pci_dev *pdev, int nvec) { > > + struct gdma_context *gc = pci_get_drvdata(pdev); > > + struct gdma_irq_context *gic; > > + int start_irq_index = 1; > > + unsigned long flags; > > + unsigned int cpu; > > + int *irqs, irq; > > + int err, i = 0, j; > > + > > + cpus_read_lock(); > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > > + > > if (nvec <= num_online_cpus()) > > start_irq_index = 0; > > > > @@ -1349,15 +1467,12 @@ static int mana_gd_setup_irqs(struct pci_dev > *pdev) > > goto free_irq_vector; > > } > > > > - gc->irq_contexts = kcalloc(nvec, sizeof(struct gdma_irq_context), > > - GFP_KERNEL); > > - if (!gc->irq_contexts) { > > - err = -ENOMEM; > > - goto free_irq_array; > > - } > > - > > for (i = 0; i < nvec; i++) { > > - gic = &gc->irq_contexts[i]; > > + gic = kcalloc(1, sizeof(struct gdma_irq_context), GFP_KERNEL); > > + if (!gic) { > > + err = -ENOMEM; > > + goto free_irq; > > + } > > gic->handler = mana_gd_process_eq_events; > > INIT_LIST_HEAD(&gic->eq_list); > > spin_lock_init(&gic->lock); > > @@ -1372,22 +1487,14 @@ static int mana_gd_setup_irqs(struct pci_dev > *pdev) > > irq = pci_irq_vector(pdev, i); > > if (irq < 0) { > > err = irq; > > - goto free_irq; > > + goto free_current_gic; > > } > > > > if (!i) { > > err = request_irq(irq, mana_gd_intr, 0, gic->name, gic); > > if (err) > > - goto free_irq; > > - > > - /* If number of IRQ is one extra than number of online > CPUs, > > - * then we need to assign IRQ0 (hwc irq) and IRQ1 to > > - * same CPU. > > - * Else we will use different CPUs for IRQ0 and IRQ1. > > - * Also we are using cpumask_local_spread instead of > > - * cpumask_first for the node, because the node can be > > - * mem only. 
> > - */ > > + goto free_current_gic; > > + > > if (start_irq_index) { > > cpu = cpumask_local_spread(i, gc- > >numa_node); > > irq_set_affinity_and_hint(irq, cpumask_of(cpu)); > @@ -1399,36 > > +1506,104 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev) > > err = request_irq(irqs[i - start_irq_index], mana_gd_intr, > 0, > > gic->name, gic); > > if (err) > > - goto free_irq; > > + goto free_current_gic; > > } > > + > > + list_add_tail(&gic->gic_list, &gc->irq_contexts); > > } > > > > - err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node); > > + err = irq_setup(irqs, nvec - start_irq_index, gc->numa_node, 0); > > if (err) > > goto free_irq; > > > > - gc->max_num_msix = nvec; > > - gc->num_msix_usable = nvec; > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > cpus_read_unlock(); > > kfree(irqs); > > return 0; > > > > +free_current_gic: > > + kfree(gic); > > free_irq: > > for (j = i - 1; j >= 0; j--) { > > irq = pci_irq_vector(pdev, j); > > - gic = &gc->irq_contexts[j]; > > - > > + gic = list_last_entry(&gc->irq_contexts, struct gdma_irq_context, > > +gic_list); > > irq_update_affinity_hint(irq, NULL); > > free_irq(irq, gic); > > + list_del(&gic->gic_list); > > + kfree(gic); > > } > > - > > - kfree(gc->irq_contexts); > > - gc->irq_contexts = NULL; > > -free_irq_array: > > kfree(irqs); > > free_irq_vector: > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > cpus_read_unlock(); > > - pci_free_irq_vectors(pdev); > > + return err; > > +} > > + > > +static int mana_gd_setup_hwc_irqs(struct pci_dev *pdev) { > > + struct gdma_context *gc = pci_get_drvdata(pdev); > > + unsigned int max_irqs, min_irqs; > > + int max_queues_per_port; > > + int nvec, err; > > + > > + if (pci_msix_can_alloc_dyn(pdev)) { > > + max_irqs = 1; > > + min_irqs = 1; > > + } else { > > + max_queues_per_port = num_online_cpus(); > > + if (max_queues_per_port > MANA_MAX_NUM_QUEUES) > > + max_queues_per_port = MANA_MAX_NUM_QUEUES; > > + /* Need 1 interrupt for the Hardware communication Channel > (HWC) */ > > + max_irqs = max_queues_per_port + 1; > > + min_irqs = 2; > > + } > > + > > + nvec = pci_alloc_irq_vectors(pdev, min_irqs, max_irqs, PCI_IRQ_MSIX); > > + if (nvec < 0) > > + return nvec; > > + > > + err = mana_gd_setup_irqs(pdev, nvec); > > + if (err) { > > + pci_free_irq_vectors(pdev); > > + return err; > > + } > > + > > + gc->num_msix_usable = nvec; > > + gc->max_num_msix = nvec; > > + > > + return err; > > +} > > + > > +static int mana_gd_setup_remaining_irqs(struct pci_dev *pdev) { > > + struct gdma_context *gc = pci_get_drvdata(pdev); > > + int max_irqs, i, err = 0; > > + struct msi_map irq_map; > > + > > + if (!pci_msix_can_alloc_dyn(pdev)) > > + /* remain irqs are already allocated with HWC IRQ */ > > + return 0; > > + > > + /* allocate only remaining IRQs*/ > > + max_irqs = gc->num_msix_usable - 1; > > + > > + for (i = 1; i <= max_irqs; i++) { > > + irq_map = pci_msix_alloc_irq_at(pdev, i, NULL); > > + if (!irq_map.virq) { > > + err = irq_map.index; > > + /* caller will handle cleaning up all allocated > > + * irqs, after HWC is destroyed > > + */ > > + return err; > > + } > > + } > > + > > + err = mana_gd_setup_dyn_irqs(pdev, max_irqs); > > + if (err) > > + return err; > > + > > + gc->max_num_msix = gc->max_num_msix + max_irqs; > > + > > return err; > > } > > > > @@ -1436,29 +1611,34 @@ static void mana_gd_remove_irqs(struct pci_dev > > *pdev) { > > struct gdma_context *gc = pci_get_drvdata(pdev); > > struct gdma_irq_context *gic; > > - int irq, i; > > + struct list_head *pos, *n; > > 
+ unsigned long flags; > > + int irq, i = 0; > > > > if (gc->max_num_msix < 1) > > return; > > > > - for (i = 0; i < gc->max_num_msix; i++) { > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > > + list_for_each_safe(pos, n, &gc->irq_contexts) { > > irq = pci_irq_vector(pdev, i); > > if (irq < 0) > > continue; > > > > - gic = &gc->irq_contexts[i]; > > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > > > /* Need to clear the hint before free_irq */ > > irq_update_affinity_hint(irq, NULL); > > free_irq(irq, gic); > > + list_del(pos); > > + kfree(gic); > > + i++; > > } > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flags); > > > > pci_free_irq_vectors(pdev); > > > > gc->max_num_msix = 0; > > gc->num_msix_usable = 0; > > - kfree(gc->irq_contexts); > > - gc->irq_contexts = NULL; > > } > > > > static int mana_gd_setup(struct pci_dev *pdev) @@ -1469,9 +1649,9 @@ > > static int mana_gd_setup(struct pci_dev *pdev) > > mana_gd_init_registers(pdev); > > mana_smc_init(&gc->shm_channel, gc->dev, gc->shm_base); > > > > - err = mana_gd_setup_irqs(pdev); > > + err = mana_gd_setup_hwc_irqs(pdev); > > if (err) { > > - dev_err(gc->dev, "Failed to setup IRQs: %d\n", err); > > + dev_err(gc->dev, "Failed to setup IRQs for HWC creation: %d\n", > > +err); > > return err; > > } > > > > @@ -1487,6 +1667,10 @@ static int mana_gd_setup(struct pci_dev *pdev) > > if (err) > > goto destroy_hwc; > > > > + err = mana_gd_setup_remaining_irqs(pdev); > > + if (err) > > + goto destroy_hwc; > > + > > err = mana_gd_detect_devices(pdev); > > if (err) > > goto destroy_hwc; > > @@ -1563,6 +1747,8 @@ static int mana_gd_probe(struct pci_dev *pdev, > const struct pci_device_id *ent) > > gc->is_pf = mana_is_pf(pdev->device); > > gc->bar0_va = bar0_va; > > gc->dev = &pdev->dev; > > + INIT_LIST_HEAD(&gc->irq_contexts); > > + spin_lock_init(&gc->irq_ctxs_lock); > > > > if (gc->is_pf) > > gc->mana_pci_debugfs = debugfs_create_dir("0", > mana_debugfs_root); > > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h index > > 228603bf03f2..eae38d7302fe 100644 > > --- a/include/net/mana/gdma.h > > +++ b/include/net/mana/gdma.h > > @@ -363,6 +363,7 @@ struct gdma_irq_context { > > spinlock_t lock; > > struct list_head eq_list; > > char name[MANA_IRQ_NAME_SZ]; > > + struct list_head gic_list; > > }; > > > > struct gdma_context { > > @@ -373,7 +374,9 @@ struct gdma_context { > > unsigned int max_num_queues; > > unsigned int max_num_msix; > > unsigned int num_msix_usable; > > - struct gdma_irq_context *irq_contexts; > > + struct list_head irq_contexts; > > + /* Protect the irq_contexts list */ > > + spinlock_t irq_ctxs_lock; > > > > /* L2 MTU */ > > u16 adapter_mtu; > > -- > > 2.34.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
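Yury's remaining request above is to move the explanatory comment out of the helper and into its caller. A minimal sketch of how the call site in mana_gd_setup_dyn_irqs() could then look, assuming skip_first_cpu becomes a bool; the identifiers come from the quoted diff, but the comment wording and exact placement are assumptions rather than the final v2 code, and the rest of the function stays as in the quoted patch.

        bool skip_first_cpu = false;

        /* ... gic allocation and request_irq() loop as in the quoted patch ... */

        /* The HWC IRQ is already affinitized to the first sibling group
         * of the NUMA node. When the allocated MSI-X vectors do not
         * exceed the number of online CPUs, that group is busy, so skip
         * it while spreading the dynamically added IRQs.
         */
        if (gc->num_msix_usable <= num_online_cpus())
                skip_first_cpu = true;

        err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu);
        if (err)
                goto free_irq;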
* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta 2025-04-16 17:22 ` Yury Norov @ 2025-04-16 18:32 ` Bjorn Helgaas 2025-04-17 7:33 ` Shradha Gupta 2025-04-24 16:57 ` Simon Horman 2025-05-07 16:28 ` kernel test robot 3 siblings, 1 reply; 15+ messages in thread From: Bjorn Helgaas @ 2025-04-16 18:32 UTC (permalink / raw) To: Shradha Gupta Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote: > Currently, the MANA driver allocates pci vector statically based on > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends > up allocating more vectors than it needs. > This is because, by this time we do not have a HW channel and do not know > how many IRQs should be allocated. > To avoid this, we allocate 1 IRQ vector during the creation of HWC and > after getting the value supported by hardware, dynamically add the > remaining vectors. Add blank lines between paragraphs. s/pci vector/MSI-X vectors/ Maybe also update subject to mention "MSI-X vectors" instead of "PCI vector", since "PCI vector" is not really a common term. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-16 18:32 ` Bjorn Helgaas @ 2025-04-17 7:33 ` Shradha Gupta 0 siblings, 0 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-17 7:33 UTC (permalink / raw) To: Bjorn Helgaas Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczy??ski, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta On Wed, Apr 16, 2025 at 01:32:49PM -0500, Bjorn Helgaas wrote: > On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote: > > Currently, the MANA driver allocates pci vector statically based on > > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends > > up allocating more vectors than it needs. > > This is because, by this time we do not have a HW channel and do not know > > how many IRQs should be allocated. > > To avoid this, we allocate 1 IRQ vector during the creation of HWC and > > after getting the value supported by hardware, dynamically add the > > remaining vectors. > > Add blank lines between paragraphs. > > s/pci vector/MSI-X vectors/ > > Maybe also update subject to mention "MSI-X vectors" instead of "PCI > vector", since "PCI vector" is not really a common term. Understood, thanks Bjorn. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta 2025-04-16 17:22 ` Yury Norov 2025-04-16 18:32 ` Bjorn Helgaas @ 2025-04-24 16:57 ` Simon Horman 2025-04-25 7:29 ` Shradha Gupta 2025-05-07 16:28 ` kernel test robot 3 siblings, 1 reply; 15+ messages in thread From: Simon Horman @ 2025-04-24 16:57 UTC (permalink / raw) To: Shradha Gupta Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote: > Currently, the MANA driver allocates pci vector statically based on > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends > up allocating more vectors than it needs. > This is because, by this time we do not have a HW channel and do not know > how many IRQs should be allocated. > To avoid this, we allocate 1 IRQ vector during the creation of HWC and > after getting the value supported by hardware, dynamically add the > remaining vectors. > > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Hi Shradha, Some minor nits from my side. ... > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c ... > @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue, > struct gdma_irq_context *gic; > struct gdma_context *gc; > unsigned int msi_index; > - unsigned long flags; > + struct list_head *pos; > + unsigned long flags, flag_irq; > struct device *dev; > - int err = 0; > + int err = 0, count; As this is Networking code, please preserve the arrangement of local variables in reverse xmas tree order - longest line to shortest. Edward Cree's tool can be useful in this area: https://github.com/ecree-solarflare/xmastree > > gc = gd->gdma_context; > dev = gc->dev; > @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue, > } > > queue->eq.msix_index = msi_index; > - gic = &gc->irq_contexts[msi_index]; > + > + /* get the msi_index value from the list*/ > + count = 0; > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > + list_for_each(pos, &gc->irq_contexts) { > + if (count == msi_index) { > + gic = list_entry(pos, struct gdma_irq_context, gic_list); Please consider line wrapping to 80 columns or less, as is still preferred in Networking code. Likewise elsewhere in this patch. checkpatch.pl --max-line-length=80 can be helpful here. 
> + break; > + } > + > + count++; > + } > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > + > + if (!gic) > + return -1; > > spin_lock_irqsave(&gic->lock, flags); > list_add_rcu(&queue->entry, &gic->eq_list); > @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) > struct gdma_irq_context *gic; > struct gdma_context *gc; > unsigned int msix_index; > - unsigned long flags; > + struct list_head *pos; > + unsigned long flags, flag_irq; > struct gdma_queue *eq; > + int count; Reverse xmas tree here too. > > gc = gd->gdma_context; > > @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) > if (WARN_ON(msix_index >= gc->num_msix_usable)) > return; > > - gic = &gc->irq_contexts[msix_index]; > + /* get the msi_index value from the list*/ > + count = 0; > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > + list_for_each(pos, &gc->irq_contexts) { > + if (count == msix_index) { > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > + break; > + } > + > + count++; > + } > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > + Does gic need to be initialised to NULL before the list_for_each loop to ensure that it is always initialised here? Flagged by Clang 20.1.2 KCFLAGS=-Wsometimes-uninitialized builds, and Smatch > + if (!gic) > + return; > + > spin_lock_irqsave(&gic->lock, flags); > list_for_each_entry_rcu(eq, &gic->eq_list, entry) { > if (queue == eq) { ... > @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node) > return 0; > } > > -static int mana_gd_setup_irqs(struct pci_dev *pdev) > +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) > { > struct gdma_context *gc = pci_get_drvdata(pdev); > - unsigned int max_queues_per_port; > struct gdma_irq_context *gic; > - unsigned int max_irqs, cpu; > - int start_irq_index = 1; > - int nvec, *irqs, irq; > + int *irqs, irq, skip_first_cpu = 0; > + unsigned long flags; > int err, i = 0, j; Reverse xmas tree here too. > > cpus_read_lock(); > - max_queues_per_port = num_online_cpus(); > - if (max_queues_per_port > MANA_MAX_NUM_QUEUES) > - max_queues_per_port = MANA_MAX_NUM_QUEUES; > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > + irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL); > + if (!irqs) { > + err = -ENOMEM; > + goto free_irq_vector; > + } ... ^ permalink raw reply [flat|nested] 15+ messages in thread
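On the last point, the warning can be silenced by starting the lookup with gic initialised to NULL, so the !gic test after the loop is well defined when msix_index is never matched. A sketch against the quoted mana_gd_deregiser_irq() hunk, with the relevant locals grouped together here only for brevity; the same initialisation applies to the identical lookup in mana_gd_register_irq():

        struct gdma_irq_context *gic = NULL;
        struct list_head *pos;
        unsigned long flag_irq;
        int count = 0;

        spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq);
        list_for_each(pos, &gc->irq_contexts) {
                if (count == msix_index) {
                        gic = list_entry(pos, struct gdma_irq_context,
                                         gic_list);
                        break;
                }
                count++;
        }
        spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq);

        /* gic stays NULL when msix_index is beyond the end of the list */
        if (!gic)
                return;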
* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-24 16:57 ` Simon Horman @ 2025-04-25 7:29 ` Shradha Gupta 0 siblings, 0 replies; 15+ messages in thread From: Shradha Gupta @ 2025-04-25 7:29 UTC (permalink / raw) To: Simon Horman Cc: linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczy??ski, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Leon Romanovsky, Maxim Levitsky, Erni Sri Satya Vennela, Peter Zijlstra, netdev, linux-rdma, Paul Rosswurm, Shradha Gupta On Thu, Apr 24, 2025 at 05:57:43PM +0100, Simon Horman wrote: > On Wed, Apr 16, 2025 at 08:36:21AM -0700, Shradha Gupta wrote: > > Currently, the MANA driver allocates pci vector statically based on > > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends > > up allocating more vectors than it needs. > > This is because, by this time we do not have a HW channel and do not know > > how many IRQs should be allocated. > > To avoid this, we allocate 1 IRQ vector during the creation of HWC and > > after getting the value supported by hardware, dynamically add the > > remaining vectors. > > > > Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> > > Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> > > Hi Shradha, > > Some minor nits from my side. > > ... > > > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c > > ... > > > @@ -465,9 +475,10 @@ static int mana_gd_register_irq(struct gdma_queue *queue, > > struct gdma_irq_context *gic; > > struct gdma_context *gc; > > unsigned int msi_index; > > - unsigned long flags; > > + struct list_head *pos; > > + unsigned long flags, flag_irq; > > struct device *dev; > > - int err = 0; > > + int err = 0, count; > > As this is Networking code, please preserve the arrangement of local > variables in reverse xmas tree order - longest line to shortest. > > Edward Cree's tool can be useful in this area: > https://github.com/ecree-solarflare/xmastree > > > > > gc = gd->gdma_context; > > dev = gc->dev; > > @@ -482,7 +493,22 @@ static int mana_gd_register_irq(struct gdma_queue *queue, > > } > > > > queue->eq.msix_index = msi_index; > > - gic = &gc->irq_contexts[msi_index]; > > + > > + /* get the msi_index value from the list*/ > > + count = 0; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > > + list_for_each(pos, &gc->irq_contexts) { > > + if (count == msi_index) { > > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > Please consider line wrapping to 80 columns or less, as is still preferred > in Networking code. > > Likewise elsewhere in this patch. > > checkpatch.pl --max-line-length=80 > can be helpful here. 
> > > + break; > > + } > > + > > + count++; > > + } > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > > + > > + if (!gic) > > + return -1; > > > > spin_lock_irqsave(&gic->lock, flags); > > list_add_rcu(&queue->entry, &gic->eq_list); > > @@ -497,8 +523,10 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) > > struct gdma_irq_context *gic; > > struct gdma_context *gc; > > unsigned int msix_index; > > - unsigned long flags; > > + struct list_head *pos; > > + unsigned long flags, flag_irq; > > struct gdma_queue *eq; > > + int count; > > Reverse xmas tree here too. > > > > > gc = gd->gdma_context; > > > > @@ -507,7 +535,22 @@ static void mana_gd_deregiser_irq(struct gdma_queue *queue) > > if (WARN_ON(msix_index >= gc->num_msix_usable)) > > return; > > > > - gic = &gc->irq_contexts[msix_index]; > > + /* get the msi_index value from the list*/ > > + count = 0; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > > + list_for_each(pos, &gc->irq_contexts) { > > + if (count == msix_index) { > > + gic = list_entry(pos, struct gdma_irq_context, gic_list); > > + break; > > + } > > + > > + count++; > > + } > > + spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); > > + > > Does gic need to be initialised to NULL before the list_for_each loop > to ensure that it is always initialised here? > > Flagged by Clang 20.1.2 KCFLAGS=-Wsometimes-uninitialized builds, and Smatch > > > + if (!gic) > > + return; > > + > > spin_lock_irqsave(&gic->lock, flags); > > list_for_each_entry_rcu(eq, &gic->eq_list, entry) { > > if (queue == eq) { > > ... > > > @@ -1317,29 +1372,92 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node) > > return 0; > > } > > > > -static int mana_gd_setup_irqs(struct pci_dev *pdev) > > +static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec) > > { > > struct gdma_context *gc = pci_get_drvdata(pdev); > > - unsigned int max_queues_per_port; > > struct gdma_irq_context *gic; > > - unsigned int max_irqs, cpu; > > - int start_irq_index = 1; > > - int nvec, *irqs, irq; > > + int *irqs, irq, skip_first_cpu = 0; > > + unsigned long flags; > > int err, i = 0, j; > > Reverse xmas tree here too. > > > > > cpus_read_lock(); > > - max_queues_per_port = num_online_cpus(); > > - if (max_queues_per_port > MANA_MAX_NUM_QUEUES) > > - max_queues_per_port = MANA_MAX_NUM_QUEUES; > > + spin_lock_irqsave(&gc->irq_ctxs_lock, flags); > > + irqs = kmalloc_array(nvec, sizeof(int), GFP_KERNEL); > > + if (!irqs) { > > + err = -ENOMEM; > > + goto free_irq_vector; > > + } > > ... Hi Simon, Thank you so much for all the comments, I will have them incorporated in the next version. :) Regards, Shradha. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically 2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta ` (2 preceding siblings ...) 2025-04-24 16:57 ` Simon Horman @ 2025-05-07 16:28 ` kernel test robot 3 siblings, 0 replies; 15+ messages in thread From: kernel test robot @ 2025-05-07 16:28 UTC (permalink / raw) To: Shradha Gupta, linux-hyperv, linux-pci, linux-kernel, Nipun Gupta, Yury Norov, Jason Gunthorpe, Jonathan Cameron, Anna-Maria Behnsen, Shivamurthy Shastri, Kevin Tian, Long Li, Thomas Gleixner, Bjorn Helgaas, Rob Herring, Manivannan Sadhasivam, Krzysztof Wilczyński, Lorenzo Pieralisi, Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman Cc: llvm, oe-kbuild-all, netdev Hi Shradha, kernel test robot noticed the following build warnings: [auto build test WARNING on pci/next] [also build test WARNING on pci/for-linus linus/master v6.15-rc5 next-20250507] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Shradha-Gupta/PCI-hv-enable-pci_hyperv-to-allow-dynamic-vector-allocation/20250416-233828 base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next patch link: https://lore.kernel.org/r/1744817781-3243-1-git-send-email-shradhagupta%40linux.microsoft.com patch subject: [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20250508/202505080049.7AvfzOGc-lkp@intel.com/config) compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250508/202505080049.7AvfzOGc-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202505080049.7AvfzOGc-lkp@intel.com/ All warnings (new ones prefixed by >>): >> drivers/net/ethernet/microsoft/mana/gdma_main.c:500:2: warning: variable 'gic' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized] 500 | list_for_each(pos, &gc->irq_contexts) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/list.h:687:27: note: expanded from macro 'list_for_each' 687 | for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/microsoft/mana/gdma_main.c:510:7: note: uninitialized use occurs here 510 | if (!gic) | ^~~ drivers/net/ethernet/microsoft/mana/gdma_main.c:500:2: note: remove the condition if it is always true 500 | list_for_each(pos, &gc->irq_contexts) { | ^ include/linux/list.h:687:27: note: expanded from macro 'list_for_each' 687 | for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next) | ^ drivers/net/ethernet/microsoft/mana/gdma_main.c:475:30: note: initialize the variable 'gic' to silence this warning 475 | struct gdma_irq_context *gic; | ^ | = NULL drivers/net/ethernet/microsoft/mana/gdma_main.c:541:2: warning: variable 'gic' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized] 541 | list_for_each(pos, &gc->irq_contexts) { | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/list.h:687:27: note: expanded from macro 'list_for_each' 687 | for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next) | ^~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/microsoft/mana/gdma_main.c:551:7: note: uninitialized use occurs here 551 | if (!gic) | ^~~ drivers/net/ethernet/microsoft/mana/gdma_main.c:541:2: note: remove the condition if it is always true 541 | list_for_each(pos, &gc->irq_contexts) { | ^ include/linux/list.h:687:27: note: expanded from macro 'list_for_each' 687 | for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next) | ^ drivers/net/ethernet/microsoft/mana/gdma_main.c:523:30: note: initialize the variable 'gic' to silence this warning 523 | struct gdma_irq_context *gic; | ^ | = NULL 2 warnings generated. 
vim +500 drivers/net/ethernet/microsoft/mana/gdma_main.c 470 471 static int mana_gd_register_irq(struct gdma_queue *queue, 472 const struct gdma_queue_spec *spec) 473 { 474 struct gdma_dev *gd = queue->gdma_dev; 475 struct gdma_irq_context *gic; 476 struct gdma_context *gc; 477 unsigned int msi_index; 478 struct list_head *pos; 479 unsigned long flags, flag_irq; 480 struct device *dev; 481 int err = 0, count; 482 483 gc = gd->gdma_context; 484 dev = gc->dev; 485 msi_index = spec->eq.msix_index; 486 487 if (msi_index >= gc->num_msix_usable) { 488 err = -ENOSPC; 489 dev_err(dev, "Register IRQ err:%d, msi:%u nMSI:%u", 490 err, msi_index, gc->num_msix_usable); 491 492 return err; 493 } 494 495 queue->eq.msix_index = msi_index; 496 497 /* get the msi_index value from the list*/ 498 count = 0; 499 spin_lock_irqsave(&gc->irq_ctxs_lock, flag_irq); > 500 list_for_each(pos, &gc->irq_contexts) { 501 if (count == msi_index) { 502 gic = list_entry(pos, struct gdma_irq_context, gic_list); 503 break; 504 } 505 506 count++; 507 } 508 spin_unlock_irqrestore(&gc->irq_ctxs_lock, flag_irq); 509 510 if (!gic) 511 return -1; 512 513 spin_lock_irqsave(&gic->lock, flags); 514 list_add_rcu(&queue->entry, &gic->eq_list); 515 spin_unlock_irqrestore(&gic->lock, flags); 516 517 return 0; 518 } 519 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 15+ messages in thread
Thread overview: 15+ messages

2025-04-16 15:35 [PATCH 0/2] Allow dyn pci vector allocation of MANA Shradha Gupta
2025-04-16 15:36 ` [PATCH 1/2] PCI: hv: enable pci_hyperv to allow dynamic vector allocation Shradha Gupta
2025-04-16 18:30 ` Bjorn Helgaas
2025-04-17 7:29 ` Shradha Gupta
2025-04-17 10:00 ` Thomas Gleixner
2025-04-21 6:33 ` Shradha Gupta
2025-04-16 15:36 ` [PATCH 2/2] net: mana: Allow MANA driver to allocate PCI vector dynamically Shradha Gupta
2025-04-16 17:22 ` Yury Norov
2025-04-17 7:32 ` Shradha Gupta
2025-04-22 12:09 ` Shradha Gupta
2025-04-16 18:32 ` Bjorn Helgaas
2025-04-17 7:33 ` Shradha Gupta
2025-04-24 16:57 ` Simon Horman
2025-04-25 7:29 ` Shradha Gupta
2025-05-07 16:28 ` kernel test robot