netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion
@ 2025-07-07  8:20 Nam Cao
  2025-07-07  8:20 ` [PATCH for-netdev v2 1/2] irqdomain: Export irq_domain_free_irqs_top() Nam Cao
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Nam Cao @ 2025-07-07  8:20 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner, Michael Kelley, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
	linux-kernel
  Cc: Manivannan Sadhasivam, Nam Cao

Hi,

This series originally belongs to a bigger series sent to PCI tree:
https://lore.kernel.org/linux-pci/024f0122314198fe0a42fef01af53e8953a687ec.1750858083.git.namcao@linutronix.de/

However, during review, we noticed that the patch conflicts with another
patch in netdev tree:
https://lore.kernel.org/netdev/1749651015-9668-1-git-send-email-shradhagupta@linux.microsoft.com/

As this series has no dependency with the rest of the series, we think it
is best to split out this one and send it to netdev, to avoid conflict
resolution headache later on.

Can netdev maintainers please pick it up?

Best regards,
Nam

Nam Cao (2):
  irqdomain: Export irq_domain_free_irqs_top()
  PCI: hv: Switch to msi_create_parent_irq_domain()

 drivers/pci/Kconfig                 |   1 +
 drivers/pci/controller/pci-hyperv.c | 111 +++++++++++++++++++++-------
 kernel/irq/irqdomain.c              |   1 +
 3 files changed, 85 insertions(+), 28 deletions(-)

-- 
2.39.5


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH for-netdev v2 1/2] irqdomain: Export irq_domain_free_irqs_top()
  2025-07-07  8:20 [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion Nam Cao
@ 2025-07-07  8:20 ` Nam Cao
  2025-07-07  8:20 ` [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain() Nam Cao
  2025-07-11 12:00 ` [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion patchwork-bot+netdevbpf
  2 siblings, 0 replies; 8+ messages in thread
From: Nam Cao @ 2025-07-07  8:20 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner, Michael Kelley, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
	linux-kernel
  Cc: Manivannan Sadhasivam, Nam Cao

Export irq_domain_free_irqs_top(), making it usable for drivers compiled as
modules.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 kernel/irq/irqdomain.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index c8b6de09047b..46919e6c9c45 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -1561,6 +1561,7 @@ void irq_domain_free_irqs_top(struct irq_domain *domain, unsigned int virq,
 	}
 	irq_domain_free_irqs_common(domain, virq, nr_irqs);
 }
+EXPORT_SYMBOL_GPL(irq_domain_free_irqs_top);
 
 static void irq_domain_free_irqs_hierarchy(struct irq_domain *domain,
 					   unsigned int irq_base,
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain()
  2025-07-07  8:20 [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion Nam Cao
  2025-07-07  8:20 ` [PATCH for-netdev v2 1/2] irqdomain: Export irq_domain_free_irqs_top() Nam Cao
@ 2025-07-07  8:20 ` Nam Cao
  2025-07-07 18:49   ` Michael Kelley
  2025-07-11 12:00 ` [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion patchwork-bot+netdevbpf
  2 siblings, 1 reply; 8+ messages in thread
From: Nam Cao @ 2025-07-07  8:20 UTC (permalink / raw)
  To: Marc Zyngier, Thomas Gleixner, Michael Kelley, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev,
	linux-kernel
  Cc: Manivannan Sadhasivam, Nam Cao, Bjorn Helgaas

Move away from the legacy MSI domain setup, switch to use
msi_create_parent_irq_domain().

While doing the conversion, I noticed that hv_compose_msi_msg() is doing
more than it is supposed to (composing message). This function also
allocates and populates struct tran_int_desc, which should be done in
hv_pcie_domain_alloc() instead. It works, but it is not the correct design.
However, I have no hardware to test such change, therefore I leave a TODO
note.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
v2:
  - rebase onto netdev/next
  - clarify the TODO note
  - fixup arm64
  - Add HV_MSI_CHIP_FLAGS macro and reduce the amount of #ifdef
---
 drivers/pci/Kconfig                 |   1 +
 drivers/pci/controller/pci-hyperv.c | 111 +++++++++++++++++++++-------
 2 files changed, 84 insertions(+), 28 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 9c0e4aaf4e8c..9a249c65aedc 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -223,6 +223,7 @@ config PCI_HYPERV
 	tristate "Hyper-V PCI Frontend"
 	depends on ((X86 && X86_64) || ARM64) && HYPERV && PCI_MSI && SYSFS
 	select PCI_HYPERV_INTERFACE
+	select IRQ_MSI_LIB
 	help
 	  The PCI device frontend driver allows the kernel to import arbitrary
 	  PCI devices from a PCI backend to support PCI driver domains.
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 86ca041bf74a..ebe39218479a 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -44,6 +44,7 @@
 #include <linux/delay.h>
 #include <linux/semaphore.h>
 #include <linux/irq.h>
+#include <linux/irqchip/irq-msi-lib.h>
 #include <linux/msi.h>
 #include <linux/hyperv.h>
 #include <linux/refcount.h>
@@ -508,7 +509,6 @@ struct hv_pcibus_device {
 	struct list_head children;
 	struct list_head dr_list;
 
-	struct msi_domain_info msi_info;
 	struct irq_domain *irq_domain;
 
 	struct workqueue_struct *wq;
@@ -576,9 +576,8 @@ struct hv_pci_compl {
 static void hv_pci_onchannelcallback(void *context);
 
 #ifdef CONFIG_X86
-#define DELIVERY_MODE	APIC_DELIVERY_MODE_FIXED
-#define FLOW_HANDLER	handle_edge_irq
-#define FLOW_NAME	"edge"
+#define DELIVERY_MODE		APIC_DELIVERY_MODE_FIXED
+#define HV_MSI_CHIP_FLAGS	MSI_CHIP_FLAG_SET_ACK
 
 static int hv_pci_irqchip_init(void)
 {
@@ -723,8 +722,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
 #define HV_PCI_MSI_SPI_START	64
 #define HV_PCI_MSI_SPI_NR	(1020 - HV_PCI_MSI_SPI_START)
 #define DELIVERY_MODE		0
-#define FLOW_HANDLER		NULL
-#define FLOW_NAME		NULL
+#define HV_MSI_CHIP_FLAGS	MSI_CHIP_FLAG_SET_EOI
 #define hv_msi_prepare		NULL
 
 struct hv_pci_chip_data {
@@ -1687,7 +1685,7 @@ static void hv_msi_free(struct irq_domain *domain, struct msi_domain_info *info,
 	struct msi_desc *msi = irq_data_get_msi_desc(irq_data);
 
 	pdev = msi_desc_to_pci_dev(msi);
-	hbus = info->data;
+	hbus = domain->host_data;
 	int_desc = irq_data_get_irq_chip_data(irq_data);
 	if (!int_desc)
 		return;
@@ -1705,7 +1703,6 @@ static void hv_msi_free(struct irq_domain *domain, struct msi_domain_info *info,
 
 static void hv_irq_mask(struct irq_data *data)
 {
-	pci_msi_mask_irq(data);
 	if (data->parent_data->chip->irq_mask)
 		irq_chip_mask_parent(data);
 }
@@ -1716,7 +1713,6 @@ static void hv_irq_unmask(struct irq_data *data)
 
 	if (data->parent_data->chip->irq_unmask)
 		irq_chip_unmask_parent(data);
-	pci_msi_unmask_irq(data);
 }
 
 struct compose_comp_ctxt {
@@ -2101,25 +2097,87 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	msg->data = 0;
 }
 
+static bool hv_pcie_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
+				      struct irq_domain *real_parent, struct msi_domain_info *info)
+{
+	struct irq_chip *chip = info->chip;
+
+	if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
+		return false;
+
+	info->ops->msi_prepare = hv_msi_prepare;
+
+	chip->irq_set_affinity = irq_chip_set_affinity_parent;
+
+	if (IS_ENABLED(CONFIG_X86))
+		chip->flags |= IRQCHIP_MOVE_DEFERRED;
+
+	return true;
+}
+
+#define HV_PCIE_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS		| \
+				    MSI_FLAG_USE_DEF_CHIP_OPS		| \
+				    MSI_FLAG_PCI_MSI_MASK_PARENT)
+#define HV_PCIE_MSI_FLAGS_SUPPORTED (MSI_FLAG_MULTI_PCI_MSI		| \
+				     MSI_FLAG_PCI_MSIX			| \
+				     MSI_FLAG_PCI_MSIX_ALLOC_DYN	| \
+				     MSI_GENERIC_FLAGS_MASK)
+
+static const struct msi_parent_ops hv_pcie_msi_parent_ops = {
+	.required_flags		= HV_PCIE_MSI_FLAGS_REQUIRED,
+	.supported_flags	= HV_PCIE_MSI_FLAGS_SUPPORTED,
+	.bus_select_token	= DOMAIN_BUS_PCI_MSI,
+	.chip_flags		= HV_MSI_CHIP_FLAGS,
+	.prefix			= "HV-",
+	.init_dev_msi_info	= hv_pcie_init_dev_msi_info,
+};
+
 /* HW Interrupt Chip Descriptor */
 static struct irq_chip hv_msi_irq_chip = {
 	.name			= "Hyper-V PCIe MSI",
 	.irq_compose_msi_msg	= hv_compose_msi_msg,
 	.irq_set_affinity	= irq_chip_set_affinity_parent,
-#ifdef CONFIG_X86
 	.irq_ack		= irq_chip_ack_parent,
-	.flags			= IRQCHIP_MOVE_DEFERRED,
-#elif defined(CONFIG_ARM64)
 	.irq_eoi		= irq_chip_eoi_parent,
-#endif
 	.irq_mask		= hv_irq_mask,
 	.irq_unmask		= hv_irq_unmask,
 };
 
-static struct msi_domain_ops hv_msi_ops = {
-	.msi_prepare	= hv_msi_prepare,
-	.msi_free	= hv_msi_free,
-	.prepare_desc	= pci_msix_prepare_desc,
+static int hv_pcie_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs,
+			       void *arg)
+{
+	/*
+	 * TODO: Allocating and populating struct tran_int_desc in hv_compose_msi_msg()
+	 * should be moved here.
+	 */
+	int ret;
+
+	ret = irq_domain_alloc_irqs_parent(d, virq, nr_irqs, arg);
+	if (ret < 0)
+		return ret;
+
+	for (int i = 0; i < nr_irqs; i++) {
+		irq_domain_set_hwirq_and_chip(d, virq + i, 0, &hv_msi_irq_chip, NULL);
+		if (IS_ENABLED(CONFIG_X86))
+			__irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
+	}
+
+	return 0;
+}
+
+static void hv_pcie_domain_free(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs)
+{
+	struct msi_domain_info *info = d->host_data;
+
+	for (int i = 0; i < nr_irqs; i++)
+		hv_msi_free(d, info, virq + i);
+
+	irq_domain_free_irqs_top(d, virq, nr_irqs);
+}
+
+static const struct irq_domain_ops hv_pcie_domain_ops = {
+	.alloc	= hv_pcie_domain_alloc,
+	.free	= hv_pcie_domain_free,
 };
 
 /**
@@ -2137,17 +2195,14 @@ static struct msi_domain_ops hv_msi_ops = {
  */
 static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus)
 {
-	hbus->msi_info.chip = &hv_msi_irq_chip;
-	hbus->msi_info.ops = &hv_msi_ops;
-	hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS |
-		MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI |
-		MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN);
-	hbus->msi_info.handler = FLOW_HANDLER;
-	hbus->msi_info.handler_name = FLOW_NAME;
-	hbus->msi_info.data = hbus;
-	hbus->irq_domain = pci_msi_create_irq_domain(hbus->fwnode,
-						     &hbus->msi_info,
-						     hv_pci_get_root_domain());
+	struct irq_domain_info info = {
+		.fwnode		= hbus->fwnode,
+		.ops		= &hv_pcie_domain_ops,
+		.host_data	= hbus,
+		.parent		= hv_pci_get_root_domain(),
+	};
+
+	hbus->irq_domain = msi_create_parent_irq_domain(&info, &hv_pcie_msi_parent_ops);
 	if (!hbus->irq_domain) {
 		dev_err(&hbus->hdev->device,
 			"Failed to build an MSI IRQ domain\n");
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* RE: [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain()
  2025-07-07  8:20 ` [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain() Nam Cao
@ 2025-07-07 18:49   ` Michael Kelley
  2025-07-10 13:11     ` Paolo Abeni
  2025-07-15  5:14     ` Wei Liu
  0 siblings, 2 replies; 8+ messages in thread
From: Michael Kelley @ 2025-07-07 18:49 UTC (permalink / raw)
  To: Nam Cao, Marc Zyngier, Thomas Gleixner, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	open list:Hyper-V/Azure CORE AND DRIVERS
  Cc: Manivannan Sadhasivam, Bjorn Helgaas

From: Nam Cao <namcao@linutronix.de> Sent: Monday, July 7, 2025 1:20 AM
> 
> Move away from the legacy MSI domain setup, switch to use
> msi_create_parent_irq_domain().
> 
> While doing the conversion, I noticed that hv_compose_msi_msg() is doing
> more than it is supposed to (composing message). This function also
> allocates and populates struct tran_int_desc, which should be done in
> hv_pcie_domain_alloc() instead. It works, but it is not the correct design.
> However, I have no hardware to test such change, therefore I leave a TODO
> note.
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Nam Cao <namcao@linutronix.de>

[Adding linux-hyperv@vger.kernel.org so that the Linux on Hyper-V folks
have visibility.]

This all looks good to me now. Thanks for the additional explanation of
the TODO. I understand what you are suggesting. Moving the interaction
with the Hyper-V host into hv_pcie_domain_alloc() has additional appeal
because it should eliminate the need for the ugly polling for a VMBus
response. However, I'm unlikely to be the person implementing the
TODO. hv_compose_msi_msg() is a real beast of a function, and I lack
access to hardware to fully test the move, particularly a device that
does multi MSI. I don't think such a device is available in a VM in the
Azure public cloud.

I've tested this patch in an Azure VM that has a MANA NIC. The MANA
driver has updates in linux-next to use MSIX dynamic allocation, and
that dynamic allocation appears to work correctly with this patch. My
testing included unbind and rebind the driver several times so that
the full round-trip is tested.

Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>

> ---
> v2:
>   - rebase onto netdev/next
>   - clarify the TODO note
>   - fixup arm64
>   - Add HV_MSI_CHIP_FLAGS macro and reduce the amount of #ifdef
> ---
>  drivers/pci/Kconfig                 |   1 +
>  drivers/pci/controller/pci-hyperv.c | 111 +++++++++++++++++++++-------
>  2 files changed, 84 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 9c0e4aaf4e8c..9a249c65aedc 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -223,6 +223,7 @@ config PCI_HYPERV
>  	tristate "Hyper-V PCI Frontend"
>  	depends on ((X86 && X86_64) || ARM64) && HYPERV && PCI_MSI && SYSFS
>  	select PCI_HYPERV_INTERFACE
> +	select IRQ_MSI_LIB
>  	help
>  	  The PCI device frontend driver allows the kernel to import arbitrary
>  	  PCI devices from a PCI backend to support PCI driver domains.
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 86ca041bf74a..ebe39218479a 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -44,6 +44,7 @@
>  #include <linux/delay.h>
>  #include <linux/semaphore.h>
>  #include <linux/irq.h>
> +#include <linux/irqchip/irq-msi-lib.h>
>  #include <linux/msi.h>
>  #include <linux/hyperv.h>
>  #include <linux/refcount.h>
> @@ -508,7 +509,6 @@ struct hv_pcibus_device {
>  	struct list_head children;
>  	struct list_head dr_list;
> 
> -	struct msi_domain_info msi_info;
>  	struct irq_domain *irq_domain;
> 
>  	struct workqueue_struct *wq;
> @@ -576,9 +576,8 @@ struct hv_pci_compl {
>  static void hv_pci_onchannelcallback(void *context);
> 
>  #ifdef CONFIG_X86
> -#define DELIVERY_MODE	APIC_DELIVERY_MODE_FIXED
> -#define FLOW_HANDLER	handle_edge_irq
> -#define FLOW_NAME	"edge"
> +#define DELIVERY_MODE		APIC_DELIVERY_MODE_FIXED
> +#define HV_MSI_CHIP_FLAGS	MSI_CHIP_FLAG_SET_ACK
> 
>  static int hv_pci_irqchip_init(void)
>  {
> @@ -723,8 +722,7 @@ static void hv_arch_irq_unmask(struct irq_data *data)
>  #define HV_PCI_MSI_SPI_START	64
>  #define HV_PCI_MSI_SPI_NR	(1020 - HV_PCI_MSI_SPI_START)
>  #define DELIVERY_MODE		0
> -#define FLOW_HANDLER		NULL
> -#define FLOW_NAME		NULL
> +#define HV_MSI_CHIP_FLAGS	MSI_CHIP_FLAG_SET_EOI
>  #define hv_msi_prepare		NULL
> 
>  struct hv_pci_chip_data {
> @@ -1687,7 +1685,7 @@ static void hv_msi_free(struct irq_domain *domain, struct
> msi_domain_info *info,
>  	struct msi_desc *msi = irq_data_get_msi_desc(irq_data);
> 
>  	pdev = msi_desc_to_pci_dev(msi);
> -	hbus = info->data;
> +	hbus = domain->host_data;
>  	int_desc = irq_data_get_irq_chip_data(irq_data);
>  	if (!int_desc)
>  		return;
> @@ -1705,7 +1703,6 @@ static void hv_msi_free(struct irq_domain *domain, struct
> msi_domain_info *info,
> 
>  static void hv_irq_mask(struct irq_data *data)
>  {
> -	pci_msi_mask_irq(data);
>  	if (data->parent_data->chip->irq_mask)
>  		irq_chip_mask_parent(data);
>  }
> @@ -1716,7 +1713,6 @@ static void hv_irq_unmask(struct irq_data *data)
> 
>  	if (data->parent_data->chip->irq_unmask)
>  		irq_chip_unmask_parent(data);
> -	pci_msi_unmask_irq(data);
>  }
> 
>  struct compose_comp_ctxt {
> @@ -2101,25 +2097,87 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
>  	msg->data = 0;
>  }
> 
> +static bool hv_pcie_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
> +				      struct irq_domain *real_parent, struct msi_domain_info *info)
> +{
> +	struct irq_chip *chip = info->chip;
> +
> +	if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
> +		return false;
> +
> +	info->ops->msi_prepare = hv_msi_prepare;
> +
> +	chip->irq_set_affinity = irq_chip_set_affinity_parent;
> +
> +	if (IS_ENABLED(CONFIG_X86))
> +		chip->flags |= IRQCHIP_MOVE_DEFERRED;
> +
> +	return true;
> +}
> +
> +#define HV_PCIE_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS	| \
> +				    MSI_FLAG_USE_DEF_CHIP_OPS		| \
> +				    MSI_FLAG_PCI_MSI_MASK_PARENT)
> +#define HV_PCIE_MSI_FLAGS_SUPPORTED (MSI_FLAG_MULTI_PCI_MSI	| \
> +				     MSI_FLAG_PCI_MSIX			| \
> +				     MSI_FLAG_PCI_MSIX_ALLOC_DYN	| \
> +				     MSI_GENERIC_FLAGS_MASK)
> +
> +static const struct msi_parent_ops hv_pcie_msi_parent_ops = {
> +	.required_flags		= HV_PCIE_MSI_FLAGS_REQUIRED,
> +	.supported_flags	= HV_PCIE_MSI_FLAGS_SUPPORTED,
> +	.bus_select_token	= DOMAIN_BUS_PCI_MSI,
> +	.chip_flags		= HV_MSI_CHIP_FLAGS,
> +	.prefix			= "HV-",
> +	.init_dev_msi_info	= hv_pcie_init_dev_msi_info,
> +};
> +
>  /* HW Interrupt Chip Descriptor */
>  static struct irq_chip hv_msi_irq_chip = {
>  	.name			= "Hyper-V PCIe MSI",
>  	.irq_compose_msi_msg	= hv_compose_msi_msg,
>  	.irq_set_affinity	= irq_chip_set_affinity_parent,
> -#ifdef CONFIG_X86
>  	.irq_ack		= irq_chip_ack_parent,
> -	.flags			= IRQCHIP_MOVE_DEFERRED,
> -#elif defined(CONFIG_ARM64)
>  	.irq_eoi		= irq_chip_eoi_parent,
> -#endif
>  	.irq_mask		= hv_irq_mask,
>  	.irq_unmask		= hv_irq_unmask,
>  };
> 
> -static struct msi_domain_ops hv_msi_ops = {
> -	.msi_prepare	= hv_msi_prepare,
> -	.msi_free	= hv_msi_free,
> -	.prepare_desc	= pci_msix_prepare_desc,
> +static int hv_pcie_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs,
> +			       void *arg)
> +{
> +	/*
> +	 * TODO: Allocating and populating struct tran_int_desc in hv_compose_msi_msg()
> +	 * should be moved here.
> +	 */
> +	int ret;
> +
> +	ret = irq_domain_alloc_irqs_parent(d, virq, nr_irqs, arg);
> +	if (ret < 0)
> +		return ret;
> +
> +	for (int i = 0; i < nr_irqs; i++) {
> +		irq_domain_set_hwirq_and_chip(d, virq + i, 0, &hv_msi_irq_chip, NULL);
> +		if (IS_ENABLED(CONFIG_X86))
> +			__irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
> +	}
> +
> +	return 0;
> +}
> +
> +static void hv_pcie_domain_free(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs)
> +{
> +	struct msi_domain_info *info = d->host_data;
> +
> +	for (int i = 0; i < nr_irqs; i++)
> +		hv_msi_free(d, info, virq + i);
> +
> +	irq_domain_free_irqs_top(d, virq, nr_irqs);
> +}
> +
> +static const struct irq_domain_ops hv_pcie_domain_ops = {
> +	.alloc	= hv_pcie_domain_alloc,
> +	.free	= hv_pcie_domain_free,
>  };
> 
>  /**
> @@ -2137,17 +2195,14 @@ static struct msi_domain_ops hv_msi_ops = {
>   */
>  static int hv_pcie_init_irq_domain(struct hv_pcibus_device *hbus)
>  {
> -	hbus->msi_info.chip = &hv_msi_irq_chip;
> -	hbus->msi_info.ops = &hv_msi_ops;
> -	hbus->msi_info.flags = (MSI_FLAG_USE_DEF_DOM_OPS |
> -		MSI_FLAG_USE_DEF_CHIP_OPS | MSI_FLAG_MULTI_PCI_MSI |
> -		MSI_FLAG_PCI_MSIX | MSI_FLAG_PCI_MSIX_ALLOC_DYN);
> -	hbus->msi_info.handler = FLOW_HANDLER;
> -	hbus->msi_info.handler_name = FLOW_NAME;
> -	hbus->msi_info.data = hbus;
> -	hbus->irq_domain = pci_msi_create_irq_domain(hbus->fwnode,
> -						     &hbus->msi_info,
> -						     hv_pci_get_root_domain());
> +	struct irq_domain_info info = {
> +		.fwnode		= hbus->fwnode,
> +		.ops		= &hv_pcie_domain_ops,
> +		.host_data	= hbus,
> +		.parent		= hv_pci_get_root_domain(),
> +	};
> +
> +	hbus->irq_domain = msi_create_parent_irq_domain(&info, &hv_pcie_msi_parent_ops);
>  	if (!hbus->irq_domain) {
>  		dev_err(&hbus->hdev->device,
>  			"Failed to build an MSI IRQ domain\n");
> --
> 2.39.5


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain()
  2025-07-07 18:49   ` Michael Kelley
@ 2025-07-10 13:11     ` Paolo Abeni
  2025-07-10 13:22       ` Nam Cao
  2025-07-15  5:14     ` Wei Liu
  1 sibling, 1 reply; 8+ messages in thread
From: Paolo Abeni @ 2025-07-10 13:11 UTC (permalink / raw)
  To: open list:Hyper-V/Azure CORE AND DRIVERS, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui
  Cc: Manivannan Sadhasivam, Bjorn Helgaas, Michael Kelley, Nam Cao,
	Simon Horman, Marc Zyngier, David S . Miller, Thomas Gleixner,
	Eric Dumazet, Jakub Kicinski, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

On 7/7/25 8:49 PM, Michael Kelley wrote:
> From: Nam Cao <namcao@linutronix.de> Sent: Monday, July 7, 2025 1:20 AM
>>
>> Move away from the legacy MSI domain setup, switch to use
>> msi_create_parent_irq_domain().
>>
>> While doing the conversion, I noticed that hv_compose_msi_msg() is doing
>> more than it is supposed to (composing message). This function also
>> allocates and populates struct tran_int_desc, which should be done in
>> hv_pcie_domain_alloc() instead. It works, but it is not the correct design.
>> However, I have no hardware to test such change, therefore I leave a TODO
>> note.
>>
>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Nam Cao <namcao@linutronix.de>
> 
> [Adding linux-hyperv@vger.kernel.org so that the Linux on Hyper-V folks
> have visibility.]

I think this series could go via netdev, to simplify the later merge,
but it would be better to have explicit ack from Hyper-V people.

Adding more Microsoft folks.

/P


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain()
  2025-07-10 13:11     ` Paolo Abeni
@ 2025-07-10 13:22       ` Nam Cao
  0 siblings, 0 replies; 8+ messages in thread
From: Nam Cao @ 2025-07-10 13:22 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: open list:Hyper-V/Azure CORE AND DRIVERS, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Manivannan Sadhasivam,
	Bjorn Helgaas, Michael Kelley, Simon Horman, Marc Zyngier,
	David S . Miller, Thomas Gleixner, Eric Dumazet, Jakub Kicinski,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Jul 10, 2025 at 03:11:36PM +0200, Paolo Abeni wrote:
> I think this series could go via netdev, to simplify the later merge,
> but it would be better to have explicit ack from Hyper-V people.
> 
> Adding more Microsoft folks.

Right. Sorry Microsoft folks, I fumbled my scripts and didn't Cc you in the
patch.

Nam

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion
  2025-07-07  8:20 [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion Nam Cao
  2025-07-07  8:20 ` [PATCH for-netdev v2 1/2] irqdomain: Export irq_domain_free_irqs_top() Nam Cao
  2025-07-07  8:20 ` [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain() Nam Cao
@ 2025-07-11 12:00 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-07-11 12:00 UTC (permalink / raw)
  To: Nam Cao
  Cc: maz, tglx, mhklinux, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, mani

Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon,  7 Jul 2025 10:20:14 +0200 you wrote:
> Hi,
> 
> This series originally belongs to a bigger series sent to PCI tree:
> https://lore.kernel.org/linux-pci/024f0122314198fe0a42fef01af53e8953a687ec.1750858083.git.namcao@linutronix.de/
> 
> However, during review, we noticed that the patch conflicts with another
> patch in netdev tree:
> https://lore.kernel.org/netdev/1749651015-9668-1-git-send-email-shradhagupta@linux.microsoft.com/
> 
> [...]

Here is the summary with links:
  - [for-netdev,v2,1/2] irqdomain: Export irq_domain_free_irqs_top()
    https://git.kernel.org/netdev/net-next/c/a6b0465bd283
  - [for-netdev,v2,2/2] PCI: hv: Switch to msi_create_parent_irq_domain()
    https://git.kernel.org/netdev/net-next/c/5f83d6337c9c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain()
  2025-07-07 18:49   ` Michael Kelley
  2025-07-10 13:11     ` Paolo Abeni
@ 2025-07-15  5:14     ` Wei Liu
  1 sibling, 0 replies; 8+ messages in thread
From: Wei Liu @ 2025-07-15  5:14 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Nam Cao, Marc Zyngier, Thomas Gleixner, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	open list:Hyper-V/Azure CORE AND DRIVERS, Manivannan Sadhasivam,
	Bjorn Helgaas, Wei Liu

On Mon, Jul 07, 2025 at 06:49:02PM +0000, Michael Kelley wrote:
> From: Nam Cao <namcao@linutronix.de> Sent: Monday, July 7, 2025 1:20 AM
> > 
> > Move away from the legacy MSI domain setup, switch to use
> > msi_create_parent_irq_domain().
> > 
> > While doing the conversion, I noticed that hv_compose_msi_msg() is doing
> > more than it is supposed to (composing message). This function also
> > allocates and populates struct tran_int_desc, which should be done in
> > hv_pcie_domain_alloc() instead. It works, but it is not the correct design.
> > However, I have no hardware to test such change, therefore I leave a TODO
> > note.
> > 
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Nam Cao <namcao@linutronix.de>
> 
> [Adding linux-hyperv@vger.kernel.org so that the Linux on Hyper-V folks
> have visibility.]
> 
> This all looks good to me now. Thanks for the additional explanation of
> the TODO. I understand what you are suggesting. Moving the interaction
> with the Hyper-V host into hv_pcie_domain_alloc() has additional appeal
> because it should eliminate the need for the ugly polling for a VMBus
> response. However, I'm unlikely to be the person implementing the
> TODO. hv_compose_msi_msg() is a real beast of a function, and I lack
> access to hardware to fully test the move, particularly a device that
> does multi MSI. I don't think such a device is available in a VM in the
> Azure public cloud.
> 
> I've tested this patch in an Azure VM that has a MANA NIC. The MANA
> driver has updates in linux-next to use MSIX dynamic allocation, and
> that dynamic allocation appears to work correctly with this patch. My
> testing included unbind and rebind the driver several times so that
> the full round-trip is tested.
> 
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> Tested-by: Michael Kelley <mhklinux@outlook.com>

Acked-by: Wei Liu <wei.liu@kernel.org>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-07-15  5:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-07  8:20 [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion Nam Cao
2025-07-07  8:20 ` [PATCH for-netdev v2 1/2] irqdomain: Export irq_domain_free_irqs_top() Nam Cao
2025-07-07  8:20 ` [PATCH for-netdev v2 2/2] PCI: hv: Switch to msi_create_parent_irq_domain() Nam Cao
2025-07-07 18:49   ` Michael Kelley
2025-07-10 13:11     ` Paolo Abeni
2025-07-10 13:22       ` Nam Cao
2025-07-15  5:14     ` Wei Liu
2025-07-11 12:00 ` [PATCH for-netdev v2 0/2] PCI: hv: MSI parent domain conversion patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).