From: Nam Cao <namcao@linutronix.de>
To: Michael Kelley <mhklinux@outlook.com>
Cc: "Marc Zyngier" <maz@kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Manivannan Sadhasivam" <mani@kernel.org>,
"Rob Herring" <robh@kernel.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Karthikeyan Mitran" <m.karthikeyan@mobiveil.co.in>,
"Hou Zhiqiang" <Zhiqiang.Hou@nxp.com>,
"Thomas Petazzoni" <thomas.petazzoni@bootlin.com>,
"Pali Rohár" <pali@kernel.org>,
"K . Y . Srinivasan" <kys@microsoft.com>,
"Haiyang Zhang" <haiyangz@microsoft.com>,
"Wei Liu" <wei.liu@kernel.org>,
"Dexuan Cui" <decui@microsoft.com>,
"Joyce Ooi" <joyce.ooi@intel.com>,
"Jim Quinlan" <jim2101024@gmail.com>,
"Nicolas Saenz Julienne" <nsaenz@kernel.org>,
"Florian Fainelli" <florian.fainelli@broadcom.com>,
"Broadcom internal kernel review list"
<bcm-kernel-feedback-list@broadcom.com>,
"Ray Jui" <rjui@broadcom.com>,
"Scott Branden" <sbranden@broadcom.com>,
"Ryder Lee" <ryder.lee@mediatek.com>,
"Jianjun Wang" <jianjun.wang@mediatek.com>,
"Marek Vasut" <marek.vasut+renesas@gmail.com>,
"Yoshihiro Shimoda" <yoshihiro.shimoda.uh@renesas.com>,
"Michal Simek" <michal.simek@amd.com>,
"Daire McNamara" <daire.mcnamara@microchip.com>,
"Nirmal Patel" <nirmal.patel@linux.intel.com>,
"Jonathan Derrick" <jonathan.derrick@linux.dev>,
"Matthias Brugger" <matthias.bgg@gmail.com>,
"AngeloGioacchino Del Regno"
<angelogioacchino.delregno@collabora.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
"linux-rpi-kernel@lists.infradead.org"
<linux-rpi-kernel@lists.infradead.org>,
"linux-mediatek@lists.infradead.org"
<linux-mediatek@lists.infradead.org>,
"linux-renesas-soc@vger.kernel.org"
<linux-renesas-soc@vger.kernel.org>
Subject: Re: [PATCH 14/16] PCI: hv: Switch to msi_create_parent_irq_domain()
Date: Sat, 5 Jul 2025 11:46:55 +0200
Message-ID: <20250705094655.sEu3KWbJ@linutronix.de>
In-Reply-To: <SN6PR02MB41571145B5ECA505CDA6BD90D44DA@SN6PR02MB4157.namprd02.prod.outlook.com>

On Sat, Jul 05, 2025 at 03:51:48AM +0000, Michael Kelley wrote:
> From: Nam Cao <namcao@linutronix.de> Sent: Thursday, June 26, 2025 7:48 AM
> >
> > Move away from the legacy MSI domain setup and switch to using
> > msi_create_parent_irq_domain().
>
> With the additional tweak to this patch that you supplied separately,
> everything in my testing on both x86 and arm64 seems to work OK. So
> that's all good.
>
> On arm64, I did notice the following IRQ domain information from
> /sys/kernel/debug/irq/domains:
>
> # cat HV-PCI-MSIX-1e03\:00\:00.0-12
> name: HV-PCI-MSIX-1e03:00:00.0-12
> size: 0
> mapped: 7
> flags: 0x00000213
> IRQ_DOMAIN_FLAG_HIERARCHY
> IRQ_DOMAIN_NAME_ALLOCATED
> IRQ_DOMAIN_FLAG_MSI
> IRQ_DOMAIN_FLAG_MSI_DEVICE
> parent: 5D202AA8-1E03-4F0F-A786-390A0D2749E9-3
> name: 5D202AA8-1E03-4F0F-A786-390A0D2749E9-3
> size: 0
> mapped: 7
> flags: 0x00000103
> IRQ_DOMAIN_FLAG_HIERARCHY
> IRQ_DOMAIN_NAME_ALLOCATED
> IRQ_DOMAIN_FLAG_MSI_PARENT
> parent: hv_vpci_arm64
> name: hv_vpci_arm64
> size: 956
> mapped: 31
> flags: 0x00000003
> IRQ_DOMAIN_FLAG_HIERARCHY
> IRQ_DOMAIN_NAME_ALLOCATED
> parent: irqchip@0x00000000ffff0000-1
> name: irqchip@0x00000000ffff0000-1
> size: 0
> mapped: 47
> flags: 0x00000003
> IRQ_DOMAIN_FLAG_HIERARCHY
> IRQ_DOMAIN_NAME_ALLOCATED
>
> The 5D202AA8-1E03-4F0F-A786-390A0D2749E9-3 domain has
> IRQ_DOMAIN_FLAG_MSI_PARENT set. But the hv_vpci_arm64
> and irqchip@... domains do not. Is that a problem? On x86,
> the output is this, with IRQ_DOMAIN_FLAG_MSI_PARENT set
> in the next level up VECTOR domain:

That looks normal. IRQ_DOMAIN_FLAG_MSI_PARENT is only set on domains that
provide MSI parent capability. On x86 that happens to be the VECTOR domain;
on arm64 it is the Hyper-V GUID domain, while hv_vpci_arm64 and the irqchip
domain above it are not MSI parents themselves, so they do not carry the
flag.
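
Roughly, the flag comes from creating the domain through
msi_create_parent_irq_domain() (sketch from memory, not copied from the
patch; field names may be slightly off, and "parent_domain" stands for the
x86 VECTOR domain or hv_vpci_arm64, which is picked elsewhere in the
driver):

struct irq_domain_info info = {
	.fwnode		= fwnode,
	.ops		= &hv_pcie_domain_ops,
	.host_data	= hbus,
	.parent		= parent_domain,
};

hbus->irq_domain = msi_create_parent_irq_domain(&info, &hv_pcie_msi_parent_ops);
if (!hbus->irq_domain)
	return -ENODEV;

The helper marks only the newly created domain with
IRQ_DOMAIN_FLAG_MSI_PARENT and attaches the msi_parent_ops to it; the
domains further up the hierarchy are left untouched.
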
> # cat HV-PCI-MSIX-6b71\:00\:02.0-12
> name: HV-PCI-MSIX-6b71:00:02.0-12
> size: 0
> mapped: 17
> flags: 0x00000213
> IRQ_DOMAIN_FLAG_HIERARCHY
> IRQ_DOMAIN_NAME_ALLOCATED
> IRQ_DOMAIN_FLAG_MSI
> IRQ_DOMAIN_FLAG_MSI_DEVICE
> parent: 8564CB14-6B71-477C-B189-F175118E6FF0-3
> name: 8564CB14-6B71-477C-B189-F175118E6FF0-3
> size: 0
> mapped: 17
> flags: 0x00000103
> IRQ_DOMAIN_FLAG_HIERARCHY
> IRQ_DOMAIN_NAME_ALLOCATED
> IRQ_DOMAIN_FLAG_MSI_PARENT
> parent: VECTOR
> name: VECTOR
> size: 0
> mapped: 67
> flags: 0x00000103
> IRQ_DOMAIN_FLAG_HIERARCHY
> IRQ_DOMAIN_NAME_ALLOCATED
> IRQ_DOMAIN_FLAG_MSI_PARENT
>
> Finally, I've noted a couple of code review comments below. These
> comments may reflect my lack of fully understanding the MSI
> IRQ handling, in which case, please set me straight. Thanks,
>
> Michael
>
> >
> > Signed-off-by: Nam Cao <namcao@linutronix.de>
> > ---
> > Cc: K. Y. Srinivasan <kys@microsoft.com>
> > Cc: Haiyang Zhang <haiyangz@microsoft.com>
> > Cc: Wei Liu <wei.liu@kernel.org>
> > Cc: Dexuan Cui <decui@microsoft.com>
> > Cc: linux-hyperv@vger.kernel.org
> > ---
> > drivers/pci/Kconfig | 1 +
> > drivers/pci/controller/pci-hyperv.c | 98 +++++++++++++++++++++++------
> > 2 files changed, 80 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> > index 9c0e4aaf4e8cb..9a249c65aedcd 100644
> > --- a/drivers/pci/Kconfig
> > +++ b/drivers/pci/Kconfig
> > @@ -223,6 +223,7 @@ config PCI_HYPERV
> > tristate "Hyper-V PCI Frontend"
> > depends on ((X86 && X86_64) || ARM64) && HYPERV && PCI_MSI && SYSFS
> > select PCI_HYPERV_INTERFACE
> > + select IRQ_MSI_LIB
> > help
> > The PCI device frontend driver allows the kernel to import arbitrary
> > PCI devices from a PCI backend to support PCI driver domains.
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index ef5d655a0052c..3a24fadddb83b 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -44,6 +44,7 @@
> > #include <linux/delay.h>
> > #include <linux/semaphore.h>
> > #include <linux/irq.h>
> > +#include <linux/irqchip/irq-msi-lib.h>
> > #include <linux/msi.h>
> > #include <linux/hyperv.h>
> > #include <linux/refcount.h>
> > @@ -508,7 +509,6 @@ struct hv_pcibus_device {
> > struct list_head children;
> > struct list_head dr_list;
> >
> > - struct msi_domain_info msi_info;
> > struct irq_domain *irq_domain;
> >
> > struct workqueue_struct *wq;
> > @@ -1687,7 +1687,7 @@ static void hv_msi_free(struct irq_domain *domain, struct msi_domain_info *info,
> > struct msi_desc *msi = irq_data_get_msi_desc(irq_data);
> >
> > pdev = msi_desc_to_pci_dev(msi);
> > - hbus = info->data;
> > + hbus = domain->host_data;
> > int_desc = irq_data_get_irq_chip_data(irq_data);
> > if (!int_desc)
> > return;
> > @@ -1705,7 +1705,6 @@ static void hv_msi_free(struct irq_domain *domain, struct msi_domain_info *info,
> >
> > static void hv_irq_mask(struct irq_data *data)
> > {
> > - pci_msi_mask_irq(data);
> > if (data->parent_data->chip->irq_mask)
> > irq_chip_mask_parent(data);
> > }
> > @@ -1716,7 +1715,6 @@ static void hv_irq_unmask(struct irq_data *data)
> >
> > if (data->parent_data->chip->irq_unmask)
> > irq_chip_unmask_parent(data);
> > - pci_msi_unmask_irq(data);
> > }
> >
> > struct compose_comp_ctxt {
> > @@ -2101,6 +2099,44 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
> > msg->data = 0;
> > }
> >
> > +static bool hv_pcie_init_dev_msi_info(struct device *dev, struct irq_domain *domain,
> > + struct irq_domain *real_parent, struct msi_domain_info *info)
> > +{
> > + struct irq_chip *chip = info->chip;
> > +
> > + if (!msi_lib_init_dev_msi_info(dev, domain, real_parent, info))
> > + return false;
> > +
> > + info->ops->msi_prepare = hv_msi_prepare;
> > +
> > + chip->irq_set_affinity = irq_chip_set_affinity_parent;
> > +
> > + if (IS_ENABLED(CONFIG_X86))
> > + chip->flags |= IRQCHIP_MOVE_DEFERRED;
> > +
> > + return true;
> > +}
> > +
> > +#define HV_PCIE_MSI_FLAGS_REQUIRED (MSI_FLAG_USE_DEF_DOM_OPS | \
> > + MSI_FLAG_USE_DEF_CHIP_OPS | \
> > + MSI_FLAG_PCI_MSI_MASK_PARENT)
> > +#define HV_PCIE_MSI_FLAGS_SUPPORTED (MSI_FLAG_MULTI_PCI_MSI | \
> > + MSI_FLAG_PCI_MSIX | \
> > + MSI_GENERIC_FLAGS_MASK)
> > +
> > +static const struct msi_parent_ops hv_pcie_msi_parent_ops = {
> > + .required_flags = HV_PCIE_MSI_FLAGS_REQUIRED,
> > + .supported_flags = HV_PCIE_MSI_FLAGS_SUPPORTED,
> > + .bus_select_token = DOMAIN_BUS_PCI_MSI,
> > +#ifdef CONFIG_X86
> > + .chip_flags = MSI_CHIP_FLAG_SET_ACK,
> > +#elif defined(CONFIG_ARM64)
> > + .chip_flags = MSI_CHIP_FLAG_SET_EOI,
> > +#endif
> > + .prefix = "HV-",
> > + .init_dev_msi_info = hv_pcie_init_dev_msi_info,
> > +};
> > +
> > /* HW Interrupt Chip Descriptor */
> > static struct irq_chip hv_msi_irq_chip = {
> > .name = "Hyper-V PCIe MSI",
> > @@ -2108,7 +2144,6 @@ static struct irq_chip hv_msi_irq_chip = {
> > .irq_set_affinity = irq_chip_set_affinity_parent,
> > #ifdef CONFIG_X86
> > .irq_ack = irq_chip_ack_parent,
> > - .flags = IRQCHIP_MOVE_DEFERRED,
> > #elif defined(CONFIG_ARM64)
> > .irq_eoi = irq_chip_eoi_parent,
> > #endif
>
> Would it work to drop the #ifdef's and always set both .irq_ack and
> .irq_eoi on x86 and on ARM64? Is which one gets called controlled by the
> child HV-PCI-MSIX- ... domain, based on the .chip_flags?
>
> I'm trying to reduce the #ifdef clutter. I
> tested without the #ifdefs on both x86 and arm64, and
> everything works, but I know that doesn't prove that it's
> OK.

Nothing is wrong with that, as far as I can tell.
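
As far as I understand, the per-device MSI chip set up by
msi_lib_init_dev_msi_info() only gets .irq_ack or .irq_eoi installed
according to chip_flags, so with both callbacks present in the parent chip
only the relevant one is ever reached. An untested sketch of what dropping
the #ifdefs could look like (the remaining members stay as they are today):

static struct irq_chip hv_msi_irq_chip = {
	.name			= "Hyper-V PCIe MSI",
	.irq_compose_msi_msg	= hv_compose_msi_msg,
	.irq_set_affinity	= irq_chip_set_affinity_parent,
	.irq_ack		= irq_chip_ack_parent,	/* used on x86 (edge flow) */
	.irq_eoi		= irq_chip_eoi_parent,	/* used on arm64 (fasteoi flow) */
	.irq_mask		= hv_irq_mask,
	.irq_unmask		= hv_irq_unmask,
};
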
> If the #ifdefs can go away, then I'd like to see a tweak to the way
> .chip_flags is set. Rather than do an #ifdef inline for struct
> msi_parent_ops hv_pcie_msi_parent_ops, add a #define
> HV_MSI_CHIP_FLAGS in the existing #ifdef X86 and #ifdef ARM64
> sections respectively near the top of this source file, and then
> use HV_MSI_CHIP_FLAGS in struct msi_parent_ops
> hv_pcie_msi_parent_ops. As much as is reasonable, I'd like to
> not clutter the code with #ifdef X86 #elseif ARM64, but instead
> group all the differences under the existing #ifdefs near the top.
> There are some places where this isn't practical, but this seems
> like a place that is practical.

Yes, that would be better. I will do it in v2.
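
Roughly along these lines (untested; the #define would go into the existing
per-arch #ifdef blocks near the top of the file, and HV_MSI_CHIP_FLAGS is
just a name picked here for illustration):

#ifdef CONFIG_X86
#define HV_MSI_CHIP_FLAGS	MSI_CHIP_FLAG_SET_ACK
#elif defined(CONFIG_ARM64)
#define HV_MSI_CHIP_FLAGS	MSI_CHIP_FLAG_SET_EOI
#endif

static const struct msi_parent_ops hv_pcie_msi_parent_ops = {
	.required_flags		= HV_PCIE_MSI_FLAGS_REQUIRED,
	.supported_flags	= HV_PCIE_MSI_FLAGS_SUPPORTED,
	.bus_select_token	= DOMAIN_BUS_PCI_MSI,
	.chip_flags		= HV_MSI_CHIP_FLAGS,
	.prefix			= "HV-",
	.init_dev_msi_info	= hv_pcie_init_dev_msi_info,
};
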
> > @@ -2116,9 +2151,37 @@ static struct irq_chip hv_msi_irq_chip = {
> > .irq_unmask = hv_irq_unmask,
> > };
> >
> > -static struct msi_domain_ops hv_msi_ops = {
> > - .msi_prepare = hv_msi_prepare,
> > - .msi_free = hv_msi_free,
> > +static int hv_pcie_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs,
> > + void *arg)
> > +{
> > + /* TODO: move the content of hv_compose_msi_msg() in here */
>
> Could you elaborate on this TODO? Is the idea to loop through all the IRQs and
> generate the MSI message for each one? What is the advantage to doing it here?
> I noticed in Patch 3 of the series, the Aardvark controller has
> advk_msi_irq_compose_msi_msg(), but you had not moved it into the domain
> allocation path.

Sorry for being unclear. hv_compose_msi_msg() should not be moved here
entirely. Let me elaborate on this in v2.

What I meant is that hv_compose_msi_msg() currently does more than what this
callback is supposed to do (composing the message). It works, but it is not
correct: interrupt allocation is the responsibility of
irq_domain_ops::alloc(). Allocating and populating int_desc should therefore
happen in hv_pcie_domain_alloc() instead, so that irq_domain_ops' .alloc()
and .free() are symmetric.
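
Roughly what I have in mind (untested sketch; unwinding of already-allocated
descriptors is left out, and the matching release stays on the .free() side):

static int hv_pcie_domain_alloc(struct irq_domain *d, unsigned int virq,
				unsigned int nr_irqs, void *arg)
{
	struct tran_int_desc *int_desc;
	unsigned int i;
	int ret;

	ret = irq_domain_alloc_irqs_parent(d, virq, nr_irqs, arg);
	if (ret)
		return ret;

	for (i = 0; i < nr_irqs; i++) {
		int_desc = kzalloc(sizeof(*int_desc), GFP_KERNEL);
		if (!int_desc) {
			irq_domain_free_irqs_parent(d, virq, nr_irqs);
			return -ENOMEM;
		}
		irq_domain_get_irq_data(d, virq + i)->chip_data = int_desc;
	}

	return 0;
}

hv_compose_msi_msg() would then only fill in the int_desc and compose the
message.
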
>
> Also, is there some point in the time in the future where the "TODO" is likely to
> become a "MUST DO"?

There's nothing planned that would make this non-functional, as far as I
know.

Thanks so much for examining the patch,
Nam

Thread overview: 58+ messages
2025-06-26 14:47 [PATCH 00/16] PCI: MSI parent domain conversion Nam Cao
2025-06-26 14:47 ` [PATCH 01/16] PCI: dwc: Switch to msi_create_parent_irq_domain() Nam Cao
2025-07-03 13:19 ` Thomas Gleixner
2025-06-26 14:47 ` [PATCH 02/16] PCI: mobiveil: " Nam Cao
2025-07-03 13:20 ` Thomas Gleixner
2025-06-26 14:47 ` [PATCH 03/16] PCI: aardvark: " Nam Cao
2025-07-03 13:21 ` Thomas Gleixner
2025-06-26 14:47 ` [PATCH 04/16] PCI: altera-msi: " Nam Cao
2025-07-03 13:22 ` Thomas Gleixner
2025-06-26 14:47 ` [PATCH 05/16] PCI: brcmstb: " Nam Cao
2025-06-30 19:18 ` Florian Fainelli
2025-07-03 13:23 ` Thomas Gleixner
2025-06-26 14:47 ` [PATCH 06/16] PCI: iproc: " Nam Cao
2025-06-30 19:17 ` Florian Fainelli
2025-07-03 13:23 ` Thomas Gleixner
2025-06-26 14:47 ` [PATCH 07/16] PCI: mediatek-gen3: " Nam Cao
2025-07-03 13:24 ` Thomas Gleixner
2025-06-26 14:47 ` [PATCH 08/16] PCI: mediatek: " Nam Cao
2025-07-03 13:25 ` Thomas Gleixner
2025-06-26 14:47 ` [PATCH 09/16] PCI: rcar-host: " Nam Cao
2025-07-03 13:26 ` Thomas Gleixner
2025-06-26 14:48 ` [PATCH 10/16] PCI: xilinx-xdma: " Nam Cao
2025-07-03 13:27 ` Thomas Gleixner
2025-06-26 14:48 ` [PATCH 11/16] PCI: xilinx-nwl: " Nam Cao
2025-07-03 13:28 ` Thomas Gleixner
2025-06-26 14:48 ` [PATCH 12/16] PCI: xilinx: " Nam Cao
2025-07-03 13:29 ` Thomas Gleixner
2025-06-26 14:48 ` [PATCH 13/16] PCI: plda: " Nam Cao
2025-07-03 13:30 ` Thomas Gleixner
2025-06-26 14:48 ` [PATCH 14/16] PCI: hv: " Nam Cao
2025-07-03 13:33 ` Thomas Gleixner
2025-07-03 17:41 ` Michael Kelley
2025-07-03 19:59 ` Thomas Gleixner
2025-07-03 20:15 ` Michael Kelley
2025-07-03 21:00 ` Nam Cao
2025-07-03 21:52 ` Thomas Gleixner
2025-07-03 21:21 ` Thomas Gleixner
2025-07-04 2:27 ` Michael Kelley
2025-07-04 4:32 ` Nam Cao
2025-07-04 4:58 ` Michael Kelley
2025-07-05 2:52 ` kernel test robot
2025-07-05 3:51 ` Michael Kelley
2025-07-05 9:46 ` Nam Cao [this message]
2025-07-05 10:02 ` Nam Cao
2025-07-07 19:04 ` Michael Kelley
2025-06-26 14:48 ` [PATCH 15/16] PCI: vmd: Convert to lock guards Nam Cao
2025-07-03 13:34 ` Thomas Gleixner
2025-06-26 14:48 ` [PATCH 16/16] PCI: vmd: Switch to msi_create_parent_irq_domain() Nam Cao
2025-07-03 13:37 ` Thomas Gleixner
2025-07-16 18:10 ` Nirmal Patel
2025-07-16 19:41 ` Bjorn Helgaas
2025-07-16 19:52 ` Antonio Quartulli
2025-07-16 20:12 ` Nam Cao
2025-07-16 20:31 ` Bjorn Helgaas
2025-07-03 17:28 ` [PATCH 00/16] PCI: MSI parent domain conversion Bjorn Helgaas
2025-07-04 4:48 ` Nam Cao
2025-07-07 6:20 ` Manivannan Sadhasivam
2025-07-07 7:43 ` Manivannan Sadhasivam