linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
@ 2024-11-09  5:48 Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

On ARM GIC systems and others, the target address of the MSI is translated
by the IOMMU. For GIC, the MSI address page is called the "ITS" page. When
the IOMMU is disabled, the MSI address is programmed to the physical
location of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled,
the ITS page is behind the IOMMU, so the MSI address is programmed to an
allocated IO virtual address (a.k.a. IOVA), e.g. 0xFFFF0000, which must be
mapped to the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
When a 2-stage translation is enabled, the IOVA is still used to program
the MSI address, though the mapping is now in two stages:
  IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> PA (0x20200000)
(IPA stands for Intermediate Physical Address).

If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
IOVA is dynamically allocated from the top of the IOVA space. If attached
to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.

So far, this IOMMU_RESV_SW_MSI works well, as the kernel is entirely in
charge of the IOMMU translation (1-stage translation) and the IOVA for the
ITS page is fixed and known by the kernel. However, with a virtual machine
enabling a nested IOMMU translation (2-stage), a guest kernel directly
controls the stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS
page (at an IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). The
host kernel then has no way to know that guest-level IOVA when programming
the MSI address.

To solve this problem, the VMM should capture the MSI IOVA allocated by the
guest kernel and relay it to the GIC driver in the host kernel, so that the
correct MSI IOVA can be programmed. This requires a new ioctl via VFIO.

Extend the VFIO path to allow an MSI target IOVA to be forwarded into the
kernel and pushed down to the GIC driver.

Add VFIO ioctl VFIO_IRQ_SET_ACTION_PREPARE with VFIO_IRQ_SET_DATA_MSI_IOVA
to carry the data.

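For reference, a VMM would issue the new action roughly as in the sketch
below (a rough, untested userspace snippet against this RFC; "device_fd",
the vector count and the guest MSI IOVAs are placeholders, and the usual
<linux/vfio.h>/<sys/ioctl.h> includes plus error handling are omitted):

  /* One guest-programmed MSI doorbell IOVA per vector */
  uint64_t iovas[2] = { 0xEEEE0000, 0xEEEE0000 };
  size_t argsz = sizeof(struct vfio_irq_set) + sizeof(iovas);
  struct vfio_irq_set *set = calloc(1, argsz);

  set->argsz = argsz;
  set->flags = VFIO_IRQ_SET_DATA_MSI_IOVA | VFIO_IRQ_SET_ACTION_PREPARE;
  set->index = VFIO_PCI_MSIX_IRQ_INDEX;
  set->start = 0;
  set->count = 2;
  memcpy(set->data, iovas, sizeof(iovas));
  /* Must come before VFIO_IRQ_SET_ACTION_TRIGGER enables the vectors */
  ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
  free(set);
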
The downstream call chain from VFIO down to the ITS driver is quite long.
So, in order to carry the MSI IOVA from the top down to
its_irq_domain_alloc(), the patches are added in a leaf-to-root order:

  vfio_pci_core_ioctl:
    vfio_pci_set_irqs_ioctl:
      vfio_pci_set_msi_prepare:                           // PATCH-7
        pci_alloc_irq_vectors_iovas:                      // PATCH-6
          __pci_alloc_irq_vectors:                        // PATCH-5
            __pci_enable_msi/msix_range:                  // PATCH-4
              msi/msix_capability_init:                   // PATCH-3
                msi/msix_setup_msi_descs:
                  msi_insert_msi_desc();                  // PATCH-1
                pci_msi_setup_msi_irqs:
                  msi_domain_alloc_irqs_all_locked:
                    __msi_domain_alloc_locked:
                      __msi_domain_alloc_irqs:
                        __irq_domain_alloc_irqs:
                          irq_domain_alloc_irqs_locked:
                            irq_domain_alloc_irqs_hierarchy:
                              msi_domain_alloc:
                                irq_domain_alloc_irqs_parent:
                                  its_irq_domain_alloc(); // PATCH-2

Note that this series solves only half of the problem: it allows the
kernel to set the physical PCI MSI/MSI-X on the device with the correct
top-level IOVA of a 2-stage translation, where the guest kernel does the
stage-1 mapping from that MSI IOVA (0xEEEE0000) to its own vITS page
(0x80900000), while the stage-2 mapping from that IPA to the physical ITS
page is still missing:
  0xEEEE0000 ===> 0x80900000 =x=> 0x20200000
A followup series should fill that gap by doing the stage-2 mapping from
the vITS page (0x80900000) to the physical ITS page (0x20200000), likely
via a new IOMMUFD ioctl. Once the VMM sets up this stage-2 mapping, the VM
will act the same as bare metal, relying on its running kernel to handle
the stage-1 mapping:
  0xEEEE0000 ===> 0x80900000 ===> 0x20200000

This series (prototype) is on Github:
https://github.com/nicolinc/iommufd/commits/vfio_msi_giova-rfcv1/
It's tested by hacking the host kernel to hard-code a stage-2 mapping.
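
For context, that test hack is conceptually equivalent to something like
the sketch below (illustrative only; the "s2_domain" pointer, page size
and the exact place to hook this in are placeholders, not part of this
series):

  /* Hard-code the stage-2 mapping: vITS IPA -> physical ITS page */
  iommu_map(s2_domain, 0x80900000, 0x20200000, SZ_64K,
            IOMMU_WRITE | IOMMU_MMIO, GFP_KERNEL);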

Thanks!
Nicolin

Nicolin Chen (7):
  genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell
    address
  irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset
  PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
  PCI/MSI: Allow __pci_enable_msi_range to pass in iova
  PCI/MSI: Extract a common __pci_alloc_irq_vectors function
  PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE

 drivers/pci/msi/msi.h             |   3 +-
 include/linux/msi.h               |  11 +++
 include/linux/pci.h               |  18 ++++
 include/linux/vfio_pci_core.h     |   1 +
 include/uapi/linux/vfio.h         |   8 +-
 drivers/irqchip/irq-gic-v3-its.c  |  21 ++++-
 drivers/pci/msi/api.c             | 136 ++++++++++++++++++++----------
 drivers/pci/msi/msi.c             |  20 +++--
 drivers/vfio/pci/vfio_pci_intrs.c |  41 ++++++++-
 drivers/vfio/vfio_main.c          |   3 +
 kernel/irq/msi.c                  |   6 ++
 11 files changed, 212 insertions(+), 56 deletions(-)

-- 
2.43.0



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Currently, the IOVA and its mapping (to the physical doorbell address) are
handled by dma-iommu via the iommu_cookie in struct msi_desc, which is the
structure used by three parties: genirq, irqchip and dma-iommu.

However, when dealing with a nested translation on ARM64, the MSI doorbell
address is behind the SMMU (the IOMMU on ARM64), thus the HW needs to be
programmed with the stage-1 IOVA, i.e. the guest-level IOVA, rather than
one allocated by dma-iommu.

To support a guest-programmable pathway, first make sure that struct
msi_desc allows a preset IOVA vs. using the iommu_cookie. Add an msi_iova
field to the structure and initialize its value to PHYS_ADDR_MAX. Also
provide a helper for drivers to get the msi_iova out of an msi_desc object.

A following patch will change msi_setup_msi_descs and msix_setup_msi_descs
to call msi_domain_insert_msi_desc and finish the actual value forwarding.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/msi.h | 11 +++++++++++
 kernel/irq/msi.c    |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index b10093c4d00e..873094743065 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -185,6 +185,7 @@ struct msi_desc {
 	struct irq_affinity_desc	*affinity;
 #ifdef CONFIG_IRQ_MSI_IOMMU
 	const void			*iommu_cookie;
+	dma_addr_t			msi_iova;
 #endif
 #ifdef CONFIG_SYSFS
 	struct device_attribute		*sysfs_attrs;
@@ -296,6 +297,11 @@ static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
 {
 	desc->iommu_cookie = iommu_cookie;
 }
+
+static inline dma_addr_t msi_desc_get_iova(struct msi_desc *desc)
+{
+	return desc->msi_iova;
+}
 #else
 static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
 {
@@ -306,6 +312,11 @@ static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
 					     const void *iommu_cookie)
 {
 }
+
+static inline dma_addr_t msi_desc_get_iova(struct msi_desc *desc)
+{
+	return PHYS_ADDR_MAX;
+}
 #endif
 
 int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 3a24d6b5f559..f3159ec0f036 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -81,6 +81,9 @@ static struct msi_desc *msi_alloc_desc(struct device *dev, int nvec,
 
 	desc->dev = dev;
 	desc->nvec_used = nvec;
+#ifdef CONFIG_IRQ_MSI_IOMMU
+	desc->msi_iova = PHYS_ADDR_MAX;
+#endif
 	if (affinity) {
 		desc->affinity = kmemdup_array(affinity, nvec, sizeof(*desc->affinity), GFP_KERNEL);
 		if (!desc->affinity) {
@@ -158,6 +161,9 @@ int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid,
 
 	/* Copy type specific data to the new descriptor. */
 	desc->pci = init_desc->pci;
+#ifdef CONFIG_IRQ_MSI_IOMMU
+	desc->msi_iova = init_desc->msi_iova;
+#endif
 
 	return msi_insert_desc(dev, desc, domid, init_desc->msi_index);
 }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Now struct msi_desc can carry a preset IOVA for the MSI doorbell address,
which is typically preset by user space when engaging a 2-stage
translation. So, use the preset IOVA instead of the kernel-level IOVA
allocated by dma-iommu.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index ab597e74ba08..bc1768576546 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1723,6 +1723,8 @@ static u64 its_irq_get_msi_base(struct its_device *its_dev)
 static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
 {
 	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+	struct msi_desc *desc = irq_data_get_msi_desc(d);
+	dma_addr_t iova = msi_desc_get_iova(desc);
 	struct its_node *its;
 	u64 addr;
 
@@ -1733,7 +1735,13 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
 	msg->address_hi		= upper_32_bits(addr);
 	msg->data		= its_get_event_id(d);
 
-	iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg);
+	/* Bypass iommu_dma_compose_msi_msg if msi_iova is preset */
+	if (iova == PHYS_ADDR_MAX) {
+		iommu_dma_compose_msi_msg(desc, msg);
+	} else {
+		msg->address_lo = lower_32_bits(iova);
+		msg->address_hi = upper_32_bits(iova);
+	}
 }
 
 static int its_irq_set_irqchip_state(struct irq_data *d,
@@ -3570,6 +3578,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 {
 	msi_alloc_info_t *info = args;
 	struct its_device *its_dev = info->scratchpad[0].ptr;
+	dma_addr_t iova = msi_desc_get_iova(info->desc);
 	struct its_node *its = its_dev->its;
 	struct irq_data *irqd;
 	irq_hw_number_t hwirq;
@@ -3580,9 +3589,13 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (err)
 		return err;
 
-	err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
-	if (err)
-		return err;
+	/* Bypass iommu_dma_prepare_msi if msi_iova is preset */
+	if (iova == PHYS_ADDR_MAX) {
+		err = iommu_dma_prepare_msi(info->desc,
+					    its->get_msi_base(its_dev));
+		if (err)
+			return err;
+	}
 
 	for (i = 0; i < nr_irqs; i++) {
 		err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

msi_setup_msi_desc() and msix_setup_msi_descs() are the two callers of the
genirq helper msi_domain_insert_msi_desc(), which is now ready for a preset
msi_iova.

So, do the same in these two callers. Note that MSI-X supports multiple
entries, so use struct msix_entry to pass the per-vector msi_iova in.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/pci.h   |  1 +
 drivers/pci/msi/msi.c | 18 ++++++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 573b4c4c2be6..68ebb9d42f7f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1652,6 +1652,7 @@ int pci_set_vga_state(struct pci_dev *pdev, bool decode,
 struct msix_entry {
 	u32	vector;	/* Kernel uses to write allocated vector */
 	u16	entry;	/* Driver uses to specify entry, OS writes */
+	u64	iova;	/* Kernel uses to override doorbell address */
 };
 
 #ifdef CONFIG_PCI_MSI
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 3a45879d85db..95caa81d3421 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -282,7 +282,7 @@ static void pci_msi_set_enable(struct pci_dev *dev, int enable)
 	pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
 }
 
-static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
+static int msi_setup_msi_desc(struct pci_dev *dev, int nvec, dma_addr_t iova,
 			      struct irq_affinity_desc *masks)
 {
 	struct msi_desc desc;
@@ -312,6 +312,10 @@ static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
 	else
 		desc.pci.mask_pos = dev->msi_cap + PCI_MSI_MASK_32;
 
+#ifdef CONFIG_IRQ_MSI_IOMMU
+	desc.msi_iova = iova;
+#endif
+
 	/* Save the initial mask status */
 	if (desc.pci.msi_attrib.can_mask)
 		pci_read_config_dword(dev, desc.pci.mask_pos, &desc.pci.msi_mask);
@@ -349,7 +353,7 @@ static int msi_verify_entries(struct pci_dev *dev)
  * which could have been allocated.
  */
 static int msi_capability_init(struct pci_dev *dev, int nvec,
-			       struct irq_affinity *affd)
+			       struct irq_affinity *affd, dma_addr_t iova)
 {
 	struct irq_affinity_desc *masks = NULL;
 	struct msi_desc *entry, desc;
@@ -370,7 +374,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
 		masks = irq_create_affinity_masks(nvec, affd);
 
 	msi_lock_descs(&dev->dev);
-	ret = msi_setup_msi_desc(dev, nvec, masks);
+	ret = msi_setup_msi_desc(dev, nvec, iova, masks);
 	if (ret)
 		goto fail;
 
@@ -456,7 +460,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 				return -ENOSPC;
 		}
 
-		rc = msi_capability_init(dev, nvec, affd);
+		rc = msi_capability_init(dev, nvec, affd, PHYS_ADDR_MAX);
 		if (rc == 0)
 			return nvec;
 
@@ -625,6 +629,12 @@ static int msix_setup_msi_descs(struct pci_dev *dev, struct msix_entry *entries,
 	memset(&desc, 0, sizeof(desc));
 
 	for (i = 0, curmsk = masks; i < nvec; i++, curmsk++) {
+#ifdef CONFIG_IRQ_MSI_IOMMU
+		if (entries && entries[i].iova)
+			desc.msi_iova = (dma_addr_t)entries[i].iova;
+		else
+			desc.msi_iova = PHYS_ADDR_MAX;
+#endif
 		desc.msi_index = entries ? entries[i].entry : i;
 		desc.affinity = masks ? curmsk : NULL;
 		desc.pci.msi_attrib.is_virtual = desc.msi_index >= vec_count;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (2 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-11  9:30   ` Andy Shevchenko
  2024-11-09  5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

The previous patch passes the msi_iova in to msi_capability_init(). Allow
its caller, __pci_enable_msi_range(), to pass the iova in as well.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/pci/msi/msi.h | 3 ++-
 drivers/pci/msi/api.c | 6 ++++--
 drivers/pci/msi/msi.c | 4 ++--
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
index ee53cf079f4e..8009d69bf9a5 100644
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -93,7 +93,8 @@ extern int pci_msi_enable;
 void pci_msi_shutdown(struct pci_dev *dev);
 void pci_msix_shutdown(struct pci_dev *dev);
 void pci_free_msi_irqs(struct pci_dev *dev);
-int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, struct irq_affinity *affd);
+int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
+			   struct irq_affinity *affd, dma_addr_t iova);
 int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
 			    int maxvec,  struct irq_affinity *affd, int flags);
 void __pci_restore_msi_state(struct pci_dev *dev);
diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index b956ce591f96..99ade7f69cd4 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -29,7 +29,8 @@
  */
 int pci_enable_msi(struct pci_dev *dev)
 {
-	int rc = __pci_enable_msi_range(dev, 1, 1, NULL);
+	int rc = __pci_enable_msi_range(dev, 1, 1, NULL,
+					PHYS_ADDR_MAX);
 	if (rc < 0)
 		return rc;
 	return 0;
@@ -274,7 +275,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	}
 
 	if (flags & PCI_IRQ_MSI) {
-		nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
+		nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+					       affd, PHYS_ADDR_MAX);
 		if (nvecs > 0)
 			return nvecs;
 	}
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 95caa81d3421..25da0435c674 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -417,7 +417,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
 }
 
 int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
-			   struct irq_affinity *affd)
+			   struct irq_affinity *affd, dma_addr_t iova)
 {
 	int nvec;
 	int rc;
@@ -460,7 +460,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 				return -ENOSPC;
 		}
 
-		rc = msi_capability_init(dev, nvec, affd, PHYS_ADDR_MAX);
+		rc = msi_capability_init(dev, nvec, affd, iova);
 		if (rc == 0)
 			return nvec;
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (3 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-11  9:33   ` Andy Shevchenko
  2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Extract a common function from the existing callers, to prepare for a new
helper that takes an array of msi_iovas. Also, extract the msi_iova(s)
from the array and pass them down properly to __pci_enable_msi/msix_range().

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/pci/msi/api.c | 113 ++++++++++++++++++++++++++----------------
 1 file changed, 70 insertions(+), 43 deletions(-)

diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index 99ade7f69cd4..dff3d7350b38 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -204,6 +204,72 @@ void pci_disable_msix(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pci_disable_msix);
 
+static int __pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
+				   unsigned int max_vecs, unsigned int flags,
+				   struct irq_affinity *affd,
+				   dma_addr_t *msi_iovas)
+{
+	struct irq_affinity msi_default_affd = {0};
+	int nvecs = -ENOSPC;
+
+	if (flags & PCI_IRQ_AFFINITY) {
+		if (!affd)
+			affd = &msi_default_affd;
+	} else {
+		if (WARN_ON(affd))
+			affd = NULL;
+	}
+
+	if (flags & PCI_IRQ_MSIX) {
+		struct msix_entry *entries = NULL;
+
+		if (msi_iovas) {
+			int count = max_vecs - min_vecs + 1;
+			int i;
+
+			entries = kcalloc(max_vecs - min_vecs + 1,
+					  sizeof(*entries), GFP_KERNEL);
+			if (!entries)
+				return -ENOMEM;
+			for (i = 0; i < count; i++) {
+				entries[i].entry = i;
+				entries[i].iova = msi_iovas[i];
+			}
+		}
+
+		nvecs = __pci_enable_msix_range(dev, entries, min_vecs,
+						max_vecs, affd, flags);
+		kfree(entries);
+		if (nvecs > 0)
+			return nvecs;
+	}
+
+	if (flags & PCI_IRQ_MSI) {
+		nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd,
+					       msi_iovas ? *msi_iovas :
+							   PHYS_ADDR_MAX);
+		if (nvecs > 0)
+			return nvecs;
+	}
+
+	/* use INTx IRQ if allowed */
+	if (flags & PCI_IRQ_INTX) {
+		if (min_vecs == 1 && dev->irq) {
+			/*
+			 * Invoke the affinity spreading logic to ensure that
+			 * the device driver can adjust queue configuration
+			 * for the single interrupt case.
+			 */
+			if (affd)
+				irq_create_affinity_masks(1, affd);
+			pci_intx(dev, 1);
+			return 1;
+		}
+	}
+
+	return nvecs;
+}
+
 /**
  * pci_alloc_irq_vectors() - Allocate multiple device interrupt vectors
  * @dev:      the PCI device to operate on
@@ -235,8 +301,8 @@ EXPORT_SYMBOL(pci_disable_msix);
 int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 			  unsigned int max_vecs, unsigned int flags)
 {
-	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
-					      flags, NULL);
+	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
+				       flags, NULL, NULL);
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors);
 
@@ -256,47 +322,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   unsigned int max_vecs, unsigned int flags,
 				   struct irq_affinity *affd)
 {
-	struct irq_affinity msi_default_affd = {0};
-	int nvecs = -ENOSPC;
-
-	if (flags & PCI_IRQ_AFFINITY) {
-		if (!affd)
-			affd = &msi_default_affd;
-	} else {
-		if (WARN_ON(affd))
-			affd = NULL;
-	}
-
-	if (flags & PCI_IRQ_MSIX) {
-		nvecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-						affd, flags);
-		if (nvecs > 0)
-			return nvecs;
-	}
-
-	if (flags & PCI_IRQ_MSI) {
-		nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
-					       affd, PHYS_ADDR_MAX);
-		if (nvecs > 0)
-			return nvecs;
-	}
-
-	/* use INTx IRQ if allowed */
-	if (flags & PCI_IRQ_INTX) {
-		if (min_vecs == 1 && dev->irq) {
-			/*
-			 * Invoke the affinity spreading logic to ensure that
-			 * the device driver can adjust queue configuration
-			 * for the single interrupt case.
-			 */
-			if (affd)
-				irq_create_affinity_masks(1, affd);
-			pci_intx(dev, 1);
-			return 1;
-		}
-	}
-
-	return nvecs;
+	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
+				       flags, affd, NULL);
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (4 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-11  9:34   ` Andy Shevchenko
  2024-11-09  5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen
  2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
  7 siblings, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Now the common __pci_alloc_irq_vectors() accepts an array of msi_iovas,
i.e. a list of preset IOVAs for MSI doorbell addresses.

Add a helper that passes in such a list. A following patch will call it
to forward msi_iovas from user space.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/pci.h   | 17 +++++++++++++++++
 drivers/pci/msi/api.c | 21 +++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 68ebb9d42f7f..6423bee3b207 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1678,6 +1678,9 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   unsigned int max_vecs, unsigned int flags,
 				   struct irq_affinity *affd);
+int pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
+				unsigned int max_vecs, unsigned int flags,
+				dma_addr_t *msi_iovas);
 
 bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
 struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
@@ -1714,6 +1717,13 @@ pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	return -ENOSPC;
 }
 static inline int
+pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
+			    unsigned int max_vecs, unsigned int flags,
+			    dma_addr_t *msi_iovas)
+{
+	return -ENOSPC; /* No support if !CONFIG_PCI_MSI */
+}
+static inline int
 pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		      unsigned int max_vecs, unsigned int flags)
 {
@@ -2068,6 +2078,13 @@ pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	return -ENOSPC;
 }
 static inline int
+pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
+			    unsigned int max_vecs, unsigned int flags,
+			    dma_addr_t *msi_iovas)
+{
+	return -ENOSPC;
+}
+static inline int
 pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		      unsigned int max_vecs, unsigned int flags)
 {
diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index dff3d7350b38..4e90ef8f571c 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -327,6 +327,27 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
 
+/**
+ * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt
+ *                                 vectors with preset msi_iovas
+ * @dev:       the PCI device to operate on
+ * @min_vecs:  minimum required number of vectors (must be >= 1)
+ * @max_vecs:  maximum desired number of vectors
+ * @flags:     allocation flags, as in pci_alloc_irq_vectors()
+ * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs]
+ *
+ * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter.
+ * Check that function docs, and &struct irq_affinity, for more details.
+ */
+int pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
+				unsigned int max_vecs, unsigned int flags,
+				dma_addr_t *msi_iovas)
+{
+	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
+				       flags, NULL, msi_iovas);
+}
+EXPORT_SYMBOL(pci_alloc_irq_vectors_iovas);
+
 /**
  * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
  * @dev: the PCI device to operate on
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (5 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
  7 siblings, 0 replies; 22+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Add a new VFIO_IRQ_SET_ACTION_PREPARE to set VFIO_IRQ_SET_DATA_MSI_IOVA,
giving user space an interface to forward to the kernel the stage-1 IOVA
(of a 2-stage translation: IOVA->IPA->PA) for an MSI doorbell address,
since the hardware needs to be programmed with the top-level IOVA address
in order to work with the IOMMU on ARM64.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/vfio_pci_core.h     |  1 +
 include/uapi/linux/vfio.h         |  8 ++++--
 drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++++++++++++++++++++++++-
 drivers/vfio/vfio_main.c          |  3 +++
 4 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index fbb472dd99b3..08027b8331f0 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -63,6 +63,7 @@ struct vfio_pci_core_device {
 	int			irq_type;
 	int			num_regions;
 	struct vfio_pci_region	*region;
+	dma_addr_t		*msi_iovas;
 	u8			msi_qmax;
 	u8			msix_bar;
 	u16			msix_size;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf190..d6be351abcde 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -590,6 +590,8 @@ struct vfio_irq_set {
 #define VFIO_IRQ_SET_ACTION_MASK	(1 << 3) /* Mask interrupt */
 #define VFIO_IRQ_SET_ACTION_UNMASK	(1 << 4) /* Unmask interrupt */
 #define VFIO_IRQ_SET_ACTION_TRIGGER	(1 << 5) /* Trigger interrupt */
+#define VFIO_IRQ_SET_DATA_MSI_IOVA	(1 << 6) /* Data is MSI IOVA (u64) */
+#define VFIO_IRQ_SET_ACTION_PREPARE	(1 << 7) /* Prepare interrupt */
 	__u32	index;
 	__u32	start;
 	__u32	count;
@@ -599,10 +601,12 @@ struct vfio_irq_set {
 
 #define VFIO_IRQ_SET_DATA_TYPE_MASK	(VFIO_IRQ_SET_DATA_NONE | \
 					 VFIO_IRQ_SET_DATA_BOOL | \
-					 VFIO_IRQ_SET_DATA_EVENTFD)
+					 VFIO_IRQ_SET_DATA_EVENTFD | \
+					 VFIO_IRQ_SET_DATA_MSI_IOVA)
 #define VFIO_IRQ_SET_ACTION_TYPE_MASK	(VFIO_IRQ_SET_ACTION_MASK | \
 					 VFIO_IRQ_SET_ACTION_UNMASK | \
-					 VFIO_IRQ_SET_ACTION_TRIGGER)
+					 VFIO_IRQ_SET_ACTION_TRIGGER | \
+					 VFIO_IRQ_SET_ACTION_PREPARE)
 /**
  * VFIO_DEVICE_RESET - _IO(VFIO_TYPE, VFIO_BASE + 11)
  *
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8382c5834335..18bcdc5b1ef5 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -383,7 +383,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 
 	/* return the number of supported vectors if we can't get all: */
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
-	ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
+	ret = pci_alloc_irq_vectors_iovas(pdev, 1, nvec, flag, vdev->msi_iovas);
 	if (ret < nvec) {
 		if (ret > 0)
 			pci_free_irq_vectors(pdev);
@@ -685,6 +685,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 
 	if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
 		vfio_msi_disable(vdev, msix);
+		/* FIXME we need a better cleanup routine */
+		kfree(vdev->msi_iovas);
+		vdev->msi_iovas = NULL;
 		return 0;
 	}
 
@@ -728,6 +731,39 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
+static int vfio_pci_set_msi_prepare(struct vfio_pci_core_device *vdev,
+				    unsigned index, unsigned start,
+				    unsigned count, uint32_t flags, void *data)
+{
+	uint64_t *iovas = data;
+	unsigned int i;
+
+	if (!(irq_is(vdev, index) || is_irq_none(vdev)))
+		return -EINVAL;
+
+	if (flags & VFIO_IRQ_SET_DATA_NONE) {
+		if (!count)
+			return -EINVAL;
+		/* FIXME support partial unset */
+		kfree(vdev->msi_iovas);
+		vdev->msi_iovas = NULL;
+		return 0;
+	}
+
+	if (!(flags & VFIO_IRQ_SET_DATA_MSI_IOVA))
+		return -EOPNOTSUPP;
+	if (!IS_ENABLED(CONFIG_IRQ_MSI_IOMMU))
+		return -EOPNOTSUPP;
+	if (!vdev->msi_iovas)
+		vdev->msi_iovas = kcalloc(count, sizeof(dma_addr_t), GFP_KERNEL);
+	if (!vdev->msi_iovas)
+		return -ENOMEM;
+	for (i = 0; i < count; i++)
+		vdev->msi_iovas[i] = iovas[i];
+
+	return 0;
+}
+
 static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx,
 					   unsigned int count, uint32_t flags,
 					   void *data)
@@ -837,6 +873,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
 			func = vfio_pci_set_msi_trigger;
 			break;
+		case VFIO_IRQ_SET_ACTION_PREPARE:
+			func = vfio_pci_set_msi_prepare;
+			break;
 		}
 		break;
 	case VFIO_PCI_ERR_IRQ_INDEX:
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a5a62d9d963f..61211c082a64 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1554,6 +1554,9 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
 	case VFIO_IRQ_SET_DATA_EVENTFD:
 		size = sizeof(int32_t);
 		break;
+	case VFIO_IRQ_SET_DATA_MSI_IOVA:
+		size = sizeof(uint64_t);
+		break;
 	default:
 		return -EINVAL;
 	}
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova
  2024-11-09  5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen
@ 2024-11-11  9:30   ` Andy Shevchenko
  0 siblings, 0 replies; 22+ messages in thread
From: Andy Shevchenko @ 2024-11-11  9:30 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian,
	smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Fri, Nov 08, 2024 at 09:48:49PM -0800, Nicolin Chen wrote:
> The previous patch passes in the msi_iova to msi_capability_init, so this

msi_capability_init()

> allows its caller to do the same.

-- 
With Best Regards,
Andy Shevchenko




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function
  2024-11-09  5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen
@ 2024-11-11  9:33   ` Andy Shevchenko
  0 siblings, 0 replies; 22+ messages in thread
From: Andy Shevchenko @ 2024-11-11  9:33 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian,
	smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Fri, Nov 08, 2024 at 09:48:50PM -0800, Nicolin Chen wrote:
> Extract a common function from the existing callers, to prepare for a new
> helper that provides an array of msi_iovas. Also, extract the msi_iova(s)
> from the array and pass in properly down to __pci_enable_msi/msix_range().

...

> +	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
> +				       flags, NULL, NULL);

Even if you are so strict about the 80-character limit, this whole line is
only 83 characters, which is okay to have.

...

> +	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
> +				       flags, affd, NULL);

The same applies to the other lines as well.

-- 
With Best Regards,
Andy Shevchenko




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
@ 2024-11-11  9:34   ` Andy Shevchenko
  2024-11-12 22:14     ` Nicolin Chen
  0 siblings, 1 reply; 22+ messages in thread
From: Andy Shevchenko @ 2024-11-11  9:34 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian,
	smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Fri, Nov 08, 2024 at 09:48:51PM -0800, Nicolin Chen wrote:
> Now, the common __pci_alloc_irq_vectors() accepts an array of msi_iovas,
> which is a list of preset IOVAs for MSI doorbell addresses.
> 
> Add a helper that would pass in a list. A following patch will call this
> to forward msi_iovas from user space.

...

> +/**
> + * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt
> + *                                 vectors with preset msi_iovas
> + * @dev:       the PCI device to operate on
> + * @min_vecs:  minimum required number of vectors (must be >= 1)
> + * @max_vecs:  maximum desired number of vectors
> + * @flags:     allocation flags, as in pci_alloc_irq_vectors()
> + * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs]
> + *
> + * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter.
> + * Check that function docs, and &struct irq_affinity, for more details.
> + */

Always validate your kernel-doc descriptions

	scripts/kernel-doc -Wall -none -v ...

will give you a warning here.

-- 
With Best Regards,
Andy Shevchenko




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (6 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen
@ 2024-11-11 13:09 ` Robin Murphy
  2024-11-11 14:14   ` Marc Zyngier
  2024-11-12 21:54   ` Nicolin Chen
  7 siblings, 2 replies; 22+ messages in thread
From: Robin Murphy @ 2024-11-11 13:09 UTC (permalink / raw)
  To: Nicolin Chen, maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian,
	smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile,
	yebin10, brauner, apatel, shivamurthy.shastri, anna-maria,
	nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel,
	linux-pci, kvm

On 2024-11-09 5:48 am, Nicolin Chen wrote:
> On ARM GIC systems and others, the target address of the MSI is translated
> by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
> IOMMU is disabled, the MSI address is programmed to the physical location
> of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
> page is behind the IOMMU, so the MSI address is programmed to an allocated
> IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
> the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
> When a 2-stage translation is enabled, IOVA will be still used to program
> the MSI address, though the mappings will be in two stages:
>    IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000
> (IPA stands for Intermediate Physical Address).
> 
> If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
> IOVA is dynamically allocated from the top of the IOVA space. If attached
> to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
> fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
> which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
> 
> So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
> of the IOMMU translation (1-stage translation), since the IOVA for the ITS
> page is fixed and known by kernel. However, with virtual machine enabling
> a nested IOMMU translation (2-stage), a guest kernel directly controls the
> stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
> IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
> kernel can't know that guest-level IOVA to program the MSI address.
> 
> To solve this problem the VMM should capture the MSI IOVA allocated by the
> guest kernel and relay it to the GIC driver in the host kernel, to program
> the correct MSI IOVA. And this requires a new ioctl via VFIO.

Once VFIO has that information from userspace, though, do we really need 
the whole complicated dance to push it right down into the irqchip layer 
just so it can be passed back up again? AFAICS 
vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly 
rewrites MSI-X vectors, so it seems like it should be pretty 
straightforward to override the message address in general at that 
level, without the lower layers having to be aware at all, no?

Thanks,
Robin.

> Extend the VFIO path to allow an MSI target IOVA to be forwarded into the
> kernel and pushed down to the GIC driver.
> 
> Add VFIO ioctl VFIO_IRQ_SET_ACTION_PREPARE with VFIO_IRQ_SET_DATA_MSI_IOVA
> to carry the data.
> 
> The downstream calltrace is quite long from the VFIO to the ITS driver. So
> in order to carry the MSI IOVA from the top to its_irq_domain_alloc(), add
> patches in a leaf-to-root order:
> 
>    vfio_pci_core_ioctl:
>      vfio_pci_set_irqs_ioctl:
>        vfio_pci_set_msi_prepare:                           // PATCH-7
>          pci_alloc_irq_vectors_iovas:                      // PATCH-6
>            __pci_alloc_irq_vectors:                        // PATCH-5
>              __pci_enable_msi/msix_range:                  // PATCH-4
>                msi/msix_capability_init:                   // PATCH-3
>                  msi/msix_setup_msi_descs:
>                    msi_insert_msi_desc();                  // PATCH-1
>                  pci_msi_setup_msi_irqs:
>                    msi_domain_alloc_irqs_all_locked:
>                      __msi_domain_alloc_locked:
>                        __msi_domain_alloc_irqs:
>                          __irq_domain_alloc_irqs:
>                            irq_domain_alloc_irqs_locked:
>                              irq_domain_alloc_irqs_hierarchy:
>                                msi_domain_alloc:
>                                  irq_domain_alloc_irqs_parent:
>                                    its_irq_domain_alloc(); // PATCH-2
> 
> Note that this series solves half the problem, since it only allows kernel
> to set the physical PCI MSI/MSI-X on the device with the correct head IOVA
> of a 2-stage translation, where the guest kernel does the stage-1 mapping
> that MSI IOVA (0xEEEE0000) to its own vITS page (0x80900000) while missing
> the stage-2 mapping from that IPA to the physical ITS page:
>    0xEEEE0000 ===> 0x80900000 =x=> 0x20200000
> A followup series should fill that gap, doing the stage-2 mapping from the
> vITS page 0x80900000 to the physical ITS page (0x20200000), likely via new
> IOMMUFD ioctl. Once VMM sets up this stage-2 mapping, VM will act the same
> as bare metal relying on a running kernel to handle the stage-1 mapping:
>    0xEEEE0000 ===> 0x80900000 ===> 0x20200000
> 
> This series (prototype) is on Github:
> https://github.com/nicolinc/iommufd/commits/vfio_msi_giova-rfcv1/
> It's tested by hacking the host kernel to hard-code a stage-2 mapping.
> 
> Thanks!
> Nicolin
> 
> Nicolin Chen (7):
>    genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell
>      address
>    irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset
>    PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
>    PCI/MSI: Allow __pci_enable_msi_range to pass in iova
>    PCI/MSI: Extract a common __pci_alloc_irq_vectors function
>    PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
>    vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE
> 
>   drivers/pci/msi/msi.h             |   3 +-
>   include/linux/msi.h               |  11 +++
>   include/linux/pci.h               |  18 ++++
>   include/linux/vfio_pci_core.h     |   1 +
>   include/uapi/linux/vfio.h         |   8 +-
>   drivers/irqchip/irq-gic-v3-its.c  |  21 ++++-
>   drivers/pci/msi/api.c             | 136 ++++++++++++++++++++----------
>   drivers/pci/msi/msi.c             |  20 +++--
>   drivers/vfio/pci/vfio_pci_intrs.c |  41 ++++++++-
>   drivers/vfio/vfio_main.c          |   3 +
>   kernel/irq/msi.c                  |   6 ++
>   11 files changed, 212 insertions(+), 56 deletions(-)
> 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
@ 2024-11-11 14:14   ` Marc Zyngier
  2024-11-12 22:13     ` Nicolin Chen
  2024-11-12 21:54   ` Nicolin Chen
  1 sibling, 1 reply; 22+ messages in thread
From: Marc Zyngier @ 2024-11-11 14:14 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolin Chen, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Mon, 11 Nov 2024 13:09:20 +0000,
Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > On ARM GIC systems and others, the target address of the MSI is translated
> > by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
> > IOMMU is disabled, the MSI address is programmed to the physical location
> > of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
> > page is behind the IOMMU, so the MSI address is programmed to an allocated
> > IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
> > the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
> > When a 2-stage translation is enabled, IOVA will be still used to program
> > the MSI address, though the mappings will be in two stages:
> >    IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000
> > (IPA stands for Intermediate Physical Address).
> > 
> > If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
> > IOVA is dynamically allocated from the top of the IOVA space. If attached
> > to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
> > fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
> > which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
> > 
> > So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
> > of the IOMMU translation (1-stage translation), since the IOVA for the ITS
> > page is fixed and known by kernel. However, with virtual machine enabling
> > a nested IOMMU translation (2-stage), a guest kernel directly controls the
> > stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
> > IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
> > kernel can't know that guest-level IOVA to program the MSI address.
> > 
> > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > guest kernel and relay it to the GIC driver in the host kernel, to program
> > the correct MSI IOVA. And this requires a new ioctl via VFIO.
> 
> Once VFIO has that information from userspace, though, do we really
> need the whole complicated dance to push it right down into the
> irqchip layer just so it can be passed back up again? AFAICS
> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already
> explicitly rewrites MSI-X vectors, so it seems like it should be
> pretty straightforward to override the message address in general at
> that level, without the lower layers having to be aware at all, no?

+1.

I would like to avoid polluting each and every interrupt controller
with usage-specific knowledge (they usually are brain-damaged enough).
We already have an indirection into the IOMMU subsystem and it
shouldn't be a big deal to intercept the message for all
implementations at this level.

I also wonder how to handle the case of braindead^Wwonderful platforms
where ITS transactions are not translated by the SMMU. Somehow, VFIO
should be made aware of this situation.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
  2024-11-11 14:14   ` Marc Zyngier
@ 2024-11-12 21:54   ` Nicolin Chen
  2024-11-13  1:34     ` Jason Gunthorpe
  1 sibling, 1 reply; 22+ messages in thread
From: Nicolin Chen @ 2024-11-12 21:54 UTC (permalink / raw)
  To: Robin Murphy
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
> On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > guest kernel and relay it to the GIC driver in the host kernel, to program
> > the correct MSI IOVA. And this requires a new ioctl via VFIO.
> 
> Once VFIO has that information from userspace, though, do we really need
> the whole complicated dance to push it right down into the irqchip layer
> just so it can be passed back up again? AFAICS
> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
> rewrites MSI-X vectors, so it seems like it should be pretty
> straightforward to override the message address in general at that
> level, without the lower layers having to be aware at all, no?

I didn't see that clearly! It works with the simple override below:
--------------------------------------------------------------------
@@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
                struct msi_msg msg;

                get_cached_msi_msg(irq, &msg);
+               if (vdev->msi_iovas) {
+                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
+                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
+               }
                pci_write_msi_msg(irq, &msg);
        }
 
--------------------------------------------------------------------

With that, I think we only need one VFIO change for this part :)

Thanks!
Nicolin


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-11 14:14   ` Marc Zyngier
@ 2024-11-12 22:13     ` Nicolin Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Nicolin Chen @ 2024-11-12 22:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Robin Murphy, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Mon, Nov 11, 2024 at 02:14:15PM +0000, Marc Zyngier wrote:
> On Mon, 11 Nov 2024 13:09:20 +0000,
> Robin Murphy <robin.murphy@arm.com> wrote:
> > On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > > guest kernel and relay it to the GIC driver in the host kernel, to program
> > > the correct MSI IOVA. And this requires a new ioctl via VFIO.
> >
> > Once VFIO has that information from userspace, though, do we really
> > need the whole complicated dance to push it right down into the
> > irqchip layer just so it can be passed back up again? AFAICS
> > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already
> > explicitly rewrites MSI-X vectors, so it seems like it should be
> > pretty straightforward to override the message address in general at
> > that level, without the lower layers having to be aware at all, no?
> 
> +1.
> 
> I would like to avoid polluting each and every interrupt controller
> with usage-specific knowledge (they usually are brain-damaged enough).
> We already have an indirection into the IOMMU subsystem and it
> shouldn't be a big deal to intercept the message for all
> implementations at this level.
> 
> I also wonder how to handle the case of braindead^Wwonderful platforms
> where ITS transactions are not translated by the SMMU. Somehow, VFIO
> should be made aware of this situation.

Perhaps we should do iommu_get_domain_for_dev(&vdev->pdev->dev) and
check the returned domain->type:
 * if (domain->type & __IOMMU_DOMAIN_PAGING): 1-stage translation
 * if (domain->type == IOMMU_DOMAIN_NESTED): 2-stage translation

And for this particular topic/series, we should do something like:
	if (vdev->msi_iovas && domain->type == IOMMU_DOMAIN_NESTED) {
		msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
		msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
	}
?
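
(Rough sketch of that, just to make it concrete -- vfio_pci_override_msi_addr()
is a made-up helper name, and vdev->msi_iovas is the per-vector array this
series proposes:)

/* Sketch only: override the composed address for nested (2-stage) setups */
static void vfio_pci_override_msi_addr(struct vfio_pci_core_device *vdev,
                                       unsigned int vector,
                                       struct msi_msg *msg)
{
        struct iommu_domain *domain =
                iommu_get_domain_for_dev(&vdev->pdev->dev);

        /* A kernel-owned (1-stage) translation already has the right IOVA */
        if (!domain || domain->type != IOMMU_DOMAIN_NESTED)
                return;
        if (!vdev->msi_iovas)
                return;

        msg->address_lo = lower_32_bits(vdev->msi_iovas[vector]);
        msg->address_hi = upper_32_bits(vdev->msi_iovas[vector]);
}

vfio_msi_set_vector_signal() would then call this right before
pci_write_msi_msg() instead of open-coding the override.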

Thanks
Nicolin



* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  2024-11-11  9:34   ` Andy Shevchenko
@ 2024-11-12 22:14     ` Nicolin Chen
  0 siblings, 0 replies; 22+ messages in thread
From: Nicolin Chen @ 2024-11-12 22:14 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian,
	smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Mon, Nov 11, 2024 at 11:34:47AM +0200, Andy Shevchenko wrote:
> On Fri, Nov 08, 2024 at 09:48:51PM -0800, Nicolin Chen wrote:
> > Now, the common __pci_alloc_irq_vectors() accepts an array of msi_iovas,
> > which is a list of preset IOVAs for MSI doorbell addresses.
> >
> > Add a helper that would pass in a list. A following patch will call this
> > to forward msi_iovas from user space.
> 
> ...
> 
> > +/**
> > + * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt
> > + *                                 vectors with preset msi_iovas
> > + * @dev:       the PCI device to operate on
> > + * @min_vecs:  minimum required number of vectors (must be >= 1)
> > + * @max_vecs:  maximum desired number of vectors
> > + * @flags:     allocation flags, as in pci_alloc_irq_vectors()
> > + * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs]
> > + *
> > + * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter.
> > + * Check that function docs, and &struct irq_affinity, for more details.
> > + */
> 
> Always validate your kernel-doc descriptions
> 
>         scripts/kernel-doc -Wall -none -v ...
> 
> will give you a warning here.

Will do in next round.

Thanks!
Nicolin



* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-12 21:54   ` Nicolin Chen
@ 2024-11-13  1:34     ` Jason Gunthorpe
  2024-11-13 21:11       ` Alex Williamson
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2024-11-13  1:34 UTC (permalink / raw)
  To: Nicolin Chen, tglx, alex.williamson
  Cc: Robin Murphy, maz, bhelgaas, leonro, shameerali.kolothum.thodi,
	dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
> > On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > > guest kernel and relay it to the GIC driver in the host kernel, to program
> > > the correct MSI IOVA. And this requires a new ioctl via VFIO.
> > 
> > Once VFIO has that information from userspace, though, do we really need
> > the whole complicated dance to push it right down into the irqchip layer
> > just so it can be passed back up again? AFAICS
> > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
> > rewrites MSI-X vectors, so it seems like it should be pretty
> > straightforward to override the message address in general at that
> > level, without the lower layers having to be aware at all, no?
> 
> Didn't see that clearly!! It works with a simple following override:
> --------------------------------------------------------------------
> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>                 struct msi_msg msg;
> 
>                 get_cached_msi_msg(irq, &msg);
> +               if (vdev->msi_iovas) {
> +                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
> +                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
> +               }
>                 pci_write_msi_msg(irq, &msg);
>         }
>  
> --------------------------------------------------------------------
> 
> With that, I think we only need one VFIO change for this part :)

Wow, is that really OK from a layering perspective? The comment is
pretty clear on the intention that this is to resync the irq layer
view of the device with the physical HW.

Editing the msi_msg while doing that resync smells bad.

Also, this is only doing MSI-X, we should include normal MSI as
well. (it probably should have a resync too?)

I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
context)

I think there are many options here; we just need to get a clearer
understanding of what best fits the architecture of the interrupt
subsystem.

Jason



* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-13  1:34     ` Jason Gunthorpe
@ 2024-11-13 21:11       ` Alex Williamson
  2024-11-14 15:35         ` Robin Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2024-11-13 21:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, tglx, Robin Murphy, maz, bhelgaas, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Tue, 12 Nov 2024 21:34:30 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
> > On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:  
> > > On 2024-11-09 5:48 am, Nicolin Chen wrote:  
> > > > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > > > guest kernel and relay it to the GIC driver in the host kernel, to program
> > > > the correct MSI IOVA. And this requires a new ioctl via VFIO.  
> > > 
> > > Once VFIO has that information from userspace, though, do we really need
> > > the whole complicated dance to push it right down into the irqchip layer
> > > just so it can be passed back up again? AFAICS
> > > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
> > > rewrites MSI-X vectors, so it seems like it should be pretty
> > > straightforward to override the message address in general at that
> > > level, without the lower layers having to be aware at all, no?  
> > 
> > Didn't see that clearly!! It works with a simple following override:
> > --------------------------------------------------------------------
> > @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >                 struct msi_msg msg;
> > 
> >                 get_cached_msi_msg(irq, &msg);
> > +               if (vdev->msi_iovas) {
> > +                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
> > +                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
> > +               }
> >                 pci_write_msi_msg(irq, &msg);
> >         }
> >  
> > --------------------------------------------------------------------
> > 
> > With that, I think we only need one VFIO change for this part :)  
> 
> Wow, is that really OK from a layering perspective? The comment is
> pretty clear on the intention that this is to resync the irq layer
> view of the device with the physical HW.
> 
> Editing the msi_msg while doing that resync smells bad.
> 
> Also, this is only doing MSI-X, we should include normal MSI as
> well. (it probably should have a resync too?)

This was added for a specific IBM HBA that clears the vector table
during a built-in self test, so it's possible the MSI table being in
config space never had the same issue, or we just haven't encountered
it.  I don't expect anything else actually requires this.

> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
> context)

It seems suspect to me too.  In a sense it is still just synchronizing
the MSI address, but to a different address space.

Is it possible to do this with the existing write_msi_msg callback on
the msi descriptor?  For instance we could simply translate the msg
address and call pci_write_msi_msg() (while avoiding an infinite
recursion).  Or maybe there should be an xlate_msi_msg callback we can
register.  Or I suppose there might be a way to insert an irqchip that
does the translation on write.  Thanks,

Alex
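
P.S. A very rough sketch of what I mean by the write_msi_msg idea, where
everything on the VFIO side (vdev->msi_iovas, the busy flag) is hypothetical.
IIRC the existing hook fires at the end of __pci_write_msi_msg(), after the
hardware write, so the callback would have to re-write the entry with the
translated address and guard against recursing into itself:

static void vfio_pci_msi_xlate(struct msi_desc *desc, void *data)
{
        struct vfio_pci_core_device *vdev = data;
        struct msi_msg msg = desc->msg;         /* message just written/cached */

        if (vdev->msi_xlate_busy || !vdev->msi_iovas)
                return;

        msg.address_lo = lower_32_bits(vdev->msi_iovas[desc->msi_index]);
        msg.address_hi = upper_32_bits(vdev->msi_iovas[desc->msi_index]);

        vdev->msi_xlate_busy = true;
        pci_write_msi_msg(desc->irq, &msg);     /* re-write with the IOVA */
        vdev->msi_xlate_busy = false;
}

It would be registered per descriptor through desc->write_msi_msg and
desc->write_msi_msg_data, e.g. from vfio_msi_set_vector_signal().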




* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-13 21:11       ` Alex Williamson
@ 2024-11-14 15:35         ` Robin Murphy
  2024-11-20 13:17           ` Eric Auger
  0 siblings, 1 reply; 22+ messages in thread
From: Robin Murphy @ 2024-11-14 15:35 UTC (permalink / raw)
  To: Alex Williamson, Jason Gunthorpe
  Cc: Nicolin Chen, tglx, maz, bhelgaas, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On 13/11/2024 9:11 pm, Alex Williamson wrote:
> On Tue, 12 Nov 2024 21:34:30 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
>> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
>>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
>>>> On 2024-11-09 5:48 am, Nicolin Chen wrote:
>>>>> To solve this problem the VMM should capture the MSI IOVA allocated by the
>>>>> guest kernel and relay it to the GIC driver in the host kernel, to program
>>>>> the correct MSI IOVA. And this requires a new ioctl via VFIO.
>>>>
>>>> Once VFIO has that information from userspace, though, do we really need
>>>> the whole complicated dance to push it right down into the irqchip layer
>>>> just so it can be passed back up again? AFAICS
>>>> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
>>>> rewrites MSI-X vectors, so it seems like it should be pretty
>>>> straightforward to override the message address in general at that
>>>> level, without the lower layers having to be aware at all, no?
>>>
>>> Didn't see that clearly!! It works with a simple following override:
>>> --------------------------------------------------------------------
>>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>                  struct msi_msg msg;
>>>
>>>                  get_cached_msi_msg(irq, &msg);
>>> +               if (vdev->msi_iovas) {
>>> +                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
>>> +                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
>>> +               }
>>>                  pci_write_msi_msg(irq, &msg);
>>>          }
>>>   
>>> --------------------------------------------------------------------
>>>
>>> With that, I think we only need one VFIO change for this part :)
>>
>> Wow, is that really OK from a layering perspective? The comment is
>> pretty clear on the intention that this is to resync the irq layer
>> view of the device with the physical HW.
>>
>> Editing the msi_msg while doing that resync smells bad.
>>
>> Also, this is only doing MSI-X, we should include normal MSI as
>> well. (it probably should have a resync too?)
> 
> This was added for a specific IBM HBA that clears the vector table
> during a built-in self test, so it's possible the MSI table being in
> config space never had the same issue, or we just haven't encountered
> it.  I don't expect anything else actually requires this.

Yeah, I wasn't really suggesting to literally hook into this exact case; 
it was more just a general observation that if VFIO already has one 
justification for tinkering with pci_write_msi_msg() directly without 
going through the msi_domain layer, then adding another (wherever it 
fits best) can't be *entirely* unreasonable.

At the end of the day, the semantic here is that VFIO does know more 
than the IRQ layer, and does need to program the endpoint differently 
from what the irqchip assumes, so I don't see much benefit in dressing 
that up more than functionally necessary.

>> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
>> context)
> 
> It seems suspect to me too.  In a sense it is still just synchronizing
> the MSI address, but to a different address space.
> 
> Is it possible to do this with the existing write_msi_msg callback on
> the msi descriptor?  For instance we could simply translate the msg
> address and call pci_write_msi_msg() (while avoiding an infinite
> recursion).  Or maybe there should be an xlate_msi_msg callback we can
> register.  Or I suppose there might be a way to insert an irqchip that
> does the translation on write.  Thanks,

I'm far from keen on the idea, but if there really is an appetite for 
more indirection, then I guess the least-worst option would be yet 
another type of iommu_dma_cookie to work via the existing 
iommu_dma_compose_msi_msg() flow, with some interface for VFIO to update 
per-device addresses directly. But then it's still going to need some 
kind of "layering violation" for VFIO to poke the IRQ layer into 
re-composing and re-writing a message whenever userspace feels like 
changing an address, because we're fundamentally stepping outside the 
established lifecycle of a kernel-managed IRQ around which said layering 
was designed...

Thanks,
Robin.



* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-14 15:35         ` Robin Murphy
@ 2024-11-20 13:17           ` Eric Auger
  2024-11-20 14:03             ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Auger @ 2024-11-20 13:17 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson, Jason Gunthorpe
  Cc: Nicolin Chen, tglx, maz, bhelgaas, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm



On 11/14/24 16:35, Robin Murphy wrote:
> On 13/11/2024 9:11 pm, Alex Williamson wrote:
>> On Tue, 12 Nov 2024 21:34:30 -0400
>> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>>> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
>>>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
>>>>> On 2024-11-09 5:48 am, Nicolin Chen wrote:
>>>>>> To solve this problem the VMM should capture the MSI IOVA
>>>>>> allocated by the
>>>>>> guest kernel and relay it to the GIC driver in the host kernel,
>>>>>> to program
>>>>>> the correct MSI IOVA. And this requires a new ioctl via VFIO.
>>>>>
>>>>> Once VFIO has that information from userspace, though, do we
>>>>> really need
>>>>> the whole complicated dance to push it right down into the irqchip
>>>>> layer
>>>>> just so it can be passed back up again? AFAICS
>>>>> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already
>>>>> explicitly
>>>>> rewrites MSI-X vectors, so it seems like it should be pretty
>>>>> straightforward to override the message address in general at that
>>>>> level, without the lower layers having to be aware at all, no?
>>>>
>>>> Didn't see that clearly!! It works with a simple following override:
>>>> --------------------------------------------------------------------
>>>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct
>>>> vfio_pci_core_device *vdev,
>>>>                  struct msi_msg msg;
>>>>
>>>>                  get_cached_msi_msg(irq, &msg);
>>>> +               if (vdev->msi_iovas) {
>>>> +                       msg.address_lo =
>>>> lower_32_bits(vdev->msi_iovas[vector]);
>>>> +                       msg.address_hi =
>>>> upper_32_bits(vdev->msi_iovas[vector]);
>>>> +               }
>>>>                  pci_write_msi_msg(irq, &msg);
>>>>          }
>>>>   --------------------------------------------------------------------
>>>>
>>>> With that, I think we only need one VFIO change for this part :)
>>>
>>> Wow, is that really OK from a layering perspective? The comment is
>>> pretty clear on the intention that this is to resync the irq layer
>>> view of the device with the physical HW.
>>>
>>> Editing the msi_msg while doing that resync smells bad.
>>>
>>> Also, this is only doing MSI-X, we should include normal MSI as
>>> well. (it probably should have a resync too?)
>>
>> This was added for a specific IBM HBA that clears the vector table
>> during a built-in self test, so it's possible the MSI table being in
>> config space never had the same issue, or we just haven't encountered
>> it.  I don't expect anything else actually requires this.
>
> Yeah, I wasn't really suggesting to literally hook into this exact
> case; it was more just a general observation that if VFIO already has
> one justification for tinkering with pci_write_msi_msg() directly
> without going through the msi_domain layer, then adding another
> (wherever it fits best) can't be *entirely* unreasonable.
>
> At the end of the day, the semantic here is that VFIO does know more
> than the IRQ layer, and does need to program the endpoint differently
> from what the irqchip assumes, so I don't see much benefit in dressing
> that up more than functionally necessary.
>
>>> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
>>> context)
>>
>> It seems suspect to me too.  In a sense it is still just synchronizing
>> the MSI address, but to a different address space.
>>
>> Is it possible to do this with the existing write_msi_msg callback on
>> the msi descriptor?  For instance we could simply translate the msg
>> address and call pci_write_msi_msg() (while avoiding an infinite
>> recursion).  Or maybe there should be an xlate_msi_msg callback we can
>> register.  Or I suppose there might be a way to insert an irqchip that
>> does the translation on write.  Thanks,
>
> I'm far from keen on the idea, but if there really is an appetite for
> more indirection, then I guess the least-worst option would be yet
> another type of iommu_dma_cookie to work via the existing
> iommu_dma_compose_msi_msg() flow, with some interface for VFIO to
> update per-device addresses directly. But then it's still going to
> need some kind of "layering violation" for VFIO to poke the IRQ layer
> into re-composing and re-writing a message whenever userspace feels
> like changing an address, because we're fundamentally stepping outside
> the established lifecycle of a kernel-managed IRQ around which said
> layering was designed...

For the record, the first integration was based on such a distinct
iommu_dma_cookie:

[PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part) <https://lore.kernel.org/all/20210411111228.14386-1-eric.auger@redhat.com/#r>, patches 8 - 11

Thanks

Eric



>
> Thanks,
> Robin.
>




* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-20 13:17           ` Eric Auger
@ 2024-11-20 14:03             ` Jason Gunthorpe
  2024-11-28 11:15               ` Thomas Gleixner
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2024-11-20 14:03 UTC (permalink / raw)
  To: Eric Auger
  Cc: Robin Murphy, Alex Williamson, Nicolin Chen, tglx, maz, bhelgaas,
	leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Wed, Nov 20, 2024 at 02:17:46PM +0100, Eric Auger wrote:
> > Yeah, I wasn't really suggesting to literally hook into this exact
> > case; it was more just a general observation that if VFIO already has
> > one justification for tinkering with pci_write_msi_msg() directly
> > without going through the msi_domain layer, then adding another
> > (wherever it fits best) can't be *entirely* unreasonable.

I'm not sure that we can assume VFIO is the only thing touching the
interrupt programming.

I think there is a KVM path, and also the /proc/ path that will change
the MSI affinity on the fly for a VFIO-created IRQ. If the platform
requires an MSI update to do this (i.e. encoding affinity in the
addr/data, not using IRQ remapping HW), then we still need to ensure the
correct MSI address is hooked in.

> >> Is it possible to do this with the existing write_msi_msg callback on
> >> the msi descriptor?  For instance we could simply translate the msg
> >> address and call pci_write_msi_msg() (while avoiding an infinite
> >> recursion).  Or maybe there should be an xlate_msi_msg callback we can
> >> register.  Or I suppose there might be a way to insert an irqchip that
> >> does the translation on write.  Thanks,
> >
> > I'm far from keen on the idea, but if there really is an appetite for
> > more indirection, then I guess the least-worst option would be yet
> > another type of iommu_dma_cookie to work via the existing
> > iommu_dma_compose_msi_msg() flow, 

For this direction I think I would turn iommu_dma_compose_msi_msg()
into a function pointer stored in the iommu_domain and have
vfio/iommufd provide its own implementation. The thing that is in
control of the domain's translation should be providing the msi_msg.
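
Roughly (sketch only -- none of these names exist today, and
iommufd_sw_msi_get_iova() is the per-device/per-MSI-index lookup sketched
further down):

/* iommu_domain would grow an op that the domain owner implements */
struct iommu_domain {
        ...
        void (*compose_msi_msg)(struct iommu_domain *domain,
                                struct msi_desc *desc,
                                struct msi_msg *msg);
};

static void iommufd_compose_msi_msg(struct iommu_domain *domain,
                                    struct msi_desc *desc,
                                    struct msi_msg *msg)
{
        u64 iova;

        /* Lookup keyed by (desc->dev, desc->msi_index); absence is an error */
        if (iommufd_sw_msi_get_iova(domain, desc, &iova))
                return;

        /* A real version would preserve the in-page doorbell offset */
        msg->address_hi = upper_32_bits(iova);
        msg->address_lo = lower_32_bits(iova);
}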

> > update per-device addresses directly. But then it's still going to
> > need some kind of "layering violation" for VFIO to poke the IRQ layer
> > into re-composing and re-writing a message whenever userspace feels
> > like changing an address

I think we'd need to get into the affinity update path and force an MSI
write as well, even if the platform isn't changing the MSI for
affinity. Processing a vMSI entry update would be two steps where we
update the MSI addr in VFIO and then set the affinity.

> For the record, the first integration was based on such a distinct
> iommu_dma_cookie:
> 
> [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part) <https://lore.kernel.org/all/20210411111228.14386-1-eric.auger@redhat.com/#r>, patches 8 - 11

There are some significant differences from that series with this idea:

 - We want to maintain a per-MSI-index/per-device lookup table. It
   is not just a simple cookie; the msi_desc->dev &&
   msi_desc->msi_index have to be matched against what userspace
   provides in the per-vMSI IOCTL (a rough sketch of such a table
   follows after this list)

 - There would be no implicit programming of the stage 2; this will be
   done directly in userspace by creating an IOAS area for the ITS page

 - It shouldn't have any sort of dynamic allocation behavior. It is an
   error for the kernel to ask for an msi_desc that userspace hasn't
   provided a mapping for
 
 - It should work well with nested and non-nested domains
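
Something like this, perhaps (sketch only, every name below is made up;
this would be the iommufd_sw_msi_get_iova() assumed in the sketch above):

struct iommufd_sw_msi_entry {
        struct device           *dev;
        u16                     msi_index;
        u64                     iova;           /* userspace-provided MSI IOVA */
        struct list_head        node;
};

static int iommufd_sw_msi_get_iova(struct iommu_domain *domain,
                                   struct msi_desc *desc, u64 *iova)
{
        /* The list would hang off the iommufd-owned domain object */
        struct list_head *head = iommufd_domain_sw_msi_list(domain);
        struct iommufd_sw_msi_entry *e;

        list_for_each_entry(e, head, node) {
                if (e->dev == desc->dev && e->msi_index == desc->msi_index) {
                        *iova = e->iova;
                        return 0;
                }
        }
        /* No userspace mapping for this msi_desc is an error, not a fallback */
        return -ENOENT;
}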

Jason



* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-20 14:03             ` Jason Gunthorpe
@ 2024-11-28 11:15               ` Thomas Gleixner
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Gleixner @ 2024-11-28 11:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Eric Auger
  Cc: Robin Murphy, Alex Williamson, Nicolin Chen, maz, bhelgaas,
	leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Wed, Nov 20 2024 at 10:03, Jason Gunthorpe wrote:
> On Wed, Nov 20, 2024 at 02:17:46PM +0100, Eric Auger wrote:
>> > Yeah, I wasn't really suggesting to literally hook into this exact
>> > case; it was more just a general observation that if VFIO already has
>> > one justification for tinkering with pci_write_msi_msg() directly
>> > without going through the msi_domain layer, then adding another
>> > (wherever it fits best) can't be *entirely* unreasonable.
>
> I'm not sure that we can assume VFIO is the only thing touching the
> interrupt programming.

Correct.

> I think there is a KVM path, and also the /proc/ path that will change
> the MSI affinity on the fly for a VFIO-created IRQ. If the platform
> requires an MSI update to do this (i.e. encoding affinity in the
> addr/data, not using IRQ remapping HW), then we still need to ensure the
> correct MSI address is hooked in.

Yes.

>> >> Is it possible to do this with the existing write_msi_msg callback on
>> >> the msi descriptor?  For instance we could simply translate the msg
>> >> address and call pci_write_msi_msg() (while avoiding an infinite
>> >> recursion).  Or maybe there should be an xlate_msi_msg callback we can
>> >> register.  Or I suppose there might be a way to insert an irqchip that
>> >> does the translation on write.  Thanks,
>> >
>> > I'm far from keen on the idea, but if there really is an appetite for
>> > more indirection, then I guess the least-worst option would be yet
>> > another type of iommu_dma_cookie to work via the existing
>> > iommu_dma_compose_msi_msg() flow, 
>
> For this direction I think I would turn iommu_dma_compose_msi_msg()
> into a function pointer stored in the iommu_domain and have
> vfio/iommufd provide its own implementation. The thing that is in
> control of the domain's translation should be providing the msi_msg.

Yes. The resulting cached message should be writeable as is.

>> > update per-device addresses direcitly. But then it's still going to
>> > need some kind of "layering violation" for VFIO to poke the IRQ layer
>> > into re-composing and re-writing a message whenever userspace feels
>> > like changing an address
>
> I think we'd need to get into the affinity update path and force an MSI
> write as well, even if the platform isn't changing the MSI for
> affinity. Processing a vMSI entry update would be two steps where we
> update the MSI addr in VFIO and then set the affinity.

The affinity callback of the domain/chip can return IRQ_SET_MASK_OK_DONE
which prevents recomposing and writing the message.

So you want an explicit update/write of the message, similar to what
msi_domain_activate() does.
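
(Rough sketch of that explicit update, mirroring msi_domain_activate(); in a
real version this would be a helper provided by the MSI core rather than
open-coded on the VFIO side, and the function name here is made up:)

static void vfio_pci_msi_rewrite(unsigned int irq)
{
        struct irq_data *irqd = irq_get_irq_data(irq);
        struct msi_msg msg = { };

        /* Recompose through the irq domain, then push the result to the device */
        if (!irqd || irq_chip_compose_msi_msg(irqd, &msg))
                return;

        pci_write_msi_msg(irq, &msg);
}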

Thanks,

        tglx




Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen
2024-11-11  9:30   ` Andy Shevchenko
2024-11-09  5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen
2024-11-11  9:33   ` Andy Shevchenko
2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
2024-11-11  9:34   ` Andy Shevchenko
2024-11-12 22:14     ` Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen
2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
2024-11-11 14:14   ` Marc Zyngier
2024-11-12 22:13     ` Nicolin Chen
2024-11-12 21:54   ` Nicolin Chen
2024-11-13  1:34     ` Jason Gunthorpe
2024-11-13 21:11       ` Alex Williamson
2024-11-14 15:35         ` Robin Murphy
2024-11-20 13:17           ` Eric Auger
2024-11-20 14:03             ` Jason Gunthorpe
2024-11-28 11:15               ` Thomas Gleixner
