[PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
@ 2024-11-09  5:48 Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

On ARM GIC systems and others, the target address of the MSI is translated
by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
IOMMU is disabled, the MSI address is programmed to the physical location
of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
page is behind the IOMMU, so the MSI address is programmed to an allocated
IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
When a 2-stage translation is enabled, IOVA will be still used to program
the MSI address, though the mappings will be in two stages:
  IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000
(IPA stands for Intermediate Physical Address).

If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
IOVA is dynamically allocated from the top of the IOVA space. If attached
to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.

So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
of the IOMMU translation (1-stage translation), since the IOVA for the ITS
page is fixed and known by kernel. However, with virtual machine enabling
a nested IOMMU translation (2-stage), a guest kernel directly controls the
stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
kernel can't know that guest-level IOVA to program the MSI address.

To solve this problem the VMM should capture the MSI IOVA allocated by the
guest kernel and relay it to the GIC driver in the host kernel, to program
the correct MSI IOVA. And this requires a new ioctl via VFIO.

Extend the VFIO path to allow an MSI target IOVA to be forwarded into the
kernel and pushed down to the GIC driver.

Add VFIO ioctl VFIO_IRQ_SET_ACTION_PREPARE with VFIO_IRQ_SET_DATA_MSI_IOVA
to carry the data.

The downstream calltrace is quite long from the VFIO to the ITS driver. So
in order to carry the MSI IOVA from the top to its_irq_domain_alloc(), add
patches in a leaf-to-root order:

  vfio_pci_core_ioctl:
    vfio_pci_set_irqs_ioctl:
      vfio_pci_set_msi_prepare:                           // PATCH-7
        pci_alloc_irq_vectors_iovas:                      // PATCH-6
          __pci_alloc_irq_vectors:                        // PATCH-5
            __pci_enable_msi/msix_range:                  // PATCH-4
              msi/msix_capability_init:                   // PATCH-3
                msi/msix_setup_msi_descs:
                  msi_insert_msi_desc();                  // PATCH-1
                pci_msi_setup_msi_irqs:
                  msi_domain_alloc_irqs_all_locked:
                    __msi_domain_alloc_locked:
                      __msi_domain_alloc_irqs:
                        __irq_domain_alloc_irqs:
                          irq_domain_alloc_irqs_locked:
                            irq_domain_alloc_irqs_hierarchy:
                              msi_domain_alloc:
                                irq_domain_alloc_irqs_parent:
                                  its_irq_domain_alloc(); // PATCH-2

Note that this series solves half the problem, since it only allows kernel
to set the physical PCI MSI/MSI-X on the device with the correct head IOVA
of a 2-stage translation, where the guest kernel does the stage-1 mapping
that MSI IOVA (0xEEEE0000) to its own vITS page (0x80900000) while missing
the stage-2 mapping from that IPA to the physical ITS page:
  0xEEEE0000 ===> 0x80900000 =x=> 0x20200000
A followup series should fill that gap, doing the stage-2 mapping from the
vITS page 0x80900000 to the physical ITS page (0x20200000), likely via new
IOMMUFD ioctl. Once VMM sets up this stage-2 mapping, VM will act the same
as bare metal relying on a running kernel to handle the stage-1 mapping:
  0xEEEE0000 ===> 0x80900000 ===> 0x20200000

This series (prototype) is on Github:
https://github.com/nicolinc/iommufd/commits/vfio_msi_giova-rfcv1/
It's tested by hacking the host kernel to hard-code a stage-2 mapping.

Thanks!
Nicolin

Nicolin Chen (7):
  genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell
    address
  irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset
  PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
  PCI/MSI: Allow __pci_enable_msi_range to pass in iova
  PCI/MSI: Extract a common __pci_alloc_irq_vectors function
  PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE

 drivers/pci/msi/msi.h             |   3 +-
 include/linux/msi.h               |  11 +++
 include/linux/pci.h               |  18 ++++
 include/linux/vfio_pci_core.h     |   1 +
 include/uapi/linux/vfio.h         |   8 +-
 drivers/irqchip/irq-gic-v3-its.c  |  21 ++++-
 drivers/pci/msi/api.c             | 136 ++++++++++++++++++++----------
 drivers/pci/msi/msi.c             |  20 +++--
 drivers/vfio/pci/vfio_pci_intrs.c |  41 ++++++++-
 drivers/vfio/vfio_main.c          |   3 +
 kernel/irq/msi.c                  |   6 ++
 11 files changed, 212 insertions(+), 56 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Currently, the IOVA and its mapping (to physical doorbell address) is done
by the dma-iommu via the iommu_cookie in the struct msi_desc, which is the
structure used by three parties: genirq, irqchip and dma-iommu.

However, when dealing with a nested translation on ARM64, the MSI doorbell
address is behind the SMMU (IOMMU on ARM64), thus HW needs to be programed
with the stage-1 IOVA, i.e. guest-level IOVA, rather than asking dma-iommu
to allocate one.

To support a guest-programmable pathway, first we need to make sure struct
msi_desc will allow a preset IOVA v.s. using iommu_cookie. Add an msi_iova
to the structure and init its value to PHYS_ADDR_MAX. And provide a helper
for drivers to get the msi_iova out of an msi_desc object.

A following patch will change msi_setup_msi_descs and msix_setup_msi_descs
to call msi_domain_insert_msi_desc and finish the actual value forwarding.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/msi.h | 11 +++++++++++
 kernel/irq/msi.c    |  6 ++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/msi.h b/include/linux/msi.h
index b10093c4d00e..873094743065 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -185,6 +185,7 @@ struct msi_desc {
 	struct irq_affinity_desc	*affinity;
 #ifdef CONFIG_IRQ_MSI_IOMMU
 	const void			*iommu_cookie;
+	dma_addr_t			msi_iova;
 #endif
 #ifdef CONFIG_SYSFS
 	struct device_attribute		*sysfs_attrs;
@@ -296,6 +297,11 @@ static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
 {
 	desc->iommu_cookie = iommu_cookie;
 }
+
+static inline dma_addr_t msi_desc_get_iova(struct msi_desc *desc)
+{
+	return desc->msi_iova;
+}
 #else
 static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc)
 {
@@ -306,6 +312,11 @@ static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc,
 					     const void *iommu_cookie)
 {
 }
+
+static inline dma_addr_t msi_desc_get_iova(struct msi_desc *desc)
+{
+	return PHYS_ADDR_MAX;
+}
 #endif
 
 int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid,
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 3a24d6b5f559..f3159ec0f036 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -81,6 +81,9 @@ static struct msi_desc *msi_alloc_desc(struct device *dev, int nvec,
 
 	desc->dev = dev;
 	desc->nvec_used = nvec;
+#ifdef CONFIG_IRQ_MSI_IOMMU
+	desc->msi_iova = PHYS_ADDR_MAX;
+#endif
 	if (affinity) {
 		desc->affinity = kmemdup_array(affinity, nvec, sizeof(*desc->affinity), GFP_KERNEL);
 		if (!desc->affinity) {
@@ -158,6 +161,9 @@ int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid,
 
 	/* Copy type specific data to the new descriptor. */
 	desc->pci = init_desc->pci;
+#ifdef CONFIG_IRQ_MSI_IOMMU
+	desc->msi_iova = init_desc->msi_iova;
+#endif
 
 	return msi_insert_desc(dev, desc, domid, init_desc->msi_index);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Now struct msi_desc can carry a preset IOVA for MSI doorbell address. This
is typically preset by user space when engaging a 2-stage translation. So,
use the preset IOVA instead of kernel-level IOVA allocations in dma-iommu.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index ab597e74ba08..bc1768576546 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1723,6 +1723,8 @@ static u64 its_irq_get_msi_base(struct its_device *its_dev)
 static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
 {
 	struct its_device *its_dev = irq_data_get_irq_chip_data(d);
+	struct msi_desc *desc = irq_data_get_msi_desc(d);
+	dma_addr_t iova = msi_desc_get_iova(desc);
 	struct its_node *its;
 	u64 addr;
 
@@ -1733,7 +1735,13 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg)
 	msg->address_hi		= upper_32_bits(addr);
 	msg->data		= its_get_event_id(d);
 
-	iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg);
+	/* Bypass iommu_dma_compose_msi_msg if msi_iova is preset */
+	if (iova == PHYS_ADDR_MAX) {
+		iommu_dma_compose_msi_msg(desc, msg);
+	} else {
+		msg->address_lo = lower_32_bits(iova);
+		msg->address_hi = upper_32_bits(iova);
+	}
 }
 
 static int its_irq_set_irqchip_state(struct irq_data *d,
@@ -3570,6 +3578,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 {
 	msi_alloc_info_t *info = args;
 	struct its_device *its_dev = info->scratchpad[0].ptr;
+	dma_addr_t iova = msi_desc_get_iova(info->desc);
 	struct its_node *its = its_dev->its;
 	struct irq_data *irqd;
 	irq_hw_number_t hwirq;
@@ -3580,9 +3589,13 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 	if (err)
 		return err;
 
-	err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev));
-	if (err)
-		return err;
+	/* Bypass iommu_dma_prepare_msi if msi_iova is preset */
+	if (iova == PHYS_ADDR_MAX) {
+		err = iommu_dma_prepare_msi(info->desc,
+					    its->get_msi_base(its_dev));
+		if (err)
+			return err;
+	}
 
 	for (i = 0; i < nr_irqs; i++) {
 		err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
  2024-11-09  5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-09  9:43   ` kernel test robot
  2024-11-09  5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

msi_setup_msi_descs and msix_setup_msi_descs are the two callers of genirq
helper msi_domain_insert_msi_desc that is now ready for a preset msi_iova.

So, do the same in these two callers. Note MSIx supports multiple entries,
so use struct msix_entry to pass msi_iova in.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/pci.h   |  1 +
 drivers/pci/msi/msi.c | 18 ++++++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 573b4c4c2be6..68ebb9d42f7f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1652,6 +1652,7 @@ int pci_set_vga_state(struct pci_dev *pdev, bool decode,
 struct msix_entry {
 	u32	vector;	/* Kernel uses to write allocated vector */
 	u16	entry;	/* Driver uses to specify entry, OS writes */
+	u64	iova;	/* Kernel uses to override doorbell address */
 };
 
 #ifdef CONFIG_PCI_MSI
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 3a45879d85db..95caa81d3421 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -282,7 +282,7 @@ static void pci_msi_set_enable(struct pci_dev *dev, int enable)
 	pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
 }
 
-static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
+static int msi_setup_msi_desc(struct pci_dev *dev, int nvec, dma_addr_t iova,
 			      struct irq_affinity_desc *masks)
 {
 	struct msi_desc desc;
@@ -312,6 +312,10 @@ static int msi_setup_msi_desc(struct pci_dev *dev, int nvec,
 	else
 		desc.pci.mask_pos = dev->msi_cap + PCI_MSI_MASK_32;
 
+#ifdef CONFIG_IRQ_MSI_IOMMU
+	desc.msi_iova = iova;
+#endif
+
 	/* Save the initial mask status */
 	if (desc.pci.msi_attrib.can_mask)
 		pci_read_config_dword(dev, desc.pci.mask_pos, &desc.pci.msi_mask);
@@ -349,7 +353,7 @@ static int msi_verify_entries(struct pci_dev *dev)
  * which could have been allocated.
  */
 static int msi_capability_init(struct pci_dev *dev, int nvec,
-			       struct irq_affinity *affd)
+			       struct irq_affinity *affd, dma_addr_t iova)
 {
 	struct irq_affinity_desc *masks = NULL;
 	struct msi_desc *entry, desc;
@@ -370,7 +374,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
 		masks = irq_create_affinity_masks(nvec, affd);
 
 	msi_lock_descs(&dev->dev);
-	ret = msi_setup_msi_desc(dev, nvec, masks);
+	ret = msi_setup_msi_desc(dev, nvec, iova, masks);
 	if (ret)
 		goto fail;
 
@@ -456,7 +460,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 				return -ENOSPC;
 		}
 
-		rc = msi_capability_init(dev, nvec, affd);
+		rc = msi_capability_init(dev, nvec, affd, PHYS_ADDR_MAX);
 		if (rc == 0)
 			return nvec;
 
@@ -625,6 +629,12 @@ static int msix_setup_msi_descs(struct pci_dev *dev, struct msix_entry *entries,
 	memset(&desc, 0, sizeof(desc));
 
 	for (i = 0, curmsk = masks; i < nvec; i++, curmsk++) {
+#ifdef CONFIG_IRQ_MSI_IOMMU
+		if (entries && entries[i].iova)
+			desc.msi_iova = (dma_addr_t)entries[i].iova;
+		else
+			desc.msi_iova = PHYS_ADDR_MAX;
+#endif
 		desc.msi_index = entries ? entries[i].entry : i;
 		desc.affinity = masks ? curmsk : NULL;
 		desc.pci.msi_attrib.is_virtual = desc.msi_index >= vec_count;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
  2024-11-09  5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen
@ 2024-11-09  9:43   ` kernel test robot
  0 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2024-11-09  9:43 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: oe-kbuild-all

Hi Nicolin,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on pci/next]
[also build test WARNING on pci/for-linus awilliam-vfio/next tip/irq/core linus/master awilliam-vfio/for-linus v6.12-rc6 next-20241108]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Nicolin-Chen/genirq-msi-Allow-preset-IOVA-in-struct-msi_desc-for-MSI-doorbell-address/20241109-135311
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/32fe78c5ec4a08bf0414d80ce9ed9121270b9cb0.1731130093.git.nicolinc%40nvidia.com
patch subject: [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241109/202411091750.nGnflLUO-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241109/202411091750.nGnflLUO-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411091750.nGnflLUO-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/pci/msi/msi.c:357: warning: Function parameter or struct member 'iova' not described in 'msi_capability_init'


vim +357 drivers/pci/msi/msi.c

f144d1496b47e74 drivers/pci/msi.c     Benjamin Herrenschmidt 2014-10-03  342  
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  343  /**
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  344   * msi_capability_init - configure device's MSI capability structure
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  345   * @dev: pointer to the pci_dev data structure of MSI device function
1c8d7b0a562da06 drivers/pci/msi.c     Matthew Wilcox         2009-03-17  346   * @nvec: number of interrupts to allocate
f6b6aefee70aa52 drivers/pci/msi.c     Bjorn Helgaas          2019-05-30  347   * @affd: description of automatic IRQ affinity assignments (may be %NULL)
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  348   *
1c8d7b0a562da06 drivers/pci/msi.c     Matthew Wilcox         2009-03-17  349   * Setup the MSI capability structure of the device with the requested
1c8d7b0a562da06 drivers/pci/msi.c     Matthew Wilcox         2009-03-17  350   * number of interrupts.  A return value of zero indicates the successful
f6b6aefee70aa52 drivers/pci/msi.c     Bjorn Helgaas          2019-05-30  351   * setup of an entry with the new MSI IRQ.  A negative return value indicates
1c8d7b0a562da06 drivers/pci/msi.c     Matthew Wilcox         2009-03-17  352   * an error, and a positive return value indicates the number of interrupts
1c8d7b0a562da06 drivers/pci/msi.c     Matthew Wilcox         2009-03-17  353   * which could have been allocated.
1c8d7b0a562da06 drivers/pci/msi.c     Matthew Wilcox         2009-03-17  354   */
61e1c5905290efe drivers/pci/msi.c     Christoph Hellwig      2016-11-08  355  static int msi_capability_init(struct pci_dev *dev, int nvec,
77c6ca404bd701e drivers/pci/msi/msi.c Nicolin Chen           2024-11-08  356  			       struct irq_affinity *affd, dma_addr_t iova)
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16 @357  {
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  358  	struct irq_affinity_desc *masks = NULL;
9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh          2024-06-24  359  	struct msi_desc *entry, desc;
f465136d7287538 drivers/pci/msi.c     Gavin Shan             2013-04-04  360  	int ret;
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  361  
d2a463b297415ca drivers/pci/msi/msi.c Thomas Gleixner        2022-11-11  362  	/* Reject multi-MSI early on irq domain enabled architectures */
d2a463b297415ca drivers/pci/msi/msi.c Thomas Gleixner        2022-11-11  363  	if (nvec > 1 && !pci_msi_domain_supports(dev, MSI_FLAG_MULTI_PCI_MSI, ALLOW_LEGACY))
d2a463b297415ca drivers/pci/msi/msi.c Thomas Gleixner        2022-11-11  364  		return 1;
d2a463b297415ca drivers/pci/msi/msi.c Thomas Gleixner        2022-11-11  365  
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  366  	/*
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  367  	 * Disable MSI during setup in the hardware, but mark it enabled
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  368  	 * so that setup code can evaluate it.
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  369  	 */
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  370  	pci_msi_set_enable(dev, 0);
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  371  	dev->msi_enabled = 1;
110828c9cdce6e8 drivers/pci/msi.c     Matthew Wilcox         2009-06-16  372  
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  373  	if (affd)
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  374  		masks = irq_create_affinity_masks(nvec, affd);
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  375  
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  376  	msi_lock_descs(&dev->dev);
77c6ca404bd701e drivers/pci/msi/msi.c Nicolin Chen           2024-11-08  377  	ret = msi_setup_msi_desc(dev, nvec, iova, masks);
71020a3c0dff4a0 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  378  	if (ret)
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  379  		goto fail;
1ce03373a7f4b5f drivers/pci/msi.c     Eric W. Biederman      2006-10-04  380  
f6b6aefee70aa52 drivers/pci/msi.c     Bjorn Helgaas          2019-05-30  381  	/* All MSIs are unmasked by default; mask them all */
ae24e28fef14687 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  382  	entry = msi_first_desc(&dev->dev, MSI_DESC_ALL);
446a98b19fd6da9 drivers/pci/msi.c     Thomas Gleixner        2021-07-29  383  	pci_msi_mask(entry, msi_multi_mask(entry));
9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh          2024-06-24  384  	/*
9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh          2024-06-24  385  	 * Copy the MSI descriptor for the error path because
9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh          2024-06-24  386  	 * pci_msi_setup_msi_irqs() will free it for the hierarchical
9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh          2024-06-24  387  	 * interrupt domain case.
9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh          2024-06-24  388  	 */
9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh          2024-06-24  389  	memcpy(&desc, entry, sizeof(desc));
f2440d9acbe866b drivers/pci/msi.c     Matthew Wilcox         2009-03-17  390  
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  391  	/* Configure MSI capability structure */
8e047adae969701 drivers/pci/msi.c     Jiang Liu              2014-11-15  392  	ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
8eb5ce3f78a5e5d drivers/pci/msi.c     Thomas Gleixner        2021-07-29  393  	if (ret)
8eb5ce3f78a5e5d drivers/pci/msi.c     Thomas Gleixner        2021-07-29  394  		goto err;
f7feaca77d6ad6b drivers/pci/msi.c     Eric W. Biederman      2007-01-28  395  
f144d1496b47e74 drivers/pci/msi.c     Benjamin Herrenschmidt 2014-10-03  396  	ret = msi_verify_entries(dev);
8eb5ce3f78a5e5d drivers/pci/msi.c     Thomas Gleixner        2021-07-29  397  	if (ret)
8eb5ce3f78a5e5d drivers/pci/msi.c     Thomas Gleixner        2021-07-29  398  		goto err;
f144d1496b47e74 drivers/pci/msi.c     Benjamin Herrenschmidt 2014-10-03  399  
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  400  	/* Set MSI enabled bits	*/
ba698ad4b7e466c drivers/pci/msi.c     David Miller           2007-10-25  401  	pci_intx_for_msi(dev, 0);
61b64abd399fa4b drivers/pci/msi.c     Michael S. Tsirkin     2015-05-07  402  	pci_msi_set_enable(dev, 1);
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  403  
5f2269916b0e509 drivers/pci/msi.c     Jiang Liu              2015-07-30  404  	pcibios_free_irq(dev);
7fe3730de729b75 drivers/pci/msi.c     Michael Ellerman       2007-04-18  405  	dev->irq = entry->irq;
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  406  	goto unlock;
8eb5ce3f78a5e5d drivers/pci/msi.c     Thomas Gleixner        2021-07-29  407  
8eb5ce3f78a5e5d drivers/pci/msi.c     Thomas Gleixner        2021-07-29  408  err:
9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh          2024-06-24  409  	pci_msi_unmask(&desc, msi_multi_mask(&desc));
b12d0bec385b7a5 drivers/pci/msi/msi.c Ahmed S. Darwish       2022-11-11  410  	pci_free_msi_irqs(dev);
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  411  fail:
c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-10  412  	dev->msi_enabled = 0;
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  413  unlock:
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  414  	msi_unlock_descs(&dev->dev);
5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner        2021-12-06  415  	kfree(masks);
8eb5ce3f78a5e5d drivers/pci/msi.c     Thomas Gleixner        2021-07-29  416  	return ret;
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  417  }
^1da177e4c3f415 drivers/pci/msi.c     Linus Torvalds         2005-04-16  418  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (2 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-11  9:30   ` Andy Shevchenko
  2024-11-09  5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

The previous patch passes in the msi_iova to msi_capability_init, so this
allows its caller to do the same.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/pci/msi/msi.h | 3 ++-
 drivers/pci/msi/api.c | 6 ++++--
 drivers/pci/msi/msi.c | 4 ++--
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h
index ee53cf079f4e..8009d69bf9a5 100644
--- a/drivers/pci/msi/msi.h
+++ b/drivers/pci/msi/msi.h
@@ -93,7 +93,8 @@ extern int pci_msi_enable;
 void pci_msi_shutdown(struct pci_dev *dev);
 void pci_msix_shutdown(struct pci_dev *dev);
 void pci_free_msi_irqs(struct pci_dev *dev);
-int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, struct irq_affinity *affd);
+int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
+			   struct irq_affinity *affd, dma_addr_t iova);
 int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec,
 			    int maxvec,  struct irq_affinity *affd, int flags);
 void __pci_restore_msi_state(struct pci_dev *dev);
diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index b956ce591f96..99ade7f69cd4 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -29,7 +29,8 @@
  */
 int pci_enable_msi(struct pci_dev *dev)
 {
-	int rc = __pci_enable_msi_range(dev, 1, 1, NULL);
+	int rc = __pci_enable_msi_range(dev, 1, 1, NULL,
+					PHYS_ADDR_MAX);
 	if (rc < 0)
 		return rc;
 	return 0;
@@ -274,7 +275,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	}
 
 	if (flags & PCI_IRQ_MSI) {
-		nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd);
+		nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
+					       affd, PHYS_ADDR_MAX);
 		if (nvecs > 0)
 			return nvecs;
 	}
diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c
index 95caa81d3421..25da0435c674 100644
--- a/drivers/pci/msi/msi.c
+++ b/drivers/pci/msi/msi.c
@@ -417,7 +417,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
 }
 
 int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
-			   struct irq_affinity *affd)
+			   struct irq_affinity *affd, dma_addr_t iova)
 {
 	int nvec;
 	int rc;
@@ -460,7 +460,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 				return -ENOSPC;
 		}
 
-		rc = msi_capability_init(dev, nvec, affd, PHYS_ADDR_MAX);
+		rc = msi_capability_init(dev, nvec, affd, iova);
 		if (rc == 0)
 			return nvec;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova
  2024-11-09  5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen
@ 2024-11-11  9:30   ` Andy Shevchenko
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Shevchenko @ 2024-11-11  9:30 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian,
	smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Fri, Nov 08, 2024 at 09:48:49PM -0800, Nicolin Chen wrote:
> The previous patch passes in the msi_iova to msi_capability_init, so this

msi_capability_init()

> allows its caller to do the same.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (3 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-11  9:33   ` Andy Shevchenko
  2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Extract a common function from the existing callers, to prepare for a new
helper that provides an array of msi_iovas. Also, extract the msi_iova(s)
from the array and pass in properly down to __pci_enable_msi/msix_range().

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/pci/msi/api.c | 113 ++++++++++++++++++++++++++----------------
 1 file changed, 70 insertions(+), 43 deletions(-)

diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index 99ade7f69cd4..dff3d7350b38 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -204,6 +204,72 @@ void pci_disable_msix(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pci_disable_msix);
 
+static int __pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
+				   unsigned int max_vecs, unsigned int flags,
+				   struct irq_affinity *affd,
+				   dma_addr_t *msi_iovas)
+{
+	struct irq_affinity msi_default_affd = {0};
+	int nvecs = -ENOSPC;
+
+	if (flags & PCI_IRQ_AFFINITY) {
+		if (!affd)
+			affd = &msi_default_affd;
+	} else {
+		if (WARN_ON(affd))
+			affd = NULL;
+	}
+
+	if (flags & PCI_IRQ_MSIX) {
+		struct msix_entry *entries = NULL;
+
+		if (msi_iovas) {
+			int count = max_vecs - min_vecs + 1;
+			int i;
+
+			entries = kcalloc(max_vecs - min_vecs + 1,
+					  sizeof(*entries), GFP_KERNEL);
+			if (!entries)
+				return -ENOMEM;
+			for (i = 0; i < count; i++) {
+				entries[i].entry = i;
+				entries[i].iova = msi_iovas[i];
+			}
+		}
+
+		nvecs = __pci_enable_msix_range(dev, entries, min_vecs,
+						max_vecs, affd, flags);
+		kfree(entries);
+		if (nvecs > 0)
+			return nvecs;
+	}
+
+	if (flags & PCI_IRQ_MSI) {
+		nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd,
+					       msi_iovas ? *msi_iovas :
+							   PHYS_ADDR_MAX);
+		if (nvecs > 0)
+			return nvecs;
+	}
+
+	/* use INTx IRQ if allowed */
+	if (flags & PCI_IRQ_INTX) {
+		if (min_vecs == 1 && dev->irq) {
+			/*
+			 * Invoke the affinity spreading logic to ensure that
+			 * the device driver can adjust queue configuration
+			 * for the single interrupt case.
+			 */
+			if (affd)
+				irq_create_affinity_masks(1, affd);
+			pci_intx(dev, 1);
+			return 1;
+		}
+	}
+
+	return nvecs;
+}
+
 /**
  * pci_alloc_irq_vectors() - Allocate multiple device interrupt vectors
  * @dev:      the PCI device to operate on
@@ -235,8 +301,8 @@ EXPORT_SYMBOL(pci_disable_msix);
 int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 			  unsigned int max_vecs, unsigned int flags)
 {
-	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
-					      flags, NULL);
+	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
+				       flags, NULL, NULL);
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors);
 
@@ -256,47 +322,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   unsigned int max_vecs, unsigned int flags,
 				   struct irq_affinity *affd)
 {
-	struct irq_affinity msi_default_affd = {0};
-	int nvecs = -ENOSPC;
-
-	if (flags & PCI_IRQ_AFFINITY) {
-		if (!affd)
-			affd = &msi_default_affd;
-	} else {
-		if (WARN_ON(affd))
-			affd = NULL;
-	}
-
-	if (flags & PCI_IRQ_MSIX) {
-		nvecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs,
-						affd, flags);
-		if (nvecs > 0)
-			return nvecs;
-	}
-
-	if (flags & PCI_IRQ_MSI) {
-		nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs,
-					       affd, PHYS_ADDR_MAX);
-		if (nvecs > 0)
-			return nvecs;
-	}
-
-	/* use INTx IRQ if allowed */
-	if (flags & PCI_IRQ_INTX) {
-		if (min_vecs == 1 && dev->irq) {
-			/*
-			 * Invoke the affinity spreading logic to ensure that
-			 * the device driver can adjust queue configuration
-			 * for the single interrupt case.
-			 */
-			if (affd)
-				irq_create_affinity_masks(1, affd);
-			pci_intx(dev, 1);
-			return 1;
-		}
-	}
-
-	return nvecs;
+	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
+				       flags, affd, NULL);
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function
  2024-11-09  5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen
@ 2024-11-11  9:33   ` Andy Shevchenko
  0 siblings, 0 replies; 25+ messages in thread
From: Andy Shevchenko @ 2024-11-11  9:33 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian,
	smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Fri, Nov 08, 2024 at 09:48:50PM -0800, Nicolin Chen wrote:
> Extract a common function from the existing callers, to prepare for a new
> helper that provides an array of msi_iovas. Also, extract the msi_iova(s)
> from the array and pass in properly down to __pci_enable_msi/msix_range().

...

> +	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
> +				       flags, NULL, NULL);

Even if you so strict about 80 character limit, this whole line is only 83
characters, which is okay to have.

...

> +	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
> +				       flags, affd, NULL);

In the similar way, and other lines as well.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (4 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-09 21:52   ` kernel test robot
                     ` (2 more replies)
  2024-11-09  5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen
  2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
  7 siblings, 3 replies; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Now, the common __pci_alloc_irq_vectors() accepts an array of msi_iovas,
which is a list of preset IOVAs for MSI doorbell addresses.

Add a helper that would pass in a list. A following patch will call this
to forward msi_iovas from user space.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/pci.h   | 17 +++++++++++++++++
 drivers/pci/msi/api.c | 21 +++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 68ebb9d42f7f..6423bee3b207 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1678,6 +1678,9 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 				   unsigned int max_vecs, unsigned int flags,
 				   struct irq_affinity *affd);
+int pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
+				unsigned int max_vecs, unsigned int flags,
+				dma_addr_t *msi_iovas);
 
 bool pci_msix_can_alloc_dyn(struct pci_dev *dev);
 struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
@@ -1714,6 +1717,13 @@ pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	return -ENOSPC;
 }
 static inline int
+pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
+			    unsigned int max_vecs, unsigned int flags,
+			    dma_addr_t *msi_iovas)
+
+	return -ENOSPC; /* No support if !CONFIG_PCI_MSI */
+}
+static inline int
 pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		      unsigned int max_vecs, unsigned int flags)
 {
@@ -2068,6 +2078,13 @@ pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 	return -ENOSPC;
 }
 static inline int
+pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
+			    unsigned int max_vecs, unsigned int flags,
+			    dma_addr_t *msi_iovas)
+{
+	return -ENOSPC;
+}
+static inline int
 pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		      unsigned int max_vecs, unsigned int flags)
 {
diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c
index dff3d7350b38..4e90ef8f571c 100644
--- a/drivers/pci/msi/api.c
+++ b/drivers/pci/msi/api.c
@@ -327,6 +327,27 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
 }
 EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity);
 
+/**
+ * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt
+ *                                 vectors with preset msi_iovas
+ * @dev:       the PCI device to operate on
+ * @min_vecs:  minimum required number of vectors (must be >= 1)
+ * @max_vecs:  maximum desired number of vectors
+ * @flags:     allocation flags, as in pci_alloc_irq_vectors()
+ * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs]
+ *
+ * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter.
+ * Check that function docs, and &struct irq_affinity, for more details.
+ */
+int pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
+				unsigned int max_vecs, unsigned int flags,
+				dma_addr_t *msi_iovas)
+{
+	return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs,
+				       flags, NULL, msi_iovas);
+}
+EXPORT_SYMBOL(pci_alloc_irq_vectors_iovas);
+
 /**
  * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector
  * @dev: the PCI device to operate on
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
@ 2024-11-09 21:52   ` kernel test robot
  2024-11-10  9:51   ` kernel test robot
  2024-11-11  9:34   ` Andy Shevchenko
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2024-11-09 21:52 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: oe-kbuild-all

Hi Nicolin,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus awilliam-vfio/next tip/irq/core linus/master awilliam-vfio/for-linus v6.12-rc6 next-20241108]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Nicolin-Chen/genirq-msi-Allow-preset-IOVA-in-struct-msi_desc-for-MSI-doorbell-address/20241109-135311
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/e9399426b08b16efbdf7224c0122f5bf80f6d0ea.1731130093.git.nicolinc%40nvidia.com
patch subject: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
config: i386-randconfig-141-20241110 (https://download.01.org/0day-ci/archive/20241110/202411100427.zzpAjWnY-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241110/202411100427.zzpAjWnY-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411100427.zzpAjWnY-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/net/ethernet/intel/e1000/e1000_main.c:4:
   In file included from drivers/net/ethernet/intel/e1000/e1000.h:13:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/net/ethernet/intel/e1000/e1000_main.c:4:
   In file included from drivers/net/ethernet/intel/e1000/e1000.h:16:
>> include/linux/pci.h:1719:2: error: expected function body after function declarator
    1719 |         return -ENOSPC; /* No support if !CONFIG_PCI_MSI */
         |         ^
>> include/linux/pci.h:1720:1: error: extraneous closing brace ('}')
    1720 | }
         | ^
   drivers/net/ethernet/intel/e1000/e1000_main.c:999:45: warning: shift count >= width of type [-Wshift-count-overflow]
     999 |             !dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) {
         |                                                    ^~~~~~~~~~~~~~~~
   include/linux/dma-mapping.h:77:54: note: expanded from macro 'DMA_BIT_MASK'
      77 | #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
         |                                                      ^ ~~~
   2 warnings and 2 errors generated.
--
   In file included from drivers/net/ethernet/intel/e1000/e1000_hw.c:9:
   In file included from drivers/net/ethernet/intel/e1000/e1000.h:13:
   In file included from include/linux/mm.h:2213:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/net/ethernet/intel/e1000/e1000_hw.c:9:
   In file included from drivers/net/ethernet/intel/e1000/e1000.h:16:
>> include/linux/pci.h:1719:2: error: expected function body after function declarator
    1719 |         return -ENOSPC; /* No support if !CONFIG_PCI_MSI */
         |         ^
>> include/linux/pci.h:1720:1: error: extraneous closing brace ('}')
    1720 | }
         | ^
   1 warning and 2 errors generated.


vim +1719 include/linux/pci.h

  1704	
  1705	static inline int
  1706	pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
  1707				       unsigned int max_vecs, unsigned int flags,
  1708				       struct irq_affinity *aff_desc)
  1709	{
  1710		if ((flags & PCI_IRQ_INTX) && min_vecs == 1 && dev->irq)
  1711			return 1;
  1712		return -ENOSPC;
  1713	}
  1714	static inline int
  1715	pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
  1716				    unsigned int max_vecs, unsigned int flags,
  1717				    dma_addr_t *msi_iovas)
  1718	
> 1719		return -ENOSPC; /* No support if !CONFIG_PCI_MSI */
> 1720	}
  1721	static inline int
  1722	pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
  1723			      unsigned int max_vecs, unsigned int flags)
  1724	{
  1725		return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
  1726						      flags, NULL);
  1727	}
  1728	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
  2024-11-09 21:52   ` kernel test robot
@ 2024-11-10  9:51   ` kernel test robot
  2024-11-11  9:34   ` Andy Shevchenko
  2 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2024-11-10  9:51 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: oe-kbuild-all

Hi Nicolin,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus awilliam-vfio/next tip/irq/core linus/master awilliam-vfio/for-linus v6.12-rc6 next-20241108]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Nicolin-Chen/genirq-msi-Allow-preset-IOVA-in-struct-msi_desc-for-MSI-doorbell-address/20241109-135311
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/e9399426b08b16efbdf7224c0122f5bf80f6d0ea.1731130093.git.nicolinc%40nvidia.com
patch subject: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
config: alpha-allnoconfig (https://download.01.org/0day-ci/archive/20241110/202411101710.Co5EYmfa-lkp@intel.com/config)
compiler: alpha-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241110/202411101710.Co5EYmfa-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411101710.Co5EYmfa-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from block/blk-mq-pci.c:8:
   include/linux/pci.h: In function 'pci_alloc_irq_vectors_iovas':
>> include/linux/pci.h:1719:9: error: expected declaration specifiers before 'return'
    1719 |         return -ENOSPC; /* No support if !CONFIG_PCI_MSI */
         |         ^~~~~~
>> include/linux/pci.h:1720:1: error: expected declaration specifiers before '}' token
    1720 | }
         | ^
>> include/linux/pci.h:1724:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1724 | {
         | ^
   include/linux/pci.h:1730:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1730 | { return false; }
         | ^
   include/linux/pci.h:1733:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1733 | {
         | ^
   include/linux/pci.h:1740:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1740 | {
         | ^
   include/linux/pci.h:1744:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1744 | {
         | ^
   include/linux/pci.h:1748:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1748 | {
         | ^
   include/linux/pci.h:1755:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1755 | {
         | ^
   include/linux/pci.h:1782:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1782 | {
         | ^
   include/linux/pci.h:1805:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1805 | {
         | ^
   include/linux/pci.h:1851:44: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1851 | static inline bool pci_aer_available(void) { return false; }
         |                                            ^
   include/linux/pci.h:1862:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1862 | { return -EINVAL; }
         | ^
   include/linux/pci.h:1863:57: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1863 | static inline void pci_disable_ptm(struct pci_dev *dev) { }
         |                                                         ^
   include/linux/pci.h:1865:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1865 | { return false; }
         | ^
   In file included from include/linux/cpumask.h:10,
                    from include/linux/smp.h:13,
                    from include/linux/lockdep.h:14,
                    from include/linux/mutex.h:17,
                    from include/linux/kernfs.h:11,
                    from include/linux/sysfs.h:16,
                    from include/linux/kobject.h:20,
                    from block/blk-mq-pci.c:5:
>> include/linux/cleanup.h:244:15: error: storage class specified for parameter 'class_pci_dev_t'
     244 | typedef _type class_##_name##_t;                                        \
         |               ^~~~~~
   include/linux/cleanup.h:291:9: note: in expansion of macro 'DEFINE_CLASS'
     291 |         DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
         |         ^~~~~~~~~~~~
   include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD'
    1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
         | ^~~~~~~~~~~~
>> include/linux/cleanup.h:246:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     246 | { _type _T = *p; _exit; }                                               \
         | ^
   include/linux/cleanup.h:291:9: note: in expansion of macro 'DEFINE_CLASS'
     291 |         DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
         |         ^~~~~~~~~~~~
   include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD'
    1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
         | ^~~~~~~~~~~~
   include/linux/cleanup.h:248:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     248 | { _type t = _init; return t; }
         | ^
   include/linux/cleanup.h:291:9: note: in expansion of macro 'DEFINE_CLASS'
     291 |         DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
         |         ^~~~~~~~~~~~
   include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD'
    1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
         | ^~~~~~~~~~~~
>> include/linux/cleanup.h:291:85: error: expected declaration specifiers before ';' token
     291 |         DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
         |                                                                                     ^
   include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD'
    1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
         | ^~~~~~~~~~~~
>> include/linux/cleanup.h:292:55: error: expected declaration specifiers or '...' before 'class_pci_dev_t'
     292 |         static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
         |                                                       ^~~~~~
   include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD'
    1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
         | ^~~~~~~~~~~~
>> include/linux/pci.h:1883:12: error: storage class specified for parameter 'pci_domains_supported'
    1883 | extern int pci_domains_supported;
         |            ^~~~~~~~~~~~~~~~~~~~~
>> include/linux/pci.h:1911:15: error: storage class specified for parameter 'arch_set_vga_state_t'
    1911 | typedef int (*arch_set_vga_state_t)(struct pci_dev *pdev, bool decode,
         |               ^~~~~~~~~~~~~~~~~~~~
>> include/linux/pci.h:1913:33: error: expected declaration specifiers or '...' before 'arch_set_vga_state_t'
    1913 | void pci_register_set_vga_state(arch_set_vga_state_t func);
         |                                 ^~~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:1917:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1917 | {
         | ^
   include/linux/pci.h:1924:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1924 | {
         | ^
   include/linux/pci.h:1931:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1931 | {
         | ^
   include/linux/pci.h:1938:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    1938 | {
         | ^
   In file included from arch/alpha/include/asm/pci.h:8,
                    from include/linux/pci.h:2102:
>> include/linux/dma-mapping.h:86:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
      86 | {
         | ^
   include/linux/dma-mapping.h:90:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
      90 | {
         | ^
   include/linux/dma-mapping.h:96:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
      96 | {
         | ^
   include/linux/dma-mapping.h:296:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     296 | {
         | ^
   include/linux/dma-mapping.h:303:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     303 | {
         | ^
   include/linux/dma-mapping.h:310:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     310 | {
         | ^
   include/linux/dma-mapping.h:317:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     317 | {
         | ^
   include/linux/dma-mapping.h:324:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     324 | {
         | ^
   include/linux/dma-mapping.h:330:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     330 | {
         | ^
   include/linux/dma-mapping.h:369:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     369 | {
         | ^
   include/linux/dma-mapping.h:376:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     376 | {
         | ^
   include/linux/dma-mapping.h:382:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     382 | {
         | ^
   include/linux/dma-mapping.h:394:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     394 | {
         | ^
   include/linux/dma-mapping.h:401:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     401 | {
         | ^
   include/linux/dma-mapping.h:408:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     408 | {
         | ^
   include/linux/dma-mapping.h:425:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     425 | {
         | ^
   include/linux/dma-mapping.h:443:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     443 | {
         | ^
   include/linux/dma-mapping.h:460:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     460 | {
         | ^
   include/linux/dma-mapping.h:477:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     477 | {
         | ^
   include/linux/dma-mapping.h:484:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     484 | {
         | ^
   include/linux/dma-mapping.h:490:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     490 | {
         | ^
   include/linux/dma-mapping.h:503:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     503 | {
         | ^
   include/linux/dma-mapping.h:515:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     515 | {
         | ^
   include/linux/dma-mapping.h:521:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     521 | {
         | ^
   include/linux/dma-mapping.h:528:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     528 | {
         | ^
   include/linux/dma-mapping.h:535:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     535 | {
         | ^
   include/linux/dma-mapping.h:554:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     554 | {
         | ^
   include/linux/dma-mapping.h:561:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     561 | {
         | ^
   include/linux/dma-mapping.h:568:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     568 | {
         | ^
   include/linux/dma-mapping.h:576:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     576 | {
         | ^
   include/linux/dma-mapping.h:584:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     584 | {
         | ^
   include/linux/dma-mapping.h:594:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     594 | {
         | ^
   include/linux/dma-mapping.h:601:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     601 | {
         | ^
   include/linux/dma-mapping.h:612:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     612 | {
         | ^
   include/linux/dma-mapping.h:621:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
     621 | {
         | ^
>> arch/alpha/include/asm/pci.h:16:1: warning: empty declaration
      16 | struct pci_iommu_arena;
         | ^~~~~~
   arch/alpha/include/asm/pci.h:17:1: warning: empty declaration
      17 | struct page;
         | ^~~~~~
   arch/alpha/include/asm/pci.h:21:1: warning: empty declaration
      21 | struct pci_controller {
         | ^~~~~~
>> arch/alpha/include/asm/pci.h:62:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
      62 | {
         | ^
>> arch/alpha/include/asm/pci.h:78:24: error: storage class specified for parameter 'isa_bridge'
      78 | extern struct pci_dev *isa_bridge;
         |                        ^~~~~~~~~~
>> arch/alpha/include/asm/pci.h:80:12: error: storage class specified for parameter 'pci_legacy_read'
      80 | extern int pci_legacy_read(struct pci_bus *bus, loff_t port, u32 *val,
         |            ^~~~~~~~~~~~~~~
>> arch/alpha/include/asm/pci.h:82:12: error: storage class specified for parameter 'pci_legacy_write'
      82 | extern int pci_legacy_write(struct pci_bus *bus, loff_t port, u32 val,
         |            ^~~~~~~~~~~~~~~~
>> arch/alpha/include/asm/pci.h:84:12: error: storage class specified for parameter 'pci_mmap_legacy_page_range'
      84 | extern int pci_mmap_legacy_page_range(struct pci_bus *bus,
         |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> arch/alpha/include/asm/pci.h:87:13: error: storage class specified for parameter 'pci_adjust_legacy_attr'
      87 | extern void pci_adjust_legacy_attr(struct pci_bus *bus,
         |             ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:2160:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2160 | {
         | ^
   include/linux/pci.h:2165:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2165 | {
         | ^
   include/linux/pci.h:2170:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2170 | {
         | ^
>> include/linux/pci.h:2185:1: warning: empty declaration
    2185 | struct pci_fixup {
         | ^~~~~~
   include/linux/pci.h:2197:1: warning: empty declaration
    2197 | enum pci_fixup_pass {
         | ^~~~
>> include/linux/pci.h:2332:12: error: storage class specified for parameter 'pci_pci_problems'
    2332 | extern int pci_pci_problems;
         |            ^~~~~~~~~~~~~~~~
   include/linux/pci.h:2341:22: error: storage class specified for parameter 'pci_cardbus_io_size'
    2341 | extern unsigned long pci_cardbus_io_size;
         |                      ^~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:2342:22: error: storage class specified for parameter 'pci_cardbus_mem_size'
    2342 | extern unsigned long pci_cardbus_mem_size;
         |                      ^~~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:2343:11: error: storage class specified for parameter 'pci_dfl_cache_line_size'
    2343 | extern u8 pci_dfl_cache_line_size;
         |           ^~~~~~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:2344:11: error: storage class specified for parameter 'pci_cache_line_size'
    2344 | extern u8 pci_cache_line_size;
         |           ^~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:2363:12: error: storage class specified for parameter 'pci_create_resource_files'
    2363 | extern int pci_create_resource_files(struct pci_dev *dev);
         |            ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:2364:13: error: storage class specified for parameter 'pci_remove_resource_files'
    2364 | extern void pci_remove_resource_files(struct pci_dev *dev);
         |             ^~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/pci.h:2371:47: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2371 | static inline void pci_mmcfg_early_init(void) { }
         |                                               ^
   include/linux/pci.h:2372:46: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2372 | static inline void pci_mmcfg_late_init(void) { }
         |                                              ^
   include/linux/pci.h:2405:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2405 | {
         | ^
   include/linux/pci.h:2409:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2409 | {
         | ^
   include/linux/pci.h:2414:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2414 | {
         | ^
   include/linux/pci.h:2420:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2420 | {
         | ^
   include/linux/pci.h:2425:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2425 | { return -ENODEV; }
         | ^
   include/linux/pci.h:2429:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2429 | {
         | ^
   include/linux/pci.h:2433:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2433 | {
         | ^
   include/linux/pci.h:2437:50: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2437 |                                          int id) { }
         |                                                  ^
   include/linux/pci.h:2438:59: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2438 | static inline void pci_disable_sriov(struct pci_dev *dev) { }
         |                                                           ^
   include/linux/pci.h:2439:51: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2439 | static inline int pci_num_vf(struct pci_dev *dev) { return 0; }
         |                                                   ^
   include/linux/pci.h:2441:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2441 | { return 0; }
         | ^
   include/linux/pci.h:2443:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2443 | { return 0; }
         | ^
   include/linux/pci.h:2445:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2445 | { return 0; }
         | ^
   include/linux/pci.h:2448:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2448 | { return 0; }
         | ^
   include/linux/pci.h:2449:78: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2449 | static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { }
         |                                                                              ^
   include/linux/pci.h:2469:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2469 | {
         | ^
   include/linux/pci.h:2480:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2480 | {
         | ^
   include/linux/pci.h:2489:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2489 | {
         | ^
   include/linux/pci.h:2498:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2498 | {
         | ^
   include/linux/pci.h:2510:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2510 | {
         | ^
   include/linux/pci.h:2522:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2522 | {
         | ^
   include/linux/pci.h:2608:52: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2608 | pci_host_bridge_of_msi_domain(struct pci_bus *bus) { return NULL; }
         |                                                    ^
   include/linux/pci.h:2609:64: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2609 | static inline bool pci_host_of_has_msi_map(struct device *dev) { return false; }
         |                                                                ^
   include/linux/pci.h:2614:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2614 | {
         | ^
   include/linux/pci.h:2619:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
    2619 | {
..


vim +/return +1719 include/linux/pci.h

  1704	
  1705	static inline int
  1706	pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
  1707				       unsigned int max_vecs, unsigned int flags,
  1708				       struct irq_affinity *aff_desc)
  1709	{
  1710		if ((flags & PCI_IRQ_INTX) && min_vecs == 1 && dev->irq)
  1711			return 1;
  1712		return -ENOSPC;
  1713	}
  1714	static inline int
> 1715	pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs,
  1716				    unsigned int max_vecs, unsigned int flags,
  1717				    dma_addr_t *msi_iovas)
  1718	
> 1719		return -ENOSPC; /* No support if !CONFIG_PCI_MSI */
> 1720	}
  1721	static inline int
  1722	pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
  1723			      unsigned int max_vecs, unsigned int flags)
> 1724	{
  1725		return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
  1726						      flags, NULL);
  1727	}
  1728	
  1729	static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev)
  1730	{ return false; }
  1731	static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index,
  1732							   const struct irq_affinity_desc *affdesc)
  1733	{
  1734		struct msi_map map = { .index = -ENOSYS, };
  1735	
  1736		return map;
  1737	}
  1738	
  1739	static inline void pci_msix_free_irq(struct pci_dev *pdev, struct msi_map map)
  1740	{
  1741	}
  1742	
  1743	static inline void pci_free_irq_vectors(struct pci_dev *dev)
  1744	{
  1745	}
  1746	
  1747	static inline int pci_irq_vector(struct pci_dev *dev, unsigned int nr)
  1748	{
  1749		if (WARN_ON_ONCE(nr > 0))
  1750			return -EINVAL;
  1751		return dev->irq;
  1752	}
  1753	static inline const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev,
  1754			int vec)
  1755	{
  1756		return cpu_possible_mask;
  1757	}
  1758	#endif
  1759	
  1760	/**
  1761	 * pci_irqd_intx_xlate() - Translate PCI INTx value to an IRQ domain hwirq
  1762	 * @d: the INTx IRQ domain
  1763	 * @node: the DT node for the device whose interrupt we're translating
  1764	 * @intspec: the interrupt specifier data from the DT
  1765	 * @intsize: the number of entries in @intspec
  1766	 * @out_hwirq: pointer at which to write the hwirq number
  1767	 * @out_type: pointer at which to write the interrupt type
  1768	 *
  1769	 * Translate a PCI INTx interrupt number from device tree in the range 1-4, as
  1770	 * stored in the standard PCI_INTERRUPT_PIN register, to a value in the range
  1771	 * 0-3 suitable for use in a 4 entry IRQ domain. That is, subtract one from the
  1772	 * INTx value to obtain the hwirq number.
  1773	 *
  1774	 * Returns 0 on success, or -EINVAL if the interrupt specifier is out of range.
  1775	 */
  1776	static inline int pci_irqd_intx_xlate(struct irq_domain *d,
  1777					      struct device_node *node,
  1778					      const u32 *intspec,
  1779					      unsigned int intsize,
  1780					      unsigned long *out_hwirq,
  1781					      unsigned int *out_type)
  1782	{
  1783		const u32 intx = intspec[0];
  1784	
  1785		if (intx < PCI_INTERRUPT_INTA || intx > PCI_INTERRUPT_INTD)
  1786			return -EINVAL;
  1787	
  1788		*out_hwirq = intx - PCI_INTERRUPT_INTA;
  1789		return 0;
  1790	}
  1791	
  1792	#ifdef CONFIG_PCIEPORTBUS
  1793	extern bool pcie_ports_disabled;
  1794	extern bool pcie_ports_native;
  1795	
  1796	int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req,
  1797				  bool use_lt);
  1798	#else
  1799	#define pcie_ports_disabled	true
  1800	#define pcie_ports_native	false
  1801	
  1802	static inline int pcie_set_target_speed(struct pci_dev *port,
  1803						enum pci_bus_speed speed_req,
  1804						bool use_lt)
  1805	{
  1806		return -EOPNOTSUPP;
  1807	}
  1808	#endif
  1809	
  1810	#define PCIE_LINK_STATE_L0S		(BIT(0) | BIT(1)) /* Upstr/dwnstr L0s */
  1811	#define PCIE_LINK_STATE_L1		BIT(2)	/* L1 state */
  1812	#define PCIE_LINK_STATE_L1_1		BIT(3)	/* ASPM L1.1 state */
  1813	#define PCIE_LINK_STATE_L1_2		BIT(4)	/* ASPM L1.2 state */
  1814	#define PCIE_LINK_STATE_L1_1_PCIPM	BIT(5)	/* PCI-PM L1.1 state */
  1815	#define PCIE_LINK_STATE_L1_2_PCIPM	BIT(6)	/* PCI-PM L1.2 state */
  1816	#define PCIE_LINK_STATE_ASPM_ALL	(PCIE_LINK_STATE_L0S		|\
  1817						 PCIE_LINK_STATE_L1		|\
  1818						 PCIE_LINK_STATE_L1_1		|\
  1819						 PCIE_LINK_STATE_L1_2		|\
  1820						 PCIE_LINK_STATE_L1_1_PCIPM	|\
  1821						 PCIE_LINK_STATE_L1_2_PCIPM)
  1822	#define PCIE_LINK_STATE_CLKPM		BIT(7)
  1823	#define PCIE_LINK_STATE_ALL		(PCIE_LINK_STATE_ASPM_ALL	|\
  1824						 PCIE_LINK_STATE_CLKPM)
  1825	
  1826	#ifdef CONFIG_PCIEASPM
  1827	int pci_disable_link_state(struct pci_dev *pdev, int state);
  1828	int pci_disable_link_state_locked(struct pci_dev *pdev, int state);
  1829	int pci_enable_link_state(struct pci_dev *pdev, int state);
  1830	int pci_enable_link_state_locked(struct pci_dev *pdev, int state);
  1831	void pcie_no_aspm(void);
  1832	bool pcie_aspm_support_enabled(void);
  1833	bool pcie_aspm_enabled(struct pci_dev *pdev);
  1834	#else
  1835	static inline int pci_disable_link_state(struct pci_dev *pdev, int state)
  1836	{ return 0; }
  1837	static inline int pci_disable_link_state_locked(struct pci_dev *pdev, int state)
  1838	{ return 0; }
  1839	static inline int pci_enable_link_state(struct pci_dev *pdev, int state)
  1840	{ return 0; }
  1841	static inline int pci_enable_link_state_locked(struct pci_dev *pdev, int state)
  1842	{ return 0; }
  1843	static inline void pcie_no_aspm(void) { }
  1844	static inline bool pcie_aspm_support_enabled(void) { return false; }
  1845	static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; }
  1846	#endif
  1847	
  1848	#ifdef CONFIG_PCIEAER
  1849	bool pci_aer_available(void);
  1850	#else
  1851	static inline bool pci_aer_available(void) { return false; }
  1852	#endif
  1853	
  1854	bool pci_ats_disabled(void);
  1855	
  1856	#ifdef CONFIG_PCIE_PTM
  1857	int pci_enable_ptm(struct pci_dev *dev, u8 *granularity);
  1858	void pci_disable_ptm(struct pci_dev *dev);
  1859	bool pcie_ptm_enabled(struct pci_dev *dev);
  1860	#else
  1861	static inline int pci_enable_ptm(struct pci_dev *dev, u8 *granularity)
  1862	{ return -EINVAL; }
  1863	static inline void pci_disable_ptm(struct pci_dev *dev) { }
  1864	static inline bool pcie_ptm_enabled(struct pci_dev *dev)
  1865	{ return false; }
  1866	#endif
  1867	
  1868	void pci_cfg_access_lock(struct pci_dev *dev);
  1869	bool pci_cfg_access_trylock(struct pci_dev *dev);
  1870	void pci_cfg_access_unlock(struct pci_dev *dev);
  1871	
  1872	void pci_dev_lock(struct pci_dev *dev);
  1873	int pci_dev_trylock(struct pci_dev *dev);
  1874	void pci_dev_unlock(struct pci_dev *dev);
> 1875	DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
  1876	
  1877	/*
  1878	 * PCI domain support.  Sometimes called PCI segment (eg by ACPI),
  1879	 * a PCI domain is defined to be a set of PCI buses which share
  1880	 * configuration space.
  1881	 */
  1882	#ifdef CONFIG_PCI_DOMAINS
> 1883	extern int pci_domains_supported;
  1884	#else
  1885	enum { pci_domains_supported = 0 };
  1886	static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
  1887	static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
  1888	#endif /* CONFIG_PCI_DOMAINS */
  1889	
  1890	/*
  1891	 * Generic implementation for PCI domain support. If your
  1892	 * architecture does not need custom management of PCI
  1893	 * domains then this implementation will be used
  1894	 */
  1895	#ifdef CONFIG_PCI_DOMAINS_GENERIC
  1896	static inline int pci_domain_nr(struct pci_bus *bus)
  1897	{
  1898		return bus->domain_nr;
  1899	}
  1900	#ifdef CONFIG_ACPI
  1901	int acpi_pci_bus_find_domain_nr(struct pci_bus *bus);
  1902	#else
  1903	static inline int acpi_pci_bus_find_domain_nr(struct pci_bus *bus)
  1904	{ return 0; }
  1905	#endif
  1906	int pci_bus_find_domain_nr(struct pci_bus *bus, struct device *parent);
  1907	void pci_bus_release_domain_nr(struct device *parent, int domain_nr);
  1908	#endif
  1909	
  1910	/* Some architectures require additional setup to direct VGA traffic */
> 1911	typedef int (*arch_set_vga_state_t)(struct pci_dev *pdev, bool decode,
  1912					    unsigned int command_bits, u32 flags);
> 1913	void pci_register_set_vga_state(arch_set_vga_state_t func);
  1914	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
  2024-11-09 21:52   ` kernel test robot
  2024-11-10  9:51   ` kernel test robot
@ 2024-11-11  9:34   ` Andy Shevchenko
  2024-11-12 22:14     ` Nicolin Chen
  2 siblings, 1 reply; 25+ messages in thread
From: Andy Shevchenko @ 2024-11-11  9:34 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian,
	smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Fri, Nov 08, 2024 at 09:48:51PM -0800, Nicolin Chen wrote:
> Now, the common __pci_alloc_irq_vectors() accepts an array of msi_iovas,
> which is a list of preset IOVAs for MSI doorbell addresses.
> 
> Add a helper that would pass in a list. A following patch will call this
> to forward msi_iovas from user space.

...

> +/**
> + * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt
> + *                                 vectors with preset msi_iovas
> + * @dev:       the PCI device to operate on
> + * @min_vecs:  minimum required number of vectors (must be >= 1)
> + * @max_vecs:  maximum desired number of vectors
> + * @flags:     allocation flags, as in pci_alloc_irq_vectors()
> + * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs]
> + *
> + * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter.
> + * Check that function docs, and &struct irq_affinity, for more details.
> + */

Always validate your kernel-doc descriptions

	scripts/kernel-doc -Wall -none -v ...

will give you a warning here.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
  2024-11-11  9:34   ` Andy Shevchenko
@ 2024-11-12 22:14     ` Nicolin Chen
  0 siblings, 0 replies; 25+ messages in thread
From: Nicolin Chen @ 2024-11-12 22:14 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian,
	smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Mon, Nov 11, 2024 at 11:34:47AM +0200, Andy Shevchenko wrote:
> On Fri, Nov 08, 2024 at 09:48:51PM -0800, Nicolin Chen wrote:
> > Now, the common __pci_alloc_irq_vectors() accepts an array of msi_iovas,
> > which is a list of preset IOVAs for MSI doorbell addresses.
> >
> > Add a helper that would pass in a list. A following patch will call this
> > to forward msi_iovas from user space.
> 
> ...
> 
> > +/**
> > + * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt
> > + *                                 vectors with preset msi_iovas
> > + * @dev:       the PCI device to operate on
> > + * @min_vecs:  minimum required number of vectors (must be >= 1)
> > + * @max_vecs:  maximum desired number of vectors
> > + * @flags:     allocation flags, as in pci_alloc_irq_vectors()
> > + * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs]
> > + *
> > + * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter.
> > + * Check that function docs, and &struct irq_affinity, for more details.
> > + */
> 
> Always validate your kernel-doc descriptions
> 
>         scripts/kernel-doc -Wall -none -v ...
> 
> will give you a warning here.

Will do in next round.

Thanks!
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (5 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
@ 2024-11-09  5:48 ` Nicolin Chen
  2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
  7 siblings, 0 replies; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09  5:48 UTC (permalink / raw)
  To: maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
	kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

Add a new VFIO_IRQ_SET_ACTION_PREPARE to set VFIO_IRQ_SET_DATA_MSI_IOVA,
giving user space an interface to forward to kernel the stage-1 IOVA (of
a 2-stage translation: IOVA->IPA->PA) for an MSI doorbell address, since
the ITS hardware needs to be programmed with the top level IOVA address,
in order to work with the IOMMU on ARM64.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/vfio_pci_core.h     |  1 +
 include/uapi/linux/vfio.h         |  8 ++++--
 drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++++++++++++++++++++++++-
 drivers/vfio/vfio_main.c          |  3 +++
 4 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index fbb472dd99b3..08027b8331f0 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -63,6 +63,7 @@ struct vfio_pci_core_device {
 	int			irq_type;
 	int			num_regions;
 	struct vfio_pci_region	*region;
+	dma_addr_t		*msi_iovas;
 	u8			msi_qmax;
 	u8			msix_bar;
 	u16			msix_size;
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf190..d6be351abcde 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -590,6 +590,8 @@ struct vfio_irq_set {
 #define VFIO_IRQ_SET_ACTION_MASK	(1 << 3) /* Mask interrupt */
 #define VFIO_IRQ_SET_ACTION_UNMASK	(1 << 4) /* Unmask interrupt */
 #define VFIO_IRQ_SET_ACTION_TRIGGER	(1 << 5) /* Trigger interrupt */
+#define VFIO_IRQ_SET_DATA_MSI_IOVA	(1 << 6) /* Data is MSI IOVA (u64) */
+#define VFIO_IRQ_SET_ACTION_PREPARE	(1 << 7) /* Prepare interrupt */
 	__u32	index;
 	__u32	start;
 	__u32	count;
@@ -599,10 +601,12 @@ struct vfio_irq_set {
 
 #define VFIO_IRQ_SET_DATA_TYPE_MASK	(VFIO_IRQ_SET_DATA_NONE | \
 					 VFIO_IRQ_SET_DATA_BOOL | \
-					 VFIO_IRQ_SET_DATA_EVENTFD)
+					 VFIO_IRQ_SET_DATA_EVENTFD | \
+					 VFIO_IRQ_SET_DATA_MSI_IOVA)
 #define VFIO_IRQ_SET_ACTION_TYPE_MASK	(VFIO_IRQ_SET_ACTION_MASK | \
 					 VFIO_IRQ_SET_ACTION_UNMASK | \
-					 VFIO_IRQ_SET_ACTION_TRIGGER)
+					 VFIO_IRQ_SET_ACTION_TRIGGER | \
+					 VFIO_IRQ_SET_ACTION_PREPARE)
 /**
  * VFIO_DEVICE_RESET - _IO(VFIO_TYPE, VFIO_BASE + 11)
  *
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 8382c5834335..18bcdc5b1ef5 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -383,7 +383,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi
 
 	/* return the number of supported vectors if we can't get all: */
 	cmd = vfio_pci_memory_lock_and_enable(vdev);
-	ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag);
+	ret = pci_alloc_irq_vectors_iovas(pdev, 1, nvec, flag, vdev->msi_iovas);
 	if (ret < nvec) {
 		if (ret > 0)
 			pci_free_irq_vectors(pdev);
@@ -685,6 +685,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 
 	if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
 		vfio_msi_disable(vdev, msix);
+		/* FIXME we need a better cleanup routine */
+		kfree(vdev->msi_iovas);
+		vdev->msi_iovas = NULL;
 		return 0;
 	}
 
@@ -728,6 +731,39 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev,
 	return 0;
 }
 
+static int vfio_pci_set_msi_prepare(struct vfio_pci_core_device *vdev,
+				    unsigned index, unsigned start,
+				    unsigned count, uint32_t flags, void *data)
+{
+	uint64_t *iovas = data;
+	unsigned int i;
+
+	if (!(irq_is(vdev, index) || is_irq_none(vdev)))
+		return -EINVAL;
+
+	if (flags & VFIO_IRQ_SET_DATA_NONE) {
+		if (!count)
+			return -EINVAL;
+		/* FIXME support partial unset */
+		kfree(vdev->msi_iovas);
+		vdev->msi_iovas = NULL;
+		return 0;
+	}
+
+	if (!(flags & VFIO_IRQ_SET_DATA_MSI_IOVA))
+		return -EOPNOTSUPP;
+	if (!IS_ENABLED(CONFIG_IRQ_MSI_IOMMU))
+		return -EOPNOTSUPP;
+	if (!vdev->msi_iovas)
+		vdev->msi_iovas = kcalloc(count, sizeof(dma_addr_t), GFP_KERNEL);
+	if (!vdev->msi_iovas)
+		return -ENOMEM;
+	for (i = 0; i < count; i++)
+		vdev->msi_iovas[i] = iovas[i];
+
+	return 0;
+}
+
 static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx,
 					   unsigned int count, uint32_t flags,
 					   void *data)
@@ -837,6 +873,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags,
 		case VFIO_IRQ_SET_ACTION_TRIGGER:
 			func = vfio_pci_set_msi_trigger;
 			break;
+		case VFIO_IRQ_SET_ACTION_PREPARE:
+			func = vfio_pci_set_msi_prepare;
+			break;
 		}
 		break;
 	case VFIO_PCI_ERR_IRQ_INDEX:
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a5a62d9d963f..61211c082a64 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1554,6 +1554,9 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
 	case VFIO_IRQ_SET_DATA_EVENTFD:
 		size = sizeof(int32_t);
 		break;
+	case VFIO_IRQ_SET_DATA_MSI_IOVA:
+		size = sizeof(uint64_t);
+		break;
 	default:
 		return -EINVAL;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
                   ` (6 preceding siblings ...)
  2024-11-09  5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen
@ 2024-11-11 13:09 ` Robin Murphy
  2024-11-11 14:14   ` Marc Zyngier
  2024-11-12 21:54   ` Nicolin Chen
  7 siblings, 2 replies; 25+ messages in thread
From: Robin Murphy @ 2024-11-11 13:09 UTC (permalink / raw)
  To: Nicolin Chen, maz, tglx, bhelgaas, alex.williamson
  Cc: jgg, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian,
	smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile,
	yebin10, brauner, apatel, shivamurthy.shastri, anna-maria,
	nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel,
	linux-pci, kvm

On 2024-11-09 5:48 am, Nicolin Chen wrote:
> On ARM GIC systems and others, the target address of the MSI is translated
> by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
> IOMMU is disabled, the MSI address is programmed to the physical location
> of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
> page is behind the IOMMU, so the MSI address is programmed to an allocated
> IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
> the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
> When a 2-stage translation is enabled, IOVA will be still used to program
> the MSI address, though the mappings will be in two stages:
>    IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000
> (IPA stands for Intermediate Physical Address).
> 
> If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
> IOVA is dynamically allocated from the top of the IOVA space. If attached
> to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
> fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
> which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
> 
> So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
> of the IOMMU translation (1-stage translation), since the IOVA for the ITS
> page is fixed and known by kernel. However, with virtual machine enabling
> a nested IOMMU translation (2-stage), a guest kernel directly controls the
> stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
> IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
> kernel can't know that guest-level IOVA to program the MSI address.
> 
> To solve this problem the VMM should capture the MSI IOVA allocated by the
> guest kernel and relay it to the GIC driver in the host kernel, to program
> the correct MSI IOVA. And this requires a new ioctl via VFIO.

Once VFIO has that information from userspace, though, do we really need 
the whole complicated dance to push it right down into the irqchip layer 
just so it can be passed back up again? AFAICS 
vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly 
rewrites MSI-X vectors, so it seems like it should be pretty 
straightforward to override the message address in general at that 
level, without the lower layers having to be aware at all, no?

Thanks,
Robin.

> Extend the VFIO path to allow an MSI target IOVA to be forwarded into the
> kernel and pushed down to the GIC driver.
> 
> Add VFIO ioctl VFIO_IRQ_SET_ACTION_PREPARE with VFIO_IRQ_SET_DATA_MSI_IOVA
> to carry the data.
> 
> The downstream calltrace is quite long from the VFIO to the ITS driver. So
> in order to carry the MSI IOVA from the top to its_irq_domain_alloc(), add
> patches in a leaf-to-root order:
> 
>    vfio_pci_core_ioctl:
>      vfio_pci_set_irqs_ioctl:
>        vfio_pci_set_msi_prepare:                           // PATCH-7
>          pci_alloc_irq_vectors_iovas:                      // PATCH-6
>            __pci_alloc_irq_vectors:                        // PATCH-5
>              __pci_enable_msi/msix_range:                  // PATCH-4
>                msi/msix_capability_init:                   // PATCH-3
>                  msi/msix_setup_msi_descs:
>                    msi_insert_msi_desc();                  // PATCH-1
>                  pci_msi_setup_msi_irqs:
>                    msi_domain_alloc_irqs_all_locked:
>                      __msi_domain_alloc_locked:
>                        __msi_domain_alloc_irqs:
>                          __irq_domain_alloc_irqs:
>                            irq_domain_alloc_irqs_locked:
>                              irq_domain_alloc_irqs_hierarchy:
>                                msi_domain_alloc:
>                                  irq_domain_alloc_irqs_parent:
>                                    its_irq_domain_alloc(); // PATCH-2
> 
> Note that this series solves half the problem, since it only allows kernel
> to set the physical PCI MSI/MSI-X on the device with the correct head IOVA
> of a 2-stage translation, where the guest kernel does the stage-1 mapping
> that MSI IOVA (0xEEEE0000) to its own vITS page (0x80900000) while missing
> the stage-2 mapping from that IPA to the physical ITS page:
>    0xEEEE0000 ===> 0x80900000 =x=> 0x20200000
> A followup series should fill that gap, doing the stage-2 mapping from the
> vITS page 0x80900000 to the physical ITS page (0x20200000), likely via new
> IOMMUFD ioctl. Once VMM sets up this stage-2 mapping, VM will act the same
> as bare metal relying on a running kernel to handle the stage-1 mapping:
>    0xEEEE0000 ===> 0x80900000 ===> 0x20200000
> 
> This series (prototype) is on Github:
> https://github.com/nicolinc/iommufd/commits/vfio_msi_giova-rfcv1/
> It's tested by hacking the host kernel to hard-code a stage-2 mapping.
> 
> Thanks!
> Nicolin
> 
> Nicolin Chen (7):
>    genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell
>      address
>    irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset
>    PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
>    PCI/MSI: Allow __pci_enable_msi_range to pass in iova
>    PCI/MSI: Extract a common __pci_alloc_irq_vectors function
>    PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
>    vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE
> 
>   drivers/pci/msi/msi.h             |   3 +-
>   include/linux/msi.h               |  11 +++
>   include/linux/pci.h               |  18 ++++
>   include/linux/vfio_pci_core.h     |   1 +
>   include/uapi/linux/vfio.h         |   8 +-
>   drivers/irqchip/irq-gic-v3-its.c  |  21 ++++-
>   drivers/pci/msi/api.c             | 136 ++++++++++++++++++++----------
>   drivers/pci/msi/msi.c             |  20 +++--
>   drivers/vfio/pci/vfio_pci_intrs.c |  41 ++++++++-
>   drivers/vfio/vfio_main.c          |   3 +
>   kernel/irq/msi.c                  |   6 ++
>   11 files changed, 212 insertions(+), 56 deletions(-)
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
@ 2024-11-11 14:14   ` Marc Zyngier
  2024-11-12 22:13     ` Nicolin Chen
  2024-11-12 21:54   ` Nicolin Chen
  1 sibling, 1 reply; 25+ messages in thread
From: Marc Zyngier @ 2024-11-11 14:14 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Nicolin Chen, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Mon, 11 Nov 2024 13:09:20 +0000,
Robin Murphy <robin.murphy@arm.com> wrote:
> 
> On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > On ARM GIC systems and others, the target address of the MSI is translated
> > by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
> > IOMMU is disabled, the MSI address is programmed to the physical location
> > of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
> > page is behind the IOMMU, so the MSI address is programmed to an allocated
> > IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
> > the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
> > When a 2-stage translation is enabled, IOVA will be still used to program
> > the MSI address, though the mappings will be in two stages:
> >    IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000
> > (IPA stands for Intermediate Physical Address).
> > 
> > If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
> > IOVA is dynamically allocated from the top of the IOVA space. If attached
> > to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
> > fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
> > which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
> > 
> > So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
> > of the IOMMU translation (1-stage translation), since the IOVA for the ITS
> > page is fixed and known by kernel. However, with virtual machine enabling
> > a nested IOMMU translation (2-stage), a guest kernel directly controls the
> > stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
> > IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
> > kernel can't know that guest-level IOVA to program the MSI address.
> > 
> > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > guest kernel and relay it to the GIC driver in the host kernel, to program
> > the correct MSI IOVA. And this requires a new ioctl via VFIO.
> 
> Once VFIO has that information from userspace, though, do we really
> need the whole complicated dance to push it right down into the
> irqchip layer just so it can be passed back up again? AFAICS
> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already
> explicitly rewrites MSI-X vectors, so it seems like it should be
> pretty straightforward to override the message address in general at
> that level, without the lower layers having to be aware at all, no?

+1.

I would like to avoid polluting each and every interrupt controller
with usage-specific knowledge (they usually are brain-damaged enough).
We already have an indirection into the IOMMU subsystem and it
shouldn't be a big deal to intercept the message for all
implementations at this level.

I also wonder how to handle the case of braindead^Wwonderful platforms
where ITS transactions are not translated by the SMMU. Somehow, VFIO
should be made aware of this situation.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-11 14:14   ` Marc Zyngier
@ 2024-11-12 22:13     ` Nicolin Chen
  0 siblings, 0 replies; 25+ messages in thread
From: Nicolin Chen @ 2024-11-12 22:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Robin Murphy, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Mon, Nov 11, 2024 at 02:14:15PM +0000, Marc Zyngier wrote:
> On Mon, 11 Nov 2024 13:09:20 +0000,
> Robin Murphy <robin.murphy@arm.com> wrote:
> > On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > > guest kernel and relay it to the GIC driver in the host kernel, to program
> > > the correct MSI IOVA. And this requires a new ioctl via VFIO.
> >
> > Once VFIO has that information from userspace, though, do we really
> > need the whole complicated dance to push it right down into the
> > irqchip layer just so it can be passed back up again? AFAICS
> > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already
> > explicitly rewrites MSI-X vectors, so it seems like it should be
> > pretty straightforward to override the message address in general at
> > that level, without the lower layers having to be aware at all, no?
> 
> +1.
> 
> I would like to avoid polluting each and every interrupt controller
> with usage-specific knowledge (they usually are brain-damaged enough).
> We already have an indirection into the IOMMU subsystem and it
> shouldn't be a big deal to intercept the message for all
> implementations at this level.
> 
> I also wonder how to handle the case of braindead^Wwonderful platforms
> where ITS transactions are not translated by the SMMU. Somehow, VFIO
> should be made aware of this situation.

Perhaps we should do iommu_get_domain_for_dev(&vdev->pdev->dev) and
check the returned domain->type:
 * if (domain->type & __IOMMU_DOMAIN_PAGING): 1-stage translation
 * if (domain->type == IOMMU_DOMAIN_NESTED): 2-stage translation

And for this particular topic/series, we should do something like:
	if (vdev->msi_iovas && domain->type == IOMMU_DOMAIN_NESTED) {
		msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
		msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
	}
?

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
  2024-11-11 14:14   ` Marc Zyngier
@ 2024-11-12 21:54   ` Nicolin Chen
  2024-11-13  1:34     ` Jason Gunthorpe
  1 sibling, 1 reply; 25+ messages in thread
From: Nicolin Chen @ 2024-11-12 21:54 UTC (permalink / raw)
  To: Robin Murphy
  Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
> On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > guest kernel and relay it to the GIC driver in the host kernel, to program
> > the correct MSI IOVA. And this requires a new ioctl via VFIO.
> 
> Once VFIO has that information from userspace, though, do we really need
> the whole complicated dance to push it right down into the irqchip layer
> just so it can be passed back up again? AFAICS
> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
> rewrites MSI-X vectors, so it seems like it should be pretty
> straightforward to override the message address in general at that
> level, without the lower layers having to be aware at all, no?

Didn't see that clearly!! It works with a simple following override:
--------------------------------------------------------------------
@@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
                struct msi_msg msg;

                get_cached_msi_msg(irq, &msg);
+               if (vdev->msi_iovas) {
+                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
+                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
+               }
                pci_write_msi_msg(irq, &msg);
        }
 
--------------------------------------------------------------------

With that, I think we only need one VFIO change for this part :)

Thanks!
Nicolin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-12 21:54   ` Nicolin Chen
@ 2024-11-13  1:34     ` Jason Gunthorpe
  2024-11-13 21:11       ` Alex Williamson
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2024-11-13  1:34 UTC (permalink / raw)
  To: Nicolin Chen, tglx, alex.williamson
  Cc: Robin Murphy, maz, bhelgaas, leonro, shameerali.kolothum.thodi,
	dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
	eric.auger, ddutile, yebin10, brauner, apatel,
	shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
	linux-arm-kernel, linux-kernel, linux-pci, kvm

On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
> > On 2024-11-09 5:48 am, Nicolin Chen wrote:
> > > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > > guest kernel and relay it to the GIC driver in the host kernel, to program
> > > the correct MSI IOVA. And this requires a new ioctl via VFIO.
> > 
> > Once VFIO has that information from userspace, though, do we really need
> > the whole complicated dance to push it right down into the irqchip layer
> > just so it can be passed back up again? AFAICS
> > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
> > rewrites MSI-X vectors, so it seems like it should be pretty
> > straightforward to override the message address in general at that
> > level, without the lower layers having to be aware at all, no?
> 
> Didn't see that clearly!! It works with a simple following override:
> --------------------------------------------------------------------
> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>                 struct msi_msg msg;
> 
>                 get_cached_msi_msg(irq, &msg);
> +               if (vdev->msi_iovas) {
> +                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
> +                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
> +               }
>                 pci_write_msi_msg(irq, &msg);
>         }
>  
> --------------------------------------------------------------------
> 
> With that, I think we only need one VFIO change for this part :)

Wow, is that really OK from a layering perspective? The comment is
pretty clear on the intention that this is to resync the irq layer
view of the device with the physical HW.

Editing the msi_msg while doing that resync smells bad.

Also, this is only doing MSI-X, we should include normal MSI as
well. (it probably should have a resync too?)

I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
context)

I think there are many options here we just need to get a clearer
understanding what best fits the architecture of the interrupt
subsystem.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-13  1:34     ` Jason Gunthorpe
@ 2024-11-13 21:11       ` Alex Williamson
  2024-11-14 15:35         ` Robin Murphy
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Williamson @ 2024-11-13 21:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, tglx, Robin Murphy, maz, bhelgaas, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Tue, 12 Nov 2024 21:34:30 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
> > On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:  
> > > On 2024-11-09 5:48 am, Nicolin Chen wrote:  
> > > > To solve this problem the VMM should capture the MSI IOVA allocated by the
> > > > guest kernel and relay it to the GIC driver in the host kernel, to program
> > > > the correct MSI IOVA. And this requires a new ioctl via VFIO.  
> > > 
> > > Once VFIO has that information from userspace, though, do we really need
> > > the whole complicated dance to push it right down into the irqchip layer
> > > just so it can be passed back up again? AFAICS
> > > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
> > > rewrites MSI-X vectors, so it seems like it should be pretty
> > > straightforward to override the message address in general at that
> > > level, without the lower layers having to be aware at all, no?  
> > 
> > Didn't see that clearly!! It works with a simple following override:
> > --------------------------------------------------------------------
> > @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
> >                 struct msi_msg msg;
> > 
> >                 get_cached_msi_msg(irq, &msg);
> > +               if (vdev->msi_iovas) {
> > +                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
> > +                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
> > +               }
> >                 pci_write_msi_msg(irq, &msg);
> >         }
> >  
> > --------------------------------------------------------------------
> > 
> > With that, I think we only need one VFIO change for this part :)  
> 
> Wow, is that really OK from a layering perspective? The comment is
> pretty clear on the intention that this is to resync the irq layer
> view of the device with the physical HW.
> 
> Editing the msi_msg while doing that resync smells bad.
> 
> Also, this is only doing MSI-X, we should include normal MSI as
> well. (it probably should have a resync too?)

This was added for a specific IBM HBA that clears the vector table
during a built-in self test, so it's possible the MSI table being in
config space never had the same issue, or we just haven't encountered
it.  I don't expect anything else actually requires this.

> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
> context)

It seems suspect to me too.  In a sense it is still just synchronizing
the MSI address, but to a different address space.

Is it possible to do this with the existing write_msi_msg callback on
the msi descriptor?  For instance we could simply translate the msg
address and call pci_write_msi_msg() (while avoiding an infinite
recursion).  Or maybe there should be an xlate_msi_msg callback we can
register.  Or I suppose there might be a way to insert an irqchip that
does the translation on write.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-13 21:11       ` Alex Williamson
@ 2024-11-14 15:35         ` Robin Murphy
  2024-11-20 13:17           ` Eric Auger
  0 siblings, 1 reply; 25+ messages in thread
From: Robin Murphy @ 2024-11-14 15:35 UTC (permalink / raw)
  To: Alex Williamson, Jason Gunthorpe
  Cc: Nicolin Chen, tglx, maz, bhelgaas, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10,
	brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On 13/11/2024 9:11 pm, Alex Williamson wrote:
> On Tue, 12 Nov 2024 21:34:30 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
>> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
>>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
>>>> On 2024-11-09 5:48 am, Nicolin Chen wrote:
>>>>> To solve this problem the VMM should capture the MSI IOVA allocated by the
>>>>> guest kernel and relay it to the GIC driver in the host kernel, to program
>>>>> the correct MSI IOVA. And this requires a new ioctl via VFIO.
>>>>
>>>> Once VFIO has that information from userspace, though, do we really need
>>>> the whole complicated dance to push it right down into the irqchip layer
>>>> just so it can be passed back up again? AFAICS
>>>> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly
>>>> rewrites MSI-X vectors, so it seems like it should be pretty
>>>> straightforward to override the message address in general at that
>>>> level, without the lower layers having to be aware at all, no?
>>>
>>> Didn't see that clearly!! It works with a simple following override:
>>> --------------------------------------------------------------------
>>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>                  struct msi_msg msg;
>>>
>>>                  get_cached_msi_msg(irq, &msg);
>>> +               if (vdev->msi_iovas) {
>>> +                       msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
>>> +                       msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
>>> +               }
>>>                  pci_write_msi_msg(irq, &msg);
>>>          }
>>>   
>>> --------------------------------------------------------------------
>>>
>>> With that, I think we only need one VFIO change for this part :)
>>
>> Wow, is that really OK from a layering perspective? The comment is
>> pretty clear on the intention that this is to resync the irq layer
>> view of the device with the physical HW.
>>
>> Editing the msi_msg while doing that resync smells bad.
>>
>> Also, this is only doing MSI-X, we should include normal MSI as
>> well. (it probably should have a resync too?)
> 
> This was added for a specific IBM HBA that clears the vector table
> during a built-in self test, so it's possible the MSI table being in
> config space never had the same issue, or we just haven't encountered
> it.  I don't expect anything else actually requires this.

Yeah, I wasn't really suggesting to literally hook into this exact case; 
it was more just a general observation that if VFIO already has one 
justification for tinkering with pci_write_msi_msg() directly without 
going through the msi_domain layer, then adding another (wherever it 
fits best) can't be *entirely* unreasonable.

At the end of the day, the semantic here is that VFIO does know more 
than the IRQ layer, and does need to program the endpoint differently 
from what the irqchip assumes, so I don't see much benefit in dressing 
that up more than functionally necessary.

>> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
>> context)
> 
> It seems suspect to me too.  In a sense it is still just synchronizing
> the MSI address, but to a different address space.
> 
> Is it possible to do this with the existing write_msi_msg callback on
> the msi descriptor?  For instance we could simply translate the msg
> address and call pci_write_msi_msg() (while avoiding an infinite
> recursion).  Or maybe there should be an xlate_msi_msg callback we can
> register.  Or I suppose there might be a way to insert an irqchip that
> does the translation on write.  Thanks,

I'm far from keen on the idea, but if there really is an appetite for 
more indirection, then I guess the least-worst option would be yet 
another type of iommu_dma_cookie to work via the existing 
iommu_dma_compose_msi_msg() flow, with some interface for VFIO to update 
per-device addresses directly. But then it's still going to need some 
kind of "layering violation" for VFIO to poke the IRQ layer into 
re-composing and re-writing a message whenever userspace feels like 
changing an address, because we're fundamentally stepping outside the 
established lifecycle of a kernel-managed IRQ around which said layering 
was designed...

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-14 15:35         ` Robin Murphy
@ 2024-11-20 13:17           ` Eric Auger
  2024-11-20 14:03             ` Jason Gunthorpe
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Auger @ 2024-11-20 13:17 UTC (permalink / raw)
  To: Robin Murphy, Alex Williamson, Jason Gunthorpe
  Cc: Nicolin Chen, tglx, maz, bhelgaas, leonro,
	shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm



On 11/14/24 16:35, Robin Murphy wrote:
> On 13/11/2024 9:11 pm, Alex Williamson wrote:
>> On Tue, 12 Nov 2024 21:34:30 -0400
>> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>
>>> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
>>>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
>>>>> On 2024-11-09 5:48 am, Nicolin Chen wrote:
>>>>>> To solve this problem the VMM should capture the MSI IOVA
>>>>>> allocated by the
>>>>>> guest kernel and relay it to the GIC driver in the host kernel,
>>>>>> to program
>>>>>> the correct MSI IOVA. And this requires a new ioctl via VFIO.
>>>>>
>>>>> Once VFIO has that information from userspace, though, do we
>>>>> really need
>>>>> the whole complicated dance to push it right down into the irqchip
>>>>> layer
>>>>> just so it can be passed back up again? AFAICS
>>>>> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already
>>>>> explicitly
>>>>> rewrites MSI-X vectors, so it seems like it should be pretty
>>>>> straightforward to override the message address in general at that
>>>>> level, without the lower layers having to be aware at all, no?
>>>>
>>>> Didn't see that clearly!! It works with a simple following override:
>>>> --------------------------------------------------------------------
>>>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct
>>>> vfio_pci_core_device *vdev,
>>>>                  struct msi_msg msg;
>>>>
>>>>                  get_cached_msi_msg(irq, &msg);
>>>> +               if (vdev->msi_iovas) {
>>>> +                       msg.address_lo =
>>>> lower_32_bits(vdev->msi_iovas[vector]);
>>>> +                       msg.address_hi =
>>>> upper_32_bits(vdev->msi_iovas[vector]);
>>>> +               }
>>>>                  pci_write_msi_msg(irq, &msg);
>>>>          }
>>>>   --------------------------------------------------------------------
>>>>
>>>> With that, I think we only need one VFIO change for this part :)
>>>
>>> Wow, is that really OK from a layering perspective? The comment is
>>> pretty clear on the intention that this is to resync the irq layer
>>> view of the device with the physical HW.
>>>
>>> Editing the msi_msg while doing that resync smells bad.
>>>
>>> Also, this is only doing MSI-X, we should include normal MSI as
>>> well. (it probably should have a resync too?)
>>
>> This was added for a specific IBM HBA that clears the vector table
>> during a built-in self test, so it's possible the MSI table being in
>> config space never had the same issue, or we just haven't encountered
>> it.  I don't expect anything else actually requires this.
>
> Yeah, I wasn't really suggesting to literally hook into this exact
> case; it was more just a general observation that if VFIO already has
> one justification for tinkering with pci_write_msi_msg() directly
> without going through the msi_domain layer, then adding another
> (wherever it fits best) can't be *entirely* unreasonable.
>
> At the end of the day, the semantic here is that VFIO does know more
> than the IRQ layer, and does need to program the endpoint differently
> from what the irqchip assumes, so I don't see much benefit in dressing
> that up more than functionally necessary.
>
>>> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for
>>> context)
>>
>> It seems suspect to me too.  In a sense it is still just synchronizing
>> the MSI address, but to a different address space.
>>
>> Is it possible to do this with the existing write_msi_msg callback on
>> the msi descriptor?  For instance we could simply translate the msg
>> address and call pci_write_msi_msg() (while avoiding an infinite
>> recursion).  Or maybe there should be an xlate_msi_msg callback we can
>> register.  Or I suppose there might be a way to insert an irqchip that
>> does the translation on write.  Thanks,
>
> I'm far from keen on the idea, but if there really is an appetite for
> more indirection, then I guess the least-worst option would be yet
> another type of iommu_dma_cookie to work via the existing
> iommu_dma_compose_msi_msg() flow, with some interface for VFIO to
> update per-device addresses direcitly. But then it's still going to
> need some kind of "layering violation" for VFIO to poke the IRQ layer
> into re-composing and re-writing a message whenever userspace feels
> like changing an address, because we're fundamentally stepping outside
> the established lifecycle of a kernel-managed IRQ around which said
> layering was designed...

for the record, the first integration was based on such distinct
iommu_dma_cookie

[PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part) <https://lore.kernel.org/all/20210411111228.14386-1-eric.auger@redhat.com/#r>, patches 8 - 11

Thanks

Eric



>
> Thanks,
> Robin.
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-20 13:17           ` Eric Auger
@ 2024-11-20 14:03             ` Jason Gunthorpe
  2024-11-28 11:15               ` Thomas Gleixner
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2024-11-20 14:03 UTC (permalink / raw)
  To: Eric Auger
  Cc: Robin Murphy, Alex Williamson, Nicolin Chen, tglx, maz, bhelgaas,
	leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Wed, Nov 20, 2024 at 02:17:46PM +0100, Eric Auger wrote:
> > Yeah, I wasn't really suggesting to literally hook into this exact
> > case; it was more just a general observation that if VFIO already has
> > one justification for tinkering with pci_write_msi_msg() directly
> > without going through the msi_domain layer, then adding another
> > (wherever it fits best) can't be *entirely* unreasonable.

I'm not sure that we can assume VFIO is the only thing touching the
interrupt programming.

I think there is a KVM path, and also the /proc/ path that will change
the MSI affinity on the fly for a VFIO created IRQ. If the platform
requires a MSI update to do this (ie encoding affinity in the
add/data, not using IRQ remapping HW) then we still need to ensure the
correct MSI address is hooked in.

> >> Is it possible to do this with the existing write_msi_msg callback on
> >> the msi descriptor?  For instance we could simply translate the msg
> >> address and call pci_write_msi_msg() (while avoiding an infinite
> >> recursion).  Or maybe there should be an xlate_msi_msg callback we can
> >> register.  Or I suppose there might be a way to insert an irqchip that
> >> does the translation on write.  Thanks,
> >
> > I'm far from keen on the idea, but if there really is an appetite for
> > more indirection, then I guess the least-worst option would be yet
> > another type of iommu_dma_cookie to work via the existing
> > iommu_dma_compose_msi_msg() flow, 

For this direction I think I would turn iommu_dma_compose_msi_msg()
into a function pointer stored in the iommu_domain and have
vfio/iommufd provide its own implementation. The thing that is in
control of the domain's translation should be providing the msi_msg.

> > update per-device addresses direcitly. But then it's still going to
> > need some kind of "layering violation" for VFIO to poke the IRQ layer
> > into re-composing and re-writing a message whenever userspace feels
> > like changing an address

I think we'd need to get into the affinity update path and force a MSI
write as well, even if the platform isn't changing the MSI for
affinity. Processing a vMSI entry update would be two steps where we
update the MSI addr in VFIO and then set the affinity.

> for the record, the first integration was based on such distinct
> iommu_dma_cookie
> 
> [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part) <https://lore.kernel.org/all/20210411111228.14386-1-eric.auger@redhat.com/#r>, patches 8 - 11

There are some significant differences from that series with this idea:

 - We want to maintain a per-MSI-index/per-device lookup table. It
   is not just a simple cookie, the msi_desc->dev &&
   msi_desc->msi_index have to be matched against what userspace
   provides in the per-vMSI IOCTL

 - There would be no implicit progamming of the stage 2, this will be
   done directly in userspace by creating an IOAS area for the ITS page

 - It shouldn't have any sort of dynamic allocation behavior. It is an
   error for the kernel to ask for an msi_desc that userspace hasn't
   provided a mapping for
 
 - It should work well with nested and non-nested domains

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
  2024-11-20 14:03             ` Jason Gunthorpe
@ 2024-11-28 11:15               ` Thomas Gleixner
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2024-11-28 11:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Eric Auger
  Cc: Robin Murphy, Alex Williamson, Nicolin Chen, maz, bhelgaas,
	leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa,
	andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner,
	apatel, shivamurthy.shastri, anna-maria, nipun.gupta,
	marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci,
	kvm

On Wed, Nov 20 2024 at 10:03, Jason Gunthorpe wrote:
> On Wed, Nov 20, 2024 at 02:17:46PM +0100, Eric Auger wrote:
>> > Yeah, I wasn't really suggesting to literally hook into this exact
>> > case; it was more just a general observation that if VFIO already has
>> > one justification for tinkering with pci_write_msi_msg() directly
>> > without going through the msi_domain layer, then adding another
>> > (wherever it fits best) can't be *entirely* unreasonable.
>
> I'm not sure that we can assume VFIO is the only thing touching the
> interrupt programming.

Correct.

> I think there is a KVM path, and also the /proc/ path that will change
> the MSI affinity on the fly for a VFIO created IRQ. If the platform
> requires a MSI update to do this (ie encoding affinity in the
> add/data, not using IRQ remapping HW) then we still need to ensure the
> correct MSI address is hooked in.

Yes.

>> >> Is it possible to do this with the existing write_msi_msg callback on
>> >> the msi descriptor?  For instance we could simply translate the msg
>> >> address and call pci_write_msi_msg() (while avoiding an infinite
>> >> recursion).  Or maybe there should be an xlate_msi_msg callback we can
>> >> register.  Or I suppose there might be a way to insert an irqchip that
>> >> does the translation on write.  Thanks,
>> >
>> > I'm far from keen on the idea, but if there really is an appetite for
>> > more indirection, then I guess the least-worst option would be yet
>> > another type of iommu_dma_cookie to work via the existing
>> > iommu_dma_compose_msi_msg() flow, 
>
> For this direction I think I would turn iommu_dma_compose_msi_msg()
> into a function pointer stored in the iommu_domain and have
> vfio/iommufd provide its own implementation. The thing that is in
> control of the domain's translation should be providing the msi_msg.

Yes. The resulting cached message should be writeable as is.

>> > update per-device addresses direcitly. But then it's still going to
>> > need some kind of "layering violation" for VFIO to poke the IRQ layer
>> > into re-composing and re-writing a message whenever userspace feels
>> > like changing an address
>
> I think we'd need to get into the affinity update path and force a MSI
> write as well, even if the platform isn't changing the MSI for
> affinity. Processing a vMSI entry update would be two steps where we
> update the MSI addr in VFIO and then set the affinity.

The affinity callback of the domain/chip can return IRQ_SET_MASK_OK_DONE
which prevents recomposing and writing the message.

So you want a explicit update/write of the message similar to what
msi_domain_activate() does.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2024-11-28 11:15 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-09  5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen
2024-11-09  9:43   ` kernel test robot
2024-11-09  5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen
2024-11-11  9:30   ` Andy Shevchenko
2024-11-09  5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen
2024-11-11  9:33   ` Andy Shevchenko
2024-11-09  5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen
2024-11-09 21:52   ` kernel test robot
2024-11-10  9:51   ` kernel test robot
2024-11-11  9:34   ` Andy Shevchenko
2024-11-12 22:14     ` Nicolin Chen
2024-11-09  5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen
2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy
2024-11-11 14:14   ` Marc Zyngier
2024-11-12 22:13     ` Nicolin Chen
2024-11-12 21:54   ` Nicolin Chen
2024-11-13  1:34     ` Jason Gunthorpe
2024-11-13 21:11       ` Alex Williamson
2024-11-14 15:35         ` Robin Murphy
2024-11-20 13:17           ` Eric Auger
2024-11-20 14:03             ` Jason Gunthorpe
2024-11-28 11:15               ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.