* [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector
@ 2024-11-09 5:48 Nicolin Chen
2024-11-09 5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen
` (7 more replies)
0 siblings, 8 replies; 25+ messages in thread
From: Nicolin Chen @ 2024-11-09 5:48 UTC (permalink / raw)
To: maz, tglx, bhelgaas, alex.williamson
Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal,
kevin.tian, smostafa, andriy.shevchenko, reinette.chatre,
eric.auger, ddutile, yebin10, brauner, apatel,
shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas,
linux-arm-kernel, linux-kernel, linux-pci, kvm
On ARM GIC systems and others, the target address of the MSI is translated
by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
IOMMU is disabled, the MSI address is programmed to the physical location
of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
page is behind the IOMMU, so the MSI address is programmed to an allocated
IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
When a 2-stage translation is enabled, IOVA will be still used to program
the MSI address, though the mappings will be in two stages:
IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000
(IPA stands for Intermediate Physical Address).
If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
IOVA is dynamically allocated from the top of the IOVA space. If attached
to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
of the IOMMU translation (1-stage translation), since the IOVA for the ITS
page is fixed and known by kernel. However, with virtual machine enabling
a nested IOMMU translation (2-stage), a guest kernel directly controls the
stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
kernel can't know that guest-level IOVA to program the MSI address.
To solve this problem the VMM should capture the MSI IOVA allocated by the
guest kernel and relay it to the GIC driver in the host kernel, to program
the correct MSI IOVA. And this requires a new ioctl via VFIO.
Extend the VFIO path to allow an MSI target IOVA to be forwarded into the
kernel and pushed down to the GIC driver.
Add VFIO ioctl VFIO_IRQ_SET_ACTION_PREPARE with VFIO_IRQ_SET_DATA_MSI_IOVA
to carry the data.
The downstream calltrace is quite long from the VFIO to the ITS driver. So
in order to carry the MSI IOVA from the top to its_irq_domain_alloc(), add
patches in a leaf-to-root order:
vfio_pci_core_ioctl:
vfio_pci_set_irqs_ioctl:
vfio_pci_set_msi_prepare: // PATCH-7
pci_alloc_irq_vectors_iovas: // PATCH-6
__pci_alloc_irq_vectors: // PATCH-5
__pci_enable_msi/msix_range: // PATCH-4
msi/msix_capability_init: // PATCH-3
msi/msix_setup_msi_descs:
msi_insert_msi_desc(); // PATCH-1
pci_msi_setup_msi_irqs:
msi_domain_alloc_irqs_all_locked:
__msi_domain_alloc_locked:
__msi_domain_alloc_irqs:
__irq_domain_alloc_irqs:
irq_domain_alloc_irqs_locked:
irq_domain_alloc_irqs_hierarchy:
msi_domain_alloc:
irq_domain_alloc_irqs_parent:
its_irq_domain_alloc(); // PATCH-2
Note that this series solves half the problem, since it only allows kernel
to set the physical PCI MSI/MSI-X on the device with the correct head IOVA
of a 2-stage translation, where the guest kernel does the stage-1 mapping
that MSI IOVA (0xEEEE0000) to its own vITS page (0x80900000) while missing
the stage-2 mapping from that IPA to the physical ITS page:
0xEEEE0000 ===> 0x80900000 =x=> 0x20200000
A followup series should fill that gap, doing the stage-2 mapping from the
vITS page 0x80900000 to the physical ITS page (0x20200000), likely via new
IOMMUFD ioctl. Once VMM sets up this stage-2 mapping, VM will act the same
as bare metal relying on a running kernel to handle the stage-1 mapping:
0xEEEE0000 ===> 0x80900000 ===> 0x20200000
This series (prototype) is on Github:
https://github.com/nicolinc/iommufd/commits/vfio_msi_giova-rfcv1/
It's tested by hacking the host kernel to hard-code a stage-2 mapping.
Thanks!
Nicolin
Nicolin Chen (7):
genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell
address
irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset
PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc
PCI/MSI: Allow __pci_enable_msi_range to pass in iova
PCI/MSI: Extract a common __pci_alloc_irq_vectors function
PCI/MSI: Add pci_alloc_irq_vectors_iovas helper
vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE
drivers/pci/msi/msi.h | 3 +-
include/linux/msi.h | 11 +++
include/linux/pci.h | 18 ++++
include/linux/vfio_pci_core.h | 1 +
include/uapi/linux/vfio.h | 8 +-
drivers/irqchip/irq-gic-v3-its.c | 21 ++++-
drivers/pci/msi/api.c | 136 ++++++++++++++++++++----------
drivers/pci/msi/msi.c | 20 +++--
drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++-
drivers/vfio/vfio_main.c | 3 +
kernel/irq/msi.c | 6 ++
11 files changed, 212 insertions(+), 56 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 25+ messages in thread* [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen @ 2024-11-09 5:48 ` Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen ` (6 subsequent siblings) 7 siblings, 0 replies; 25+ messages in thread From: Nicolin Chen @ 2024-11-09 5:48 UTC (permalink / raw) To: maz, tglx, bhelgaas, alex.williamson Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm Currently, the IOVA and its mapping (to physical doorbell address) is done by the dma-iommu via the iommu_cookie in the struct msi_desc, which is the structure used by three parties: genirq, irqchip and dma-iommu. However, when dealing with a nested translation on ARM64, the MSI doorbell address is behind the SMMU (IOMMU on ARM64), thus HW needs to be programed with the stage-1 IOVA, i.e. guest-level IOVA, rather than asking dma-iommu to allocate one. To support a guest-programmable pathway, first we need to make sure struct msi_desc will allow a preset IOVA v.s. using iommu_cookie. Add an msi_iova to the structure and init its value to PHYS_ADDR_MAX. And provide a helper for drivers to get the msi_iova out of an msi_desc object. A following patch will change msi_setup_msi_descs and msix_setup_msi_descs to call msi_domain_insert_msi_desc and finish the actual value forwarding. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- include/linux/msi.h | 11 +++++++++++ kernel/irq/msi.c | 6 ++++++ 2 files changed, 17 insertions(+) diff --git a/include/linux/msi.h b/include/linux/msi.h index b10093c4d00e..873094743065 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -185,6 +185,7 @@ struct msi_desc { struct irq_affinity_desc *affinity; #ifdef CONFIG_IRQ_MSI_IOMMU const void *iommu_cookie; + dma_addr_t msi_iova; #endif #ifdef CONFIG_SYSFS struct device_attribute *sysfs_attrs; @@ -296,6 +297,11 @@ static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, { desc->iommu_cookie = iommu_cookie; } + +static inline dma_addr_t msi_desc_get_iova(struct msi_desc *desc) +{ + return desc->msi_iova; +} #else static inline const void *msi_desc_get_iommu_cookie(struct msi_desc *desc) { @@ -306,6 +312,11 @@ static inline void msi_desc_set_iommu_cookie(struct msi_desc *desc, const void *iommu_cookie) { } + +static inline dma_addr_t msi_desc_get_iova(struct msi_desc *desc) +{ + return PHYS_ADDR_MAX; +} #endif int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid, diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index 3a24d6b5f559..f3159ec0f036 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -81,6 +81,9 @@ static struct msi_desc *msi_alloc_desc(struct device *dev, int nvec, desc->dev = dev; desc->nvec_used = nvec; +#ifdef CONFIG_IRQ_MSI_IOMMU + desc->msi_iova = PHYS_ADDR_MAX; +#endif if (affinity) { desc->affinity = kmemdup_array(affinity, nvec, sizeof(*desc->affinity), GFP_KERNEL); if (!desc->affinity) { @@ -158,6 +161,9 @@ int msi_domain_insert_msi_desc(struct device *dev, unsigned int domid, /* Copy type specific data to the new descriptor. */ desc->pci = init_desc->pci; +#ifdef CONFIG_IRQ_MSI_IOMMU + desc->msi_iova = init_desc->msi_iova; +#endif return msi_insert_desc(dev, desc, domid, init_desc->msi_index); } -- 2.43.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen @ 2024-11-09 5:48 ` Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen ` (5 subsequent siblings) 7 siblings, 0 replies; 25+ messages in thread From: Nicolin Chen @ 2024-11-09 5:48 UTC (permalink / raw) To: maz, tglx, bhelgaas, alex.williamson Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm Now struct msi_desc can carry a preset IOVA for MSI doorbell address. This is typically preset by user space when engaging a 2-stage translation. So, use the preset IOVA instead of kernel-level IOVA allocations in dma-iommu. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/irqchip/irq-gic-v3-its.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index ab597e74ba08..bc1768576546 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1723,6 +1723,8 @@ static u64 its_irq_get_msi_base(struct its_device *its_dev) static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg) { struct its_device *its_dev = irq_data_get_irq_chip_data(d); + struct msi_desc *desc = irq_data_get_msi_desc(d); + dma_addr_t iova = msi_desc_get_iova(desc); struct its_node *its; u64 addr; @@ -1733,7 +1735,13 @@ static void its_irq_compose_msi_msg(struct irq_data *d, struct msi_msg *msg) msg->address_hi = upper_32_bits(addr); msg->data = its_get_event_id(d); - iommu_dma_compose_msi_msg(irq_data_get_msi_desc(d), msg); + /* Bypass iommu_dma_compose_msi_msg if msi_iova is preset */ + if (iova == PHYS_ADDR_MAX) { + iommu_dma_compose_msi_msg(desc, msg); + } else { + msg->address_lo = lower_32_bits(iova); + msg->address_hi = upper_32_bits(iova); + } } static int its_irq_set_irqchip_state(struct irq_data *d, @@ -3570,6 +3578,7 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, { msi_alloc_info_t *info = args; struct its_device *its_dev = info->scratchpad[0].ptr; + dma_addr_t iova = msi_desc_get_iova(info->desc); struct its_node *its = its_dev->its; struct irq_data *irqd; irq_hw_number_t hwirq; @@ -3580,9 +3589,13 @@ static int its_irq_domain_alloc(struct irq_domain *domain, unsigned int virq, if (err) return err; - err = iommu_dma_prepare_msi(info->desc, its->get_msi_base(its_dev)); - if (err) - return err; + /* Bypass iommu_dma_prepare_msi if msi_iova is preset */ + if (iova == PHYS_ADDR_MAX) { + err = iommu_dma_prepare_msi(info->desc, + its->get_msi_base(its_dev)); + if (err) + return err; + } for (i = 0; i < nr_irqs; i++) { err = its_irq_gic_domain_alloc(domain, virq + i, hwirq + i); -- 2.43.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen @ 2024-11-09 5:48 ` Nicolin Chen 2024-11-09 9:43 ` kernel test robot 2024-11-09 5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen ` (4 subsequent siblings) 7 siblings, 1 reply; 25+ messages in thread From: Nicolin Chen @ 2024-11-09 5:48 UTC (permalink / raw) To: maz, tglx, bhelgaas, alex.williamson Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm msi_setup_msi_descs and msix_setup_msi_descs are the two callers of genirq helper msi_domain_insert_msi_desc that is now ready for a preset msi_iova. So, do the same in these two callers. Note MSIx supports multiple entries, so use struct msix_entry to pass msi_iova in. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- include/linux/pci.h | 1 + drivers/pci/msi/msi.c | 18 ++++++++++++++---- 2 files changed, 15 insertions(+), 4 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index 573b4c4c2be6..68ebb9d42f7f 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1652,6 +1652,7 @@ int pci_set_vga_state(struct pci_dev *pdev, bool decode, struct msix_entry { u32 vector; /* Kernel uses to write allocated vector */ u16 entry; /* Driver uses to specify entry, OS writes */ + u64 iova; /* Kernel uses to override doorbell address */ }; #ifdef CONFIG_PCI_MSI diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c index 3a45879d85db..95caa81d3421 100644 --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -282,7 +282,7 @@ static void pci_msi_set_enable(struct pci_dev *dev, int enable) pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control); } -static int msi_setup_msi_desc(struct pci_dev *dev, int nvec, +static int msi_setup_msi_desc(struct pci_dev *dev, int nvec, dma_addr_t iova, struct irq_affinity_desc *masks) { struct msi_desc desc; @@ -312,6 +312,10 @@ static int msi_setup_msi_desc(struct pci_dev *dev, int nvec, else desc.pci.mask_pos = dev->msi_cap + PCI_MSI_MASK_32; +#ifdef CONFIG_IRQ_MSI_IOMMU + desc.msi_iova = iova; +#endif + /* Save the initial mask status */ if (desc.pci.msi_attrib.can_mask) pci_read_config_dword(dev, desc.pci.mask_pos, &desc.pci.msi_mask); @@ -349,7 +353,7 @@ static int msi_verify_entries(struct pci_dev *dev) * which could have been allocated. */ static int msi_capability_init(struct pci_dev *dev, int nvec, - struct irq_affinity *affd) + struct irq_affinity *affd, dma_addr_t iova) { struct irq_affinity_desc *masks = NULL; struct msi_desc *entry, desc; @@ -370,7 +374,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, masks = irq_create_affinity_masks(nvec, affd); msi_lock_descs(&dev->dev); - ret = msi_setup_msi_desc(dev, nvec, masks); + ret = msi_setup_msi_desc(dev, nvec, iova, masks); if (ret) goto fail; @@ -456,7 +460,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, return -ENOSPC; } - rc = msi_capability_init(dev, nvec, affd); + rc = msi_capability_init(dev, nvec, affd, PHYS_ADDR_MAX); if (rc == 0) return nvec; @@ -625,6 +629,12 @@ static int msix_setup_msi_descs(struct pci_dev *dev, struct msix_entry *entries, memset(&desc, 0, sizeof(desc)); for (i = 0, curmsk = masks; i < nvec; i++, curmsk++) { +#ifdef CONFIG_IRQ_MSI_IOMMU + if (entries && entries[i].iova) + desc.msi_iova = (dma_addr_t)entries[i].iova; + else + desc.msi_iova = PHYS_ADDR_MAX; +#endif desc.msi_index = entries ? entries[i].entry : i; desc.affinity = masks ? curmsk : NULL; desc.pci.msi_attrib.is_virtual = desc.msi_index >= vec_count; -- 2.43.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc 2024-11-09 5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen @ 2024-11-09 9:43 ` kernel test robot 0 siblings, 0 replies; 25+ messages in thread From: kernel test robot @ 2024-11-09 9:43 UTC (permalink / raw) To: Nicolin Chen; +Cc: oe-kbuild-all Hi Nicolin, [This is a private test report for your RFC patch.] kernel test robot noticed the following build warnings: [auto build test WARNING on pci/next] [also build test WARNING on pci/for-linus awilliam-vfio/next tip/irq/core linus/master awilliam-vfio/for-linus v6.12-rc6 next-20241108] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Nicolin-Chen/genirq-msi-Allow-preset-IOVA-in-struct-msi_desc-for-MSI-doorbell-address/20241109-135311 base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next patch link: https://lore.kernel.org/r/32fe78c5ec4a08bf0414d80ce9ed9121270b9cb0.1731130093.git.nicolinc%40nvidia.com patch subject: [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc config: x86_64-rhel-8.3 (https://download.01.org/0day-ci/archive/20241109/202411091750.nGnflLUO-lkp@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241109/202411091750.nGnflLUO-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411091750.nGnflLUO-lkp@intel.com/ All warnings (new ones prefixed by >>): >> drivers/pci/msi/msi.c:357: warning: Function parameter or struct member 'iova' not described in 'msi_capability_init' vim +357 drivers/pci/msi/msi.c f144d1496b47e74 drivers/pci/msi.c Benjamin Herrenschmidt 2014-10-03 342 ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 343 /** ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 344 * msi_capability_init - configure device's MSI capability structure ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 345 * @dev: pointer to the pci_dev data structure of MSI device function 1c8d7b0a562da06 drivers/pci/msi.c Matthew Wilcox 2009-03-17 346 * @nvec: number of interrupts to allocate f6b6aefee70aa52 drivers/pci/msi.c Bjorn Helgaas 2019-05-30 347 * @affd: description of automatic IRQ affinity assignments (may be %NULL) ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 348 * 1c8d7b0a562da06 drivers/pci/msi.c Matthew Wilcox 2009-03-17 349 * Setup the MSI capability structure of the device with the requested 1c8d7b0a562da06 drivers/pci/msi.c Matthew Wilcox 2009-03-17 350 * number of interrupts. A return value of zero indicates the successful f6b6aefee70aa52 drivers/pci/msi.c Bjorn Helgaas 2019-05-30 351 * setup of an entry with the new MSI IRQ. A negative return value indicates 1c8d7b0a562da06 drivers/pci/msi.c Matthew Wilcox 2009-03-17 352 * an error, and a positive return value indicates the number of interrupts 1c8d7b0a562da06 drivers/pci/msi.c Matthew Wilcox 2009-03-17 353 * which could have been allocated. 1c8d7b0a562da06 drivers/pci/msi.c Matthew Wilcox 2009-03-17 354 */ 61e1c5905290efe drivers/pci/msi.c Christoph Hellwig 2016-11-08 355 static int msi_capability_init(struct pci_dev *dev, int nvec, 77c6ca404bd701e drivers/pci/msi/msi.c Nicolin Chen 2024-11-08 356 struct irq_affinity *affd, dma_addr_t iova) ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 @357 { 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 358 struct irq_affinity_desc *masks = NULL; 9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh 2024-06-24 359 struct msi_desc *entry, desc; f465136d7287538 drivers/pci/msi.c Gavin Shan 2013-04-04 360 int ret; ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 361 d2a463b297415ca drivers/pci/msi/msi.c Thomas Gleixner 2022-11-11 362 /* Reject multi-MSI early on irq domain enabled architectures */ d2a463b297415ca drivers/pci/msi/msi.c Thomas Gleixner 2022-11-11 363 if (nvec > 1 && !pci_msi_domain_supports(dev, MSI_FLAG_MULTI_PCI_MSI, ALLOW_LEGACY)) d2a463b297415ca drivers/pci/msi/msi.c Thomas Gleixner 2022-11-11 364 return 1; d2a463b297415ca drivers/pci/msi/msi.c Thomas Gleixner 2022-11-11 365 c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 366 /* c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 367 * Disable MSI during setup in the hardware, but mark it enabled c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 368 * so that setup code can evaluate it. c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 369 */ c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 370 pci_msi_set_enable(dev, 0); c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 371 dev->msi_enabled = 1; 110828c9cdce6e8 drivers/pci/msi.c Matthew Wilcox 2009-06-16 372 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 373 if (affd) 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 374 masks = irq_create_affinity_masks(nvec, affd); 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 375 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 376 msi_lock_descs(&dev->dev); 77c6ca404bd701e drivers/pci/msi/msi.c Nicolin Chen 2024-11-08 377 ret = msi_setup_msi_desc(dev, nvec, iova, masks); 71020a3c0dff4a0 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 378 if (ret) c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 379 goto fail; 1ce03373a7f4b5f drivers/pci/msi.c Eric W. Biederman 2006-10-04 380 f6b6aefee70aa52 drivers/pci/msi.c Bjorn Helgaas 2019-05-30 381 /* All MSIs are unmasked by default; mask them all */ ae24e28fef14687 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 382 entry = msi_first_desc(&dev->dev, MSI_DESC_ALL); 446a98b19fd6da9 drivers/pci/msi.c Thomas Gleixner 2021-07-29 383 pci_msi_mask(entry, msi_multi_mask(entry)); 9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh 2024-06-24 384 /* 9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh 2024-06-24 385 * Copy the MSI descriptor for the error path because 9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh 2024-06-24 386 * pci_msi_setup_msi_irqs() will free it for the hierarchical 9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh 2024-06-24 387 * interrupt domain case. 9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh 2024-06-24 388 */ 9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh 2024-06-24 389 memcpy(&desc, entry, sizeof(desc)); f2440d9acbe866b drivers/pci/msi.c Matthew Wilcox 2009-03-17 390 ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 391 /* Configure MSI capability structure */ 8e047adae969701 drivers/pci/msi.c Jiang Liu 2014-11-15 392 ret = pci_msi_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI); 8eb5ce3f78a5e5d drivers/pci/msi.c Thomas Gleixner 2021-07-29 393 if (ret) 8eb5ce3f78a5e5d drivers/pci/msi.c Thomas Gleixner 2021-07-29 394 goto err; f7feaca77d6ad6b drivers/pci/msi.c Eric W. Biederman 2007-01-28 395 f144d1496b47e74 drivers/pci/msi.c Benjamin Herrenschmidt 2014-10-03 396 ret = msi_verify_entries(dev); 8eb5ce3f78a5e5d drivers/pci/msi.c Thomas Gleixner 2021-07-29 397 if (ret) 8eb5ce3f78a5e5d drivers/pci/msi.c Thomas Gleixner 2021-07-29 398 goto err; f144d1496b47e74 drivers/pci/msi.c Benjamin Herrenschmidt 2014-10-03 399 ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 400 /* Set MSI enabled bits */ ba698ad4b7e466c drivers/pci/msi.c David Miller 2007-10-25 401 pci_intx_for_msi(dev, 0); 61b64abd399fa4b drivers/pci/msi.c Michael S. Tsirkin 2015-05-07 402 pci_msi_set_enable(dev, 1); ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 403 5f2269916b0e509 drivers/pci/msi.c Jiang Liu 2015-07-30 404 pcibios_free_irq(dev); 7fe3730de729b75 drivers/pci/msi.c Michael Ellerman 2007-04-18 405 dev->irq = entry->irq; 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 406 goto unlock; 8eb5ce3f78a5e5d drivers/pci/msi.c Thomas Gleixner 2021-07-29 407 8eb5ce3f78a5e5d drivers/pci/msi.c Thomas Gleixner 2021-07-29 408 err: 9eee5330656bf92 drivers/pci/msi/msi.c Mostafa Saleh 2024-06-24 409 pci_msi_unmask(&desc, msi_multi_mask(&desc)); b12d0bec385b7a5 drivers/pci/msi/msi.c Ahmed S. Darwish 2022-11-11 410 pci_free_msi_irqs(dev); c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 411 fail: c7ecb95ca6a80b2 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-10 412 dev->msi_enabled = 0; 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 413 unlock: 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 414 msi_unlock_descs(&dev->dev); 5512c5eaf533a98 drivers/pci/msi/msi.c Thomas Gleixner 2021-12-06 415 kfree(masks); 8eb5ce3f78a5e5d drivers/pci/msi.c Thomas Gleixner 2021-07-29 416 return ret; ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 417 } ^1da177e4c3f415 drivers/pci/msi.c Linus Torvalds 2005-04-16 418 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen ` (2 preceding siblings ...) 2024-11-09 5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen @ 2024-11-09 5:48 ` Nicolin Chen 2024-11-11 9:30 ` Andy Shevchenko 2024-11-09 5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen ` (3 subsequent siblings) 7 siblings, 1 reply; 25+ messages in thread From: Nicolin Chen @ 2024-11-09 5:48 UTC (permalink / raw) To: maz, tglx, bhelgaas, alex.williamson Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm The previous patch passes in the msi_iova to msi_capability_init, so this allows its caller to do the same. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/pci/msi/msi.h | 3 ++- drivers/pci/msi/api.c | 6 ++++-- drivers/pci/msi/msi.c | 4 ++-- 3 files changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/pci/msi/msi.h b/drivers/pci/msi/msi.h index ee53cf079f4e..8009d69bf9a5 100644 --- a/drivers/pci/msi/msi.h +++ b/drivers/pci/msi/msi.h @@ -93,7 +93,8 @@ extern int pci_msi_enable; void pci_msi_shutdown(struct pci_dev *dev); void pci_msix_shutdown(struct pci_dev *dev); void pci_free_msi_irqs(struct pci_dev *dev); -int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, struct irq_affinity *affd); +int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, + struct irq_affinity *affd, dma_addr_t iova); int __pci_enable_msix_range(struct pci_dev *dev, struct msix_entry *entries, int minvec, int maxvec, struct irq_affinity *affd, int flags); void __pci_restore_msi_state(struct pci_dev *dev); diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c index b956ce591f96..99ade7f69cd4 100644 --- a/drivers/pci/msi/api.c +++ b/drivers/pci/msi/api.c @@ -29,7 +29,8 @@ */ int pci_enable_msi(struct pci_dev *dev) { - int rc = __pci_enable_msi_range(dev, 1, 1, NULL); + int rc = __pci_enable_msi_range(dev, 1, 1, NULL, + PHYS_ADDR_MAX); if (rc < 0) return rc; return 0; @@ -274,7 +275,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, } if (flags & PCI_IRQ_MSI) { - nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd); + nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, + affd, PHYS_ADDR_MAX); if (nvecs > 0) return nvecs; } diff --git a/drivers/pci/msi/msi.c b/drivers/pci/msi/msi.c index 95caa81d3421..25da0435c674 100644 --- a/drivers/pci/msi/msi.c +++ b/drivers/pci/msi/msi.c @@ -417,7 +417,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec, } int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, - struct irq_affinity *affd) + struct irq_affinity *affd, dma_addr_t iova) { int nvec; int rc; @@ -460,7 +460,7 @@ int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec, return -ENOSPC; } - rc = msi_capability_init(dev, nvec, affd, PHYS_ADDR_MAX); + rc = msi_capability_init(dev, nvec, affd, iova); if (rc == 0) return nvec; -- 2.43.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova 2024-11-09 5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen @ 2024-11-11 9:30 ` Andy Shevchenko 0 siblings, 0 replies; 25+ messages in thread From: Andy Shevchenko @ 2024-11-11 9:30 UTC (permalink / raw) To: Nicolin Chen Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Fri, Nov 08, 2024 at 09:48:49PM -0800, Nicolin Chen wrote: > The previous patch passes in the msi_iova to msi_capability_init, so this msi_capability_init() > allows its caller to do the same. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen ` (3 preceding siblings ...) 2024-11-09 5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen @ 2024-11-09 5:48 ` Nicolin Chen 2024-11-11 9:33 ` Andy Shevchenko 2024-11-09 5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen ` (2 subsequent siblings) 7 siblings, 1 reply; 25+ messages in thread From: Nicolin Chen @ 2024-11-09 5:48 UTC (permalink / raw) To: maz, tglx, bhelgaas, alex.williamson Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm Extract a common function from the existing callers, to prepare for a new helper that provides an array of msi_iovas. Also, extract the msi_iova(s) from the array and pass in properly down to __pci_enable_msi/msix_range(). Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- drivers/pci/msi/api.c | 113 ++++++++++++++++++++++++++---------------- 1 file changed, 70 insertions(+), 43 deletions(-) diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c index 99ade7f69cd4..dff3d7350b38 100644 --- a/drivers/pci/msi/api.c +++ b/drivers/pci/msi/api.c @@ -204,6 +204,72 @@ void pci_disable_msix(struct pci_dev *dev) } EXPORT_SYMBOL(pci_disable_msix); +static int __pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, + unsigned int max_vecs, unsigned int flags, + struct irq_affinity *affd, + dma_addr_t *msi_iovas) +{ + struct irq_affinity msi_default_affd = {0}; + int nvecs = -ENOSPC; + + if (flags & PCI_IRQ_AFFINITY) { + if (!affd) + affd = &msi_default_affd; + } else { + if (WARN_ON(affd)) + affd = NULL; + } + + if (flags & PCI_IRQ_MSIX) { + struct msix_entry *entries = NULL; + + if (msi_iovas) { + int count = max_vecs - min_vecs + 1; + int i; + + entries = kcalloc(max_vecs - min_vecs + 1, + sizeof(*entries), GFP_KERNEL); + if (!entries) + return -ENOMEM; + for (i = 0; i < count; i++) { + entries[i].entry = i; + entries[i].iova = msi_iovas[i]; + } + } + + nvecs = __pci_enable_msix_range(dev, entries, min_vecs, + max_vecs, affd, flags); + kfree(entries); + if (nvecs > 0) + return nvecs; + } + + if (flags & PCI_IRQ_MSI) { + nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, affd, + msi_iovas ? *msi_iovas : + PHYS_ADDR_MAX); + if (nvecs > 0) + return nvecs; + } + + /* use INTx IRQ if allowed */ + if (flags & PCI_IRQ_INTX) { + if (min_vecs == 1 && dev->irq) { + /* + * Invoke the affinity spreading logic to ensure that + * the device driver can adjust queue configuration + * for the single interrupt case. + */ + if (affd) + irq_create_affinity_masks(1, affd); + pci_intx(dev, 1); + return 1; + } + } + + return nvecs; +} + /** * pci_alloc_irq_vectors() - Allocate multiple device interrupt vectors * @dev: the PCI device to operate on @@ -235,8 +301,8 @@ EXPORT_SYMBOL(pci_disable_msix); int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, unsigned int max_vecs, unsigned int flags) { - return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, - flags, NULL); + return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs, + flags, NULL, NULL); } EXPORT_SYMBOL(pci_alloc_irq_vectors); @@ -256,47 +322,8 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, unsigned int max_vecs, unsigned int flags, struct irq_affinity *affd) { - struct irq_affinity msi_default_affd = {0}; - int nvecs = -ENOSPC; - - if (flags & PCI_IRQ_AFFINITY) { - if (!affd) - affd = &msi_default_affd; - } else { - if (WARN_ON(affd)) - affd = NULL; - } - - if (flags & PCI_IRQ_MSIX) { - nvecs = __pci_enable_msix_range(dev, NULL, min_vecs, max_vecs, - affd, flags); - if (nvecs > 0) - return nvecs; - } - - if (flags & PCI_IRQ_MSI) { - nvecs = __pci_enable_msi_range(dev, min_vecs, max_vecs, - affd, PHYS_ADDR_MAX); - if (nvecs > 0) - return nvecs; - } - - /* use INTx IRQ if allowed */ - if (flags & PCI_IRQ_INTX) { - if (min_vecs == 1 && dev->irq) { - /* - * Invoke the affinity spreading logic to ensure that - * the device driver can adjust queue configuration - * for the single interrupt case. - */ - if (affd) - irq_create_affinity_masks(1, affd); - pci_intx(dev, 1); - return 1; - } - } - - return nvecs; + return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs, + flags, affd, NULL); } EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity); -- 2.43.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function 2024-11-09 5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen @ 2024-11-11 9:33 ` Andy Shevchenko 0 siblings, 0 replies; 25+ messages in thread From: Andy Shevchenko @ 2024-11-11 9:33 UTC (permalink / raw) To: Nicolin Chen Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Fri, Nov 08, 2024 at 09:48:50PM -0800, Nicolin Chen wrote: > Extract a common function from the existing callers, to prepare for a new > helper that provides an array of msi_iovas. Also, extract the msi_iova(s) > from the array and pass in properly down to __pci_enable_msi/msix_range(). ... > + return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs, > + flags, NULL, NULL); Even if you so strict about 80 character limit, this whole line is only 83 characters, which is okay to have. ... > + return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs, > + flags, affd, NULL); In the similar way, and other lines as well. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen ` (4 preceding siblings ...) 2024-11-09 5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen @ 2024-11-09 5:48 ` Nicolin Chen 2024-11-09 21:52 ` kernel test robot ` (2 more replies) 2024-11-09 5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen 2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy 7 siblings, 3 replies; 25+ messages in thread From: Nicolin Chen @ 2024-11-09 5:48 UTC (permalink / raw) To: maz, tglx, bhelgaas, alex.williamson Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm Now, the common __pci_alloc_irq_vectors() accepts an array of msi_iovas, which is a list of preset IOVAs for MSI doorbell addresses. Add a helper that would pass in a list. A following patch will call this to forward msi_iovas from user space. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- include/linux/pci.h | 17 +++++++++++++++++ drivers/pci/msi/api.c | 21 +++++++++++++++++++++ 2 files changed, 38 insertions(+) diff --git a/include/linux/pci.h b/include/linux/pci.h index 68ebb9d42f7f..6423bee3b207 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1678,6 +1678,9 @@ int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, unsigned int max_vecs, unsigned int flags, struct irq_affinity *affd); +int pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs, + unsigned int max_vecs, unsigned int flags, + dma_addr_t *msi_iovas); bool pci_msix_can_alloc_dyn(struct pci_dev *dev); struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index, @@ -1714,6 +1717,13 @@ pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, return -ENOSPC; } static inline int +pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs, + unsigned int max_vecs, unsigned int flags, + dma_addr_t *msi_iovas) + + return -ENOSPC; /* No support if !CONFIG_PCI_MSI */ +} +static inline int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, unsigned int max_vecs, unsigned int flags) { @@ -2068,6 +2078,13 @@ pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, return -ENOSPC; } static inline int +pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs, + unsigned int max_vecs, unsigned int flags, + dma_addr_t *msi_iovas) +{ + return -ENOSPC; +} +static inline int pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, unsigned int max_vecs, unsigned int flags) { diff --git a/drivers/pci/msi/api.c b/drivers/pci/msi/api.c index dff3d7350b38..4e90ef8f571c 100644 --- a/drivers/pci/msi/api.c +++ b/drivers/pci/msi/api.c @@ -327,6 +327,27 @@ int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, } EXPORT_SYMBOL(pci_alloc_irq_vectors_affinity); +/** + * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt + * vectors with preset msi_iovas + * @dev: the PCI device to operate on + * @min_vecs: minimum required number of vectors (must be >= 1) + * @max_vecs: maximum desired number of vectors + * @flags: allocation flags, as in pci_alloc_irq_vectors() + * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs] + * + * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter. + * Check that function docs, and &struct irq_affinity, for more details. + */ +int pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs, + unsigned int max_vecs, unsigned int flags, + dma_addr_t *msi_iovas) +{ + return __pci_alloc_irq_vectors(dev, min_vecs, max_vecs, + flags, NULL, msi_iovas); +} +EXPORT_SYMBOL(pci_alloc_irq_vectors_iovas); + /** * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector * @dev: the PCI device to operate on -- 2.43.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper 2024-11-09 5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen @ 2024-11-09 21:52 ` kernel test robot 2024-11-10 9:51 ` kernel test robot 2024-11-11 9:34 ` Andy Shevchenko 2 siblings, 0 replies; 25+ messages in thread From: kernel test robot @ 2024-11-09 21:52 UTC (permalink / raw) To: Nicolin Chen; +Cc: oe-kbuild-all Hi Nicolin, [This is a private test report for your RFC patch.] kernel test robot noticed the following build errors: [auto build test ERROR on pci/next] [also build test ERROR on pci/for-linus awilliam-vfio/next tip/irq/core linus/master awilliam-vfio/for-linus v6.12-rc6 next-20241108] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Nicolin-Chen/genirq-msi-Allow-preset-IOVA-in-struct-msi_desc-for-MSI-doorbell-address/20241109-135311 base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next patch link: https://lore.kernel.org/r/e9399426b08b16efbdf7224c0122f5bf80f6d0ea.1731130093.git.nicolinc%40nvidia.com patch subject: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper config: i386-randconfig-141-20241110 (https://download.01.org/0day-ci/archive/20241110/202411100427.zzpAjWnY-lkp@intel.com/config) compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241110/202411100427.zzpAjWnY-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411100427.zzpAjWnY-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from drivers/net/ethernet/intel/e1000/e1000_main.c:4: In file included from drivers/net/ethernet/intel/e1000/e1000.h:13: In file included from include/linux/mm.h:2213: include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ In file included from drivers/net/ethernet/intel/e1000/e1000_main.c:4: In file included from drivers/net/ethernet/intel/e1000/e1000.h:16: >> include/linux/pci.h:1719:2: error: expected function body after function declarator 1719 | return -ENOSPC; /* No support if !CONFIG_PCI_MSI */ | ^ >> include/linux/pci.h:1720:1: error: extraneous closing brace ('}') 1720 | } | ^ drivers/net/ethernet/intel/e1000/e1000_main.c:999:45: warning: shift count >= width of type [-Wshift-count-overflow] 999 | !dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) { | ^~~~~~~~~~~~~~~~ include/linux/dma-mapping.h:77:54: note: expanded from macro 'DMA_BIT_MASK' 77 | #define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL<<(n))-1)) | ^ ~~~ 2 warnings and 2 errors generated. -- In file included from drivers/net/ethernet/intel/e1000/e1000_hw.c:9: In file included from drivers/net/ethernet/intel/e1000/e1000.h:13: In file included from include/linux/mm.h:2213: include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion] 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ In file included from drivers/net/ethernet/intel/e1000/e1000_hw.c:9: In file included from drivers/net/ethernet/intel/e1000/e1000.h:16: >> include/linux/pci.h:1719:2: error: expected function body after function declarator 1719 | return -ENOSPC; /* No support if !CONFIG_PCI_MSI */ | ^ >> include/linux/pci.h:1720:1: error: extraneous closing brace ('}') 1720 | } | ^ 1 warning and 2 errors generated. vim +1719 include/linux/pci.h 1704 1705 static inline int 1706 pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, 1707 unsigned int max_vecs, unsigned int flags, 1708 struct irq_affinity *aff_desc) 1709 { 1710 if ((flags & PCI_IRQ_INTX) && min_vecs == 1 && dev->irq) 1711 return 1; 1712 return -ENOSPC; 1713 } 1714 static inline int 1715 pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs, 1716 unsigned int max_vecs, unsigned int flags, 1717 dma_addr_t *msi_iovas) 1718 > 1719 return -ENOSPC; /* No support if !CONFIG_PCI_MSI */ > 1720 } 1721 static inline int 1722 pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, 1723 unsigned int max_vecs, unsigned int flags) 1724 { 1725 return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, 1726 flags, NULL); 1727 } 1728 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper 2024-11-09 5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen 2024-11-09 21:52 ` kernel test robot @ 2024-11-10 9:51 ` kernel test robot 2024-11-11 9:34 ` Andy Shevchenko 2 siblings, 0 replies; 25+ messages in thread From: kernel test robot @ 2024-11-10 9:51 UTC (permalink / raw) To: Nicolin Chen; +Cc: oe-kbuild-all Hi Nicolin, [This is a private test report for your RFC patch.] kernel test robot noticed the following build errors: [auto build test ERROR on pci/next] [also build test ERROR on pci/for-linus awilliam-vfio/next tip/irq/core linus/master awilliam-vfio/for-linus v6.12-rc6 next-20241108] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Nicolin-Chen/genirq-msi-Allow-preset-IOVA-in-struct-msi_desc-for-MSI-doorbell-address/20241109-135311 base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next patch link: https://lore.kernel.org/r/e9399426b08b16efbdf7224c0122f5bf80f6d0ea.1731130093.git.nicolinc%40nvidia.com patch subject: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper config: alpha-allnoconfig (https://download.01.org/0day-ci/archive/20241110/202411101710.Co5EYmfa-lkp@intel.com/config) compiler: alpha-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241110/202411101710.Co5EYmfa-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202411101710.Co5EYmfa-lkp@intel.com/ All error/warnings (new ones prefixed by >>): In file included from block/blk-mq-pci.c:8: include/linux/pci.h: In function 'pci_alloc_irq_vectors_iovas': >> include/linux/pci.h:1719:9: error: expected declaration specifiers before 'return' 1719 | return -ENOSPC; /* No support if !CONFIG_PCI_MSI */ | ^~~~~~ >> include/linux/pci.h:1720:1: error: expected declaration specifiers before '}' token 1720 | } | ^ >> include/linux/pci.h:1724:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1724 | { | ^ include/linux/pci.h:1730:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1730 | { return false; } | ^ include/linux/pci.h:1733:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1733 | { | ^ include/linux/pci.h:1740:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1740 | { | ^ include/linux/pci.h:1744:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1744 | { | ^ include/linux/pci.h:1748:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1748 | { | ^ include/linux/pci.h:1755:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1755 | { | ^ include/linux/pci.h:1782:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1782 | { | ^ include/linux/pci.h:1805:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1805 | { | ^ include/linux/pci.h:1851:44: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1851 | static inline bool pci_aer_available(void) { return false; } | ^ include/linux/pci.h:1862:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1862 | { return -EINVAL; } | ^ include/linux/pci.h:1863:57: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1863 | static inline void pci_disable_ptm(struct pci_dev *dev) { } | ^ include/linux/pci.h:1865:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1865 | { return false; } | ^ In file included from include/linux/cpumask.h:10, from include/linux/smp.h:13, from include/linux/lockdep.h:14, from include/linux/mutex.h:17, from include/linux/kernfs.h:11, from include/linux/sysfs.h:16, from include/linux/kobject.h:20, from block/blk-mq-pci.c:5: >> include/linux/cleanup.h:244:15: error: storage class specified for parameter 'class_pci_dev_t' 244 | typedef _type class_##_name##_t; \ | ^~~~~~ include/linux/cleanup.h:291:9: note: in expansion of macro 'DEFINE_CLASS' 291 | DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \ | ^~~~~~~~~~~~ include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD' 1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T)) | ^~~~~~~~~~~~ >> include/linux/cleanup.h:246:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 246 | { _type _T = *p; _exit; } \ | ^ include/linux/cleanup.h:291:9: note: in expansion of macro 'DEFINE_CLASS' 291 | DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \ | ^~~~~~~~~~~~ include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD' 1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T)) | ^~~~~~~~~~~~ include/linux/cleanup.h:248:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 248 | { _type t = _init; return t; } | ^ include/linux/cleanup.h:291:9: note: in expansion of macro 'DEFINE_CLASS' 291 | DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \ | ^~~~~~~~~~~~ include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD' 1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T)) | ^~~~~~~~~~~~ >> include/linux/cleanup.h:291:85: error: expected declaration specifiers before ';' token 291 | DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \ | ^ include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD' 1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T)) | ^~~~~~~~~~~~ >> include/linux/cleanup.h:292:55: error: expected declaration specifiers or '...' before 'class_pci_dev_t' 292 | static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \ | ^~~~~~ include/linux/pci.h:1875:1: note: in expansion of macro 'DEFINE_GUARD' 1875 | DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T)) | ^~~~~~~~~~~~ >> include/linux/pci.h:1883:12: error: storage class specified for parameter 'pci_domains_supported' 1883 | extern int pci_domains_supported; | ^~~~~~~~~~~~~~~~~~~~~ >> include/linux/pci.h:1911:15: error: storage class specified for parameter 'arch_set_vga_state_t' 1911 | typedef int (*arch_set_vga_state_t)(struct pci_dev *pdev, bool decode, | ^~~~~~~~~~~~~~~~~~~~ >> include/linux/pci.h:1913:33: error: expected declaration specifiers or '...' before 'arch_set_vga_state_t' 1913 | void pci_register_set_vga_state(arch_set_vga_state_t func); | ^~~~~~~~~~~~~~~~~~~~ include/linux/pci.h:1917:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1917 | { | ^ include/linux/pci.h:1924:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1924 | { | ^ include/linux/pci.h:1931:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1931 | { | ^ include/linux/pci.h:1938:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 1938 | { | ^ In file included from arch/alpha/include/asm/pci.h:8, from include/linux/pci.h:2102: >> include/linux/dma-mapping.h:86:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 86 | { | ^ include/linux/dma-mapping.h:90:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 90 | { | ^ include/linux/dma-mapping.h:96:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 96 | { | ^ include/linux/dma-mapping.h:296:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 296 | { | ^ include/linux/dma-mapping.h:303:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 303 | { | ^ include/linux/dma-mapping.h:310:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 310 | { | ^ include/linux/dma-mapping.h:317:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 317 | { | ^ include/linux/dma-mapping.h:324:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 324 | { | ^ include/linux/dma-mapping.h:330:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 330 | { | ^ include/linux/dma-mapping.h:369:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 369 | { | ^ include/linux/dma-mapping.h:376:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 376 | { | ^ include/linux/dma-mapping.h:382:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 382 | { | ^ include/linux/dma-mapping.h:394:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 394 | { | ^ include/linux/dma-mapping.h:401:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 401 | { | ^ include/linux/dma-mapping.h:408:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 408 | { | ^ include/linux/dma-mapping.h:425:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 425 | { | ^ include/linux/dma-mapping.h:443:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 443 | { | ^ include/linux/dma-mapping.h:460:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 460 | { | ^ include/linux/dma-mapping.h:477:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 477 | { | ^ include/linux/dma-mapping.h:484:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 484 | { | ^ include/linux/dma-mapping.h:490:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 490 | { | ^ include/linux/dma-mapping.h:503:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 503 | { | ^ include/linux/dma-mapping.h:515:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 515 | { | ^ include/linux/dma-mapping.h:521:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 521 | { | ^ include/linux/dma-mapping.h:528:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 528 | { | ^ include/linux/dma-mapping.h:535:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 535 | { | ^ include/linux/dma-mapping.h:554:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 554 | { | ^ include/linux/dma-mapping.h:561:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 561 | { | ^ include/linux/dma-mapping.h:568:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 568 | { | ^ include/linux/dma-mapping.h:576:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 576 | { | ^ include/linux/dma-mapping.h:584:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 584 | { | ^ include/linux/dma-mapping.h:594:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 594 | { | ^ include/linux/dma-mapping.h:601:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 601 | { | ^ include/linux/dma-mapping.h:612:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 612 | { | ^ include/linux/dma-mapping.h:621:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 621 | { | ^ >> arch/alpha/include/asm/pci.h:16:1: warning: empty declaration 16 | struct pci_iommu_arena; | ^~~~~~ arch/alpha/include/asm/pci.h:17:1: warning: empty declaration 17 | struct page; | ^~~~~~ arch/alpha/include/asm/pci.h:21:1: warning: empty declaration 21 | struct pci_controller { | ^~~~~~ >> arch/alpha/include/asm/pci.h:62:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 62 | { | ^ >> arch/alpha/include/asm/pci.h:78:24: error: storage class specified for parameter 'isa_bridge' 78 | extern struct pci_dev *isa_bridge; | ^~~~~~~~~~ >> arch/alpha/include/asm/pci.h:80:12: error: storage class specified for parameter 'pci_legacy_read' 80 | extern int pci_legacy_read(struct pci_bus *bus, loff_t port, u32 *val, | ^~~~~~~~~~~~~~~ >> arch/alpha/include/asm/pci.h:82:12: error: storage class specified for parameter 'pci_legacy_write' 82 | extern int pci_legacy_write(struct pci_bus *bus, loff_t port, u32 val, | ^~~~~~~~~~~~~~~~ >> arch/alpha/include/asm/pci.h:84:12: error: storage class specified for parameter 'pci_mmap_legacy_page_range' 84 | extern int pci_mmap_legacy_page_range(struct pci_bus *bus, | ^~~~~~~~~~~~~~~~~~~~~~~~~~ >> arch/alpha/include/asm/pci.h:87:13: error: storage class specified for parameter 'pci_adjust_legacy_attr' 87 | extern void pci_adjust_legacy_attr(struct pci_bus *bus, | ^~~~~~~~~~~~~~~~~~~~~~ include/linux/pci.h:2160:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2160 | { | ^ include/linux/pci.h:2165:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2165 | { | ^ include/linux/pci.h:2170:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2170 | { | ^ >> include/linux/pci.h:2185:1: warning: empty declaration 2185 | struct pci_fixup { | ^~~~~~ include/linux/pci.h:2197:1: warning: empty declaration 2197 | enum pci_fixup_pass { | ^~~~ >> include/linux/pci.h:2332:12: error: storage class specified for parameter 'pci_pci_problems' 2332 | extern int pci_pci_problems; | ^~~~~~~~~~~~~~~~ include/linux/pci.h:2341:22: error: storage class specified for parameter 'pci_cardbus_io_size' 2341 | extern unsigned long pci_cardbus_io_size; | ^~~~~~~~~~~~~~~~~~~ include/linux/pci.h:2342:22: error: storage class specified for parameter 'pci_cardbus_mem_size' 2342 | extern unsigned long pci_cardbus_mem_size; | ^~~~~~~~~~~~~~~~~~~~ include/linux/pci.h:2343:11: error: storage class specified for parameter 'pci_dfl_cache_line_size' 2343 | extern u8 pci_dfl_cache_line_size; | ^~~~~~~~~~~~~~~~~~~~~~~ include/linux/pci.h:2344:11: error: storage class specified for parameter 'pci_cache_line_size' 2344 | extern u8 pci_cache_line_size; | ^~~~~~~~~~~~~~~~~~~ include/linux/pci.h:2363:12: error: storage class specified for parameter 'pci_create_resource_files' 2363 | extern int pci_create_resource_files(struct pci_dev *dev); | ^~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/pci.h:2364:13: error: storage class specified for parameter 'pci_remove_resource_files' 2364 | extern void pci_remove_resource_files(struct pci_dev *dev); | ^~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/pci.h:2371:47: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2371 | static inline void pci_mmcfg_early_init(void) { } | ^ include/linux/pci.h:2372:46: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2372 | static inline void pci_mmcfg_late_init(void) { } | ^ include/linux/pci.h:2405:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2405 | { | ^ include/linux/pci.h:2409:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2409 | { | ^ include/linux/pci.h:2414:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2414 | { | ^ include/linux/pci.h:2420:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2420 | { | ^ include/linux/pci.h:2425:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2425 | { return -ENODEV; } | ^ include/linux/pci.h:2429:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2429 | { | ^ include/linux/pci.h:2433:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2433 | { | ^ include/linux/pci.h:2437:50: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2437 | int id) { } | ^ include/linux/pci.h:2438:59: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2438 | static inline void pci_disable_sriov(struct pci_dev *dev) { } | ^ include/linux/pci.h:2439:51: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2439 | static inline int pci_num_vf(struct pci_dev *dev) { return 0; } | ^ include/linux/pci.h:2441:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2441 | { return 0; } | ^ include/linux/pci.h:2443:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2443 | { return 0; } | ^ include/linux/pci.h:2445:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2445 | { return 0; } | ^ include/linux/pci.h:2448:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2448 | { return 0; } | ^ include/linux/pci.h:2449:78: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2449 | static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { } | ^ include/linux/pci.h:2469:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2469 | { | ^ include/linux/pci.h:2480:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2480 | { | ^ include/linux/pci.h:2489:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2489 | { | ^ include/linux/pci.h:2498:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2498 | { | ^ include/linux/pci.h:2510:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2510 | { | ^ include/linux/pci.h:2522:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2522 | { | ^ include/linux/pci.h:2608:52: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2608 | pci_host_bridge_of_msi_domain(struct pci_bus *bus) { return NULL; } | ^ include/linux/pci.h:2609:64: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2609 | static inline bool pci_host_of_has_msi_map(struct device *dev) { return false; } | ^ include/linux/pci.h:2614:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2614 | { | ^ include/linux/pci.h:2619:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token 2619 | { .. vim +/return +1719 include/linux/pci.h 1704 1705 static inline int 1706 pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs, 1707 unsigned int max_vecs, unsigned int flags, 1708 struct irq_affinity *aff_desc) 1709 { 1710 if ((flags & PCI_IRQ_INTX) && min_vecs == 1 && dev->irq) 1711 return 1; 1712 return -ENOSPC; 1713 } 1714 static inline int > 1715 pci_alloc_irq_vectors_iovas(struct pci_dev *dev, unsigned int min_vecs, 1716 unsigned int max_vecs, unsigned int flags, 1717 dma_addr_t *msi_iovas) 1718 > 1719 return -ENOSPC; /* No support if !CONFIG_PCI_MSI */ > 1720 } 1721 static inline int 1722 pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, 1723 unsigned int max_vecs, unsigned int flags) > 1724 { 1725 return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, 1726 flags, NULL); 1727 } 1728 1729 static inline bool pci_msix_can_alloc_dyn(struct pci_dev *dev) 1730 { return false; } 1731 static inline struct msi_map pci_msix_alloc_irq_at(struct pci_dev *dev, unsigned int index, 1732 const struct irq_affinity_desc *affdesc) 1733 { 1734 struct msi_map map = { .index = -ENOSYS, }; 1735 1736 return map; 1737 } 1738 1739 static inline void pci_msix_free_irq(struct pci_dev *pdev, struct msi_map map) 1740 { 1741 } 1742 1743 static inline void pci_free_irq_vectors(struct pci_dev *dev) 1744 { 1745 } 1746 1747 static inline int pci_irq_vector(struct pci_dev *dev, unsigned int nr) 1748 { 1749 if (WARN_ON_ONCE(nr > 0)) 1750 return -EINVAL; 1751 return dev->irq; 1752 } 1753 static inline const struct cpumask *pci_irq_get_affinity(struct pci_dev *pdev, 1754 int vec) 1755 { 1756 return cpu_possible_mask; 1757 } 1758 #endif 1759 1760 /** 1761 * pci_irqd_intx_xlate() - Translate PCI INTx value to an IRQ domain hwirq 1762 * @d: the INTx IRQ domain 1763 * @node: the DT node for the device whose interrupt we're translating 1764 * @intspec: the interrupt specifier data from the DT 1765 * @intsize: the number of entries in @intspec 1766 * @out_hwirq: pointer at which to write the hwirq number 1767 * @out_type: pointer at which to write the interrupt type 1768 * 1769 * Translate a PCI INTx interrupt number from device tree in the range 1-4, as 1770 * stored in the standard PCI_INTERRUPT_PIN register, to a value in the range 1771 * 0-3 suitable for use in a 4 entry IRQ domain. That is, subtract one from the 1772 * INTx value to obtain the hwirq number. 1773 * 1774 * Returns 0 on success, or -EINVAL if the interrupt specifier is out of range. 1775 */ 1776 static inline int pci_irqd_intx_xlate(struct irq_domain *d, 1777 struct device_node *node, 1778 const u32 *intspec, 1779 unsigned int intsize, 1780 unsigned long *out_hwirq, 1781 unsigned int *out_type) 1782 { 1783 const u32 intx = intspec[0]; 1784 1785 if (intx < PCI_INTERRUPT_INTA || intx > PCI_INTERRUPT_INTD) 1786 return -EINVAL; 1787 1788 *out_hwirq = intx - PCI_INTERRUPT_INTA; 1789 return 0; 1790 } 1791 1792 #ifdef CONFIG_PCIEPORTBUS 1793 extern bool pcie_ports_disabled; 1794 extern bool pcie_ports_native; 1795 1796 int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req, 1797 bool use_lt); 1798 #else 1799 #define pcie_ports_disabled true 1800 #define pcie_ports_native false 1801 1802 static inline int pcie_set_target_speed(struct pci_dev *port, 1803 enum pci_bus_speed speed_req, 1804 bool use_lt) 1805 { 1806 return -EOPNOTSUPP; 1807 } 1808 #endif 1809 1810 #define PCIE_LINK_STATE_L0S (BIT(0) | BIT(1)) /* Upstr/dwnstr L0s */ 1811 #define PCIE_LINK_STATE_L1 BIT(2) /* L1 state */ 1812 #define PCIE_LINK_STATE_L1_1 BIT(3) /* ASPM L1.1 state */ 1813 #define PCIE_LINK_STATE_L1_2 BIT(4) /* ASPM L1.2 state */ 1814 #define PCIE_LINK_STATE_L1_1_PCIPM BIT(5) /* PCI-PM L1.1 state */ 1815 #define PCIE_LINK_STATE_L1_2_PCIPM BIT(6) /* PCI-PM L1.2 state */ 1816 #define PCIE_LINK_STATE_ASPM_ALL (PCIE_LINK_STATE_L0S |\ 1817 PCIE_LINK_STATE_L1 |\ 1818 PCIE_LINK_STATE_L1_1 |\ 1819 PCIE_LINK_STATE_L1_2 |\ 1820 PCIE_LINK_STATE_L1_1_PCIPM |\ 1821 PCIE_LINK_STATE_L1_2_PCIPM) 1822 #define PCIE_LINK_STATE_CLKPM BIT(7) 1823 #define PCIE_LINK_STATE_ALL (PCIE_LINK_STATE_ASPM_ALL |\ 1824 PCIE_LINK_STATE_CLKPM) 1825 1826 #ifdef CONFIG_PCIEASPM 1827 int pci_disable_link_state(struct pci_dev *pdev, int state); 1828 int pci_disable_link_state_locked(struct pci_dev *pdev, int state); 1829 int pci_enable_link_state(struct pci_dev *pdev, int state); 1830 int pci_enable_link_state_locked(struct pci_dev *pdev, int state); 1831 void pcie_no_aspm(void); 1832 bool pcie_aspm_support_enabled(void); 1833 bool pcie_aspm_enabled(struct pci_dev *pdev); 1834 #else 1835 static inline int pci_disable_link_state(struct pci_dev *pdev, int state) 1836 { return 0; } 1837 static inline int pci_disable_link_state_locked(struct pci_dev *pdev, int state) 1838 { return 0; } 1839 static inline int pci_enable_link_state(struct pci_dev *pdev, int state) 1840 { return 0; } 1841 static inline int pci_enable_link_state_locked(struct pci_dev *pdev, int state) 1842 { return 0; } 1843 static inline void pcie_no_aspm(void) { } 1844 static inline bool pcie_aspm_support_enabled(void) { return false; } 1845 static inline bool pcie_aspm_enabled(struct pci_dev *pdev) { return false; } 1846 #endif 1847 1848 #ifdef CONFIG_PCIEAER 1849 bool pci_aer_available(void); 1850 #else 1851 static inline bool pci_aer_available(void) { return false; } 1852 #endif 1853 1854 bool pci_ats_disabled(void); 1855 1856 #ifdef CONFIG_PCIE_PTM 1857 int pci_enable_ptm(struct pci_dev *dev, u8 *granularity); 1858 void pci_disable_ptm(struct pci_dev *dev); 1859 bool pcie_ptm_enabled(struct pci_dev *dev); 1860 #else 1861 static inline int pci_enable_ptm(struct pci_dev *dev, u8 *granularity) 1862 { return -EINVAL; } 1863 static inline void pci_disable_ptm(struct pci_dev *dev) { } 1864 static inline bool pcie_ptm_enabled(struct pci_dev *dev) 1865 { return false; } 1866 #endif 1867 1868 void pci_cfg_access_lock(struct pci_dev *dev); 1869 bool pci_cfg_access_trylock(struct pci_dev *dev); 1870 void pci_cfg_access_unlock(struct pci_dev *dev); 1871 1872 void pci_dev_lock(struct pci_dev *dev); 1873 int pci_dev_trylock(struct pci_dev *dev); 1874 void pci_dev_unlock(struct pci_dev *dev); > 1875 DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T)) 1876 1877 /* 1878 * PCI domain support. Sometimes called PCI segment (eg by ACPI), 1879 * a PCI domain is defined to be a set of PCI buses which share 1880 * configuration space. 1881 */ 1882 #ifdef CONFIG_PCI_DOMAINS > 1883 extern int pci_domains_supported; 1884 #else 1885 enum { pci_domains_supported = 0 }; 1886 static inline int pci_domain_nr(struct pci_bus *bus) { return 0; } 1887 static inline int pci_proc_domain(struct pci_bus *bus) { return 0; } 1888 #endif /* CONFIG_PCI_DOMAINS */ 1889 1890 /* 1891 * Generic implementation for PCI domain support. If your 1892 * architecture does not need custom management of PCI 1893 * domains then this implementation will be used 1894 */ 1895 #ifdef CONFIG_PCI_DOMAINS_GENERIC 1896 static inline int pci_domain_nr(struct pci_bus *bus) 1897 { 1898 return bus->domain_nr; 1899 } 1900 #ifdef CONFIG_ACPI 1901 int acpi_pci_bus_find_domain_nr(struct pci_bus *bus); 1902 #else 1903 static inline int acpi_pci_bus_find_domain_nr(struct pci_bus *bus) 1904 { return 0; } 1905 #endif 1906 int pci_bus_find_domain_nr(struct pci_bus *bus, struct device *parent); 1907 void pci_bus_release_domain_nr(struct device *parent, int domain_nr); 1908 #endif 1909 1910 /* Some architectures require additional setup to direct VGA traffic */ > 1911 typedef int (*arch_set_vga_state_t)(struct pci_dev *pdev, bool decode, 1912 unsigned int command_bits, u32 flags); > 1913 void pci_register_set_vga_state(arch_set_vga_state_t func); 1914 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper 2024-11-09 5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen 2024-11-09 21:52 ` kernel test robot 2024-11-10 9:51 ` kernel test robot @ 2024-11-11 9:34 ` Andy Shevchenko 2024-11-12 22:14 ` Nicolin Chen 2 siblings, 1 reply; 25+ messages in thread From: Andy Shevchenko @ 2024-11-11 9:34 UTC (permalink / raw) To: Nicolin Chen Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Fri, Nov 08, 2024 at 09:48:51PM -0800, Nicolin Chen wrote: > Now, the common __pci_alloc_irq_vectors() accepts an array of msi_iovas, > which is a list of preset IOVAs for MSI doorbell addresses. > > Add a helper that would pass in a list. A following patch will call this > to forward msi_iovas from user space. ... > +/** > + * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt > + * vectors with preset msi_iovas > + * @dev: the PCI device to operate on > + * @min_vecs: minimum required number of vectors (must be >= 1) > + * @max_vecs: maximum desired number of vectors > + * @flags: allocation flags, as in pci_alloc_irq_vectors() > + * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs] > + * > + * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter. > + * Check that function docs, and &struct irq_affinity, for more details. > + */ Always validate your kernel-doc descriptions scripts/kernel-doc -Wall -none -v ... will give you a warning here. -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper 2024-11-11 9:34 ` Andy Shevchenko @ 2024-11-12 22:14 ` Nicolin Chen 0 siblings, 0 replies; 25+ messages in thread From: Nicolin Chen @ 2024-11-12 22:14 UTC (permalink / raw) To: Andy Shevchenko Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Mon, Nov 11, 2024 at 11:34:47AM +0200, Andy Shevchenko wrote: > On Fri, Nov 08, 2024 at 09:48:51PM -0800, Nicolin Chen wrote: > > Now, the common __pci_alloc_irq_vectors() accepts an array of msi_iovas, > > which is a list of preset IOVAs for MSI doorbell addresses. > > > > Add a helper that would pass in a list. A following patch will call this > > to forward msi_iovas from user space. > > ... > > > +/** > > + * pci_alloc_irq_vectors_iovas() - Allocate multiple device interrupt > > + * vectors with preset msi_iovas > > + * @dev: the PCI device to operate on > > + * @min_vecs: minimum required number of vectors (must be >= 1) > > + * @max_vecs: maximum desired number of vectors > > + * @flags: allocation flags, as in pci_alloc_irq_vectors() > > + * @msi_iovas: list of IOVAs for MSI between [min_vecs, max_vecs] > > + * > > + * Same as pci_alloc_irq_vectors(), but with the extra @msi_iovas parameter. > > + * Check that function docs, and &struct irq_affinity, for more details. > > + */ > > Always validate your kernel-doc descriptions > > scripts/kernel-doc -Wall -none -v ... > > will give you a warning here. Will do in next round. Thanks! Nicolin ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen ` (5 preceding siblings ...) 2024-11-09 5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen @ 2024-11-09 5:48 ` Nicolin Chen 2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy 7 siblings, 0 replies; 25+ messages in thread From: Nicolin Chen @ 2024-11-09 5:48 UTC (permalink / raw) To: maz, tglx, bhelgaas, alex.williamson Cc: jgg, leonro, shameerali.kolothum.thodi, robin.murphy, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm Add a new VFIO_IRQ_SET_ACTION_PREPARE to set VFIO_IRQ_SET_DATA_MSI_IOVA, giving user space an interface to forward to kernel the stage-1 IOVA (of a 2-stage translation: IOVA->IPA->PA) for an MSI doorbell address, since the ITS hardware needs to be programmed with the top level IOVA address, in order to work with the IOMMU on ARM64. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> --- include/linux/vfio_pci_core.h | 1 + include/uapi/linux/vfio.h | 8 ++++-- drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++++++++++++++++++++++++- drivers/vfio/vfio_main.c | 3 +++ 4 files changed, 50 insertions(+), 3 deletions(-) diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index fbb472dd99b3..08027b8331f0 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -63,6 +63,7 @@ struct vfio_pci_core_device { int irq_type; int num_regions; struct vfio_pci_region *region; + dma_addr_t *msi_iovas; u8 msi_qmax; u8 msix_bar; u16 msix_size; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 2b68e6cdf190..d6be351abcde 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -590,6 +590,8 @@ struct vfio_irq_set { #define VFIO_IRQ_SET_ACTION_MASK (1 << 3) /* Mask interrupt */ #define VFIO_IRQ_SET_ACTION_UNMASK (1 << 4) /* Unmask interrupt */ #define VFIO_IRQ_SET_ACTION_TRIGGER (1 << 5) /* Trigger interrupt */ +#define VFIO_IRQ_SET_DATA_MSI_IOVA (1 << 6) /* Data is MSI IOVA (u64) */ +#define VFIO_IRQ_SET_ACTION_PREPARE (1 << 7) /* Prepare interrupt */ __u32 index; __u32 start; __u32 count; @@ -599,10 +601,12 @@ struct vfio_irq_set { #define VFIO_IRQ_SET_DATA_TYPE_MASK (VFIO_IRQ_SET_DATA_NONE | \ VFIO_IRQ_SET_DATA_BOOL | \ - VFIO_IRQ_SET_DATA_EVENTFD) + VFIO_IRQ_SET_DATA_EVENTFD | \ + VFIO_IRQ_SET_DATA_MSI_IOVA) #define VFIO_IRQ_SET_ACTION_TYPE_MASK (VFIO_IRQ_SET_ACTION_MASK | \ VFIO_IRQ_SET_ACTION_UNMASK | \ - VFIO_IRQ_SET_ACTION_TRIGGER) + VFIO_IRQ_SET_ACTION_TRIGGER | \ + VFIO_IRQ_SET_ACTION_PREPARE) /** * VFIO_DEVICE_RESET - _IO(VFIO_TYPE, VFIO_BASE + 11) * diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 8382c5834335..18bcdc5b1ef5 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -383,7 +383,7 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi /* return the number of supported vectors if we can't get all: */ cmd = vfio_pci_memory_lock_and_enable(vdev); - ret = pci_alloc_irq_vectors(pdev, 1, nvec, flag); + ret = pci_alloc_irq_vectors_iovas(pdev, 1, nvec, flag, vdev->msi_iovas); if (ret < nvec) { if (ret > 0) pci_free_irq_vectors(pdev); @@ -685,6 +685,9 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev, if (irq_is(vdev, index) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) { vfio_msi_disable(vdev, msix); + /* FIXME we need a better cleanup routine */ + kfree(vdev->msi_iovas); + vdev->msi_iovas = NULL; return 0; } @@ -728,6 +731,39 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_core_device *vdev, return 0; } +static int vfio_pci_set_msi_prepare(struct vfio_pci_core_device *vdev, + unsigned index, unsigned start, + unsigned count, uint32_t flags, void *data) +{ + uint64_t *iovas = data; + unsigned int i; + + if (!(irq_is(vdev, index) || is_irq_none(vdev))) + return -EINVAL; + + if (flags & VFIO_IRQ_SET_DATA_NONE) { + if (!count) + return -EINVAL; + /* FIXME support partial unset */ + kfree(vdev->msi_iovas); + vdev->msi_iovas = NULL; + return 0; + } + + if (!(flags & VFIO_IRQ_SET_DATA_MSI_IOVA)) + return -EOPNOTSUPP; + if (!IS_ENABLED(CONFIG_IRQ_MSI_IOMMU)) + return -EOPNOTSUPP; + if (!vdev->msi_iovas) + vdev->msi_iovas = kcalloc(count, sizeof(dma_addr_t), GFP_KERNEL); + if (!vdev->msi_iovas) + return -ENOMEM; + for (i = 0; i < count; i++) + vdev->msi_iovas[i] = iovas[i]; + + return 0; +} + static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx, unsigned int count, uint32_t flags, void *data) @@ -837,6 +873,9 @@ int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags, case VFIO_IRQ_SET_ACTION_TRIGGER: func = vfio_pci_set_msi_trigger; break; + case VFIO_IRQ_SET_ACTION_PREPARE: + func = vfio_pci_set_msi_prepare; + break; } break; case VFIO_PCI_ERR_IRQ_INDEX: diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index a5a62d9d963f..61211c082a64 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1554,6 +1554,9 @@ int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs, case VFIO_IRQ_SET_DATA_EVENTFD: size = sizeof(int32_t); break; + case VFIO_IRQ_SET_DATA_MSI_IOVA: + size = sizeof(uint64_t); + break; default: return -EINVAL; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen ` (6 preceding siblings ...) 2024-11-09 5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen @ 2024-11-11 13:09 ` Robin Murphy 2024-11-11 14:14 ` Marc Zyngier 2024-11-12 21:54 ` Nicolin Chen 7 siblings, 2 replies; 25+ messages in thread From: Robin Murphy @ 2024-11-11 13:09 UTC (permalink / raw) To: Nicolin Chen, maz, tglx, bhelgaas, alex.williamson Cc: jgg, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On 2024-11-09 5:48 am, Nicolin Chen wrote: > On ARM GIC systems and others, the target address of the MSI is translated > by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the > IOMMU is disabled, the MSI address is programmed to the physical location > of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS > page is behind the IOMMU, so the MSI address is programmed to an allocated > IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to > the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000). > When a 2-stage translation is enabled, IOVA will be still used to program > the MSI address, though the mappings will be in two stages: > IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000 > (IPA stands for Intermediate Physical Address). > > If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the > IOVA is dynamically allocated from the top of the IOVA space. If attached > to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is > fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI, > which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs. > > So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge > of the IOMMU translation (1-stage translation), since the IOVA for the ITS > page is fixed and known by kernel. However, with virtual machine enabling > a nested IOMMU translation (2-stage), a guest kernel directly controls the > stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an > IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host > kernel can't know that guest-level IOVA to program the MSI address. > > To solve this problem the VMM should capture the MSI IOVA allocated by the > guest kernel and relay it to the GIC driver in the host kernel, to program > the correct MSI IOVA. And this requires a new ioctl via VFIO. Once VFIO has that information from userspace, though, do we really need the whole complicated dance to push it right down into the irqchip layer just so it can be passed back up again? AFAICS vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly rewrites MSI-X vectors, so it seems like it should be pretty straightforward to override the message address in general at that level, without the lower layers having to be aware at all, no? Thanks, Robin. > Extend the VFIO path to allow an MSI target IOVA to be forwarded into the > kernel and pushed down to the GIC driver. > > Add VFIO ioctl VFIO_IRQ_SET_ACTION_PREPARE with VFIO_IRQ_SET_DATA_MSI_IOVA > to carry the data. > > The downstream calltrace is quite long from the VFIO to the ITS driver. So > in order to carry the MSI IOVA from the top to its_irq_domain_alloc(), add > patches in a leaf-to-root order: > > vfio_pci_core_ioctl: > vfio_pci_set_irqs_ioctl: > vfio_pci_set_msi_prepare: // PATCH-7 > pci_alloc_irq_vectors_iovas: // PATCH-6 > __pci_alloc_irq_vectors: // PATCH-5 > __pci_enable_msi/msix_range: // PATCH-4 > msi/msix_capability_init: // PATCH-3 > msi/msix_setup_msi_descs: > msi_insert_msi_desc(); // PATCH-1 > pci_msi_setup_msi_irqs: > msi_domain_alloc_irqs_all_locked: > __msi_domain_alloc_locked: > __msi_domain_alloc_irqs: > __irq_domain_alloc_irqs: > irq_domain_alloc_irqs_locked: > irq_domain_alloc_irqs_hierarchy: > msi_domain_alloc: > irq_domain_alloc_irqs_parent: > its_irq_domain_alloc(); // PATCH-2 > > Note that this series solves half the problem, since it only allows kernel > to set the physical PCI MSI/MSI-X on the device with the correct head IOVA > of a 2-stage translation, where the guest kernel does the stage-1 mapping > that MSI IOVA (0xEEEE0000) to its own vITS page (0x80900000) while missing > the stage-2 mapping from that IPA to the physical ITS page: > 0xEEEE0000 ===> 0x80900000 =x=> 0x20200000 > A followup series should fill that gap, doing the stage-2 mapping from the > vITS page 0x80900000 to the physical ITS page (0x20200000), likely via new > IOMMUFD ioctl. Once VMM sets up this stage-2 mapping, VM will act the same > as bare metal relying on a running kernel to handle the stage-1 mapping: > 0xEEEE0000 ===> 0x80900000 ===> 0x20200000 > > This series (prototype) is on Github: > https://github.com/nicolinc/iommufd/commits/vfio_msi_giova-rfcv1/ > It's tested by hacking the host kernel to hard-code a stage-2 mapping. > > Thanks! > Nicolin > > Nicolin Chen (7): > genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell > address > irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset > PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc > PCI/MSI: Allow __pci_enable_msi_range to pass in iova > PCI/MSI: Extract a common __pci_alloc_irq_vectors function > PCI/MSI: Add pci_alloc_irq_vectors_iovas helper > vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE > > drivers/pci/msi/msi.h | 3 +- > include/linux/msi.h | 11 +++ > include/linux/pci.h | 18 ++++ > include/linux/vfio_pci_core.h | 1 + > include/uapi/linux/vfio.h | 8 +- > drivers/irqchip/irq-gic-v3-its.c | 21 ++++- > drivers/pci/msi/api.c | 136 ++++++++++++++++++++---------- > drivers/pci/msi/msi.c | 20 +++-- > drivers/vfio/pci/vfio_pci_intrs.c | 41 ++++++++- > drivers/vfio/vfio_main.c | 3 + > kernel/irq/msi.c | 6 ++ > 11 files changed, 212 insertions(+), 56 deletions(-) > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy @ 2024-11-11 14:14 ` Marc Zyngier 2024-11-12 22:13 ` Nicolin Chen 2024-11-12 21:54 ` Nicolin Chen 1 sibling, 1 reply; 25+ messages in thread From: Marc Zyngier @ 2024-11-11 14:14 UTC (permalink / raw) To: Robin Murphy Cc: Nicolin Chen, tglx, bhelgaas, alex.williamson, jgg, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Mon, 11 Nov 2024 13:09:20 +0000, Robin Murphy <robin.murphy@arm.com> wrote: > > On 2024-11-09 5:48 am, Nicolin Chen wrote: > > On ARM GIC systems and others, the target address of the MSI is translated > > by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the > > IOMMU is disabled, the MSI address is programmed to the physical location > > of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS > > page is behind the IOMMU, so the MSI address is programmed to an allocated > > IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to > > the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000). > > When a 2-stage translation is enabled, IOVA will be still used to program > > the MSI address, though the mappings will be in two stages: > > IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000 > > (IPA stands for Intermediate Physical Address). > > > > If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the > > IOVA is dynamically allocated from the top of the IOVA space. If attached > > to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is > > fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI, > > which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs. > > > > So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge > > of the IOMMU translation (1-stage translation), since the IOVA for the ITS > > page is fixed and known by kernel. However, with virtual machine enabling > > a nested IOMMU translation (2-stage), a guest kernel directly controls the > > stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an > > IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host > > kernel can't know that guest-level IOVA to program the MSI address. > > > > To solve this problem the VMM should capture the MSI IOVA allocated by the > > guest kernel and relay it to the GIC driver in the host kernel, to program > > the correct MSI IOVA. And this requires a new ioctl via VFIO. > > Once VFIO has that information from userspace, though, do we really > need the whole complicated dance to push it right down into the > irqchip layer just so it can be passed back up again? AFAICS > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already > explicitly rewrites MSI-X vectors, so it seems like it should be > pretty straightforward to override the message address in general at > that level, without the lower layers having to be aware at all, no? +1. I would like to avoid polluting each and every interrupt controller with usage-specific knowledge (they usually are brain-damaged enough). We already have an indirection into the IOMMU subsystem and it shouldn't be a big deal to intercept the message for all implementations at this level. I also wonder how to handle the case of braindead^Wwonderful platforms where ITS transactions are not translated by the SMMU. Somehow, VFIO should be made aware of this situation. Thanks, M. -- Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-11 14:14 ` Marc Zyngier @ 2024-11-12 22:13 ` Nicolin Chen 0 siblings, 0 replies; 25+ messages in thread From: Nicolin Chen @ 2024-11-12 22:13 UTC (permalink / raw) To: Marc Zyngier Cc: Robin Murphy, tglx, bhelgaas, alex.williamson, jgg, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Mon, Nov 11, 2024 at 02:14:15PM +0000, Marc Zyngier wrote: > On Mon, 11 Nov 2024 13:09:20 +0000, > Robin Murphy <robin.murphy@arm.com> wrote: > > On 2024-11-09 5:48 am, Nicolin Chen wrote: > > > To solve this problem the VMM should capture the MSI IOVA allocated by the > > > guest kernel and relay it to the GIC driver in the host kernel, to program > > > the correct MSI IOVA. And this requires a new ioctl via VFIO. > > > > Once VFIO has that information from userspace, though, do we really > > need the whole complicated dance to push it right down into the > > irqchip layer just so it can be passed back up again? AFAICS > > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already > > explicitly rewrites MSI-X vectors, so it seems like it should be > > pretty straightforward to override the message address in general at > > that level, without the lower layers having to be aware at all, no? > > +1. > > I would like to avoid polluting each and every interrupt controller > with usage-specific knowledge (they usually are brain-damaged enough). > We already have an indirection into the IOMMU subsystem and it > shouldn't be a big deal to intercept the message for all > implementations at this level. > > I also wonder how to handle the case of braindead^Wwonderful platforms > where ITS transactions are not translated by the SMMU. Somehow, VFIO > should be made aware of this situation. Perhaps we should do iommu_get_domain_for_dev(&vdev->pdev->dev) and check the returned domain->type: * if (domain->type & __IOMMU_DOMAIN_PAGING): 1-stage translation * if (domain->type == IOMMU_DOMAIN_NESTED): 2-stage translation And for this particular topic/series, we should do something like: if (vdev->msi_iovas && domain->type == IOMMU_DOMAIN_NESTED) { msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]); msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]); } ? Thanks Nicolin ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy 2024-11-11 14:14 ` Marc Zyngier @ 2024-11-12 21:54 ` Nicolin Chen 2024-11-13 1:34 ` Jason Gunthorpe 1 sibling, 1 reply; 25+ messages in thread From: Nicolin Chen @ 2024-11-12 21:54 UTC (permalink / raw) To: Robin Murphy Cc: maz, tglx, bhelgaas, alex.williamson, jgg, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote: > On 2024-11-09 5:48 am, Nicolin Chen wrote: > > To solve this problem the VMM should capture the MSI IOVA allocated by the > > guest kernel and relay it to the GIC driver in the host kernel, to program > > the correct MSI IOVA. And this requires a new ioctl via VFIO. > > Once VFIO has that information from userspace, though, do we really need > the whole complicated dance to push it right down into the irqchip layer > just so it can be passed back up again? AFAICS > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly > rewrites MSI-X vectors, so it seems like it should be pretty > straightforward to override the message address in general at that > level, without the lower layers having to be aware at all, no? Didn't see that clearly!! It works with a simple following override: -------------------------------------------------------------------- @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, struct msi_msg msg; get_cached_msi_msg(irq, &msg); + if (vdev->msi_iovas) { + msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]); + msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]); + } pci_write_msi_msg(irq, &msg); } -------------------------------------------------------------------- With that, I think we only need one VFIO change for this part :) Thanks! Nicolin ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-12 21:54 ` Nicolin Chen @ 2024-11-13 1:34 ` Jason Gunthorpe 2024-11-13 21:11 ` Alex Williamson 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2024-11-13 1:34 UTC (permalink / raw) To: Nicolin Chen, tglx, alex.williamson Cc: Robin Murphy, maz, bhelgaas, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote: > On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote: > > On 2024-11-09 5:48 am, Nicolin Chen wrote: > > > To solve this problem the VMM should capture the MSI IOVA allocated by the > > > guest kernel and relay it to the GIC driver in the host kernel, to program > > > the correct MSI IOVA. And this requires a new ioctl via VFIO. > > > > Once VFIO has that information from userspace, though, do we really need > > the whole complicated dance to push it right down into the irqchip layer > > just so it can be passed back up again? AFAICS > > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly > > rewrites MSI-X vectors, so it seems like it should be pretty > > straightforward to override the message address in general at that > > level, without the lower layers having to be aware at all, no? > > Didn't see that clearly!! It works with a simple following override: > -------------------------------------------------------------------- > @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, > struct msi_msg msg; > > get_cached_msi_msg(irq, &msg); > + if (vdev->msi_iovas) { > + msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]); > + msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]); > + } > pci_write_msi_msg(irq, &msg); > } > > -------------------------------------------------------------------- > > With that, I think we only need one VFIO change for this part :) Wow, is that really OK from a layering perspective? The comment is pretty clear on the intention that this is to resync the irq layer view of the device with the physical HW. Editing the msi_msg while doing that resync smells bad. Also, this is only doing MSI-X, we should include normal MSI as well. (it probably should have a resync too?) I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for context) I think there are many options here we just need to get a clearer understanding what best fits the architecture of the interrupt subsystem. Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-13 1:34 ` Jason Gunthorpe @ 2024-11-13 21:11 ` Alex Williamson 2024-11-14 15:35 ` Robin Murphy 0 siblings, 1 reply; 25+ messages in thread From: Alex Williamson @ 2024-11-13 21:11 UTC (permalink / raw) To: Jason Gunthorpe Cc: Nicolin Chen, tglx, Robin Murphy, maz, bhelgaas, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Tue, 12 Nov 2024 21:34:30 -0400 Jason Gunthorpe <jgg@nvidia.com> wrote: > On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote: > > On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote: > > > On 2024-11-09 5:48 am, Nicolin Chen wrote: > > > > To solve this problem the VMM should capture the MSI IOVA allocated by the > > > > guest kernel and relay it to the GIC driver in the host kernel, to program > > > > the correct MSI IOVA. And this requires a new ioctl via VFIO. > > > > > > Once VFIO has that information from userspace, though, do we really need > > > the whole complicated dance to push it right down into the irqchip layer > > > just so it can be passed back up again? AFAICS > > > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly > > > rewrites MSI-X vectors, so it seems like it should be pretty > > > straightforward to override the message address in general at that > > > level, without the lower layers having to be aware at all, no? > > > > Didn't see that clearly!! It works with a simple following override: > > -------------------------------------------------------------------- > > @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, > > struct msi_msg msg; > > > > get_cached_msi_msg(irq, &msg); > > + if (vdev->msi_iovas) { > > + msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]); > > + msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]); > > + } > > pci_write_msi_msg(irq, &msg); > > } > > > > -------------------------------------------------------------------- > > > > With that, I think we only need one VFIO change for this part :) > > Wow, is that really OK from a layering perspective? The comment is > pretty clear on the intention that this is to resync the irq layer > view of the device with the physical HW. > > Editing the msi_msg while doing that resync smells bad. > > Also, this is only doing MSI-X, we should include normal MSI as > well. (it probably should have a resync too?) This was added for a specific IBM HBA that clears the vector table during a built-in self test, so it's possible the MSI table being in config space never had the same issue, or we just haven't encountered it. I don't expect anything else actually requires this. > I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for > context) It seems suspect to me too. In a sense it is still just synchronizing the MSI address, but to a different address space. Is it possible to do this with the existing write_msi_msg callback on the msi descriptor? For instance we could simply translate the msg address and call pci_write_msi_msg() (while avoiding an infinite recursion). Or maybe there should be an xlate_msi_msg callback we can register. Or I suppose there might be a way to insert an irqchip that does the translation on write. Thanks, Alex ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-13 21:11 ` Alex Williamson @ 2024-11-14 15:35 ` Robin Murphy 2024-11-20 13:17 ` Eric Auger 0 siblings, 1 reply; 25+ messages in thread From: Robin Murphy @ 2024-11-14 15:35 UTC (permalink / raw) To: Alex Williamson, Jason Gunthorpe Cc: Nicolin Chen, tglx, maz, bhelgaas, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, eric.auger, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On 13/11/2024 9:11 pm, Alex Williamson wrote: > On Tue, 12 Nov 2024 21:34:30 -0400 > Jason Gunthorpe <jgg@nvidia.com> wrote: > >> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote: >>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote: >>>> On 2024-11-09 5:48 am, Nicolin Chen wrote: >>>>> To solve this problem the VMM should capture the MSI IOVA allocated by the >>>>> guest kernel and relay it to the GIC driver in the host kernel, to program >>>>> the correct MSI IOVA. And this requires a new ioctl via VFIO. >>>> >>>> Once VFIO has that information from userspace, though, do we really need >>>> the whole complicated dance to push it right down into the irqchip layer >>>> just so it can be passed back up again? AFAICS >>>> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly >>>> rewrites MSI-X vectors, so it seems like it should be pretty >>>> straightforward to override the message address in general at that >>>> level, without the lower layers having to be aware at all, no? >>> >>> Didn't see that clearly!! It works with a simple following override: >>> -------------------------------------------------------------------- >>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, >>> struct msi_msg msg; >>> >>> get_cached_msi_msg(irq, &msg); >>> + if (vdev->msi_iovas) { >>> + msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]); >>> + msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]); >>> + } >>> pci_write_msi_msg(irq, &msg); >>> } >>> >>> -------------------------------------------------------------------- >>> >>> With that, I think we only need one VFIO change for this part :) >> >> Wow, is that really OK from a layering perspective? The comment is >> pretty clear on the intention that this is to resync the irq layer >> view of the device with the physical HW. >> >> Editing the msi_msg while doing that resync smells bad. >> >> Also, this is only doing MSI-X, we should include normal MSI as >> well. (it probably should have a resync too?) > > This was added for a specific IBM HBA that clears the vector table > during a built-in self test, so it's possible the MSI table being in > config space never had the same issue, or we just haven't encountered > it. I don't expect anything else actually requires this. Yeah, I wasn't really suggesting to literally hook into this exact case; it was more just a general observation that if VFIO already has one justification for tinkering with pci_write_msi_msg() directly without going through the msi_domain layer, then adding another (wherever it fits best) can't be *entirely* unreasonable. At the end of the day, the semantic here is that VFIO does know more than the IRQ layer, and does need to program the endpoint differently from what the irqchip assumes, so I don't see much benefit in dressing that up more than functionally necessary. >> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for >> context) > > It seems suspect to me too. In a sense it is still just synchronizing > the MSI address, but to a different address space. > > Is it possible to do this with the existing write_msi_msg callback on > the msi descriptor? For instance we could simply translate the msg > address and call pci_write_msi_msg() (while avoiding an infinite > recursion). Or maybe there should be an xlate_msi_msg callback we can > register. Or I suppose there might be a way to insert an irqchip that > does the translation on write. Thanks, I'm far from keen on the idea, but if there really is an appetite for more indirection, then I guess the least-worst option would be yet another type of iommu_dma_cookie to work via the existing iommu_dma_compose_msi_msg() flow, with some interface for VFIO to update per-device addresses directly. But then it's still going to need some kind of "layering violation" for VFIO to poke the IRQ layer into re-composing and re-writing a message whenever userspace feels like changing an address, because we're fundamentally stepping outside the established lifecycle of a kernel-managed IRQ around which said layering was designed... Thanks, Robin. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-14 15:35 ` Robin Murphy @ 2024-11-20 13:17 ` Eric Auger 2024-11-20 14:03 ` Jason Gunthorpe 0 siblings, 1 reply; 25+ messages in thread From: Eric Auger @ 2024-11-20 13:17 UTC (permalink / raw) To: Robin Murphy, Alex Williamson, Jason Gunthorpe Cc: Nicolin Chen, tglx, maz, bhelgaas, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On 11/14/24 16:35, Robin Murphy wrote: > On 13/11/2024 9:11 pm, Alex Williamson wrote: >> On Tue, 12 Nov 2024 21:34:30 -0400 >> Jason Gunthorpe <jgg@nvidia.com> wrote: >> >>> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote: >>>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote: >>>>> On 2024-11-09 5:48 am, Nicolin Chen wrote: >>>>>> To solve this problem the VMM should capture the MSI IOVA >>>>>> allocated by the >>>>>> guest kernel and relay it to the GIC driver in the host kernel, >>>>>> to program >>>>>> the correct MSI IOVA. And this requires a new ioctl via VFIO. >>>>> >>>>> Once VFIO has that information from userspace, though, do we >>>>> really need >>>>> the whole complicated dance to push it right down into the irqchip >>>>> layer >>>>> just so it can be passed back up again? AFAICS >>>>> vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already >>>>> explicitly >>>>> rewrites MSI-X vectors, so it seems like it should be pretty >>>>> straightforward to override the message address in general at that >>>>> level, without the lower layers having to be aware at all, no? >>>> >>>> Didn't see that clearly!! It works with a simple following override: >>>> -------------------------------------------------------------------- >>>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct >>>> vfio_pci_core_device *vdev, >>>> struct msi_msg msg; >>>> >>>> get_cached_msi_msg(irq, &msg); >>>> + if (vdev->msi_iovas) { >>>> + msg.address_lo = >>>> lower_32_bits(vdev->msi_iovas[vector]); >>>> + msg.address_hi = >>>> upper_32_bits(vdev->msi_iovas[vector]); >>>> + } >>>> pci_write_msi_msg(irq, &msg); >>>> } >>>> -------------------------------------------------------------------- >>>> >>>> With that, I think we only need one VFIO change for this part :) >>> >>> Wow, is that really OK from a layering perspective? The comment is >>> pretty clear on the intention that this is to resync the irq layer >>> view of the device with the physical HW. >>> >>> Editing the msi_msg while doing that resync smells bad. >>> >>> Also, this is only doing MSI-X, we should include normal MSI as >>> well. (it probably should have a resync too?) >> >> This was added for a specific IBM HBA that clears the vector table >> during a built-in self test, so it's possible the MSI table being in >> config space never had the same issue, or we just haven't encountered >> it. I don't expect anything else actually requires this. > > Yeah, I wasn't really suggesting to literally hook into this exact > case; it was more just a general observation that if VFIO already has > one justification for tinkering with pci_write_msi_msg() directly > without going through the msi_domain layer, then adding another > (wherever it fits best) can't be *entirely* unreasonable. > > At the end of the day, the semantic here is that VFIO does know more > than the IRQ layer, and does need to program the endpoint differently > from what the irqchip assumes, so I don't see much benefit in dressing > that up more than functionally necessary. > >>> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for >>> context) >> >> It seems suspect to me too. In a sense it is still just synchronizing >> the MSI address, but to a different address space. >> >> Is it possible to do this with the existing write_msi_msg callback on >> the msi descriptor? For instance we could simply translate the msg >> address and call pci_write_msi_msg() (while avoiding an infinite >> recursion). Or maybe there should be an xlate_msi_msg callback we can >> register. Or I suppose there might be a way to insert an irqchip that >> does the translation on write. Thanks, > > I'm far from keen on the idea, but if there really is an appetite for > more indirection, then I guess the least-worst option would be yet > another type of iommu_dma_cookie to work via the existing > iommu_dma_compose_msi_msg() flow, with some interface for VFIO to > update per-device addresses direcitly. But then it's still going to > need some kind of "layering violation" for VFIO to poke the IRQ layer > into re-composing and re-writing a message whenever userspace feels > like changing an address, because we're fundamentally stepping outside > the established lifecycle of a kernel-managed IRQ around which said > layering was designed... for the record, the first integration was based on such distinct iommu_dma_cookie [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part) <https://lore.kernel.org/all/20210411111228.14386-1-eric.auger@redhat.com/#r>, patches 8 - 11 Thanks Eric > > Thanks, > Robin. > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-20 13:17 ` Eric Auger @ 2024-11-20 14:03 ` Jason Gunthorpe 2024-11-28 11:15 ` Thomas Gleixner 0 siblings, 1 reply; 25+ messages in thread From: Jason Gunthorpe @ 2024-11-20 14:03 UTC (permalink / raw) To: Eric Auger Cc: Robin Murphy, Alex Williamson, Nicolin Chen, tglx, maz, bhelgaas, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Wed, Nov 20, 2024 at 02:17:46PM +0100, Eric Auger wrote: > > Yeah, I wasn't really suggesting to literally hook into this exact > > case; it was more just a general observation that if VFIO already has > > one justification for tinkering with pci_write_msi_msg() directly > > without going through the msi_domain layer, then adding another > > (wherever it fits best) can't be *entirely* unreasonable. I'm not sure that we can assume VFIO is the only thing touching the interrupt programming. I think there is a KVM path, and also the /proc/ path that will change the MSI affinity on the fly for a VFIO created IRQ. If the platform requires a MSI update to do this (ie encoding affinity in the add/data, not using IRQ remapping HW) then we still need to ensure the correct MSI address is hooked in. > >> Is it possible to do this with the existing write_msi_msg callback on > >> the msi descriptor? For instance we could simply translate the msg > >> address and call pci_write_msi_msg() (while avoiding an infinite > >> recursion). Or maybe there should be an xlate_msi_msg callback we can > >> register. Or I suppose there might be a way to insert an irqchip that > >> does the translation on write. Thanks, > > > > I'm far from keen on the idea, but if there really is an appetite for > > more indirection, then I guess the least-worst option would be yet > > another type of iommu_dma_cookie to work via the existing > > iommu_dma_compose_msi_msg() flow, For this direction I think I would turn iommu_dma_compose_msi_msg() into a function pointer stored in the iommu_domain and have vfio/iommufd provide its own implementation. The thing that is in control of the domain's translation should be providing the msi_msg. > > update per-device addresses direcitly. But then it's still going to > > need some kind of "layering violation" for VFIO to poke the IRQ layer > > into re-composing and re-writing a message whenever userspace feels > > like changing an address I think we'd need to get into the affinity update path and force a MSI write as well, even if the platform isn't changing the MSI for affinity. Processing a vMSI entry update would be two steps where we update the MSI addr in VFIO and then set the affinity. > for the record, the first integration was based on such distinct > iommu_dma_cookie > > [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part) <https://lore.kernel.org/all/20210411111228.14386-1-eric.auger@redhat.com/#r>, patches 8 - 11 There are some significant differences from that series with this idea: - We want to maintain a per-MSI-index/per-device lookup table. It is not just a simple cookie, the msi_desc->dev && msi_desc->msi_index have to be matched against what userspace provides in the per-vMSI IOCTL - There would be no implicit progamming of the stage 2, this will be done directly in userspace by creating an IOAS area for the ITS page - It shouldn't have any sort of dynamic allocation behavior. It is an error for the kernel to ask for an msi_desc that userspace hasn't provided a mapping for - It should work well with nested and non-nested domains Jason ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector 2024-11-20 14:03 ` Jason Gunthorpe @ 2024-11-28 11:15 ` Thomas Gleixner 0 siblings, 0 replies; 25+ messages in thread From: Thomas Gleixner @ 2024-11-28 11:15 UTC (permalink / raw) To: Jason Gunthorpe, Eric Auger Cc: Robin Murphy, Alex Williamson, Nicolin Chen, maz, bhelgaas, leonro, shameerali.kolothum.thodi, dlemoal, kevin.tian, smostafa, andriy.shevchenko, reinette.chatre, ddutile, yebin10, brauner, apatel, shivamurthy.shastri, anna-maria, nipun.gupta, marek.vasut+renesas, linux-arm-kernel, linux-kernel, linux-pci, kvm On Wed, Nov 20 2024 at 10:03, Jason Gunthorpe wrote: > On Wed, Nov 20, 2024 at 02:17:46PM +0100, Eric Auger wrote: >> > Yeah, I wasn't really suggesting to literally hook into this exact >> > case; it was more just a general observation that if VFIO already has >> > one justification for tinkering with pci_write_msi_msg() directly >> > without going through the msi_domain layer, then adding another >> > (wherever it fits best) can't be *entirely* unreasonable. > > I'm not sure that we can assume VFIO is the only thing touching the > interrupt programming. Correct. > I think there is a KVM path, and also the /proc/ path that will change > the MSI affinity on the fly for a VFIO created IRQ. If the platform > requires a MSI update to do this (ie encoding affinity in the > add/data, not using IRQ remapping HW) then we still need to ensure the > correct MSI address is hooked in. Yes. >> >> Is it possible to do this with the existing write_msi_msg callback on >> >> the msi descriptor? For instance we could simply translate the msg >> >> address and call pci_write_msi_msg() (while avoiding an infinite >> >> recursion). Or maybe there should be an xlate_msi_msg callback we can >> >> register. Or I suppose there might be a way to insert an irqchip that >> >> does the translation on write. Thanks, >> > >> > I'm far from keen on the idea, but if there really is an appetite for >> > more indirection, then I guess the least-worst option would be yet >> > another type of iommu_dma_cookie to work via the existing >> > iommu_dma_compose_msi_msg() flow, > > For this direction I think I would turn iommu_dma_compose_msi_msg() > into a function pointer stored in the iommu_domain and have > vfio/iommufd provide its own implementation. The thing that is in > control of the domain's translation should be providing the msi_msg. Yes. The resulting cached message should be writeable as is. >> > update per-device addresses direcitly. But then it's still going to >> > need some kind of "layering violation" for VFIO to poke the IRQ layer >> > into re-composing and re-writing a message whenever userspace feels >> > like changing an address > > I think we'd need to get into the affinity update path and force a MSI > write as well, even if the platform isn't changing the MSI for > affinity. Processing a vMSI entry update would be two steps where we > update the MSI addr in VFIO and then set the affinity. The affinity callback of the domain/chip can return IRQ_SET_MASK_OK_DONE which prevents recomposing and writing the message. So you want a explicit update/write of the message similar to what msi_domain_activate() does. Thanks, tglx ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2024-11-28 11:15 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-09 5:48 [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 1/7] genirq/msi: Allow preset IOVA in struct msi_desc for MSI doorbell address Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 2/7] irqchip/gic-v3-its: Bypass iommu_cookie if desc->msi_iova is preset Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 3/7] PCI/MSI: Pass in msi_iova to msi_domain_insert_msi_desc Nicolin Chen 2024-11-09 9:43 ` kernel test robot 2024-11-09 5:48 ` [PATCH RFCv1 4/7] PCI/MSI: Allow __pci_enable_msi_range to pass in iova Nicolin Chen 2024-11-11 9:30 ` Andy Shevchenko 2024-11-09 5:48 ` [PATCH RFCv1 5/7] PCI/MSI: Extract a common __pci_alloc_irq_vectors function Nicolin Chen 2024-11-11 9:33 ` Andy Shevchenko 2024-11-09 5:48 ` [PATCH RFCv1 6/7] PCI/MSI: Add pci_alloc_irq_vectors_iovas helper Nicolin Chen 2024-11-09 21:52 ` kernel test robot 2024-11-10 9:51 ` kernel test robot 2024-11-11 9:34 ` Andy Shevchenko 2024-11-12 22:14 ` Nicolin Chen 2024-11-09 5:48 ` [PATCH RFCv1 7/7] vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE Nicolin Chen 2024-11-11 13:09 ` [PATCH RFCv1 0/7] vfio: Allow userspace to specify the address for each MSI vector Robin Murphy 2024-11-11 14:14 ` Marc Zyngier 2024-11-12 22:13 ` Nicolin Chen 2024-11-12 21:54 ` Nicolin Chen 2024-11-13 1:34 ` Jason Gunthorpe 2024-11-13 21:11 ` Alex Williamson 2024-11-14 15:35 ` Robin Murphy 2024-11-20 13:17 ` Eric Auger 2024-11-20 14:03 ` Jason Gunthorpe 2024-11-28 11:15 ` Thomas Gleixner
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.